Building an Invisible Puzzle: Predicting Protein Structure and Function from Sequence
Matthew Perella
January 31, 2013



Proteins
Abundance
20 Amino Acids
Role in nearly all cellular processes
– Enzymes, hormones, signaling, immune system, muscle fibers, transporters [1]

Levels of Protein Structure [2]

1. Nelson, D. L.; Cox, M. M., Principles of Biochemistry. 5 ed.; W.H. Freeman and Company: New York.
2. Image obtained from: primary protein structure | protein-pdb.com.

Understanding Structure and Function
Proteomics
Characterize structures
– Whole-genome sequencing (<1%)
– Experimentally: X-Ray Crystallography, NMR Spectroscopy
– Computational Prediction: Bioinformatics

Research
Wine Spoilage
Brettanomyces bruxellensis
Vinylphenol Reductase [3]
– Vinylphenols
– Ethylphenols
How??

Vinylphenol Reductase Sequence [3]
MPLMTISDSVKDSLTKSEVVPTVIHDKSFLPKGFLTIQYDSGKEV
ALGNNIRPADSKNLPRIDFTLNLPSDASSTFNISKDDRFTLIVTD
PDAPTRNDEKWSEYLHYLAVDVQLNTFNAENASSNDQLSTAD
LKGRTLYPYIGPGPPPKTGKHRYVFLLYKQTPGVTPEAPKDRPN
WGTGIRGAGAAEYAEKYKLTPYAVNFFYAQNDQQ

3. Tchobanov, I.; Gal, L.; Guilloux-Benatier, M.; Remize, F.; Nardi, T.; Guzzo, J.; Serpaggi, V.; Alexandre, H., Partial vinylphenol reductase purification and characterization from Brettanomyces bruxellensis. FEMS Microbiology Letters 2008.
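As a first step before submitting the sequence to any of the tools discussed later, it can help to store it as a FASTA record and check its composition. The following is a minimal sketch, not part of the original slides; the helper names `to_fasta` and `composition` and the header text are illustrative assumptions.

```python
from collections import Counter

# Vinylphenol reductase sequence as shown on the slide (line breaks removed).
VPR_SEQUENCE = (
    "MPLMTISDSVKDSLTKSEVVPTVIHDKSFLPKGFLTIQYDSGKEV"
    "ALGNNIRPADSKNLPRIDFTLNLPSDASSTFNISKDDRFTLIVTD"
    "PDAPTRNDEKWSEYLHYLAVDVQLNTFNAENASSNDQLSTAD"
    "LKGRTLYPYIGPGPPPKTGKHRYVFLLYKQTPGVTPEAPKDRPN"
    "WGTGIRGAGAAEYAEKYKLTPYAVNFFYAQNDQQ"
)

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(header, sequence, width=60):
    """Return a FASTA record with the sequence wrapped at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def composition(sequence):
    """Return the fraction of each residue type present in the sequence."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


if __name__ == "__main__":
    # The header below is a hypothetical label, not a database identifier.
    print(to_fasta("vinylphenol_reductase B. bruxellensis", VPR_SEQUENCE))
    for aa, frac in sorted(composition(VPR_SEQUENCE).items()):
        print(f"{aa}: {frac:.1%}")
```

The FASTA text produced this way is the input format accepted by most sequence search and alignment tools.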

Sequence Databases
Protein Data Bank (PDB)
– As of Wednesday, January 30th, there are 81,306 characterized structures in the PDB [4]
UniProtKB/Swiss-Prot
– 538,849 reviewed sequences; 29,266,939 unreviewed sequences [5]
– Only 77,110 have experimentally solved structures

4. RCSB PDB - Holdings Report.
5. UniProtKB/Swiss-Prot.
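The gap between known sequences and solved structures can be made concrete with a short calculation. This sketch uses only the counts quoted on the slide; treating the sum of reviewed and unreviewed sequences as the denominator is an assumption, since the slide does not say which total the 77,110 figure refers to.

```python
# Counts quoted on the slide.
reviewed = 538_849        # reviewed sequences
unreviewed = 29_266_939   # unreviewed sequences
with_structure = 77_110   # sequences with experimentally solved structures

# Assumption: the structure count is compared against all listed sequences.
total_sequences = reviewed + unreviewed
fraction_solved = with_structure / total_sequences

print(f"{total_sequences:,} sequences, {fraction_solved:.2%} with a solved structure")
# prints: 29,805,788 sequences, 0.26% with a solved structure
```

Under that assumption, well under 1% of the listed sequences have an experimental structure, which is the motivation for computational prediction.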

Classification Schemes
1. Gene Ontology (GO)
2. Secondary Structure
3. Structural Motifs
4. Family
– CATH & SCOP
– PROSITE
– InterPro
– Pfam

Shenoy, S. R.; Jayaram, B., Proteins: Sequence to Structure and Function – Current Status. Current Protein and Peptide Science 2010, 11 (7), 498–514.

Resources
Similar Sequence Searching
Multiple Sequence Alignments
Prediction
– Secondary Structure
– 3-D Model
Viewing and Editing Software

Watson, J. D.; Laskowski, R. A.; Thornton, J. M., Predicting protein function from sequence and structural data. Current Opinion in Structural Biology 2005, 15 (3).

Columns: Resource Name | 1. Similarity Search (Sequence Alignments) | 2. Predictions | 3. Viewer
BLAST: α
COBALT: α
Jpred: αα
Phyre 2: ααα
SWISS-Model: αα
MPI Bioinformatics Toolkit: αα
PsiPred: α
ClustalW: α
DNASTAR Lasergene 9 Core Suite Software: ααα
CLC Protein Workbench software: ααα
Cn3D viewer (Java): αα
Pymol Molecular Graphics System (Java): α
UCSF Chimera Molecular Visualization Software v. 1.6 (Python): αα
Table 1: Bioinformatics Resource Function Analysis

Methods of Prediction
1. Pattern Recognition: pattern recognition techniques are used to find sequences with high similarity in order to infer related structures and functions.
2. Ab Initio: a prediction method that builds a 3-D model from the sequence alone in order to determine structural and functional information.

Lee, D.; Redfern, O.; Orengo, C., Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007, 8 (12).
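The similarity that pattern recognition relies on is often summarized as percent identity between two aligned sequences. The function below is a minimal sketch of that measure, assuming the two sequences have already been aligned and use "-" for gaps; the name `percent_identity` and the choice to skip gapped columns are illustrative assumptions, and other definitions of identity exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted in this definition
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    print(percent_identity("MPLMT-SD", "MPLKTISD"))
```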

Sequence Similarity Searches
BLAST
PSI-BLAST

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25.
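BLAST-style searches start by locating short words shared between the query and a database sequence, then extend those hits into alignments. The sketch below illustrates only a simplified seeding step with exact word matches; real BLAST also scores similar (non-identical) words with a substitution matrix and performs extension, neither of which is shown. The function names and the word length `k=3` default are assumptions for illustration.

```python
from collections import defaultdict


def index_words(sequence, k=3):
    """Map every length-k word in the sequence to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where the two sequences share an exact k-mer."""
    subject_index = index_words(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in subject_index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(find_seeds("MKTAY", "GGKTAG"))
```

Indexing the subject once and looking up each query word keeps the seeding step close to linear in sequence length, which is the reason word-based heuristics are fast.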

Multiple Alignment
MUSCLE
CLUSTALW
COBALT

Edgar, R. C., MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5).
Papadopoulos JS and Agarwala R (2007) COBALT: constraint-based alignment tool for multiple protein sequences, Bioinformatics 23.
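Once a multiple alignment has been produced by one of these tools, conserved columns can point to residues that matter for structure or function. The following is a minimal sketch of one simple conservation measure, the frequency of the most common residue per column; it assumes the alignment is supplied as equal-length strings with "-" for gaps, and the function name `column_conservation` is hypothetical.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return, per column, the fraction of rows holding the most common residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        # Gaps count against conservation because the divisor is the row count.
        scores.append(most_common_count / len(column))
    return scores


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    for position, score in enumerate(column_conservation(["ACD", "ACE", "A-E"]), start=1):
        print(position, round(score, 2))
```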

Template Secondary Structure Annotation

Secondary Structure Prediction

Secondary Structure Annotation

3-D Model Prediction with Template
PHYRE-2
– PSI-BLAST
– Psi-pred and Diso-pred
– Hidden Markov Model (HMM)
– HMM alignment
– 3-D models from known structures
– Maximizing Thermodynamic Stability
Modelling insertions and deletions with a loop library
Modelling of AA side chains using a rotamer library to minimize steric interferences

Kelley, L. A.; Sternberg, M. J. E., Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 2009, 4.

Phyre2 Model Alignment Results

Kelley, L. A.; Sternberg, M. J. E., Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 2009, 4.

3-D Model Prediction

Superimposed Structural Alignment
Alignment of α-helices and β-sheets
Motif conservation
Infer similar function from homologues

Kelley, L. A.; Sternberg, M. J. E., Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 2009, 4.
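How closely a model overlays its template is commonly reported as the root-mean-square deviation (RMSD) between matched atoms. The sketch below computes RMSD for two coordinate lists that are assumed to be already superimposed and paired atom by atom (for example, matched Cα atoms); it does not perform the superposition itself, and the function name `rmsd` is an illustrative choice.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms, invented for illustration only.
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
    template = [(0.0, 0.1, 0.0), (1.4, 0.0, 0.2), (3.1, 0.4, 0.0)]
    print(f"RMSD: {rmsd(model, template):.3f}")
```

Lower values indicate a closer structural match; viewers such as those listed in Table 1 typically report this figure after superimposing the structures.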

Prediction Analysis
QMEAN and SWISS-MODEL used to assess

Models Superimposed on Template

References
BLAST.
COBALT: Multiple Alignment Tool.
primary protein structure | protein-pdb.com.
RCSB PDB - Holdings Report.
Edgar, R. C., MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5).
Kelley, L. A.; Sternberg, M. J. E., Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 2009, 4.
Lambert, C.; Leonard, N.; De Bolle, X.; Depiereux, E., ESyPred3D submitting form.
Lee, D.; Redfern, O.; Orengo, C., Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007, 8 (12).
Linding, R. et al., Protein disorder prediction: Implications for structural proteomics. EMBL - Biocomputing unit.
Nelson, D. L.; Cox, M. M., Principles of Biochemistry. 5 ed.; W.H. Freeman and Company: New York.
Shenoy, S. R.; Jayaram, B., Proteins: Sequence to Structure and Function – Current Status. Current Protein and Peptide Science 2010, 11 (7), 498–514.
Tchobanov, I.; Gal, L.; Guilloux-Benatier, M.; Remize, F.; Nardi, T.; Guzzo, J.; Serpaggi, V.; Alexandre, H., Partial vinylphenol reductase purification and characterization from Brettanomyces bruxellensis. FEMS Microbiology Letters 2008.
Watson, J. D.; Laskowski, R. A.; Thornton, J. M., Predicting protein function from sequence and structural data. Current Opinion in Structural Biology 2005, 15 (3).