Protein Structure Prediction


Protein Structure Prediction
  Matthew Betts
  Russell Group, University of Heidelberg, Germany
  www.russelllab.org

Sequence, structure, function: Active/inactive? Binds/does not bind? Substrate specificity?

What is this about?
  What we do to find out what a protein might be doing
  Looking at sequences, with a particular emphasis on finding out something about the protein structure
  Some background for practical work

Given a sequence, what should you look for?
  Functional domains (Pfam, SMART, COGs, CDD, etc.)
  Intrinsic features
    Signal peptides, transit peptides (SignalP)
    Transmembrane segments (TMpred, etc.)
    Coiled coils (COILS server)
    Low-complexity regions, disorder (e.g. SEG, DisEMBL)
  Hints about structure?

Given a sequence, what should you look for?
  SMART domain ‘bubblegram’ for human fibroblast growth factor (FGF) receptor 1 (type P11362 into the web site smart.embl.de), with these labels:
    “Low sequence complexity” (linker regions? flexible? junk?)
    Transmembrane segment (crosses the membrane)
    Signal peptide (secreted or membrane attached)
    Tyrosine kinase (phosphorylates Tyr)
    Immunoglobulin domains (bind ligands?)
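
One way to get a feel for "low sequence complexity" is to measure how varied the residues are in a sliding window. The sketch below uses Shannon entropy of the window composition. It is a simplified stand-in, not the SEG algorithm, and the function names, window size and cutoff are illustrative assumptions.

```python
import math
from collections import Counter

def window_entropy(seq, window=12):
    """Shannon entropy (bits) of the residue composition in each sliding window."""
    scores = []
    for i in range(len(seq) - window + 1):
        counts = Counter(seq[i:i + window])
        h = -sum((n / window) * math.log2(n / window) for n in counts.values())
        scores.append(h)
    return scores

def low_complexity_windows(seq, window=12, cutoff=2.2):
    """Start positions of windows whose entropy falls below an assumed cutoff."""
    return [i for i, h in enumerate(window_entropy(seq, window)) if h < cutoff]
```

A window made of a single repeated residue scores 0 bits; a window using all 20 amino acids roughly evenly would approach log2(20), about 4.3 bits.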

What about structure?
  Intrinsic features generally mean trouble for structure determination, so they are usually skipped
  The knock-on effect is that structures for large, flexible multi-domain proteins are rare
  Structure determination/prediction is therefore typically restricted to parts (with exceptions, obviously)

Structure prediction algorithm Sequence Structure

Best predictions are by homology
  Is your sequence homologous to a known structure?
  If yes, then very good structural models can often be constructed
  This is what we will do in the practical

Homology Modelling algorithm +

Homology Modelling Steps
  Identify a homologue of known structure
  Get the best alignment of your sequence to the structure
  Model building
  Side-chain replacement
  Loop building
  Optimisation/relaxation/minimisation
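
The first two steps (pick a homologue, get a good alignment) are often judged by sequence identity over the aligned positions. A minimal sketch, assuming the alignments already exist as pairs of gapped strings with "-" for gaps; the function names and input layout are hypothetical, and real template selection also weighs coverage, resolution and alignment quality.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)

def best_template(alignments):
    """alignments maps a template name to (aligned_query, aligned_template)."""
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))
```

For example, `percent_identity("AC-DE", "ACGDF")` ignores the gapped column and compares the remaining four positions.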

Problems with loops Two subtilisin-like serine proteases

Sanchez et al, Nature Struct. Biol. (Suppl), 7, 986-990, 2000

The Twilight Zone
  Sander & Schneider (EMBL, ca. 1990) compared all known structures to each other using sequence comparison.
  For each fragment of a particular length & sequence identity, they simply asked: is the structure similar or different?
  The line to the right is where one can be 90% confident that an alignment of a particular length & sequence identity implies a similar structure.
  Below the line, structures can be either similar or different: the twilight zone.
  (Basis for many of the sequence alignment statistics in use today)
  Based on Sander & Schneider, Proteins, 9, 56, 1991
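
The length-dependent line can be written as a small function. The sketch below uses the threshold curve as it is commonly quoted from Sander & Schneider (1991): 290.15 × L^-0.562 percent identity for alignment lengths between 10 and 80, and a flat 24.8% for longer alignments. Treat the constants as an assumption to check against the paper; the function names are illustrative.

```python
def hssp_threshold(length):
    """Approximate percent identity above which similar structure is likely."""
    if length < 10:
        raise ValueError("the curve is not defined for very short alignments")
    if length > 80:
        return 24.8
    return 290.15 * length ** -0.562

def in_twilight_zone(length, percent_identity):
    """True if the alignment falls below the threshold line."""
    return percent_identity < hssp_threshold(length)
```

Short alignments need a much higher identity than long ones before structural similarity can be inferred, which is why identity alone, without length, says little.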

Similar structures within the twilight zone
  sequence identity: 80%, 8.8%, 4.4%
  …can we find these similarities without known structures if sequence searches fail?
  Russell et al, J. Mol. Biol., 1997

Fold Recognition (‘Threading’)
  >C562_RHOSH TQEPGYTRLQITLHWAIAGL…
  Does the sequence “fit” on any of a library of known 3D structures?

Fold Recognition (‘Threading’) Jones, Taylor, Thornton, Nature, 358, 86-89, 1992.

Residue pair potentials Phe GOOD Asp Asp Phe BAD Arg
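
A pair potential turns "good" and "bad" neighbours into numbers, so that a sequence placed on a template can be scored by summing over the template's residue contacts. The sketch below is a toy: the energy values and the choice of pairs are hypothetical (not a published potential), and the threading is gapless.

```python
# Hypothetical contact energies (negative = favourable); not a published potential.
PAIR_ENERGY = {
    frozenset(["F"]): -1.0,       # Phe-Phe: two hydrophobic side chains
    frozenset(["D", "R"]): -1.0,  # Asp-Arg: opposite charges
    frozenset(["D"]): 1.0,        # Asp-Asp: like charges
    frozenset(["D", "F"]): 0.5,   # Asp-Phe: charged next to hydrophobic
}

def threading_score(seq, contacts):
    """Sum pair energies over template contacts (i, j) for a gapless threading."""
    total = 0.0
    for i, j in contacts:
        total += PAIR_ENERGY.get(frozenset([seq[i], seq[j]]), 0.0)
    return total
```

Scoring the same sequence against the contact lists of many templates and keeping the lowest total is the basic idea behind asking whether a sequence "fits" a fold.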

Fold Recognition Executive Summary
  Works some of the time
  Probably best at identifying distant homologues, where sequence identity is in the twilight zone
  Useful sites: 3D-PSSM, FUGUE, (Gen)-Threader
  Meta predictions are the best: combine all and get a consensus, e.g. bioinfo.pl/meta
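
The consensus idea can be sketched as a simple vote over the top hit from each method. This is only an illustration of the principle: the function name, the input layout (method name mapped to its top-scoring fold, or None) and the vote threshold are assumptions, and real meta servers weigh scores and structural similarity between hits rather than counting labels.

```python
from collections import Counter

def consensus_fold(predictions, min_votes=2):
    """Return the fold named by the most methods, or None without enough agreement."""
    votes = Counter(p for p in predictions.values() if p is not None)
    if not votes:
        return None
    fold, count = votes.most_common(1)[0]
    return fold if count >= min_votes else None
```

For instance, if two of three hypothetical methods report the same fold, that fold is returned; if every method disagrees, the result is None.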

If no homology…
  Is your sequence homologous to a known structure?
  If no, then actual models are less accurate, but structural insights are still possible
  First, secondary structure prediction

Secondary-structure prediction algorithm Neural networks Inductive logic programming Spin-glass theory Human intuition

Secondary-structure prediction
  E.g. Chou & Fasman, 1974
    Helix forming: Glu, Ala, Leu
    Helix breaking: Pro, Gly
    Strand forming: Met, Val, Ile
    Strand breaking: Glu, Lys, Ser, His, Asn
    Etc.
  Numerical approach + simple protocol = prediction of secondary structure
  Claimed “80%” accuracy; in reality, 50-60%
  The method was tested on the same proteins used to derive the parameters… a big no-no.
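
The "numerical approach + simple protocol" can be sketched as averaging per-residue propensities in a sliding window. The numbers below are illustrative only (above 1 favours, below 1 disfavours) and are not the published Chou & Fasman table; the protocol is also much simpler than the original nucleation and extension rules.

```python
# Illustrative propensities only; not the published Chou & Fasman parameters.
HELIX = {"E": 1.5, "A": 1.4, "L": 1.2, "P": 0.6, "G": 0.6}
STRAND = {"M": 1.1, "V": 1.7, "I": 1.6, "E": 0.4, "K": 0.7,
          "S": 0.8, "H": 0.9, "N": 0.9}

def predict_ss(seq, window=5):
    """Assign H (helix), E (strand) or C (coil) from window-averaged propensities."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        h = sum(HELIX.get(r, 1.0) for r in segment) / len(segment)
        e = sum(STRAND.get(r, 1.0) for r in segment) / len(segment)
        if h > 1.05 and h >= e:
            out.append("H")
        elif e > 1.05:
            out.append("E")
        else:
            out.append("C")
    return "".join(out)
```

Because parameters like these are derived from known structures, any accuracy estimate has to come from proteins that were not used to derive them.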

Homologous proteins add a lot of information
  SS pred: 70% accuracy!

What about de novo or ab initio prediction?
  Can you simulate folding using physics to predict the structure of a protein?
  No, not usually.
  However, advances have been made…
  David Baker, co-workers and subsequent followers: fragment-based structure prediction.
  De novo, not ab initio

Predicting Fragments Preferences learned from all stretches with a similar structure

Assembling Fragments
  Database of structures
  Fragments matching the target sequence
  Assembly of fragments
  Selection of best model
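
The "fragments matching the target sequence" step can be sketched as a lookup: for every window of the target, collect library fragments with a similar sequence. This is a toy under stated assumptions: the library is a hypothetical list of (sequence, secondary-structure string) pairs standing in for 3D coordinates, and matching is by plain identity count. It is not the Rosetta procedure, and the assembly and model-selection steps are not shown.

```python
def candidate_fragments(target, library, size=3, min_identity=2):
    """For each window of the target, list library fragments with similar sequence."""
    picks = {}
    for start in range(len(target) - size + 1):
        window = target[start:start + size]
        hits = []
        for frag_seq, frag_ss in library:
            same = sum(a == b for a, b in zip(window, frag_seq))
            if len(frag_seq) == size and same >= min_identity:
                hits.append((frag_seq, frag_ss))
        picks[start] = hits
    return picks
```

Each target position ends up with a list of alternative local conformations; assembly then amounts to choosing one per window and scoring the resulting model.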

The Prediction Irony
  General trend: increasing accuracy is more a function of data than algorithms
  In other words: as we know more structures, and indeed more sequence data, we get better at predicting
  Probably we will have a perfect algorithm for protein structure prediction when we know all of the answers
  Structural genomics & the generally increased pace of structure predictions mean there aren’t many really “new” structures anymore

Things to Remember
  Methods have mostly been developed for soluble, globular proteins or domains
  Problems with membrane proteins, low-complexity regions, etc.
  Many segments in proteins should be studied with other methods:
    Signal peptides
    TM regions
    Coiled coils
    Intrinsic disorder (e.g. http://dis.embl.de)

What we use this for…

We aim to:
  Understand molecular interactions
  Predict molecular interactions
  Focus on those interactions of biomedical importance
  Apply tools to large datasets
  Use interaction networks predictively
    To predict new interactions
    To predict other details like pathologies, toxicities

Modelling or predicting interactions by homology
  Your favourite protein (N to C): match to known structure
  Your second favourite protein (N to C): match to known structure
  Templates in contact?
  Histidyl adenylate tRNA Synthetase
  Modelled Interaction
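
The logic of this slide is a two-step check: does each protein match a known structure, and are the two matched templates in contact in some solved complex? A minimal sketch, assuming the template hits and the set of contacting template pairs have already been computed; all identifiers and the function name are placeholders.

```python
def predict_interaction(hits_a, hits_b, template_contacts):
    """Return template pairs (ta, tb) that match A and B and are in contact."""
    found = []
    for ta in hits_a:
        for tb in hits_b:
            if ta != tb and frozenset([ta, tb]) in template_contacts:
                found.append((ta, tb))
    return found
```

With `template_contacts = {frozenset(["tmpl1_A", "tmpl1_B"])}`, a protein matching `tmpl1_A` and another matching `tmpl1_B` would yield one candidate template pair on which the interaction could be modelled.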

Prediction of Structures of Complexes
  Five-component complex
  X-ray
  Two-hybrid network
  Homology (e.g. BLAST) + electron microscopy & mass spectrometry
  Russell et al, Curr. Opin. Struct. Biol. 2004
  Aloy & Russell, Nature Rev. Mol. Cell. Biol. 2006
  Taverner et al, Adv Chem. Res. 2008

Adding Mechanisms to Interaction Networks
  Who interacts with whom?
  What does the interaction look like?
  How strong? How fast?
  Which piece from which protein?
  (Figure labels: Ga/q, RGS-4, P, Ga/i, RGS-3)

Bridging the information gap Modelled complexes Aloy & Russell, Nature Rev. Mol. Cell. Biol., 2006.

From Proteomics to Cellular Anatomy? Kuehner et al, Science, 2010


Some Links
  www.russelllab.org/aas: Guide to the amino acids
  www.russelllab.org/gtsp: Guide to Structure Prediction
  meta.bioinfo.pl: Meta server (runs virtually all reliable prediction methods)

Structure Prediction Practical
  www.russelllab.org/wiki
  Sequence, structure, function: Active/inactive? Binds/does not bind? Substrate specificity?
  In groups of two or more you will attempt to answer functional questions about a particular protein target

Acknowledgements
  www.russelllab.org
  Current group members: Rob Russell (the boss), Matthew Betts, Leonardo Trabuco, Oliver Wichmann, Mathias Utz, Yvonne Lara
  Alumni: Chad Davis, Olga Kalinina, Ricardo de la Vega, Victor Neduva, Evangelia Petsalaki, Damien Devos
  Complex modeling & interactions collaborators: Patrick Aloy (IRB Barcelona), Anne-Claude Gavin (EMBL Heidelberg), Peer Bork (EMBL Heidelberg), Luis Serrano (CRG Barcelona), Achilleas Frangakis (Uni Frankfurt), Bettina Boettcher (Edinburgh)