Protein Structure Analysis - II

PLPTH 890 Introduction to Genomic Bioinformatics, Lecture 23
Liangjiang (LJ) Wang (ljwang@ksu.edu)
April 10, 2005

Outline
- Protein structure alignment (DALI and VAST).
- Protein secondary structure prediction (PHDsec, PSIPRED, etc.).
- Prediction of 3-D protein structures: homology modeling, threading, and ab initio prediction.
- Protein structural genomics.

Protein Structure Comparison
Why is structure comparison important?
- To understand the structure-function relationship.
- To study the evolution of many key proteins (structure is more conserved than sequence).
Comparing 3-D structures is much more difficult than comparing sequences.
Protein structure classification:
- SCOP: Structural Classification of Proteins.
- CATH: Class, Architecture, Topology and Homologous superfamily.
Protein structure alignment: DALI and VAST.

Protein Structure Alignment
The positions of atoms in two or more 3-D protein structures are compared. One must first determine which atoms to align. A set of at least three common reference points must be identified in each structure. The structures are then superimposed so that the average deviation between matched atoms is minimized. Unlike sequence comparison, finding the optimal correspondence between 3-D structures is computationally hard (NP-hard under general scoring schemes), so practical tools rely on heuristics. (Baxevanis and Ouellette, 2005)
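Once equivalent atoms have been paired, the superposition that minimizes their deviation can be computed with the Kabsch algorithm. The deviation is usually reported as RMSD = sqrt((1/N) Σ |p_i - q_i|²). The sketch below makes three assumptions: NumPy is available, the atom correspondence is already known, and the rows of P and Q are paired. It does not address the hard part, which is finding that correspondence in the first place.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Optimally superimpose Q onto P (both N x 3, rows paired) and return the RMSD."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both coordinate sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation.
    H = Qc.T @ Pc
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the centered Q and measure the remaining per-atom deviation.
    diff = Pc - Qc @ R.T
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

If one structure is an exact rotated and translated copy of the other, the function returns an RMSD of essentially zero.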

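How to Compare Structures?
Feature extraction turns each structure into a description (Description 1 and Description 2). The two descriptions are compared to produce scores. Statistical analysis of these scores is then used to assess similarity and to classify the structures.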

DALI
DALI stands for Distance matrix ALIgnment. Each structure is represented as a two-dimensional array (matrix) of distances between all pairs of Cα atoms. (The Cα atom is the central carbon of each amino acid residue; the amino group, the carboxyl group and the side chain are all bonded to it.) DALI assumes that similar 3-D structures have similar inter-residue distances, and it uses distance matrices to align protein structures. DALI is available at http://www.ebi.ac.uk/dali/.
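The following is a minimal sketch of the distance-matrix idea, assuming NumPy. It builds a Cα distance matrix and scores two matrices whose residues are already paired one-to-one. The score is a simplified, DALI-style "elastic" similarity: it rewards pairs of distances that agree relative to their mean, and it down-weights long distances. The constants theta = 0.2 and alpha = 20 Å are assumptions in this sketch. Real DALI also searches for the residue correspondence, and that step is omitted here.

```python
import numpy as np

def ca_distance_matrix(coords):
    """Pairwise Euclidean distances between Cα coordinates (N x 3 -> N x N)."""
    X = np.asarray(coords, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def elastic_score(dA, dB, theta=0.2, alpha=20.0):
    """Simplified DALI-style similarity of two equally sized distance matrices.

    Assumes residue i of structure A is paired with residue i of structure B.
    theta and alpha (in Å) are assumed constants, not tuned values.
    """
    dA = np.asarray(dA, dtype=float)
    dB = np.asarray(dB, dtype=float)
    d_mean = (dA + dB) / 2.0
    # Relative deviation of each distance pair; treated as 0 where both distances are 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        rel_dev = np.where(d_mean > 0, np.abs(dA - dB) / d_mean, 0.0)
    # Long distances are less informative, so they are down-weighted.
    weight = np.exp(-(d_mean / alpha) ** 2)
    return float(((theta - rel_dev) * weight).sum())
```

Distance matrices do not change when a structure is rotated or translated. This is why DALI can compare structures without superimposing them first.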

VAST
VAST stands for Vector Alignment Search Tool. Each structure is represented as a set of secondary structure elements (SSEs), that is, α helices or β strands. VAST scores pairs of SSEs based on their type, relative orientation and connectivity. Statistically significant SSE matches are then extended (similar to BLAST). Comparisons for structures in MMDB have been precomputed and organized as structure neighbors in Entrez. VAST can be accessed at http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml.
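The toy sketch below illustrates what "representing SSEs as vectors and comparing their orientation" can mean. It is not VAST's actual scoring. Each SSE is crudely approximated as the unit vector from its first to its last Cα atom. Two SSE pairs are called compatible when their types match and the angles between their vectors are similar. The 30° tolerance is a hypothetical value, and the function names are illustrative.

```python
import numpy as np

def sse_vector(coords, start, end):
    """Unit vector from the Cα at index `start` to the Cα at index `end` (crude SSE axis)."""
    X = np.asarray(coords, dtype=float)
    v = X[end] - X[start]
    return v / np.linalg.norm(v)

def sse_angle(u, v):
    """Angle in degrees between two SSE direction vectors."""
    cos = np.clip(np.dot(u, v), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))

def pairs_compatible(pair_a, pair_b, max_angle_diff=30.0):
    """Toy rule: SSE pairs match if their types agree and inter-SSE angles are similar.

    Each pair is ((type1, vec1), (type2, vec2)); max_angle_diff is a hypothetical tolerance.
    """
    (ta1, va1), (ta2, va2) = pair_a
    (tb1, vb1), (tb2, vb2) = pair_b
    if (ta1, ta2) != (tb1, tb2):
        return False
    return abs(sse_angle(va1, va2) - sse_angle(vb1, vb2)) <= max_angle_diff
```

Connectivity (the order of SSEs along the chain) and statistical significance, both of which VAST uses, are left out to keep the example short.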

Secondary Structure Prediction
Given the sequence of a polypeptide, its secondary structures are predicted. These methods assume that secondary structures are determined by local interactions among neighboring residues. Early analyses were based on the frequencies with which each amino acid is found in different types of secondary structure. For example, proline occurs at turns but rarely within α helices. Modern approaches use machine learning techniques and multiple sequence alignments.
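The frequency-based idea can be made concrete as a propensity, P(aa | state) / P(aa). A value above 1 means the amino acid is enriched in that state. The sketch below estimates propensities from the short labeled example used in this lecture (QEALDAAGDKLVVVDF / HHHHHHLLLLEEEEEE), purely for illustration. Real propensity tables are estimated from many known structures.

```python
from collections import Counter

def propensities(seq, ss):
    """Propensity P(aa | state) / P(aa), estimated from one labeled sequence."""
    if len(seq) != len(ss):
        raise ValueError("sequence and labels must have the same length")
    n = len(seq)
    aa_counts = Counter(seq)
    state_counts = Counter(ss)
    pair_counts = Counter(zip(ss, seq))  # (state, amino acid) co-occurrences
    table = {}
    for state, n_state in state_counts.items():
        table[state] = {
            aa: (pair_counts[(state, aa)] / n_state) / (aa_counts[aa] / n)
            for aa in aa_counts
        }
    return table

p = propensities("QEALDAAGDKLVVVDF", "HHHHHHLLLLEEEEEE")
print(p["E"]["V"])  # V appears only in the strand segment, so its E propensity is high
```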

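Machine Learning Approach
Example of a labeled sequence:
QEALDAAGDKLVVVDF
HHHHHHLLLLEEEEEE
(H: helix; E: strand (sheet); L: loop)
A classifier (model) is trained on a training dataset and then evaluated on a separate test dataset. If the prediction performance is not acceptable, the model is revised and retrained. If it is acceptable, the model is used for prediction.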
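Before a classifier can learn from labeled sequences, each residue must be turned into a fixed-length input. This sketch assumes NumPy and uses one common choice: a window of neighboring residues, each encoded one-hot. The window size (5) and the extra "beyond the terminus" slot are illustrative choices, not the settings of any particular published method.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def window_features(seq, half_window=2):
    """One-hot encode a window of 2*half_window+1 residues centred on each position.

    Each window position uses 21 slots: 20 amino acids plus one 'off the end' slot.
    """
    width = 2 * half_window + 1
    features = np.zeros((len(seq), width * 21))
    for i in range(len(seq)):
        for k, j in enumerate(range(i - half_window, i + half_window + 1)):
            if 0 <= j < len(seq):
                slot = INDEX[seq[j]]
            else:
                slot = 20  # the window extends beyond the N- or C-terminus
            features[i, k * 21 + slot] = 1.0
    return features

X = window_features("QEALDAAGDKLVVVDF")
y = list("HHHHHHLLLLEEEEEE")
```

The rows of X and the labels in y can then be split into training and test sets and passed to any classifier.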

PHDsec
For a given protein sequence, PHDsec:
1. Searches for homologous sequences.
2. Produces a multiple sequence alignment.
3. Generates a profile (evolutionary information).
PHDsec then uses a feed-forward artificial neural network to predict the secondary structures. (Figure: a window of residues, e.g. R A P S K Y E H L, feeds the input layer, which connects through a hidden layer to the output layer.) PHDsec can be accessed at http://www.predictprotein.org/.
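The sketch below shows, assuming NumPy, the shape of a feed-forward pass like the one in the figure. A window of profile-derived inputs flows through a sigmoid hidden layer to three outputs (H, E, L). The window length, feature count and hidden-layer size are assumptions. The weights are random and untrained, and the inputs are random stand-ins for real profile data, so the predicted states are meaningless. The actual PHDsec system differs in its details and is trained by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

def feed_forward(x, W1, b1, W2, b2):
    """One sigmoid hidden layer, then a softmax over the three states (H, E, L)."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

window, n_features, n_hidden = 13, 21, 30  # assumed sizes, not PHDsec's settings
n_inputs = window * n_features
W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 3))
b2 = np.zeros(3)

profile_windows = rng.random((16, n_inputs))  # stand-in for profile-derived inputs
probs = feed_forward(profile_windows, W1, b1, W2, b2)
states = np.array(list("HEL"))[probs.argmax(axis=1)]
```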

PSIPRED
For a given protein sequence, PSIPRED:
1. Performs a PSI-BLAST search.
2. Creates a profile that conveys the evolutionary information at each position.
3. Feeds the profile into a system of neural networks (or support vector machines).
PSIPRED can be accessed at http://bioinf.cs.ucl.ac.uk/psipred/.
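The profile from PSI-BLAST is a position-specific scoring matrix (PSSM): one row per residue and 20 log-odds scores per row. This sketch, assuming NumPy, turns such a matrix into network inputs in two steps. First, it squashes the scores into (0, 1) with a logistic function. Second, it stacks a 15-residue window of rows around each position, zero-padding at the chain ends. Both the scaling and the window length are assumptions made for illustration. A toy zero matrix stands in for real PSI-BLAST output.

```python
import numpy as np

def scale_pssm(pssm):
    """Map log-odds scores to (0, 1) with the logistic function."""
    return 1.0 / (1.0 + np.exp(-np.asarray(pssm, dtype=float)))

def pssm_windows(pssm, half_window=7):
    """Flatten the scaled PSSM rows in a window around each residue (zero-padded ends)."""
    scaled = scale_pssm(pssm)
    n, width = scaled.shape
    pad = np.zeros((half_window, width))
    padded = np.vstack([pad, scaled, pad])
    return np.stack([padded[i:i + 2 * half_window + 1].ravel() for i in range(n)])

inputs = pssm_windows(np.zeros((10, 20)))  # 10 residues -> 10 rows of 15 * 20 inputs
```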

How to Evaluate the Performance?
EVA is an independent server for evaluating protein structure prediction methods (http://cubic.bioc.columbia.edu/eva/). The best tools for three-state (helix, strand, loop) per-residue secondary structure prediction now reach an accuracy of about 78%.
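Three-state per-residue accuracy is usually reported as Q3:

Q3 = 100 × (correctly predicted residues) / (total residues)

A minimal sketch, with made-up label strings:

```python
def q3(observed, predicted):
    """Percentage of residues whose three-state label (H, E or L) is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

print(q3("HHHHLLLEEE", "HHHLLLLEEE"))  # 90.0
```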

Prediction of 3-D Protein Structures
There are about 30,000 structures in PDB, but more than 1.8 million non-redundant protein sequences in UniProt (Swiss-Prot + TrEMBL). Computational structure prediction may provide valuable information for most of the protein sequences derived from genome sequencing projects. There are three predictive methods:
- Homology (or comparative) modeling.
- Threading (or fold recognition).
- Ab initio structure prediction.

Sequence - Structure Relationship
In cells, protein folding is determined by the amino acid sequence. However, protein structures can also be affected by post-translational modifications and the cellular environment. Proteins with ≥ 30% sequence identity tend to have similar structures, but exceptions do exist. (Figure, Bourne, 2004: an 80-residue stretch (yellow) with 40% sequence identity between a viral capsid protein (1PIV:1) and a glycosyltransferase (1HMP:A).)
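Thresholds such as 30% depend on how sequence identity is computed. The sketch assumes one common convention: count identical residues over aligned columns in which neither sequence has a gap. Other conventions, such as dividing by the length of the shorter sequence, give different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)

print(percent_identity("ACDE-G", "ACFEHG"))  # 4 identical of 5 ungapped columns -> 80.0
```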

Homology Modeling
Homology modeling is probably the most accurate method for protein structure prediction. It involves five steps:
1. Find a known structure related to the query sequence by sequence comparison.
2. Align the query sequence with the known structure (template).
3. Build a model by modifying the backbone and side chains of the template.
4. Refine the model using energy minimization.
5. Validate the model using visual inspection or software tools.

Homology Modeling (Cont’d)
The accuracy of structure prediction depends on the percent amino acid sequence identity shared between the query and the template. For >50% sequence identity, the RMSD (Root Mean Square Deviation) for main-chain atoms is only about 1 Å. This is comparable to the accuracy of a medium-resolution NMR structure or a low-resolution X-ray structure. Homology modeling should not be relied on for predicting protein structures when the sequence identity is less than 30%.
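Step 1, choosing a template, can be sketched as ranking candidate templates by their percent identity to the query and attaching the rules of thumb from this slide. The template IDs are hypothetical. The "medium" label for the 30-50% range is a placeholder of this sketch; the slide gives no accuracy figure for that range.

```python
def rank_templates(identities):
    """Rank candidate templates by percent identity and attach a rough reliability label.

    `identities` maps a (hypothetical) template ID to its percent identity with the query.
    """
    def label(pid):
        if pid > 50:
            return "high (about 1 Å main-chain RMSD expected)"
        if pid >= 30:
            return "medium"
        return "unreliable for homology modeling"

    ranked = sorted(identities.items(), key=lambda item: item[1], reverse=True)
    return [(template, pid, label(pid)) for template, pid in ranked]

for row in rank_templates({"tmplA": 62.0, "tmplB": 35.5, "tmplC": 22.0}):
    print(row)
```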

Homology Modeling Servers
- SWISS-MODEL (http://swissmodel.expasy.org/): a popular server for protein structure homology modeling.
- SDSC1 (http://cl.sdsc.edu/hm.html): the top-ranked server for homology modeling on the EVA site (http://cubic.bioc.columbia.edu/eva/).

Threading
(Figure: Baxevanis and Ouellette, 2005)

Threading (Cont’d)
Threading takes a query sequence and passes (threads) it through the 3-D structure of each protein in a fold database of known structures. As the sequence is threaded, its fit to each fold is evaluated using a scoring function based on energy or packing efficiency. Threading may find a common fold for proteins with essentially no detectable sequence homology. Structures predicted by threading techniques are often not of high quality (RMSD > 3 Å). Based on EVA results, 3D-PSSM is the best threading server (http://www.sbg.bio.ic.ac.uk/~3dpssm/).
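The toy sketch below threads a query through a fold without gaps. The fold is reduced to a per-position burial pattern: "B" for buried, "X" for exposed. Each placement is scored with a simple environment rule: hydrophobic residues are rewarded in buried positions, and polar residues are rewarded in exposed positions. The hydrophobic residue set, the burial pattern and the +1/-1 scoring rule are all hypothetical. Real threading methods allow gaps (usually found by dynamic programming) and use statistically derived energy potentials.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set for this toy example

def thread_score(segment, environment):
    """+1 for hydrophobic-in-buried or polar-in-exposed, -1 otherwise."""
    score = 0
    for aa, env in zip(segment, environment):
        hydrophobic = aa in HYDROPHOBIC
        score += 1 if hydrophobic == (env == "B") else -1
    return score

def best_threading(query, environment):
    """Try every ungapped placement of the fold along the query; keep the best (offset, score).

    Returns None if the query is shorter than the fold.
    """
    length = len(environment)
    best = None
    for offset in range(len(query) - length + 1):
        s = thread_score(query[offset:offset + length], environment)
        if best is None or s > best[1]:
            best = (offset, s)
    return best

print(best_threading("KDELVVILKDE", "BBBB"))  # (3, 4): LVVI sits in the buried core
```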

Ab Initio Structure Prediction
Ab initio prediction can be used when a protein sequence has no detectable homologs in PDB. Protein folding is modeled as global free-energy minimization. Because the protein folding problem has not yet been solved, ab initio prediction methods are still experimental and can be quite unreliable. One of the top ab initio prediction methods, Rosetta, successfully predicted 61% of structures (80 of 131) to within 6.0 Å RMSD (Bonneau et al., 2002). The HMMSTR/Rosetta server can be accessed at http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php.
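Global energy minimization can be illustrated with the 2-D hydrophobic-polar (HP) lattice model, a standard toy model of folding. This is not Rosetta's method. Each residue is either H or P and sits on a square lattice. Every non-bonded H-H contact contributes -1 to the energy. The sketch finds the global minimum by brute force over all self-avoiding walks. Because there are 4^(n-1) candidate walks, this is feasible only for very short chains, which helps explain why real ab initio methods rely on heuristic sampling.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def build_path(moves):
    """Turn a move sequence into lattice coordinates; None if the chain overlaps itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords

def hp_energy(sequence, coords):
    """-1 for each pair of H residues adjacent on the lattice but not adjacent in the chain."""
    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = position.get((x + dx, y + dy))
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1  # j > i + 1 counts each contact once and skips chain neighbours
    return energy

def fold_exhaustively(sequence):
    """Return (lowest energy, coordinates) over all self-avoiding walks."""
    best = (0, None)
    for moves in product(MOVES, repeat=len(sequence) - 1):
        coords = build_path(moves)
        if coords is not None:
            e = hp_energy(sequence, coords)
            if best[1] is None or e < best[0]:
                best = (e, coords)
    return best
```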

Comparing Structure Prediction Methods
(Figure, Baker and Sali, 2000: panels A-C show homology modeling with 60% (A), 40% (B) and 30% (C) sequence identity; panels D and E show ab initio protein structure prediction. Predicted structures are in red, and actual structures are in blue.)

Example: Cysteine-Rich Peptides
(Figure: signal helix and cleavage site.)
- NCR: Nodule-specific Cysteine-Rich genes in legumes.
- Avr9: a fungal avirulence protein from Cladosporium fulvum.
- Defensin: antimicrobial peptides.
- Proteinase inhibitor: serine proteinase inhibitors.
- SCR6: from the S-locus of Brassica; involved in self-incompatibility (SI) and interacts with SRK6.

Ab Initio Prediction of Cys-Rich Peptides
(Figure panels: LSG-TC51151; PsENOD3; defensin (AAG40321, M. sativa); Avr9 (Cladosporium fulvum).)

Protein Structural Genomics
Structural genomics is a worldwide initiative aimed at determining a large number of protein structures in a high-throughput mode. In the US, nine structural genomics centers have been funded by the National Institutes of Health (NIH). More information can be found at http://www.rcsb.org/pdb/strucgen.html. TargetDB (http://targetdb.pdb.org/) is a centralized registration database for target sequences from the worldwide structural genomics projects.

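A Target Selection Pipeline from the Joint Center for Structural Genomics (JCSG)
(Figure: candidate sequences are filtered by transmembrane prediction (TMHMM), protein size (7 - 80 kDa), low complexity, redundancy, and a BLAST search against PDB sequences.)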
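Two of the pipeline's filters can be sketched in a few lines: the 7 - 80 kDa size window from the slide and a low-complexity check. The sketch makes several assumptions. Mass is approximated as 110 Da per residue, a rough average. Low complexity is flagged when any 12-residue window has a Shannon entropy below 2.2 bits. The window length and entropy cut-off are hypothetical parameters, not JCSG's settings. The TMHMM, redundancy and BLAST steps are not reproduced.

```python
import math
from collections import Counter

AVG_RESIDUE_MASS_DA = 110.0  # rough average residue mass; an assumption

def approx_mass_kda(seq):
    return len(seq) * AVG_RESIDUE_MASS_DA / 1000.0

def shannon_entropy(window):
    """Entropy (bits) of the amino acid composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def has_low_complexity(seq, window=12, min_entropy=2.2):
    """True if any window's composition entropy falls below min_entropy (hypothetical values)."""
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < min_entropy:
            return True
    return False

def passes_filters(seq, min_kda=7.0, max_kda=80.0):
    return min_kda <= approx_mass_kda(seq) <= max_kda and not has_low_complexity(seq)
```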

Summary
- Fast and accurate structure alignment remains a very hard, unsolved problem.
- Machine learning techniques are widely used in protein secondary structure prediction.
- Homology modeling is probably the most reliable method for structure prediction.
- The protein folding problem has not yet been solved.

Prediction of Solvent Accessibility
Solvent accessibility is the relative area of a residue’s surface that is exposed to the surrounding solvent. Solvent-accessible residues may be part of an active site or a binding site, while buried residues may play an important role in stabilizing the protein structure.
- PHDacc (http://www.predictprotein.org/): a neural network-based method (similar to PHDsec).
- Jpred (http://www.compbio.dundee.ac.uk/~www-jpred/): a neural network system that predicts both secondary structure and solvent accessibility.
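"Relative" accessibility means that a residue's accessible surface area is divided by the maximum possible area for that residue type. Predictors often target a two-state version of this value (buried or exposed). In the sketch below, the 25% cut-off is an assumption. The maximum-area values must come from a published reference table; the numbers in the usage line are placeholders, not real values.

```python
def relative_accessibility(asa, max_asa):
    """Relative solvent accessibility: accessible area / maximum area for that residue type."""
    return asa / max_asa

def two_state(asa_values, max_values, residues, threshold=0.25):
    """Label each residue 'b' (buried) or 'e' (exposed); the cut-off is an assumption."""
    labels = []
    for aa, asa in zip(residues, asa_values):
        rsa = relative_accessibility(asa, max_values[aa])
        labels.append("e" if rsa >= threshold else "b")
    return "".join(labels)

placeholder_max = {"A": 100.0, "L": 200.0}  # placeholder values, not a real reference table
print(two_state([10.0, 150.0], placeholder_max, "AL"))  # "be"
```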

Predicting Transmembrane Segments
Transmembrane segments share common biophysical features (e.g., hydrophobicity).
- PHDhtm (http://www.predictprotein.org/): part of the PredictProtein services. Transmembrane helices are predicted using a neural network system.
- TMHMM (http://www.cbs.dtu.dk/services/TMHMM/): a hidden Markov model (HMM) built from known transmembrane proteins represents the transmembrane pattern, and a query sequence is matched against this model.
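The shared hydrophobicity of transmembrane segments is what a simple hydropathy plot exploits. This sketch uses the Kyte-Doolittle hydropathy scale and is not how PHDhtm or TMHMM work. It averages hydropathy over a sliding window and flags windows above a cut-off. The window of 19 residues and threshold of 1.6 are values often quoted for spotting membrane-spanning helices; treat them as assumptions.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle hydropathy of each full window (one value per window start)."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions of windows whose mean hydropathy exceeds the (assumed) threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]

seq = "KDE" * 3 + "L" * 19 + "KDE" * 3  # a hydrophobic stretch flanked by charged residues
print(candidate_tm_windows(seq))
```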

Signal Peptide Prediction
Extracellular proteins and proteins targeted to subcellular compartments contain short signal peptides (often at the N-terminus).
- PSORT (http://psort.ims.u-tokyo.ac.jp/): a rule-based expert system for predicting the subcellular localization of proteins from their amino acid sequences. The k-nearest neighbors algorithm is used for reasoning.
- SignalP (http://www.cbs.dtu.dk/services/SignalP/): predicts the presence and location of signal peptide cleavage sites using a combination of neural networks and HMMs.
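The k-nearest neighbors reasoning mentioned for PSORT can be illustrated generically. A query is assigned the majority label among the k most similar labeled examples. This toy uses amino acid composition as the feature vector, and the short training sequences and their labels are hypothetical. PSORT's actual features and training data are not reproduced here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 amino acids in the sequence."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]

def knn_predict(query, training, k=3):
    """Majority label among the k training sequences closest in composition (Euclidean)."""
    q = composition(query)
    distances = sorted((math.dist(q, composition(seq)), label) for seq, label in training)
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

training = [  # hypothetical labeled examples
    ("LLLLAAVV", "membrane"), ("LLVVAALI", "membrane"), ("LAVLLAVI", "membrane"),
    ("KDEKDESS", "cytoplasm"), ("EEKKDDSQ", "cytoplasm"), ("KKDESQEE", "cytoplasm"),
]
print(knn_predict("LLAAVVLL", training))
```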