Protein structure prediction


Protein structure prediction
May 15, 2001
Quiz #4 postponed
Writing assignment
Learning objectives: Understand the basis of secondary structure prediction programs. Understand neural networks. Become familiar with manipulating known protein structures with Cn3D.
Workshop: Manipulation of the PTEN protein structure with Cn3D.

What is secondary structure?
Two major types: alpha-helical regions and beta-sheet regions.
Other classification schemes: turns, transmembrane regions, internal regions, external regions, antigenic regions.

Some Prediction Methods
Ab initio methods: based on physical properties of amino acids and bonding patterns.
Statistics of amino acid distributions: Chou-Fasman.
Position of amino acid and distribution: Garnier-Osguthorpe-Robson (GOR).
Neural networks.

Chou-Fasman Rules (Mathews, Van Holde, Ahern)

Amino acid   α-Helix   β-Sheet   Turn   Favors
Ala          1.29      0.90      0.78   α-helix
Cys          1.11      0.74      0.80   α-helix
Leu          1.30      1.02      0.59   α-helix
Met          1.47      0.97      0.39   α-helix
Glu          1.44      0.75      1.00   α-helix
Gln          1.27      0.80      0.97   α-helix
His          1.22      1.08      0.69   α-helix
Lys          1.23      0.77      0.96   α-helix
Val          0.91      1.49      0.47   β-sheet
Ile          0.97      1.45      0.51   β-sheet
Phe          1.07      1.32      0.58   β-sheet
Tyr          0.72      1.25      1.05   β-sheet
Trp          0.99      1.14      0.75   β-sheet
Thr          0.82      1.21      1.03   β-sheet
Gly          0.56      0.92      1.64   turn
Ser          0.82      0.95      1.33   turn
Asp          1.04      0.72      1.41   turn
Asn          0.90      0.76      1.23   turn
Pro          0.52      0.64      1.91   turn
Arg          0.96      0.99      0.88

Chou-Fasman
The first widely used procedure. If the propensity in a window of six residues is above a certain threshold, helix is chosen as the secondary structure. If the propensity in a window of five residues is above a certain threshold, beta strand is chosen. The segment is then extended until the average propensity in a four-residue window falls below a cutoff value. Output: helix, strand or turn.
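The window scan described above can be sketched as follows. This is a minimal illustration, not the published procedure: the helix propensities are taken from the Chou-Fasman table in this presentation, but the threshold of 1.0 and the function names are assumptions, and the extension step is left out.

```python
# Helix propensities from the Chou-Fasman table in this presentation,
# keyed by one-letter amino acid code.
P_HELIX = {
    "A": 1.29, "C": 1.11, "L": 1.30, "M": 1.47, "E": 1.44,
    "Q": 1.27, "H": 1.22, "K": 1.23, "V": 0.91, "I": 0.97,
    "F": 1.07, "Y": 0.72, "W": 0.99, "T": 0.82, "G": 0.56,
    "S": 0.82, "D": 1.04, "N": 0.90, "P": 0.52, "R": 0.96,
}


def window_average(seq, table, start, width):
    """Mean propensity of the residues in seq[start:start + width]."""
    window = seq[start:start + width]
    return sum(table[aa] for aa in window) / len(window)


def helix_nuclei(seq, width=6, threshold=1.0):
    """Start indices (0-based) of six-residue windows whose mean helix
    propensity exceeds the threshold. The threshold value is an assumption."""
    return [
        i
        for i in range(len(seq) - width + 1)
        if window_average(seq, P_HELIX, i, width) > threshold
    ]


print(helix_nuclei("AEMLKQGPGSNG"))
```

A strand scan would look the same with a five-residue window and the β-sheet column of the table.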

GOR
Position-dependent propensities for helix, sheet or turn are calculated for each amino acid. For each position j in the sequence, eight residues on either side of aa_j are considered, giving a window of 17 positions. The method uses a PSSM (position-specific scoring matrix). A helix propensity table contains information about the propensity for certain residues at each of the 17 positions when the conformation of residue j is helical, so each helix propensity table has 20 x 17 entries. The predicted state of aa_j is calculated as the sum of the position-dependent propensities of all residues around aa_j.
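The summation over the 17-position window can be sketched as below. The table layout (a dictionary keyed by offset and amino acid), the function names, and the toy propensity values are all hypothetical; a real GOR table would hold 20 x 17 values per state, derived from known structures.

```python
HALF_WINDOW = 8  # eight residues on either side of position j: 17 positions


def gor_score(seq, j, table):
    """Sum the position-dependent propensities of all residues around j.

    table maps (offset, amino_acid) -> propensity; entries that are not
    present count as 0. Positions beyond the sequence ends are skipped."""
    total = 0.0
    for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
        k = j + offset
        if 0 <= k < len(seq):
            total += table.get((offset, seq[k]), 0.0)
    return total


def gor_predict(seq, tables):
    """tables maps a state letter to its propensity table. For each residue,
    return the state with the highest summed score."""
    return "".join(
        max(tables, key=lambda state: gor_score(seq, j, tables[state]))
        for j in range(len(seq))
    )


# Toy tables with made-up values, for illustration only.
toy_tables = {
    "H": {(0, "A"): 1.0, (-1, "A"): 0.5, (1, "A"): 0.5},
    "C": {(0, "G"): 1.0, (-1, "G"): 0.5, (1, "G"): 0.5},
}
print(gor_predict("AAAGGG", toy_tables))
```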

Neural networks
Computer neural networks are based on simulation of adaptive learning in networks of real neurons. Neurons connect to each other via synaptic junctions, which are either stimulatory or inhibitory. Adaptive learning involves the formation or suppression of the right combinations of stimulatory and inhibitory synapses so that a set of inputs produces an appropriate output.

Neural Networks (cont. 1)
The computer version of the neural network involves identification of a set of inputs (the amino acids in the sequence), which are transmitted through a network of connections. At each layer, inputs are numerically weighted and the combined result is passed to the next layer. Ultimately a final output is produced: a decision of helix, sheet or coil.
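A forward pass through such a network can be sketched in a few lines. Everything here is an illustrative assumption: the one-hot input encoding with 20 units per residue, the single hidden layer of four units, the tanh and softmax functions, and the random weights, which are untrained and so produce a meaningless decision.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("helix", "sheet", "coil")


def one_hot_window(window):
    """Encode each residue as 20 input units, exactly one of which is 1.0."""
    units = []
    for aa in window:
        units.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return units


def layer(inputs, weights, biases):
    """Weighted sum of the inputs for every unit in the layer."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict(window, w_hidden, b_hidden, w_out, b_out):
    """Pass the encoded window through the hidden and output layers."""
    hidden = [math.tanh(v) for v in layer(one_hot_window(window), w_hidden, b_hidden)]
    probs = softmax(layer(hidden, w_out, b_out))
    return STATES[probs.index(max(probs))], probs


# Untrained random weights, used only to show the data flow.
rng = random.Random(0)
WINDOW, N_HIDDEN = 15, 4
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(WINDOW * len(AMINO_ACIDS))]
            for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)] for _ in STATES]
b_out = [0.0] * len(STATES)

state, probs = predict("KDIQLLNVSYDPTRE", w_hidden, b_hidden, w_out, b_out)
print(state, probs)
```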

Neural Networks (cont. 2)
90% of the training set (proteins of known structure) was used for training. The remaining 10% was used to evaluate the performance of the neural network during the training session.
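A 90/10 split of this kind might be made as follows. The function name, the fixed random seed, and the placeholder protein identifiers are assumptions for illustration.

```python
import random


def split_training_set(proteins, holdout_fraction=0.10, seed=0):
    """Shuffle the proteins and set aside a fraction for evaluation.

    Returns (training, holdout). The seed only makes the split repeatable."""
    shuffled = list(proteins)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Placeholder identifiers standing in for proteins of known structure.
proteins = [f"protein_{i}" for i in range(20)]
training, holdout = split_training_set(proteins)
print(len(training), len(holdout))
```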

Neural Networks (cont. 3)
During the training phase, selected sets of proteins of known structure are scanned, and if the decisions are incorrect, the input weightings are adjusted by the software to produce the desired result. Training runs are repeated until the success rate is maximized. Careful selection of the training set is an important aspect of this technique. The set must contain as wide a range of different fold types as possible, but without duplications of structural types that may bias the decisions.

Neural Networks (cont. 5)
An additional component of the PSIPRED procedure involves sequence alignment with similar proteins. The rationale is that some amino acid positions in a sequence contribute more to the final structure than others. (This has been demonstrated by systematic mutation experiments in which each consecutive position in a sequence is substituted by a spectrum of amino acids. Some positions are remarkably tolerant of substitution, while others have unique requirements.) To predict secondary structure accurately, one should place little weight on the tolerant positions, which clearly contribute little to the structure, and strongly emphasize the intolerant positions.

PSIPRED
Uses multiple aligned sequences for prediction. Uses a training set of proteins with known structure. Uses a two-stage neural network to predict structure based on position-specific scoring matrices generated by PSI-BLAST (Jones, 1999). The first network converts a window of 15 amino acids into a raw score of h, b, c or terminus. The second network filters the first output; for example, an output of hhhhehhhh might be converted to hhhhhhhhh. Can obtain a Q3 value of 70-78% (may be the highest achievable).

[Figure: PSIPRED network architecture. First network: input of 15 groups of 21 units (1 unit for each amino acid plus one specifying the end); each column specifies a position within the protein and provides information on tolerant or intolerant positions; three outputs are helix, strand or coil. Filtering network: three outputs are helix, strand or coil.]

Example of Output from PSIPRED

PSIPRED PREDICTION RESULTS

Key
Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 923788850068899998538983213555268822788714786424388875156215
Pred: CCEEEEEEEHHHHHHHHHHCCCCCCHHHHHHCCCCCEEEEECCCCCCHHHHHHHCCCCCC
  AA: KDIQLLNVSYDPTRELYEQYNKAFSAHWKQETGDNVVIDQSHGSQGKQATSSVINGIEAD
              10        20        30        40        50        60
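Output of this kind is easy to summarize once the Conf and Pred strings are extracted. The sketch below groups the prediction shown above into segments and reports the mean confidence of each; the function name and the tuple layout are assumptions.

```python
from itertools import groupby

# Conf and Pred strings copied from the PSIPRED example above.
CONF = "923788850068899998538983213555268822788714786424388875156215"
PRED = "CCEEEEEEEHHHHHHHHHHCCCCCCHHHHHHCCCCCEEEEECCCCCCHHHHHHHCCCCCC"


def segments(pred, conf):
    """Return (state, first_residue, last_residue, mean_confidence) for each
    run of identical predicted states. Residue numbers are 1-based."""
    result = []
    pos = 0
    for state, run in groupby(pred):
        length = len(list(run))
        digits = [int(c) for c in conf[pos:pos + length]]
        result.append((state, pos + 1, pos + length, sum(digits) / length))
        pos += length
    return result


for state, first, last, mean_conf in segments(PRED, CONF):
    print(f"{state} {first:3d}-{last:3d}  mean confidence {mean_conf:.1f}")
```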

3D structure prediction: Threading
Threading, alluded to earlier, is a mechanism to address the alignment of two sequences that have <30% identity and are typically considered non-homologous. Essentially, one fits (or threads) the unknown sequence onto the known structure and evaluates the resulting structure's fitness using environment-based or knowledge-based potentials.

Helical Wheel
If you can predict an alpha helix, it is sometimes useful to be able to tell whether the helix is amphipathic. This would indicate whether one face of the helix faces the solvent or perhaps another protein. Helical wheels have been particularly useful in predicting a "super-secondary" structure known as the coiled coil. The helical wheel is based on the ideal alpha helix, placing an amino acid every 100° around the circumference of the helix cylinder.
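The wheel positions follow directly from the 100° step. The sketch below assigns an angle to each residue and gives a rough measure of whether the nonpolar residues cluster on one face; the choice of nonpolar residues and the clustering score are assumptions made for illustration, not a standard hydrophobic moment calculation.

```python
import math

DEGREES_PER_RESIDUE = 100  # ideal alpha helix, as on the helical wheel
NONPOLAR = set("AVLIMFWC")  # assumed set of nonpolar residues


def wheel_angles(helix_seq):
    """Angular position (degrees, 0-359) of each residue on the wheel."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360) for i, aa in enumerate(helix_seq)]


def hydrophobic_face_score(helix_seq):
    """Length of the mean direction vector of the nonpolar residues.

    Values near 1 mean they point the same way (one hydrophobic face);
    values near 0 mean they are spread around the wheel."""
    angles = [math.radians(a) for aa, a in wheel_angles(helix_seq) if aa in NONPOLAR]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)


print(wheel_angles("LKKLLKKL"))
print(hydrophobic_face_score("LKKLLKKL"))
```

With a 100° step the pattern repeats exactly every 18 residues, since 18 x 100° is a whole number of turns.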

Coiled-coil predictors
The alpha-helical coiled-coil structure has a strong signature heptad pattern, abcdefg, where a and d are typically nonpolar (leucine rich) and e and g are often charged. This makes scoring from a sequence scale plot relatively easy.
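A very simple heptad scorer can illustrate the idea. It labels residues a to g, measures the fraction of a and d positions that are nonpolar, and tries all seven registers. The nonpolar set, the scoring rule, and the test sequence are assumptions; real coiled-coil predictors use position-specific statistics rather than a single fraction.

```python
HEPTAD = "abcdefg"
NONPOLAR = set("LIVMAF")  # assumed set of nonpolar residues


def heptad_labels(seq, start="a"):
    """Heptad position of each residue, given the position of the first one."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(i + offset) % 7] for i in range(len(seq))]


def heptad_score(seq, start="a"):
    """Fraction of a and d positions occupied by nonpolar residues."""
    labels = heptad_labels(seq, start)
    core = [aa for aa, label in zip(seq, labels) if label in "ad"]
    if not core:
        return 0.0
    return sum(aa in NONPOLAR for aa in core) / len(core)


def best_register(seq):
    """Starting heptad position that gives the highest a/d score."""
    return max(HEPTAD, key=lambda start: heptad_score(seq, start))


example = "LKELEDKLKELEDK"  # made-up sequence with leucine at a and d
print(best_register(example), heptad_score(example, best_register(example)))
```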

3D structure data
The largest 3D structure database is the Protein Data Bank (PDB). It contains over 15,000 records. Each record contains 3D coordinates for macromolecules. 80% of the records were obtained from X-ray diffraction studies, 16% from NMR, and the rest from other methods and theoretical calculations.

Part of a record from the PDB

ATOM      1  N   ARG A  14      22.451  98.825  31.990  1.00 88.84           N
ATOM      2  CA  ARG A  14      21.713 100.102  31.828  1.00 90.39           C
ATOM      3  C   ARG A  14      22.583 101.018  30.979  1.00 89.86           C
ATOM      4  O   ARG A  14      22.105 101.989  30.391  1.00 89.82           O
ATOM      5  CB  ARG A  14      21.424 100.704  33.208  1.00 93.23           C
ATOM      6  CG  ARG A  14      20.465 101.880  33.215  1.00 95.72           C
ATOM      7  CD  ARG A  14      20.008 102.147  34.637  1.00 98.10           C
ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N
ATOM      9  CZ  ARG A  14      18.344 103.507  35.833  1.00100.29           C
ATOM     10  NH1 ARG A  14      18.580 102.835  36.952  1.00 99.51           N
ATOM     11  NH2 ARG A  14      17.441 104.479  35.827  1.00100.79           N
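ATOM records use fixed column positions, so they are best read by slicing rather than by splitting on whitespace; in atom 8 above, the occupancy and temperature factor run together as 1.00100.30. A minimal parser sketch follows, with assumed function and key names and the column ranges of the PDB flat-file format.

```python
def parse_atom_line(line):
    """Read the main fields of a PDB ATOM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }


# Atom 8 from the record above, laid out in fixed columns.
LINE = "ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N"
atom = parse_atom_line(LINE)
print(atom["name"], atom["x"], atom["y"], atom["z"], atom["b_factor"])
```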

Molecular Modeling Database (MMDB)
Relies on PDB for data. It contains over 10,000 structure records. Links connect the records to Medline and NCBI's taxonomy database. Sequence "neighbors" of the structures are provided by BLAST. Structure "neighbors" are provided by the VAST algorithm. Cn3D is a molecular graphics viewer that allows one to view the three-dimensional structure.