Proteins Secondary Structure Predictions Structural Bioinformatics.

Presentation transcript:

Proteins Secondary Structure Predictions Structural Bioinformatics

2 In [year not preserved] there were 89,110 protein structures in the protein structure database. This is a great increase, but still an order of magnitude lower than the total number of sequences in protein sequence databases (close to 1,000,000). The first high-resolution structure of a protein, myoglobin, was solved in 1958 by Max Perutz and John Kendrew of Cambridge University (they won the 1962 Nobel Prize in Chemistry).

3 Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible). However, we can predict the secondary structure with relatively high precision. Example sequence: MERFGYTRAANCEAP…. What can we do to bridge the gap?

What do we mean by Secondary Structure? Secondary structure elements are the building blocks of the protein structure.

5 What do we mean by Secondary Structure? Secondary structure is usually divided into three categories: alpha helix; beta strand (sheet); anything else (turn/loop).

6 The different secondary structures are combined together to form the Tertiary Structure of the Proteins

7 [Figure: RBP and globin, secondary and tertiary structure]

8 Secondary Structure Prediction: given a primary sequence ADSGHYRFASGFTYKKMNCTEAA, what secondary structure will it adopt (alpha helix, beta strand or random coil)?

9 Secondary Structure Prediction Methods. Statistical methods: based on amino acid frequencies; HMM (Hidden Markov Model). Machine learning methods: SVM, neural networks.

10 Statistical Methods for SS prediction: Chou and Fasman (1974). The method uses the propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity of being in an alpha helix or beta sheet, so it acts as a breaker). The slide's table lists P(a), P(b) and P(turn) for the 20 amino acids (Alanine, Arginine, Aspartic Acid, Asparagine, Cysteine, Glutamic Acid, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, Valine); the numeric values are not preserved in this transcript. Success rate of 50%.
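A propensity of this kind can be sketched as a ratio: how often a residue appears inside a given structure, relative to how often it appears overall. The snippet below is a minimal illustration only. The function name `propensities` and the toy sequence and state strings are assumptions made up for this example; they are not the Chou and Fasman data set or their published values.

```python
from collections import Counter


def propensities(sequence, states, target):
    """Propensity of each residue for `target`:
    P(residue | target state) / P(residue overall)."""
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    overall = Counter(sequence)
    in_target = Counter(aa for aa, s in zip(sequence, states) if s == target)
    n_total = len(sequence)
    n_target = sum(in_target.values())
    if n_target == 0:
        raise ValueError("no residues are labelled with the target state")
    return {
        aa: (in_target[aa] / n_target) / (overall[aa] / n_total)
        for aa in overall
    }


# Hypothetical toy data: H = helix, L = loop (not real annotations).
toy_sequence = "AAEELLKKPGPGAAEE"
toy_states = "HHHHHHHHLLLLHHHH"

helix = propensities(toy_sequence, toy_states, "H")
for aa in sorted(helix):
    print(aa, round(helix[aa], 2))
```

A value above 1 means the residue is over-represented in the structure, a value below 1 means it is under-represented. In this toy data proline never occurs in a helix, so its helix propensity is 0.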

11 Secondary Structure Method Improvements: the 'sliding window' approach. Most alpha helices are ~12 residues long; most beta strands are ~6 residues long. → Look at all windows of size 6/12. → Calculate a score for each window. If the score > threshold → predict that this is an alpha helix/beta sheet. Example sequence: TGTAGPQLKCHIQWMLPLKK
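The sliding-window idea can be sketched in a few lines. This is a simplified illustration and not the exact Chou and Fasman procedure: the score here is just the mean propensity of the window, and the propensity values, the threshold and the function name `sliding_window_calls` are all hypothetical choices for this example.

```python
def sliding_window_calls(sequence, propensity, window, threshold, default=1.0):
    """Return (start, score) for every window whose mean propensity
    exceeds the threshold. Residues missing from the table get `default`."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = sum(propensity.get(aa, default) for aa in segment) / window
        if score > threshold:
            hits.append((start, score))
    return hits


# Hypothetical helix propensities (illustrative numbers, not published values).
toy_helix_propensity = {
    "A": 1.4, "L": 1.3, "M": 1.4, "Q": 1.1, "K": 1.2,
    "P": 0.5, "G": 0.5, "T": 0.8,
}

# The slide's example sequence, scanned with a window of 6.
hits = sliding_window_calls(
    "TGTAGPQLKCHIQWMLPLKK", toy_helix_propensity, window=6, threshold=1.1
)
for start, score in hits:
    print(f"window starting at position {start}: score {score:.2f}")
```

Overlapping windows that pass the threshold would normally be merged into one predicted segment; that step is left out to keep the sketch short.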

12 Improvements since the 1980s: adding information from conservation in the MSA; smarter algorithms (e.g. machine learning, HMM).

13 HMM (Hidden Markov Model) approach for predicting Secondary Structure. An HMM enables us to calculate the probability of assigning a sequence to a secondary structure. Sequence: TGTAGPQLKCHIQWML; states: HHHHHHHLLLLBBBBB; p = ?

14 The table is built from a large database of known secondary structures. It holds: the probability of beginning with an α-helix; the probability of an α-helix residue being followed by another α-helix residue; the probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn (= 0.15); and the probability of observing alanine as part of a β-sheet.

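15 Example: what is the probability that the sequence TGQ will be in a helical structure? Sequence TGQ, states HHH: p = 0.45 × e_H(T) × 0.8 × e_H(G) × 0.8 × e_H(Q), where e_H(x) stands for the probability of observing residue x in an α-helix (these three values and the final result are not preserved in the transcript). Success of HMM-based methods: 75%-80%.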
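The calculation on this slide is a product of one start probability, then alternating emission and transition probabilities along the state path. The sketch below reproduces that pattern. Reading 0.45 as the probability of beginning with a helix and 0.8 as the helix-to-helix transition is an assumption based on the previous slide; the emission values are hypothetical placeholders, since the slide's own numbers are missing from the transcript.

```python
def path_probability(sequence, states, start, transition, emission):
    """Joint probability of a residue sequence and one state path under an HMM."""
    if len(sequence) != len(states) or not sequence:
        raise ValueError("sequence and states must be non-empty and equal in length")
    p = start[states[0]] * emission[states[0]][sequence[0]]
    for i in range(1, len(sequence)):
        p *= transition[(states[i - 1], states[i])]
        p *= emission[states[i]][sequence[i]]
    return p


start = {"H": 0.45}              # from the slide (assumed to be the start probability)
transition = {("H", "H"): 0.8}   # from the slide (assumed helix-to-helix transition)
emission = {"H": {"T": 0.05, "G": 0.02, "Q": 0.06}}  # hypothetical placeholder values

p = path_probability("TGQ", "HHH", start, transition, emission)
print(f"p(TGQ, HHH) = {p:.3e}")
```

To predict a structure rather than score a given one, the same tables would be used to search for the most probable state path, for example with the Viterbi algorithm.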

What can we learn from secondary structure predictions??

Mad Cow Disease: PrPc to PrPsc. [Figure: structures of PrPc and PrPsc]

18 How does the protein structure relate to the primary protein sequence?

19 - Early experiments have shown that the sequence of the protein is sufficient to determine its structure (Anfinsen). - Protein structure is more conserved than protein sequence and more closely related to function.

20 How can different amino acid sequences determine similar protein structures? Lesk and Chothia 1980

21 The Globin Family

22 Different sequences can result in similar structures: 1ecd, 2hhd

23 We can learn about the important features which determine structure and function by comparing the sequences and structures.

24 The Globin Family

25 Why is Proline 36 conserved in all the globin family ?

26 Where are the gaps?? The gaps in the pairwise alignment are mapped to the loop regions

27 How are remote homologs related in terms of their structure? β-lactoglobulin, RBP

28 PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)
Query: 3 WVWALLLLAAWAAAERD CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
         V L+ LA A + S V+ENFD ++ G WY + K
Sbjct: 1 MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59
Query: 55 DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ
          I A +S+ E G + K V PAK
Sbjct: 60 NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL
Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
           +WI+ TDY+ YA+ YSC + ++ R+P LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
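The "Identities" and "Gaps" figures in a report like this are simple column counts over the aligned strings. The sketch below counts them for the last block of the alignment only (query positions 115-164), which is the one block whose two rows have equal length in this transcript, so its numbers differ from the whole-alignment totals quoted above. The function name `alignment_stats` is an assumption for this example; "Positives" is not computed because it needs a substitution matrix.

```python
def alignment_stats(query, subject, gap="-"):
    """Count identical columns and gapped columns in a pairwise alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != gap
    )
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return identities, gaps, len(query)


# Last block of the RBP / beta-lactoglobulin alignment shown on the slide.
query_block = "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
sbjct_block = "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"

identities, gaps, length = alignment_stats(query_block, sbjct_block)
print(f"Identities = {identities}/{length} ({identities / length:.0%})")
print(f"Gaps = {gaps}/{length} ({gaps / length:.0%})")
```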

29 The Retinol Binding Protein and β-lactoglobulin

30 Taken together: from sequence (MERFGYTRAANCEAP….) to FUNCTION

Pfam: a database that contains a large collection of multiple sequence alignments of protein families (common structures). Very useful for function prediction.

The zinc-finger family (domain): a known family of transcription factors. [Figure: zinc finger domain within a protein sequence]

Pfam is based on profile hidden Markov models (HMMs), each of which represents a protein family. In comparison to a PSSM, an HMM is a model which considers dependencies between the different columns in the matrix (different residues) and is thus much more powerful!

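Profile HMM (Hidden Markov Model) can accurately represent an MSA.
States: match (M16, M17, M18, M19), insert (I16, I17, I18, I19), delete (D16, D17, D18, D19).
Emission probabilities per column: column 1: D 0.8, S 0.2; column 2: P 0.4, R 0.6; column 3: T 1.0; column 4: R 0.4, S 0.6.
Transition labels shown in the figure: 100%, 50%.
MSA rows: DRTR, DRTS, S--S, SPTR, DRTR, DPTS, D--S, D--R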
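The match-state emission probabilities of a profile HMM come from the residue frequencies in each alignment column. The sketch below shows only that step, as plain frequency counting with gaps ignored. The toy alignment and the function name `match_emissions` are hypothetical, and real profile HMM software typically also adds pseudocounts and estimates the insert and delete transitions, which is omitted here.

```python
from collections import Counter


def match_emissions(alignment, gap="-"):
    """Per-column residue frequencies of an MSA, ignoring gap characters."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all alignment rows must have the same length")
    emissions = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        total = sum(counts.values())
        emissions.append(
            {aa: n / total for aa, n in counts.items()} if total else {}
        )
    return emissions


# Hypothetical toy alignment (five sequences, four columns).
toy_alignment = ["DRTR", "DRTS", "S--S", "SPTR", "DPTS"]

emissions = match_emissions(toy_alignment)
for index, column in enumerate(emissions, start=1):
    print(f"column {index}: {column}")
```

Columns where some sequences have a gap are the ones where a delete state would carry part of the probability mass; here they simply have fewer residues contributing to the frequencies.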

35 Extra Slides (for your interest)

Alpha Helix: Pauling (1951). A consecutive stretch of 5-40 amino acids (average 10). A right-handed spiral conformation. 3.6 amino acids per turn. Stabilized by hydrogen bonds. [Figure labels: residues; 5.6 Å]

37 Beta Strand: Pauling and Corey (1951). An extended polypeptide chain is called a β-strand (it consists of 5-10 amino acids). The chains are connected together by hydrogen bonds to form a β-sheet. [Figure: β-strand, β-sheet]

38 Loops: connect the secondary structure elements (alpha helices and beta strands). They have various lengths and shapes.