Proteins Structural Bioinformatics

Presentation transcript:

Proteins Structural Bioinformatics

2

3 Specific databases of protein sequences and structures: Swiss-Prot, PIR, TrEMBL (translated from DNA), PDB (three-dimensional structures).

4 Myoglobin: the first high-resolution protein structure. “Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.” Solved in 1958 by John Kendrew and colleagues at Cambridge University. Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry.

5 Why Protein Structure? Proteins are fundamental components of all living cells and perform a variety of biological tasks. Each protein has a particular 3D structure that determines its function. Protein structure is more conserved than protein sequence and more closely related to function.

6 There Are Four Levels of Protein Structure. Primary: the linear amino acid sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain. Quaternary: the arrangement of several polypeptide chains.

7 Symbols for the 20 amino acids
A  ala  alanine          M  met  methionine
C  cys  cysteine         N  asn  asparagine
D  asp  aspartic acid    P  pro  proline
E  glu  glutamic acid    Q  gln  glutamine
F  phe  phenylalanine    R  arg  arginine
G  gly  glycine          S  ser  serine
H  his  histidine        T  thr  threonine
I  ile  isoleucine       V  val  valine
K  lys  lysine           W  trp  tryptophan
L  leu  leucine          Y  tyr  tyrosine
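
As a small illustration of how the one-letter symbols on this slide are used in practice, the sketch below converts a one-letter sequence into three-letter codes. The function name `to_three_letter` and the `Xaa` placeholder for unrecognised symbols are assumptions made for this example, not part of the original slides.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(sequence):
    """Translate a one-letter sequence; unknown symbols become 'Xaa' (assumed placeholder)."""
    return "-".join(ONE_TO_THREE.get(residue, "Xaa") for residue in sequence.upper())


print(to_three_letter("TGQ"))  # Thr-Gly-Gln
```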

8 Secondary Structure. Secondary structure is usually divided into three categories: alpha helix, beta strand (sheet), and anything else (turn/loop).

9 Alpha Helix: Pauling (1951). A consecutive stretch of 5-40 amino acids (average 10) in a right-handed spiral conformation, with 3.6 amino acids per turn (a pitch of about 5.4 Å). Stabilized by H-bonds in the backbone between the C=O of residue n and the NH of residue n+4. Side chains point outward.

10 Beta Strand: Pauling and Corey (1951). Stretches of polypeptide chain (from the same or different chains) run alongside each other and are linked together by hydrogen bonds. Each stretch is called a β-strand and consists of 5-10 amino acids.

11 Beta Sheet. The strands lie adjacent to each other, forming a beta-sheet. [Figure: (a) antiparallel and (b) parallel sheets, with labelled distances of 3.47 Å, 4.6 Å, 3.25 Å and 4.6 Å]

12 Loops. Connect the secondary structure elements. Have various lengths and shapes. Located at the surface of the folded protein, and may therefore play an important role in biological recognition processes. Proteins that are evolutionarily related have the same helices and sheets but may vary in loop structures.

13 How is the 3D Structure Determined? 1. Experimental methods (best approach): X-ray crystallography, NMR, others. 2. In-silico methods (partial solutions, based on similarity): threading (needs a 3D structure; combinatorial complexity) and ab-initio structure prediction (not always successful).

14 X-ray crystallography. 1. Obtain an ordered protein crystal. 2. Check X-ray diffraction: the crystal is bombarded with X-ray beams, and the scattering of the beams by the electrons creates a diffraction pattern.

15 X-ray crystallography. 3. Analyze the diffraction pattern and produce an electron density map. 4. Thread the known protein sequence into the density map.

16 X-ray crystallography. The molecules must be very pure in order to produce perfect and stable crystals. The method is time-consuming and difficult.

17 NMR - Nuclear Magnetic Resonance (since 1945). A sample is immersed in a magnetic field and bombarded with radio waves. The nuclei of the molecule resonate (spin); this motion is measured and is specific to each type of molecule.

18 Principles of NMR

19 NMR - Nuclear Magnetic Resonance. The NMR technique is very time-consuming and expensive. The sample has to be in a concentrated solution, and the method is limited to small, soluble molecules.

20 PDB: Protein Data Bank. Holds 3D models of biological macromolecules (protein, RNA, DNA). All data are available to the public. Structures are obtained by X-ray crystallography (84%) or NMR spectroscopy (16%) and are submitted by biologists and biochemists from around the world.

21 PDB – Protein Data Bank

22 How Many Structures ? PDB Content Growth

23 Structure Prediction: Motivation. Hundreds of thousands of gene sequences have been translated to proteins (GenBank, Swiss-Prot, PIR). Only about ... solved structures (PDB). Experimental methods are time-consuming and not always possible. Goal: predict protein structure based on sequence information.

24 Structure Prediction: Motivation. Understand protein function: locate binding sites. Broaden homology: detect similar function where sequence differs. Explain disease: see the effect of amino acid changes; design suitable compensatory drugs.

25 Prediction Approaches. Primary (sequence) to secondary structure: sequence characteristics. Secondary to tertiary structure: fold recognition; threading against known structures. Primary to tertiary structure: ab initio modelling.

26 Secondary structures have an amphiphilic nature: one face is polar and the other non-polar. [Figure: polar and non-polar faces of an α-helix and a β-sheet] Can we predict the secondary structure from sequence?

27 Secondary Structure Prediction Methods. Chou-Fasman / GOR method: based on amino acid frequencies. Artificial Neural Network (ANN) methods: PHDsec and PSIpred. HMM (Hidden Markov Model). Best accuracy now ~80%.

28 Chou and Fasman (1974). The propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity of being in an alpha helix or beta sheet, so it acts as a breaker). [Table: P(a), P(b) and P(turn) for each of the 20 amino acids, alanine through valine; the values are not preserved in this transcript.] Success rate of 50%.
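
A propensity of this kind can be estimated from sequences with known secondary structure. The sketch below is a minimal illustration, assuming the common definition (the fraction of a residue type found in a given state, divided by the fraction of all residues found in that state). The function name, the state labels and the toy data are invented for the example and are not the published Chou-Fasman values.

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Estimate per-residue propensity for one secondary-structure state.

    propensity(r) = (fraction of r found in `state`) / (fraction of all residues in `state`)
    Values above 1 mean the residue favours the state; below 1, it avoids it.
    """
    total = Counter()
    in_state = Counter()
    for seq, ss in zip(sequences, structures):
        for residue, label in zip(seq, ss):
            total[residue] += 1
            if label == state:
                in_state[residue] += 1
    n_all = sum(total.values())
    n_state = sum(in_state.values())
    if n_state == 0:
        return {}
    background = n_state / n_all
    return {r: (in_state[r] / total[r]) / background for r in total}


# Toy, made-up annotations: H = helix, L = loop.
toy_sequences = ["AAPGA", "APGGA"]
toy_structures = ["HHLLH", "HLLLH"]
print(propensities(toy_sequences, toy_structures, "H"))
```

In this toy set alanine is always helical and proline never is, so alanine scores above 1 and proline scores 0, which mirrors the "breaker" behaviour described on the slide.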

29 Secondary Structure Method Improvements: the ‘sliding window’ approach. Most alpha helices are ~12 residues long; most beta strands are ~6 residues long. Look at all windows of size 6/12 and calculate a score for each window. If the score exceeds a threshold, predict that the window is an alpha helix/beta sheet. Example sequence: TGTAGPOLKCHIQWMLPLKK
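
The sliding-window idea can be sketched in a few lines. This is only an illustration under stated assumptions: the propensity scale is made up for the example (it is not the published Chou-Fasman table), residues missing from the scale get a neutral score of 1.0, and the window score is taken to be the mean propensity.

```python
# Illustrative helix propensities only (assumed values, NOT the published table).
HELIX_PROPENSITY = {
    "A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "Q": 1.1, "K": 1.2,
    "G": 0.6, "P": 0.6, "T": 0.8, "S": 0.8,
}
NEUTRAL = 1.0  # assumed score for residues missing from the scale


def window_scores(sequence, window, scale, default=NEUTRAL):
    """Mean propensity of every window of length `window` along the sequence."""
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        scores.append(sum(scale.get(r, default) for r in segment) / window)
    return scores


def predict(sequence, window, scale, threshold):
    """Label every residue covered by a window scoring above `threshold` as 'H'."""
    labels = ["-"] * len(sequence)
    for start, score in enumerate(window_scores(sequence, window, scale)):
        if score > threshold:
            for i in range(start, start + window):
                labels[i] = "H"
    return "".join(labels)


print(predict("GGGGAAAAAAGGGG", 6, HELIX_PROPENSITY, 1.3))  # ----HHHHHH----
```

For a helix prediction on the slide's example sequence one would use a window of 12, e.g. `window_scores("TGTAGPOLKCHIQWMLPLKK", 12, HELIX_PROPENSITY)`; the threshold itself has to be tuned on known structures.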

30 Improvements in the 1980’s. Adding information from conservation in multiple sequence alignments (MSA). Smarter algorithms (e.g. HMM, neural networks). Success rate: ~80%.

31 PHDsec and PSIpred. PHDsec (Rost & Sander, 1993): based on sequence family alignments. PSIpred (Jones, 1999): based on a Position Specific Scoring Matrix generated by PSI-BLAST. Both consider long-range interactions.

32 HMM. An HMM enables us to calculate the probability of assigning a sequence of hidden states to the observation. Observation: TGTAGPOLKCHIQWML. Hidden states: HHHHHHHLLLLBBBBB. p = ?

33 Table built according to a large database of known secondary structures. Entries highlighted on the slide: the probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn (= 0.15); the probability of observing alanine as part of a β-sheet; the probability of an α-helix followed by an α-helix; the probability of beginning with an α-helix.

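34 HMM. The above table enables us to calculate the probability of assigning a secondary structure to a protein. Example: observation TGQ, hidden states HHH. p = 0.45 × P(T|H) × 0.8 × P(G|H) × 0.8 × P(Q|H), where 0.45 is the probability of beginning with a helix, 0.8 is the probability of a helix followed by a helix, and P(x|H) is the probability of observing residue x in a helix (the numerical values of these terms and of the result are missing from this transcript).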
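
The calculation on this slide multiplies a start probability, then alternating residue and transition probabilities along the chain. The sketch below shows that product for the TGQ / HHH example. The values 0.45 (start in helix), 0.8 (helix to helix) and 0.15 (helix to turn) are taken from the slides; every other number, in particular all per-residue probabilities, is a hypothetical placeholder, so the printed result is illustrative only.

```python
# States: H = helix, B = beta, L = loop/turn.
# 0.45, 0.80 and 0.15 come from the slides; all other numbers are assumed placeholders.
START = {"H": 0.45, "B": 0.30, "L": 0.25}
TRANS = {
    "H": {"H": 0.80, "B": 0.05, "L": 0.15},
    "B": {"H": 0.05, "B": 0.75, "L": 0.20},
    "L": {"H": 0.20, "B": 0.20, "L": 0.60},
}
# Hypothetical probabilities of observing a residue within each state.
EMIT = {
    "H": {"T": 0.03, "G": 0.02, "Q": 0.06},
    "B": {"T": 0.07, "G": 0.03, "Q": 0.03},
    "L": {"T": 0.06, "G": 0.12, "Q": 0.04},
}


def joint_probability(observed, states, start, trans, emit):
    """P(observed, states) = start * emission * product of (transition * emission)."""
    if not observed or len(observed) != len(states):
        raise ValueError("observed and states must be non-empty and of equal length")
    p = start[states[0]] * emit[states[0]][observed[0]]
    for i in range(1, len(observed)):
        p *= trans[states[i - 1]][states[i]] * emit[states[i]][observed[i]]
    return p


# 0.45 x P(T|H) x 0.8 x P(G|H) x 0.8 x P(Q|H)
print(joint_probability("TGQ", "HHH", START, TRANS, EMIT))
```

Comparing this value across different state strings (for example HHH against HHL) is what lets the model rank candidate secondary-structure assignments for the same sequence.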

35 SS prediction using ANN. Inputs for one position: the amino acid at that position.

36 PHDsec Neural Net. Inputs for one position: the amino acid at that position. Hidden layer. Outputs: H = helix, E = strand, C = coil. Confidence: 0 = low, 9 = high.
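
To make "inputs for one position" concrete, the sketch below shows one common way to turn the residues around a position into network inputs: a one-hot vector per residue in a window centred on that position. This is an assumption made for illustration (the window width, the extra padding unit and the function name are all invented here), not a description of the actual PHDsec input layer, which works from sequence family alignments.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAD = len(AMINO_ACIDS)  # extra unit: beyond the chain ends, or an unknown symbol


def encode_window(sequence, center, half_width=6):
    """One-hot encode the residues in a window centred on `center`.

    Returns a flat list of (2 * half_width + 1) * 21 zeros and ones.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        unit = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(sequence):
            unit[INDEX.get(sequence[pos], PAD)] = 1
        else:
            unit[PAD] = 1
        features.extend(unit)
    return features


x = encode_window("TGTAGPOLKCHIQWML", center=0)
print(len(x), sum(x))  # 273 13
```

A network would take such a vector as input and produce three outputs (H, E, C) for the central residue; the gap between the largest and the next-largest output is one way to derive a confidence score.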

37 Secondary structure prediction
AGADIR - An algorithm to predict the helical content of peptides
APSSP - Advanced Protein Secondary Structure Prediction Server
GOR - Garnier et al, 1996
HNN - Hierarchical Neural Network method (Guermeur, 1997)
Jpred - A consensus method for protein secondary structure prediction at University of Dundee
JUFO - Protein secondary structure prediction from sequence (neural network)
nnPredict - University of California at San Francisco (UCSF)
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Brunel University
SOPMA - Geourjon and Deléage, 1995
SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
DLP - Domain linker prediction at RIKEN