Protein Structure Prediction and Structural Genomics. Computer Science Department, North Dakota State University, Fargo, ND.

2 Outline: Structure of proteins; prediction methods; the CASP competition.

3 Protein: Proteins are synthesized as linear chains of amino acids (the polypeptide sequence), but they quickly fold into a compact, typically globular structure.

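4 Each amino acid has two parts, a backbone and a side chain. The side chain, R, distinguishes the different amino acids, while the backbone is the same for all 20 amino acids. The backbone consists of an amino (-NH2) group, an alpha carbon, and a carboxylic acid (-COOH) group. Peptide bond formation: the carboxyl group of one amino acid condenses with the amino group of the next, releasing a water molecule and forming an amide (peptide) bond.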

5 The Amino Acids (list)
ID  Code  Name
1   Gly   Glycine
2   Ala   Alanine
3   Val   Valine
4   Leu   Leucine
5   Ile   Isoleucine
6   Ser   Serine
7   Thr   Threonine
8   Cys   Cysteine
9   Met   Methionine
10  Pro   Proline
11  Asp   Aspartic acid
12  Asn   Asparagine
13  Glu   Glutamic acid
14  Gln   Glutamine
15  His   Histidine
16  Lys   Lysine
17  Arg   Arginine
18  Phe   Phenylalanine
19  Tyr   Tyrosine
20  Trp   Tryptophan

6 Protein Primary Structure: Primary sequences can be written with the three-letter codes for the 20 amino acids (above) or with one-letter codes. Example, human insulin:
A chain: GIVEQCCTSICSLYQLENYCN
B chain: FVNQHLCGSHLVEALYLVCGERGFFYTPKT
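The sketch below shows how the two notations relate by converting a one-letter sequence into three-letter codes. The one-letter codes are the standard IUPAC assignments; they do not appear on the slide. The dictionary and function names are illustrative.

```python
# Standard one-letter -> three-letter codes for the 20 amino acids.
ONE_TO_THREE = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "S": "Ser", "T": "Thr", "C": "Cys", "M": "Met", "P": "Pro",
    "D": "Asp", "N": "Asn", "E": "Glu", "Q": "Gln", "H": "His",
    "K": "Lys", "R": "Arg", "F": "Phe", "Y": "Tyr", "W": "Trp",
}

def to_three_letter(seq: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[aa] for aa in seq.upper())

# Human insulin chains, as given on the slide.
INSULIN_A = "GIVEQCCTSICSLYQLENYCN"
INSULIN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"

if __name__ == "__main__":
    print(to_three_letter(INSULIN_A[:5]))  # Gly-Ile-Val-Glu-Gln
    print(len(INSULIN_A), len(INSULIN_B))  # 21 30
```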

7 Protein Secondary Structure: Protein secondary structure refers to regular, repeated patterns of folding of the protein backbone. These patterns arise from regular hydrogen-bonding patterns between backbone atoms.
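The alpha helix is one concrete example of such a regular pattern. This is a standard fact rather than something the slide states: in an ideal alpha helix, the backbone C=O of residue i hydrogen-bonds to the backbone N-H of residue i + 4. The sketch below lists those acceptor/donor pairs for a helical segment. The function name and the `step` parameter are illustrative.

```python
def helix_hbond_pairs(start: int, end: int, step: int = 4) -> list[tuple[int, int]]:
    """Return (C=O residue, N-H residue) pairs for an ideal helix spanning
    residues start..end inclusive.

    step=4 models the alpha helix (i -> i+4); step=3 would model a 3-10 helix.
    """
    return [(i, i + step) for i in range(start, end - step + 1)]

if __name__ == "__main__":
    print(helix_hbond_pairs(1, 8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```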

8 Protein Secondary Structure: The two most common folding patterns are the alpha helix and the beta sheet.

9  -helixantiparallel  -sheet Two elements of secondary structure are alpha helices (  =  =- 60 o ) and beta strands (  = -135 o,  =135 o ), which associate with other beta strands to form parallel or anti-parallel beta sheets

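10 Only two rotatable backbone bonds per residue: the bond between the amide nitrogen and the alpha carbon, whose torsion angle is called $\phi$ (phi), and the bond between the alpha carbon and the carbonyl carbon, whose torsion angle is called $\psi$ (psi). The peptide bond itself is planar and essentially does not rotate. These two angles therefore determine the local backbone conformation and hence the secondary structure.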
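Both $\phi$ and $\psi$ are torsion (dihedral) angles, each defined by four consecutive backbone atoms:

- $\phi$ uses C(i-1), N(i), Cα(i), C(i).
- $\psi$ uses N(i), Cα(i), C(i), N(i+1).

Below is a minimal pure-Python sketch of the standard dihedral computation. It assumes atom coordinates are already available as (x, y, z) tuples, for example after parsing a PDB file (not shown). The helper names are illustrative.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
# psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
```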

11 Protein Tertiary Structure: The final three-dimensional shape of a protein is determined and stabilized by chemical bonds and forces, including weak interactions such as hydrogen bonds, ionic bonds, van der Waals interactions, and hydrophobic interactions. Example: the tertiary structure of ribonuclease A, a globular protein, to which alpha helices, beta sheets, and turns all contribute.

12 Protein Quaternary Structure: In a protein made of multiple polypeptide subunits, the arrangement of those subunits gives the protein its quaternary structure. Example: hemoglobin has two alpha and two beta subunits. Only proteins with multiple polypeptide subunits can have quaternary structure.

13 [Figure: different levels of protein structure formation]

14 The Goal of Protein Structure Prediction: “The goal of fold assignment and comparative modeling is to assign, using computational methods, each new genome sequence to the known protein fold or structure that it most closely resembles.” In other words, the aim is to classify structures into families that share similar folds or motifs, and to construct phylogenies.

15 Significance: Identifying these shared structural motifs can provide significant insight into the functional mechanisms of a protein family. “The key to understanding the inner workings of cells is to learn the structure of Proteins that form their architecture and carry out their metabolism.” When proteomics is compared with genomics, it is fair to say that “genes were easy” and that the real work of bioinformatics has just begun.

16 Protein Classification: Families and Superfamilies. By definition, proteins that are more than 50% identical in amino acid sequence across their entire length are said to be members of a single family. Superfamilies are groups of protein families that are related by lower but still detectable levels of sequence similarity (and therefore have a common but more ancient evolutionary origin).
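The 50% criterion presupposes a way to measure sequence identity. The minimal sketch below assumes:

- the two sequences are already aligned to the same length, with "-" marking gaps;
- identity is the number of identical aligned residues divided by the alignment length, which is one common convention among several.

The function names and test sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences have the same residue."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("need two non-empty sequences aligned to the same length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)

def same_family(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """Apply the 'more than 50% identical' family criterion from the slide."""
    return percent_identity(aligned_a, aligned_b) > threshold

if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))  # 75.0
    print(same_family("ACDE", "AWYF"))  # False (25% identity)
```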

17 Protein Classification: Folds. Proteins are said to have a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Folds are in turn grouped into broad structural classes, for example all-alpha proteins, all-beta proteins, alpha/beta proteins, and membrane and cell-surface proteins. In many respects, the term fold is used synonymously with structural motif, but it generally refers to larger combinations of secondary structures.

18 Protein Classification: Enzyme Nomenclature. Each enzyme can be assigned a four-part numerical code (an EC number). The first number specifies the main class, the second and third numbers correspond to specific subclasses, and the final number is the serial number of the enzyme within its subclass.
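The sketch below shows how such a code breaks down into its four parts. It assumes:

- the dotted EC format, for example EC 1.1.1.1 (alcohol dehydrogenase);
- the seven top-level classes currently defined by the IUBMB, where translocases were added as class 7 in 2018.

Partial codes such as "1.1.1.-" are not handled. The function name is illustrative.

```python
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases", 4: "Lyases",
    5: "Isomerases", 6: "Ligases", 7: "Translocases",
}

def parse_ec(code: str) -> dict:
    """Split a full EC number such as 'EC 1.1.1.1' into its four levels."""
    code = code.strip()
    if code.upper().startswith("EC"):
        code = code[2:].strip()
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    main, sub, sub_sub, serial = (int(p) for p in parts)  # '-' would raise ValueError
    return {
        "main_class": EC_CLASSES.get(main, "unknown"),
        "subclass": sub,
        "sub_subclass": sub_sub,
        "serial": serial,
    }

if __name__ == "__main__":
    print(parse_ec("EC 1.1.1.1")["main_class"])  # Oxidoreductases
```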

19 Experimental Techniques: X-ray crystallography, NMR spectroscopy, 2D electrophoresis, mass spectrometry, protein microarrays.

20 Two Prediction Methods: (1) Protein folding models simulate the folding process at various levels of abstraction, providing insight into the forces that determine protein structure and into the folding process itself. No folding-simulation algorithm developed to date can reliably determine the native structure of a protein. (2) Comparative modeling, sometimes called homology modeling, predicts the structure of a target protein by comparison with the structures of related proteins.

21 Comparative Modeling Algorithms: DALI (Holm 1993), STRUCTAL (Gerstein 1996), VAST (Gibrat 1996), MINAREA (Falicov 1996), LOCK (Singh 1997), 3dSEARCH (Singh 1998).

22 Prediction Algorithm: 3dSEARCH. Designed to compute fast but approximate alignments of protein structures based on secondary structure elements alone. The fundamental idea is to represent all secondary structure vectors from all target proteins in a large, highly redundant hash table. Each secondary structure vector from a given query structure can then be compared against the entire table simultaneously. The method performed surprisingly well given the simplicity of its technique.
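Below is a much-simplified sketch of the hash-table idea. It assumes each protein is a list of secondary structure elements (SSEs), each a (kind, start_xyz, end_xyz) tuple with kind "H" or "E".

- Every ordered pair of SSEs is reduced to a key: the two kinds, the binned angle between the SSE vectors, and the binned distance between their midpoints.
- All keys from all proteins go into one redundant table.
- A query then votes for the proteins whose pairs share its keys.

The bin sizes and function names are assumptions. This illustrates only the table lookup, not the published 3dSEARCH algorithm.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical SSE format: (kind, start_xyz, end_xyz), kind "H" (helix) or "E" (strand).

def _direction(sse):
    _, start, end = sse
    return tuple(e - s for s, e in zip(start, end))

def _midpoint(sse):
    _, start, end = sse
    return tuple((s + e) / 2 for s, e in zip(start, end))

def pair_key(a, b, angle_bin=30.0, dist_bin=4.0):
    """Quantize the geometry of an ordered SSE pair into a hashable key."""
    u, v = _direction(a), _direction(b)
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    dist = math.dist(_midpoint(a), _midpoint(b))
    return (a[0], b[0], int(angle // angle_bin), int(dist // dist_bin))

def build_table(proteins):
    """proteins: {name: [sse, ...]} -> {key: [name, ...]}, redundant by design."""
    table = defaultdict(list)
    for name, sses in proteins.items():
        for a, b in permutations(sses, 2):
            table[pair_key(a, b)].append(name)
    return table

def query(table, sses):
    """Each SSE pair of the query votes for every protein stored under its key."""
    votes = Counter()
    for a, b in permutations(sses, 2):
        votes.update(table.get(pair_key(a, b), []))
    return votes
```

Each lookup is a single hash access, which is what lets a query be checked against every stored protein at once.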

23 Prediction Algorithm: VAST. Aligns secondary structure elements using graph theory. Steps of the VAST algorithm: (1) every pair of elements of the same type, one from each protein, is represented as a node; (2) two nodes are connected if the distances and angles between the corresponding elements agree within some threshold; (3) the maximal fully connected subgraph gives the pairwise alignment; (4) an alignment score and a P-value are computed.
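Steps (1) to (3) can be sketched with a correspondence graph and a basic Bron-Kerbosch maximum-clique search, both standard graph techniques. Assumptions:

- each SSE is a (kind, midpoint_xyz, direction_xyz) tuple;
- the tolerances and function names are hypothetical.

VAST's own scoring and P-value computation (step 4) are not reproduced.

```python
import math
from itertools import combinations

def _angle(u, v):
    """Angle between two direction vectors, in degrees."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def correspondence_graph(sses_a, sses_b, dist_tol=3.0, angle_tol=30.0):
    """Nodes pair same-type SSEs (one per protein); edges join consistent pairs."""
    nodes = [(i, j) for i, a in enumerate(sses_a)
             for j, b in enumerate(sses_b) if a[0] == b[0]]
    edges = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each element may be matched only once
        a1, a2, b1, b2 = sses_a[i], sses_a[k], sses_b[j], sses_b[l]
        dist_ok = abs(math.dist(a1[1], a2[1]) - math.dist(b1[1], b2[1])) <= dist_tol
        angle_ok = abs(_angle(a1[2], a2[2]) - _angle(b1[2], b2[2])) <= angle_tol
        if dist_ok and angle_ok:
            edges[(i, j)].add((k, l))
            edges[(k, l)].add((i, j))
    return edges

def max_clique(edges):
    """Largest fully connected node set (basic Bron-Kerbosch, no pivoting)."""
    best = set()

    def bron_kerbosch(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = r
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & edges[v], x & edges[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(edges), set())
    return best
```

Each node in the returned clique is an (index in protein A, index in protein B) pair, so the clique reads directly as an SSE alignment.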

24 Prediction Algorithm: DALI. Attempts to find the optimal match between similar contact patterns in the two-dimensional distance matrices of the two proteins. Uses a branch-and-bound algorithm to find an approximate solution.
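The distance-matrix idea can be sketched in a few lines. Intramolecular distance matrices do not change under rotation or translation, so two structures can be compared without superposing them first.

The code builds Cα distance matrices and scores a given residue correspondence with an "elastic" similarity in the spirit of Holm and Sander's DALI score. The constants theta = 0.2 and alpha = 20 Å are assumed values modeled on that score and should be checked against Holm (1993). The search over correspondences, the branch-and-bound step, is omitted.

```python
import math

def distance_matrix(coords):
    """Pairwise distances between C-alpha coordinates (x, y, z)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def elastic_score(coords_a, coords_b, theta=0.2, alpha=20.0):
    """Score the correspondence coords_a[k] <-> coords_b[k].

    theta and alpha are assumed constants, not taken from the slides.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("correspondence must pair equal numbers of residues")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    score = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                score += theta
                continue
            mean = (da[i][j] + db[i][j]) / 2
            # Penalize relative distance deviations; down-weight long distances.
            deviation = abs(da[i][j] - db[i][j]) / mean
            score += (theta - deviation) * math.exp(-(mean / alpha) ** 2)
    return score
```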

25 Prediction Algorithm: STRUCTAL. Aims to minimize the root-mean-square difference (RMSD) between two protein backbones, using dynamic programming to find the minimizing alignment.
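The quantity being minimized can be written down directly. The minimal sketch below computes RMSD over paired Cα coordinates. It assumes the two structures are already superposed and that coords_a[k] corresponds to coords_b[k]. Finding that superposition and correspondence is the hard part these algorithms address, and it is not shown.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square distance between corresponding (x, y, z) points."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

if __name__ == "__main__":
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```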

26 Prediction Algorithm: MINAREA. Computes a triangulation between the Cα atoms of the two proteins that minimizes the stretched surface area between their backbones, using dynamic programming (DP) to find the minimum.
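The dynamic-programming step can be sketched under simplifying assumptions:

- each backbone is a list of Cα coordinates;
- the chain ends are matched first-to-first and last-to-last;
- each triangle joins two consecutive atoms of one chain to one atom of the other.

Entry area[i][j] holds the smallest total triangle area stitching a[0..i] to b[0..j]. This is a generic minimal-area strip triangulation offered for illustration, not the exact MINAREA procedure, and the names are hypothetical.

```python
import math

def triangle_area(p, q, r):
    """Area of the triangle pqr in 3D, via the cross product."""
    u = tuple(b - a for a, b in zip(p, q))
    v = tuple(b - a for a, b in zip(p, r))
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def min_area(a, b):
    """Minimal total area of a triangulated strip between chains a and b."""
    n, m = len(a), len(b)
    area = [[math.inf] * m for _ in range(n)]
    area[0][0] = 0.0
    for i in range(n):
        for j in range(m):
            if i > 0:  # advance along chain a: triangle (a[i-1], a[i], b[j])
                area[i][j] = min(area[i][j],
                                 area[i - 1][j] + triangle_area(a[i - 1], a[i], b[j]))
            if j > 0:  # advance along chain b: triangle (a[i], b[j-1], b[j])
                area[i][j] = min(area[i][j],
                                 area[i][j - 1] + triangle_area(a[i], b[j - 1], b[j]))
    return area[n - 1][m - 1]
```

For two parallel straight chains of length 2 placed 1 Å apart, every triangulation tiles the same rectangle, so the result is 2.0.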

27 Prediction Algorithm: LOCK. Attempts to find the optimal rigid-body superposition of two structures, minimizing the root-mean-square difference (RMSD) between the aligned Cα atoms. It is an iterative approach that performs a greedy search to the nearest local minimum in alignment space.

28 Gold Standard for Evaluation: The SCOP database is widely used and has been recognized as a current standard in structural classification. It was constructed by visual inspection of all structures in the Protein Data Bank (PDB). It has four levels: 'class', 'fold', 'superfamily', and 'family'. Proteins in the same class have similar overall secondary structure content.
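One way to use SCOP as a gold standard is to judge a structure-comparison hit by the deepest SCOP level that the query and the hit share. The minimal sketch below assumes each domain's classification is stored as a (class, fold, superfamily, family) tuple. The labels in the example are placeholders, not real SCOP identifiers.

```python
SCOP_LEVELS = ("class", "fold", "superfamily", "family")

def deepest_shared_level(a: tuple, b: tuple):
    """Most specific SCOP level at which two classifications agree, or None."""
    shared = None
    for level, x, y in zip(SCOP_LEVELS, a, b):
        if x != y:
            break
        shared = level
    return shared

def is_true_positive(query: tuple, hit: tuple, level: str = "superfamily") -> bool:
    """Count a hit as correct if it agrees with the query at least down to `level`."""
    shared = deepest_shared_level(query, hit)
    return shared is not None and SCOP_LEVELS.index(shared) >= SCOP_LEVELS.index(level)

if __name__ == "__main__":
    q = ("class1", "fold1", "sf1", "fam1")  # placeholder labels
    print(deepest_shared_level(q, ("class1", "fold1", "sf1", "fam2")))  # superfamily
```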

29 CASP Competition: CASP (Critical Assessment of Techniques for Protein Structure Prediction) aims to help advance methods for predicting protein structure from sequence.

30 The Challenge: defining an appropriate structure-based distance measure, and developing efficient algorithms for calculating similarity distances.