CS177 Lecture 7 Computational Aspects of Protein Structure II Tom Madej 10.25.04.

Research news (Nature): Another milestone for the Human Genome Project. – Fills in approx. 99% of the “gene-rich” portion of the genome (10% more than the 2001 drafts). – Only 341 gaps remain, formerly hundreds of thousands. – New estimate of the number of genes: 20,000-25,000. Megabase deletions result in viable mice! – Researchers deleted 1.5 Mb and 0.8 Mb portions of the mouse genome (non-coding regions), and the mice seem to be fine!

Nature, Oct. 21, 2004.

Example for last homework: I searched “Structure” with the term “Leukemia”. The first structure was 1uc6A. I noticed a couple of VAST neighbors with low percent sequence identity but very similar folds: 1uemA (17.4%) and 1uenA (13.7%). I ran PSI-BLAST with 1uc6A as the query sequence. The CD Search got a hit to “Fibronectin type 3”. 1uemA and 1uenA are also assigned to FN3, but for some reason 1uc6 is not (???). I got lucky: 1uemA and 1uenA were found by PSI-BLAST but did not cross the significance threshold prior to convergence!

Overview of lecture: Protein structure – General principles – Structure hierarchy – Supersecondary structures – Superfolds and examples: TIM barrels, OB fold. Protein structure comparison algorithms – VAST (Vector Alignment Search Tool) – CE (Combinatorial Extension). Protein fold classification databases – SCOP (Structural Classification of Proteins) – CATH (Class, Architecture, Topology, Homologous superfamily).

General principles: Most protein structures are composed of two types of regular structural elements interconnected by less well-structured regions. Regular secondary structure elements (SSEs): α-helices and β-strands. Irregular regions: loops or coil. A pair of SSEs positioned next to each other in space may be parallel or anti-parallel.

General principles (cont.): Helices are stabilized by “internal” hydrogen bonds. Hydrogen bonds also form between adjacent pairs of strands, so strands assemble into larger structures such as β-sheets or β-barrels. Because of the residue side chains, certain packing angles are favored between helix/helix, helix/sheet, and sheet/sheet pairs.
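The “internal” hydrogen bonds of an α-helix run from the backbone C=O of residue i to the N-H of residue i+4. A minimal sketch of how a DSSP-style program can turn such bonds into a helix assignment is shown here. It assumes the list of hydrogen bonds has already been computed (DSSP decides this with an electrostatic energy criterion, omitted here), numbers residues from 0, and uses a hypothetical helper name, not a library API.

```python
def helix_residues(hbonds, n_residues, n=4):
    """Assign helix residues from backbone H-bonds (simplified DSSP-style rule).

    hbonds: iterable of (i, j) meaning the C=O of residue i bonds to the N-H of residue j.
    An n-turn at residue i exists if (i, i + n) is in hbonds; two consecutive
    n-turns at i - 1 and i mark residues i .. i + n - 1 as helix.
    """
    bonds = set(hbonds)
    turn_at = [(i, i + n) in bonds for i in range(n_residues)]
    helix = set()
    for i in range(1, n_residues):
        if turn_at[i - 1] and turn_at[i]:
            helix.update(range(i, min(i + n, n_residues)))
    return sorted(helix)


# Toy chain of 12 residues with i -> i+4 bonds starting at residues 2..6.
toy_bonds = [(i, i + 4) for i in range(2, 7)]
print(helix_residues(toy_bonds, 12))  # [3, 4, 5, 6, 7, 8, 9]
# A single isolated turn is not enough to call a helix.
print(helix_residues([(2, 6)], 12))   # []
```

Requiring two consecutive turns keeps a single stray hydrogen bond from being labelled a helix.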

Examples of protein architecture: a β-sheet with all pairs of strands parallel; a β-sheet with all pairs of strands anti-parallel. Architecture refers to the arrangement and orientation of SSEs, but not to their connectivity.

Examples of protein topology: Topology refers to the manner in which the SSEs are connected. Two β-sheets (all parallel) with different topologies.
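The difference between architecture and topology can be made concrete with a small sketch (hypothetical helper names, not from any package). A sheet is described by the left-to-right order of its strands, each labelled by its position in the sequence; the topology is then captured by how far the chain jumps across the sheet at each connection. Two sheets can share an architecture yet differ in these shifts. The sketch ignores strand direction and crossover handedness, which a full topology description such as Richardson's notation would also record.

```python
def connection_shifts(spatial_order):
    """Describe sheet connectivity as sideways shifts.

    spatial_order lists the strands from one edge of the sheet to the other,
    each strand labelled by its order in the sequence (1, 2, ..., n).
    Returns, for each sequential connection 1->2, 2->3, ..., how many
    positions across the sheet the chain moves.
    """
    position = {strand: idx for idx, strand in enumerate(spatial_order)}
    return [position[s + 1] - position[s] for s in range(1, len(spatial_order))]


def same_topology(order_a, order_b):
    """True if the two sheets have the same connectivity pattern; reading the
    sheet from the opposite edge negates every shift."""
    a, b = connection_shifts(order_a), connection_shifts(order_b)
    return a == b or a == [-x for x in b]


# Same architecture (five strands side by side), different topologies.
print(connection_shifts([1, 2, 3, 4, 5]))  # [1, 1, 1, 1]
print(connection_shifts([3, 2, 1, 4, 5]))  # [-1, -1, 3, 1]
print(same_topology([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # True
```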

Exercise: Take a look at 1r7sA in Cn3D and draw a topology diagram showing how the strands are connected.

Angles between SSEs in contact: The data on the next 3 slides give the cosine of the angle between a pair of SSE vectors. The SSEs were required to be “in contact”, i.e. within 10 Å of each other. Note: the SSEs are not necessarily consecutive in the sequence!
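The quantity plotted on these slides can be computed roughly as follows. This is a sketch under stated assumptions, not the analysis used for the slides: the SSE vector is taken from the first to the last Cα atom, and “within 10 Å” is read as the closest Cα-Cα distance between the two SSEs. A cosine near +1 indicates a parallel pair, near -1 an anti-parallel pair.

```python
import math


def sse_vector(ca_coords):
    # Assumption: approximate the SSE axis by the vector from its first to its
    # last C-alpha atom; real programs fit the axis more carefully.
    first, last = ca_coords[0], ca_coords[-1]
    return tuple(b - a for a, b in zip(first, last))


def cos_angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def in_contact(ca_a, ca_b, cutoff=10.0):
    # Assumption: "in contact" means at least one C-alpha pair within cutoff (Å).
    return any(math.dist(p, q) <= cutoff for p in ca_a for q in ca_b)


def contact_cosine(ca_a, ca_b, cutoff=10.0):
    """Cosine of the angle between two SSE vectors, or None if not in contact."""
    if not in_contact(ca_a, ca_b, cutoff):
        return None
    return cos_angle(sse_vector(ca_a), sse_vector(ca_b))


# Toy strands along x; the second runs in the opposite direction 5 Å away.
strand_1 = [(x, 0.0, 0.0) for x in (0.0, 3.0, 6.0, 9.0)]
strand_2 = [(x, 5.0, 0.0) for x in (9.0, 6.0, 3.0, 0.0)]
far_strand = [(x, 30.0, 0.0) for x in (0.0, 3.0, 6.0, 9.0)]
print(contact_cosine(strand_1, strand_2))    # -1.0 (anti-parallel)
print(contact_cosine(strand_1, far_strand))  # None (beyond 10 Å)
```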

Examples of structures formed by β-strands: triosephosphate isomerase (7timA), retinol-binding protein (1rbp), porin (1oh2P).

Higher-level organization: A single protein may consist of multiple domains. Examples: 1liy A, 1bgc A. The domains may or may not perform different functions. Proteins may also form higher-level assemblies, which are useful for complicated biochemical processes that require several steps, e.g. processing or synthesis of a molecule. Example: 1l1o chains A, B, C.

Example: Replication Protein A (E. Bochkareva et al., The EMBO Journal, 2002). RPA binds single-stranded DNA (ssDNA) and is involved in recombination, replication, and repair. It is a heterotrimer, consisting of three subunit proteins that bind together. See structure 1l1o.

Supersecondary structures: β-hairpin, α-hairpin, βαβ-unit, β4 Greek key, βα Greek key.
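The three simple units can be spotted from an ordered list of SSEs with a short scan. This sketch uses common textbook definitions: a β-hairpin is two sequence-consecutive strands paired anti-parallel, an α-hairpin is two consecutive anti-parallel helices, and a βαβ-unit is two parallel strands in the same sheet joined by a helix. It simplifies on purpose: strand pairing is assumed to be computed elsewhere, helix-helix contact is not checked, the βαβ strands are required to be directly paired (in real structures they may be separated by another strand), and Greek keys are not handled. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SSE:
    kind: str         # "E" for strand, "H" for helix
    direction: tuple  # axis vector pointing from N- to C-terminal end


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def find_simple_motifs(sses, paired):
    """Scan SSEs (in sequence order) for a few simple supersecondary motifs.

    paired: set of frozenset({i, j}) for strands i, j that are H-bonded
    neighbours in a sheet (assumed to be computed elsewhere).
    """
    motifs = []
    for i in range(len(sses) - 1):
        a, b = sses[i], sses[i + 1]
        antiparallel = _dot(a.direction, b.direction) < 0
        if a.kind == b.kind == "E" and antiparallel and frozenset((i, i + 1)) in paired:
            motifs.append(("beta-hairpin", i, i + 1))
        elif a.kind == b.kind == "H" and antiparallel:
            # Simplification: helix packing/contact is not checked here.
            motifs.append(("alpha-hairpin", i, i + 1))
    for i in range(len(sses) - 2):
        a, h, b = sses[i], sses[i + 1], sses[i + 2]
        if ((a.kind, h.kind, b.kind) == ("E", "H", "E")
                and _dot(a.direction, b.direction) > 0
                and frozenset((i, i + 2)) in paired):
            motifs.append(("beta-alpha-beta", i, i + 2))
    return motifs


toy = [SSE("E", (1, 0, 0)), SSE("H", (-1, 0, 0)),
       SSE("E", (1, 0, 0)), SSE("E", (-1, 0, 0))]
pairs = {frozenset((0, 2)), frozenset((2, 3))}
print(find_simple_motifs(toy, pairs))
# [('beta-hairpin', 2, 3), ('beta-alpha-beta', 0, 2)]
```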

Supersecondary structure: simple units (G.M. Salem et al., J. Mol. Biol., 1999)

Supersecondary structure: Greek key motifs (G.M. Salem et al., J. Mol. Biol., 1999)

Examples of the β4 Greek key motif: 1hk0 (human gamma-D crystallin), residues 32 through 64 in domain 1; OB fold (we’ll see this fold later).

Examples of the βα Greek key motif: 1bgw (topoisomerase), residues 487 through 540 in domain 5; 1ris (ribosomal protein S6).

Protein folds: There is a continuum of similarity! Fold definition: two structures have a similar fold if they have a similar arrangement of SSEs (architecture) and similar connectivity (topology); sometimes a few SSEs may be missing. Fold classification: to get an idea of the variety of different folds, one must adjust for sequence redundancy and also try to correctly assign homologs that have low sequence identity (e.g. below 25%).
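Adjusting for sequence redundancy usually means keeping one representative per group of near-identical chains. Here is a minimal greedy sketch, assuming pairwise alignments are already available. Percent identity has several definitions (this one divides by the gap-free aligned columns), the 25% default only echoes the threshold mentioned above, and the function and chain names are hypothetical. It addresses redundancy only; recognizing low-identity homologs needs structure comparison or profile methods.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over the gap-free columns of a pairwise alignment
    given as two equal-length strings with '-' for gaps."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def nonredundant(chains, identity, threshold=25.0):
    """Greedy redundancy filter: walk the chains in the given order (e.g.
    best-quality first) and keep a chain only if its identity to every
    representative kept so far is below the threshold."""
    kept = []
    for chain in chains:
        if all(identity(chain, rep) < threshold for rep in kept):
            kept.append(chain)
    return kept


print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0

# Hypothetical chains and precomputed pairwise identities (percent).
table = {
    frozenset(("prot1", "prot2")): 90.0,
    frozenset(("prot1", "prot3")): 12.0,
    frozenset(("prot2", "prot3")): 11.0,
}
print(nonredundant(["prot1", "prot2", "prot3"],
                   lambda a, b: table[frozenset((a, b))]))  # ['prot1', 'prot3']
```

The result depends on the input order, which is why such filters typically sort chains by some quality measure first.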

Superfolds (Orengo, Jones, Thornton): The distribution of fold types is highly non-uniform. There are about 10 fold types, the superfolds, to which about 30% of the other folds are similar. There are also many examples of “isolated” fold types. Superfolds are characterized by a wide range of sequence diversity and span a range of unrelated functions. The evolutionary relationships of the superfolds remain a research question, i.e. did they arise by divergent or convergent evolution?

Superfolds and examples:
Globin: 1hlm sea cucumber hemoglobin; 1cpcA phycocyanin; 1colA colicin
α-up-down: 2hmqA hemerythrin; 256bA cytochrome B562; 1lpe apolipoprotein E3
Trefoil: 1i1b interleukin-1β; 1aaiB ricin; 1tie erythrina trypsin inhibitor
TIM barrel: 1timA triosephosphate isomerase; 1ald aldolase; 5rubA rubisco
OB fold: 1quqA replication protein A 32kDa subunit; 1mjc major cold-shock protein; 1bcpD pertussis toxin S5 subunit
α/β doubly-wound: 5p21 Ras p21; 4fxn flavodoxin; 3chy CheY
Immunoglobulin: 2rhe Bence-Jones protein; 2cd4 CD4; 1ten tenascin
UB αβ roll: 1ubq ubiquitin; 1fxiA ferredoxin; 1pgx protein G
Jelly roll: 2stv tobacco necrosis virus; 1tnfA tumor necrosis factor; 2ltnA pea lectin
Plaitfold (split αβ sandwich): 1aps acylphosphatase; 1fxd ferredoxin; 2hpr histidine-containing phosphocarrier

TIM barrels: Classified into 21 families in the CATH database. Mostly enzymes, but they participate in a diverse collection of biochemical reactions. There are intriguing common features across the families, e.g. the active site is always located at the C-terminal end of the barrel.

N. Nagano et al. J. Mol. Biol. (2002)

TIM barrel evolutionary relationships (Nagano, Orengo, Thornton): Sequence analysis with advanced programs such as PSI-BLAST and IMPALA has identified further relationships among the families. Careful comparison of the structures revealed further interesting similarities, e.g. a phosphate-binding site commonly formed by loops 7 and 8 and a small helix. In summary, there is evidence for evolutionary relationships among 17 of the 21 families.

OB (oligonucleotide/oligosaccharide-binding) fold: a 5-stranded β-barrel with Greek key topology. All OB folds share the same binding face, which is involved in their biochemical function.

V. Arcus, Curr. Opin. Struct. Biol. (2002)

OB evolutionary relationships: SCOP lists 9 superfamilies. The bacterial enterotoxin superfamily consists of two families that are almost certainly evolutionarily related. The nucleic acid-binding superfamily has 11 families; if they are evolutionarily related, the ancestral protein would date back to LUCA (the Last Universal Common Ancestor). The evidence for a common ancestry of all OB folds is probably weaker than for TIM barrels.

Protein structure comparison: How should 3D protein structures be compared? The computational considerations are analogous to those for sequence comparison, e.g. accuracy, efficiency for database searches, and statistical significance of results. Additional complication: we are working with atomic coordinates in 3D space, and two structures generally come in arbitrary, unrelated positions and orientations.
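One way around the arbitrary-coordinate-frame complication is to compare intramolecular distances, which do not change when a structure is rotated or translated. The sketch below computes a distance-matrix RMSD (dRMSD) between equivalent atoms. It is an illustration of that idea only, not what VAST or CE compute; DALI also works with distance matrices, but with its own scoring. It assumes the residue correspondence is already known, which in practice is the hard part of structure comparison.

```python
import itertools
import math


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two structures.

    coords_a[i] and coords_b[i] must be equivalent atoms (e.g. aligned
    C-alpha atoms). Because only intramolecular distances are compared,
    the result does not depend on how either structure is rotated or
    translated in space.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length lists of at least 2 atoms")
    sq_diffs = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in itertools.combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))


a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
# Same shape, rotated 90 degrees about z and shifted: (x, y, z) -> (-y + 10, x + 5, z - 2).
b = [(-y + 10.0, x + 5.0, z - 2.0) for x, y, z in a]
print(drmsd(a, b))  # essentially 0, up to floating-point rounding
```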

Some protein structure comparison methods: VAST (Vector Alignment Search Tool, NCBI); CE (Combinatorial Extension, RCSB/PDB); DALI (EBI).

VAST outline: 1. Parse protein structures into SSEs (helices and strands). 2. Fit vectors to SSEs. 3. To compare a pair of proteins, attempt to superpose as many vectors as possible, subject to constraints. 4. Evaluate the vector alignment for statistical significance (compute an E-value). 5. If the vector alignment is significant, proceed to a more detailed residue-to-residue alignment (“refined alignment”).
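Step 3 can be illustrated with a brute-force toy, under loudly stated assumptions: each SSE is reduced to a type, a midpoint and a direction; two SSE pairs count as “compatible” if their inter-vector angle and midpoint distance agree within arbitrary tolerances; and the search simply tries every set of correspondences, largest first. VAST itself uses its own geometric descriptors, search strategy, and statistical scoring (steps 4 and 5), none of which is reproduced here. All names are hypothetical.

```python
import itertools
import math

# Hypothetical SSE representation: (kind, midpoint, direction), kind "H" or "E".


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pair_geometry(s, t):
    """Angle (degrees) between two SSE vectors and distance between midpoints."""
    cos = _dot(s[2], t[2]) / math.sqrt(_dot(s[2], s[2]) * _dot(t[2], t[2]))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return angle, math.dist(s[1], t[1])


def compatible(a1, a2, b1, b2, max_angle=30.0, max_dist=3.0):
    """Does the SSE pair (a1, a2) in one protein resemble (b1, b2) in the other?"""
    ang_a, d_a = pair_geometry(a1, a2)
    ang_b, d_b = pair_geometry(b1, b2)
    return abs(ang_a - ang_b) <= max_angle and abs(d_a - d_b) <= max_dist


def best_vector_alignment(A, B, **tolerances):
    """Largest one-to-one, type-matched set of SSE correspondences whose
    pairwise geometry agrees. Brute force: exponential, toy sizes only."""
    candidates = [(i, k) for i, a in enumerate(A)
                  for k, b in enumerate(B) if a[0] == b[0]]
    for size in range(min(len(A), len(B)), 0, -1):
        for combo in itertools.combinations(candidates, size):
            if len({i for i, _ in combo}) < size or len({k for _, k in combo}) < size:
                continue  # not one-to-one
            if all(compatible(A[i], A[j], B[k], B[l], **tolerances)
                   for (i, k), (j, l) in itertools.combinations(combo, 2)):
                return list(combo)
    return []


A = [("E", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
     ("E", (0.0, 4.8, 0.0), (-1.0, 0.0, 0.0)),
     ("H", (0.0, 2.4, 10.0), (1.0, 0.0, 0.0))]
# B is A shifted by (5, 5, 5): same relative geometry.
B = [(kind, tuple(c + 5.0 for c in mid), d) for kind, mid, d in A]
print(best_vector_alignment(A, B))  # [(0, 0), (1, 1), (2, 2)]
```

Comparing only relative geometry (angles and distances between SSE pairs) makes the search independent of where each structure sits in space; the actual superposition can be computed afterwards from the matched SSEs.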

3chy and 1ipfA: two proteins with vectors assigned to their SSEs.

VAST comparison of 3chy and 1ipfA: vector superposition and refined alignment.

SCOP (Structural Classification of Proteins): Levels of the SCOP hierarchy: – Family: clear evolutionary relationship – Superfamily: probable common evolutionary origin – Fold: major structural similarity

CATH (Class, Architecture, Topology, Homologous superfamily)
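Both hierarchies can be written as dotted codes, most general level first: SCOP's class.fold.superfamily.family identifiers and CATH's class.architecture.topology.homologous-superfamily numbers. A small sketch (hypothetical helper) compares two codes and reports the deepest level they share. The codes in the example are illustrative, not looked up. Sharing a SCOP family implies a clear evolutionary relationship, while sharing only a fold implies major structural similarity.

```python
SCOP_LEVELS = ("class", "fold", "superfamily", "family")
CATH_LEVELS = ("class", "architecture", "topology", "homologous superfamily")


def deepest_shared_level(code_a, code_b, levels):
    """Compare two dotted classification codes level by level (most general
    first) and return the name of the deepest level they share, or None."""
    shared = None
    for name, x, y in zip(levels, code_a.split("."), code_b.split(".")):
        if x != y:
            break
        shared = name
    return shared


# Illustrative codes only, in each scheme's dotted format.
print(deepest_shared_level("c.1.1.1", "c.1.2.1", SCOP_LEVELS))        # fold
print(deepest_shared_level("3.20.20.70", "3.20.20.140", CATH_LEVELS))  # topology
print(deepest_shared_level("a.1.1.1", "b.1.1.1", SCOP_LEVELS))        # None
```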