A Global View of the Protein Structure Universe and Protein Evolution Sung-Hou Kim University of California, Berkeley, CA U.S.A. June 27, 2006.



Topics
I. Global view of the protein structure universe
II. Mapping of protein functions on the structural universe
III. Global view of the evolution of proteins

J. Hou G. Sims I.-G. Choi S.-R. Jun C. Zhang

I. Mapping the Protein Structure Universe: Structural Demography

The Protein Universe
500 – 20,000 genes per organism
>13.6 × 10^6 species
>10^10 – … protein sequences
but...
~10^5 protein sequence families
~10^4 protein structure families
~10^3 protein fold domain families

“Mapping” by Metric Matrix Distance Geometry (Classical Multidimensional Scaling)
Pair-wise relational distances with “errors”
Most likely (consistent) global relational “mapping”
[Diagram: four points x1, x2, x3, x4 connected by the pair-wise distances d1,2, d1,3, d1,4, d2,3, d2,4, d3,4]
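As a rough illustration of the slide above, classical multidimensional scaling can be sketched in a few lines of NumPy. This is a minimal textbook version, not the speaker's own code: the function name `classical_mds` and the four-point toy example are assumptions made for illustration only.

```python
import numpy as np

def classical_mds(d, k=3):
    """Embed n items in k dimensions from an (n, n) symmetric distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain the metric (Gram) matrix.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the k largest.
    w, v = np.linalg.eigh(b)
    order = np.argsort(w)[::-1][:k]
    # Noisy ("error"-containing) distances can give small negative eigenvalues.
    w_top = np.clip(w[order], 0.0, None)
    return v[:, order] * np.sqrt(w_top), w[order]

# Toy example (hypothetical data): four points on a unit square.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords, eigvals = classical_mds(dist, k=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates match the original points only up to rotation, reflection and translation, which is why the check compares pair-wise distances rather than positions.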

Method
1. Take all protein structures in PDB (>35,000)
2. Construct a non-redundant set at 25% sequence identity (~2000 structures)
3. Calculate all-to-all pair-wise structural similarities, then convert to dissimilarity scores
4. Apply metric matrix distance geometry to find the global position of each structure in N-dimensional space
5. 3-D plot to capture the major features of the protein structure space
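The slide does not say how similarity scores are converted to dissimilarities. One simple possibility, shown here purely as an assumption, is to subtract each score from the largest observed score and zero the diagonal. The function name and the 3 × 3 score matrix below are hypothetical.

```python
import numpy as np

def similarity_to_dissimilarity(s):
    """Turn a similarity matrix (higher = more alike) into a dissimilarity matrix."""
    s = np.asarray(s, dtype=float)
    # Average with the transpose so that d[i, j] == d[j, i].
    s = 0.5 * (s + s.T)
    # Assumed conversion: distance from the best observed score.
    d = s.max() - s
    # A structure is at zero distance from itself.
    np.fill_diagonal(d, 0.0)
    return d

# Hypothetical pair-wise structural similarity scores for three structures.
sim = np.array([[10.0, 8.0, 2.0],
                [8.0, 10.0, 3.0],
                [2.0, 3.0, 10.0]])
dis = similarity_to_dissimilarity(sim)
```

Any monotonically decreasing conversion would serve the same purpose; the choice affects the geometry of the resulting map, so it should be checked against the scoring scheme actually used.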

Protein Structure Distance Matrix (~2000 structures with <25% sequence ID)
[Figure: square matrix with rows and columns P1, P2, P3, ..., P1898; example entry D3,4]

Eigenvalues
Positional coordinates in 1898-dimensional space
Major feature extraction in 3 dimensions
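How much of the full-dimensional map survives in a 3-D plot can be estimated from the eigenvalue spectrum. The sketch below is an illustration on synthetic data, not the analysis from the talk: the helper names and the random 10-dimensional point cloud are assumptions.

```python
import numpy as np

def mds_eigenvalues(d):
    """Eigenvalues (descending) of the double-centered squared-distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    return np.sort(np.linalg.eigvalsh(b))[::-1]

def fraction_captured(eigvals, k=3):
    """Share of the total positive eigenvalue mass held by the k largest."""
    pos = np.clip(eigvals, 0.0, None)
    return pos[:k].sum() / pos.sum()

# Synthetic data (hypothetical): 50 points in 10-D with three dominant axes.
rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0] + [0.1] * 7)
pts = rng.normal(size=(50, 10)) * scales
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
eigvals = mds_eigenvalues(dist)
frac3 = fraction_captured(eigvals, k=3)
```

A fraction close to 1 suggests that three axes are a fair summary; a low fraction would mean the 3-D plot hides a substantial part of the structure.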

The Protein Structure Universe (2005)

Four demographic regions of the protein structure universe
Labeled examples: A1: (2ERL:_) MATING PHEROMONE ER-1; A2: (1ELW:B) TPR1-DOMAIN OF HOP; A3: (1A6M:_) MYOGLOBIN; A4: (1E85:A) CYTOCHROME C’; A5: (1M57:C) CYTOCHROME C OXIDASE

Four Protein Fold Classes: α, β, α/β, α+β

Major Features of the Protein Structural Space
1. Protein structural space is sparsely populated
2. Four elongated regions corresponding to four protein “fold” classes
3. Small to large size distribution along three of four “feature axes”

II. Mapping of Functions (1) Enzymatic functions

Molecular functions: Basic chemistry EC

EC3: Hydrolases

EC6: Ligases

II. Mapping of Functions (2) Metal Binding

Metal Binding
[Legend: Ca, Co, Cu, Fe, Mn, Mo, Ni, Zn, multi-bound, not bound]

Zn

Cu

Major Features of Functional Mapping
Maximum diversity in architectural preference for a given molecular function: “scaffold” selection vs. design

III. Evolution of Proteins (a) “Ages” of Protein Families

Method: “Common Structural Ancestor”

The “age” of the “common structural ancestor” (CSA) of a protein family
[Axis label: “Age” of CSA]

Ages of the Common Structural Ancestors
Population averaged
Chain length has similar distribution

III. Evolution of Proteins (b) Protein Fold Classes

ML Relative “age” of common structural ancestors

III. Evolution of Proteins (c) Protein Families

Hypothesis: Multiple Origins of Protein Families

Summary
Mapping of protein structures: sparse except four highly populated demographic regions (structural selection)
Mapping of molecular functions: opportunistic use of structural features for molecular function (selection, not design)
Mapping of CSA ages: (1) evolution of protein fold classes; (2) “multiple origin model” for the evolution of protein families

Organismic evolution by natural selection for environment may be founded on molecular evolution by structural selection for function.