Protein Sectors: Evolutionary Units of Three-Dimensional Structure Cell (2009) Najeeb Halabi, Olivier Rivoire, Stanislas Leibler, and Rama Ranganathan.

Protein Sectors: Evolutionary Units of Three-Dimensional Structure Cell (2009) Najeeb Halabi, Olivier Rivoire, Stanislas Leibler, and Rama Ranganathan presented by Jianewei Zhu

Summary
Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term “protein sectors.”

Summary
Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories.

Introduction
The data support two main findings:
- protein domains have a heterogeneous internal organization of amino acid interactions that can comprise multiple functionally distinct subdivisions (the sectors);
- these sectors define a decomposition of proteins that is distinct from the hierarchy of primary, secondary, tertiary, and quaternary structure.
We propose that sectors are features of protein structures that reflect the evolutionary histories of their conserved biological properties.

Results
- From Amino Acid Sequence to Sectors
- Statistical Independence
- Structural Connectivity
- Biochemical Independence
- Independent Sequence Divergence
- Sectors in Other Protein Families

Experimental Procedures
- Sequence alignment construction, annotation, and statistical coupling analysis (SCA) to identify sectors
- The minimum discriminatory information (MDI) method to test statistical independence
- Analysis of the sectors' structural connectivity using previously published approaches
- Protein purification and kinetic assays to measure catalytic power, and thermal denaturation assays to measure stability, as tests of biochemical independence
- PCA of the corresponding sequence similarity matrices to test for independent sequence divergence

From Amino Acid Sequence to Sectors
- Multiple sequence alignments (MSA)
- Measures of positional conservation
- Measures of sequence similarity
- SCA calculations
- Spectral cleaning
- Sector identification
- "Pseudo sectors"
- Representation of significant correlations

Multiple sequence alignments (MSA)
Note: 3TGI, the reference structure for the S1A family, is rat trypsin.
Family  PDB
S1A     3TGI
PDZ     1BE
PAS     2V0W
SH2     1AYA
SH3     2ABL

Measures of positional conservation
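As a rough illustration of positional conservation, the sketch below computes a relative entropy D_i = Σ_a f_i^a ln(f_i^a / q^a) for each alignment column, where f_i^a is the frequency of amino acid a at position i and q^a is a background frequency. It assumes a gap-free alignment given as equal-length strings and, unless a background is supplied, a uniform q^a = 1/20; real analyses typically use background frequencies estimated from large protein databases and also account for gaps. The function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def frequencies(msa):
    """Per-position amino-acid frequencies f[i, a]; characters outside AMINO_ACIDS are ignored."""
    counts = np.zeros((len(msa[0]), len(AMINO_ACIDS)))
    for seq in msa:
        for i, ch in enumerate(seq):
            a = AMINO_ACIDS.find(ch)
            if a >= 0:
                counts[i, a] += 1
    return counts / len(msa)

def conservation(msa, background=None):
    """Relative entropy D_i = sum_a f_i^a ln(f_i^a / q^a) for every position i."""
    f = frequencies(msa)
    if background is None:
        q = np.full(len(AMINO_ACIDS), 1.0 / len(AMINO_ACIDS))  # assumed uniform background
    else:
        q = np.asarray(background, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(f > 0, f * np.log(f / q), 0.0)  # 0 * ln 0 is taken as 0
    return terms.sum(axis=1)
```

For the two-sequence alignment ["AC", "AD"], the first column is fully conserved and scores ln 20 ≈ 3.00, while the second (half C, half D) scores ln 10 ≈ 2.30.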

Measures of sequence similarity
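Sequence similarity can be measured in several ways; a simple choice, assumed here for illustration, is the fraction of aligned positions at which two sequences carry the same character. The sketch builds the full sequence-by-sequence matrix with NumPy broadcasting. In this simplified version a shared gap counts as a match, and the function name is hypothetical.

```python
import numpy as np

def identity_matrix(msa):
    """Fractional identity between every pair of aligned sequences (M x M matrix)."""
    arr = np.array([list(seq) for seq in msa])   # shape (M, L), one character per cell
    same = arr[:, None, :] == arr[None, :, :]    # shape (M, M, L): True where characters match
    return same.mean(axis=2)
```

For ["ACDE", "ACDF", "WWWW"], the first two sequences share 3 of 4 positions (0.75) and neither shares any position with the third. The intermediate boolean array has M·M·L entries, so very large alignments would need to be processed in chunks.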

SCA calculations
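The following is a compact sketch of an SCA-style calculation, not the authors' code. It assumes one common formulation: the covariance C_ij^ab = f_ij^ab - f_i^a f_j^b between amino acid a at position i and amino acid b at position j is weighted by φ_i^a = ln[f_i^a (1 - q^a) / ((1 - f_i^a) q^a)], and the weighted tensor is reduced to one number per position pair by its Frobenius norm over a and b. The uniform background, the pseudocount of 0.01, and the omission of gaps and sequence weighting are simplifying assumptions; the function and parameter names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(msa):
    """Binary tensor x[s, i, a] = 1 if sequence s has amino acid a at position i."""
    x = np.zeros((len(msa), len(msa[0]), len(AMINO_ACIDS)))
    for s, seq in enumerate(msa):
        for i, ch in enumerate(seq):
            a = AMINO_ACIDS.find(ch)
            if a >= 0:  # gaps and unknown characters are left as all-zero
                x[s, i, a] = 1.0
    return x

def sca_matrix(msa, pseudocount=0.01):
    """Position-by-position matrix of conservation-weighted covariances (sketch)."""
    x = one_hot(msa)
    M, L, Q = x.shape
    q = np.full(Q, 1.0 / Q)  # assumed uniform background frequencies
    # Single-site and pairwise frequencies, mixed with a small uniform pseudocount
    f1 = (1 - pseudocount) * x.mean(axis=0) + pseudocount / Q
    f2 = (1 - pseudocount) * np.einsum("sia,sjb->iajb", x, x) / M + pseudocount / Q**2
    cov = f2 - np.einsum("ia,jb->iajb", f1, f1)          # C_ij^ab
    phi = np.log(f1 * (1 - q) / ((1 - f1) * q))          # weights phi_i^a
    weighted = phi[:, :, None, None] * phi[None, None, :, :] * cov
    return np.sqrt((weighted ** 2).sum(axis=(1, 3)))     # Frobenius norm over (a, b)
```

The pairwise-frequency tensor has L·20·L·20 entries, so with a few hundred positions the intermediate arrays occupy several hundred megabytes; a production implementation would compute them in blocks.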

Spectral cleaning
Eigenvalue spectra of the SCA matrix for the S1A serine protease family (top panel) and for 100 trials in which the S1A sequence alignment was randomized (bottom panel). Randomization scrambles the order of amino acids in each alignment column independently, so the amino acid frequencies at each position never change. This analysis shows that the bulk of the spectrum (the lowest 218 of 223 eigenvalues) can be attributed to limited sampling of sequences.
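The randomization test can be sketched as follows, under stated assumptions: the alignment is a numeric array (rows are sequences, columns are positions), `matrix_fn` is any function mapping it to a symmetric position-by-position matrix (in practice the SCA matrix), and a mode counts as significant if its eigenvalue exceeds the largest eigenvalue seen across all randomized trials. That cutoff rule and the toy `abs_corr` stand-in are illustrative choices, not the paper's exact criterion. NumPy's `Generator.permuted` with `axis=0` shuffles each column independently, which preserves every column's composition as described above.

```python
import numpy as np

def significant_modes(aln, matrix_fn, n_trials=100, seed=0):
    """Eigenvalues of matrix_fn(aln), largest first, and how many of them exceed
    every eigenvalue obtained after scrambling each column independently."""
    rng = np.random.default_rng(seed)
    eig = np.sort(np.linalg.eigvalsh(matrix_fn(aln)))[::-1]
    null_max = 0.0
    for _ in range(n_trials):
        shuffled = rng.permuted(aln, axis=0)  # permute within each column separately
        null_max = max(null_max, np.linalg.eigvalsh(matrix_fn(shuffled)).max())
    return eig, int((eig > null_max).sum())

def abs_corr(aln):
    """Toy stand-in for the SCA matrix: absolute Pearson correlation between positions."""
    return np.abs(np.corrcoef(aln, rowvar=False))
```

With a random binary toy alignment in which four columns are identical copies, the top eigenvalue stands well above the randomized maximum, mirroring how a few modes of the real spectrum separate from the sampling-noise bulk.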

Spectral cleaning
Among the significant modes, the first has a distinctive property: it describes a "coherent" correlation of all positions. Historical noise is expected to produce such coherent correlations between sequence positions, and for SCA matrices with a dominant first mode, the first eigenvector should simply report the net contribution of each position to the total correlation. The first mode is therefore irrelevant for decomposing the protein sequence into functional units, and it is removed.
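One way to express the removal, assumed here, is to subtract the rank-one contribution λ1 v1 v1ᵀ of the top eigenmode from the matrix; a simpler practical alternative is to ignore the first eigenvector when identifying sectors. The function name is hypothetical.

```python
import numpy as np

def remove_first_mode(C):
    """Return C minus its top eigenmode lambda_1 * v_1 v_1^T (C assumed symmetric)."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    v1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    return C - eigvals[-1] * np.outer(v1, v1)
```

Because v1 enters as an outer product, the result does not depend on the arbitrary sign of the eigenvector returned by `eigh`.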

Sector identification

The Red Sector
The Blue Sector
The Green Sector

Sector identification

The figure and its description are on page 22 of the supplemental data PDF.
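The grouping procedure itself is not spelled out on these slides, so the following is only a simplified, hypothetical heuristic and not the paper's method: skip the top (coherent) mode, take the next few eigenvectors, and assign each position to the eigenvector on which its loading is largest, discarding positions whose largest loading falls below a cutoff. The number of modes (3, matching the red, blue, and green sectors) and the cutoff of 0.1 are arbitrary illustrative values.

```python
import numpy as np

def assign_sectors(C, n_modes=3, cutoff=0.1):
    """Hypothetical heuristic: map mode index -> positions loading most strongly on it."""
    w, v = np.linalg.eigh(C)
    order = np.argsort(w)[::-1]                  # largest eigenvalue first
    modes = v[:, order[1:1 + n_modes]]           # skip the top mode, keep the next n_modes
    loadings = np.abs(modes)                     # eigenvector signs are arbitrary
    best = loadings.argmax(axis=1)               # strongest mode for each position
    strong = loadings.max(axis=1) > cutoff       # drop weakly loading positions
    return {k: np.flatnonzero((best == k) & strong) for k in range(n_modes)}
```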

“Pseudo sectors”

Representation of significant correlations
(E) SCA matrix after reduction of statistical noise and of global coherent correlations. The 65 positions that remain fall into three groups (red, blue, and green, termed “sectors”), each displaying strong intragroup and weak intergroup correlations. Within each sector, positions are ordered by descending magnitude of contribution (Figure S3), showing that sector positions comprise a hierarchy of correlation strengths.
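The block structure described above can be inspected with two small helpers, sketched here under assumptions: sectors are given as lists of position indices, positions within a sector are ranked by their summed coupling to the other positions of the same sector (an assumed stand-in for the contribution measure shown in Figure S3), and within-sector averages exclude the diagonal. Both function names are hypothetical.

```python
import numpy as np

def reorder_by_sectors(C, sectors):
    """Permute rows/columns so each sector is contiguous, strongest positions first."""
    order = []
    for s in sectors:
        s = np.asarray(s)
        strength = C[np.ix_(s, s)].sum(axis=1) - C[s, s]  # exclude self-coupling
        order.extend(s[np.argsort(-strength)])
    order = np.array(order)
    return C[np.ix_(order, order)], order

def block_means(C, sectors):
    """Average off-diagonal coupling within each sector and between each pair of sectors."""
    k = len(sectors)
    out = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            block = C[np.ix_(sectors[a], sectors[b])]
            if a == b:
                n = len(sectors[a])
                out[a, b] = (block.sum() - np.trace(block)) / (n * (n - 1))
            else:
                out[a, b] = block.mean()
    return out
```

Large diagonal entries and small off-diagonal entries in the output of `block_means` correspond to the strong intragroup and weak intergroup correlations described for the sectors.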

Statistical Independence
The minimum discriminatory information (MDI) method:
- generalizes the definition of positional conservation based on relative entropies to include correlations between positions;
- rests on principles completely distinct from those of the SCA method;
- provides an independence test: if two sectors are independent, the correlation entropy of the two taken together must equal the sum of their individual correlation entropies.
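The additivity criterion can be illustrated on an exact distribution. This sketch uses multi-information (total correlation) as the "correlation entropy"; that choice is an assumption for illustration, and the code is not the MDI computation used in the paper. Function names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def multi_information(joint):
    """Total correlation: sum of single-variable entropies minus the joint entropy.
    `joint` holds a normalized probability with one axis per variable."""
    joint = np.asarray(joint, dtype=float)
    axes = range(joint.ndim)
    marginals = [joint.sum(axis=tuple(j for j in axes if j != i)) for i in axes]
    return sum(entropy(m) for m in marginals) - entropy(joint)
```

If the joint distribution of two groups factorizes, p(A, B) = p(A) p(B), the joint entropy is the sum of the two group entropies and the single-variable marginals are unchanged, so the multi-information of the combined variables equals the sum for each group; any excess equals the mutual information between the groups and signals dependence.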

Statistical Independence

Structural Connectivity

No sector corresponds to any known subdivision of proteins by primary structure segments, secondary structure elements, or subdomain architecture.

Structural Connectivity

Biochemical Independence
- Protein purification and kinetic assays measure catalytic power.
- Thermal denaturation assays measure stability.
Together, these assays test whether the sectors are biochemically independent.

Biochemical Independence

Independent Sequence Divergence
PCA of the sequence similarity matrices corresponding to each sector tests whether the sectors diverge independently within the protein family. The figure and its description are on page 7 of the protein sectors PDF.
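The idea can be sketched as follows, under assumptions the slides do not fix: similarity is the fractional identity computed only over one sector's positions, the matrix is double-centered before its eigendecomposition, and sequences are placed along the top principal components. Running it once per sector lets the resulting groupings of sequences be compared, since different sectors can split the same family differently. The function name is hypothetical.

```python
import numpy as np

def sector_pca(msa, positions, n_components=2):
    """Coordinates of each sequence on the top principal components of the
    similarity matrix restricted to `positions` (one sector)."""
    arr = np.array([list(seq) for seq in msa])[:, positions]  # keep sector columns only
    sim = (arr[:, None, :] == arr[None, :, :]).mean(axis=2)     # fractional identity
    # Double centering: remove row, column, and grand means
    centered = sim - sim.mean(axis=0) - sim.mean(axis=1)[:, None] + sim.mean()
    eigvals, eigvecs = np.linalg.eigh(centered)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))
```

In a toy alignment such as ["AAAC", "AADE", "CCAC", "CCDE"], positions 0 and 1 separate sequences {0, 1} from {2, 3}, whereas positions 2 and 3 separate {0, 2} from {1, 3}: the same sequences diverge along different axes depending on which positions are used.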

Sectors in Other Protein Families
Two sectors are evident in the PSD95/Dlg1/ZO1 (PDZ) domain family of protein interaction modules (blue and red, Figures 7A and S11), and two sectors are also evident in the Per/Arnt/Sim (PAS) domain. Physically contiguous sectors are also evident in the SH2 and SH3 families of interaction modules (Figures 7C, 7D, S13, and S14). The figure and its description are on page 9 of the protein sectors PDF.

What can we learn?