Finding detailed relationships between proteins specific to phenotypes among microbial organisms Daniel Park Molecular Biology Institute, UCLA Yeates lab.

Finding detailed relationships between proteins specific to phenotypes among microbial organisms Daniel Park Molecular Biology Institute, UCLA Yeates lab SoCalBSI August 24, 2006

OUTLINE – Phylogenetic profiles – Ternary logic analysis – Building COG & phenotype profiles – Results of logic analysis

PHYLOGENETIC PROFILES Turning an earlier question on its side: from “What proteins are found in a genome?” to “Which genomes contain a given protein?”

VARIATIONS OF PHYLOGENETIC PROFILES – Relationships between protein families – Relationships between a protein family profile and a given target ‘phenotype’ profile

COMPLEXITY OF CELLULAR PROCESSES

HIGHER-ORDER RELATIONSHIPS: TERNARY LOGIC ANALYSIS [Figure: A, B]

8 LOGIC TYPES FOR PHYLOGENETIC PROFILE TRIPLETS
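The eight types are not listed in the slide text. As a hedged sketch, assuming a logic type is a two-input Boolean function that depends on both profiles a and b, with functions that differ only by swapping a and b counted once, the count of eight can be reproduced by enumerating all 16 truth tables. This grouping is an assumption for illustration, and the code does not recover the type numbering used later (1 = and, 7 = xor).

```python
from itertools import product

# Truth-table rows for the two input profiles, in the order (a, b) = 00, 01, 10, 11.
ROWS = list(product((0, 1), repeat=2))


def logic_type_classes():
    """Two-input Boolean functions that use both a and b, merged under the a <-> b swap."""
    classes = set()
    for outputs in product((0, 1), repeat=4):
        f = dict(zip(ROWS, outputs))
        uses_a = any(f[(0, b)] != f[(1, b)] for b in (0, 1))
        uses_b = any(f[(a, 0)] != f[(a, 1)] for a in (0, 1))
        if not (uses_a and uses_b):
            continue  # constants and single-profile functions (a, !a, b, !b) are not ternary
        swapped = tuple(f[(b, a)] for a, b in ROWS)  # truth table of g(a, b) = f(b, a)
        classes.add(min(outputs, swapped))  # one canonical truth table per swap class
    return sorted(classes)


print(len(logic_type_classes()))
```

Under this assumption the eight classes are and, or, xor, nand, nor, xnor, a and not b, and a or not b.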

MEASURING MUTUAL INFORMATION BETWEEN TWO PROFILES U is the uncertainty coefficient relating profiles x and y, and H is the Shannon entropy of the corresponding probability distributions. U ranges over [0, 1]; for example, U = 0.88 means an 88% decrease in uncertainty about one profile once the other is known. A high value of U indicates high mutual information between x and y.
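A minimal sketch, assuming the standard definition of the uncertainty coefficient, U(x | y) = [H(x) + H(y) - H(x, y)] / H(x), i.e. the mutual information between x and y divided by the entropy of x, with presence/absence profiles given as equal-length 0/1 lists over the same genomes. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*profiles):
    """Shannon entropy (bits) of the joint distribution of one or more aligned 0/1 profiles."""
    n = len(profiles[0])
    counts = Counter(zip(*profiles))  # each genome contributes one tuple of values
    return -sum((k / n) * log2(k / n) for k in counts.values())


def uncertainty_coefficient(x, y):
    """U(x | y): fraction of the entropy of x removed by knowing y."""
    hx = entropy(x)
    if hx == 0:
        return 0.0  # assumption: a constant profile has no uncertainty left to explain
    return (hx + entropy(y) - entropy(x, y)) / hx


print(uncertainty_coefficient([1, 0, 1, 0], [1, 0, 1, 0]))
```

For example, two identical profiles give U = 1, while two independent profiles such as [0, 0, 1, 1] and [0, 1, 0, 1] give U = 0.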

MEASURING MUTUAL INFORMATION AMONG THREE PROFILES Score: U(c | f(a,b)), where f(a,b) is a logical combination of profiles a and b. Constraints, where x and y are score thresholds: U(c|a) < x; U(c|b) < x; U(c|f(a,b)) > y.
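The constraint check can be sketched as follows, assuming U is the standard uncertainty coefficient, U(c | f) = [H(c) + H(f) - H(c, f)] / H(c), and that the eight logic types are the two-input functions that use both a and b, with a/b swaps merged. The function names, the type labels, and the default thresholds x = 0.3 and y = 0.6 are hypothetical choices for illustration, not values taken from the study.

```python
from collections import Counter
from math import log2


def entropy(*profiles):
    """Shannon entropy (bits) of the joint distribution of aligned 0/1 profiles."""
    n = len(profiles[0])
    return -sum((k / n) * log2(k / n) for k in Counter(zip(*profiles)).values())


def u(c, f):
    """Uncertainty coefficient U(c | f)."""
    hc = entropy(c)
    return 0.0 if hc == 0 else (hc + entropy(f) - entropy(c, f)) / hc


# One representative function per assumed logic type (hypothetical labels).
LOGIC = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "nand": lambda a, b: 1 - (a & b),
    "nor": lambda a, b: 1 - (a | b),
    "xnor": lambda a, b: 1 - (a ^ b),
    "a and not b": lambda a, b: a & (1 - b),
    "a or not b": lambda a, b: a | (1 - b),
}


def ternary_hits(a, b, c, x=0.3, y=0.6):
    """Logic types f with U(c|a) < x, U(c|b) < x and U(c|f(a,b)) > y."""
    if u(c, a) >= x or u(c, b) >= x:
        return []  # a or b alone already explains c too well
    hits = []
    for name, op in LOGIC.items():
        combined = [op(ai, bi) for ai, bi in zip(a, b)]
        score = u(c, combined)
        if score > y:
            hits.append((name, round(score, 3)))
    return hits


print(ternary_hits([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]))
```

Because U(c | f) equals U(c | !f), a function and its complement (for example xor and xnor) always receive the same score, so U alone does not tell them apart.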

COGs: CLUSTERS OF ORTHOLOGOUS GROUPS Each COG is a set of orthologous proteins from at least three different lineages; each cluster maps to a functional group.

COMBINATIONS OF COG PROFILES MATCHING A PHENOTYPE

ASSOCIATING MORE GENOMES WITH COGS

BUILDING COG PROFILES 81,480 proteins, 354 bacterial genomes, 4,613 COGs
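A minimal sketch of turning per-protein COG assignments into profiles, assuming a hypothetical input of (protein, genome, COG) records and recording only the presence or absence of each COG in each genome (not its copy number). The function name and record format are illustrative.

```python
from collections import defaultdict


def build_cog_profiles(assignments, genomes):
    """Build one binary phylogenetic profile per COG.

    assignments: iterable of (protein_id, genome_id, cog_id) tuples (hypothetical format).
    genomes: ordered list of genome ids that fixes the column order of every profile.
    """
    genomes_with_cog = defaultdict(set)
    for _protein, genome, cog in assignments:
        genomes_with_cog[cog].add(genome)
    # 1 if the COG has at least one member protein in the genome, else 0.
    return {
        cog: [1 if g in present else 0 for g in genomes]
        for cog, present in genomes_with_cog.items()
    }


records = [("p1", "gA", "COG1"), ("p2", "gB", "COG1"), ("p3", "gB", "COG2")]
print(build_cog_profiles(records, ["gA", "gB", "gC"]))
```

Stacking these vectors row by row gives the COG-by-genome profile matrix.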

BUILDING PHENOTYPE PROFILES
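How the phenotype profiles were compiled is not described in the slide text. As a purely hypothetical sketch, a binary phenotype profile can be derived from a per-genome trait table; the example assumes optimal growth temperatures and the commonly used cutoff of 80 °C for hyperthermophiles, and marks genomes with no recorded value as unknown (None) so they can be excluded before scoring. All names here are illustrative.

```python
HYPERTHERMOPHILE_CUTOFF_C = 80.0  # assumed convention: optimal growth at or above 80 °C


def is_hyperthermophile(optimal_growth_c):
    return optimal_growth_c >= HYPERTHERMOPHILE_CUTOFF_C


def phenotype_profile(genomes, trait_by_genome, predicate):
    """Binary phenotype profile aligned to `genomes`; genomes missing from the table get None."""
    return [
        None if g not in trait_by_genome else int(predicate(trait_by_genome[g]))
        for g in genomes
    ]


temps = {"g1": 95.0, "g2": 37.0}  # made-up example values
print(phenotype_profile(["g1", "g2", "g3"], temps, is_hyperthermophile))
```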

Cumulative number of protein triplets recovered at an uncertainty coefficient score greater than a given threshold

Frequency for each of the eight logic function types observed
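Both plots summarize a list of recovered triplets. A sketch assuming each hit is a hypothetical (logic_type, score) pair: the cumulative curve counts hits whose score exceeds each threshold, and the frequency plot tallies hits per logic type. The example data are made up for illustration.

```python
from collections import Counter


def cumulative_counts(scores, thresholds):
    """Number of triplets scoring strictly above each threshold."""
    return [sum(1 for s in scores if s > t) for t in thresholds]


def logic_type_frequency(hits):
    """How often each logic type occurs among the recovered triplets."""
    return Counter(logic_type for logic_type, _score in hits)


hits = [("and", 0.9), ("xor", 0.75), ("and", 0.65)]  # hypothetical hits
print(cumulative_counts([s for _, s in hits], [0.6, 0.7, 0.8]))
print(logic_type_frequency(hits))
```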

CORRELATIONS WITH PHENOTYPES: TEMPERATURE RANGE For U > 0.8, one relationship between proteins was found: Hyperthermophilicity = and(COG0432, !COG0225). U(Hyp. | COG0432) = 0.26; U(Hyp. | COG0225) = 0.29; U(Hyp. | and(COG0432, !COG0225)) = 0.71. [S] COG0432: Uncharacterized conserved protein; [O] COG0225: Peptide methionine sulfoxide reductase

LOGICAL COMBINATION OF COG PROFILES MATCHING A PHENOTYPE PROFILE c = hyperthermophilicity; f = and(COG0432, !COG0225); a = COG0432 (Uncharacterized conserved protein); b = !COG0225 (Peptide methionine sulfoxide reductase)

CONCLUSIONS There may be a correlation between the absence of methionine sulfoxide reductase and the presence of an uncharacterized conserved protein in hyperthermophiles.

CONCLUSIONS – Classified ~80,000 proteins from 354 bacterial genomes into ~4,600 COGs – Built COG and phenotype profile matrices for 354 fully sequenced bacterial genomes – Results support that ternary relationships among COGs are biologically significant – Results support that some logic types are seen in biology more often than others: types 1 (and), 5, and 7 (xor)

FUTURE DIRECTIONS – Build a richer database of phenotype profiles – Investigate relationships at lower cutoffs – Experimentally characterize the unknown COG0432 by crystallography

ACKNOWLEDGEMENTS Todd Yeates, Matteo Pellegrini; Yeates lab: Morgan Beeby, Brian O’Connor, rest of the lab; SoCalBSI 2006: Jamil Momand, Wendie Johnston, Sandra Sharp, Nancy Warter-Perez, Ronnie Cheng, fellow participants