Genome-wide Functional Linkage Maps

Presentation transcript:

Genome-wide Functional Linkage Maps
Methods for inferring functional linkages (complexes, pathways):
  Rosetta stone
  Phylogenetic profiles
  Gene neighbors
  Operon method
  (Microarray method)
The genome-wide functional linkage map in M. tb
Assessing accuracy of functional linkages
Functional linkages in structural genomics
Analyzing parallel pathways
The DIP and ProLinks databases

Diphtheria Toxin Dimer vs. Monomer Bennett et al., PNAS, Vol. 91, (1994)

Marcotte et al. (1999) Science, 285, 751

PHYLOGENETIC PROFILE METHOD. Pellegrini et al. (1999) PNAS 96, 4285

The Gene Neighbor Method for Inferring Functional Linkages
[Figure: positions of genes A, B, and C in genomes 1 through 4]
A statistically significant correlation is observed between the positions of genes A and B across multiple genomes. A functional relationship is inferred between proteins A and B, but not between the other pairs of proteins.
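One way to put a number on "statistically significant" is sketched below. It assumes, which the slide does not state, that under the null hypothesis the fractional separation of two genes in each genome is uniform on (0, 1]. The product X of m such separations then satisfies P(product <= X) = X * sum over k < m of (-ln X)^k / k!. The function names and the sample positions are hypothetical.

```python
import math

def fractional_separation(pos_a, pos_b, n_genes):
    """Separation of two genes on a circular genome, scaled to (0, 1].

    Positions are gene indices (0 .. n_genes - 1). The largest possible
    separation on a circle is n_genes // 2.
    """
    d = abs(pos_a - pos_b)
    d = min(d, n_genes - d)  # take the shorter way around the circle
    return d / (n_genes // 2)

def gene_neighbor_pvalue(separations):
    """P(product of m uniform(0, 1] variables <= the observed product).

    Assumption (not from the slides): separations are independent and
    uniform under the null hypothesis.
    """
    m = len(separations)
    x = math.prod(separations)
    if x <= 0.0:
        return 0.0
    t = -math.log(x)
    return x * sum(t ** k / math.factorial(k) for k in range(m))

# Hypothetical example: genes A and B sit 2 or 3 genes apart in four genomes.
seps = [
    fractional_separation(10, 12, 4000),
    fractional_separation(500, 503, 3000),
    fractional_separation(77, 79, 2000),
    fractional_separation(1200, 1202, 5000),
]
p_value = gene_neighbor_pvalue(seps)
```

A small p_value means the two genes stay close together in more genomes than chance placement would explain.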

OPERON or GENE CLUSTER method of inferring functional linkages in the genome of Mycobacterium tuberculosis
[Figure: adjacent genes A, B, and C]
The 100 bp threshold is chosen because it gives the broadest coverage consistent with high accuracy.
Research of Michael Strong
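The operon rule can be sketched as a single pass over genes sorted by position. This is a minimal illustration, assuming that genes must lie on the same strand and that "threshold" means an intergenic gap of at most 100 bp. The `Gene` type, the function name, and the coordinates are hypothetical.

```python
from typing import List, NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # 1-based start coordinate
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"

def predict_operons(genes: List[Gene], max_gap: int = 100) -> List[List[str]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    runs: List[List[Gene]] = []
    for gene in ordered:
        if runs:
            prev = runs[-1][-1]
            gap = gene.start - prev.end - 1  # negative when genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                runs[-1].append(gene)
                continue
        runs.append([gene])
    # Only runs of two or more genes imply a functional linkage.
    return [[g.name for g in run] for run in runs if len(run) > 1]

# Hypothetical coordinates, for illustration only.
genes = [
    Gene("geneA", 1, 900, "+"),
    Gene("geneB", 950, 1800, "+"),   # 49 bp after geneA
    Gene("geneC", 1850, 2700, "+"),  # 49 bp after geneB
    Gene("geneD", 3500, 4200, "+"),  # 799 bp after geneC: too far
    Gene("geneE", 4250, 5000, "-"),  # near geneD but on the opposite strand
]
operons = predict_operons(genes)
```

Raising `max_gap` links more genes (broader coverage) at the cost of more spurious groupings, which is the trade-off the slide refers to.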

Network Interaction Map vs. Genome-Wide Functional Linkage Map. Strong, Graeber et al. (2003) Nucleic Acids Research, 31, 7099

Figure 7. M. Strong, T. Graeber et al.

Requiring 2 or more functional linkages: 1,865 genes make 9,766 linkages
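The filter can be sketched as below. It rests on one reading of the slide, namely that a gene pair is kept only when at least two distinct methods link it; the slide does not spell this out. The gene names and evidence records are hypothetical.

```python
from collections import defaultdict

def high_confidence_links(evidence, min_methods=2):
    """Keep gene pairs supported by at least `min_methods` distinct methods.

    `evidence` is an iterable of (gene_a, gene_b, method) tuples.
    Assumption: "2 or more linkages" means 2 or more supporting methods.
    """
    methods_by_pair = defaultdict(set)
    for a, b, method in evidence:
        # frozenset makes (a, b) and (b, a) the same pair
        methods_by_pair[frozenset((a, b))].add(method)
    return {pair: methods for pair, methods in methods_by_pair.items()
            if len(methods) >= min_methods}

# Hypothetical evidence records.
evidence = [
    ("g1", "g2", "operon"),
    ("g2", "g1", "phylogenetic_profile"),  # same pair, other orientation
    ("g1", "g3", "rosetta_stone"),
    ("g3", "g4", "gene_neighbor"),
    ("g3", "g4", "gene_neighbor"),         # repeated evidence counts once
]
links = high_confidence_links(evidence)
genes_in_map = set().union(*links) if links else set()
```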

[Figure: linkage map with regions labeled A through F]

Cluster A: 6 genes, 5 annotated; 4 linkages. 5 genes code for DNA replication or repair. The 6th gene is inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon.

Cluster A: 6 genes, 5 annotated; 5 linkages. 5 genes code for DNA replication or repair. The 6th gene is inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon. None of the genes is a homolog.

Cluster B: 6 genes; 7 linkages.
3 genes: Ser/Thr kinase or phosphatase activities
2 genes: cell wall biosynthesis
1 gene: unannotated
Gene 14, pknB (a Ser/Thr kinase), contains PASTA domains (penicillin-binding protein and serine/threonine kinase associated).

Cluster B: 6 genes; 7 linkages.
3 genes: Ser/Thr kinase or phosphatase activities
2 genes: cell wall biosynthesis
1 gene: unannotated
Gene 19 is unannotated. It contains an FHA (forkhead-associated) domain, which binds phosphothreonine-containing proteins.

Cluster D: links gene 50 (a penicillin-binding protein involved in cell wall synthesis) to gene 51 (an integral membrane protein).

E is a functional link between gene 16 (pbpA, in cell wall biosynthesis) and gene 50 (the penicillin-binding protein involved in cell wall biosynthesis).

Some columns show similar linkages, so similar columns are clustered using the CLUSTER procedure of Eisen et al. (1998).
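The idea of grouping similar columns can be illustrated with a small average-linkage clustering over Pearson correlations, which is the general family of method that CLUSTER belongs to. This is a simplified pure-Python sketch, not a reimplementation of that program. The function names and the toy linkage profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def cluster_order(profiles):
    """Average-linkage agglomerative clustering; returns the leaf order.

    `profiles` maps a gene name to its column of the linkage matrix.
    """
    names = list(profiles)
    sim = {frozenset((a, b)): pearson(profiles[a], profiles[b])
           for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def avg_sim(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two clusters with the highest average similarity.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_sim(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

# Toy binary linkage profiles (1 = linked to that gene), illustration only.
profiles = {
    "g1": [1, 1, 0, 0, 1, 0],
    "g2": [0, 0, 1, 1, 0, 1],
    "g3": [1, 1, 0, 0, 0, 0],
    "g4": [0, 0, 1, 1, 0, 0],
}
order = cluster_order(profiles)
```

Reordering the rows and columns of the linkage matrix by `order` places genes with similar linkage patterns next to each other, which is what produces blocks along the diagonal of a clustered map.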

Hierarchical clustering of the TB Whole Genome Functional Linkage Map
Research of Michael Strong and Tom Graeber
Functional modules range in size from 2 to > 100 linkages.
Dozens of off-diagonal functional linkages.

Module labels: Detoxification; Polyketide and non-ribosomal peptide synthesis; Energy Metabolism, oxidoreductases; Polyketide and non-ribosomal, Degradation of Fatty Acids, and Energy Metabolism; Degradation of Fatty Acids.
Research of Michael Strong and Tom Graeber

Module labels: Detoxification; Polyketide and non-ribosomal peptide synthesis; Energy Metabolism, oxidoreductase; Deg. of Fatty Acids; Virulence; Energy Metabolism, oxidoreductase; Amino Acid Biosynthesis; Energy Metab., Respiration, Aerobic; Lipid Biosynthesis; Degradation of Fatty Acids; Amino Acid Biosynthesis (Branched); Synthesis and Modif. of Macromolecules, rpl, rpm, rps; Biosynthesis of Cofactors, Prosthetic Groups; Purine, Pyrimidine Nucleotide Biosynthesis; Novel Group; Sugar Metabolism; Aromatic Amino Acid Biosynthesis; Energy Metabolism, Anaerobic Respiration; Two-Component Systems; Cell Envelope; Cytochrome P450; Chaperones; Biosynthesis of Cofactors; Cell Envelope, Cell Division; Transport/Binding Proteins; Energy Metabolism, TCA; Broad Regulatory, Serine Threonine Protein Kinase; Cell Envelope, Murein Sacculus and Peptidoglycan; Transport/Binding Proteins, Cations; Energy Metabolism, ATP Proton Motive Force.
Fig 4. M. Strong, T. Graeber et al.

[Figure: the same module map as on the previous slide]
One of 7 modules of unannotated linkages, perhaps undiscovered pathways or complexes

Pathway Reconstruction from Functional Linkages
[Figure labels: HisG, HisF, HisI / HisI2, HisA, HisH, HisB, HisC / HisC2, HisB, HisD]
All 9 enzymes of the histidine biosynthesis pathway are linked, and are clustered separately from other amino acid synthetic pathways.

Functional Linkages Among Cytochrome Oxidase Genes
[Figure labels: CtaD, CtaE, CtaC, CtaB]
Functional linkages relate all 3 components of the cytochrome oxidase complex and also CtaB, the cytochrome oxidase assembly factor.
These genes are at four different chromosomal locations.
Membrane proteins are linked to soluble proteins.

Quantitative Assessment of Inferred Protein Complexes Research of Edward Marcotte, Matteo Pellegrini, Michael Thompson and Todd Yeates

Calculating Probabilities of Co-evolution
Methods: Phylogenetic Profile, Rosetta Stone, Gene Neighbor, Operon
N = number of fully sequenced genomes
n = number of homologs of protein A
m = number of homologs of protein B
k = number of genomes shared in common
X = fractional separation of genes
n = intergenic separation (operon method)
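The symbols N, n, m, and k fit a hypergeometric null model, in which protein B's m genomes are drawn at random from the N sequenced genomes and k or more of them happen to coincide with protein A's n genomes. The sketch below assumes that model. The slide's own equations are not preserved in this transcript, so this is an illustration rather than the exact formula used, and the function name and example numbers are hypothetical.

```python
from math import comb

def cooccurrence_pvalue(N, n, m, k):
    """P(at least k shared genomes) under a hypergeometric null.

    N: number of fully sequenced genomes
    n: genomes containing a homolog of protein A
    m: genomes containing a homolog of protein B
    k: genomes containing homologs of both
    Assumption: hypergeometric null model (not confirmed by the slides).
    """
    upper = min(n, m)
    total = comb(N, m)
    hits = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical example: two proteins each present in 20 of 83 genomes,
# sharing 18 of them.
p_value = cooccurrence_pvalue(83, 20, 20, 18)
```

The smaller the p-value, the less likely it is that the two proteins share so many genomes by chance.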

Combining Inferences of Co-Evolution from 4 Methods
We use a Bayesian approach to combine the probabilities from the four methods to arrive at a single probability that two proteins co-evolve (the equation is not reproduced in this transcript). Positive pairs are proteins with a common pathway annotation, and negative pairs are proteins with different annotations.
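One common way to combine evidence in a Bayesian setting is to multiply prior odds by a likelihood ratio per method. The sketch below does that under a naive-Bayes assumption of conditional independence between methods. That assumption is introduced here for illustration and is not taken from the slides. The function name, the prior, and the ratios are hypothetical.

```python
def combine_evidence(prior, likelihood_ratios):
    """Posterior P(positive pair | evidence from all methods).

    prior: P(positive pair) before seeing any evidence.
    likelihood_ratios: one value per method, each equal to
        P(method score | positive pair) / P(method score | negative pair),
        estimated from annotated positive and negative pairs.
    Assumption: methods are conditionally independent (naive Bayes).
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Hypothetical numbers: a 1% prior and four methods of varying strength.
posterior = combine_evidence(0.01, [12.0, 3.0, 0.8, 5.0])
```

A ratio above 1 raises the posterior and a ratio below 1 lowers it, so a method that disagrees with the others pulls the combined probability down instead of being ignored.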

ProLinks Database ~ 10,000,000 Functional Linkages inferred from 83 fully sequenced genomes

Benchmarking this Approach Against Known Complexes
EcoCyc: Karp et al., NAR, 30, 56 (2002)
True positive interactions are between subunits of known complexes, and false positive ones are between subunits of different complexes. For high-confidence links, we find 1/3 of the true interactions with only 1/1000 of the false positive ones.
Research of Matteo Pellegrini
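The two fractions quoted above can be computed as in the sketch below. It assumes each scored pair carries a confidence value and that a table of known complex membership is available. The function name, protein names, and scores are hypothetical.

```python
def benchmark(links, complex_of, threshold):
    """Return (true-positive fraction, false-positive fraction).

    links: iterable of (protein_a, protein_b, confidence)
    complex_of: dict mapping a protein to its known complex
    A pair in the same complex counts as true; a pair in different
    complexes counts as false. Pairs with an unknown protein are skipped.
    """
    true_all = false_all = true_hit = false_hit = 0
    for a, b, confidence in links:
        if a not in complex_of or b not in complex_of:
            continue  # cannot be scored against the benchmark
        same_complex = complex_of[a] == complex_of[b]
        predicted = confidence >= threshold
        if same_complex:
            true_all += 1
            true_hit += predicted
        else:
            false_all += 1
            false_hit += predicted
    tp_fraction = true_hit / true_all if true_all else 0.0
    fp_fraction = false_hit / false_all if false_all else 0.0
    return tp_fraction, fp_fraction

# Hypothetical benchmark data.
complex_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
links = [
    ("a1", "a2", 0.90), ("a1", "a3", 0.40), ("a2", "a3", 0.80),
    ("b1", "b2", 0.70), ("a1", "b1", 0.20), ("a2", "b2", 0.95),
    ("a3", "b1", 0.10), ("a3", "x9", 0.99),  # x9 is not in any known complex
]
tp_fraction, fp_fraction = benchmark(links, complex_of, threshold=0.6)
```

Sweeping `threshold` from high to low traces out a curve of true-positive against false-positive fraction, which can be compared with the diagonal expected for random predictions.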

Benchmarking our Approach Against Known Complexes
True positive interactions are between subunits of known complexes, and false positive ones are between subunits of different complexes. For the first few hundred pairs of high-confidence links, about 50% are between subunits of known complexes.

Example Complex: NADH Dehydrogenase I
11 of 13 subunits detected; 3 false positives

From Inferred Protein Linkages to Structures of Complexes Research of Michael Strong, Shuishu Wang, Markus Kauffman

The Problem of PE and PPE Proteins in M. tb
PE, PE-PGRS, and PPE Proteins in M. tuberculosis
38 PE proteins; 61 PE-PGRS proteins; 68 PPE proteins
Together they comprise about 5% of the genome.
No function is known, but some appear to be membrane bound.
No structure is known: always insoluble when expressed.
Goal: use functional linkages to predict a complex between a PE and a PPE protein, express the complex, and determine its structure.
Research of Shuishu Wang and Michael Strong

Construction of a co-expression vector to test for protein-protein interactions (Mike Strong)
[Figure: pET 29b(+) vector with T7 promoter, lac operator, two ribosome binding sites (RBS), gene A, gene B, thrombin site, and His tag; restriction sites NdeI, HindIII, KpnI, NcoI. Transcription yields a polycistronic mRNA; translation yields protein A and protein B (with His tag). Two outcomes are shown: the proteins interact (protein-protein interaction), or the proteins do not interact.]

When co-expressed, the PE and PPE proteins, inferred to interact, do form a soluble complex, Mr = 35,200
Sedimentation equilibrium experiments: Rv2430c + Rv2431c, fraction 49, in 20 mM HEPES, 150 mM NaCl, pH 7.8
Concentration OD , 0.45, 0.15
Expected Mr:
  Rv2431c (PE): 10,687 (from mass spec)
  Rv2430c + His tag (PPE): 24,072 (from mass spec)
The two expected masses sum to 34,759, close to the measured Mr, which possibly suggests a 1:1 complex between these two proteins.
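The stoichiometry argument is simple arithmetic and can be checked by comparing the measured Mr against the expected mass of each candidate complex. The masses below are the ones on the slide. The function name and the choice to test up to three copies of each protein are illustrative assumptions.

```python
PE_MR = 10_687        # Rv2431c (PE), from mass spec (slide value)
PPE_MR = 24_072       # Rv2430c + His tag (PPE), from mass spec (slide value)
OBSERVED_MR = 35_200  # from sedimentation equilibrium (slide value)

def rank_stoichiometries(observed, mr_a, mr_b, max_copies=3):
    """Relative mass error for each a:b stoichiometry, best match first.

    max_copies is an arbitrary illustrative limit, not from the slides.
    """
    errors = {}
    for a in range(1, max_copies + 1):
        for b in range(1, max_copies + 1):
            expected = a * mr_a + b * mr_b
            errors[(a, b)] = abs(observed - expected) / expected
    return sorted(errors.items(), key=lambda item: item[1])

ranking = rank_stoichiometries(OBSERVED_MR, PE_MR, PPE_MR)
best, best_error = ranking[0]
```

The best match is the 1:1 complex, whose expected mass of 34,759 differs from the measured value by about 1.3%.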

Crystallization Trials of the Complex Between PE Protein Rv2431c and PPE Protein Rv2430c

Database of Interacting Proteins Experimentally detected interactions from the scientific literature Currently ~ 44,000 interactions

The DIP Database DOE-MBI LSBMM, UCLA

Live DIP Gives the States of Proteins
Transitions Documented

ProLinks Database and the Protein Navigator
Contains some 10,000,000 inferred functional linkages from 83 genomes
Available at
Soon to be expanded to 250 fully sequenced genomes
Eventually to be reconciled with DIP

Summary
[Figure labels: A, X, Y, Z, B, V, C]
A protein's function is defined by the cellular context of its linkages.
Many functional linkages are revealed from genomic and microarray data (high coverage).
The validity of functional linkages can be assessed by comparison to known complexes and to expression data, and by keyword recovery.
Clustered genome-wide functional maps can reveal and organize information on complexes and pathways.
Functional linkages can reveal protein complexes suitable for structural studies.

Protein Interactions
Analysis of M. tb Genome: Michael Strong
Whole Genome Interaction Maps: Michael Strong & Tom Graeber
Methods of Inferring Interactions: Edward Marcotte, Matteo Pellegrini, Todd Yeates, Michael Thompson, Richard Llwellyn
Database of Interacting Proteins: Lukasz Salwinski, Joyce Duan, Ioannis Xenarios, Robert Riley, Christopher Miller
Parallel pathways: Huiying Li