An integrated approach to understanding regulation in biological systems: from protein sequences to molecular interaction networks

Presentation transcript:

An integrated approach to understanding regulation in biological systems: from protein sequences to molecular interaction networks
M. Madan Babu, PhD
NCBI, NLM, National Institutes of Health

Explosion of information about living systems
Sequence: 45,000,000 sequences from 160,000 organisms (GenBank, NCBI)
Structure: 33,000 structures from 10,000 organisms (PDB, MSD)
Interaction: 100,000 interactions, 5 organisms (BIND, DIP, publications)
Expression: 5,000 different conditions, 20 organisms (SMD, GEO, ArrayExpress)
Major challenge: integration of information, in order to generate experimentally testable hypotheses and to uncover organizing principles of specific processes at the systems level.

Regulation in Biological Systems
Integration of data to generate testable hypotheses: discovery of sequence-specific transcription factors in the malarial parasite. Sequence, structure, expression and interaction data provide convincing support.
Integration of data to uncover general organizing principles: transcriptional regulatory network in yeast. Network transformation procedure to reveal hidden features.
Integration of data to investigate cell size regulation.

Complex life cycle: mosquito, human (liver, RBC), mosquito.
Large number of genes: 5300 genes with over 700 metabolic enzymes; extensive complement of chromosomal regulatory proteins; extensive complement of signaling proteins (GTPases, kinases).
Genes need to be regulated.
Previous comparative genomic analysis of eukaryotes suggested a lack of detectable transcription factors in Plasmodium.

Possible explanations for the paradoxical observation
Alternative regulatory mechanisms: chromatin-level regulation, post-translational modification, RNA-based regulation.
Undetected transcription factors: distantly related or unrelated to known DNA-binding domains.
Proteome of Plasmodium searched with profiles & HMMs of known DBDs (bZIP, Homeo, MADS, AT-hook, Forkhead, ARID).
PF14_ ?: AT-hook, SEG, uncharacterized globular domain (~60 aa).

Characterization of the globular domain – sequence analysis I
Globular domain vs. non-redundant database: lineage-specific expansion in Apicomplexa (Plasmodium falciparum, Plasmodium vivax, Cryptosporidium parvum, Cryptosporidium hominis, Theileria annulata).
Profiles + HMM of this region vs. non-redundant database: floral homeotic protein Q (Triticum); 49L, an endonuclease (X. oryzae phage Xp10).
Globular region maps to the AP2 DNA-binding domain.

Characterization of the globular domain – sequence analysis II
AP2 DNA-binding domain from D. psychrophila (DP2593) vs. non-redundant database: MAL6P1.287 (Plasmodium falciparum), Cgd6_1140/Chro (Cryptosporidium). AP2 DNA-binding domain maps to the globular region.
Multiple sequence alignment of all globular domains, with JPRED/PHD: the sequence of secondary structure elements (S1 S2 S3 H1) is similar to that of the AP2 DNA-binding domain.
Homologs of the conserved globular domain constitute a novel family of the AP2 DNA-binding domain.

Characterization of the globular domain – structural analysis I
A. thaliana ethylene response factor (ATERF1, 1gcc, NMR structure): binds GC-rich sequences.
Predicted SS of ApiAP2: S1 S2 S3 H1. SS of ATERF1: S1 S2 S3 H1.
12 residues show a strong pattern of conservation; these are involved in key stabilizing hydrophobic interactions that determine the path of the backbone in the three strands and the helix of the AP2 domain.
The core fold of the ApiAP2 domain will be similar to that of the plant AP2 DNA-binding domain.

Characterization of the globular domain – structural analysis II
Figure labels, protein residues: Y186, R147, K156, T175, R170, W154, R152, R150, E160, W172, W162. Bases: G5, C6, C7, G17, G18, G20, G21.
Changes in base-contacting residues suggest binding to AT-rich sequence: R --- G5 (oxo group) becomes D/N --- A (amino group); R --- G20 (oxo group) becomes S/T --- A (amino group).
Charged residues in the insert between S2 and S3 may contact multiple phosphate groups to provide affinity.
ApiAP2 domain binds DNA in a sequence-specific manner.

Characterization of the globular domain – expression analysis I
Complex life cycle: mosquito, human (liver, RBC), mosquito.
Intra-erythrocyte developmental cycle: RBC infection & merozoite burst.
DeRisi Lab: mRNA expression profiling using microarrays (sorbitol synchronization).

Characterization of the globular domain – expression analysis II
Panels: co-expressed genes; average expression profile of all genes; 22 transcription factors. Axis: time points 0 to 46, across the ring stage, trophozoite stage, early schizont stage and schizont stage.
Striking expression patterns in specific developmental stages suggest that these transcription factors could mediate transcriptional regulation of stage-specific genes.

Characterization of the globular domain – interaction analysis I
Protein interaction network of P. falciparum: proteins (1267), physical interactions (2846). LaCount et al., Nature (2005).
Modified Y2H: Gal4 DBD + protein + auxotrophic gene. RNA isolated from mixed stages of the intra-erythrocyte developmental cycle.
Guilt by association: the function of interacting neighbors provides clues about the function of the protein.
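
The guilt-by-association idea can be illustrated with a minimal sketch, assuming the interaction network is stored as adjacency sets and that each annotated protein carries a single functional label. The protein names, the labels and the function name `guilt_by_association` are hypothetical and are not taken from the LaCount et al. data set.

```python
from collections import Counter


def guilt_by_association(network, annotations, protein):
    """Rank the functions found among the annotated neighbors of `protein`."""
    neighbors = network.get(protein, set())
    votes = Counter(annotations[n] for n in neighbors if n in annotations)
    return votes.most_common()


# Hypothetical toy network: protein "X" has four interaction partners.
toy_network = {
    "X": {"A", "B", "C", "D"},
    "A": {"X"}, "B": {"X"}, "C": {"X"}, "D": {"X"},
}
# Hypothetical annotations; "D" is left unannotated on purpose.
toy_annotations = {"A": "chromatin", "B": "chromatin", "C": "glycolysis"}

ranked = guilt_by_association(toy_network, toy_annotations, "X")
print(ranked)
```

Unannotated neighbors are simply ignored here; a real analysis would also need to ask whether an enrichment is larger than expected by chance.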

Characterization of the globular domain – interaction analysis II
Protein interaction network of P. falciparum: network of ApiAP2 proteins (97 interactions, 93 proteins).
ApiAP2 proteins (13): MAL8P1.153 (ES), PFD0985w (S), PF10_0075 (T), PF07_0126 (R).
Chromatin proteins (8), including Gcn5, nucleosome assembly and HMG protein. Also glycolytic enzymes and antigenic proteins. 50% hypothetical. Have a PPint domain.
Guilt by association supports a role for ApiAP2 proteins in the regulation of gene expression.

Conclusion - I
Integration of different types of experimental data (sequence, structure, expression, interaction) allowed us to discover potential transcription factors in the Plasmodium genome.
Integration of data can generate experimentally testable hypotheses.

Networks in Biology
Network: nodes; links; interaction
Protein interaction network: proteins; physical interaction; protein-protein
Metabolic network: metabolites; enzymatic conversion; protein-metabolite
Transcriptional network: transcription factor and target genes; transcriptional interaction; protein-DNA

Transcriptional interactions in yeast Transcription factors (157) Target genes (4410) Regulatory interactions (12873) Small-scale biochemical experiments Large-scale ChIP-chip experiments

Scale-free structure: $N(k) \propto 1/k^{\gamma}$
Presence of few nodes with many links and many nodes with few links.
Transcriptional networks are scale-free.
Scale-free structure provides robustness to the system.
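
Whether a network has few nodes with many links and many nodes with few links can be checked by tabulating its degree distribution. The sketch below assumes the regulatory network is given as a list of unique (transcription factor, target gene) pairs; the edge list and the function name `degree_distribution` are hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: number of transcription factors with exactly k targets}."""
    out_degree = Counter(tf for tf, _target in edges)
    return dict(Counter(out_degree.values()))


# Hypothetical toy network: one hub (TF1) and three sparsely connected factors.
toy_edges = [
    ("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF1", "g4"),
    ("TF2", "g1"), ("TF3", "g2"), ("TF4", "g5"),
]

dist = degree_distribution(toy_edges)
print(dist)
```

For a scale-free network, plotting these counts against k on log-log axes would be expected to give an approximately straight line.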

Scale-free networks exhibit robustness
Robustness: the ability of complex systems to maintain their function even when the structure of the system changes significantly.
Tolerant to random removal of nodes (failure). Vulnerable to targeted attack of hubs (attack).
Prediction: hubs are crucial components in such networks.

Paradoxical observations on the yeast transcriptional network: gene deletion data
Prediction: mutations that disrupt hubs will affect viability of the cell.
Observation: organisms are viable when hubs are disrupted.
Very few regulatory hubs are essential for viability: only 5 of the 33 hubs are essential for survival in yeast under standard growth conditions. Not essential: Mbp1p (regulates 235 genes), Cin5p (regulates 246 genes).
Most hubs are still not essential when grown in different conditions.

Paradoxical observations on the yeast transcriptional network: genomic data
Prediction: since hubs are crucial components, they should be conserved in evolution.
Observation: hubs are not conserved in evolution and can be lost or replaced (plot: % of genomes conserved vs. connectivity, k).

Integration of data provides apparently contradictory views of predictions and observations
Prediction: since hubs are crucial components, they should be conserved in evolution. Observation: hubs are not conserved in evolution across organisms.
Prediction: mutations that disrupt hubs will affect viability of the cell. Observation: organisms are viable when hubs are disrupted.
A hidden structure, other than the scale-free structure, could provide additional robustness to the transcriptional network.

Network transformation procedure to reveal hidden properties
Transcriptional regulatory network (example nodes A to G): transcription factors (157), target genes (4410), regulatory interactions (12873). On average 82 targets.
Co-regulation network (example nodes A to G): transcription factors (157), co-regulatory associations (3459). On average 44 co-regulatory partners.
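
A minimal sketch of such a transformation, under the assumption that two transcription factors are linked in the co-regulation network whenever they share at least one target gene. The slides do not state the exact criterion, so this rule, the toy edges and the function name `co_regulation_network` are illustrative only.

```python
from itertools import combinations


def co_regulation_network(regulatory_edges):
    """Link two transcription factors if they share at least one target gene."""
    targets = {}
    for tf, gene in regulatory_edges:
        targets.setdefault(tf, set()).add(gene)
    pairs = set()
    for tf_a, tf_b in combinations(sorted(targets), 2):
        if targets[tf_a] & targets[tf_b]:  # non-empty overlap of target sets
            pairs.add((tf_a, tf_b))
    return pairs


# Hypothetical toy regulatory network: (transcription factor, target gene).
toy_edges = [
    ("A", "E"), ("A", "F"),
    ("B", "F"), ("B", "G"),
    ("C", "G"),
    ("D", "H"),
]

co_net = co_regulation_network(toy_edges)
print(sorted(co_net))
```

In this toy case A and B share target F and B and C share target G, while D has no co-regulatory partner. Target genes disappear from the transformed network, which contains transcription factors only.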

Organization of the co-regulation network, by analogy to the organization of the transcriptional structures
Basic unit: transcriptional interaction (transcription factor, target gene); in the co-regulation network, co-regulatory association (between transcription factors).
Local structure: patterns of interconnections; in the co-regulation network, patterns of co-regulatory associations.
Global structure: all interactions in a cell; in the co-regulation network, all specific co-regulatory associations.

No. of co-regulatory partners vs. no. of target genes
Axes: number of co-regulatory partners (few < 44, many ≥) and number of target genes (few < 82, many ≥ 82).
Groups: (e.g. Abf1, Fhl1); 59 (e.g. Lys14, Rgt1); 36 (e.g. Nrg1, Msn4); 44 (e.g. Arr1, Cup9).
Most hubs seem to share their target genes with many other transcription factors, suggesting potential backup from other transcription factors.

Revealing a hidden distributed architecture by network transformation
Figure 3: fraction of nodes vs. degree, k, for the transcriptional network (= 0.12, = 82) and the co-regulation network (= 0.48, = 44).

Structure provides insight into features such as robustness
Scale-free architecture (plot: # nodes vs. degree, k): tolerates failure.
Distributed architecture (plot: # nodes vs. degree, k): tolerates attacks on hubs.

Hidden distributed architecture provides robustness against mutations that disrupt hubs
Plots: fraction of interactions remaining vs. fraction of nodes removed, for the transcriptional network and the co-regulation network, under attack and under failure. P-value < $10^{-3}$.
Attack: the distributed structure of the co-regulation network provides robustness under attack.
Failure: the scale-free structure of the transcriptional network provides robustness under failure.
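
The attack and failure curves can be approximated with a small simulation. This is a sketch under stated assumptions: "attack" removes the most connected nodes first, "failure" removes nodes chosen uniformly at random, and robustness is measured as the fraction of interactions whose two endpoints both survive. The toy network and all function names are hypothetical and do not reproduce the analysis behind the slide.

```python
import random


def fraction_remaining(edges, removed):
    """Fraction of edges whose two endpoints both survive the removal."""
    kept = [e for e in edges if e[0] not in removed and e[1] not in removed]
    return len(kept) / len(edges)


def degree(edges):
    """Number of links per node in an undirected edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def attack(edges, n_remove):
    """Remove the n_remove most connected nodes (hubs first)."""
    deg = degree(edges)
    hubs = sorted(deg, key=lambda node: (-deg[node], node))[:n_remove]
    return fraction_remaining(edges, set(hubs))


def failure(edges, n_remove, trials=1000, seed=0):
    """Average outcome of removing n_remove randomly chosen nodes."""
    rng = random.Random(seed)
    nodes = sorted(degree(edges))
    total = 0.0
    for _ in range(trials):
        total += fraction_remaining(edges, set(rng.sample(nodes, n_remove)))
    return total / trials


# Hypothetical hub-dominated toy network: one hub with 8 links plus 2 extra links.
toy = [("hub", f"n{i}") for i in range(8)] + [("n0", "n1"), ("n2", "n3")]

after_attack = attack(toy, 1)
after_failure = failure(toy, 1)
print(after_attack, after_failure)
```

In this hub-dominated toy network, removing the single hub destroys most interactions, whereas a random removal usually hits a sparsely connected node. Repeating the calculation for increasing numbers of removed nodes gives curves of the kind described on the slide.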

Real network vs. random network (plot: number of transcription factors vs. no. of co-regulatory partners)
The real transcriptional network has a significantly different structure from what is expected by chance. This is not a property of all scale-free networks.

Distribution of the average number of co-regulatory partners in 1000 random networks (plot: number of random networks vs. average co-regulation partners)
Random networks: average value = 38.4, SD = 2.2. Value observed in real network = 44. Z-score = 2.5, P-value = 9 x
Possible role of selection in arriving at a higher number of co-regulating partners.
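
The Z-score follows from the observed value and the mean and standard deviation of the random networks. The sketch below recomputes it from the numbers on the slide; the second function, its name and its toy input are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def z_score(observed, random_mean, random_sd):
    """Number of standard deviations separating observed from the random mean."""
    return (observed - random_mean) / random_sd


def z_score_from_randomizations(observed, random_values):
    """Same calculation, starting from the per-network values themselves."""
    mean = statistics.mean(random_values)
    sd = statistics.stdev(random_values)  # sample SD (assumption)
    return z_score(observed, mean, sd)


# Values from the slide: observed 44, random average 38.4, SD 2.2.
z = z_score(observed=44, random_mean=38.4, random_sd=2.2)
print(round(z, 1))

# Hypothetical averages from three random networks, for illustration only.
z_toy = z_score_from_randomizations(44, [36.0, 38.0, 40.0])
print(z_toy)
```

The first call reproduces the Z-score of about 2.5 quoted on the slide.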

Conclusion - II A particular network transformation procedure allowed us to elucidate the hidden structure in the transcriptional network Integration of data can uncover organizing principles in living systems The transformation procedure reveals the hidden distributed architecture of the co-regulatory network which protects the overall transcriptional program when hubs are disrupted

Apply these concepts and methods to a specific problem: “Regulation of cell size in yeast”
This study identified ~400 genes in yeast which gave rise to abnormal cell size when knocked out. 15% are ORFs of unknown function.
Are they evolutionarily conserved? What are their molecular functions? What are their expression patterns? What are their localization patterns? What are the morphological changes? What are their interaction partners? Which TFs regulate their expression? Which signaling proteins are involved? Which metabolic pathways are involved? Do orthologs behave similarly?

Approach to investigate cell size regulation
Are they evolutionarily conserved? Presence/absence in humans, worm, fly.
What are their molecular functions? Sequence & structure.
What are their expression patterns? Expression pattern & co-expression analysis.
What are their localization patterns? Membrane, nucleus.
What are the morphological changes? No. of nuclei, vesicle size.
What are their interaction partners? Protein complexes; evolutionary conservation of partners.
Which TFs regulate their expression? Transcription factors involved; evolutionary conservation of regulators.
Which signaling proteins are involved? Signaling proteins involved; evolutionary conservation.
Which metabolic pathways are involved? Metabolic pathways involved; evolutionary conservation.
Do orthologs behave similarly?

Integrate the interaction networks in yeast to understand the structure and organizing principles of the integrated network
Protein1, Protein2: physical interaction, transcriptional interaction, metabolic interaction, signaling interaction.
Snapshots of the same protein in different contexts: PNET + TNET + MNET + SNET. Reconstruct the multi-dimensional interaction network.
Snapshots of the same protein in different directions: reconstruct the three-dimensional structure.

Development of methods and concepts
Concept development: “division of labor in cellular systems”. Initiators (transporters), propagators (transcription), effectors (enzymes).
Method development: integrated network (protein-protein, transcriptional, metabolic). A graph decomposition method can identify “cores” in molecular networks, analogous to “cores” in protein structures.

Acknowledgements
Balaji, Lakshminarayan Aravind
National Center for Biotechnology Information, National Institutes of Health