Bioinformatics 3 V7 – Function Annotation, Gene Regulation Mon., Nov 5, 2012.

Slides:



Advertisements
Similar presentations
Bioinformatics 3 – WS 14/15V 6 – Bioinformatics 3 V6 – Biological PPI Networks - are they really scale-free? - network growth - functional annotation in.
Advertisements

PowerPoint Presentation Materials to accompany
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
D ISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORK Ideker Bioinformatics 2002 Presented by: Omrit Zemach April Seminar.
VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.
Chapter 18 Regulation of Gene Expression.
Four of the many different types of human cells: They all share the same genome. What makes them different?
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break 14:45 – 15:15Regulatory pathways lecture 15:15 – 15:45Exercise.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Gene and Protein Networks II Monday, April CSCI 4830: Algorithms for Molecular Biology Debra Goldberg.
Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Biological networks Construction and Analysis. Recap Gene regulatory networks –Transcription Factors: special proteins that function as “keys” to the.
Bioinformatics 3 – WS 14/15V 7 – Bioinformatics 3 V7 – Gene Regulation Fri., Nov 14,
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Comparative Expression Moran Yassour +=. Goal Build a multi-species gene-coexpression network Find functions of unknown genes Discover how the genes.
Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown Science Vol. 278.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Computational Molecular Biology Biochem 218 – BioMedical Informatics Gene Regulatory.
Large-scale organization of metabolic networks Jeong et al. CS 466 Saurabh Sinha.
Network Analysis and Application Yao Fu
A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes Angela K. Dean, Stephen E. Harris, Jianhua.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
Genetics: Chapter 7. What is genetics? The science of heredity; includes the study of genes, how they carry information, how they are replicated, how.
Reconstruction of Transcriptional Regulatory Networks
Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.
Part 1: Biological Networks 1.Protein-protein interaction networks 2.Regulatory networks 3.Expression networks 4.Metabolic networks 5.… more biological.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
1 Having genome data allows collection of other ‘omic’ datasets Systems biology takes a different perspective on the entire dataset, often from a Network.
Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.
Gene Regulation, Part 1 Lecture 15 Fall Metabolic Control in Bacteria Regulate enzymes already present –Feedback Inhibition –Fast response Control.
1 Gene Regulation Organisms have lots of genetic information, but they don’t necessarily want to use all of it (or use it fully) at one particular time.
How Does A Cell Know? Which Gene To Express Which Gene To Express& Which Gene Should Stay Silent? Which Gene Should Stay Silent?
Gene Expression and Networks. 2 Microarray Analysis Supervised Methods -Analysis of variance -Discriminate analysis -Support Vector Machine (SVM) Unsupervised.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
Genome Biology and Biotechnology The next frontier: Systems biology Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute.
Introduction to biological molecular networks
DNAmRNAProtein Small molecules Environment Regulatory RNA How a cell is wired The dynamics of such interactions emerge as cellular processes and functions.
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
© 2011 Pearson Education, Inc. Lectures by Stephanie Scher Pandolfi BIOLOGICAL SCIENCE FOURTH EDITION SCOTT FREEMAN 17 Control of Gene Expression in Bacteria.
Controlling Gene Expression. Control Mechanisms Determine when to make more proteins and when to stop making more Cell has mechanisms to control transcription.
Bioinformatics 3 V7 – Gene Regulation
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
 Signal Transduction transmits signals from outside to the inside of the cell  Integer Linear Programming model is used to unravel STN.
Bioinformatics 3 V8 – Gene Regulation Fri, Nov 9, 2012.
7. Lecture WS 2007/08Bioinformatics III1 V7 Molecular decomposition of graphs – functional nets Most cellular processes result from a cascade of events.
Bioinformatics 3 – WS 15/16V 6 – V6 – Biological PPI Networks - are they really scale-free? - network growth - functional annotation in the network Mon,
1 Lesson 12 Networks / Systems Biology. 2 Systems biology  Not only understanding components! 1.System structures: the network of gene interactions and.
1 What forces constrain/drive protein evolution? Looking at all coding sequences across multiple genomes can shed considerable light on which forces contribute.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
Comparative Network Analysis BMI/CS 776 Spring 2013 Colin Dewey
Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.
CSCI2950-C Lecture 12 Networks
V6 – Biological PPI Networks - are they really scale-free
Bioinformatics 3 V6 – Biological Networks are Scale- free, aren't they? Fri, Nov 2, 2012.
Biological networks CS 5263 Bioinformatics.
System Structures Identification
Gene Expression 3B – Gene regulation results in differential gene expression, leading to cell specialization.
Control of Gene Expression
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
Schedule for the Afternoon
Bioinformatics 3 V6 – Biological PPI Networks - are they really scale-free? - network growth - functional annotation in the network Fri, Nov 8, 2013.
7.2 Transcription & Gene Expression
SEG5010 Presentation Zhou Lanjun.
Principle of Epistasis Analysis
Presentation transcript:

Bioinformatics 3 V7 – Function Annotation, Gene Regulation Mon., Nov 5, 2012

Bioinformatics 3 – WS 12/13V 7 – 2 Network Meta-Growth Q: When I find a new protein and its (already known) partners in an experiment and I add that to a database, do I get a scale-free network? Which proteins are in databases??? the experimentally accessibly ones!!! → costs for the experiment → experience required for purification, methods, analysis… → existing assays for similar proteins → personal interests (to get funding: preference for {cancer, HIV, Alzheimer…}) Higher probability to find proteins related to known ones growing network with preferential attachment

Bioinformatics 3 – WS 12/13V 7 – 3 What Does a Protein Do? Enzyme Classification scheme (from enzymes.org/) enzymes.org

Bioinformatics 3 – WS 12/13V 7 – 4 MIPS FunCat Classification Browser from

Bioinformatics 3 – WS 12/13V 7 – 5 Digging Deeper

Bioinformatics 3 – WS 12/13V 7 – 6 Details and Proteins

Bioinformatics 3 – WS 12/13V 7 – 7 Un-Classified Proteins? Many unclassified proteins: → estimate: ~1/3 of the yeast proteome not annotated functionally → BioGRID: 4495 proteins in the largest cluster of the yeast physical interaction map have a MIPS functional annotation

Bioinformatics 3 – WS 12/13V 7 – 8 Partition the Graph Large PPI networks were built from: HT experiments (Y2H, TAP, synthetic lethality, coexpression, coregulation, …) predictions (gene profiling, gene neighborhood, phylogenetic profiles, …) → proteins that are functionally linked genome 1 genome 2 genome 3 sp 1 sp 2 sp 3 sp 4 sp 5 Identify unknown functions from clustering of these networks by, e.g.: shared interactions (similar neighborhood → power graphs) membership in a community similarity of shortest path vectors to all other proteins (= similar path into the rest of the network)

Bioinformatics 3 – WS 12/13V 7 – 9 Protein Interactions Nabieva et al used the S. cerevisiae dataset from GRID of 2005 (now BioGRID) → 4495 proteins and physical interactions in the largest cluster

Bioinformatics 3 – WS 12/13V 7 – 10 Function Annotation Task: predict function (= functional annotation) for a protein from the available annotations Similar: How to assign colors to the white nodes? Use information on: distance to colored nodes local connectivity reliability of the links …

Bioinformatics 3 – WS 12/13V 7 – 11 Algorithm I: Majority Schwikowski, Uetz, and Fields, " A network of protein–protein interactions in yeast" Nat. Biotechnol. 18 (2000) 1257 Consider all neighbors and sum up how often a certain annotation occurs → score for an annotation = count among the direct neighbors → take the 3 most frequent functions Majority makes only limited use of the local connectivity → cannot assign function to next-neighbors For weighted graphs: → weighted sum

Bioinformatics 3 – WS 12/13V 7 – 12 Extended Majority: Neighborhood Hishigaki, Nakai, Ono, Tanigami, and Takagi, "Assessment of prediction accuracy of protein function from protein–protein interaction data", Yeast 18 (2001) 523 Look for overrepresented functions within a given radius of 1, 2, or 3 links → use as function score the value of a χ 2 –test Neighborhood does not consider local network topology ? ? Both examples are treated identical with r = 2 Neighborhood can not (easily) be generalized to weighted graphs!

Bioinformatics 3 – WS 12/13V 7 – 13 Minimize Changes: GenMultiCut "Annotate proteins so as to minimize the number of times that different functions are associated with neighboring proteins" Karaoz, Murali, Letovsky, Zheng, Ding, Cantor, and Kasif, "Whole-genome annotation by using evidence integration in functional-linkage networks" PNAS 101 (2004) 2888 → generalization of the multiway k-cut problem for weighted edges, can be stated as an integer linear program (ILP) Multiple possible solutions → scores from frequency of annotations

Bioinformatics 3 – WS 12/13V 7 – 14 Nabieva et al: FunctionalFlow Extend the idea of "guilty by association" → each annotated protein is a source of "function"-flow → simulate for a few time steps → choose the annotation with the highest accumulated flow Each node u has a reservoir R t (u), each edge a capacity constraint (weight) w u,v Initially: Then: downhill flow with capacity contraints Score from accumulated in-flow: and Nabieva et al, Bioinformatics 21 (2005) i302

Bioinformatics 3 – WS 12/13V 7 – 15 An Example accumulated flow thickness = current flow Sometimes different annotations for different number of steps

Bioinformatics 3 – WS 12/13V 7 – 16 Comparison Change score threshold for accepting annotations → ratio TP/FP → FunctionalFlow performs best in the high-confidence region → many false predictions!!! unweighted yeast map Nabieva et al, Bioinformatics 21 (2005) i302 For FunctionalFlow: six propagation steps (diameter of the yeast network ≈ 12)

Bioinformatics 3 – WS 12/13V 7 – 17 Comparison Details Neighborhood with r = 1 or 2 comparable to FunctionalFlow for high-confidence region, performance decreases with increasing r → bad idea to ignore local connectivity Majority vs. r = 1 → counting neighboring annotations is more effective than χ 2 -test Multiple runs (solutions) of FunctionalFlow (with slight random perturbations of the weights) → increases prediction accuracy Nabieva et al, Bioinformatics 21 (2005) i302

Bioinformatics 3 – WS 12/13V 7 – 18 Weighted Graphs Compare: unweighted weight 0.5 per experiment weight for experiments according to (estimated) reliability Largest improvement → individual experimental reliabilities Performance of FunctionalFlow with differently weighted data: Nabieva et al, Bioinformatics 21 (2005) i302

Bioinformatics 3 – WS 12/13V 7 – 19 Additional Information Use genetic linkage to modify the edge weights → better performance (also for Majority and GenMultiCut) Nabieva et al, Bioinformatics 21 (2005) i302 (Note the clever choice of symbols in the plot…)

Bioinformatics 3 – WS 12/13V 7 – 20 Summary: Static PPI-Networks "Proteins are modular machines" How are they related to each other? 1) Understand "Networks" prototypes (ER, SF, …) and their properties (P(k), C(k), clustering, …) 2) Get the data experimental and theoretical techniques (Y2H, TAP, co-regulation, …), quality control and data integration (Bayes) 3) Analyze the data compare P(k), C(k), clusters, … to prototypes → highly modular, clustered with sparse sampling → PPI networks are not scale-free 4) Predict missing information network structure combined from multiple sources → functional annotation Next step: environmental changes, cell cycle → changes (dynamics) in the PPI network – how and why?

Bioinformatics 3 – WS 12/13V 7 – 21 Turn, Turn, Turn… From Lichtenberg et al, Science 307 (2005) 724: → certain proteins only occur during well-defined phases in the cell cycle → how is protein expression regulated?

Bioinformatics 3 – WS 12/13V 7 – 22 External Triggers Re-routing of metabolic fluxes during the diauxic shift in S. cerevisiae → changes in protein abundances (measured via mRNA levels) anaerobic fermentation: fast growth on glucose → ethanol aerobic respiration: ethanol as carbons source Note: "quorum sensing" — different bacteria have different strategies DeRisi et al., Science 278 (1997) 680 Diauxic shift

Bioinformatics 3 – WS 12/13V 7 – 23 Diauxic shift affects hundreds of genes Cy3/Cy5 labels, comparison of 2 probes at 9.5 hours distance; w and w/o glucose Red: genes induced by diauxic shift (710 genes 2-fold) Green: genes repressed by diauxic shift (1030 genes 2-fold DeRisi et al., Science 278 (1997) 680 Optical density (OD) illustrates cell growth;

Bioinformatics 3 – WS 12/13V 7 – 24 Flux Re-Routing during diauxic shift: expression increases expression unchanged expression diminishes DeRisi et al., Science 278 (1997) 680 fold change metabolic flux increases → how are these changes coordinated?

Bioinformatics 3 – WS 12/13V 7 – 25 Gene Expression Sequence of processes: from DNA to functional proteins DNAmRNA inactive mRNA protein inactive protein transcription RNA processing: capping, splicing transport translation post- translational modifications degradation nucleuscytosol → regulation at every step!!! most prominent: activation or repression of the transcription initiation transcribed RNA

Bioinformatics 3 – WS 12/13V 7 – 26 Transcription Initiation In eukaryotes: several general transcription factors have to bind specific enhancers or repressors may bind then the RNA polymerase binds and starts transcription Alberts et al. "Molekularbiologie der Zelle", 4. Aufl.

Bioinformatics 3 – WS 12/13V 7 – 27 Layers upon Layers Biological regulation via proteins and metabolites Projected regulatory network Remember: genes do not interact directly

Bioinformatics 3 – WS 12/13V 7 – 28 Conventions for GRN Graphs Gene regulation networks have "cause and action" → directed networks A gene can enhance or suppress the expression of another gene → two types of arrows activation self- repression Nodes: genes that code for proteins which catalyze products … → everything projected onto respective gene

Bioinformatics 3 – WS 12/13V 7 – 29 Luminescence in V. fischeri Miller, Bassler, 2001 V. fischeri lives in symbiotic association with eukaryotic hosts. Function: generate light Squid: camouflage against moon light Fish Monocentris japonicus: attracts a mate by light seduction

Bioinformatics 3 – WS 12/13V 7 – 30 The Complete Picture? Sketch from Miller & Bassler used to explain the mechanism: What is missing? emitted light signal degradation of AI and proteins threshold details of the "reactions"

Bioinformatics 3 – WS 12/13V 7 – 31 A Slightly More Complete Picture Add luminescenceLuxA + LuxB → luciferase → light Beware: the picture contains different reaction mechanisms (various associations, transcription + translation, diffusion, …)

Bioinformatics 3 – WS 12/13V 7 – 32 Still not the Complete Picture Add degradation and oligomerisation: Modeling problem in biology: → convert hand-waving verbal descriptions into consistent models Still missing here: dimerization of LuxR:AI the Ain-pathway

Bioinformatics 3 – WS 12/13V 7 – 33 And There is One More Detail… two auto-inducers: 3OC6HSL and C8HSL two genes (ain and lux) Nadine Schaadt, BSc. thesis → which of all these reactions are important for the dynamic behavior of the system? → systemic model? → interactions with cellular environment → predictions? → is everything known? But: the model is still incomplete

Bioinformatics 3 – WS 12/13V 7 – 34 E. coli Regulatory Network BMC Bioinformatics 5 (2004) 199

Bioinformatics 3 – WS 12/13V 7 – 35 Hierarchies Network from standard layout algorithm Network with all regulatory edges pointing downwards → → a few global regulators () control all the details Largest WCC: 325 operons (3/4 of the complete network) WCC = weakly connected component (ignore directions of regulation) Ma et al., BMC Bioinformatics 5 (2004) 199 Lowest level: operons that code for TFs with only auto- regulation, or no TFs Next layer: delete nodes of lower layer, identify TFs that do not regulate other operons in this layer (only lower layers) Continue …

Bioinformatics 3 – WS 12/13V 7 – 36 Global Regulators in E. coli Ma et al., BMC Bioinformatics 5 (2004) 199

Bioinformatics 3 – WS 12/13V 7 – 37 Modules Remove top three layers and determine WCCs → just a few modules Ma et al., BMC Bioinformatics 5 (2004) 199

Bioinformatics 3 – WS 12/13V 7 – 38 Putting it back together Ma et al., BMC Bioinformatics 5 (2004) 199 The ten global regulators are at the core of the network, some hierarchies exist between the modules

Bioinformatics 3 – WS 12/13V 7 – 39 Naming a few! Ma et al., BMC Bioinformatics 5 (2004) 199

Bioinformatics 3 – WS 12/13V 7 – 40 Summary Static PPI networks: → topology, measures, data sources, … Changes during cell cycle, adaptation to environmental changes, … → Gene Regulation → many biological steps → often modeled on the gene level only Next lecture: Regulatory motifs → static and dynamic behavior