Schedule for the Afternoon
13:00 – 13:30 ChIP-chip lecture
13:30 – 14:30 Exercise
14:30 – 14:45 Break
14:45 – 15:15 Regulatory pathways lecture
15:15 – 15:45 Exercise (complete previous exercises)
15:45 – 16:00 Wrap up
Microarrays for transcription factor binding location analysis (ChIP-chip) and the “Active Modules” approach
Protein-DNA interactions: ChIP-chip
Lee et al., Science 2002; Simon et al., Cell 2001
ChIP-chip Microarray Data
Differentially represented intergenic regions provide evidence for protein-DNA interaction
Network representation of TF-DNA interactions
Dynamic role of transcription factors Harbison C, Gordon B, et al. Nature 2004
Mapping transcription factor binding sites Harbison C, Gordon B, et al. Nature 2004
Affymetrix tiling arrays
ChIP-Seq with Illumina (Solexa) Genome Analyzer
Integrating Gene Expression Data with Interaction Networks
Data Integration
We need computational tools that can distill pathways of interest from large molecular interaction databases
List of Genes Implicated in an Experiment
Jelinsky S & Samson LD, Proc. Natl. Acad. Sci. USA Vol. 96, pp. 1486–1491, 1999
How do we interpret these results?
KEGG http://www.genome.jp/kegg/
Activated Metabolic Pathways
Types of Information to Integrate
Data that determine the network (nodes and edges): protein-protein, protein-DNA, etc.
Data that determine the state of the system: mRNA expression data, protein modifications, protein levels, growth phenotype, dynamics over time
Network Perturbations
Environmental: growth conditions, drugs, toxins
Genetic: gene knockouts, mutations, disease states
Finding Activated Sub-graphs Active Modules
Finding Activated Modules/Pathways in a Large Network is Hard
So now that we have a scoring system, we can turn to the problem of finding the high-scoring pathways themselves.
Finding the highest-scoring sub-network is NP-hard, so we use heuristic search algorithms to identify a collection of high-scoring sub-networks (local optima).
Search: simulated annealing and/or greedy search starting from an initial sub-network “seed”.
Considerations: local topology; sub-network score significance (is the score higher than would be expected at random?); multiple states (conditions).
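The greedy search from a seed can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy graph, the z-scores, the names `module_score` and `greedy_module`, and the simple sum-over-square-root score. It is not the published implementation, and simulated annealing is not shown.

```python
import math

def module_score(nodes, z):
    """Illustrative score: sum of node z-scores divided by sqrt(module size)."""
    return sum(z[n] for n in nodes) / math.sqrt(len(nodes))

def greedy_module(graph, z, seed):
    """Grow a connected module from `seed`, adding the best neighbour
    at each step while the score strictly improves."""
    module = {seed}
    best = module_score(module, z)
    while True:
        # Candidate nodes: neighbours of the module that are not yet in it.
        frontier = {nb for n in module for nb in graph[n]} - module
        candidate = None
        for nb in sorted(frontier):
            s = module_score(module | {nb}, z)
            if s > best:
                best, candidate = s, nb
        if candidate is None:
            return module, best  # no neighbour improves the score: local optimum
        module.add(candidate)

# Hypothetical toy network (adjacency lists) and per-gene z-scores.
graph = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
z = {"A": 2.0, "B": 2.0, "C": 0.1, "D": 3.0, "E": -1.0}

module, score = greedy_module(graph, z, seed="A")
print(sorted(module), round(score, 3))
```

On this toy input the search stops at the module {A, B}, because adding the weakly scoring node C lowers the score, even though the larger module {A, B, C, D} would score higher. This is the kind of local optimum that motivates running the search from many seeds or using simulated annealing.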
Activated Sub-graphs Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signaling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.
Scoring a Sub-graph Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signaling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.
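A minimal sketch of sub-graph scoring, assuming the aggregate z-score form described in the cited paper: the per-gene z-scores of a module of size k are summed and divided by the square root of k, and the result is then calibrated against randomly sampled gene sets of the same size. The function names, the sample count, and the sampling details are illustrative assumptions.

```python
import math
import random

def aggregate_z(z_scores):
    """Aggregate score of k genes: sum of z-scores divided by sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def calibrated_score(module, z_by_gene, n_samples=1000, seed=0):
    """Compare a module's aggregate score with random gene sets of equal size.

    Returns (raw - mean) / sd, where mean and sd come from `n_samples`
    random gene sets drawn without replacement from all genes.
    """
    rng = random.Random(seed)
    all_z = list(z_by_gene.values())
    k = len(module)
    raw = aggregate_z([z_by_gene[g] for g in module])
    background = [aggregate_z(rng.sample(all_z, k)) for _ in range(n_samples)]
    mean = sum(background) / n_samples
    sd = math.sqrt(sum((b - mean) ** 2 for b in background) / (n_samples - 1))
    return (raw - mean) / sd
```

The calibration step keeps the score comparable across module sizes, so that a large module is not favoured simply because it contains more genes.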
Significance Assessment of Active Modules
Score distributions for the 1st–5th best-scoring modules before (blue) and after (red) randomizing Z-scores (“states”). Randomization disrupts the correlation between gene expression and network location.
Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.
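The randomization idea can be sketched as a permutation test. This is a simplified sketch under stated assumptions: it shuffles z-scores across genes and re-scores one fixed module, whereas the comparison described above re-runs the module search on the randomized data. The names `module_score` and `permutation_p_value` and the scoring function are illustrative.

```python
import math
import random

def module_score(nodes, z):
    """Illustrative score: sum of node z-scores divided by sqrt(module size)."""
    return sum(z[n] for n in nodes) / math.sqrt(len(nodes))

def permutation_p_value(module, z, n_perm=1000, seed=0):
    """Empirical p-value for a module's score under shuffled z-scores.

    Shuffling reassigns z-scores to genes at random, which breaks any
    association between expression change and network location.
    """
    rng = random.Random(seed)
    genes = list(z)
    values = [z[g] for g in genes]
    observed = module_score(module, z)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(genes, values))
        if module_score(module, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the module's score is unlikely to arise when expression states are unrelated to network position.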
Network Regions of Differential Expression After Gene Deletions Ideker, Ozier, Schwikowski, Siegel. Bioinformatics (2002)
Network-based classifier of cancer