1
Building biological networks from diverse genomic data
Chad Myers
Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University
PRIME Workshop on Pathway Databases and Modeling Tools, June 16, 2006
2
Motivation: building biological networks from experimental data
How do we get from an explosion of functional genomic DATA to KNOWLEDGE of components and inter-relationships that lead to function?
- Find missing pathway components
- Detect uncharacterized crosstalk between pathways
- Discover novel pathways
3
Motivation: building biological networks from experimental data
The data are noisy. How can we harness this information without sacrificing precision?
4
Directed network discovery: involving the biologist in the search process
Previous approaches to network analysis from genomic data: largely undirected, global approaches that detect interesting network features
Incorporating expert direction can:
- Improve sensitivity and precision by using context information
- Focus on relevant information for the biologist user (allows interactivity)
[Figure: two-hybrid interaction network, yeast (SH3 domain), Boone lab]
Previous work: Bader et al. (2003), Asthana et al. (2004), Yamanashi et al. (2004, 2005), Kato et al. (2005)
5
bioPIXIE system overview
bioPIXIE: Pathway Inference from eXperimental Interaction Evidence
6
Overview
- How do we integrate heterogeneous evidence?
- Expert-driven network discovery
- Making it usable: practical visualization and other interface considerations
- Does it work? (evaluation experiments and biological validation)
- Challenges/opportunities and future work
7
Heterogeneous data integration
- Diverse forms of data: what's a unifying framework?
- Variable coverage, reliability, and relevance
- Integration scheme should use the information in the data when available, but be robust when it is missing
Data types: physical binding, genetic interaction, cellular localization, expression, sequence (TF motifs, coding, …)
Diagram labels: Bayes net; map to associations of genes/proteins
8
Bayes net for evidence integration
We infer: Functional Relationship
Input evidence, grouped by lab (source) and by type: microarray correlation, shared transcription factors, purified complex, affinity precipitation, two-hybrid, synthetic lethality, synthetic rescue, co-localization
Structure: naïve Bayes (~60 nodes) (also tried TAN)
CPTs: learned from GO gold standard
Result: fully connected, weighted graph of proteins …
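A minimal sketch of how a naïve Bayes model can combine evidence into a single confidence that two proteins are functionally related. It assumes binary evidence per dataset; the dataset names, conditional probabilities, and prior below are hypothetical placeholders, not the CPTs learned from the GO gold standard.

```python
import math

# Hypothetical CPT entries: (P(evidence seen | related), P(evidence seen | not related)).
# Illustrative numbers only; the real CPTs are learned from a gold standard.
CPT = {
    "two_hybrid": (0.30, 0.02),
    "coexpression": (0.60, 0.20),
    "colocalization": (0.50, 0.25),
}
PRIOR_RELATED = 0.05  # assumed prior probability of a functional relationship


def posterior_fr(evidence, cpt=CPT, prior=PRIOR_RELATED):
    """Return P(functionally related | evidence) under naive Bayes.

    `evidence` maps a dataset name to True/False. Datasets that are not
    listed are treated as missing and simply skipped, so the estimate
    falls back toward the prior when little data is available.
    """
    log_rel = math.log(prior)
    log_not = math.log(1.0 - prior)
    for name, observed in evidence.items():
        p_rel, p_not = cpt[name]
        if observed:
            log_rel += math.log(p_rel)
            log_not += math.log(p_not)
        else:
            log_rel += math.log(1.0 - p_rel)
            log_not += math.log(1.0 - p_not)
    # Normalize in log space for numerical stability.
    top = max(log_rel, log_not)
    rel = math.exp(log_rel - top)
    return rel / (rel + math.exp(log_not - top))
```

Computing this posterior for every protein pair would give the edge weights of a fully connected, weighted graph. Skipping absent datasets is one simple way to stay robust to missing data.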
10
Expert-driven network discovery
Local search in the PPI network centered at the query
Which proteins should we extract as a single, functionally coherent group?
Should consider: confidence in links and topology surrounding the query group
11
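Extracting relevant proteins
Basic idea: compute expected linkage to the query set
- $e_{ij} = P(\text{protein } i \text{ is functionally related to protein } j \mid \text{evidence})$
- $X_{ij}$: binary random variable that equals 1 with probability $e_{ij}$
- $S_Q(p_i)$: number of links from protein $i$ to the query set $Q$
Find proteins that maximize the expected linkage, which by linearity of expectation is $E[S_Q(p_i)] = \sum_{j \in Q} e_{ij}$
What about indirect links to the query set?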
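The expected-linkage idea can be sketched as follows. Since each link is a binary variable with probability $e_{ij}$, the expected number of links from a protein to the query set is the sum of those probabilities. The data layout (a dict keyed by unordered protein pairs) and the function names are assumptions made for illustration.

```python
def expected_linkage(candidate, query, e):
    """Expected number of links from `candidate` to the query set.

    `e` maps frozenset({a, b}) to the probability that a and b are
    functionally related; pairs without an entry count as 0.
    """
    return sum(e.get(frozenset((candidate, q)), 0.0)
               for q in query if q != candidate)


def rank_candidates(proteins, query, e, k):
    """Return the k non-query proteins with the highest expected linkage."""
    candidates = sorted(p for p in proteins if p not in query)
    candidates.sort(key=lambda p: expected_linkage(p, query, e), reverse=True)
    return candidates[:k]


# Hypothetical toy network with placeholder protein names.
edges = {
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "D")): 0.2,
}
top = rank_candidates(["A", "B", "C", "D"], {"A", "B"}, edges, k=2)
```

This version counts only direct links to the query set.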
12
Graph search: handling indirect links
Solution: iterative expanding search, in which indirect links to the query through high-confidence neighbors are counted
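One possible reading of an iterative expanding search is sketched below. The slide does not give the exact procedure, so the expansion rule (a protein joins the working group when any link to the group meets a confidence threshold), the threshold value, and the function names are all assumptions.

```python
def link(e, a, b):
    """Confidence of the link between a and b (0 if no entry)."""
    return e.get(frozenset((a, b)), 0.0)


def expanding_search(proteins, query, e, rounds=1, min_conf=0.7):
    """Rank non-query proteins by summed linkage to an expanded group.

    Each round adds proteins that have at least one high-confidence link
    (>= min_conf) to the current group, so later scores also count links
    that reach the query only through those neighbors.
    """
    group = set(query)
    for _ in range(rounds):
        new = {p for p in proteins
               if p not in group
               and any(link(e, p, g) >= min_conf for g in group)}
        if not new:
            break
        group |= new

    def score(p):
        return sum(link(e, p, g) for g in group if g != p)

    candidates = sorted(p for p in proteins if p not in query)
    return sorted(candidates, key=score, reverse=True)


# Hypothetical toy network: C reaches the query A only through B.
edges = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "D")): 0.3,
}
direct_only = expanding_search(["A", "B", "C", "D"], {"A"}, edges, rounds=0)
expanded = expanding_search(["A", "B", "C", "D"], {"A"}, edges, rounds=1)
```

In the toy network, C has no direct link to the query and ranks last when only direct links count. After one expansion round, its strong link to the high-confidence neighbor B moves it ahead of the weakly linked D.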
14
Making bioPIXIE usable
Guiding principles:
- Accessibility (users can access the most recent data with little effort)
- Simplicity vs. flexibility
- Drill-down (details, e.g. supporting experimental data, hidden until requested)
- Browseable
15
Graph visualization
17
Evaluation experiments
Recovering known network components: how much does integration help?
- Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)
- Use 10 random proteins as the query set and try to recover the remaining members
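The recovery experiment can be sketched as a hold-out test: sample a query set from a known group, rank the other proteins, and score how early the held-out members appear. Average precision is used here as one common summary of a precision-recall curve; the function names and the fixed random seed are assumptions for illustration.

```python
import random


def split_query(members, n_query, seed=0):
    """Randomly split a known group into a query set and held-out members."""
    rng = random.Random(seed)
    ordered = sorted(members)
    query = set(rng.sample(ordered, n_query))
    return query, set(ordered) - query


def average_precision(ranking, positives):
    """Mean of the precision values at the rank of each recovered positive."""
    if not positives:
        return 0.0
    hits = 0
    total = 0.0
    for rank, protein in enumerate(ranking, start=1):
        if protein in positives:
            hits += 1
            total += hits / rank
    return total / len(positives)


# Hypothetical example: two held-out members, found at ranks 1 and 3.
ap = average_precision(["a", "x", "b"], {"a", "b"})
```

Averaging this score over many known groups and random query draws gives a single number for comparing integration schemes.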
18
Evaluation experiments (2)
Recovering known network components: do naïve methods of integration/search work just as well?
- Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS)
- Use 10 random proteins as the query set and try to recover the remaining members
19
Biological validation: finding new components
Using bioPIXIE to characterize unknown genes
- S. cerevisiae uncharacterized gene, YPL077C
- Predicted involvement in chromosome segregation
20
Biological validation: finding new components
P-value based on blind counting: $1.98 \times 10^{-7}$ (Fisher's exact test)
21
Biological validation: novel links between pathways
DNA replication initiation:
- Cdc7: "switch" that starts replication (activated by Dbf4)
- Linked to the Hsp90 complex by our method
Hsp90 (yeast: hsc82, hsp82): cytosolic molecular chaperone that participates in the folding of several signaling kinases and hormone receptors
(Helmut Pospiech)
22
Genetic analysis of DNA replication-Hsp90 link
YKO Dbf4 vs. hsp82, hsc82 and co-chaperones: cpr7, sti1, cdc37
[Figure: $10^5$ cells at RT, 30°C, and 37°C; strain sets: wt, dbf4Δ, hsp82Δ, dbf4Δhsp82Δ; wt, dbf4Δ, hsc82Δ, dbf4Δhsc82Δ; wt, dbf4Δ, cpr7Δ, dbf4Δcpr7Δ]
24
Practical challenges/opportunities
Visualizing complex networks of interactions in a meaningful way
- How does it scale with added data?
- Easy user navigation around the network
Data-centric vs. established knowledge views
- How do we overlay current knowledge of pathways with predictions derived from experimental data?
25
Future work
An observation: the more specific we can be about the end goal, the better the accuracy of our prediction
26
Future work
Exploiting relevance and reliability variation: context-specific integration
27
Summary
bioPIXIE can facilitate precise network discovery from experimental data using:
- Bayesian data integration
- Expert-directed search
- Web-based dynamic interface
bioPIXIE is an effective tool for browsing genomic evidence and generating specific, testable hypotheses
http://pixie.princeton.edu
28
Acknowledgements
Olga Troyanskaya, Drew Robson, Adam Wible, Kara Dolinski, Camelia Chiriac, Matt Hibbs, Curtis Huttenhower
David Botstein Lab, Leonid Kruglyak Lab
http://pixie.princeton.edu
Thank you!
29
Evaluation experiments (3): what about noise in the query set?
[Plot: AUPRC vs. number of random proteins out of 20 total query proteins]
30
Evaluation experiments (4)
Comparing with existing approaches
- SEEDY: proteins ranked by max. direct connection to query
- Complexpander:
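The ranking rule described for the first baseline (rank by the maximum direct connection to the query) can be sketched as below. This illustrates the rule as stated on the slide, not the actual SEEDY implementation; the function name and data layout are assumptions.

```python
def rank_by_max_link(proteins, query, e):
    """Rank non-query proteins by their single strongest link to the query.

    `e` maps frozenset({a, b}) to a link confidence; missing pairs count as 0.
    """
    def best(p):
        return max((e.get(frozenset((p, q)), 0.0) for q in query), default=0.0)

    candidates = sorted(p for p in proteins if p not in query)
    return sorted(candidates, key=best, reverse=True)


# Hypothetical toy network: C has two moderate links, D has one strong link.
edges = {
    frozenset(("A", "C")): 0.6,
    frozenset(("B", "C")): 0.6,
    frozenset(("A", "D")): 0.9,
}
by_max = rank_by_max_link(["A", "B", "C", "D"], {"A", "B"}, edges)
```

Here D ranks first because of its single strong link, whereas a summed-linkage score would favor C (0.6 + 0.6 > 0.9). That difference is the kind of behavior such a comparison probes.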
31
Hydroxyurea sensitivity (replication inhibitor)
[Figure: $10^6$ cells at 30°C and 37°C with HU at 0 mM, 50 mM, and 100 mM; strains: wt, cpr7Δ, sti1Δ, dbf4Δ, hsp82Δ, hsc82Δ, dbf4Δhsc82Δ, dbf4Δsti1Δ, dbf4Δcpr7Δ, dbf4Δhsp82Δ]
32
Is this interaction specific to DNA replication?
MMS sensitivity (induces DNA damage)
[Figure: $10^6$ cells at 37°C; strains: wt, cpr7Δ, sti1Δ, dbf4Δ, hsp82Δ, hsc82Δ, dbf4Δhsc82Δ, dbf4Δsti1Δ, dbf4Δcpr7Δ, dbf4Δhsp82Δ]
MMS treatment has no apparent effect at RT, 30°C or 37°C (shown)
Conclusions:
- Hsp90 complex plays a specific role in DNA replication
- Hsc82 and hsp82 do not have identical function
- Possible new link between signaling cascades, stress, and DNA replication
- Our system generates specific, testable hypotheses