1
Data-Intensive Scalability in Machine Learning and Computational Proteomics
Jaime Carbonell et al.
Carnegie Mellon University
jgc@cs.cmu.edu
January, 2009
2
January, 2009 © 2009, Jaime G. Carbonell
Active and Proactive Learning
Training data:
Objective: learn decision function with minimal training (sampling)
Functional space:
Fitness Criterion: a.k.a. loss function
Sampling Strategy:
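The slide's own formulas for the functional space, loss, and sampling strategy did not survive extraction, so the following is only a minimal sketch of the general idea of learning a decision function from few labels. It substitutes a simple uncertainty-sampling rule on a 1-D threshold classifier; the `oracle` labeler, its hidden threshold, and the function names are all hypothetical and are not taken from the presentation.

```python
def oracle(x, threshold=0.62):
    """Hypothetical labeler: returns 1 if x is at or above a hidden threshold."""
    return 1 if x >= threshold else 0


def active_learn_threshold(pool, label_fn, budget):
    """Pool-based active learning for a 1-D threshold classifier.

    Keeps an interval (lo, hi) assumed to contain the decision boundary and
    always queries the unlabeled point closest to its midpoint, i.e. the point
    the current model is least certain about (uncertainty sampling).
    Assumption: the smallest pool point is negative and the largest is positive.
    """
    lo, hi = min(pool), max(pool)
    unlabeled = sorted(pool)
    queries = 0
    while queries < budget:
        candidates = [x for x in unlabeled if lo < x < hi]
        if not candidates:
            break  # no unlabeled point can narrow the interval further
        mid = (lo + hi) / 2
        x = min(candidates, key=lambda c: abs(c - mid))
        unlabeled.remove(x)
        queries += 1
        if label_fn(x) == 1:
            hi = x  # boundary is at or below x
        else:
            lo = x  # boundary is above x
    return (lo + hi) / 2, queries


pool = [i / 1000 for i in range(1001)]
estimate, n_queries = active_learn_threshold(pool, oracle, budget=50)
```

With a pool of 1001 points, the learner needs only on the order of log2(1000) labels to localize the boundary, versus labeling the whole pool under passive sampling.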
3
Computational Challenge
True decision functions lie in non-linear, high-dimensional manifolds.
Only simplified functional forms (e.g. decision trees, hyperplanes) can be tractably explored today
Require global optimization and shared model
3-5 orders of magnitude beyond current workstations
Non-Euclidean manifolds
Optimal cost-sensitive sampling requires full model sharing (clouds are not the best computational model)
4
Predicting Quaternary Protein Folds by Structural Homology & First Principles
Triple beta-spirals [van Raaij et al., Nature 1999]
  Virus fibers in adenovirus, reovirus and PRD1
Double barrel trimer [Benson et al., 2004]
  Coat protein of adenovirus, PRD1, STIV, PBCV
5
Linked Segmentation Conditional Random Fields [Liu & Carbonell]
Goal: Predict how a protein complex will fold
Nodes: Secondary protein structures and/or simple folds
Edges: Local interactions and long-range inter-chain and intra-chain interactions
L-SCRF: conditional probability of y given x is defined as
Joint Labels
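The slide's own L-SCRF formula was an image and is not recoverable here. For orientation only, the generic conditional random field form that such models build on can be written as below. The notation is an assumption: C denotes the cliques of the graph (the nodes and edges described above), f_k are feature functions, and λ_k are the model parameters. The exact L-SCRF factorization over segments and chains may differ.

```latex
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{c \in C} \sum_{k} \lambda_k \, f_k(\mathbf{x}, \mathbf{y}_c) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\Big( \sum_{c \in C} \sum_{k} \lambda_k \, f_k(\mathbf{x}, \mathbf{y}'_c) \Big)
```

The normalizer Z(x) sums over every joint labeling y', which is why long-range inter-chain edges make exact inference expensive.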
6
Computational Challenges
Classification:
Training: learn the model parameters λ
  Minimize the regularized negative log loss
  Iterative search algorithms seek the direction whose empirical values agree with the expectation
Complex graphs result in huge computational complexity
Ideal case: co-train a multiverse of models
  Exploit large common substructures
  Immediately propagate constraints among variants
  Requires complex computation on co-resident models
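The training step above can be illustrated with a minimal sketch, under strong simplifying assumptions: a flat two-label log-linear model stands in for the structured graph so that the model expectation can be enumerated exactly, the regularizer is assumed to be Gaussian (L2), and plain gradient descent stands in for whatever iterative search the authors used. The feature function, data, and hyperparameters are hypothetical.

```python
import math

LABELS = [0, 1]


def features(x, y):
    """Hypothetical feature function: a copy of x placed in the slot for label y."""
    return [x[0] if y == 0 else 0.0, x[1] if y == 0 else 0.0,
            x[0] if y == 1 else 0.0, x[1] if y == 1 else 0.0]


def label_probs(lam, x):
    """P(y | x) for each label under the log-linear model with parameters lam."""
    scores = [sum(l * f for l, f in zip(lam, features(x, y))) for y in LABELS]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def objective(lam, data, sigma2):
    """Regularized negative log-likelihood."""
    nll = -sum(math.log(label_probs(lam, x)[y]) for x, y in data)
    return nll + sum(l * l for l in lam) / (2 * sigma2)


def gradient(lam, data, sigma2):
    """Gradient: (model expectation - empirical value) per feature, plus regularizer."""
    grad = [l / sigma2 for l in lam]
    for x, y in data:
        probs = label_probs(lam, x)
        empirical = features(x, y)
        for k in range(len(lam)):
            expected = sum(p * features(x, yy)[k] for p, yy in zip(probs, LABELS))
            grad[k] += expected - empirical[k]
    return grad


def train(data, sigma2=10.0, step=0.1, iters=200):
    """Plain gradient descent on the regularized objective."""
    lam = [0.0] * 4
    for _ in range(iters):
        g = gradient(lam, data, sigma2)
        lam = [l - step * gk for l, gk in zip(lam, g)]
    return lam


data = [((1.0, 0.0), 0), ((0.9, 0.2), 0), ((0.1, 1.0), 1), ((0.0, 0.8), 1)]
lam = train(data)
```

The gradient vanishes, up to the regularization term, exactly when the expected feature values under the model match the empirical ones. In a structured model the `expected` term requires inference over the whole graph, which is where the cost noted on the slide comes from.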
7
Learning Protein Interaction Networks Intra- and Inter-Organism [Qi, Klein-Seetharaman, Tastan, Carbonell]
[Figure: overview diagram. Panel labels: Pairwise Interactions, Pathway, Function Implication (Func ?, Func A), Protein Complex, PPI Network, Domain/Motif Interactions, Human-PPI (Revise 08), HIV-Human PPI (Revise). Venue labels: PSB 05, PROTEINS 06, BMC Bioinfo 07, CCR 08, ISMB 08 (Preparation) Genome Biology 08]
8
HIV-Human Protein Interactions
HIV-1 depends on the cellular machinery in every aspect of its life cycle.
[Figure: HIV-1 life cycle. Stage labels: Fusion, Reverse transcription, Maturation, Budding, Transcription]
Peterlin and Trono, Nature Rev Immu 2003.
9
Computational Challenges in Inducing the Interactome
Degree distribution / hub analysis / pair-wise coupling checking
Graph modules analysis (from bi-clustering study)
Protein-family based graph patterns (receptors / subclasses / ligands)
O(10^6) different proteins
O(10^4) largest network induced to date at right
Want to learn interactions from induced structural fold models (previous slides)
Requires O(10^(2+3)) memory and computation [100X for full interactome, 1000X for high-fidelity model]
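As a small illustration of the degree distribution and hub analysis mentioned above, the sketch below computes both from an undirected edge list. The protein identifiers, the edge list, and the hub cutoff are hypothetical placeholders and are not data from the presentation.

```python
from collections import Counter, defaultdict


def degree_stats(edges, hub_min_degree):
    """Return per-protein degree, the degree distribution, and the hub list.

    edges: iterable of (protein_a, protein_b) pairs, treated as undirected.
    hub_min_degree: assumed cutoff; proteins with at least this many distinct
    partners are reported as hubs.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)  # sets collapse duplicate and reversed edges
        neighbors[b].add(a)
    degree = {p: len(n) for p, n in neighbors.items()}
    distribution = Counter(degree.values())  # degree -> number of proteins
    hubs = sorted(p for p, d in degree.items() if d >= hub_min_degree)
    return degree, distribution, hubs


# Toy network with one duplicated (reversed) edge.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P3"), ("P4", "P5"), ("P3", "P1")]
degree, distribution, hubs = degree_stats(edges, hub_min_degree=3)
```

An adjacency-set representation like this grows with the number of interactions. At the O(10^6)-protein scale on the slide, the memory it needs is part of the stated computational challenge.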