Data-Intensive Scalability in Machine Learning and Computational Proteomics. Jaime Carbonell et al., Carnegie Mellon University, January, 2009.


1 Data-Intensive Scalability in Machine Learning and Computational Proteomics. Jaime Carbonell et al., Carnegie Mellon University, jgc@cs.cmu.edu. January, 2009.

2 Active and Proactive Learning (January, 2009; © 2009, Jaime G. Carbonell)
  - Training data
  - Objective: learn a decision function with minimal training (sampling)
  - Functional space
  - Fitness criterion (a.k.a. loss function)
  - Sampling strategy
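The slide does not preserve its sampling strategy, so the following is only a minimal sketch of the general idea of learning a decision function from few labels. It assumes a one-dimensional threshold classifier, a pool of unlabeled points, and uncertainty sampling (querying the point closest to the current decision boundary). The names `oracle` and `active_learn` and the hidden threshold value are hypothetical.

```python
def oracle(x, threshold=0.37):
    """Hypothetical labeler: 1 if x is at or above a hidden threshold, else 0."""
    return 1 if x >= threshold else 0


def active_learn(pool, label_fn, budget):
    """Estimate a 1-D threshold by querying the most uncertain pool point.

    Assumes the pool lies in [0, 1] and labels are noise-free.
    Returns (estimated_threshold, number_of_label_queries).
    """
    lo, hi = 0.0, 1.0            # the threshold is known to lie in (lo, hi]
    unlabeled = list(pool)
    queries = 0
    while queries < budget and unlabeled:
        mid = (lo + hi) / 2.0
        # Sampling strategy (an assumption): the point nearest the current
        # boundary estimate is the one whose label is least predictable.
        x = min(unlabeled, key=lambda p: abs(p - mid))
        if not (lo < x < hi):
            break                # no remaining point can narrow the interval
        unlabeled.remove(x)
        queries += 1
        if label_fn(x) == 1:
            hi = x               # threshold is at or below x
        else:
            lo = x               # threshold is above x
    return (lo + hi) / 2.0, queries
```

With a pool of 1001 evenly spaced points, `active_learn(pool, oracle, 20)` locates the threshold with far fewer label requests than labeling the whole pool, which is the trade-off the objective on this slide describes.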

3 Computational Challenge
  - True decision functions lie in non-linear, high-dimensional manifolds.
    - Only simplified functional forms (e.g. decision trees, hyperplanes) can be tractably explored today
    - Require global optimization and a shared model, 3-5 orders of magnitude beyond current workstations
    - Non-Euclidean manifolds
  - Optimal cost-sensitive sampling requires full model sharing (clouds are not the best computational model)

4 Predicting Quaternary Protein Folds by Structural Homology & First Principles
  - Triple beta-spirals [van Raaij et al., Nature 1999]: virus fibers in adenovirus, reovirus and PRD1
  - Double barrel trimer [Benson et al., 2004]: coat protein of adenovirus, PRD1, STIV, PBCV

5 Linked Segmentation Conditional Random Fields [Liu & Carbonell]
  - Goal: predict how a protein complex will fold
  - Nodes: secondary protein structures and/or simple folds
  - Edges: local interactions and long-range inter-chain and intra-chain interactions
  - L-SCRF: defines the conditional probability of the joint labels y given x
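The equation itself is not preserved in this transcript. As a point of reference only, the generic log-linear form of a conditional random field over a graph can be written as below. This is the standard CRF template, not necessarily the exact L-SCRF definition; the clique set $C$ (taken here to be the nodes and edges of the graph), the feature functions $f_k$ and the partition function $Z$ are assumed notation, with $\lambda_k$ chosen to match the parameters λ named on the next slide.

```latex
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Big( \sum_{c \in C} \sum_{k} \lambda_k \, f_k(\mathbf{x}, \mathbf{y}_c) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\!\Big( \sum_{c \in C} \sum_{k} \lambda_k \, f_k(\mathbf{x}, \mathbf{y}'_c) \Big)
```

The sum over all joint labelings $\mathbf{y}'$ in $Z(\mathbf{x})$ is what becomes expensive when the graph contains long-range edges.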

6 Computational Challenges
  - Classification
  - Training: learn the model parameters λ
    - Minimize the regularized negative log loss
    - Iterative search algorithms seek the direction in which the empirical values agree with the expectation
    - Complex graphs result in huge computational complexity
  - Ideal case: co-train a multiverse of models
    - Exploit large common substructures
    - Immediately propagate constraints among variants
    - Requires complex computation on co-resident models
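The training bullets can be illustrated on a much smaller model. The sketch below is not the L-SCRF trainer; it assumes a binary log-linear model (logistic regression), an L2 regularizer, and plain gradient descent. Its gradient is the model's expected feature values minus the empirical ones, plus the regularization term, so the search stops moving when expectation and empirical values agree up to regularization. The function names and hyperparameters are hypothetical.

```python
import math


def gradient(lam, data, l2):
    """Gradient of the mean regularized negative log-likelihood.

    data: list of (features, label) pairs with label in {0, 1}.
    Each component is (expected feature value - empirical feature value) / n
    plus the L2 term l2 * lam[k].
    """
    n = len(data)
    empirical = [0.0] * len(lam)
    expected = [0.0] * len(lam)
    for x, y in data:
        score = sum(l * xi for l, xi in zip(lam, x))
        p1 = 1.0 / (1.0 + math.exp(-score))   # model probability of label 1
        for k, xi in enumerate(x):
            empirical[k] += y * xi            # observed feature value
            expected[k] += p1 * xi            # feature value under the model
    return [(expected[k] - empirical[k]) / n + l2 * lam[k]
            for k in range(len(lam))]


def train(data, n_features, l2=0.1, lr=0.5, steps=500):
    """Gradient descent on lambda, starting from zero."""
    lam = [0.0] * n_features
    for _ in range(steps):
        g = gradient(lam, data, l2)
        lam = [l - lr * gk for l, gk in zip(lam, g)]
    return lam
```

In a CRF the `expected` term requires inference over the whole graph at every step, which is where the cost noted in the third training bullet comes from.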

7 Learning Protein Interaction Networks, Intra- and Inter-Organism [Qi, Klein-Seetharaman, Tastan, Carbonell]
  - Diagram labels: Human-PPI (Revise 08); HIV-Human PPI (Revise); Pairwise Interactions; Pathway; Function Implication (Func ?, Func A); Protein Complex; PPI Network; Domain/Motif Interactions
  - Publications shown: PSB 05, PROTEINS 06, BMC Bioinfo 07, CCR 08, ISMB 08, (Preparation), Genome Biology 08

8 HIV-Human Protein Interactions
  - HIV-1 depends on the cellular machinery in every aspect of its life cycle.
  - Figure labels: Fusion, Reverse transcription, Maturation, Budding, Transcription
  - Source: Peterlin and Trono, Nature Rev. Immunol. 2003.

9 Computational Challenges in Inducing the Interactome
  - Degree distribution / hub analysis / pair-wise coupling checking
  - Graph modules analysis (from bi-clustering study)
  - Protein-family based graph patterns (receptors / subclasses / ligands)
  - O(10^6) different proteins; O(10^4) in the largest network induced to date (at right)
  - Want to learn interactions from induced structural fold models (previous slides)
    - Requires O(10^(2+3)) memory and computation [100X for full interactome, 1000X for high-fidelity model]
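The O(10^(2+3)) figure can be read as the product of the two bracketed factors. The short calculation below makes that arithmetic explicit. It assumes that the 100X factor is the ratio of O(10^6) proteins to the O(10^4) in the largest induced network, and that the 1000X high-fidelity factor multiplies it; the function name is hypothetical.

```python
def scale_up_factor(total_proteins, largest_network, fidelity_factor):
    """Combined growth in memory and computation, under the stated assumptions."""
    interactome_factor = total_proteins // largest_network   # 10^6 / 10^4
    return interactome_factor * fidelity_factor


# Orders of magnitude quoted on the slide.
combined = scale_up_factor(10**6, 10**4, 10**3)
print(combined)   # 100 * 1000 = 10^(2+3)
```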

