Statistical Inference Using Graphs for Protein Complex Identification Denise Scholtens Robert Gentleman Marc Vidal Workshop on Statistical Inference, Computing,

Presentation transcript:

Statistical Inference Using Graphs for Protein Complex Identification Denise Scholtens Robert Gentleman Marc Vidal Workshop on Statistical Inference, Computing, and Visualization for Graphs Stanford University August 1-2, 2003

Graphic from: U.S. Department of Energy Human Genome Program

High-throughput Protein Complex Identification
Gavin, et al. (Nature, 2002)
– TAP: Tandem Affinity Purification
Ho, et al. (Nature, 2002)
– HMS-PCI: High-throughput Mass Spectrometric Protein Complex Identification

Protein Complex Identification Using TAP Data
Spoke Model
Matrix Model
Bader, et al. (Nature Biotechnology, 2002)
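The slide only names the two models, so the following is a minimal sketch of how they are usually described: the spoke model pairs the bait with each hit, while the matrix model pairs every protein in a purification with every other. The protein names and the helper functions `spoke_edges` and `matrix_edges` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def spoke_edges(bait, hits):
    """Spoke model: pair the bait with each hit; hits are not paired with each other."""
    return {frozenset((bait, hit)) for hit in hits}


def matrix_edges(bait, hits):
    """Matrix model: pair every two proteins found in the same purification."""
    return {frozenset(pair) for pair in combinations([bait, *hits], 2)}


# Hypothetical purification: one bait that pulls down three hits.
bait, hits = "Bait", ["Hit1", "Hit2", "Hit3"]
spoke = spoke_edges(bait, hits)
matrix = matrix_edges(bait, hits)

print(len(spoke), len(matrix))  # 3 6
print(spoke <= matrix)          # True: every spoke pair is also a matrix pair
```

Under these definitions the spoke pairs are always a subset of the matrix pairs, so the two models bracket how many co-membership pairs a single purification is taken to imply.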

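Protein-Complex Affiliation Network Incidence Matrix

A =
        C1  C2  C3  C4  C5  …  Cm
   P1    …   …   …   …   …  …   …
   P2    …   …   …   …   …  …   …
   …
   Pn    …   …   …   …   …  …   …

Rows are proteins P1, …, Pn and columns are complexes C1, …, Cm. An entry is 1 if the protein belongs to the complex and 0 otherwise.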

Cohesive vs. Dynamic Protein Complexes Cohesive Complex: a complex of invariable composition whose proteins are associated only with that complex and its particular function

Cohesive Complex Affiliation Network Incidence Matrix

A =
           C1
   Bait     1
   Hit 1    1
   Hit 2    1
   Hit 3    1
   Hit 4    1
   Hit 5    1

Cohesive vs. Dynamic Protein Complexes
Dynamic Complex: a complex composed of proteins that may also be involved in other complexes

Dynamic Complex Affiliation Network Incidence Matrices

A =
           C1  C2  C3  C4  C5
   Bait     1   1   1   1   1
   Hit 1    …
   Hit 2    …
   Hit 3    …
   Hit 4    …
   Hit 5    …

A =
           C1  C2
   Bait     1   1
   Hit 1    1   0
   Hit 2    0   1
   Hit 3    1   0
   Hit 4    0   1
   Hit 5    1   0

A =
           C1  C2
   Bait     1   1
   Hit 1    1   1
   Hit 2    1   1
   Hit 3    0   1
   Hit 4    0   1
   Hit 5    0   1

All 5 complexes above would yield the same TAP Data:

Statistical Inference Problem
What is A?
A captures the cohesive/dynamic distinction.
At best, we observe all but the main diagonal of X = AA^T.
Current analyses focus on X, not on A.
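As a rough illustration of why focusing on X loses information, the sketch below computes X = AA^T for the two two-complex dynamic incidence matrices shown earlier and compares what a single bait purification would report. Treating the TAP result as a yes/no indicator of whether each hit shares at least one complex with the bait is an assumption made here for illustration; the names `co_membership` and `hits_seen_by_bait` are hypothetical.

```python
def co_membership(A):
    """Return X = A A^T for a 0/1 incidence matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in A] for row_i in A]


def hits_seen_by_bait(A):
    """Assumed TAP readout: for each hit, does it share a complex with the bait (row 0)?"""
    bait_row = co_membership(A)[0]
    return [count > 0 for count in bait_row[1:]]


# Rows: Bait, Hit 1, ..., Hit 5. Columns: C1, C2.
A_alternating = [[1, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
A_nested = [[1, 1], [1, 1], [1, 1], [0, 1], [0, 1], [0, 1]]

print(co_membership(A_alternating)[0])  # [2, 1, 1, 1, 1, 1]
print(co_membership(A_nested)[0])       # [2, 2, 2, 1, 1, 1]
print(hits_seen_by_bait(A_alternating) == hits_seen_by_bait(A_nested))  # True
```

The two matrices describe different complex memberships, yet under the assumed readout the bait recovers the same five hits in both cases.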

Protein Complex Data as a Directed Graph?

Cohesive Complex described in Gavin, et al.

Dynamic Complex described in Gavin, et al.

Largest Connected Component in Gavin, et al. using Bait Proteins Only, Colored by Outdegree
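A minimal sketch of the directed-graph view, assuming each purification is stored as a bait mapped to the list of hits it pulled down, so that every edge points from bait to hit. The protein names and the helpers `outdegree` and `bait_subgraph` are hypothetical; the data are invented and are not taken from Gavin, et al. or Ho, et al.

```python
def outdegree(graph):
    """Number of outgoing edges per node in an adjacency-list digraph."""
    return {node: len(targets) for node, targets in graph.items()}


def bait_subgraph(purifications):
    """Keep only edges whose target was itself used as a bait."""
    baits = set(purifications)
    return {
        bait: sorted(hit for hit in set(hits) if hit in baits and hit != bait)
        for bait, hits in purifications.items()
    }


# Hypothetical purifications: bait -> hits found with it.
purifications = {
    "B1": ["B2", "H1", "H2"],
    "B2": ["B1", "B3"],
    "B3": ["H3"],
}

out_all = outdegree(purifications)
baits_only = bait_subgraph(purifications)
out_baits = outdegree(baits_only)

print(out_all)     # {'B1': 3, 'B2': 2, 'B3': 1}
print(baits_only)  # {'B1': ['B2'], 'B2': ['B1', 'B3'], 'B3': []}
print(out_baits)   # {'B1': 1, 'B2': 2, 'B3': 0}
```

Keeping direction matters here: B2 finds B3 as a hit, but B3 does not find B2, and an undirected graph would hide that asymmetry.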

[Figure panels: Gavin Data | Ho Data]

SubGraph of Bait Proteins from Previous Graphs with Outdegree 7
[Figure panels: Gavin Data | Ho Data]

Examples of Distinct Complexes Identified by Gavin, et al.

Back to Affiliation Networks

A =
         C1
   B1     1
   B2     1
   B3     1

X = AA^T =
         B1  B2  B3
   B1     1   1   1
   B2     1   1   1
   B3     1   1   1

One Three-Way Conversation

Affiliation Networks

A =
         C1  C2  C3
   B1     1   1   0
   B2     1   0   1
   B3     0   1   1

X = AA^T =
         B1  B2  B3
   B1     2   1   1
   B2     1   2   1
   B3     1   1   2

Three Two-Way Conversations

Statistical Inference Problem
Which A is correct?
– A uniquely defines X, but the observable part of X does not uniquely define A.
Extra information and directed graph model for the TAP data
– Cellular Component Data
– Gene Expression Data
– Hit Data
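The identifiability problem can be checked directly on the two three-bait affiliation matrices from the "conversation" slides. The sketch below computes X = AA^T for both and masks the main diagonal, which is the part stated to be unobserved. The helper names `co_membership` and `observable_part` are hypothetical.

```python
def co_membership(A):
    """Return X = A A^T for a 0/1 incidence matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in A] for row_i in A]


def observable_part(X):
    """Mask the main diagonal of X, which is not observed."""
    return [[None if i == j else x for j, x in enumerate(row)] for i, row in enumerate(X)]


# One three-way conversation: B1, B2, B3 all in a single complex C1.
A_one = [[1], [1], [1]]
# Three two-way conversations: each pair of baits shares its own complex.
A_three = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]

X_one = co_membership(A_one)
X_three = co_membership(A_three)

print(X_one)    # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(X_three)  # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
print(observable_part(X_one) == observable_part(X_three))  # True
```

The two X matrices differ only on the diagonal, so once the diagonal is hidden the single cohesive complex and the three pairwise complexes look identical.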

Possible Use of Hit Data to Help Estimate A

Conclusions
In the protein complex setting, directed graphs are useful for EDA, as well as for framing the correct questions for statistical inference.
The statistical inference problem for cohesive and dynamic protein complex identification should focus on A, not X.
The digraph model of the TAP data better reflects what we actually observe, and is informative for estimating A.