Monday, June 15, 2015


From high-throughput data to network biology: gain in statistical power and biological relevance Stockholm Bioinformatics Centre Andrey Alexeyenko

PLoS Med (8):e124

Why Most Published Research Findings Are False
“Positive facts”: the discoveries we are after, e.g. genomic associations, differentially expressed genes, “phenotype-disease” relations, etc.
Statistical model: no positive facts, and an allowed rate of Type I error
Biological reality: negative facts are the vast majority, positive facts are yet to be discovered
[Figure labels: positive facts, negative facts, true positives, false positives, true negatives]
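The argument on this slide can be made concrete with a small calculation. The sketch below is only an illustration: the Type I error rate of 0.05, the power of 0.8 and the listed fractions of positive facts are assumed values, not numbers from the talk. It shows how the share of true positives among all reported findings shrinks when negative facts dominate.

```python
def positive_predictive_value(prior, alpha=0.05, power=0.8):
    """Share of reported findings that are true positive facts.

    prior -- assumed fraction of tested hypotheses that are positive facts
    alpha -- allowed Type I error rate (illustrative default)
    power -- probability of detecting a positive fact (illustrative default)
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical fractions of positive facts among everything that is tested.
for prior in (0.5, 0.1, 0.01, 0.001):
    ppv = positive_predictive_value(prior)
    print(f"fraction of positive facts = {prior:<6} share of true findings = {ppv:.3f}")
```

With these assumed settings, most reported findings are false once positive facts make up about one percent of the tested hypotheses or less.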

A network is just a graph! The fact that I can draw a network does not yet make it a biological reality!

Conversion “data pieces → confidence” in a Bayesian framework
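One common way to turn pieces of evidence into a confidence value is a naive Bayes scheme based on log-likelihood ratios. The sketch below assumes that scheme; the transcript does not say which method the talk used. The function names, the gold-standard counts and the prior are hypothetical.

```python
import math


def log_likelihood_ratio(pos_hits, pos_total, neg_hits, neg_total, pseudocount=1.0):
    """Log ratio of how often a data type supports known true vs. known false links.

    The counts come from hypothetical gold-standard positive and negative sets;
    the pseudocount keeps the ratio finite when a count is zero.
    """
    p_pos = (pos_hits + pseudocount) / (pos_total + 2 * pseudocount)
    p_neg = (neg_hits + pseudocount) / (neg_total + 2 * pseudocount)
    return math.log(p_pos / p_neg)


def posterior_probability(prior, llrs):
    """Combine independent evidence scores with the prior (naive Bayes assumption)."""
    log_odds = math.log(prior / (1.0 - prior)) + sum(llrs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative numbers only: two data types supporting the same candidate link.
coexpression = log_likelihood_ratio(300, 1000, 50, 1000)
interaction = log_likelihood_ratio(100, 1000, 5, 1000)
confidence = posterior_probability(prior=0.01, llrs=[coexpression, interaction])
print(f"confidence in the link: {confidence:.3f}")
```

The independence assumption is a simplification: correlated data sources would be counted twice and inflate the confidence.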


Enrichment of functional groups
Enrichment analysis in networks turns out to be more powerful than on gene lists
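For reference, the conventional gene-list enrichment test that the network approach is compared against can be sketched as a one-sided hypergeometric test. This is an assumed baseline, not the network method from the talk, and the gene counts are made up.

```python
from math import comb


def hypergeom_enrichment_p(universe, group, selected, overlap):
    """P(at least `overlap` group members among `selected` genes drawn at random).

    universe -- number of genes considered in total
    group    -- size of the functional group
    selected -- size of the gene list being tested
    overlap  -- number of list genes that belong to the group
    """
    total = comb(universe, selected)
    upper = min(group, selected)
    tail = sum(
        comb(group, k) * comb(universe - group, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 12 of 100 listed genes fall into a group of 200 genes
# out of a universe of 10000 genes (2 would be expected by chance).
p_value = hypergeom_enrichment_p(universe=10000, group=200, selected=100, overlap=12)
print(f"enrichment p-value: {p_value:.3g}")
```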

Enrichment of functional groups

Partial correlations
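A first-order partial correlation measures how strongly two genes co-vary after the influence of a third is removed. The sketch below uses the standard formula; the expression profiles are toy numbers chosen so that a shared driver z explains most of the correlation between x and y.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy profiles: z drives both x and y, which otherwise differ only by noise.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
y = [1.5, 2.5, 2.5, 3.5, 5.5, 5.5]

r_direct = pearson(x, y)
r_partial = partial_correlation(x, y, z)
print(f"plain correlation:   {r_direct:.3f}")
print(f"partial correlation: {r_partial:.3f}")
```

An edge between x and y that rests only on the plain correlation would be a candidate spurious edge here, because most of it disappears once z is taken into account.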

[Figure panels: r_PLC = 0.88; r_PLC = 0.95; r_PLC = 0.76]

Benjamini-Hochberg correction
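A minimal sketch of the Benjamini-Hochberg procedure, written here as adjusted p-values (often called q-values). The input p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four tested edges or genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]
print(adjusted)
print(significant)
```

Rejecting every hypothesis whose adjusted value is at most 0.05 controls the expected share of false positives among the rejections at 5%, under the independence or positive dependence conditions of the procedure.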

Quantitative modeling of multi-component system with mutually dependent elements

Why is going “list → network” an advancement?
Functional context
“Anchoring”, i.e. interdependence
Biological interpretability
Statistical features
Data integration
Many of those can be applied to the lists as well, but mind the flexibility!

Ways to augment confidence
Trivial: 1) increase power 2) decrease false prediction rate
Data integration
  – Evaluation prior to integration!
Consider biological context
Remove spurious edges
Generalize to a higher level of organization

Ways to evaluate confidence
Supervised learning
Balance comprehensiveness and complexity (so-called information criteria)
Benjamini-Hochberg
Show it to a biologist
Go out to the real world and test

Ways to employ confidence
Initialize network
Add node and edge attributes to the network
Filter network elements for higher relevance
Build more complex models accounting for confidence