Dependency networks Sushmita Roy BMI/CS 576 Nov 26th, 2013.

Goals for today
– Introduction to dependency networks
– GENIE3: a network inference algorithm for learning a dependency network from gene expression data
– Comparison of various network inference algorithms

What you should know
– What are dependency networks?
– How do they differ from Bayesian networks?
– Learning a dependency network from expression data
– Evaluation of various network inference methods

Graphical models for representing regulatory networks
– Two structures: Bayesian networks and dependency networks (example pathway components: Msb2, Sho1, Ste20)
– Random variables encode expression levels
– Edges correspond to some form of statistical dependency
– Example: regulators X1 and X2 determine the target Y3 through a function Y3 = f(X1, X2)

Dependency network
– A type of probabilistic graphical model
– As in Bayesian networks, it has a graph component and a probability component
– Unlike a Bayesian network, it can have cyclic dependencies
Reference: Dependency Networks for Inference, Collaborative Filtering, and Data Visualization. Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

Notation
– X_i: the i-th random variable
– X = {X_1, ..., X_p}: the set of p random variables
– x_i^k: an assignment of X_i in the k-th sample
– x_{-i}^k: the set of assignments to all variables other than X_i in the k-th sample

Dependency networks
– Each variable X_j is predicted from its regulators by a function f_j
– f_j can be of different types; learning requires estimating each of the f_j functions
– In all cases, learning tries to minimize an error of predicting X_j from its neighborhood x_{-j}

Different representations of the f_j function
– If X is continuous: f_j can be a linear function, a regression tree, or a random forest (an ensemble of trees)
– If X is discrete: f_j can be a conditional probability table or a conditional probability tree
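
To make this concrete, here is a minimal sketch (assuming scikit-learn and numpy; the toy data and variable names are illustrative, not from the slides) of fitting each continuous-valued choice of f_j to the same expression matrix:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Toy expression matrix: N samples x p genes (values are synthetic)
rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 5))
j = 2                                   # index of the target gene X_j
X_minus_j = np.delete(expr, j, axis=1)  # candidate regulators x_{-j}
x_j = expr[:, j]                        # target values x_j

# Three choices of f_j for continuous expression data
for f_j in (LinearRegression(),
            DecisionTreeRegressor(max_depth=3),
            RandomForestRegressor(n_estimators=100, random_state=0)):
    f_j.fit(X_minus_j, x_j)
    mse = np.mean((f_j.predict(X_minus_j) - x_j) ** 2)
    print(type(f_j).__name__, "training MSE:", round(float(mse), 3))
```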

Linear regression
– Linear regression assumes that the output Y is a linear function of the input X: Y = aX + b, where a is the slope and b is the intercept

Estimating the regression coefficients
– Assume we have N training samples
– We want to minimize the sum of squared errors between true and predicted values of the output Y: the sum over k = 1..N of (y^k - (a x^k + b))^2
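
A minimal numpy illustration of this least-squares fit (the data is synthetic; a_hat and b_hat are the estimated slope and intercept, computed with the standard closed-form solution):

```python
import numpy as np

# Synthetic training data: y = 2x + 1 plus noise
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=50)

# Closed-form least-squares estimates of slope and intercept
a_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_hat = y.mean() - a_hat * x.mean()

# Sum of squared errors between true and predicted outputs
sse = np.sum((y - (a_hat * x + b_hat)) ** 2)
print("slope:", a_hat, "intercept:", b_hat, "SSE:", sse)
```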

An example random forest for predicting gene expression
– An ensemble of regression trees; each internal node tests a regulator's expression (e.g., Sox6 > 0.5)
– A path from root to leaf selects the set of genes satisfying those tests

Considerations for learning regression trees
– Assessing the purity of samples under a leaf node: minimize prediction error, or minimize entropy
– How to determine when to stop building a tree: minimum number of data points at each leaf node, depth of the tree, purity of the data points under any leaf node

Algorithm for learning a regression tree
– Input: output variable X_j, input variables X_{-j}
– Initialize the tree to a single node with all samples under that node; estimate m_c, the mean of all samples under the node, and S, the sum of squared errors
– Repeat until there are no more nodes to split: search over all input variables and split values and compute S for each possible split; pick the variable and split value with the highest improvement in error
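
A sketch of the split search at a single node, following the algorithm above (pure Python/numpy, written for clarity rather than speed; the function names are ours):

```python
import numpy as np

def sum_squared_error(values):
    """S for a node: squared deviations from the node mean m_c."""
    return np.sum((values - values.mean()) ** 2) if len(values) else 0.0

def best_split(X, y):
    """Search all input variables and split values; return the split
    with the largest improvement in S over the unsplit node."""
    base = sum_squared_error(y)
    best = (None, None, 0.0)  # (variable index, threshold, improvement)
    for i in range(X.shape[1]):
        for t in np.unique(X[:, i])[:-1]:  # thresholds keeping both sides non-empty
            left, right = y[X[:, i] <= t], y[X[:, i] > t]
            gain = base - sum_squared_error(left) - sum_squared_error(right)
            if gain > best[2]:
                best = (i, t, gain)
    return best
```

Growing the full tree then amounts to applying best_split recursively until one of the stopping criteria above is met.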

GENIE3: GEne Network Inference with Ensemble of trees
– Solves a set of regression problems, one per random variable
– Models non-linear dependencies
– Outputs a directed, cyclic graph with a confidence for each edge
– Focuses on generating a ranking over edges rather than a graph structure and parameters
Reference: Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. Van Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, Pierre Geurts, PLoS ONE 2010

GENIE3 algorithm sketch
– For each gene j, generate input/output pairs LS_j = {(x_{-j}^k, x_j^k), k = 1..N}
– Use a feature selection technique on LS_j, such as tree building, to compute a weight w_ij for all genes i ≠ j; w_ij quantifies the confidence of the edge between X_i and X_j
– Generate a global ranking of regulators based on the w_ij weights

GENIE3 algorithm sketch (figure from Huynh-Thu et al.)

Feature selection in GENIE3
– A random forest represents each f_j
– Learning the random forest: generate M = 1000 bootstrap samples; at each node to be split, search for the best split among K randomly selected variables
– K was set to p-1 or (p-1)^(1/2)
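
A hedged sketch of the per-gene GENIE3 loop using scikit-learn's random forest as the feature-selection engine (the published implementation differs in details; here K maps to max_features, and the edge confidences come from the forest's variance-based feature importances, which sklearn normalizes to sum to one):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def genie3_scores(expr, n_trees=1000, max_features="sqrt"):
    """expr: N samples x p genes. Returns a p x p matrix w, where
    w[i, j] is the confidence of a regulatory edge from gene i to gene j."""
    n, p = expr.shape
    w = np.zeros((p, p))
    for j in range(p):
        X = np.delete(expr, j, axis=1)  # inputs x_{-j}: all genes but the target
        y = expr[:, j]                  # output x_j: the target gene
        rf = RandomForestRegressor(n_estimators=n_trees,
                                   max_features=max_features,
                                   random_state=0)
        rf.fit(X, y)
        others = [i for i in range(p) if i != j]
        w[others, j] = rf.feature_importances_  # w_ij for all i != j
    return w

# w can then be flattened and sorted in decreasing order
# to produce the global ranking over candidate edges.
```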

Computing the importance weight of each predictor
– Feature importance is computed at each test node; remember there can be multiple test nodes per regulator
– For a test node n, importance is given by the reduction in variance if we make a split on that node: I(n) = #S * Var(S) - #S_true * Var(S_true) - #S_false * Var(S_false), where S is the set of data samples that reach the test node, #S is the size of S, and Var(S) is the variance of the output variable in S

Computing the importance of a predictor
– For a single tree, the overall importance of a predictor is the sum of I(n) over all nodes in the tree where that predictor is used to split
– For an ensemble, the importance is averaged over all trees
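
The variance-reduction score for a single test node can be written directly (a toy illustration of the formula above; S, S_true, and S_false hold the output values reaching the node and its two branches):

```python
import numpy as np

def split_importance(S, S_true, S_false):
    """I(n) = #S*Var(S) - #S_true*Var(S_true) - #S_false*Var(S_false)."""
    def var(v):
        return np.var(v) if len(v) else 0.0
    return (len(S) * var(S)
            - len(S_true) * var(S_true)
            - len(S_false) * var(S_false))

# Splitting 6 samples into two pure halves gives the full reduction:
print(split_importance(np.array([1., 1., 1., 5., 5., 5.]),
                       np.array([1., 1., 1.]),
                       np.array([5., 5., 5.])))  # 6*4 - 0 - 0 = 24.0
```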

Computational complexity of GENIE3
– Complexity per variable: O(T K N log N), where T is the number of trees, K is the number of random attributes selected per split, and N is the learning sample size

Evaluation of network inference methods
– Assume we know what the "right" network is
– One can use precision-recall (PR) curves to evaluate the predicted network
– The area under the PR curve (AUPR) quantifies performance
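
Given a gold-standard network, AUPR can be computed from ranked edge scores in a few lines with scikit-learn (a sketch; true_edges and edge_scores are hypothetical flattened adjacency entries):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# true_edges: 0/1 labels from the known network; edge_scores: predicted w_ij
true_edges = np.array([1, 0, 1, 1, 0, 0, 1, 0])
edge_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

precision, recall, _ = precision_recall_curve(true_edges, edge_scores)
aupr = auc(recall, precision)
print("AUPR:", round(float(aupr), 3))
```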

AUPR-based performance comparison

DREAM: Dialogue for Reverse Engineering Assessments and Methods
– A community effort to assess regulatory network inference
– DREAM5 challenge; previous challenges in 2006, 2007, 2008, 2009, 2010
Reference: Marbach et al. 2012, Nature Methods

Where do different methods rank? (Figure from Marbach et al., 2010; "Community" and "Random" mark the community prediction and the random baseline.)

Comparing module (LeMoNe) and per-gene (CLR) methods

Summary of network inference methods
– Probabilistic graphical models provide a natural representation of networks
– A lot of network inference is done using gene expression data
– Many algorithms exist; we have seen three:
  – Bayesian networks: sparse candidates, module networks
  – Dependency networks: GENIE3
– Algorithms can be grouped into per-gene and per-module methods