
1 Dependency networks
Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576
sroy@biostat.wisc.edu
Nov 25th, 2014

2 RECAP
Probabilistic graphical models provide a natural way to represent biological networks
So far we have seen Bayesian networks:
– Sparse candidates
– Module networks
Today we will focus on dependency networks

3 What you should know
What are dependency networks?
How do they differ from Bayesian networks?
The GENIE3 algorithm for learning a dependency network from expression data
Different ways to represent conditional distributions
Evaluation of various network inference methods

4 Graphical models for representing regulatory networks
[Figure contrasting Bayesian networks and dependency networks: in both, random variables (e.g., Msb2, Sho1, Ste20) encode expression levels, regulators X1 and X2 point to the target Y3, edges correspond to some form of statistical dependency, and the target is modeled as a function Y3 = f(X1, X2).]

5 Dependency network
A type of probabilistic graphical model
As in Bayesian networks, it has:
– A graph component describing the dependency structure between random variables
– A prediction function fj associated with each variable Xj, used to predict Xj from the state of its neighbors
Unlike Bayesian networks:
– Can have cyclic dependencies
Dependency Networks for Inference, Collaborative Filtering and Data Visualization. Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

6 Notation
Xi: the ith random variable
X = {X1, ..., Xp}: the set of p random variables
xi^k: an assignment of Xi in the kth sample
x-i^k: the set of assignments to all variables other than Xi in the kth sample

7 Learning dependency networks
[Figure: candidate regulators feeding into the function fj, which predicts the target Xj.]
The fj can be of different types, and learning requires estimating each of the fj functions.
In all cases, learning requires us to minimize an error of predicting Xj from its neighborhood, e.g., the squared error Σk (xj^k − fj(x-j^k))².
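
A minimal sketch of this per-variable learning loop, assuming a NumPy expression matrix `expr` laid out as samples × genes and using scikit-learn's linear regression as one possible (illustrative) choice of fj; the slides leave the function family open:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def learn_dependency_network(expr):
    """Fit one predictor f_j per variable X_j from all remaining variables.

    expr: (N samples x p genes) expression matrix (assumed layout).
    Returns one fitted model per gene.
    """
    n_samples, p = expr.shape
    models = []
    for j in range(p):
        x_minus_j = np.delete(expr, j, axis=1)        # x_{-j}: all variables except X_j
        x_j = expr[:, j]                              # target values of X_j
        f_j = LinearRegression().fit(x_minus_j, x_j)  # minimizes squared prediction error
        models.append(f_j)
    return models
```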

8 Different representations of the fj function
If Xj is continuous:
– fj can be a linear function
– fj can be a regression tree
– fj can be a random forest (an ensemble of trees)
If Xj is discrete:
– fj can be a conditional probability table
– fj can be a conditional probability tree

9 GENIE3: GEne Network Inference with Ensemble of trees
Solves a set of regression problems, one per random variable
Uses an ensemble of regression trees to represent fj, which models non-linear dependencies
Outputs a directed, cyclic graph with a confidence for each edge
Focuses on generating a ranking over edges rather than a graph structure and parameters
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. Van Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, Pierre Geurts, PLoS ONE 2010

10 Recall our very simple regression tree example
[Figure: a small regression tree over inputs X2 and X3; interior nodes test thresholds such as X2 > e1 and X2 > e2, with YES/NO branches leading to further tests or to leaves.]
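
As an illustration (not the slide's exact tree), a shallow regression tree can be fit on synthetic data and its learned split thresholds inspected; the variable names are stand-ins for the X2 and X3 of the figure:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # two inputs playing the roles of X2 and X3
y = np.where(X[:, 0] > 0.5, 2.0, -1.0) + 0.1 * rng.normal(size=200)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["X2", "X3"]))  # prints the learned thresholds (the e's)
```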

11 An Ensemble of trees
A single tree is prone to “overfitting”
Instead of learning a single tree, ensemble models make use of a collection of trees

12 A Random forest: an Ensemble of Trees
The prediction is averaged over the individual tree predictions.
[Figure taken from the ICCV09 tutorial by Kim, Shotton and Stenger (http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial): trees t1 through tT, each with split nodes and leaf nodes, applied to the input x-j.]
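
A quick way to see both points on synthetic data (an illustrative sketch, not from the slides): an unpruned single tree fits the noise, while a forest that averages many trees generalizes better:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X[:, 0] + 0.5 * rng.normal(size=100)   # one informative input plus noise

single_tree = DecisionTreeRegressor(random_state=1)               # unpruned: fits the noise
forest = RandomForestRegressor(n_estimators=100, random_state=1)  # averages many trees

print(cross_val_score(single_tree, X, y, cv=5).mean())  # typically a lower R^2
print(cross_val_score(forest, X, y, cv=5).mean())       # averaging improves generalization
```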

13 GENIE3 algorithm sketch
For each Xj, generate learning samples of input/output pairs:
– LSj = {(x-j^k, xj^k), k = 1..N}
– On each LSj, learn fj to predict the value of Xj
– fj is either a Random forest or Extra trees
– Estimate wij for all genes i ≠ j; wij quantifies the confidence of the edge between Xi and Xj
Generate a global ranking of edges based on the wij
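
A compact sketch of this loop, assuming scikit-learn's random forest as fj and using its impurity-based feature_importances_ as stand-ins for the wij (these closely match the variance-reduction importance defined on the next slides, but this is not the paper's own code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def genie3_ranking(expr, n_trees=100):   # the paper uses 1000 trees
    """GENIE3-style edge ranking: one regression problem per gene.

    expr: (N samples x p genes) matrix. Returns (w_ij, i, j) tuples,
    sorted so the most confident edges X_i -> X_j come first.
    """
    n, p = expr.shape
    edges = []
    for j in range(p):
        X = np.delete(expr, j, axis=1)        # inputs x_{-j}^k of LS_j
        y = expr[:, j]                        # outputs x_j^k
        f_j = RandomForestRegressor(n_estimators=n_trees, max_features="sqrt")
        f_j.fit(X, y)
        regulators = [i for i in range(p) if i != j]
        for i, w_ij in zip(regulators, f_j.feature_importances_):
            edges.append((w_ij, i, j))        # confidence of the edge X_i -> X_j
    return sorted(edges, reverse=True)        # global ranking over all edges
```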

14 GENIE3 algorithm sketch
[Figure from Huynh-Thu et al.: the per-gene regression decomposition and the resulting predictor ranking.]

15 Learning fj in GENIE3
Random forest or Extra-Trees are used to represent fj
Learning the Random forest:
– Generate M = 1000 bootstrap samples
– At each node to be split, search for the best split among K randomly selected variables
– K was set to p−1 or sqrt(p−1), where p is the number of regulators/parents
Learning the Extra-Trees:
– Learn 1000 trees
– Each tree is built from the original learning sample
– At each test node, the best split is determined among K random splits, each determined by randomly selecting one input (without replacement) and a threshold
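
In scikit-learn terms the two settings could look roughly like this (a sketch only; the paper's implementation pre-dates this library, and the mapping of K to max_features is approximate since sklearn's "sqrt" uses sqrt(p) rather than sqrt(p−1)):

```python
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor

# Random forest: 1000 bootstrapped trees, K candidate inputs searched per split
rf = RandomForestRegressor(
    n_estimators=1000,
    max_features="sqrt",   # K ~ sqrt(p-1); use max_features=None for K = p-1
    bootstrap=True,        # each tree sees a bootstrap sample
)

# Extra-Trees: 1000 trees grown on the original sample, randomized split thresholds
et = ExtraTreesRegressor(
    n_estimators=1000,
    max_features="sqrt",
    bootstrap=False,       # each tree is built from the original learning sample
)
```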

16 Computing the importance weight of a predictor
Importance is computed at each interior node; remember there can be multiple interior nodes per regulator.
For an interior node N, importance is given by the reduction in variance achieved by splitting on that node:
I(N) = #S · Var(S) − #St · Var(St) − #Sf · Var(Sf)
where S is the set of data samples that reach node N, #S is the size of S, Var(S) is the variance of the output variable in S, St is the subset of S for which the test at N is true, and Sf is the subset for which it is false.

17 Computing the importance weight of a predictor
For a single tree, the overall importance of a predictor is the sum of I(N) over all interior nodes N where that predictor is used to split.
For an ensemble, the importance is averaged over all trees.
To avoid a bias towards highly variable genes, normalize each gene's expression to unit variance.
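
A sketch of that per-tree sum on a fitted scikit-learn regression tree, relying on the fact that under the squared-error criterion a node's impurity is exactly the output variance Var(S):

```python
import numpy as np

def predictor_importances(fitted_tree, n_features):
    """Sum I(N) = #S*Var(S) - #St*Var(St) - #Sf*Var(Sf) over one tree's interior nodes."""
    t = fitted_tree.tree_
    imp = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                       # leaf node: no split, contributes nothing
            continue
        i_n = (t.weighted_n_node_samples[node] * t.impurity[node]       # #S  * Var(S)
               - t.weighted_n_node_samples[left] * t.impurity[left]     # #St * Var(St)
               - t.weighted_n_node_samples[right] * t.impurity[right])  # #Sf * Var(Sf)
        imp[t.feature[node]] += i_n          # credit the predictor used at this node
    return imp

# For an ensemble, average over its trees:
# np.mean([predictor_importances(est, p - 1) for est in forest.estimators_], axis=0)
```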

18 Computational complexity of GENIE3
Complexity per variable: O(TKN log N)
– T is the number of trees
– K is the number of random attributes selected per split
– N is the learning sample size
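
To make the bound concrete (illustrative numbers, not from the slides): with T = 1000 trees, K = sqrt(p − 1) ≈ 32 for p = 1000 candidate regulators, and N = 100 samples, one variable costs on the order of 1000 × 32 × 100 × log2(100) ≈ 2 × 10^7 operations, and this is repeated once per gene in the network.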

19 Evaluation of network inference methods
Assume we know what the “right” network is
One can use Precision-Recall curves to evaluate the predicted network
The area under the PR curve (AUPR) quantifies performance
Precision = (# of correct edges) / (# of predicted edges)
Recall = (# of correct edges) / (# of true edges)
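
With a ranked edge list this is a few lines in scikit-learn (a sketch with made-up numbers: y_true marks which candidate edges are in the known network, and scores are the inferred confidences wij; average_precision_score is one common estimator of the area under the PR curve):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # 1 = edge in the true network
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])   # inferred confidences w_ij

precision, recall, thresholds = precision_recall_curve(y_true, scores)
aupr = average_precision_score(y_true, scores)                # area under the PR curve
print(f"AUPR = {aupr:.3f}")
```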

20 AUPR based performance comparison

21 Some comments about expression-based network inference methods
We have seen two types of algorithms to learn these networks:
– Per-gene methods
  Sparse candidates: learn regulators for individual genes
  GENIE3
– Per-module methods
  Module networks: learn regulators for sets of genes/modules
Other implementations of module networks exist:
– LIRNET: Learning a Prior on Regulatory Potential from eQTL Data. Su In Lee et al., PLoS Genetics 2009 (http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000358)
– LeMoNe: Learning Module Networks. Michoel et al. 2007 (http://www.biomedcentral.com/1471-2105/8/S2/S5)

22 Many implementations of per-gene methods
Mutual information:
– Context Likelihood of Relatedness (CLR)
– ARACNE
Probabilistic methods:
– Bayesian network: Sparse Candidates
Regression:
– TIGRESS
– GENIE3

23 DREAM: Dialogue for Reverse Engineering Assessments and Methods
Community effort to assess regulatory network inference
DREAM 5 challenge
Previous challenges: 2006, 2007, 2008, 2009, 2010
Marbach et al. 2012, Nature Methods

24 Where do different methods rank?
[Figure from Marbach et al., 2010: ranking of inference methods; “Community” and “Random” mark the community-aggregate prediction and the random baseline.]

25 Comparing module (LeMoNe) and per-gene (CLR) methods

