DREAM4 Puzzle – Inferring Network Structure from Microarray Data
Qiong Cheng
Outline
– Gene Network
– Gene Regulatory Systems and Related Work
– FunGen: Reconstructing Biological Networks Using Conditional Correlation Analysis
– ARACNE: Algorithm for Reconstructing Accurate Cellular Network
Gene Network
Directed network
– Nodes: genes
– Edges: regulation
– May include loops
Scale-free
– Degree distribution follows a power law: $P(k) \sim k^{-\lambda}$
Genetic Network Generation Schematic
Source: de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002;9(1):67-103
Random Network Model
ER (Erdős–Rényi) model
– Each pair of nodes is connected by an edge with probability p
– Edges are independent of one another
– Poisson degree distribution, so P(k) falls off at least exponentially (e.g. $P(k) \sim e^{-k}$ for large k)
BA (Barabási–Albert) model
– Scale-free degree distribution ($P(k) \sim k^{-x}$)
– Process: new nodes preferentially attach to nodes that already have a high degree
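The two generative models above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the generator used for DREAM; the function names `er_graph`, `ba_graph`, and `degree_counts`, the complete seed graph, and the parameter `m` (edges added per new node) are assumptions made for the example.

```python
import random
from collections import Counter


def er_graph(n, p, rng):
    """Erdős–Rényi G(n, p): link each of the n(n-1)/2 node pairs independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def ba_graph(n, m, rng):
    """Barabási–Albert-style growth: each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    # Assumption: start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


def degree_counts(edges):
    """Return a Counter mapping node -> degree for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    rng = random.Random(42)
    for name, edges in (("ER", er_graph(500, 0.008, rng)), ("BA", ba_graph(500, 2, rng))):
        deg = degree_counts(edges)
        print(name, "edges:", len(edges), "max degree:", max(deg.values()))
```

With comparable edge counts, the BA graph typically shows a much larger maximum degree than the ER graph, which is the heavy tail the power law describes.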
Random Network Model (cont’d)
Module extraction from a source random scale-free network (used by DREAM3)
– Source: hierarchical scale-free network
– Extraction: start from a random seed node, then iteratively add the neighbor node that gives the highest modularity Q
Source: Marbach D, Schaffter T, Mattiussi C, and Floreano D (2009) Generating Realistic in silico Gene Networks for Performance Assessment of Reverse Engineering Methods. J Comput Biol, 16(2):229–239
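The extraction step can be illustrated with a greedy sketch. It assumes an undirected graph stored as a dict of neighbor sets, and it scores a candidate module with the standard Newman modularity of the two-group partition {module, rest}; the exact Q used by the DREAM3 generator may differ, and the names `modularity_two_groups` and `extract_module` are hypothetical.

```python
def modularity_two_groups(adj, members):
    """Newman modularity Q of the partition {members, all other nodes}.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for group in (members, set(adj) - members):
        inside = sum(1 for u in group for v in adj[u] if v in group) / 2
        degree = sum(len(adj[u]) for u in group)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def extract_module(adj, seed, size):
    """Grow a module from `seed`, adding at each step the neighboring node
    that maximizes Q, until `size` nodes are collected or no neighbor is left."""
    module = {seed}
    while len(module) < size:
        frontier = {v for u in module for v in adj[u]} - module
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic
        best = max(sorted(frontier),
                   key=lambda v: modularity_two_groups(adj, module | {v}))
        module.add(best)
    return module


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(extract_module(adj, seed=0, size=3))
```

In the sketch the seed is passed in explicitly; drawing it at random, as the slide describes, is a one-line change with `random.choice`.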
Microarray Data Distributions
Benford’s law (in base 10): $P(D) = \log_{10}(1 + D^{-1})$
Zipf’s law: microarray data
Log-normal distribution as a potential distribution for normalizing the bulk of the corrected spot intensities
Noise
Source: “Make Sense Of Microarray Data Distributions”
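Benford’s law is easy to check numerically. The sketch below evaluates P(D) for the leading digits 1 to 9 and tallies leading digits in a list of intensities; the helper names and the sample values are made up for illustration.

```python
import math
from collections import Counter


def benford_pmf():
    """P(D) = log10(1 + 1/D) for leading digits D = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant decimal digit of a non-zero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def leading_digit_frequencies(values):
    """Observed relative frequency of each leading digit in `values`."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


if __name__ == "__main__":
    for d, p in benford_pmf().items():
        print(d, round(p, 4))
    # Hypothetical spot intensities, for illustration only.
    print(leading_digit_frequencies([1234.5, 187.0, 0.0042, 23.1, 1.9]))
```

Comparing `leading_digit_frequencies` of real spot intensities against `benford_pmf` gives a quick visual check of how closely a data set follows the law.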
Reverse Engineering
Clustering + …
Correlation measures + …
Optimization methods
– Bayesian network (conditional independence via DAG)
– Markov chains
– Dynamic Bayesian network
– Expectation maximization (maximum likelihood)
– Genetic algorithm (GA)
– Neural network
Simulation
– Piecewise-linear differential equations
– Stochastic equations
– Stochastic/hybrid Petri nets
– Boolean network
Regression techniques
FunGen: Reconstructing Biological Networks Using Conditional Correlation Analysis
Synthetic network
Network dynamics
Simulation protocol: perturbation
Conditional correlation
– Correlation is symmetric
– Matrix is non-symmetric
– May lead to indirect connections
False positives (indirect connections) + false negatives (noise)
– error = FP/(FP+TN) + FN/(FN+TP)
Reduce false positives
– Choose optimal ρ_opt
– Triangle reduction construction
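The error measure above is the false-positive rate plus the false-negative rate. The sketch below computes it for directed edges and picks, from a list of candidates, the cutoff with the lowest error on a network whose true edges are known. Treating ρ as a cutoff on the absolute score and selecting ρ_opt by minimizing this error are assumptions made for the example, as are all function names; the slide does not say how FunGen chooses ρ_opt.

```python
from itertools import permutations


def error_rate(true_edges, pred_edges, nodes):
    """error = FP/(FP+TN) + FN/(FN+TP), counted over all ordered node pairs."""
    tp = fp = fn = tn = 0
    for pair in permutations(nodes, 2):
        actual, predicted = pair in true_edges, pair in pred_edges
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr + fnr


def threshold_edges(scores, rho):
    """Keep the ordered pairs whose absolute score reaches the cutoff rho."""
    return {pair for pair, s in scores.items() if abs(s) >= rho}


def best_threshold(scores, true_edges, nodes, candidates):
    """Candidate cutoff with the lowest error (first one wins on ties)."""
    return min(candidates,
               key=lambda rho: error_rate(true_edges, threshold_edges(scores, rho), nodes))


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    true_edges = {("A", "B"), ("B", "C")}
    # Hypothetical scores; A->C is an indirect connection with a moderate score.
    scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5,
              ("B", "A"): 0.1, ("C", "B"): 0.1, ("C", "A"): 0.1}
    print(best_threshold(scores, true_edges, nodes, [0.3, 0.6, 0.85]))
```

In the toy data, a low cutoff admits the indirect A→C edge as a false positive, while a high cutoff drops the true B→C edge as a false negative; the middle cutoff avoids both.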
ARACNE: Algorithm for Reconstructing Accurate Cellular Network
Assumes two-way interactions: pairwise potentials determine all statistical dependencies, with uniform marginal distributions
Mutual information (MI) = measure of relatedness
– Independence: MI is zero if and only if the two genes are statistically independent
Data processing inequality (DPI): if genes $g_1$ and $g_3$ interact only through $g_2$, then $I(g_1, g_3) \le \min[I(g_1, g_2), I(g_2, g_3)]$
ARACNE starts with a network in which every gene pair whose MI exceeds a threshold is connected; it then examines each fully connected gene triplet and removes the edge with the smallest MI
Ignores the direction of the edges
Reconstructs tree-network topologies exactly
– Higher-order potential interactions are not accounted for (ARACNE will open 3-gene loops)
– A two-gene interaction will be detected iff there are no alternate paths
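The pruning step can be sketched as follows, assuming the pairwise MI values have already been estimated and are supplied as a symmetric matrix. This is an illustration of the DPI idea, not the ARACNE implementation: it omits the DPI tolerance parameter, and the name `dpi_prune` and the example values are hypothetical.

```python
from itertools import combinations


def dpi_prune(mi, threshold):
    """Apply the data processing inequality to a symmetric MI matrix.

    mi[i][j] holds the mutual information between genes i and j.
    Returns the surviving undirected edges as (i, j) tuples with i < j.
    """
    n = len(mi)
    # Step 1: connect every gene pair whose MI exceeds the threshold.
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] > threshold}
    # Step 2: in every fully connected triplet, mark the weakest edge.
    # Triplets are checked against the initial edge set, so the result
    # does not depend on the order in which triplets are visited.
    to_remove = set()
    for i, j, k in combinations(range(n), 3):
        triplet = [(i, j), (i, k), (j, k)]
        if all(e in edges for e in triplet):
            to_remove.add(min(triplet, key=lambda e: mi[e[0]][e[1]]))
    return edges - to_remove


if __name__ == "__main__":
    # Hypothetical MI values for a chain g1 - g2 - g3 (indices 0, 1, 2).
    mi = [[0.0, 0.8, 0.3],
          [0.8, 0.0, 0.7],
          [0.3, 0.7, 0.0]]
    print(sorted(dpi_prune(mi, threshold=0.1)))
```

For the chain example, the indirect g1–g3 edge has the smallest MI in its triplet and is removed, leaving the two direct edges.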
ARACNE – Example & Evaluation
Synthetic networks: ER, BA
Performance assessed via precision-recall curves (PRCs)
Example: [figure not preserved]
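A precision-recall curve for a reconstructed network can be computed by ranking predicted edges by score and sweeping down the list. The sketch below assumes undirected edges represented as frozensets and a dict of edge scores (for ARACNE these would be MI values); the function name and the toy data are made up for illustration.

```python
def precision_recall_points(scored_edges, true_edges):
    """Return (recall, precision) after each prediction, taken in order of
    decreasing score.

    scored_edges maps an edge to its score; true_edges is the set of real edges.
    """
    ranked = sorted(scored_edges, key=scored_edges.get, reverse=True)
    points = []
    tp = 0
    for rank, edge in enumerate(ranked, start=1):
        if edge in true_edges:
            tp += 1
        points.append((tp / len(true_edges), tp / rank))
    return points


if __name__ == "__main__":
    ab, ac, bc = frozenset("ab"), frozenset("ac"), frozenset("bc")
    # Hypothetical scores: a-c is a false edge ranked between two true ones.
    scores = {ab: 0.9, ac: 0.6, bc: 0.5}
    for recall, precision in precision_recall_points(scores, {ab, bc}):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Precision-recall curves suit this setting because true edges are rare compared with all possible gene pairs, so the false-positive rate alone can look deceptively small.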
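(Demo) Sample input data file
File name: Input_file_name.exp
N = 3 (number of genes), M = 2 (number of microarrays)
The input file has N+1 = 4 lines; each line has M+2 fields (2M+2 when each value is paired with a p-value)
Header line: AffyID | HG_U95Av2 (annotation name) | SudHL6.CHP | ST486.CHP (microarray chip names)
Data lines: G1, G2, G3, each with one (value, p-value) pair per chip [table values not preserved]
Source: ARACNE slides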
(Demo, cont’d) Sample output data file
File name: input_data_file_name[non-default_param_vals].adj
Number of lines = N = number of genes
One line per gene: G1:, G2:, …, G10: [table values not preserved]
Column labels: AffyID | ID# | Associated gene ID# | MI value
Source: ARACNE slides