Simulation and Application on learning gene causal relationships Xin Zhang
Introduction High-throughput genetic technologies empowers to study how genes interact with each other; Simulation to evaluate how well IC algorithm learns gene causal relationships; We present an algorithm (mIC algorithm) for learning causal relationship with knowledge of topological ordering information, and apply it on Melanoma dataset; Apply mIC algorithm on Melanoma dataset;
Steps for Simulation Study Construct a causal network N; Generate datasets based on the causal network; Learning the simulated data using causal algorithms (e.g. IC algorithm) to obtain network N´; Compare the original network N with obtained network N´ w.r.t precision and recall;
Modeling and simulation of a causal Boolean network (BN) Boolean network: A C B f C=f(A,B) Constructing a causal structure; Assign parameters (proper functions) for each node with casual parents; Assign probability distribution;
Constructing Boolean Network 1.Generate M BNs with up to 3 causal parents for each node; 2.For each BN, generate a random proper function for each node; 3.Assign random probabilities for the root gene(s); 4.Given one configuration, get probability distribution; 5.Collect 200 data points for each network; 6.Repeat above steps 3-5 for all M networks.
Constructing Causal Structure A C B E D
Steps for constructing causal structure
Proper function (1) Proper function: The function that reflects the influence of the operators. Example: By simplifying f, c is a function of a with c = a b is a pseudo predictor of c, and has no effect on c. f is not a proper function.
Proper function (2) Definition: With n predictors, the number of proper function is given by:
Probability Distribution
Generating dataset
Steps of learning gene causal relationships Step1: obtain the probability distribution and data sampling; Step2: apply algorithms to find causal relations; Step3: compare the original and obtained networks based on the two notions of precision and recall; Step4: repeat step 1-3 for every random network;
Comparing two networks A DC BA DC B Original Network Obtained Network
Precision and Recall Original graph is a DAG, while obtained graph has both directed and undirected edges; Orig GraphObt. Graph FN TP TN FP PFN, PTP PTN, PFP Recall = ATP/(AFN+ATP), Precision = ATP/(ATP + AFP)
Observational equivalence and Transitive Closure Two DAGs are said to be observational equivalent (OE) if they have the same skeleton and the same set of v- structure; A DC BA DC B OE Transitive closure (TC): A ->B -> C with A -> C cc(x,y): is true if there is a directed or an undirected edge from x to y; pcc(x,y): is true if there is a path from x to y consisting of properly directed and undirected edges pcc(x,y):= cc(x,y) | pcc(x,z) pcc(z,y)
Result for IC algorithm
How to improve IC algorithm The original IC algorithm did not have good results on learning gene causal relationships; A possible way to improve the performance is to incorporate extra information; If we know the topological ordering of the regulatory network, it would be helpful to improve the learning result;
Gene topological ordering If a specific gene is the causal parent of another gene; In a pathway, if one gene appears before another gene; If one gene is at the beginning or at the end of the pathway; IC algorithm + topological ordering information
mIC algorithm mIC algorithm based on IC, but incorporates both topological ordering information with steady state data to infer causality; 3 Steps of mIC algorithm: –Find conditional independence: For each pair of gene g i and g j in a dataset, test pairwise conditional independence. If they are dependent, search for a set S ij = {g k | g i and g j are independent given g k, with i<k<j, or j<k<i}. Construct an undirected graph G such that g i and g j are connected with an edge if an only if they are pairwise dependent and no S ij can be found; –Find v-structure: For each pair of nonadjacent genes g i and g j with common neighbor g k, if g k S ij, and k>i, k>j, add arrowheads pointing at g k, such as g i ->g k <- g j ; –Orientate more directed edges according to rules: Orientate the undirected edges without creating new cycles and v-structures;
Results from mIC algorithm
Melanoma dataset The 10 genes involved in this study chosen from 587 genes from the melonoma data; Previous studies show that WNT5A has been identified as a gene of interest involved in melanoma; Controlling the influence of WNT5A in the regulation can reduce the chance of melanoma metastasizing;
Applying mIC algorithm on Melanoma Dataset WNT5A Partial biological prior knowledge: MMP3 is expected to be the end of the pathway Pirin causatively influences WNT5A – In order to maintain the level of WNT5A we need to directly control WNT5A or through pirin. WNT5A directly causes MART-1
Conclusion Evaluated IC algorithm using simulation data; We presented mIC algorithm that can infer gene causal relationship from steady state data with gene topological ordering information; Performed simulation based on Boolean network to evaluate the performance of the causal algorithms; We applied mIC algorithm to real biological microarray data Melanoma dataset; The result showed that some of the important causal relationships associated with WNT5A gene have been identified using mIC algorithm.