Constructing linked gene sets by analyzing conditional probabilities in microarrays

Jose Pacheco¹, Stuart Winter², Ksenia Matlawska-Wasowska², Judy Cannon³, and David Torres¹
¹Department of Mathematics, Northern New Mexico College
²Department of Pediatrics, University of New Mexico Health Sciences Center
³Department of Pathology, University of New Mexico Health Sciences Center

Abstract
In the PLoS ONE publication "Self-Contained Statistical Analysis of Gene Sets" [1], we describe a permutation method that computes not only the p-value of a gene set but also the conditional probability, or dependence, between genes, P(A|B): the probability that gene A is differentially expressed given that gene B is differentially expressed. These dependencies allow us to construct gene sets. Our project will use expression levels from a microarray to create new gene sets, based on these dependencies, that are associated with T-lineage Acute Lymphoblastic Leukemia (T-ALL) and migration to the Central Nervous System (CNS). We use a T-ALL CNS vs. non-CNS microarray with 54,675 probes/genes and 49 patients.

Constructing conditional probabilities using permutations

Computing Gene Sets
A differentially expressed gene A will be connected to first-generation genes A_i that show a high level of dependence with A. A differentially expressed gene A can be linked to a gene B both through direct dependence and through shared dependencies among their respective first-generation genes A_i and B_i.

Relationship with Distance Correlation

Applying the Method to T-ALL
A microarray was used to compare 22 T-ALL CNS patients with 27 T-ALL non-CNS patients across 54,675 genes/probes.

Accounting for Dependencies Among Genes
Given the individual p-values p_k of the genes in a gene set, a modified Fisher's method can be used to compute the p-value of the gene set, setting a lower limit p_min for that p-value.
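For reference, the standard Fisher's method combines K independent p-values through the statistic X = -2 Σ ln p_k, which follows a chi-square distribution with 2K degrees of freedom. The sketch below implements only this standard, unmodified method, which assumes the genes are independent; it does not reproduce the poster's modification or its lower limit p_min, and the function name `fisher_combined_p` is hypothetical. Because the degrees of freedom are even, the chi-square tail has a closed form, so only the standard library is needed.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Standard Fisher's method for K independent p-values (hypothetical helper).

    X = -2 * sum(ln p_k) is chi-square with 2K degrees of freedom under
    independence. For even degrees of freedom the survival function is
    exp(-t) * sum_{i=0}^{K-1} t**i / i!, where t = X / 2.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    t = -sum(math.log(p) for p in p_values)  # t = X / 2
    term, total = 1.0, 1.0                   # i = 0 term of the series
    for i in range(1, len(p_values)):
        term *= t / i                        # builds t**i / i! incrementally
        total += term
    return min(1.0, math.exp(-t) * total)


if __name__ == "__main__":
    # Two genes, each with p = 0.05 (illustrative values, not poster data).
    print(fisher_combined_p([0.05, 0.05]))  # ≈ 0.0175
```

If SciPy is available, `scipy.stats.combine_pvalues(p_values, method="fisher")` should return the same combined p-value; the closed form is used here only to keep the sketch dependency-free.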
Fisher's method assumes gene independence (i.e., the differential expression of one gene does not influence the differential expression of another gene). However, genes within a gene set are related and thus may exhibit dependencies. To overcome this statistical dilemma, the phenotypic labels of the samples are permuted. The p-value of a gene is then computed from the rank of its original, unpermuted statistic among the statistics obtained from a large number of permutations.

Computing Dependencies
In the permutation procedure, the number of times that gene A, gene B, and both genes A and B are differentially expressed is tracked. The relation P(A and B) = P(B)P(A|B) is then used to compute the dependency P(A|B) = P(A and B)/P(B).

Introduction
T-ALL is a heterogeneous disease characterized largely by chromosomal translocations, which manifest themselves in the arrested development of thymocytes. Treatment of T-ALL is complicated when the disease migrates to the CNS, and CNS migration is also associated with relapse. Genes associated with T-ALL include Notch and CD3D, and molecules associated with CNS relapse include the chemokine receptor CCR7 and CARMA1 [2-3]. Yeoh et al. [3] identify genes associated with T-ALL relapse and emphasize that a collection of genes, rather than a single gene, is necessary for an accurate prediction of relapse. Because the progression of the disease may take multiple pathways and involve many genes, our project continues our work on using microarrays to identify and create gene sets associated with T-ALL CNS migration.

Results
Dependencies were calculated using the 22 T-ALL CNS patients.

Gene        p-value   Connections
PLDN        1.0e-5
CEP27       4.4e-4    15
FLJ36840    5.2e-4    14
CCDC3       1.2e-4    14
CNTNAP3B    5.2e-4    12
ADCY5       1.7e-4    14
CCL18       4.8e-4    12
LOC146439   6.2e-4    11
RTP1        4.4e-4    10
LASS6       1.2e-3     9
C15ORF42    4.8e-4     9
TCL6        1.2e-3     9
PRDM13      7.3e-4     9

References
[1] Torres et al. Self-contained statistical analysis of gene sets. PLoS ONE. 2016; 1-18. e0163918. doi:10.1371/journal.pone.0163918
[2] Oruganti SR et al. CARMA1 is a novel regulator of T-ALL disease and leukemic cell migration to the CNS. Leukemia. 2017; 31, 255-238. doi:10.1038/leu.2016.272
[3] Yeoh E-J et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002; 1, 133-143.
[4] Maiorov EG et al. Identification of interconnected markers for T-cell acute lymphoblastic leukemia. BioMed Research International. 2013; 1-20. http://dx.doi.org/10.1155/2013/210253

Acknowledgements
The authors would like to acknowledge NM-INBRE.
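The counting step described under Computing Dependencies, together with the rank-based p-value from the permutation procedure, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the input is a hypothetical boolean matrix `de` recording whether each gene was called differentially expressed in each permutation (how that call is made is left to the caller), larger statistics are assumed to be more extreme, the +1 correction in `permutation_p_value` is a common convention rather than something the poster specifies, and `first_generation` is a hypothetical linking rule with an illustrative threshold.

```python
from __future__ import annotations

import numpy as np


def conditional_probabilities(de: np.ndarray) -> np.ndarray:
    """Estimate P(A|B) for every gene pair from permutation results.

    de[r, g] is True when gene g was called differentially expressed (DE)
    in permutation r. Returns cond with cond[a, b] = n_ab / n_b, i.e.
    P(A and B) / P(B) with the number of permutations cancelled out.
    Columns for genes that were never DE are NaN.
    """
    x = de.astype(float)
    n_joint = x.T @ x              # n_joint[a, b]: permutations with both a and b DE
    n_single = np.diag(n_joint)    # diagonal: permutations with each gene DE
    with np.errstate(divide="ignore", invalid="ignore"):
        cond = n_joint / n_single[np.newaxis, :]  # divide column b by n_b
    cond[:, n_single == 0] = np.nan
    return cond


def first_generation(cond: np.ndarray, a: int, threshold: float = 0.8) -> list[int]:
    """Hypothetical linking rule: genes b != a with P(b DE | a DE) >= threshold."""
    col = cond[:, a]
    return [b for b in range(cond.shape[0])
            if b != a and not np.isnan(col[b]) and col[b] >= threshold]


def permutation_p_value(observed: float, null_stats) -> float:
    """Rank-based p-value: share of permuted statistics >= the observed one (+1 correction)."""
    null = np.asarray(null_stats, dtype=float)
    return (1 + int(np.sum(null >= observed))) / (1 + null.size)


if __name__ == "__main__":
    # Made-up indicators: 4 permutations x 3 genes.
    de = np.array([
        [True, True, False],
        [True, False, False],
        [True, True, True],
        [False, True, False],
    ])
    cond = conditional_probabilities(de)
    print(cond[0, 1])                      # P(gene 0 | gene 1) = 2/3
    print(first_generation(cond, 0, 0.6))  # [1]
```

Computing all pairs as a single matrix product keeps the bookkeeping for n_A, n_B, and n_AB in one place; with tens of thousands of probes, the full gene-by-gene matrix becomes large, so in practice one might restrict it to differentially expressed candidate genes.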