Learning disjunctions in Geronimo’s regression trees
Felix Sanchez Garcia, supervised by Prof. Dana Pe’er
Motivation
- Glioblastoma: the most common primary brain tumour in adults.
- Newly diagnosed patients have an average survival of 1 year.
- Need for better models of the regulatory network.
- Data used to create models: microarrays
  - # genes: 8000
  - # candidate regulators: 800
  - # samples: 120
Module networks
- Bayesian model that benefits from the high correlation within groups of variables [2].
- Learning algorithm similar to EM, but with hard decisions. Loop:
  - Module assignment step: assign variables to modules.
  - Structure search step: calculate the CPD for each module.
[Figure: variables grouped into Module 1, Module 2, Module 3 and Module 4]
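The alternating loop can be illustrated with a minimal sketch. It is not the module networks procedure of [2]: as a simplifying assumption, each module's CPD is replaced by a plain mean expression profile, which makes the loop behave like k-means. The function name `hard_em_modules` and its parameters are hypothetical.

```python
import numpy as np

def hard_em_modules(expr, n_modules, n_iter=20, seed=0):
    """Alternate between fitting one profile per module and hard reassignment.

    expr: array of shape (n_genes, n_samples).
    Returns an array with one module index per gene.
    ASSUMPTION: a mean profile stands in for the regression-tree CPD.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = expr.shape
    assign = rng.integers(n_modules, size=n_genes)  # random initial modules
    for _ in range(n_iter):
        # "Structure search" stand-in: fit one profile per module.
        profiles = np.zeros((n_modules, n_samples))
        for m in range(n_modules):
            members = expr[assign == m]
            if len(members) > 0:
                profiles[m] = members.mean(axis=0)
            else:
                # Re-seed an empty module with a randomly chosen gene.
                profiles[m] = expr[rng.integers(n_genes)]
        # Module assignment step: hard decision, closest profile wins.
        dist = ((expr[:, None, :] - profiles[None, :, :]) ** 2).sum(axis=2)
        new_assign = dist.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # assignments are stable
        assign = new_assign
    return assign
```

Each gene belongs to exactly one module after every iteration, which is the "hard decision" that distinguishes this loop from standard EM with soft responsibilities.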
Regression trees as CPD
- A regression tree is used for each module’s CPD.
  - Internal nodes: condition on a single variable.
  - Leaf nodes: parameters for a normal distribution.
- Bayesian score:
  - pdf of the normal-gamma
  - prior on structure (complexity + biological penalties)
- The score is calculated exhaustively for each split, for each regulator.
[Figure: example tree with node conditions x<0.3 and y>-0.2; target gene’s values sorted by regulator]
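A minimal sketch of the exhaustive split search follows. It scores each candidate split by the sum of the two leaves' log marginal likelihoods under a normal-gamma prior. The structure prior (complexity and biological penalties) is omitted, and the hyperparameters `mu0`, `kappa0`, `alpha0`, `beta0` and both function names are assumptions for illustration, not values taken from Geronimo.

```python
import math
import numpy as np

def log_marginal_ng(x, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of x under a normal model with normal-gamma prior."""
    n = len(x)
    if n == 0:
        return 0.0
    xbar = float(np.mean(x))
    ss = float(np.sum((x - xbar) ** 2))
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))

def best_split(target, regulators):
    """Try every split point of every regulator; return (score, regulator, threshold).

    target: array (n_samples,); regulators: array (n_regulators, n_samples).
    ASSUMPTION: no structure prior is added to the score.
    """
    best = (-math.inf, None, None)
    for r, reg in enumerate(regulators):
        order = np.argsort(reg)  # sort the target's values by this regulator
        sorted_reg, sorted_target = reg[order], target[order]
        for i in range(1, len(target)):
            if sorted_reg[i] == sorted_reg[i - 1]:
                continue  # no threshold separates equal regulator values
            score = (log_marginal_ng(sorted_target[:i])
                     + log_marginal_ng(sorted_target[i:]))
            if score > best[0]:
                threshold = 0.5 * (sorted_reg[i - 1] + sorted_reg[i])
                best = (score, r, threshold)
    return best
```

With n samples and R regulators this evaluates on the order of R·n splits per node, which is why the search is feasible for single-variable conditions.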
Incorporating pathway information
- Biological pathways contain sets of genes and represent chains of biochemical reactions that perform some function.
- Aberrations in glioblastoma tend to occur as disjunctions within pathways: deregulating one component is usually enough to alter the function of the whole pathway [4].
- Idea: use pathway information to obtain a better model.
- Methodology: extend node conditions to disjunctions of conditions on pathway elements.
- We will use 15 sets of regulators (20-30 genes per set):
  - 5 sets of regulators of pathways known to be related to cancer
  - 5 sets of regulators of other pathways
  - 5 sets of regulators chosen at random
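As an illustration of what a disjunctive node condition could look like, the sketch below routes a sample to one branch if any single-gene threshold condition on a pathway holds. The data layout (a dict from gene name to expression values), the condition tuples and the function name `disjunction_mask` are assumptions made for this example.

```python
import numpy as np

def disjunction_mask(expr, conditions):
    """Evaluate an OR of single-gene threshold conditions.

    expr: dict mapping gene name -> sequence of expression values (one per sample).
    conditions: list of (gene, op, threshold) with op in {"<", ">"}.
    Returns a boolean array: True where at least one condition holds.
    """
    n_samples = len(next(iter(expr.values())))
    mask = np.zeros(n_samples, dtype=bool)
    for gene, op, threshold in conditions:
        values = np.asarray(expr[gene], dtype=float)
        if op == "<":
            mask |= values < threshold
        elif op == ">":
            mask |= values > threshold
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return mask

# Hypothetical genes "x" and "y" with conditions x<0.3 OR y>-0.2.
example_expr = {"x": [0.1, 0.5, 0.5], "y": [-0.5, 0.0, -0.5]}
example_mask = disjunction_mask(example_expr, [("x", "<", 0.3), ("y", ">", -0.2)])
```

Here the first sample satisfies the condition on x, the second satisfies the condition on y, and the third satisfies neither, so a single deregulated pathway member is enough to trigger the node.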
Problem setting
- Concept class: disjunctions of threshold functions, each on a single variable.
- Loss function: Bayesian score (biological penalty?)
- Potential number of hypotheses: 2^{m}
- Related classification problem tackled by Marchand and Shah (2005) and Kestler et al. (2006).
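To make the size of the hypothesis space concrete, the following sketch enumerates every disjunction by brute force. It assumes that m counts the candidate single-variable threshold conditions, so that each subset of conditions defines one disjunction (the empty subset included). A within-branch sum of squared errors stands in for the Bayesian score, and the function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def enumerate_disjunctions(cond_masks):
    """Yield (subset, mask) for every subset of the m candidate conditions.

    cond_masks: boolean array (m, n_samples); row i marks the samples that
    satisfy candidate condition i. 2**m subsets are produced in total.
    """
    m, n_samples = cond_masks.shape
    for k in range(m + 1):
        for subset in combinations(range(m), k):
            mask = np.zeros(n_samples, dtype=bool)
            for i in subset:
                mask |= cond_masks[i]  # OR the chosen conditions together
            yield subset, mask

def best_disjunction(target, cond_masks):
    """Return (loss, subset) minimising a stand-in squared-error loss.

    ASSUMPTION: squared error replaces the Bayesian score for brevity.
    """
    def sse(values):
        return float(((values - values.mean()) ** 2).sum()) if len(values) else 0.0
    return min(
        (sse(target[mask]) + sse(target[~mask]), subset)
        for subset, mask in enumerate_disjunctions(cond_masks)
    )
```

The enumeration doubles with every added candidate condition, so exhaustive search is only workable for very small m; pathway sets of 20-30 genes already call for a different search strategy.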
Bibliography
1. Pe'er, D., Bayesian Network Analysis of Signaling Networks: A Primer. Sci. STKE, (281): p. pl4.
2. Segal, E., et al., Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet, (2).
3. Lee, S.-I., et al., Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proceedings of the National Academy of Sciences, (38).
4. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, (7216).
5. Kestler, H., W. Lindner, and A. Müller, Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles, in Artificial Neural Networks in Pattern Recognition.
6. Marchand, M. and M. Shah, PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data, in Advances in Neural Information Processing Systems, MIT Press: Cambridge, MA.