Published by Nelson O’Connor. Modified over 9 years ago.
1
Learning disjunctions in Geronimo’s regression trees
Felix Sanchez Garcia, supervised by Prof. Dana Pe’er
2
Motivation
- Glioblastoma: the most common primary brain tumour in adults.
- Newly diagnosed patients have an average survival of 1 year.
- Need for better models of the network.
- Data used to create models: microarrays
  # genes: 8000
  # candidate regulators: 800
  # samples: 120
3
Module networks
- Bayesian model that benefits from the high correlation within groups of variables [2].
- Algorithm similar to EM, but with hard decisions. Loop:
  - Module assignment step: assign variables to modules.
  - Structure search step: calculate the conditional probability distribution (CPD) for each module.
[Figure: variables grouped into Module 1, Module 2, Module 3 and Module 4]
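The alternating loop above can be sketched as follows. This is a minimal illustration under strong assumptions, not the module networks algorithm of [2]: the structure search step is replaced by a per-module mean expression profile, and the module assignment step uses squared distance to that profile. The names `hard_em`, `mean_profile` and `sq_dist` are hypothetical.

```python
def mean_profile(rows):
    """Column-wise mean of a list of equally long expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_em(data, assignment, max_iter=50):
    """Alternate model fitting and hard reassignment until nothing changes.

    data: dict gene -> list of expression values (one per sample)
    assignment: dict gene -> initial module id
    """
    assignment = dict(assignment)
    for _ in range(max_iter):
        # Stand-in for the structure search step (ASSUMPTION): the "CPD" of
        # a module is just the mean profile of its current member genes.
        members = {}
        for gene, module in assignment.items():
            members.setdefault(module, []).append(data[gene])
        profiles = {m: mean_profile(rows) for m, rows in members.items()}

        # Module assignment step: each gene takes exactly one module
        # (a hard decision), here the one with the closest profile.
        new_assignment = {
            g: min(profiles, key=lambda m: sq_dist(data[g], profiles[m]))
            for g in data
        }
        if new_assignment == assignment:
            break  # converged: no gene changed module
        assignment = new_assignment
    return assignment


# Toy data: genes a and b are co-expressed, as are c and d.
toy = {
    "a": [1.0, 1.0, 5.0, 5.0],
    "b": [1.1, 0.9, 5.2, 4.8],
    "c": [5.0, 5.0, 1.0, 1.0],
    "d": [4.9, 5.1, 1.2, 0.8],
}
result = hard_em(toy, {"a": 0, "b": 0, "c": 1, "d": 0})
```

With this toy input, gene d starts in the wrong module and is moved next to c once the profiles are refitted. In the actual method, the refitting step would learn a regression tree per module rather than a mean profile.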
4
Regression trees as CPD
- A regression tree is used for each module’s CPD.
- Internal nodes: condition on a single variable.
- Leaf nodes: parameters for a normal distribution.
- Bayesian score, combining the pdf of the normal-gamma with a prior on structure (complexity + biological penalties).
- The score is calculated exhaustively for each split for each regulator, with the target gene’s values sorted by regulator.
[Figure: example tree with internal-node conditions x<0.3 and y>-0.2]
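The exhaustive split search can be sketched as below. The sketch scores each candidate split by the closed-form marginal likelihood of a normal model under a normal-gamma prior, summed over the two resulting leaves. The hyperparameters (`mu0`, `kappa0`, `alpha0`, `beta0`) and the names `log_marginal_normal_gamma` and `best_split` are assumptions for illustration, and the prior on structure (complexity and biological penalties) is left out.

```python
import math


def log_marginal_normal_gamma(values, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of values under a normal model with a
    normal-gamma prior on (mean, precision). Hyperparameters are ASSUMED."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    return (
        math.lgamma(alpha_n) - math.lgamma(alpha0)
        + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
        + 0.5 * (math.log(kappa0) - math.log(kappa_n))
        - 0.5 * n * math.log(2.0 * math.pi)
    )


def best_split(target, regulators):
    """Try every threshold of every regulator; return (name, threshold, score).

    target: list of target values, one per sample
    regulators: dict name -> list of regulator values, one per sample
    Returns (None, None, score) when no split beats the unsplit leaf.
    """
    best = (None, None, log_marginal_normal_gamma(target))
    for name, expr in regulators.items():
        # Sort sample indices by this regulator's expression.
        order = sorted(range(len(target)), key=lambda i: expr[i])
        for k in range(1, len(order)):
            lo, hi = expr[order[k - 1]], expr[order[k]]
            if lo == hi:
                continue  # no threshold separates equal values
            left = [target[i] for i in order[:k]]
            right = [target[i] for i in order[k:]]
            score = log_marginal_normal_gamma(left) + log_marginal_normal_gamma(right)
            if score > best[2]:
                best = (name, (lo + hi) / 2.0, score)
    return best


# Toy data: r_good separates low from high target values, r_noise does not.
target = [-2.0, -2.1, -1.9, -2.0, 2.0, 2.1, 1.9, 2.0]
regulators = {
    "r_good": [0.1, 0.2, 0.15, 0.05, 0.9, 0.8, 0.85, 0.95],
    "r_noise": [0.5, 0.1, 0.9, 0.3, 0.2, 0.8, 0.4, 0.6],
}
name, threshold, score = best_split(target, regulators)
```

With 800 candidate regulators and 120 samples, this loop evaluates at most 800 × 119 candidate splits per node, which is why sorting the target values by regulator once per regulator is worthwhile.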
5
Incorporating pathway information
- Biological pathways contain sets of genes and represent chains of biochemical reactions that perform some function.
- Aberrations in glioblastoma tend to occur as disjunctions within pathways: deregulating one component is usually enough to alter the function of the whole pathway [4].
- Idea: use pathway information to obtain a better model.
- Methodology: extend node conditions to disjunctions of conditions on pathway elements.
- We will use 15 sets of regulators (20-30 genes per set):
  - 5 sets of regulators of pathways known to be related to cancer.
  - 5 sets of regulators of other pathways.
  - 5 sets of regulators chosen at random.
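To make the extended node condition concrete, the sketch below evaluates a disjunction of single-gene threshold conditions: a sample follows the "true" branch if any one pathway element meets its condition. The gene names `g1` and `g2`, the thresholds, and the function names are hypothetical and chosen only for illustration.

```python
import operator

# Supported comparison directions for a single-gene condition.
OPS = {"<": operator.lt, ">": operator.gt}


def disjunction_holds(sample, conditions):
    """True if at least one (gene, op, threshold) condition holds.

    sample: dict gene -> expression value
    conditions: list of (gene, "<" or ">", threshold) tuples
    """
    return any(OPS[op](sample[gene], thr) for gene, op, thr in conditions)


def split_samples(samples, conditions):
    """Partition samples into (true_branch, false_branch) for a tree node."""
    true_branch = [s for s in samples if disjunction_holds(s, conditions)]
    false_branch = [s for s in samples if not disjunction_holds(s, conditions)]
    return true_branch, false_branch


# Hypothetical pathway with two elements: "g1 < 0.3 OR g2 > -0.2".
pathway_condition = [("g1", "<", 0.3), ("g2", ">", -0.2)]
samples = [
    {"g1": 0.1, "g2": -0.5},  # g1 condition holds
    {"g1": 0.5, "g2": 0.0},   # g2 condition holds
    {"g1": 0.5, "g2": -0.5},  # neither holds
]
true_branch, false_branch = split_samples(samples, pathway_condition)
```

A standard regression-tree node is the special case where `conditions` contains a single entry.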
6
Problem setting
- Concept class: disjunctions of threshold functions, each on a single variable.
- Loss function: Bayesian score (biological penalty?).
- Potential number of hypotheses: 2^m.
- Related classification problem tackled by Marchand and Shah (2005) [6] and Kestler et al. (2006) [5].
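The 2^m count can be checked by brute force on a small case. The sketch assumes that m denotes the number of candidate single-variable threshold conditions, so that each hypothesis is a subset of them (the empty subset included). That reading of m, and the names used, are assumptions, not stated on the slide.

```python
from itertools import combinations


def all_disjunctions(conditions):
    """Yield every subset of the candidate conditions as a tuple."""
    for k in range(len(conditions) + 1):
        for subset in combinations(conditions, k):
            yield subset


# Four hypothetical candidate conditions, so m = 4.
candidates = ["c1", "c2", "c3", "c4"]
hypotheses = list(all_disjunctions(candidates))
count = len(hypotheses)            # number of subsets enumerated
expected = 2 ** len(candidates)    # 2^m
```

Under the same assumption, one condition per gene in a set of 20 to 30 genes already gives between 2^20 and 2^30 hypotheses, so full enumeration is impractical at that size.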
7
Bibliography
1. Pe'er, D., Bayesian Network Analysis of Signaling Networks: A Primer. Sci. STKE, 2005. 2005(281): p. pl4.
2. Segal, E., et al., Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet, 2003. 34(2): p. 166-176.
3. Lee, S.-I., et al., Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proceedings of the National Academy of Sciences, 2006. 103(38): p. 14062-14067.
4. The Cancer Genome Atlas Research Network, Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 2008. 455(7216): p. 1061-1068.
5. Kestler, H., W. Lindner, and A. Müller, Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles, in Artificial Neural Networks in Pattern Recognition. 2006. p. 286-297.
6. Marchand, M. and M. Shah, PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data, in Advances in Neural Information Processing Systems 17. 2005, MIT Press: Cambridge, MA. p. 881-888.