1 Robust diagnosis of DLBCL from gene expression data from different laboratories
DIMACS - RUTCOR Workshop on Boolean and Pseudo-Boolean Functions in Memory of Peter L. Hammer, January 19-22, 2009
2 Peter L Hammer, Sorin Alexe, David E Axelrod (Rutgers Univ)
Gustavo Stolovitzky (IBM TJ Watson Research)
Gyan Bhanot, Arnold J Levine (Institute for Advanced Study, Princeton)
David Weissmann (Cancer Institute of New Jersey)
3 Overview
Motivation
Pattern-based ensemble classifiers
Case study: compare data from two labs for DLBCL vs FL diagnosis
  Shipp et al. (2002) Nature Med. 8(1), 68-74. (Whitehead Lab)
  Stolovitzky G. (2005) In Deisboeck et al., Complex Systems Science in BioMedicine (in press) (preprint: http://www.wkap.nl/prod/a/Stolovitzky.pdf). (Dalla-Favera Lab)
  Alexe, Alexe, Axelrod, Hammer, Weissmann (2005) Artificial Intelligence in Medicine
  Bhanot, Alexe, Stolovitzky, Levine (2005) Genome Informatics
4 Non-Hodgkin lymphomas
FL: low-grade non-Hodgkin lymphoma / no cure if advanced stage
  Second most frequent subtype of nodal lymphoid malignancies
  Incidence has risen from 2–3 to more than 5–7 per 100,000/year (1950–2000)
  t(14;18) translocation: over-expression of anti-apoptotic bcl2
  25-60% of FL cases evolve to DLBCL
DLBCL: high-grade non-Hodgkin lymphoma / highly variable response to treatment
  Most frequent subtype of NHL
  < 2 years survival if untreated
Biomarkers: FL transformation to DLBCL
  p53/MDM2 (Moller et al., 1999)
  p16 (Pinyol, 1998)
  p38MAPK (Elenitoba-Johnson et al., 2003)
  c-myc (Lossos et al., 2002)
5 Gene arrays
Gene arrays measure how mRNA levels vary between different types of cells.
This allows diagnosis, including early-stage diagnosis, and inference of the pathways that cause disease.
Identify molecular profiles of disease: personalized medicine
6 Lymphoma datasets
Data:
  WI (Shipp et al., 2002): Affy HuGeneFL
  CU (Dalla-Favera Lab, Stolovitzky, 2005): Affy Hu95Av2
Samples:
  WI: 58 DLBCL & 19 FL
  CU: 14 DLBCL & 7 FL
Genes:
  WI: 6817
  CU: 12581
7 Diagnosis problem
Input
  Training (biomedical) data: 2 classes, FL and DLBCL
  m samples described by N >> m features
Output
  Collection of robust biomarkers, models
  Robust, accurate classifier / tested on out-of-sample data
9 Patterns (Logical Analysis of Data, Hammer 1988)
Positive Patterns
Negative Patterns
Model
  - Exhaustive collections of patterns
  - Pattern space
  - Classification / attribute analysis / new class identification
10 Data Preprocessing
50% P calls, UL = 16000, LL = 20
Stratified 2/1 split of WI data into train/test; CU data used as test
Normalize data to median 1000 per array
Generate 500 data sets using noise + k-fold stratified sampling + jackknife
Find genes with high correlation to phenotype using t-test or SNR
Keep genes that are selected in > 90% of the data sets
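The preprocessing and gene-filtering steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the order of clipping and scaling, the SNR definition (difference of class means over the sum of class standard deviations), the `top_k` cut-off and the simplified resampling (a stratified leave-out without noise injection) are all assumptions, and the function names are hypothetical. The matrix `x` is taken to be genes by arrays, with `y` holding 1 for DLBCL and 0 for FL.

```python
import numpy as np

def preprocess(x, lower=20.0, upper=16000.0, target_median=1000.0):
    """Clip intensities to [lower, upper], then scale each array (column)
    so that its median equals target_median."""
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    return x * (target_median / np.median(x, axis=0, keepdims=True))

def snr(x, y):
    """Per-gene signal-to-noise ratio: (mean1 - mean0) / (sd1 + sd0)."""
    a, b = x[:, y == 1], x[:, y == 0]
    spread = a.std(axis=1, ddof=1) + b.std(axis=1, ddof=1) + 1e-12
    return (a.mean(axis=1) - b.mean(axis=1)) / spread

def stable_genes(x, y, n_sets=500, top_k=50, keep_frac=0.9,
                 leave_out=0.1, seed=0):
    """Return indices of genes ranked in the top_k by |SNR| in more than
    keep_frac of n_sets stratified resamples (assumed resampling scheme)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(x.shape[0], dtype=int)
    for _ in range(n_sets):
        keep = []
        for label in (0, 1):
            # Stratified: drop the same fraction of samples from each class.
            idx = np.flatnonzero(y == label)
            n_keep = max(2, int(round(len(idx) * (1 - leave_out))))
            keep.extend(rng.choice(idx, size=n_keep, replace=False))
        keep = np.array(keep)
        scores = np.abs(snr(x[:, keep], y[keep]))
        counts[np.argsort(scores)[-top_k:]] += 1
    return np.flatnonzero(counts > keep_frac * n_sets)
```

A t-test statistic could be substituted for `snr` without changing the rest of the sketch; the stability filter only needs a per-gene score.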
11 Choosing support sets
Create quality patterns using small subsets of genes; validate using weighted voting with 10-fold cross-validation
Sort genes by their appearance in good patterns
Select top genes to cover each sample by at least 10 patterns
Alexe, Alexe, Hammer, Vizvari (2005)
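One way to read the last two steps is as a greedy cover: rank genes by how often they occur in good patterns, then add genes in that order until every sample is covered by enough patterns built only from the selected genes. The sketch below assumes this reading; the data layout (each pattern as a pair of gene set and covered-sample set) and the function name are hypothetical, and the actual procedure in Alexe, Alexe, Hammer, Vizvari (2005) may differ.

```python
from collections import Counter

def choose_support_set(patterns, n_samples, min_cover=10):
    """patterns: list of (genes, covered_samples) pairs, both sets.
    Returns the smallest prefix of the frequency-ranked gene list such
    that every sample is covered by at least min_cover patterns whose
    genes all lie in the selected set, or None if that is impossible."""
    freq = Counter(g for genes, _ in patterns for g in genes)
    # Most frequent genes first; ties broken by name for determinism.
    ranked = sorted(freq, key=lambda g: (-freq[g], g))
    selected = set()
    for gene in ranked:
        selected.add(gene)
        cover = [0] * n_samples
        for genes, samples in patterns:
            if genes <= selected:  # pattern uses only selected genes
                for s in samples:
                    cover[s] += 1
        if cover and min(cover) >= min_cover:
            return selected
    return None
```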
12 The 30 genes that best distinguish FL from DLBCL
13 Genes identified by LAD (AIIM 2005) to distinguish DLBCL from FL
14 Examples of FL and DLBCL patterns
WI training data:
  Each DLBCL case satisfies at least one of the patterns P1 and P2
  Each FL case satisfies the pattern N1 (and none of the patterns P1 and P2)
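In Logical Analysis of Data, a pattern is a conjunction of threshold conditions on a few features. The sketch below shows how such patterns could be evaluated on one sample. The gene names (`g1`, `g2`, `g3`), the cut-points and the simple decision rule are hypothetical placeholders: the actual patterns P1, P2 and N1 were shown on the slide image and are not recoverable here, and the talk's classifier uses weighted voting rather than this rule.

```python
import operator

OPS = {">": operator.gt, "<=": operator.le}

def satisfies(sample, pattern):
    """A pattern is a list of (gene, op, cutpoint) conditions;
    the sample satisfies it only if every condition holds."""
    return all(OPS[op](sample[gene], cut) for gene, op, cut in pattern)

def diagnose(sample, positive, negative):
    """Assumed rule: DLBCL if only positive patterns fire,
    FL if only negative patterns fire, otherwise unclassified."""
    pos = any(satisfies(sample, p) for p in positive)
    neg = any(satisfies(sample, p) for p in negative)
    if pos and not neg:
        return "DLBCL"
    if neg and not pos:
        return "FL"
    return "unclassified"

# Hypothetical patterns over hypothetical genes and cut-points.
P1 = [("g1", ">", 1500.0), ("g2", "<=", 800.0)]
P2 = [("g3", ">", 2000.0)]
N1 = [("g1", "<=", 1500.0), ("g3", "<=", 2000.0)]
```

For example, `diagnose({"g1": 1800.0, "g2": 500.0, "g3": 900.0}, [P1, P2], [N1])` evaluates P1 as satisfied and N1 as not satisfied.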
15 Pattern data
16 Meta-classifier performance
17 Error distribution: raw and pattern data
18 Biology based method
19 p53-related genes identified by filtering procedure
FL → DLBCL progression
20 p53 pattern data
21 Examples of p53-responsive gene patterns
WI data:
  Each DLBCL case satisfies one of the patterns P1, P2, P3
  Each FL case satisfies one of the patterns N1, N2, N3
22 p53 combinatorial biomarker
At most one gene over-expressed: 77% of FL & 21% of DLBCL cases (3.7 fold)
At least two genes over-expressed: 79% of DLBCL & 23% of FL cases (3.4 fold)
Each individual gene: over-expressed in about 40-70% of DLBCL & 20-40% of FL cases (specificity 50-60%, sensitivity 60-70%)
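The combinatorial rule is a count of over-expressed genes compared against a threshold of two, and the fold figures are the ratios of the quoted percentages. A minimal sketch, assuming hypothetical gene names and over-expression cut-offs (the slide does not give them):

```python
def count_overexpressed(expr, cutoffs):
    """Number of genes whose expression exceeds its cut-off."""
    return sum(expr[gene] > cut for gene, cut in cutoffs.items())

def combinatorial_call(expr, cutoffs, min_genes=2):
    """DLBCL-like if at least min_genes genes are over-expressed,
    FL-like if at most min_genes - 1 are."""
    n_over = count_overexpressed(expr, cutoffs)
    return "DLBCL-like" if n_over >= min_genes else "FL-like"

def fold(pct_a, pct_b):
    """Enrichment of a rule in one class relative to the other."""
    return pct_a / pct_b

# Fold values from the percentages quoted on the slide.
fl_fold = round(fold(77, 21), 1)     # at most one gene over-expressed
dlbcl_fold = round(fold(79, 23), 1)  # at least two genes over-expressed

# Hypothetical cut-offs for three hypothetical genes.
cutoffs = {"geneA": 1200.0, "geneB": 900.0, "geneC": 1500.0}
```

The point of the combination is visible in the numbers: each gene alone separates the classes only weakly, while the count rule reaches 3.4 to 3.7 fold enrichment.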
23 What are these genes?
Plk1 (stpk13): polo-like kinase, serine threonine protein kinase 13, M-phase specific
  Cell transformation, neoplastic; drives quiescent cells into mitosis
  Over-expressed in various human tumors
  Takai et al., Oncogene, 2005: plk1 potential target for cancer therapy, new prognostic marker for cancer
  Mito et al., Leuk Lymph, 2005: plk1 biomarker for DLBCL
Cdk2 (p33): cyclin-dependent kinase: G2/M transition of mitotic cell cycle, interacts with cyclins A, B3, D, E
P53: tumor suppressor gene (Levine 1982)
24 Conclusions
Pattern-based meta-classifier is robust against noise
Good prediction of FL vs DLBCL
Biology-based analysis is also possible
  Yields useful biomarker
Should study biologically motivated sets of genes, build pathways
25 Thank you for your attention!