Robust diagnosis of DLBCL from gene expression data from different laboratories DIMACS - RUTCOR Workshop on Boolean and Pseudo-Boolean Functions in Memory.


Robust diagnosis of DLBCL from gene expression data from different laboratories DIMACS - RUTCOR Workshop on Boolean and Pseudo-Boolean Functions in Memory of Peter L. Hammer January 19-22, 2009

Peter L Hammer, Sorin Alexe, David E Axelrod: RUTGERS UNIV
Gustavo Stolovitzky: IBM TJ WATSON RESEARCH
Gyan Bhanot, Arnold J Levine: INSTITUTE FOR ADVANCED STUDY PRINCETON
David Weissmann: CANCER INSTITUTE OF NEW JERSEY

Overview
  Motivation
  Pattern-based ensemble classifiers
  Case study: compare data from two labs for DLBCL vs FL diagnosis
    Shipp et al. (2002) Nature Med.; 8(1), 68-74. (Whitehead Lab)
    Stolovitzky G. (2005) In Deisboeck et al., Complex Systems Science in BioMedicine (in press) (preprint: http://www.wkap.nl/prod/a/Stolovitzky.pdf). (Dalla-Favera Lab)
  Alexe, Alexe, Axelrod, Hammer, Weissmann (2005) Artificial Intelligence in Medicine
  Bhanot, Alexe, Stolovitzky, Levine (2005) Genome Informatics

Non-Hodgkin lymphomas
FL
  low grade non-Hodgkin lymphoma / no cure if advanced stage
  second most frequent subtype of nodal lymphoid malignancies
  incidence has risen from 2–3 to more than 5–7 per 100,000 per year (1950–2000)
  t(14;18) translocation: over-expression of anti-apoptotic bcl2
  25-60% of FL cases evolve to DLBCL
DLBCL
  high grade non-Hodgkin lymphoma / highly variable response to treatment
  most frequent subtype of NHL
  < 2 years survival if untreated
Biomarkers of FL transformation to DLBCL
  p53/MDM2 (Moller et al., 1999)
  p16 (Pinyol, 1998)
  p38MAPK (Elenitoba-Johnson et al., 2003)
  c-myc (Lossos et al., 2002)

Gene arrays
  Gene arrays are a way to study the variation of mRNA levels between different types of cells.
  This allows diagnosis, including early stage diagnosis, and inference of the pathways that cause disease.
  Identify molecular profiles of disease: personalized medicine

Lymphoma datasets
Dataset  Source                                 Array          Samples           Genes
WI       Shipp et al., 2002                     Affy HuGeneFL  58 DLBCL & 19 FL  6817
CU       Dalla-Favera Lab (Stolovitzky, 2005)   Affy Hu95Av2   14 DLBCL & 7 FL   12581

Diagnosis problem
Input
  Training (biomedical) data: 2 classes, FL and DLBCL
  m samples described by N >> m features
Output
  Collection of robust biomarkers, models
  Robust, accurate classifier, tested on out-of-sample data

Patterns (Logical Analysis of Data, Hammer 1988)
  Positive patterns
  Negative patterns
  Model: exhaustive collections of patterns
  Pattern space
  Classification / attribute analysis / new class identification

Data Preprocessing
  50% P calls, UL = 16000, LL = 20
  Stratify WI data 2/1 into train/test; use CU data as test
  Normalize data to median 1000 per array
  Generate 500 data sets using noise + k-fold stratified sampling + jackknife
  Find genes with high correlation to phenotype using t-test or SNR
  Keep genes that are in > 90% of datasets
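
The preprocessing steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code. Three things are assumptions: clipping to [LL, UL] happens before median normalization, the SNR is the common (mean difference) / (sum of standard deviations) form, and the function names and gene labels are made up for the example.

```python
from collections import Counter
from statistics import mean, median, stdev

LOWER_LIMIT = 20      # LL from the slide
UPPER_LIMIT = 16000   # UL from the slide
TARGET_MEDIAN = 1000  # per-array median after normalization


def clip_and_normalize(array_values):
    """Clip one array to [LL, UL], then rescale so its median is 1000.

    Assumption: clipping is applied before normalization.
    """
    clipped = [min(max(v, LOWER_LIMIT), UPPER_LIMIT) for v in array_values]
    scale = TARGET_MEDIAN / median(clipped)
    return [v * scale for v in clipped]


def signal_to_noise(class_a, class_b):
    """SNR of one gene between two classes (assumed definition):
    (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def stable_genes(selected_per_dataset, min_fraction=0.9):
    """Keep genes selected in more than `min_fraction` of the resampled data sets."""
    counts = Counter(g for sel in selected_per_dataset for g in set(sel))
    n = len(selected_per_dataset)
    return sorted(g for g, c in counts.items() if c / n > min_fraction)
```

In use, `clip_and_normalize` would be applied to every array, `signal_to_noise` to every gene in each of the 500 perturbed data sets, and `stable_genes` to the per-data-set lists of top-ranked genes.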

Choosing support sets
  Create quality patterns using small subsets of genes; validate using weighted voting with 10-fold cross-validation
  Sort genes by their appearance in good patterns
  Select top genes to cover each sample by at least 10 patterns
  Alexe, Alexe, Hammer, Vizvari (2005)
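
One way to read the "sort genes, then select top genes" step is as a greedy loop. The sketch below is an assumption about how it might be done, not the procedure of the cited paper. Patterns are represented as hypothetical `(genes, covered_samples)` pairs, and a pattern counts only once all of its genes have been selected.

```python
from collections import Counter


def choose_support_set(patterns, samples, min_cover=10):
    """Add genes in order of how often they appear in good patterns until
    every sample is covered by at least `min_cover` usable patterns.

    patterns: list of (genes, covered_samples) pairs, both sets.
    Returns the sorted list of chosen genes, or None if coverage is unreachable.
    """
    freq = Counter(g for genes, _ in patterns for g in genes)
    # Most frequent genes first; ties broken by name for determinism.
    ranked = [g for g, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))]
    chosen = set()
    for gene in ranked:
        chosen.add(gene)
        cover = Counter()
        for genes, covered in patterns:
            if genes <= chosen:  # pattern uses only genes selected so far
                cover.update(covered)
        if all(cover[s] >= min_cover for s in samples):
            return sorted(chosen)
    return None
```

The slide's value of 10 patterns per sample corresponds to the default `min_cover=10`; a smaller value is convenient for toy inputs.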

The 30 genes that best distinguish FL from DLBCL selenoprotein

Genes identified by LAD (AIIM 2005) to distinguish DLBCL from FL

Examples of FL and DLBCL patterns
WI training data:
  Each DLBCL case satisfies at least one of the patterns P1 and P2
  Each FL case satisfies the pattern N1 (and none of the patterns P1 and P2)
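
The actual patterns P1, P2 and N1 were shown in a figure that is not preserved in this transcript. As a rough illustration of how such patterns work, the sketch below treats a pattern as a conjunction of cut-point conditions on expression levels. The gene names and cut points are invented placeholders, not the values from the talk.

```python
def satisfies(sample, pattern):
    """True if the sample meets every (gene, op, cut_point) condition."""
    return all(
        sample[gene] > cut if op == ">" else sample[gene] <= cut
        for gene, op, cut in pattern
    )


# Hypothetical patterns, for illustration only.
P1 = [("gene_a", ">", 1200.0)]
P2 = [("gene_b", ">", 900.0), ("gene_c", "<=", 300.0)]
N1 = [("gene_a", "<=", 1200.0), ("gene_b", "<=", 900.0)]


def classify(sample):
    """DLBCL if a positive pattern fires and N1 does not; FL for the reverse."""
    positive = any(satisfies(sample, p) for p in (P1, P2))
    negative = satisfies(sample, N1)
    if positive and not negative:
        return "DLBCL"
    if negative and not positive:
        return "FL"
    return "unclassified"
```

A sample that triggers neither kind of pattern is left unclassified here; a full meta-classifier would combine many patterns by weighted voting instead.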

Pattern data

Meta-classifier performance

Error distribution: raw and pattern data

Biology based method

p53 related genes identified by filtering procedure: FL → DLBCL progression

p53 pattern data

Examples of p53-responsive gene patterns
WI data:
  Each DLBCL case satisfies one of the patterns P1, P2, P3
  Each FL case satisfies one of the patterns N1, N2, N3

p53 combinatorial biomarker
  At most one gene over-expressed: 77% of FL & 21% of DLBCL cases (3.7 fold)
  At least two genes over-expressed: 79% of DLBCL & 23% of FL cases (3.4 fold)
  Each individual gene: over-expressed in about 40-70% of DLBCL & 20-40% of FL cases (specificity 50-60%, sensitivity 60-70%)
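
The rule on this slide is a counting rule: a case looks DLBCL-like when at least two of the marker genes are over-expressed, and FL-like otherwise. A minimal sketch follows. The gene labels and over-expression cut-offs are placeholders, since the transcript does not give them. The last lines reproduce the fold values quoted on the slide from its own percentages.

```python
def count_over_expressed(sample, cutoffs):
    """Number of marker genes whose expression exceeds its cut-off."""
    return sum(1 for gene, cut in cutoffs.items() if sample[gene] > cut)


def biomarker_call(sample, cutoffs):
    """'DLBCL-like' if at least two marker genes are over-expressed, else 'FL-like'."""
    return "DLBCL-like" if count_over_expressed(sample, cutoffs) >= 2 else "FL-like"


# Hypothetical cut-offs for three placeholder marker genes.
CUTOFFS = {"gene_x": 1000.0, "gene_y": 1000.0, "gene_z": 1000.0}

# Fold enrichment implied by the percentages on the slide.
fl_fold = 77 / 21      # at most one gene over-expressed: FL vs DLBCL
dlbcl_fold = 79 / 23   # at least two genes over-expressed: DLBCL vs FL
print(round(fl_fold, 1), round(dlbcl_fold, 1))
```

The two ratios round to 3.7 and 3.4, matching the fold values reported on the slide.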

What are these genes?
Plk1 (stpk13): polo-like kinase, serine threonine protein kinase 13, M-phase specific
  cell transformation, neoplastic; drives quiescent cells into mitosis
  over-expressed in various human tumors
  Takai et al., Oncogene, 2005: plk1 potential target for cancer therapy, new prognostic marker for cancer
  Mito et al., Leuk Lymph, 2005: plk1 biomarker for DLBCL
Cdk2 (p33): cyclin-dependent kinase
  G2/M transition of mitotic cell cycle, interacts with cyclins A, B3, D, E
P53: tumor suppressor gene (Levine 1982)

Conclusions
  Pattern-based meta-classifier is robust against noise
  Good prediction of FL → DLBCL
  Biology based analysis also possible
  Yields useful biomarker
  Should study biologically motivated sets of genes → build pathways

Thank you for your attention!