Knowledge Discovery in Biomedicine
Limsoon Wong
Institute for Infocomm Research
Copyright © 2004 by Limsoon Wong

Plan
- Knowledge discovery in brief
- Eg 1: Optimizing treatment of childhood ALL
- Eg 2: Predicting survival of patients with DLBC lymphoma
- Concluding remarks
Knowledge Discovery in Brief
What is Knowledge Discovery?
Figure: Jonathan’s blocks and Jessica’s blocks. Whose block is this?
- Jonathan’s rules: blue or circle
- Jessica’s rules: all the rest
What is Knowledge Discovery?
Question: Can you explain how?
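One way to "explain how" is to let a program find the rules from labelled blocks. The sketch below is a toy illustration, not the method used in this talk: the example blocks and their colour/shape values are hypothetical, and the learner simply keeps every single attribute value whose blocks all belong to Jonathan, then labels everything else as Jessica's.

```python
from collections import defaultdict

# Hypothetical labelled examples: ((colour, shape), owner).
EXAMPLES = [
    (("blue", "square"), "Jonathan"),
    (("blue", "triangle"), "Jonathan"),
    (("red", "circle"), "Jonathan"),
    (("green", "circle"), "Jonathan"),
    (("red", "square"), "Jessica"),
    (("green", "triangle"), "Jessica"),
    (("yellow", "square"), "Jessica"),
]
ATTRS = ("colour", "shape")

def pure_conditions(examples, target):
    """Return sorted (attribute, value) pairs whose examples all carry the target label."""
    seen = defaultdict(set)
    for features, label in examples:
        for attr, value in zip(ATTRS, features):
            seen[(attr, value)].add(label)
    return sorted(cond for cond, labels in seen.items() if labels == {target})

def classify(features, conditions, target, default):
    """Predict target if any discovered condition holds, otherwise the default label."""
    values = dict(zip(ATTRS, features))
    return target if any(values[a] == v for a, v in conditions) else default

rules = pure_conditions(EXAMPLES, "Jonathan")
print(rules)                                                  # [('colour', 'blue'), ('shape', 'circle')]
print(classify(("yellow", "circle"), rules, "Jonathan", "Jessica"))  # Jonathan
```

On these toy examples the learner recovers "blue or circle" for Jonathan; with richer data, real learners (decision trees, emerging patterns, and so on) search far larger rule spaces.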
Steps of Knowledge Discovery
- Training data gathering
- Feature generation: k-grams, colour, texture, domain know-how, ...
- Feature selection: entropy, χ², CFS, t-test, domain know-how, ...
- Feature integration (some classifiers/learning methods): SVM, ANN, PCL, CART, C4.5, kNN, ...
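The selection and integration steps can be sketched by pairing two of the listed methods: rank features with a t-statistic, then classify with kNN on the top-ranked features. This is a minimal sketch assuming two classes labelled 1 and 0 and at least two samples per class; the use of Welch's form of the t-statistic is an assumption (the slide does not say which t-test), and the toy data are hypothetical.

```python
import math
from collections import Counter

def t_statistic(a, b):
    """Welch's two-sample t-statistic for one feature (needs >= 2 values per class)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    denom = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / denom if denom > 0 else 0.0

def select_features(X, y, k):
    """Feature selection: indices of the k features with the largest |t|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(t_statistic(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def knn_predict(X, y, query, features, k=3):
    """Feature integration: majority vote of the k nearest training samples,
    using Euclidean distance on the selected features only."""
    def dist(row):
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in features))
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: 6 samples x 3 features, class labels 1/0.
X = [[5.1, 0.2, 3.0], [4.9, 0.1, 2.0], [5.3, 0.3, 2.5],
     [1.0, 0.2, 2.2], [1.2, 0.3, 3.1], [0.8, 0.1, 2.4]]
y = [1, 1, 1, 0, 0, 0]

selected = select_features(X, y, k=1)
print(selected)                                      # [0]
print(knn_predict(X, y, [4.8, 0.2, 2.6], selected))  # 1
```

Selecting features before integration keeps the classifier from being swamped by uninformative features, which matters when there are thousands of genes and only tens of samples.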
Knowledge Discovery for Optimizing Treatment of Childhood ALL
Image credit: Yeoh et al, 2002
Childhood ALL
- Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, hyperdiploid >50
- Different subtypes respond differently to the same treatment (Tx)
  - Over-intensive Tx: development of secondary cancers; reduction of IQ
  - Under-intensive Tx: relapse
- The subtypes look similar
- Conventional diagnosis: immunophenotyping, cytogenetics, molecular diagnostics
  - Unavailable in most ASEAN countries
Copyright © 2004 by Jinyan Li and Limsoon Wong
Single-Test Platform of Microarray & Knowledge Discovery
Figure: training data collection, feature generation, feature selection, feature integration
Image credit: Affymetrix
Impact
Conventional Tx:
- Intermediate intensity to all
- 10% suffer relapse
- 50% suffer side effects
- Costs US$150m/yr
Our optimized Tx:
- High intensity to 10%
- Intermediate intensity to 40%
- Low intensity to 50%
- Costs US$100m/yr
Outcome: high cure rate of 80%, fewer relapses, fewer side effects, saves US$51.6m/yr
Knowledge Discovery for Predicting Survival of Patients with DLBC Lymphoma
Image credit: Rosenwald et al, 2002
Diffuse Large B-Cell Lymphoma
- DLBC lymphoma is the most common type of lymphoma in adults
- It can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients
- DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy
- International Prognostic Index (IPI): age, Eastern Cooperative Oncology Group (ECOG) performance status, tumor stage, lactate dehydrogenase (LDH) level, sites of extranodal disease, ...
  - Not good for stratifying DLBC lymphoma patients for therapeutic trials
- Use gene-expression profiles to predict the outcome of chemotherapy?
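For comparison with the gene-expression approach, the IPI itself is a simple count of adverse clinical factors. The sketch below uses the commonly published IPI cut-offs (age over 60, ECOG status 2 or more, stage III or IV, LDH above normal, more than one extranodal site) and the usual four risk groups; these cut-offs are not given on the slide, and the `Patient` record is a hypothetical structure for illustration.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    age: int
    ecog_ps: int            # ECOG performance status, 0 to 4
    ann_arbor_stage: int    # tumour stage, 1 to 4
    ldh_above_normal: bool  # serum LDH above the upper limit of normal
    extranodal_sites: int   # number of extranodal disease sites

def ipi_score(p: Patient) -> int:
    """One point per adverse factor (cut-offs as commonly published; an assumption here)."""
    return sum([
        p.age > 60,
        p.ecog_ps >= 2,
        p.ann_arbor_stage >= 3,
        p.ldh_above_normal,
        p.extranodal_sites > 1,
    ])

def ipi_risk_group(score: int) -> str:
    """Map the 0 to 5 score to the four usual IPI risk groups."""
    if score <= 1:
        return "low"
    if score == 2:
        return "low-intermediate"
    if score == 3:
        return "high-intermediate"
    return "high"

p = Patient(age=67, ecog_ps=1, ann_arbor_stage=3, ldh_above_normal=True, extranodal_sites=0)
print(ipi_risk_group(ipi_score(p)))  # high-intermediate
```

Because the IPI uses only a handful of clinical variables, patients in the same IPI group can still differ in how their tumours respond to chemotherapy, which is the gap the gene-expression approach targets.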
Knowledge Discovery from Gene Expression of “Extreme” Samples
Figure: 240 samples (80 used as test cases); “extreme” sample selection yields 26 long-term survivors and 47 short-term survivors; knowledge discovery from gene expression reduces 7399 genes to 84 genes
- T is long-term if S(T) < 0.3
- T is short-term if S(T) > 0.7
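The two ingredients of this slide, picking "extreme" training samples and grouping test samples by risk score, can be sketched as below. The follow-up cut-offs `short_cutoff` and `long_cutoff` are hypothetical parameters chosen for illustration (the slide does not state the selection criteria), and the risk score S(T) is assumed to be a number in [0, 1] produced by some trained classifier; only the 0.3 and 0.7 thresholds come from the slide.

```python
from typing import List, Tuple

# Each sample: (sample_id, follow_up_years, died). Values used below are hypothetical.
Sample = Tuple[str, float, bool]

def select_extreme(samples: List[Sample], short_cutoff: float = 1.0,
                   long_cutoff: float = 8.0) -> Tuple[List[str], List[str]]:
    """Keep only 'extreme' training samples (cut-offs are illustrative assumptions).

    short-term: died before short_cutoff years
    long-term:  survived beyond long_cutoff years
    Samples in between, or censored early, are left out of training.
    """
    short = [sid for sid, t, died in samples if died and t < short_cutoff]
    long_ = [sid for sid, t, died in samples if t > long_cutoff]
    return long_, short

def risk_group(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a test sample's risk score S(T) to a group using the slide's thresholds."""
    if score < low:
        return "long-term"
    if score > high:
        return "short-term"
    return "intermediate"

samples = [("a", 0.5, True), ("b", 9.0, False), ("c", 3.0, True), ("d", 0.8, False)]
print(select_extreme(samples))           # (['b'], ['a'])
print([risk_group(s) for s in (0.1, 0.5, 0.9)])  # ['long-term', 'intermediate', 'short-term']
```

Dropping the ambiguous middle of the survival range gives the learner cleaner class boundaries, at the cost of training on fewer samples.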
Kaplan-Meier Plot for 80 Test Cases
Risk score thresholds: 0.7, 0.5, 0.3
p-value of log-rank test: < …
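A Kaplan-Meier plot draws one survival curve per risk group using the product-limit estimator. A minimal sketch of that estimator follows, assuming each patient has a follow-up time and a flag that is True for death and False for censoring; the toy data are hypothetical.

```python
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function S(t).

    events[i] is True if patient i died at times[i], False if censored.
    Returns (time, survival) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients (deaths and censorings) sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
print([(t, round(s, 3)) for t, s in curve])  # [(1, 0.8), (2, 0.6), (3, 0.3)]
```

Censored patients leave the risk set without lowering the curve, which is why the estimator suits follow-up data where many patients are still alive at last contact.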
Improvement Over IPI
(A) IPI low, p-value = …
(B) IPI intermediate, p-value = …
Merit of “Extreme” Samples
(A) Without sample selection (p = 0.38)
(B) With sample selection (p = 0.009)
Without training-sample selection, there is no clear difference in overall survival among the 80 samples in the validation group of the DLBCL study.
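p-values such as 0.38 and 0.009 compare survival curves between predicted groups; the Kaplan-Meier slide names the log-rank test for this. Below is a textbook two-group log-rank test, sketched under the assumption that each group is given as follow-up times plus death/censoring flags; it is not necessarily the exact software behind the reported numbers, and the toy data are hypothetical.

```python
import math

def logrank_p(times_a, events_a, times_b, events_b) -> float:
    """Two-group log-rank test; returns the p-value of the chi-square statistic (1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected deaths in group A
    var = 0.0        # hypergeometric variance of that difference
    for t in death_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n = sum(1 for ti, _, _ in data if ti >= t)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    # Survival function of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

p = logrank_p([1, 2, 3, 4, 5], [True] * 5, [10, 11, 12, 13, 14], [True] * 5)
print(p < 0.01)  # True
```

Identical groups give a statistic of zero and a p-value of 1, while well-separated groups, as with the selected "extreme" training samples, push the p-value down.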
Knowledge Discovery for A Few Other Biomedical Applications
Predict Epitopes, Find Vaccine Targets
- Vaccines are often the only solution for viral diseases
- Finding and developing effective vaccine targets (epitopes) is a slow and expensive process
- Develop systems to recognize protein peptides that bind MHC molecules
- Develop systems to recognize hot spots in viral antigens
Recognize Functional Sites, Help Scientists
- Effective recognition of the initiation, control, and termination of biological processes is crucial to speeding up and focusing scientific experiments
- Data mining of biological sequences to find rules that recognize and explain functional sites
- Dragon’s 10x reduction in false positives for transcription start site (TSS) recognition
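To mine sequences for functional-site rules, each candidate site is usually turned into a feature vector first. The sketch below uses k-gram counts (one of the feature-generation options listed earlier) over a window around a candidate site. It is an illustrative assumption, not a description of how Dragon works; the `window` helper and its upstream/downstream sizes are hypothetical.

```python
from collections import Counter
from itertools import product

def kgram_features(seq: str, k: int = 2, alphabet: str = "ACGT") -> list:
    """Count every possible k-gram over the alphabet in seq, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(g)] for g in product(alphabet, repeat=k)]

def window(seq: str, site: int, upstream: int, downstream: int) -> str:
    """Hypothetical helper: the subsequence around a candidate site, clipped at the ends."""
    return seq[max(0, site - upstream): site + downstream]

print(kgram_features("ACGTAC", k=1))                 # [2, 2, 1, 1]
print(window("AAAACCCCGGGG", 6, 2, 3))               # CCCCG
print(len(kgram_features("ACGTAC", k=2)))            # 16
```

The resulting fixed-length vectors (4^k entries per window) can then go through the same feature selection and integration steps as gene-expression data.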
Understand Proteins, Fight Diseases
- Understanding the function and role of a protein needs organised information on interaction pathways
- Such information is often reported in scientific papers but is seldom found in structured databases
- Knowledge extraction system to process free text: extract protein names; extract interactions
Benefits of Bioinformatics
- To the patient: better drugs, better treatment
- To pharma: save time, save cost, make more $
- To the scientist: better science
References
- A. Yeoh et al., “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1, 2002.
- A. Rosenwald et al., “The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma”, NEJM, 346, 2002.
- H. Liu et al., “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004.
Any Questions?
To be presented at the NHG-IBM Symposium, Raffles Convention Centre, 10/10/04 (am)