Rule-Based Data Mining Methods for Classification Problems in Biomedical Domains
Jinyan Li, Limsoon Wong
Copyright © 2004 by Jinyan Li and Limsoon Wong



Part 4: Interesting Rules and Patterns

Outline
– Some interesting decision trees
– Performance of CS4
– Demo

Some Interesting Decision Trees

Decision Tree on a Prostate Data Set
Singh et al., Cancer Cell 1
102 instances
– 52 tumor samples
– 50 normal samples
~12,500 numeric features
– Each one represents a gene (or probe)
– Its value is the expression level of that gene

[Figure: C4.5 tree. The root splits on 32598_at at 29; the <= 29 branch splits on 33886_at at 10 and then on 34950_at at 5; the > 29 branch splits on 40707_at at -6. Leaves are labeled Tumor or Normal.]

Rule Translation
The tree can be translated into 5 rules, one per root-to-leaf path.
– Two of them are significant rules, but the remaining three are trivial
– The two significant rules dominate the two classes: the normal class and the tumor class
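Translating a tree into rules amounts to enumerating its root-to-leaf paths and taking the conjunction of the tests along each path, so a tree with 5 leaves yields 5 rules. The Python sketch below assumes a hypothetical nested-tuple encoding of the tree; the genes `g1`, `g2`, `g3`, their thresholds and the leaf labels are placeholders, not the prostate tree.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is a class label (str); an internal node is
# (feature, threshold, left, right), with left meaning feature <= threshold.
Tree = Union[str, Tuple[str, float, "Tree", "Tree"]]
Rule = Tuple[Tuple[str, ...], str]


def tree_to_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[Rule]:
    """Return one (conditions, label) rule per root-to-leaf path."""
    if isinstance(tree, str):
        return [(conditions, tree)]
    feature, threshold, left, right = tree
    return (tree_to_rules(left, conditions + (f"{feature} <= {threshold}",))
            + tree_to_rules(right, conditions + (f"{feature} > {threshold}",)))


# Toy tree with 4 leaves, hence 4 rules; names and thresholds are placeholders.
toy = ("g1", 5, ("g2", 3, "Tumor", "Normal"), ("g3", 7, "Normal", "Tumor"))
for conds, label in tree_to_rules(toy):
    print("If " + " and ".join(conds) + f", then {label}")
```

The first printed rule is `If g1 <= 5 and g2 <= 3, then Tumor`. Each rule's conditions are exactly the tests met on the way from the root to that leaf, which is why the rules are mutually exclusive and together cover every sample.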

Significance of the Rules
Two significant rules, where x, y and z denote the expression values of 32598_at, 33886_at and 34950_at respectively:
– If x <= 29 and y <= 10 and z <= 5, then the sample is tumor (94%)
– If x > 29 and 40707_at > -6, then the sample is normal (82%)
Three trivial rules: 12%, 6%, 6%
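The two significant rules can be written as a small Python function. This is a sketch: it assumes expression values arrive as a dict keyed by probe ID, and it returns `None` (a choice made here, not on the slide) when neither significant rule fires, since the labels of the three trivial rules are not given. The function name and the sample values are hypothetical.

```python
from typing import Dict, Optional


def prostate_rule_label(expr: Dict[str, float]) -> Optional[str]:
    """Apply the two significant rules of the prostate C4.5 tree.

    `expr` maps probe IDs to expression values (an assumed input format).
    Returns None when neither significant rule fires, i.e. the sample is
    covered by one of the three trivial rules, whose labels are not listed.
    """
    if expr["32598_at"] <= 29:
        if expr["33886_at"] <= 10 and expr["34950_at"] <= 5:
            return "Tumor"
    elif expr["40707_at"] > -6:
        return "Normal"
    return None


# Made-up expression values, for illustration only.
print(prostate_rule_label({"32598_at": 12.0, "33886_at": 4.0, "34950_at": 1.0}))  # Tumor
print(prostate_rule_label({"32598_at": 40.0, "40707_at": 0.0}))  # Normal
```

Genes are looked up only on the branch being followed, so a sample with 32598_at > 29 never needs 33886_at or 34950_at.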

Another Gene Expression Data Set
Yeoh et al., Cancer Cell 1, 2002
Differentiating the MLL subtype from other subtypes of childhood leukemia
Training data
– 14 MLL vs 201 others
Test data
– 6 MLL vs 106 others
Number of features
– 12,558

The Decision Tree
The tree makes 4 mistakes on the test data.
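Because the test set is imbalanced, it helps to set the tree's 4 mistakes against the trivial classifier that always answers "other", which would misclassify all 6 MLL samples. The arithmetic, as a short Python sketch using only the counts above:

```python
# Test-set counts from the slide: 6 MLL samples and 106 other samples.
n_mll, n_other = 6, 106
n_test = n_mll + n_other          # 112 test samples
tree_errors = 4                   # mistakes made by the decision tree
baseline_errors = n_mll           # always predicting "other" misses every MLL sample

tree_accuracy = 1 - tree_errors / n_test
baseline_accuracy = 1 - baseline_errors / n_test
print(f"tree: {tree_accuracy:.1%}, majority baseline: {baseline_accuracy:.1%}")
```

This prints `tree: 96.4%, majority baseline: 94.6%`: the tree is right on 108 of 112 test samples. The slide does not say how the 4 mistakes split between the two classes, which matters when the minority class is the one of interest.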

Translating the Tree into a Mathematical Function
Given a test sample, at most 3 of the 4 genes' expression values are needed to make a decision!
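The slide's point can be illustrated by writing a tree as a nested if/else function and recording which genes are actually read. The tree below is hypothetical: the gene names `gA` to `gD`, the thresholds and the leaf labels are placeholders, because the slide's tree is not reproduced here. Every root-to-leaf path tests at most 3 of the 4 genes.

```python
from typing import Dict, List, Tuple


def classify(expr: Dict[str, float]) -> Tuple[str, List[str]]:
    """Hypothetical 4-gene tree; returns the label and the genes read, in order."""
    read: List[str] = []

    def get(gene: str) -> float:
        read.append(gene)  # record every expression value the tree consults
        return expr[gene]

    if get("gA") <= 1.0:
        if get("gB") <= 2.0:
            label = "MLL" if get("gC") > 3.0 else "other"
        else:
            label = "other"
    else:
        label = "MLL" if get("gD") > 4.0 else "other"
    return label, read


print(classify({"gA": 2.0, "gD": 9.0}))  # ('MLL', ['gA', 'gD'])
```

Because the sample only has to supply the genes on its own path, the second call above succeeds even though `gB` and `gC` are absent.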

Performance of CS4

Four Points to Demonstrate
– Whether top-ranked features have similar gain ratios
– Whether cascading trees have similar training performance
– Whether the trees have similar structure
– Whether expanding the tree committee gradually reduces the test errors

An Example
Differentiating the Hyperdip>50 subtype from some other subtypes of childhood leukemia

Gain Ratios of the Top 20 Features
Gain ratios: 0.39, 0.36, 0.35, 0.33, 0.33, 0.33, 0.33, 0.32, 0.31, 0.30, 0.30, 0.30, 0.30, 0.29, 0.29, 0.28, 0.28, 0.28, 0.28, …
The difference between the 1st and the 20th is only …
In fact, the partitions induced by these two features differ in only a few samples.
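For reference, the gain ratio of a split is its information gain divided by its split information, the criterion C4.5 uses to rank candidate splits. A minimal Python sketch for a binary threshold split on one numeric feature follows; `best_gain_ratio` simply tries the midpoints between consecutive distinct values, a simplification, since C4.5's own threshold selection differs in detail. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain ratio of the binary split `value <= threshold` vs `value > threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    parts = [p for p in (left, right) if p]  # ignore an empty side
    n = len(labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    return gain / split_info if split_info > 0 else 0.0


def best_gain_ratio(values: Sequence[float], labels: Sequence[str]) -> float:
    """Best gain ratio over midpoints of consecutive distinct values (simplified)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max((gain_ratio(values, labels, t) for t in candidates), default=0.0)


print(round(best_gain_ratio([1, 2, 3, 4], ["a", "a", "b", "b"]), 2))  # 1.0
```

Ranking every gene by `best_gain_ratio` and sorting in decreasing order gives a list like the one above. Two features whose best splits put nearly the same samples on each side necessarily get nearly the same score.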

Training and Test Performance

Two Observations
– The first tree does not always have the best performance
– Alternative trees rooted at other top-ranked features may perform better than the first tree

The Power of Committee
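A committee combines the votes of its trees. The sketch below assumes each tree is a callable returning a label and a vote weight. Weighting each vote by the training coverage of the rule that fired is one plausible scheme and an assumption here; CS4's exact aggregation may differ. `cascading_roots` picks the root of tree i as the i-th best-ranked feature, following the idea of trees rooted at other top-ranked features. The feature names `f1` to `f4` and the toy trees are hypothetical; the scores reuse the top four gain ratios listed earlier.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical tree interface: sample -> (predicted label, vote weight).
Tree = Callable[[Dict[str, float]], Tuple[str, float]]


def cascading_roots(scores: Dict[str, float], k: int) -> List[str]:
    """Tree i of the committee is rooted at the i-th best-ranked feature."""
    return sorted(scores, key=lambda f: scores[f], reverse=True)[:k]


def committee_predict(trees: Sequence[Tree], sample: Dict[str, float]) -> str:
    """Add each tree's weight to the label it predicts; return the heaviest label."""
    totals: Dict[str, float] = defaultdict(float)
    for tree in trees:
        label, weight = tree(sample)
        totals[label] += weight
    return max(totals, key=lambda lab: totals[lab])


# Hypothetical feature scores (top four gain ratios from the slide).
scores = {"f1": 0.39, "f2": 0.36, "f3": 0.35, "f4": 0.33}
roots = cascading_roots(scores, k=3)  # ['f1', 'f2', 'f3']

# Toy one-split "trees", one per root; weights stand in for rule coverage.
trees = [
    lambda s: ("Hyperdip>50", 0.9) if s["f1"] > 0 else ("other", 0.8),
    lambda s: ("Hyperdip>50", 0.7) if s["f2"] > 0 else ("other", 0.3),
    lambda s: ("Hyperdip>50", 0.6) if s["f3"] > 0 else ("other", 0.4),
]
print(committee_predict(trees, {"f1": 1.0, "f2": -1.0, "f3": -1.0}))
```

This prints `Hyperdip>50`: two of the three trees vote "other", but with low weights (0.3 + 0.4 = 0.7 < 0.9), so a weighted committee can overrule a plain majority vote. Setting every weight to 1.0 turns the sketch into simple majority voting.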

Compared to Bagging & Boosting
– Bagging made a similar number of mistakes to the committee: 2 mistakes
– However, Boosting made 13 mistakes

Demo