Published by Hanna Mäkelä. Modified over 5 years ago.
1
Discovering Constrained Association Rules to Predict Heart Disease
Carlos Ordonez*, Edward Omiecinski, Levien de Braal, Cesar Santana, et al.
Georgia Tech, Emory University
*Working for Teradata (NCR)
IEEE ICDM 2001
2
Motivation
Goal: help in heart disease diagnosis
Basic data mining technique
Similar to expert system rules
Combinatorial: causes => disease
Simplicity: easy to interpret
Privacy preserving
Reliability: having two statistical measures
3
Medical data issues
Rich attribute types; attributes must be transformed into binary.
Small data set size: n=655 patients.
Noisy: there are many missing values.
Errors in data collection.
Naive approach: thousands or millions of associations and rules.
Negation makes the problem worse.
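To make the "thousands/millions" remark concrete, the sketch below counts how many candidate itemsets exist up to a given size. The item counts (50 items, doubled to 100 when every item also gets a negated twin) are hypothetical and are not taken from the slides; the second figure is only an upper bound, since an item and its own negation can never occur together.

```python
from math import comb


def count_candidate_itemsets(num_items: int, max_size: int) -> int:
    """Number of non-empty itemsets containing at most max_size items."""
    return sum(comb(num_items, k) for k in range(1, max_size + 1))


# Hypothetical item counts, chosen only for illustration.
plain = count_candidate_itemsets(50, 4)
# Adding a negated version of each item doubles the item count (upper bound).
with_negation = count_candidate_itemsets(100, 4)

print(plain, with_negation)
```

Even with a small maximum size, doubling the number of items multiplies the search space many times over, which is why an unconstrained search is impractical.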
4
Good rules:
IF Age>=70, Smokes=Y, Gender=M THEN RCA>=50 s=0.4 c=1
IF Gender=F, Age<70 THEN LAD>=70 s=0.2 c=1.0
IF Gender=M, Age<70 THEN RCA<50
Bad rules:
IF Age>=70 THEN Smokes=Y
IF LAD>=70 THEN RCA>=50
IF Gender=M, Age>=60, Smokes=Y THEN LAD, RCA
5
Algorithm overview
Map attributes to items
Mine association rules (A-priori)
Phase 1: generate frequent associations above minimum support
Phase 2: generate rules with minimum confidence
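The two phases can be sketched as follows. This is a minimal, unoptimized A-priori written for illustration, not the authors' implementation; the function names and the four toy patient records are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size):
    """Phase 1: level-wise search for itemsets found in >= min_count records."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    counts = {}
    size = 1
    while level and size <= max_size:
        survivors = []
        for cand in level:
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_count:
                counts[cand] = c
                survivors.append(cand)
        size += 1
        # Join: unions of two surviving itemsets that are exactly one item larger.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == size}
        # Prune: keep a candidate only if all its smaller subsets are frequent.
        level = [c for c in joined
                 if all(frozenset(s) in counts for s in combinations(c, size - 1))]
    return counts


def rules(counts, n, min_conf):
    """Phase 2: split each frequent itemset into antecedent => consequent."""
    out = []
    for itemset, c in counts.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                conf = c / counts[ante]  # confidence = count(X and Y) / count(X)
                if conf >= min_conf:
                    out.append((ante, itemset - ante, c / n, conf))
    return out


# Toy records, invented for this example.
patients = [
    {"Age>=70", "Smokes=Y", "RCA>=50"},
    {"Age>=70", "Smokes=Y", "RCA>=50"},
    {"Age<70", "Smokes=N", "RCA<50"},
    {"Age<70", "Smokes=Y", "RCA<50"},
]
counts = frequent_itemsets(patients, min_count=2, max_size=3)
found = rules(counts, n=len(patients), min_conf=0.9)
```

Each entry of `found` is a tuple (antecedent, consequent, support, confidence). Note that Phase 2 here emits every split of an itemset, including medically meaningless ones; nothing in this sketch restricts which items may appear on which side.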
6
Mapping attributes to binary data
Uniformly treat as categorical or numerical
Manual: ranges are determined by MD
Each categorical value becomes an item
Each numerical range becomes an item
Missing info handling simplified
Each value/range can be negated
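A possible shape for this mapping is sketched below. The cutoffs, attribute names, and the `to_items` helper are hypothetical; in particular, skipping an attribute whose value is missing is only one plausible reading of "missing info handling simplified".

```python
def to_items(patient, numeric_ranges, categorical, negate=()):
    """Turn one patient record (a dict) into a set of item strings."""
    items = set()
    for attr, cuts in numeric_ranges.items():
        value = patient.get(attr)
        if value is None:
            continue  # assumption: a missing value produces no item at all
        for lo, hi in cuts:
            label = f"{lo}<={attr}<{hi}"
            if lo <= value < hi:
                items.add(label)
            elif label in negate:
                # The value is known and lies outside the range: emit negation.
                items.add(f"not({label})")
    for attr in categorical:
        value = patient.get(attr)
        if value is not None:
            items.add(f"{attr}={value}")
    return items


# Hypothetical ranges; in the study they were chosen by a physician.
ranges = {
    "Age": [(0, 40), (40, 60), (60, 120)],
    "LAD": [(0, 50), (50, 70), (70, 100.1)],
}
patient = {"Age": 65, "Sex": "F", "Smokes": None, "LAD": 80}
items = to_items(patient, ranges, ["Sex", "Smokes"], negate={"0<=LAD<50"})
print(sorted(items))
```

Negated items are generated only for the ranges listed in `negate`, since negating everything would greatly increase the number of items.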
7
Important constraints
Max rule size: simplicity. Phase 1 faster.
A: Antecedent, C: Consequent. Medically meaningful. Phase 2 faster.
G: Group constraint: eliminate trivial or irrelevant associations. Phase 1 and 2 faster.
Negation: more combinations
Support = 2/n
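The constraints above can be pictured as a filter over candidate rules. The sketch below is an illustration under assumptions: the `allowed_side` and `group` tables are invented, and "at most one item per group" is one interpretation of the group constraint rather than a definition given on the slide.

```python
def passes_constraints(antecedent, consequent, allowed_side, group, max_size):
    """Return True if a candidate rule satisfies all three constraints."""
    items = list(antecedent) + list(consequent)
    # Max rule size: keeps rules simple.
    if len(items) > max_size:
        return False
    # A/C constraint: each item may only appear on its permitted side.
    if any("A" not in allowed_side[i] for i in antecedent):
        return False
    if any("C" not in allowed_side[i] for i in consequent):
        return False
    # Group constraint (assumed reading): at most one item from each group.
    groups = [group[i] for i in items if group.get(i) is not None]
    return len(groups) == len(set(groups))


# Hypothetical tables: risk factors go left, artery measurements go right.
allowed_side = {"Age>=70": "A", "Smokes=Y": "A", "Gender=M": "A",
                "LAD>=70": "C", "RCA>=50": "C"}
group = {"Age>=70": "age", "Smokes=Y": "smoking", "Gender=M": "gender",
         "LAD>=70": "LAD", "RCA>=50": "RCA"}

good = passes_constraints({"Age>=70", "Smokes=Y", "Gender=M"}, {"RCA>=50"},
                          allowed_side, group, max_size=4)
bad_consequent = passes_constraints({"Age>=70"}, {"Smokes=Y"},
                                    allowed_side, group, max_size=4)
bad_antecedent = passes_constraints({"LAD>=70"}, {"RCA>=50"},
                                    allowed_side, group, max_size=4)
```

In this sketch the filter runs after a rule has been formed. The slide's speed remarks suggest the size and group constraints were also applied earlier, during Phase 1, so that excluded combinations are never counted.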
8
Medical attributes
9
Experimental results
Minimum support frequency: 2
Max rule size: 4
Time: 12 minutes
Associations: 36, % of time
Rules: 2, % of time.
10
Medical significance
Specificity
Sensitivity
Gold standard: catheterization
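For reference, sensitivity and specificity follow their standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), with the gold standard supplying the true labels. The sketch below assumes a rule "predicts disease" for a patient whenever its antecedent holds; the six labels are invented.

```python
def sensitivity_specificity(predicted, actual):
    """Compare rule predictions with gold-standard labels (both lists of bool)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn)  # share of sick patients the rule flags
    specificity = tn / (tn + fp)  # share of healthy patients the rule clears
    return sensitivity, specificity


# Invented labels: True means "disease"; actual stands in for catheterization.
predicted = [True, True, False, False, False, True]
actual = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(predicted, actual)
```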
11
Usage of rules
Confirming knowledge: used to validate expert system IF-THEN rules.
Discovering knowledge: surprising to the domain expert.
Distinguish healthy and sick patients.
12
Rules predicting no heart disease
IF Sex=F THEN 0<=LCX<50, s=22% c=73%
IF Smokes=N THEN not(70<=RCA<100.1), s=29% c=71%
IF Age<40, Diab=N THEN 0<=LAD<50, s=2% c=82%
IF 40<=Age<60, Sex=F, Diab=N THEN RCA<50, s=7% c=80%
13
Rules predicting heart disease
IF 0.2<=AP<1.1, PCarSur=Y THEN not(LAD<50), not(RCA<50), s=1% c=80%
IF 60<Age, 0.2<=AP<1.1, Smokes=Y THEN not(LAD<50), s=10% c=83%
IF 60<Age, 0.2<=SA<1.1, FHCAD=Y THEN not(LAD<50), s=2% c=100%
IF 60<Age, 0.2<=AP<1.1, Sex=F THEN not(LAD<50), s=5% c=94%
14
Conclusions
Mapping attributes is required
Constraining is essential
Some of the findings were unexpected
Future work: find more useful constraints, finer ranges, improve missing info handling, validate by clustering and decision trees