Discovering Constrained Association Rules to Predict Heart Disease

1 Discovering Constrained Association Rules to Predict Heart Disease
Carlos Ordonez*, Edward Omiecinski, Levien de Braal, Cesar Santana, et al.
Georgia Tech, Emory University
*working for Teradata (NCR)
IEEE ICDM 2001

2 Motivation
Goal: help in heart disease diagnosis
Basic Data Mining technique
Similar to expert system rules
Combinatorial: causes=>disease
Simplicity: easy to interpret
Privacy preserving
Reliability: each rule carries two statistical measures (support s and confidence c)

3 Medical data issues
Rich attribute types. Attributes must be transformed into binary.
Small data set size, n=655 patients
Noisy: there are many missing values.
Errors in data collection.
Naive approach: thousands/millions of associations and rules.
Negation makes the problem worse

4 Good rules:
IF Age>=70, Smokes=Y, Gender=M THEN RCA>=50 s=0.4 c=1
IF Gender=F, Age<70 THEN LAD>=70 s=0.2 c=1.0
IF Gender=M, Age<70 THEN RCA<50
Bad rules:
IF Age>=70 THEN Smokes=Y
IF LAD>=70 THEN RCA>=50
IF Gender=M, Age>=60, Smokes=Y THEN LAD,RCA

5 Algorithm overview
Map attributes to items
Mine association rules (A-priori)
Phase 1: generate frequent associations above minimum support
Phase 2: generate rules with minimum confidence
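
The two phases can be illustrated with a minimal sketch. This is a generic A-priori outline on toy data, not the authors' implementation; the function names, the item strings, and the four example patients are all assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support_count, max_size):
    """Phase 1: level-wise search for itemsets that meet minimum support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while current and size <= max_size:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and keep only
        # those whose k-subsets are all frequent (downward closure).
        size += 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent

def generate_rules(frequent, n, min_confidence):
    """Phase 2: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, count / n, confidence))
    return rules

# Hypothetical toy data: each patient is a set of items.
patients = [
    {"Age>=70", "Smokes=Y", "RCA>=50"},
    {"Age>=70", "Smokes=Y", "RCA>=50"},
    {"Age<70", "Smokes=N", "RCA<50"},
    {"Age<70", "Smokes=Y", "RCA<50"},
]
freq = frequent_itemsets(patients, min_support_count=2, max_size=3)
found = generate_rules(freq, n=len(patients), min_confidence=1.0)
```

Each entry in `found` is a tuple (antecedent, consequent, support, confidence), matching the s and c values reported with the rules in this presentation.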

6 Mapping attributes to binary data
Uniformly treat as categorical or numerical
Manual: ranges are determined by MD
Each categorical value becomes an item
Each numerical range becomes an item.
Missing info handling simplified
Each value/range can be negated
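
A rough sketch of such a mapping follows. The helper `to_items`, the cutoffs, and the sample record are hypothetical; in the study the ranges were set by a physician. The sketch also assumes that a missing value simply produces no item, and it negates only numerical ranges.

```python
def to_items(record, ranges, categorical, negate=()):
    """Map one patient record to a set of binary items (illustrative only)."""
    items = set()
    for attr, value in record.items():
        if value is None:
            continue  # assumption: a missing value contributes no item
        if attr in ranges:
            for lo, hi in ranges[attr]:
                label = f"{lo}<={attr}<{hi}"
                if lo <= value < hi:
                    items.add(label)              # the range the value falls in
                elif attr in negate:
                    items.add(f"not({label})")    # negated item for other ranges
        elif attr in categorical:
            items.add(f"{attr}={value}")          # one item per categorical value
    return items

# Hypothetical cutoffs and record, for illustration only.
ranges = {
    "Age": [(0, 40), (40, 60), (60, 120)],
    "RCA": [(0, 50), (50, 70), (70, 100.1)],
}
categorical = {"Sex", "Smokes"}
record = {"Age": 45, "Sex": "F", "Smokes": None, "RCA": 30}
items = to_items(record, ranges, categorical, negate={"RCA"})
```

Negating RCA adds two extra items for this single patient, which hints at why negation enlarges the search space.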

7 Important constraints
Max rule size: simplicity. Phase 1 faster.
A: Antecedent, C: Consequent. Medically meaningful. Phase 2 faster.
G: Group constraint: eliminate trivial or irrelevant associations. Phase 1 and 2 faster.
Negation: more combinations
Support = 2/n
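
One way to picture these constraints is as a filter on candidate rules. The sketch below is an assumption about how they could be encoded, not the authors' code: `role` marks the side(s) of a rule where an item may appear, and the group constraint is read here as "at most one item per group". The "perfusion" group is hypothetical.

```python
def satisfies_constraints(antecedent, consequent, role, group, max_size):
    """Return True if a candidate rule passes all three constraints."""
    items = list(antecedent) + list(consequent)
    # Max rule size: keeps rules simple and prunes the search early.
    if len(items) > max_size:
        return False
    # Antecedent/consequent constraint: each item must be allowed on its side.
    if any("A" not in role[i] for i in antecedent):
        return False
    if any("C" not in role[i] for i in consequent):
        return False
    # Group constraint (assumed reading): no two items from the same group.
    groups = [group[i] for i in items if group.get(i) is not None]
    return len(groups) == len(set(groups))

# Hypothetical item metadata: risk factors on the left, arteries on the right.
role = {
    "Age>=70": "A", "Smokes=Y": "A",
    "0.2<=AP<1.1": "A", "0.2<=SA<1.1": "A",
    "LAD>=70": "C", "RCA>=50": "C",
}
group = {"0.2<=AP<1.1": "perfusion", "0.2<=SA<1.1": "perfusion"}

good = satisfies_constraints(["Age>=70", "Smokes=Y"], ["RCA>=50"], role, group, 4)
bad_consequent = satisfies_constraints(["Age>=70"], ["Smokes=Y"], role, group, 4)
bad_antecedent = satisfies_constraints(["LAD>=70"], ["RCA>=50"], role, group, 4)
bad_group = satisfies_constraints(["0.2<=AP<1.1", "0.2<=SA<1.1"], ["RCA>=50"], role, group, 4)
```

Under this encoding, rules such as IF Age>=70 THEN Smokes=Y and IF LAD>=70 THEN RCA>=50 are rejected because an item sits on a side where it is not allowed.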

8 Medical attributes

9 Experimental results
Minimum support frequency: 2
Max rule size: 4
Time: 12 minutes
Associations: 36, % of time
Rules: 2, % of time.

10 Medical significance Specificity Sensitivity
Gold standard: catheterization
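
For reference, sensitivity and specificity are computed from counts of agreement with the gold standard. The sketch below uses the standard definitions; the counts are made-up numbers for illustration, not results from the study.

```python
def sensitivity(tp, fn):
    """Fraction of truly diseased patients that are flagged as diseased."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of truly healthy patients that are flagged as healthy."""
    return tn / (tn + fp)

# Hypothetical counts against the gold standard (catheterization):
# tp/fn = diseased patients flagged / missed,
# tn/fp = healthy patients cleared / wrongly flagged.
tp, fn, tn, fp = 8, 2, 15, 5
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```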

11 Usage of rules
Confirming knowledge. Used to validate Expert System IF-THEN rules
Discovering knowledge. Surprising to domain expert.
Distinguish healthy and sick patients

12 Rules predicting no heart disease
IF Sex=F THEN 0<=LCX<50, s=22% c=73%
IF Smokes=N THEN not(70<=RCA<100.1), s=29% c=71%
IF Age<40,Diab=N THEN 0<=LAD<50, s=2% c=82%
IF 40<=Age<60,Sex=F,Diab=N THEN RCA<50, s=7% c=80%

13 Rules predicting heart disease
IF 0.2<=AP<1.1,PCarSur=Y THEN not(LAD<50) not(RCA<50), s=1% c=80%
IF 60<Age, 0.2<=AP<1.1,Smokes=Y THEN not(LAD<50) s=10% c=83%
IF 60<Age, 0.2<=SA<1.1,FHCAD=Y THEN not(LAD<50) s=2% c=100%
IF 60<Age, 0.2<=AP<1.1,Sex=F THEN not(LAD<50) s=5% c=94%

14 Conclusions
Mapping attributes is required
Constraining is essential
Some of the findings were unexpected
Future work: find more useful constraints, finer ranges, improve missing info handling, validate by clustering and decision trees

