Published by Dwain Farmer. Modified over 9 years ago.
1 Mining Episode Rules in STULONG dataset
N. Méger (1), C. Leschi (1), N. Lucas (2) & C. Rigotti (1)
(1) INSA Lyon - LIRIS FRE CNRS 2672
(2) Université d’Orsay – LRI CNRS UMR 8623
This work has been partially funded by the European Project AEGIS (IST-2000-26450).
2 Content
Motivation
About WinMiner
Data Mining Effort
Conclusion
3 Motivation: Data
STULONG data: a 20-year longitudinal study of risk factors related to atherosclerosis in a population of middle-aged men
Tables ENTRY and CONTROL:
–1216 patients described by: identification and social characteristics; behavior; health events; physical and biochemical examinations
–From 1 up to 21 controls per patient
A sequence of controls for each patient
4 Motivation: Medical issues
Identified risk factors
No treatment available
Necessity to consider a global risk instead of concentrating prevention efforts on individual risk factors
Risky behaviors dramatically increase the emergence of cardiovascular disease, but no one knows when
Relations between risk factors and clinical manifestation of atherosclerosis?
Time intervals over which these relations are valid?
5 Motivation: WinMiner
WinMiner: a single optimised way to find sequential patterns in data along with their optimal time intervals, under user constraints
WinMiner suggests to experts possible temporal dependencies among occurrences of event types
WinMiner outputs "small" collections of sequential patterns
6 About WinMiner
Mining context
large event sequences
episodes & episode rules
[Figure: episodes and episode rules over the event types A, B and C]
7 About WinMiner
Selecting patterns
support: how many times does an episode/episode rule occur within an event sequence?
[Figure: an episode and an episode rule over the event types A, B and C]
confidence: what is the probability that the RHS of an episode rule occurs, given that its LHS has already occurred?
[Figure: an episode rule over the event types A, B and C]
patterns are selected using:
–a minimum support threshold
–a minimum confidence threshold
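The two measures above can be sketched as follows. This is a minimal illustration, assuming a simplified notion of occurrence (a start event from which the serial episode can be matched, in order, within a window span w). The slides do not state WinMiner's exact occurrence definition, and the names `count_occurrences` and `confidence` are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type), sequence sorted by timestamp


def count_occurrences(seq: List[Event], episode: Sequence[str], w: int) -> int:
    """Count start events from which `episode` occurs, in order, within span w.

    Simplified support: an assumption for illustration, not WinMiner's definition.
    """
    count = 0
    for i, (t0, e0) in enumerate(seq):
        if e0 != episode[0]:
            continue
        k = 1  # index of the next episode element to match
        for t, e in seq[i + 1:]:
            if t - t0 > w:
                break  # left the window that starts at t0
            if k < len(episode) and e == episode[k]:
                k += 1
        if k == len(episode):
            count += 1
    return count


def confidence(seq: List[Event], lhs: Sequence[str], rhs: Sequence[str], w: int) -> float:
    """Estimate P(RHS follows | LHS occurred) as support(LHS then RHS) / support(LHS)."""
    supp_lhs = count_occurrences(seq, lhs, w)
    supp_rule = count_occurrences(seq, list(lhs) + list(rhs), w)
    return supp_rule / supp_lhs if supp_lhs else 0.0


# Toy sequence: the rule "A then B => C" holds for one of the two occurrences of "A then B".
toy = [(1, "A"), (2, "B"), (3, "C"), (10, "A"), (11, "B"), (20, "A")]
print(count_occurrences(toy, ["A", "B"], w=5))  # 2
print(confidence(toy, ["A", "B"], ["C"], w=5))  # 0.5
```

A rule would then be kept only if its support and confidence reach the two user-given thresholds.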
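8 About WinMiner
Selecting the optimal window span
[Figure: confidence plotted against the window span w, with the minimum confidence threshold and the window span such that the episode rule is frequent. The First Local Maximum (FLM) has confidence C1 and is followed by a confidence C2 with C2 <= C1 - C1*decRate. The window span of the FLM is the optimal window span.]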
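One way to read the selection of the optimal window span is sketched below. It assumes the confidence of a rule has already been computed for each window span at which the rule is frequent. The function name `first_local_maximum` and the input layout are hypothetical, and the slide does not spell out how ties or plateaus are handled.

```python
from typing import List, Optional, Tuple


def first_local_maximum(
    conf_by_w: List[Tuple[int, float]], min_conf: float, dec_rate: float
) -> Optional[Tuple[int, float]]:
    """Return (optimal window span, C1) for the first local maximum, or None.

    conf_by_w: (window span, confidence) pairs sorted by window span,
    restricted to spans at which the rule is frequent (an assumption).
    """
    best_w: Optional[int] = None
    best_c: Optional[float] = None
    for w, c in conf_by_w:
        if best_c is None or c > best_c:
            # Strictly higher confidence: new candidate maximum C1.
            best_w, best_c = w, c
        elif best_c >= min_conf and c <= best_c - best_c * dec_rate:
            # Confidence C2 dropped by at least decRate relative to C1.
            return best_w, best_c
    return None  # no confident maximum followed by a sufficient drop


curve = [(1, 0.3), (2, 0.5), (3, 0.8), (4, 0.75), (5, 0.6), (6, 0.9)]
print(first_local_maximum(curve, min_conf=0.7, dec_rate=0.2))  # (3, 0.8)
```

In the toy curve, the later peak at w = 6 is ignored because the maximum at w = 3 is already followed by a sufficient decrease at w = 5.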
9 About WinMiner
WinMiner:
–checks all possible episode rules satisfying the frequency and confidence thresholds
–outputs only the FLM-rules, along with their respective optimal window sizes
–uses a maximal gap constraint
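The maximal gap constraint can be illustrated with a small check. This sketch assumes the constraint bounds the time elapsed between consecutive events of an occurrence, which the slide does not define precisely. The function name `occurs_with_max_gap` is hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, event type), sequence sorted by timestamp


def occurs_with_max_gap(seq: List[Event], episode: Sequence[str], max_gap: int) -> bool:
    """True if `episode` occurs in order with at most max_gap between consecutive events."""
    # ends[k] holds the timestamps at which a match of episode[:k + 1] can end.
    ends: List[List[int]] = [[] for _ in episode]
    for t, e in seq:
        for k in range(len(episode) - 1, -1, -1):
            if e != episode[k]:
                continue
            # The first element can start anywhere; later ones need a close predecessor.
            if k == 0 or any(0 < t - p <= max_gap for p in ends[k - 1]):
                ends[k].append(t)
    return bool(ends[-1])


toy = [(0, "A"), (1, "B"), (3, "B"), (5, "C")]
print(occurs_with_max_gap(toy, ["A", "B", "C"], max_gap=3))  # True
print(occurs_with_max_gap(toy, ["A", "B", "C"], max_gap=1))  # False
```

Keeping every reachable end time, rather than only the earliest match, matters here: with a gap of 3 the occurrence goes through the second B, not the first.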
10 DM effort: Aims
Give the medical expert a means to follow both the evolution of risk factors and:
(1) the impact of medical intervention
(2) modifications in patients’ behavior
in addition:
–significant time periods of observation
–frequency
–probability
11 DM effort: Data preprocessing
Mainly focused on table CONTROL (1226 patients/10572 examinations)
Join operations to export information from table ENTRY
Categorization of some factors
Choice of relevant factors according to:
–Medical expertise
–Mining approach
Table Contr_Mod_2
12 DM effort: Data preprocessing
Important factors (according to medical experts):
–cholesterol
–hypertension
–smoking
–physical activity
–age
–diabetes
–alcohol consumption
–BMI
–family anamnesis
–level of education
13 DM effort: Data preprocessing
Contr_mod_2 → large event sequence
For each patient: a subsequence containing all his control examinations
Coding guarantees that events corresponding to 2 different patients cannot be associated in the same episode rule
Large event sequence: concatenation of all subsequences constructed for the patients
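The slide does not say how the coding keeps patients apart. One possible scheme, given purely as an assumption, is to shift each patient's timestamps so that consecutive subsequences are separated by more than the maximal gap. The function name `build_large_sequence` and the event labels are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp in months, event type)


def build_large_sequence(patients: Dict[str, List[Event]], max_gap: int) -> List[Event]:
    """Concatenate per-patient subsequences, separated by more than max_gap.

    Assumed scheme: with a maximal gap constraint of max_gap, no occurrence
    can then contain events from two different patients.
    """
    large: List[Event] = []
    offset = 0
    for pid in sorted(patients):
        events = sorted(patients[pid])
        if not events:
            continue
        first = events[0][0]
        for t, e in events:
            # Re-base the patient's own time axis onto the shared one.
            large.append((offset + t - first, e))
        # The next patient starts strictly more than max_gap after this one ends.
        offset = large[-1][0] + max_gap + 1
    return large


patients = {
    "p1": [(0, "chol_high"), (12, "chol_normal")],
    "p2": [(5, "smoker"), (9, "non_smoker")],
}
print(build_large_sequence(patients, max_gap=24))
```

With max_gap = 24, the second patient's first event lands 25 months after the first patient's last event, so the two subsequences cannot be joined in one occurrence.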
14 DM effort: Results
Examples:
–"If the patient has no hypercholesterolemia, and if he sometimes follows his diet, then the patient has no hypercholesterolemia with a probability of 0.8 within 40 months. This rule is supported by 201 examples in the event sequence."
–"If the patient eats less fat and fewer carbohydrates and claudication is observed some time later, then this claudication does not disappear with a probability of 0.8 over 30 months. This rule is supported by 21 examples."
15 DM effort: Results
Well-known phenomena:
–an indication of correctness in pre-processing as well as in mining the data
Added value: suggestions concerning their temporal aspects
To be expected:
–with new data and the new risk factors brought to light in the last decade, discovery of new phenomena along with their optimal window sizes
16 Conclusion
With STULONG data:
Searching for temporal dependencies between atherosclerosis risk factors and clinical manifestation of atherosclerosis that have an optimal interval/window size
Offers the medical expert a way to make explicit the impact of a risk factor and to refine its role relative to other risk factors within a time interval
Only a few episode rules are obtained, which allows experts to analyse the outputs manually
Could be applied to other medical data sets to help in finding unknown phenomena
New perspectives both for data miners and physicians