1 Mining Episode Rules in STULONG dataset N. Méger 1, C. Leschi 1, N. Lucas 2 & C. Rigotti 1 1 INSA Lyon - LIRIS FRE CNRS 2672 2 Université d’Orsay – LRI.


1 Mining Episode Rules in STULONG dataset
N. Méger (1), C. Leschi (1), N. Lucas (2) & C. Rigotti (1)
(1) INSA Lyon - LIRIS FRE CNRS 2672
(2) Université d’Orsay – LRI CNRS UMR 8623
This work has been partially funded by the European Project AEGIS (IST ).

2 Content
Motivation
About WinMiner
Data Mining Effort
Conclusion

3 Motivation: Data
STULONG data: a 20-year longitudinal study of risk factors related to atherosclerosis in a population of middle-aged men
Tables ENTRY and CONTROL:
– 1216 patients described by: identification and social characteristics, behavior, health events, physical and biochemical examinations
– From 1 to 21 controls per patient
→ A sequence of controls for each patient

4 Motivation: Medical issues
Risk factors have been identified
No treatment is available
A global risk must be considered instead of concentrating prevention efforts on individual risk factors
Risky behaviors dramatically increase the emergence of cardiovascular disease, but no one knows when
→ What are the relations between risk factors and clinical manifestations of atherosclerosis?
→ Over which time intervals are these relations valid?

5 Motivation: WinMiner
WinMiner: a single optimised way to find sequential patterns in data along with their optimal time intervals, under user constraints
WinMiner suggests possible temporal dependencies among occurrences of event types to experts
WinMiner outputs "small" collections of sequential patterns

6 About WinMiner: Mining context
Large event sequences
Episodes & episode rules
[Figure: example episodes and episode rules built from event types A, B and C]

7 About WinMiner: Selecting patterns
Support: how many times does an episode or episode rule occur within an event sequence? (examples: A → B, A → B → C)
Confidence: what is the probability that the RHS of an episode rule occurs, given that its LHS has already occurred? (example: LHS A → B, RHS C)
Patterns are selected using:
– a minimum support threshold
– a minimum confidence threshold
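The two measures on this slide can be illustrated with a minimal Python sketch. It is deliberately simplified and is not WinMiner's own definition: it assumes a rule with a single event type on each side, counts one LHS occurrence as supporting the rule when the RHS follows it within a given window, and the name `rule_support_confidence` is hypothetical.

```python
from bisect import bisect_right

def rule_support_confidence(events, lhs, rhs, window):
    """Simplified support/confidence for a rule lhs => rhs.

    events: list of (time, event_type) pairs sorted by time.
    Assumption (not from the slides): lhs and rhs are single event types,
    and an lhs occurrence supports the rule when some rhs event occurs
    strictly later and at most `window` time units after it.
    """
    lhs_times = [t for t, typ in events if typ == lhs]
    rhs_times = [t for t, typ in events if typ == rhs]
    support = 0
    for t in lhs_times:
        # Index of the first rhs event strictly after time t.
        i = bisect_right(rhs_times, t)
        if i < len(rhs_times) and rhs_times[i] - t <= window:
            support += 1
    # Confidence: fraction of lhs occurrences that are followed by rhs in time.
    confidence = support / len(lhs_times) if lhs_times else 0.0
    return support, confidence

events = [(0, "A"), (2, "B"), (5, "A"), (20, "B"), (21, "A"), (23, "B"), (30, "A")]
support, confidence = rule_support_confidence(events, "A", "B", window=3)
```

A rule would then be kept only if `support` and `confidence` both reach the user-supplied minimum thresholds.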

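8 About WinMiner: Selecting the optimal window span
[Figure: confidence plotted against window span w, showing the minimum confidence threshold, the window span such that the episode rule is frequent, and the First Local Maximum (FLM) with confidence C1 followed by a lower confidence C2]
Condition on the decrease after the FLM: C2 <= C1 - C1*decRate
The window span at the FLM is the optimal window span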
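The selection of the First Local Maximum can be sketched as follows. This is one reading of the slide rather than WinMiner's actual implementation: it assumes the confidence has already been computed for each window span at which the rule is frequent, and it accepts the first span whose confidence C1 meets the minimum confidence and is later followed by a value C2 <= C1 - C1*decRate before any larger confidence appears. The function name and the input layout are hypothetical.

```python
def first_local_maximum(conf_by_window, min_conf, dec_rate):
    """Return (window, confidence) of the first local maximum, or None.

    conf_by_window: list of (window, confidence) pairs sorted by increasing
    window, restricted to windows for which the rule is frequent
    (an assumption made for this sketch).
    """
    for i, (w, c1) in enumerate(conf_by_window):
        if c1 < min_conf:
            continue  # not confident enough at this window span
        if i > 0 and conf_by_window[i - 1][1] > c1:
            continue  # confidence is already decreasing: not a local maximum
        for _, c2 in conf_by_window[i + 1:]:
            if c2 > c1:
                break  # confidence rises again before dropping enough
            if c2 <= c1 - c1 * dec_rate:
                return w, c1  # sufficient decrease observed after c1
    return None

curve = [(1, 0.3), (2, 0.5), (3, 0.8), (4, 0.75), (5, 0.6), (6, 0.9)]
flm = first_local_maximum(curve, min_conf=0.7, dec_rate=0.2)
no_flm = first_local_maximum(curve, min_conf=0.7, dec_rate=0.3)
```

With `dec_rate=0.2` the peak at window 3 qualifies because the confidence later falls to 0.6; with `dec_rate=0.3` the required drop is never reached before the confidence rises again, so no FLM is reported.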

9 About WinMiner
WinMiner:
– checks all possible episode rules satisfying the frequency and confidence thresholds
– outputs only the FLM-rules, along with their respective optimal window sizes
– uses a maximal gap constraint
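To make the maximal gap constraint concrete, the sketch below searches an event sequence for occurrences of a serial episode in which two consecutive events are never more than `max_gap` time units apart. It is an illustrative brute-force search under assumed conventions (one reported occurrence per starting event, strictly increasing times), not the algorithm used by WinMiner, and `find_occurrences` is a hypothetical name.

```python
def find_occurrences(events, episode, max_gap):
    """Find occurrences of a serial episode under a maximal gap constraint.

    events: list of (time, event_type) pairs sorted by time.
    episode: list of event types that must occur in this order.
    Returns one tuple of event times per starting event that can be
    extended to a full occurrence.
    """
    def extend(idx, pos, times):
        # All event types of the episode have been matched.
        if pos == len(episode):
            return times
        last_time = times[-1]
        for j in range(idx + 1, len(events)):
            t, typ = events[j]
            if t - last_time > max_gap:
                break  # every later event violates the gap constraint
            if t > last_time and typ == episode[pos]:
                result = extend(j, pos + 1, times + [t])
                if result is not None:
                    return result
        return None

    found = []
    for i, (t, typ) in enumerate(events):
        if typ == episode[0]:
            occurrence = extend(i, 1, [t])
            if occurrence is not None:
                found.append(tuple(occurrence))
    return found

events = [(1, "A"), (2, "B"), (10, "C"), (11, "A"), (12, "B"), (13, "C")]
occurrences = find_occurrences(events, ["A", "B", "C"], max_gap=3)
```

Here the first A and B cannot be completed because the next C is 8 time units after B, so only the occurrence starting at time 11 is kept.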

10 DM effort: Aims
Give the medical expert a means to follow the evolution of risk factors together with:
(1) the impact of medical intervention
(2) modifications in patients’ behavior
In addition:
– significant time periods of observation
– frequency
– probability

11 DM effort: Data preprocessing
Mainly focused on table CONTROL (1226 patients/10572 examinations)
Join operations to bring in information from table ENTRY
Categorization of some factors
Choice of relevant factors according to:
– medical expertise
– mining approach
→ Table Contr_Mod_2

12 DM effort: Data preprocessing
Important factors (according to medical experts):
– cholesterol
– hypertension
– smoking
– physical activity
– age
– diabetes
– alcohol consumption
– BMI
– family anamnesis
– level of education

13 DM effort: Data preprocessing
Contr_Mod_2 → large event sequence
For each patient: a subsequence containing all his control examinations
The coding guarantees that events corresponding to 2 different patients cannot be associated in the same episode rule
Large event sequence: concatenation of all subsequences constructed for the patients
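The slides do not say how the coding keeps patients apart. One possible scheme, shown here purely as an assumption, is to shift each patient's subsequence in time so that it starts more than the maximal gap after the previous patient's last event; no episode occurrence respecting the gap constraint can then span two patients. The function name and data layout are hypothetical.

```python
def build_event_sequence(patient_sequences, max_gap):
    """Concatenate per-patient subsequences into one large event sequence.

    patient_sequences: dict mapping a patient id to a list of
    (time, event_type) pairs, with times relative to that patient.
    Assumption: consecutive patients are separated by max_gap + 1 time
    units, so the maximal gap constraint forbids cross-patient patterns.
    """
    sequence = []
    offset = 0
    for patient_id in sorted(patient_sequences):
        events = sorted(patient_sequences[patient_id])
        if not events:
            continue
        start = events[0][0]
        for t, typ in events:
            # Re-base the patient's times onto the global timeline.
            sequence.append((offset + t - start, typ))
        # The next patient starts beyond the reach of the gap constraint.
        offset = sequence[-1][0] + max_gap + 1
    return sequence

patients = {
    1: [(0, "chol_high"), (12, "diet_sometimes")],
    2: [(5, "smoker"), (8, "chol_normal")],
}
large_sequence = build_event_sequence(patients, max_gap=24)
```

The event labels in the example are invented for illustration and do not come from the STULONG coding.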

14 DM effort: Results
Examples:
– "If the patient has no hypercholesterolemia, and if he sometimes follows his diet, then the patient has no hypercholesterolemia with a probability of 0.8 within 40 months. This rule is supported by 201 examples in the event sequence."
– "If the patient eats less fat and carbohydrates and claudication is observed some time later, then this claudication does not disappear with a probability of 0.8 over 30 months. This rule is supported by 21 examples."

15 DM effort: Results
Well-known phenomena are found:
– an indication that both the pre-processing and the mining of the data are correct
Added value: suggestions concerning the temporal aspects of these phenomena
To be expected:
– with new data and the new risk factors brought to light in the last decade, the discovery of new phenomena along with their optimal window sizes

16 Conclusion
With STULONG data: a search for temporal dependencies, each with an optimal interval/window size, between atherosclerosis risk factors and clinical manifestations of atherosclerosis
Offers the medical expert a way to make explicit the impact of a risk factor and to refine its role relative to other factors within a time interval
Only a few episode rules are obtained, which allows experts to analyse the outputs manually
Could be applied to other medical data sets to help find unknown phenomena
→ New perspectives both for data miners and physicians