SDS-Rules and Association Rules. March 17, 2004, Nicosia, Cyprus. Tomáš Karban (1), Jan Rauch (2), Milan Šimůnek (2). (1) Charles University, Prague, Dept. of Software.


Slide 1: SDS-Rules and Association Rules
Tomáš Karban (1), Jan Rauch (2), Milan Šimůnek (2)
(1) Charles University, Prague, Dept. of Software Engineering
(2) University of Economics, Prague, Dept. of Information and Knowledge Engineering
ACM Symposium on Applied Computing, SAC 2004
March 17, 2004, Nicosia, Cyprus

Slide 2: Agenda
Introduction to association rules
Motivation of SDS-rules
SDS-rules in detail
SDS quantifiers
Disjoint sets
Implementation technique
Application on medical data
Conclusion

Slide 3: Association Rules (1)
Express a relation between a premise (antecedent) and a consequence (succedent): φ ≈ ψ
φ and ψ are Boolean attributes derived as conjunctions from columns of the studied data table (rows = objects)
≈ stands for a quantifier: the truth condition of the association rule, based on the contingency table of φ and ψ
Example: account(low) & salary(low) ⇒(90%) loan_quality(bad)

Slide 4: Association Rules (2)
Contingency table:

         ψ    ¬ψ
   φ     a    b
  ¬φ     c    d

Founded implication
Various quantifiers available: implications, double implications, equivalence, statistical hypothesis tests, above/outside average relations, etc.
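The contingency table and a quantifier evaluated on it can be sketched as follows. This is only an illustration: the function names and the toy data are assumptions, and the founded implication is assumed in its usual GUHA form, a/(a+b) ≥ p and a ≥ Base, since the slide's own formula did not survive in the transcript.

```python
from typing import Sequence, Tuple


def fourfold_table(phi: Sequence[bool], psi: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count the frequencies (a, b, c, d) of the contingency table of phi and psi.

    a: phi and psi, b: phi and not psi, c: not phi and psi, d: neither.
    """
    if len(phi) != len(psi):
        raise ValueError("phi and psi must describe the same objects")
    a = b = c = d = 0
    for has_phi, has_psi in zip(phi, psi):
        if has_phi and has_psi:
            a += 1
        elif has_phi:
            b += 1
        elif has_psi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(a: int, b: int, p: float, base: int) -> bool:
    """Assumed form of the quantifier: a/(a+b) >= p and a >= base."""
    if a + b == 0:
        return False  # no object satisfies the antecedent
    return a / (a + b) >= p and a >= base


# Hypothetical data: 10 objects satisfy the antecedent, 9 of them the succedent.
antecedent = [True] * 10 + [False] * 5
succedent = [True] * 9 + [False] * 1 + [True] * 2 + [False] * 3
a, b, c, d = fourfold_table(antecedent, succedent)
print(a, b, c, d)
print(founded_implication(a, b, p=0.9, base=5))
```

Only a and b enter the founded implication; c and d matter for other quantifier classes such as double implications and equivalences.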

Slide 5: Motivation of SDS-rules
Describe interesting relations between pairs of disjoint sets (usually capturing their difference)
Work in a similar way and with the same methods as association rules
Examples:
  get pairs of sets that differ significantly in a selected property
  get all properties that differ on a fixed pair of sets
  a combination of both
The motivation comes directly from the demands of the STULONG project (atherosclerosis risk factors)

Slide 6: SDS-Rules (1)
SDS-rules can be understood as an extension of association rules
SDS-rules have the form ≈(α, β, γ)
α and β define two disjoint sets A and B
γ defines some property
The symbol ≈ stands for an SDS-quantifier, which defines the relation of the two sets with respect to the property γ

Slide 7: SDS-Rules (2)
The table of frequencies is extended to a six-fold table (called the "SDS-table"):

                  γ    ¬γ
  α               a    b     first set
  β               c    d     second set
  (¬α) ∧ (¬β)     e    f     outside both sets
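A minimal sketch of how the six frequencies could be counted from three Boolean columns (membership in the first set, membership in the second set, presence of the property). The function name `sds_table` and the list-of-Booleans representation are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence, Tuple


def sds_table(
    first: Sequence[bool], second: Sequence[bool], prop: Sequence[bool]
) -> Tuple[int, int, int, int, int, int]:
    """Return (a, b, c, d, e, f) of the six-fold SDS-table.

    a, b: first set with / without the property
    c, d: second set with / without the property
    e, f: objects outside both sets with / without the property
    """
    if not (len(first) == len(second) == len(prop)):
        raise ValueError("all columns must describe the same objects")
    a = b = c = d = e = f = 0
    for in_first, in_second, has_prop in zip(first, second, prop):
        if in_first and in_second:
            raise ValueError("the two sets must be disjoint")
        if in_first:
            if has_prop:
                a += 1
            else:
                b += 1
        elif in_second:
            if has_prop:
                c += 1
            else:
                d += 1
        elif has_prop:
            e += 1
        else:
            f += 1
    return a, b, c, d, e, f


# Hypothetical data: six objects, two in each row of the table.
first = [True, True, False, False, False, False]
second = [False, False, True, True, False, False]
prop = [True, False, True, False, True, False]
print(sds_table(first, second, prop))  # (1, 1, 1, 1, 1, 1)
```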

Slide 8: Asymmetric Multiplicative Difference Quantifier
The percentage of objects with the property γ in the first set is at least k times the percentage in the second set
Both sets have size bigger than Base
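Read against the SDS-table (a, b for the first set; c, d for the second set), the two conditions might be written as in the sketch below. The slide's own formula is not preserved, so the exact inequalities are an assumption derived from the wording: a/(a+b) ≥ k · c/(c+d), with both set sizes strictly greater than Base. The function name is hypothetical.

```python
def asymmetric_multiplicative_difference(
    a: int, b: int, c: int, d: int, k: float, base: int
) -> bool:
    """Assumed reading: a/(a+b) >= k * c/(c+d), a+b > base and c+d > base."""
    size_first = a + b
    size_second = c + d
    if size_first == 0 or size_second == 0:
        return False  # a relative frequency is undefined for an empty set
    if size_first <= base or size_second <= base:
        return False
    # Cross-multiplied form of a/size_first >= k * c/size_second,
    # which avoids floating-point division.
    return a * size_second >= k * c * size_first


# Hypothetical frequencies: 30% of the first set and 10% of the second set
# have the property.
print(asymmetric_multiplicative_difference(30, 70, 10, 90, k=2, base=50))  # True
print(asymmetric_multiplicative_difference(30, 70, 10, 90, k=4, base=50))  # False
```

The quantifier is asymmetric because swapping the two sets changes the result: with the frequencies above, the second set does not contain twice the percentage of the first.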

Slide 9: Symmetric Additive Difference Quantifier
The percentage of objects with the property γ differs between the first and the second set by at least p
Both sets have size bigger than Base
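A corresponding sketch for this quantifier, again under assumptions: the formula is reconstructed from the wording as |a/(a+b) − c/(c+d)| ≥ p with both set sizes strictly greater than Base, p is taken as a fraction between 0 and 1, and the function name is hypothetical. Exact rational arithmetic is used so that borderline cases are not decided by floating-point rounding.

```python
from fractions import Fraction


def symmetric_additive_difference(
    a: int, b: int, c: int, d: int, p: Fraction, base: int
) -> bool:
    """Assumed reading: |a/(a+b) - c/(c+d)| >= p, a+b > base and c+d > base."""
    size_first = a + b
    size_second = c + d
    if size_first == 0 or size_second == 0:
        return False  # a relative frequency is undefined for an empty set
    if size_first <= base or size_second <= base:
        return False
    difference = abs(Fraction(a, size_first) - Fraction(c, size_second))
    return difference >= p


# Hypothetical frequencies: 30% versus 10%, i.e. a difference of 20 points.
print(symmetric_additive_difference(30, 70, 10, 90, Fraction(1, 5), base=50))  # True
print(symmetric_additive_difference(10, 90, 30, 70, Fraction(1, 5), base=50))  # True
print(symmetric_additive_difference(30, 70, 10, 90, Fraction(1, 4), base=50))  # False
```

Swapping the two sets leaves the result unchanged, which is what makes the quantifier symmetric.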

Slide 10: Disjoint Sets
An empty intersection of the sets can be arranged syntactically by forcing a common attribute into α and β
Coefficients (i.e. values of the attribute) of the common attribute are disjoint ⇒ the sets are disjoint
Example:
  account(low) & salary(mid)
  salary(low) & sex(male)
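The syntactic check can be sketched as below. The representation of a conjunction as a mapping from attribute name to its coefficient (a set of admissible values) and the function name are assumptions made for illustration; the sketch also assumes every object has exactly one value per attribute.

```python
from typing import Dict, Set

# A conjunction such as account(low) & salary(mid) is modelled as
# {"account": {"low"}, "salary": {"mid"}} (hypothetical representation).
Conjunction = Dict[str, Set[str]]


def syntactically_disjoint(alpha: Conjunction, beta: Conjunction) -> bool:
    """True if some attribute common to both conjunctions has disjoint coefficients.

    This is a sufficient condition only: two conjunctions without such an
    attribute may still happen to select disjoint sets in a particular data table.
    """
    common_attributes = alpha.keys() & beta.keys()
    return any(alpha[attr].isdisjoint(beta[attr]) for attr in common_attributes)


# The example from the slide: salary is the common attribute.
alpha = {"account": {"low"}, "salary": {"mid"}}
beta = {"salary": {"low"}, "sex": {"male"}}
print(syntactically_disjoint(alpha, beta))  # True
```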

Slide 11: Implementation Technique
Data representation: bit strings for every value of every attribute being used
Bit string length = number of objects in the data table
Value "1" at position i of bit string s(x) = object i has value x for the attribute s
Fast operations on bit strings: AND, OR, NOT
Building bit strings for the first set, the second set and the studied property γ
Calculation of the SDS-table: counting the "1" bits in bit strings
Truth value of an SDS-rule: an expression on frequencies from the SDS-table
Memory-conservative
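The steps above can be sketched with Python integers standing in for bit strings. This is an illustrative assumption, not the authors' implementation: the helper names, the toy columns and the use of arbitrary-precision integers are all hypothetical.

```python
from typing import Dict, Sequence, Tuple


def build_bit_strings(column: Sequence[str]) -> Dict[str, int]:
    """For each value x of an attribute, build an int whose bit i is 1
    exactly when object i has value x."""
    strings: Dict[str, int] = {}
    for i, value in enumerate(column):
        strings[value] = strings.get(value, 0) | (1 << i)
    return strings


def popcount(bits: int) -> int:
    """Count the "1" bits of a non-negative bit string."""
    return bin(bits).count("1")


def sds_table_from_bits(
    first: int, second: int, prop: int, n_objects: int
) -> Tuple[int, int, int, int, int, int]:
    """Compute (a, b, c, d, e, f) using only AND, OR, NOT and bit counting."""
    mask = (1 << n_objects) - 1          # keeps NOT within the table length
    not_prop = ~prop & mask
    outside = ~(first | second) & mask   # objects in neither set
    return (
        popcount(first & prop), popcount(first & not_prop),
        popcount(second & prop), popcount(second & not_prop),
        popcount(outside & prop), popcount(outside & not_prop),
    )


# Hypothetical data table with seven objects (rows).
account = build_bit_strings(["low", "low", "high", "low", "high", "low", "high"])
salary = build_bit_strings(["mid", "mid", "low", "low", "low", "mid", "mid"])
sex = build_bit_strings(["male", "female", "male", "male", "male", "male", "female"])
loan = build_bit_strings(["bad", "good", "bad", "bad", "good", "good", "bad"])

first_set = account["low"] & salary["mid"]   # account(low) & salary(mid)
second_set = salary["low"] & sex["male"]     # salary(low) & sex(male)
prop = loan["bad"]                           # the studied property

table = sds_table_from_bits(first_set, second_set, prop, n_objects=7)
print(table)  # (1, 2, 2, 1, 1, 0)
```

Each conjunction costs one AND per literal over the whole table, and the six frequencies are obtained without visiting objects one by one, which is where the speed of the approach comes from.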

Slide 12: Application on Medical Data
The STULONG project ("longitudinal study") studied the prevalence of risk factors of atherosclerosis
About 1400 middle-aged men; detailed entry examination, 20 years of checkups
Among many other analytical questions:
  Are there strong relations between the entry examination and the cause of death?
  Are there differences in the entry examination between men of the risk group who came down with an observed cardiovascular disease (during control examinations) and those who stayed healthy?

Slide 13: Results (1)
If we compare the group of patients who are divorced, have an apprentice school education and have other responsibility in their jobs with the second group of patients, who are already pensioners, there is a 53.8% difference in the presence of other cause of death.

Slide 14: Results (1) (figure only, not preserved in the transcript)

Slide 15: Results (2)
If we compare the group of patients who came down with some cardiovascular disease during the control checks with those who stayed healthy, we see that the second group contained 3.97% more patients working in a managerial position.

Slide 16: Results (2) (figure only, not preserved in the transcript)

Slide 17: Conclusion
A new method of describing potentially interesting patterns by SDS-rules was presented
The method was inspired by and applied to medical data; other application domains can benefit as well
The method is computationally efficient
Drawback: the results are usually large, and the SDS-rules produced are similar in certain domains ("nuggets")
  additional software tool for "online result browsing"
Development of statistical SDS-quantifiers is in progress