Finding Fuzzy Approximate Dependencies within STULONG Data. Discovery Challenge, ECML/PKDD 2003, September 22-27, 2003. Berzal F., Cubero J.C., Sanchez.


Finding Fuzzy Approximate Dependencies within STULONG Data
Discovery Challenge, ECML/PKDD 2003, September 22-27, 2003
Berzal F., Cubero J.C., Sanchez D., Serrano J.M., Vila M.A.
University of Granada (Spain)

Introduction
KDD allows us to obtain relations within data that are non-trivial, previously unknown, and potentially useful.
Fuzzy data → extensions of KDD tools and techniques.

Problem representation
Fuzzy relational database.
a_ij values: numeric, scalar (nominal), or linguistic labels.
Membership degrees μ_t(A_j).
Fuzzy similarity relations S_A1, ..., S_Am.
t#  | A1            | A2            | ... | Am
t1  | a11, μ_t1(A1) | a12, μ_t1(A2) | ... | a1m, μ_t1(Am)
t2  | a21, μ_t2(A1) | a22, μ_t2(A2) | ... | a2m, μ_t2(Am)
t3  | a31, μ_t3(A1) | a32, μ_t3(A2) | ... | a3m, μ_t3(Am)
... | ...           | ...           | ... | ...

Fuzzy Approximate Dependencies
We define Fuzzy Approximate Dependencies (FADs) by relaxing some properties of functional dependencies:
V → W ⇔ ∀ t, s: t[V] = s[V] ⇒ t[W] = s[W]
Equality relaxation: linguistic labels and membership degrees are taken into account.
Universal quantifier relaxation: exceptions are allowed.
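The relaxed universal quantifier can be illustrated by counting pairs of tuples instead of requiring every pair to comply. The sketch below is only an illustration under stated assumptions: it uses crisp equality (no membership degrees or similarity relations), and the function name `pair_support_confidence` and the toy rows are hypothetical, not taken from the slides.

```python
from itertools import combinations

def pair_support_confidence(rows, V, W):
    """Support and confidence of V -> W measured over unordered pairs of tuples.

    Assumption of this sketch: a pair (t, s) "agrees" on a set of
    attributes when the values are exactly equal (crisp equality).
    """
    total = agree_v = agree_vw = 0
    for t, s in combinations(rows, 2):
        total += 1
        same_v = all(t[a] == s[a] for a in V)
        same_w = all(t[a] == s[a] for a in W)
        agree_v += same_v
        agree_vw += same_v and same_w
    support = agree_vw / total if total else 0.0
    confidence = agree_vw / agree_v if agree_v else 0.0
    return support, confidence

# Hypothetical toy relation: the third row is an exception to V -> W.
rows = [
    {"V": "a", "W": "x"},
    {"V": "a", "W": "x"},
    {"V": "a", "W": "y"},
    {"V": "b", "W": "y"},
]
support, confidence = pair_support_confidence(rows, ["V"], ["W"])
```

A strict functional dependency would require confidence 1; allowing exceptions means accepting dependencies whose confidence is merely high.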

FAD Measures
Relevance degree: support, supp(V → W).
Fulfilment degrees: confidence, conf(V → W), and certainty factor, CF(V → W) [Shortliffe and Buchanan, 1975].
The certainty factor measures variations in the degree of belief:
CF(V → W) = 1: maximum increment (perfect positive).
CF(V → W) = –1: maximum decrement.
CF(V → W) = 0: statistical independence.
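The slides do not give a formula for the certainty factor. A commonly used formulation for rules, assumed here, compares the confidence of V → W with the support of the consequent W alone; the function name and argument names below are hypothetical.

```python
def certainty_factor(conf_vw, supp_w):
    """Certainty factor of V -> W from conf(V -> W) and supp(W).

    Assumed formulation (not stated in the slides):
      conf > supp(W): (conf - supp(W)) / (1 - supp(W))   belief increases
      conf < supp(W): (conf - supp(W)) / supp(W)         belief decreases
      conf = supp(W): 0                                  independence
    """
    if conf_vw > supp_w:
        return (conf_vw - supp_w) / (1.0 - supp_w)
    if conf_vw < supp_w:
        return (conf_vw - supp_w) / supp_w
    return 0.0
```

Under this formulation the three reference values on the slide correspond to conf = 1, conf = 0, and conf = supp(W) respectively.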

Applications
Fuzzy databases.
Approximate dependencies discovery.
Functional dependencies discovery.
Other applications: low granularity data, overlapping semantics.

STULONG Database
Entry table.
Normal group: attribute KONSKUP with value 1 or 2.
Risk group: attribute KONSKUP with value 3 or 4.
Pathologic group: attribute KONSKUP with value 5.

Data Preprocessing (I)
Problem: semantic overlapping in symbolic or scalar attributes.
Solution: fuzzy similarity relations (subjective).
E.g.: DOPRAVA (means of transport for getting to work).
[Similarity matrix over the values on foot, by bike, public means, car, not stated; only the entries 0.4 and 0.0 survive in the transcript.]
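A fuzzy similarity relation of this kind can be stored as a symmetric lookup table. The sketch below is illustrative only: every degree in `SIMILARITY` is a hypothetical placeholder, since the actual matrix for DOPRAVA is not recoverable from the transcript, and the function name `similarity` is likewise an assumption.

```python
# Values of DOPRAVA as listed on the slide.
VALUES = ["on foot", "by bike", "public means", "car", "not stated"]

# Hypothetical similarity degrees, chosen only for illustration.
# Unordered pairs are stored once; missing pairs default to 0.0.
SIMILARITY = {
    frozenset(("on foot", "by bike")): 0.4,
    frozenset(("public means", "car")): 0.3,
}

def similarity(a, b):
    """Reflexive, symmetric similarity degree between two attribute values."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset((a, b)), 0.0)
```

With such a relation, two tuples can agree on DOPRAVA to a degree between 0 and 1 rather than only when their values are identical.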

Data Preprocessing (II)
Problem: high granularity in numeric attributes.
Solution: sets of linguistic labels defined starting from intervals.
Numeric value → linguistic label.
E.g.: BMI (body mass index).
[Figure: linguistic labels for BMI; the labels thin and overweight survive in the transcript.]
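Defining linguistic labels from intervals can be sketched with trapezoidal membership functions. The breakpoints below and the intermediate label "normal" are hypothetical; the slide only preserves the labels thin and overweight and gives no numeric boundaries.

```python
INF = float("inf")

def trapezoid(x, a, b, c, d):
    """Membership that rises on (a, b), equals 1 on [b, c], falls on (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

# Hypothetical label definitions for BMI (illustrative breakpoints only).
BMI_LABELS = {
    "thin": (-INF, -INF, 18.0, 20.0),
    "normal": (18.0, 20.0, 24.0, 26.0),
    "overweight": (24.0, 26.0, INF, INF),
}

def fuzzify(value, labels=BMI_LABELS):
    """Map a numeric value to a membership degree for every linguistic label."""
    return {name: trapezoid(value, *params) for name, params in labels.items()}
```

Overlapping intervals let a borderline value belong partially to two neighbouring labels, which is what reduces the granularity of the numeric attribute without imposing sharp cut-offs.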

Analytical Questions (I)
Dependencies between social factors and physical activity.
Columns: ROKVSTUP | STAV | VZDELANI | ZODPOV
TELAKTZA: 0.67/ / /0.28
AKTPOZAM: 0.14/ / / /0.47
DOPRAVA: 0.20/ / / /0.32
DOPRATRV: 0.17/ / / /0.44

Analytical Questions (II)
Dependencies between social factors and smoking.
Columns: ROKVSTUP | STAV | VZDELANI | ZODPOV
KOURENI: 0.68/0.07
DOBAKOUR: 0.64/ /0.25
BYVKURAK: 0.10/ / / /0.64

Analytical Questions (III)
Dependencies between social factors and alcohol consumption.
Columns: ROKVSTUP | STAV | VZDELANI | ZODPOV
ALKOHOL: 0.21/ / / /0.31
PIVO10: 0.16/ / / /0.41
PIVO12: 0.10/ / / /0.61
VINO: 0.16/ / / /0.41
LIHOV: 0.16/ / / /0.41
PIVOMN: 0.21/ / / /0.29
VINOMN: 0.20/ / / /0.31
LIHMN: 0.20/ / / /0.29

Analytical Questions (IV)
Dependencies between social factors and physical features.
Columns: ROKVSTUP | STAV | VZDELANI | ZODPOV
BMI: 0.16/ / / /0.42
SYST1: 0.65/ /0.26
DIAST1: 0.19/ / / /0.30
SYST2: 0.65/ /0.25
DIAST2: 0.19/ / / /0.30

Analytical Questions (V)
Dependencies between physical activity and smoking.
Columns: TELAKTZA | AKTPOZAM | DOPRAVA | DOPRATRV
KOURENI: 0.50/ /0.13
DOBAKOUR: 0.27/ / / /0.19
BYVKURAK: 0.13/ / / /0.55

Analytical Questions (VI)
Dependencies between physical activity and alcohol consumption.
Columns: TELAKTZA | AKTPOZAM | DOPRAVA | DOPRATRV
ALKOHOL: 0.27/ / / /0.25
PIVO10: 0.22/ / / /0.33
PIVO12: 0.14/ / / /0.50
VINO: 0.22/ / / /0.33
LIHOV: 0.22/ / / /0.33
PIVOMN: 0.27/ / / /0.24
VINOMN: 0.27/ / / /0.24
LIHMN: 0.27/ / / /0.23

Analytical Questions (VII)
Dependencies between physical activity and physical features.
Columns: TELAKTZA | AKTPOZAM | DOPRAVA | DOPRATRV
BMI: 0.21/ / / /0.34
SYST1: 0.27/ / / /0.21
DIAST1: 0.25/ / / /0.23
SYST2: 0.27/ / / /0.20
DIAST2: 0.25/ / / /0.24

Analytical Questions (VIII)
Dependencies between physical activity and cholesterol levels.
Columns: TELAKTZA | AKTPOZAM | DOPRAVA | DOPRATRV
CHLST: 0.28/ / / /0.19
TRIGL: 0.49/ /0.14

Analytical Questions (IX)
Dependencies between alcohol consumption and physical features.
Columns: BMI | SYST1 | DIAST1 | SYST2 | DIAST2
ALKOHOL: 0.40/ / / / /0.29
PIVO10: 0.35/ / / / /0.38
PIVO12: 0.25/ / / / /0.58
VINO: 0.35/ / / / /0.38
LIHOV: 0.35/ / / / /0.38
PIVOMN: 0.41/ / / / /0.27
VINOMN: 0.40/ / / / /0.28
LIHMN: 0.41/ / / / /0.27

Analytical Questions (X)
Dependencies between alcohol consumption and smoking.
Columns: KOURENI | DOBAKOUR | BYVKURAK
ALKOHOL: 0.23/ /0.15
PIVO10: 0.13/ / /0.22
PIVO12: 0.08/ / /0.40
VINO: 0.13/ / /0.22
LIHOV: 0.13/ / /0.22
PIVOMN: 0.23/ /0.14
VINOMN: 0.23/ /0.15
LIHMN: 0.24/ /0.14

Analytical Questions (XI)
Dependencies between skin folds and BMI.
[TRIC] → [BMI]: supp 15.85%, CF 0.54
[SUBSC] → [BMI]: supp 17.28%, CF 0.58

Concluding Remarks
FADs allow us to discover relations within imprecise or uncertain data.
Expert aid is desirable for data preprocessing and for interpreting the results.