
1 Finding Fuzzy Approximate Dependencies within STULONG
Data Discovery Challenge, ECML/PKDD 2003, September 22-27, 2003
Berzal F., Cubero J.C., Sanchez D., Serrano J.M., Vila M.A.
University of Granada (Spain)

2 Introduction (Discovery Challenge – ECML/PKDD 2003)
KDD allows us to obtain relations within data that are:
- non-trivial,
- previously unknown,
- potentially useful.
Fuzzy data → extensions of KDD tools and techniques.

3 Problem representation
Fuzzy relational database:
- $a_{ij}$ values: numeric, scalar (nominal), or linguistic labels.
- Membership degrees.
- Fuzzy similarity relations $S_{A_1}, \dots, S_{A_m}$.

$$
\begin{array}{c|cccc}
t\# & A_1 & A_2 & \cdots & A_m \\ \hline
t_1 & a_{11}, \mu_{t_1}(A_1) & a_{12}, \mu_{t_1}(A_2) & \cdots & a_{1m}, \mu_{t_1}(A_m) \\
t_2 & a_{21}, \mu_{t_2}(A_1) & a_{22}, \mu_{t_2}(A_2) & \cdots & a_{2m}, \mu_{t_2}(A_m) \\
t_3 & a_{31}, \mu_{t_3}(A_1) & a_{32}, \mu_{t_3}(A_2) & \cdots & a_{3m}, \mu_{t_3}(A_m) \\
\vdots & \vdots & \vdots & & \vdots
\end{array}
$$
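To make the representation concrete, here is a minimal Python sketch of such a table: each cell stores a value together with its membership degree, and each attribute may carry its own similarity relation. The class and method names (`FuzzyRelation`, `add`, `resemblance`, `crisp_equality`) are hypothetical, and the rule used in `resemblance` (similarity capped by both membership degrees via `min`) is an illustrative assumption, not the authors' definition.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# One cell of the table: the value a_ij together with its membership degree mu_ti(A_j).
Cell = Tuple[Any, float]
# A similarity relation S_A maps a pair of domain values to a degree in [0, 1].
Similarity = Callable[[Any, Any], float]


def crisp_equality(a: Any, b: Any) -> float:
    """Fallback similarity for attributes without an S_A: 1 if equal, else 0."""
    return 1.0 if a == b else 0.0


@dataclass
class FuzzyRelation:
    attributes: List[str]                                            # A_1, ..., A_m
    similarity: Dict[str, Similarity] = field(default_factory=dict)  # S_{A_1}, ..., S_{A_m}
    tuples: List[Dict[str, Cell]] = field(default_factory=list)      # t_1, t_2, ...

    def add(self, cells: Dict[str, Cell]) -> None:
        """Append a tuple after checking it has one cell per attribute and valid degrees."""
        if set(cells) != set(self.attributes):
            raise ValueError("a tuple needs exactly one cell per attribute")
        for _value, degree in cells.values():
            if not 0.0 <= degree <= 1.0:
                raise ValueError("membership degrees must lie in [0, 1]")
        self.tuples.append(dict(cells))

    def resemblance(self, attr: str, i: int, k: int) -> float:
        """Hypothetical rule: S_A(a, b) capped by both membership degrees (via min)."""
        (a, mu_a), (b, mu_b) = self.tuples[i][attr], self.tuples[k][attr]
        s = self.similarity.get(attr, crisp_equality)
        return min(s(a, b), mu_a, mu_b)
```

For example, two tuples that share the value `"x"` on `A1` but hold it with degrees 1.0 and 0.6 resemble each other on `A1` only to degree 0.6 under this rule. Other t-norms (such as the product) would be equally reasonable choices.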

4 Fuzzy Approximate Dependencies
We define Fuzzy Approximate Dependencies (FADs) by relaxing some properties of Functional Dependencies:
$V \rightarrow W \iff \forall t, s:\ t[V] = s[V] \Rightarrow t[W] = s[W]$
- Equality relaxation: considering linguistic labels and membership degrees.
- Universal quantifier relaxation: allowing exceptions.
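As a baseline for the two relaxations, the following Python sketch checks the classical functional dependency $V \rightarrow W$ with crisp equality and lists the pairs of tuples that violate it. The function names are illustrative, not taken from the paper. A single violating pair is enough to refute a classical FD, which is the rigidity that the universal quantifier relaxation removes.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def agree(t: Row, s: Row, attrs: Sequence[str]) -> bool:
    """t[X] = s[X]: both tuples take the same value on every attribute in attrs."""
    return all(t[a] == s[a] for a in attrs)


def fd_violations(rows: List[Row], V: Sequence[str], W: Sequence[str]) -> List[Tuple[int, int]]:
    """Index pairs (i, k), i < k, with t_i[V] = t_k[V] but t_i[W] != t_k[W]."""
    return [
        (i, k)
        for (i, t), (k, s) in combinations(enumerate(rows), 2)
        if agree(t, s, V) and not agree(t, s, W)
    ]


def holds_fd(rows: List[Row], V: Sequence[str], W: Sequence[str]) -> bool:
    """Classical FD: the universal quantifier admits no exception at all."""
    return not fd_violations(rows, V, W)
```

Unordered pairs suffice because both sides of the implication are symmetric in `t` and `s`.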

5 FAD Measures
Relevance degree: support, supp(V → W).
Fulfilment degrees:
- Confidence, conf(V → W).
- Certainty factor, CF(V → W) [Shortliffe and Buchanan, 1975], which measures variations in the degree of belief:
  - CF(V → W) = 1: maximum increment (perfect positive dependency).
  - CF(V → W) = -1: maximum decrement.
  - CF(V → W) = 0: statistical independence.
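A sketch of the three measures for the crisp case only. It assumes (this is our reading, not stated on the slide) that V → W is evaluated as the rule "t and s agree on V ⇒ t and s agree on W" over all unordered pairs of distinct tuples, and that CF is the usual certainty factor of a rule computed from its confidence and the support of its consequent. A fuzzy version would replace crisp agreement with similarity degrees and counts with fuzzy cardinalities; this sketch does not attempt that. Function names are hypothetical.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def agree(t: Row, s: Row, attrs: Sequence[str]) -> bool:
    """Crisp t[X] = s[X] on every attribute in attrs."""
    return all(t[a] == s[a] for a in attrs)


def certainty_factor(conf: float, supp_w: float) -> float:
    """Certainty factor of a rule from its confidence and its consequent's support."""
    if conf > supp_w:
        return (conf - supp_w) / (1.0 - supp_w)  # belief grows; 1 = perfect positive
    if conf < supp_w:
        return (conf - supp_w) / supp_w          # belief drops; -1 = maximum decrement
    return 0.0                                   # statistical independence


def dependency_measures(rows: List[Row], V: Sequence[str], W: Sequence[str]) -> Dict[str, float]:
    """supp, conf and CF of V -> W over all unordered pairs of distinct tuples."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two tuples")
    n = len(pairs)
    n_v = sum(agree(t, s, V) for t, s in pairs)
    n_w = sum(agree(t, s, W) for t, s in pairs)
    n_vw = sum(agree(t, s, V) and agree(t, s, W) for t, s in pairs)
    if n_v == 0:
        # No pair agrees on V, so the data say nothing about the dependency.
        return {"supp": 0.0, "conf": 0.0, "CF": 0.0}
    conf = n_vw / n_v
    return {"supp": n_vw / n, "conf": conf, "CF": certainty_factor(conf, n_w / n)}
```

Neither branch of `certainty_factor` can divide by zero: `conf > supp_w` implies `supp_w < 1`, and `conf < supp_w` implies `supp_w > 0`.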

6 Applications
- Fuzzy databases.
- Approximate dependency discovery.
- Functional dependency discovery.
- Other applications: low-granularity data; overlapping semantics.

7 STULONG Database
Entry table, split into three groups by attribute KONSKUP:
- Normal group: KONSKUP = 1 or 2.
- Risk group: KONSKUP = 3 or 4.
- Pathologic group: KONSKUP = 5.
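The group split above can be sketched in Python as follows. This assumes that KONSKUP is read as an integer code from 1 to 5 and that each record is a mapping keyed by attribute name; the names `GROUPS`, `konskup_group` and `split_by_group` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

# KONSKUP code -> group, as listed on the slide.
GROUPS = {1: "normal", 2: "normal", 3: "risk", 4: "risk", 5: "pathologic"}


def konskup_group(code: int) -> str:
    """Map a KONSKUP code to its group name; reject unexpected codes."""
    try:
        return GROUPS[code]
    except KeyError:
        raise ValueError(f"unexpected KONSKUP value: {code!r}") from None


def split_by_group(records: Iterable[Mapping[str, Any]]) -> Dict[str, List[Mapping[str, Any]]]:
    """Partition Entry-table records into the normal, risk and pathologic groups."""
    groups: Dict[str, List[Mapping[str, Any]]] = defaultdict(list)
    for record in records:
        # int() accepts codes stored either as numbers or as numeric strings.
        groups[konskup_group(int(record["KONSKUP"]))].append(record)
    return dict(groups)
```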

8 Data Preprocessing (I)
Problem: semantic overlap in symbolic or scalar attributes.
Fuzzy similarity relations (subjective).
E.g., DOPRAVA (means of transport for getting to work):

                 by bike   public means   car    not stated
on foot          0.4       0.3                   0.0
by bike                    0.3                   0.0
public means                              0.4    0.0
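A minimal Python sketch of such a similarity relation for a nominal attribute like DOPRAVA. It enforces reflexivity (every value is fully similar to itself) and symmetry (S(a, b) = S(b, a)); pairs not listed fall back to a default, which here is assumed to be 0.0. The class name `SimilarityRelation` is hypothetical. In the check below, only the two 0.4 degrees (on foot/by bike, public means/car) are taken from the slide.

```python
from typing import Dict, FrozenSet, Hashable, Tuple


class SimilarityRelation:
    """Reflexive, symmetric fuzzy similarity relation on a scalar (nominal) domain."""

    def __init__(self, degrees: Dict[Tuple[Hashable, Hashable], float], default: float = 0.0) -> None:
        self._degrees: Dict[FrozenSet[Hashable], float] = {}
        for (a, b), degree in degrees.items():
            if not 0.0 <= degree <= 1.0:
                raise ValueError(f"degree for {a!r}/{b!r} must lie in [0, 1]")
            # Storing the unordered pair makes S(a, b) == S(b, a) by construction.
            self._degrees[frozenset((a, b))] = degree
        self._default = default  # assumed degree for pairs the expert left unspecified

    def __call__(self, a: Hashable, b: Hashable) -> float:
        if a == b:
            return 1.0  # reflexivity: every value is fully similar to itself
        return self._degrees.get(frozenset((a, b)), self._default)
```

Because the relation is subjective, keeping it as plain data (a dictionary of pairs) lets a domain expert revise the degrees without touching the code.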

9 Data Preprocessing (II)
Problem: high granularity in numeric attributes.
Linguistic label sets are defined starting from intervals: numeric value → linguistic labels.
E.g., BMI (body mass index):
[Figure: membership functions of the linguistic labels "thin" and "overweight" for BMI; the values 24.73, 25.0 and 25.12 appear on the axis.]
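One plausible way to turn crisp intervals into linguistic labels is sketched below in Python. Each crisp cut point is widened into a linear transition, which yields overlapping trapezoidal labels, and a numeric value is then replaced by the labels it belongs to with nonzero degree. The trapezoid construction, the `width` parameter and all function names are assumptions for illustration. The slide does not give the authors' label definitions, and the check below uses made-up cut points and label names rather than BMI values.

```python
from typing import Dict, List, Tuple

# (a, b, c, d): membership rises on [a, b], equals 1 on [b, c], falls on [c, d].
Trapezoid = Tuple[float, float, float, float]


def trapezoid(x: float, t: Trapezoid) -> float:
    """Membership degree of x in a trapezoidal linguistic label."""
    a, b, c, d = t
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def labels_from_cuts(names: List[str], cuts: List[float], width: float) -> Dict[str, Trapezoid]:
    """Turn crisp intervals (-inf, c1], (c1, c2], ..., (ck, +inf) into overlapping labels.

    Each cut point becomes a linear transition of total `width` centred on it, so
    neighbouring labels overlap and their degrees sum to 1 inside the transition.
    """
    if len(names) != len(cuts) + 1:
        raise ValueError("need exactly one more label name than cut points")
    gaps = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
    if width <= 0 or any(g < width for g in gaps):
        raise ValueError("width must be positive and no wider than the gap between cuts")
    half = width / 2.0
    edges = [float("-inf")] + list(cuts) + [float("inf")]
    return {
        name: (edges[i] - half, edges[i] + half, edges[i + 1] - half, edges[i + 1] + half)
        for i, name in enumerate(names)
    }


def fuzzify(x: float, labels: Dict[str, Trapezoid]) -> Dict[str, float]:
    """Replace a numeric value by the labels it belongs to, with nonzero degrees."""
    degrees = {name: trapezoid(x, t) for name, t in labels.items()}
    return {name: mu for name, mu in degrees.items() if mu > 0.0}
```

Near a cut point a value belongs partly to both neighbouring labels. This overlap is what reduces the granularity of the attribute without imposing a sharp boundary.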

10 Analytical Questions (I)
Dependencies between social factors and physical activity.
            ROKVSTUP    STAV        VZDELANI    ZODPOV
TELAKTZA    (three values, columns not recoverable: 0.67/0.14, 0.24/0.37, 0.25/0.28)
AKTPOZAM    0.14/0.47   0.58/0.28   0.14/0.49   0.18/0.47
DOPRAVA     0.20/0.32   0.64/0.14   0.19/0.32   0.26/0.32
DOPRATRV    0.17/0.47   0.57/0.22   0.16/0.46   0.21/0.44

11 Analytical Questions (II)
Dependencies between social factors and smoking.
            ROKVSTUP    STAV        VZDELANI    ZODPOV
KOURENI     (one value, column not recoverable: 0.68/0.07)
DOBAKOUR    (two values, columns not recoverable: 0.64/0.11, 0.26/0.25)
BYVKURAK    0.10/0.64   0.42/0.39   0.09/0.65   0.13/0.64

12 Analytical Questions (III)
Dependencies between social factors and alcohol consumption.
            ROKVSTUP    STAV        VZDELANI    ZODPOV
ALKOHOL     0.21/0.35   0.63/0.15   0.19/0.34   0.24/0.31
PIVO10      0.16/0.43   0.58/0.21   0.16/0.43   0.21/0.41
PIVO12      0.10/0.62   0.47/0.39   0.10/0.62   0.13/0.61
VINO        0.16/0.43   0.58/0.21   0.16/0.44   0.21/0.41
LIHOV       0.16/0.43   0.58/0.21   0.16/0.43   0.20/0.41
PIVOMN      0.21/0.33   0.65/0.14   0.20/0.32   0.24/0.29
VINOMN      0.20/0.33   0.64/0.15   0.19/0.33   0.24/0.31
LIHMN       0.20/0.31   0.64/0.14   0.19/0.30   0.25/0.29

13 Analytical Questions (IV)
Dependencies between social factors and physical features.
            ROKVSTUP    STAV        VZDELANI    ZODPOV
BMI         0.16/0.44   0.58/0.23   0.15/0.45   0.20/0.42
SYST1       (two values, columns not recoverable: 0.65/0.12, 0.25/0.26)
DIAST1      0.19/0.32   0.63/0.14   0.19/0.32   0.24/0.30
SYST2       (two values, columns not recoverable: 0.65/0.12, 0.25/0.25)
DIAST2      0.19/0.33   0.63/0.15   0.18/0.33   0.23/0.30

14 Analytical Questions (V)
Dependencies between physical activity and smoking.
            TELAKTZA    AKTPOZAM    DOPRAVA     DOPRATRV
KOURENI     (two values, columns not recoverable: 0.50/0.11, 0.45/0.13)
DOBAKOUR    0.27/0.24   0.47/0.18   0.30/0.24   0.42/0.19
BYVKURAK    0.13/0.62   0.26/0.51   0.15/0.51   0.23/0.55

15 Analytical Questions (VI)
Dependencies between physical activity and alcohol consumption.
            TELAKTZA    AKTPOZAM    DOPRAVA     DOPRATRV
ALKOHOL     0.27/0.31   0.46/0.23   0.29/0.30   0.41/0.25
PIVO10      0.22/0.39   0.40/0.30   0.24/0.39   0.35/0.33
PIVO12      0.14/0.59   0.29/0.50   0.16/0.59   0.23/0.50
VINO        0.22/0.40   0.40/0.31   0.24/0.39   0.35/0.33
LIHOV       0.22/0.39   0.39/0.30   0.24/0.38   0.35/0.33
PIVOMN      0.27/0.29   0.46/0.21   0.30/0.29   0.42/0.24
VINOMN      0.27/0.31   0.46/0.23   0.28/0.30   0.41/0.24
LIHMN       0.27/0.28   0.46/0.21   0.29/0.27   0.41/0.23

16 Analytical Questions (VII)
Dependencies between physical activity and physical features.
            TELAKTZA    AKTPOZAM    DOPRAVA     DOPRATRV
BMI         0.21/0.41   0.39/0.32   0.23/0.40   0.34/0.34
SYST1       0.27/0.26   0.46/0.19   0.29/0.25   0.42/0.21
DIAST1      0.25/0.29   0.44/0.22   0.28/0.29   0.39/0.23
SYST2       0.27/0.25   0.47/0.18   0.29/0.24   0.42/0.20
DIAST2      0.25/0.29   0.45/0.22   0.27/0.29   0.39/0.24

17 Analytical Questions (VIII)
Dependencies between physical activity and cholesterol levels.
            TELAKTZA    AKTPOZAM    DOPRAVA     DOPRATRV
CHLST       0.28/0.24   0.47/0.17   0.30/0.23   0.42/0.19
TRIGL       (two values, columns not recoverable: 0.49/0.13, 0.45/0.14)

18 Analytical Questions (IX)
Dependencies between alcohol consumption and physical features.
            BMI         SYST1       DIAST1      SYST2       DIAST2
ALKOHOL     0.40/0.24   0.25/0.30   0.28/0.29   0.24/0.31   0.28/0.29
PIVO10      0.35/0.33   0.21/0.39   0.38/0.24   0.20/0.40   0.24/0.38
PIVO12      0.25/0.52   0.14/0.60   0.16/0.59   0.13/0.60   0.17/0.58
VINO        0.35/0.32   0.21/0.40   0.24/0.38   0.20/0.40   0.24/0.38
LIHOV       0.35/0.33   0.21/0.40   0.24/0.38   0.20/0.40   0.24/0.38
PIVOMN      0.41/0.23   0.25/0.28   0.29/0.27   0.25/0.29   0.29/0.27
VINOMN      0.40/0.24   0.25/0.30   0.28/0.28   0.24/0.30   0.28/0.28
LIHMN       0.41/0.22   0.25/0.28   0.29/0.27   0.24/0.28   0.29/0.27

19 Analytical Questions (X)
Dependencies between alcohol consumption and smoking.
            KOURENI     DOBAKOUR    BYVKURAK
ALKOHOL     (two values, columns not recoverable: 0.23/0.30, 0.61/0.15)
PIVO10      0.13/0.44   0.20/0.40   0.56/0.22
PIVO12      0.08/0.65   0.13/0.60   0.44/0.40
VINO        0.13/0.44   0.20/0.40   0.56/0.22
LIHOV       0.13/0.44   0.20/0.40   0.56/0.22
PIVOMN      (two values, columns not recoverable: 0.23/0.28, 0.61/0.14)
VINOMN      (two values, columns not recoverable: 0.23/0.30, 0.61/0.15)
LIHMN       (two values, columns not recoverable: 0.24/0.28, 0.62/0.14)

20 Analytical Questions (XI)
Dependencies between skin folds and BMI:
- [TRIC] → [BMI]: supp 15.85%, CF 0.54
- [SUBSC] → [BMI]: supp 17.28%, CF 0.58

21 Concluding Remarks
FADs allow us to discover relations within imprecise or uncertain data.
Expert help is desirable for:
- data preprocessing;
- interpretation of results.

