Published by Edward Silas Miller. Modified over 9 years ago.
1
Finding Fuzzy Approximate Dependencies within STULONG Data
Discovery Challenge, ECML/PKDD 2003
September 22-27, 2003
Berzal F., Cubero J.C., Sanchez D., Serrano J.M., Vila M.A.
University of Granada (Spain)
2
Discovery Challenge – ECML/PKDD 2003
Introduction
KDD allows us to obtain relations within data that are:
- Non-trivial.
- Previously unknown.
- Potentially useful.
Fuzzy data: extensions of KDD tools and techniques.
3
Problem representation
Fuzzy relational database:

t#     $A_1$                       $A_2$                       ...  $A_m$
$t_1$  $a_{11}, \mu_{t_1}(A_1)$    $a_{12}, \mu_{t_1}(A_2)$    ...  $a_{1m}, \mu_{t_1}(A_m)$
$t_2$  $a_{21}, \mu_{t_2}(A_1)$    $a_{22}, \mu_{t_2}(A_2)$    ...  $a_{2m}, \mu_{t_2}(A_m)$
$t_3$  $a_{31}, \mu_{t_3}(A_1)$    $a_{32}, \mu_{t_3}(A_2)$    ...  $a_{3m}, \mu_{t_3}(A_m)$
...    ...                         ...                         ...  ...

- $a_{ij}$ values: numeric, scalar (nominal), or linguistic labels.
- $\mu_{t_i}(A_j)$: membership degrees.
- Fuzzy similarity relations $S_{A_1}, \ldots, S_{A_m}$.
4
Fuzzy Approximate Dependencies
We define Fuzzy Approximate Dependencies by relaxing some properties of Functional Dependencies:
$V \to W \iff \forall t, s:\ t[V] = s[V] \Rightarrow t[W] = s[W]$
- Equality relaxation: considering linguistic labels and membership degrees.
- Universal quantifier relaxation: allowing exceptions.
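The relaxation of the universal quantifier can be illustrated with a minimal Python sketch. It assumes crisp equality and a toy relation; the function names `holds_fd` and `fraction_fulfilled` and the sample rows are hypothetical and are not taken from the presentation.

```python
from itertools import combinations


def holds_fd(rows, v, w):
    """Crisp functional dependency V -> W: every pair agreeing on V agrees on W."""
    return all(t[w] == s[w] for t, s in combinations(rows, 2) if t[v] == s[v])


def fraction_fulfilled(rows, v, w):
    """Relaxed quantifier: share of V-agreeing pairs that also agree on W."""
    agreeing = [(t, s) for t, s in combinations(rows, 2) if t[v] == s[v]]
    if not agreeing:
        return 1.0  # no pair agrees on V, so nothing contradicts the dependency
    return sum(t[w] == s[w] for t, s in agreeing) / len(agreeing)


# Hypothetical toy relation: the third row is an exception to V -> W.
rows = [
    {"V": "a", "W": 1},
    {"V": "a", "W": 1},
    {"V": "a", "W": 2},
    {"V": "b", "W": 3},
]

print(holds_fd(rows, "V", "W"))            # the strict dependency fails
print(fraction_fulfilled(rows, "V", "W"))  # but part of the pairs still fulfil it
```

A strict functional dependency is rejected by a single exception, whereas the relaxed version reports how often the implication holds over pairs of tuples.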
5
FAD Measures
- Relevance degree: support, $\mathrm{supp}(V \to W)$.
- Fulfilment degrees:
  - Confidence, $\mathrm{conf}(V \to W)$.
  - Certainty factor, $\mathrm{CF}(V \to W)$ [Shortliffe and Buchanan, 1975]. Measures variations in the degree of belief.
    - $\mathrm{CF}(V \to W) = 1$: maximum increment (perfect positive).
    - $\mathrm{CF}(V \to W) = -1$: maximum decrement.
    - $\mathrm{CF}(V \to W) = 0$: statistical independence.
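As a sketch of how the certainty factor behaves at its three reference values, the following Python function uses a common formulation of the Shortliffe and Buchanan certainty factor for rules, written in terms of conf(V → W) and the support of the consequent W. The presentation does not spell out the formula, so this formulation, the function name `certainty_factor` and its parameter names are assumptions.

```python
def certainty_factor(conf, supp_w):
    """Certainty factor of V -> W from conf(V -> W) and supp(W).

    Assumed formulation: the change in belief in W when V is known,
    normalised by the largest possible change in that direction.
    """
    if conf > supp_w:
        # Belief in W increases; the maximum possible increase is 1 - supp(W).
        return (conf - supp_w) / (1.0 - supp_w)
    if conf < supp_w:
        # Belief in W decreases; the maximum possible decrease is supp(W).
        return (conf - supp_w) / supp_w
    # conf == supp(W): knowing V does not change the belief in W.
    return 0.0


print(certainty_factor(1.0, 0.4))  # maximum increment
print(certainty_factor(0.0, 0.4))  # maximum decrement
print(certainty_factor(0.4, 0.4))  # statistical independence
```

Unlike confidence alone, this measure takes into account how frequent the consequent already is, so a high confidence for a very common consequent yields a certainty factor close to 0.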
6
Applications
- Fuzzy Databases.
- Approximate Dependencies Discovery.
- Functional Dependencies Discovery.
- Other applications:
  - Low granularity data.
  - Overlapping semantics.
7
STULONG Database
Entry Table.
- Normal Group (attribute KONSKUP having values 1 or 2).
- Risk Group (attribute KONSKUP having values 3 or 4).
- Pathologic Group (value 5 for attribute KONSKUP).
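The grouping by KONSKUP can be written as a small lookup. This is a minimal Python sketch; the function name `study_group` and the choice to return `None` for any other value are assumptions, since the slide only covers the values 1 to 5.

```python
# KONSKUP values as listed on the slide, mapped to the group they define.
GROUP_BY_KONSKUP = {
    1: "normal",
    2: "normal",
    3: "risk",
    4: "risk",
    5: "pathologic",
}


def study_group(konskup):
    """Return the group for a KONSKUP value, or None if the slide does not cover it."""
    return GROUP_BY_KONSKUP.get(konskup)


print(study_group(3))  # risk
```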
8
Data Preprocessing (I)
Problem: semantic overlapping in symbolic or scalar attributes.
Fuzzy similarity relations (subjective).
E.g.: DOPRAVA (means of transport for getting to work), similarity degrees:
Columns: by bike, public means, car, not stated
on foot: 0.4, 0.3, 0.0
by bike: 0.3, 0.0
public means: 0.4, 0.0
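A fuzzy similarity relation over a nominal attribute can be sketched in Python as a symmetric lookup that replaces strict equality. The helper name `make_similarity` and the degrees below are illustrative assumptions; they are not a reconstruction of the table on the slide.

```python
def make_similarity(degrees):
    """Build a reflexive, symmetric similarity function from pairwise degrees."""
    table = {}
    for (a, b), degree in degrees.items():
        table[(a, b)] = degree
        table[(b, a)] = degree  # similarity is assumed to be symmetric

    def similarity(x, y):
        if x == y:
            return 1.0  # every value is fully similar to itself
        return table.get((x, y), 0.0)  # unlisted pairs are treated as dissimilar

    return similarity


# Illustrative degrees only (assumed, not the slide's values).
transport_similarity = make_similarity({
    ("on foot", "by bike"): 0.4,
    ("on foot", "public means"): 0.3,
    ("public means", "car"): 0.4,
})

print(transport_similarity("by bike", "on foot"))  # 0.4
print(transport_similarity("car", "car"))          # 1.0
print(transport_similarity("on foot", "car"))      # 0.0
```

With such a function, two tuples can agree on an attribute to a degree between 0 and 1 instead of only matching or not matching.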
9
Data Preprocessing (II)
Problem: high granularity in numeric attributes.
Definition of sets of linguistic labels starting from intervals.
E.g.: BMI (Body mass index).
[Figure: membership degree (up to 1) against numeric value for the BMI labels "thin" and "overweight"; marked values 25.0, 25.12, 24.73.]
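One way to define linguistic labels starting from intervals is to widen each interval into a trapezoidal membership function so that neighbouring labels overlap. The Python sketch below assumes this trapezoidal shape; the label names and breakpoints in `BMI_LABELS` are hypothetical and are not the ones used in the presentation.

```python
def trapezoid(x, a, b, c, d):
    """Membership that rises on [a, b], is 1 on [b, c] and falls on [c, d].

    Requires a < b <= c < d.
    """
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical labels and breakpoints, chosen only for illustration.
BMI_LABELS = {
    "normal": (18.0, 19.0, 24.5, 25.5),
    "overweight": (24.5, 25.5, 29.5, 30.5),
}


def memberships(bmi):
    """Membership degree of a numeric BMI value in each linguistic label."""
    return {label: trapezoid(bmi, *points) for label, points in BMI_LABELS.items()}


print(memberships(25.0))  # {'normal': 0.5, 'overweight': 0.5}
print(memberships(27.0))  # {'normal': 0.0, 'overweight': 1.0}
```

A value near a boundary belongs partially to both neighbouring labels, which avoids treating 24.9 and 25.1 as entirely different.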
10
Analytical Questions (I)
Dependencies between social factors and physical activity.
Columns: ROKVSTUP, STAV, VZDELANI, ZODPOV
TELAKTZA: 0.67/0.14, 0.24/0.37, 0.25/0.28
AKTPOZAM: 0.14/0.47, 0.58/0.28, 0.14/0.49, 0.18/0.47
DOPRAVA: 0.20/0.32, 0.64/0.14, 0.19/0.32, 0.26/0.32
DOPRATRV: 0.17/0.47, 0.57/0.22, 0.16/0.46, 0.21/0.44
11
Analytical Questions (II)
Dependencies between social factors and smoking.
Columns: ROKVSTUP, STAV, VZDELANI, ZODPOV
KOURENI: 0.68/0.07
DOBAKOUR: 0.64/0.11, 0.26/0.25
BYVKURAK: 0.10/0.64, 0.42/0.39, 0.09/0.65, 0.13/0.64
12
Analytical Questions (III)
Dependencies between social factors and alcohol consumption.

         ROKVSTUP   STAV       VZDELANI   ZODPOV
ALKOHOL  0.21/0.35  0.63/0.15  0.19/0.34  0.24/0.31
PIVO10   0.16/0.43  0.58/0.21  0.16/0.43  0.21/0.41
PIVO12   0.10/0.62  0.47/0.39  0.10/0.62  0.13/0.61
VINO     0.16/0.43  0.58/0.21  0.16/0.44  0.21/0.41
LIHOV    0.16/0.43  0.58/0.21  0.16/0.43  0.20/0.41
PIVOMN   0.21/0.33  0.65/0.14  0.20/0.32  0.24/0.29
VINOMN   0.20/0.33  0.64/0.15  0.19/0.33  0.24/0.31
LIHMN    0.20/0.31  0.64/0.14  0.19/0.30  0.25/0.29
13
Analytical Questions (IV)
Dependencies between social factors and physical features.
Columns: ROKVSTUP, STAV, VZDELANI, ZODPOV
BMI: 0.16/0.44, 0.58/0.23, 0.15/0.45, 0.20/0.42
SYST1: 0.65/0.12, 0.25/0.26
DIAST1: 0.19/0.32, 0.63/0.14, 0.19/0.32, 0.24/0.30
SYST2: 0.65/0.12, 0.25/0.25
DIAST2: 0.19/0.33, 0.63/0.15, 0.18/0.33, 0.23/0.30
14
Analytical Questions (V)
Dependencies between physical activity and smoking.
Columns: TELAKTZA, AKTPOZAM, DOPRAVA, DOPRATRV
KOURENI: 0.50/0.11, 0.45/0.13
DOBAKOUR: 0.27/0.24, 0.47/0.18, 0.30/0.24, 0.42/0.19
BYVKURAK: 0.13/0.62, 0.26/0.51, 0.15/0.51, 0.23/0.55
15
Analytical Questions (VI)
Dependencies between physical activity and alcohol consumption.

         TELAKTZA   AKTPOZAM   DOPRAVA    DOPRATRV
ALKOHOL  0.27/0.31  0.46/0.23  0.29/0.30  0.41/0.25
PIVO10   0.22/0.39  0.40/0.30  0.24/0.39  0.35/0.33
PIVO12   0.14/0.59  0.29/0.50  0.16/0.59  0.23/0.50
VINO     0.22/0.40  0.40/0.31  0.24/0.39  0.35/0.33
LIHOV    0.22/0.39  0.39/0.30  0.24/0.38  0.35/0.33
PIVOMN   0.27/0.29  0.46/0.21  0.30/0.29  0.42/0.24
VINOMN   0.27/0.31  0.46/0.23  0.28/0.30  0.41/0.24
LIHMN    0.27/0.28  0.46/0.21  0.29/0.27  0.41/0.23
16
Analytical Questions (VII)
Dependencies between physical activity and physical features.

        TELAKTZA   AKTPOZAM   DOPRAVA    DOPRATRV
BMI     0.21/0.41  0.39/0.32  0.23/0.40  0.34/0.34
SYST1   0.27/0.26  0.46/0.19  0.29/0.25  0.42/0.21
DIAST1  0.25/0.29  0.44/0.22  0.28/0.29  0.39/0.23
SYST2   0.27/0.25  0.47/0.18  0.29/0.24  0.42/0.20
DIAST2  0.25/0.29  0.45/0.22  0.27/0.29  0.39/0.24
17
Analytical Questions (VIII)
Dependencies between physical activity and cholesterol degrees.
Columns: TELAKTZA, AKTPOZAM, DOPRAVA, DOPRATRV
CHLST: 0.28/0.24, 0.47/0.17, 0.30/0.23, 0.42/0.19
TRIGL: 0.49/0.13, 0.45/0.14
18
Analytical Questions (IX)
Dependencies between alcohol consumption and physical features.

         BMI        SYST1      DIAST1     SYST2      DIAST2
ALKOHOL  0.40/0.24  0.25/0.30  0.28/0.29  0.24/0.31  0.28/0.29
PIVO10   0.35/0.33  0.21/0.39  0.38/0.24  0.20/0.40  0.24/0.38
PIVO12   0.25/0.52  0.14/0.60  0.16/0.59  0.13/0.60  0.17/0.58
VINO     0.35/0.32  0.21/0.40  0.24/0.38  0.20/0.40  0.24/0.38
LIHOV    0.35/0.33  0.21/0.40  0.24/0.38  0.20/0.40  0.24/0.38
PIVOMN   0.41/0.23  0.25/0.28  0.29/0.27  0.25/0.29  0.29/0.27
VINOMN   0.40/0.24  0.25/0.30  0.28/0.28  0.24/0.30  0.28/0.28
LIHMN    0.41/0.22  0.25/0.28  0.29/0.27  0.24/0.28  0.29/0.27
19
Analytical Questions (X)
Dependencies between alcohol consumption and smoking.
Columns: KOURENI, DOBAKOUR, BYVKURAK
ALKOHOL: 0.23/0.30, 0.61/0.15
PIVO10: 0.13/0.44, 0.20/0.40, 0.56/0.22
PIVO12: 0.08/0.65, 0.13/0.60, 0.44/0.40
VINO: 0.13/0.44, 0.20/0.40, 0.56/0.22
LIHOV: 0.13/0.44, 0.20/0.40, 0.56/0.22
PIVOMN: 0.23/0.28, 0.61/0.14
VINOMN: 0.23/0.30, 0.61/0.15
LIHMN: 0.24/0.28, 0.62/0.14
20
Analytical Questions (XI)
Dependencies between skin folds and BMI:
- [TRIC] → [BMI], supp 15.85%, CF 0.54
- [SUBSC] → [BMI], supp 17.28%, CF 0.58
21
Concluding Remarks
FADs allow us to discover relations within imprecise or uncertain data.
Expert aid is desirable for:
- Data preprocessing.
- Interpretation of results.