Abrar Fawaz AlAbed-AlHaq Kent State University October 28, 2011

Presentation transcript:

Discovering Functional Dependencies in Relational Databases Using Data Mining Techniques Abrar Fawaz AlAbed-AlHaq Kent State University aalabeda@kent.edu October 28, 2011 4/21/2017

Outline Introduction. Problem Statement. Data Mining Techniques. Discovering Functional Dependencies: Example. Experimental Results. Conclusion. References.

Introduction A rapidly growing amount of data is collected and stored in large databases, and as a result these databases may contain redundant or inconsistent data. The functional dependency (FD) is an important concept in relational schema design and plays a key role in the design of relational databases. An FD is a property of the semantics, or meaning, of the attributes, and it helps simplify the structure of a database. Discovering FDs from an existing relation instance is an important technique in data mining and database design.

Introduction (cont.) Functional dependencies are relationships between the attributes of a database relation: a functional dependency X → Y states that the value of an attribute Y is uniquely determined by the values of some other attributes X.
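
The definition can be checked directly on a small relation. The following is a minimal sketch, assuming a relation stored as a list of dictionaries; the function name `fd_holds` and the `loans` data are hypothetical and serve only as illustration.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if every pair of tuples that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_lhs and not agree_rhs:
            return False  # two tuples share the lhs values but differ on rhs
    return True


# Hypothetical example relation (not taken from the slides).
loans = [
    {"lno": "L1", "amt": 1000, "branch": "Kent"},
    {"lno": "L1", "amt": 1000, "branch": "Akron"},
    {"lno": "L2", "amt": 500, "branch": "Kent"},
]

lno_determines_amt = fd_holds(loans, ["lno"], ["amt"])
lno_determines_branch = fd_holds(loans, ["lno"], ["branch"])
```

Here `lno → amt` holds, while `lno → branch` does not, because the two `L1` tuples have different branches. The check compares every pair of tuples, so its cost grows quadratically with the number of tuples.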

Problem Statement The problem is to find all functional dependencies among the attributes of a relational database. Early methods for discovering functional dependencies repeatedly sort and compare tuples to determine whether they satisfy the FD definition; this approach does not use the FDs already discovered as knowledge for obtaining new ones. As a result, it is highly sensitive to the number of tuples and attributes and is not practical for large databases. The FD_Mine and TANE algorithms avoid this: with them, there is no need to sort on any attribute or compare any values.

Data Mining Techniques Data mining is the process of producing useful knowledge and information; it uses analysis tools to discover patterns and relationships in large data sets that could be put to use. Data mining algorithms can be applied to discover functional dependencies in large databases. Data mining offers several techniques that can be used for this purpose, such as the Apriori algorithm. Two algorithms are considered here: FD_Mine and TANE.

The FD_Mine Algorithm FD_Mine works as follows. It uses a level-wise search, where the results from level k are used to explore level k+1. First, at level 1, all FDs X → Y where X and Y are single attributes are found and stored in FD_SET F1. The set of candidates considered at this level is denoted L1. F1 and L1 are used to generate the candidates XiXj of L2. At level 2, all FDs of the form XiXj → Y are found and stored in FD_SET F2. Then F1, F2, L1, and L2 are used to generate the candidates of L3, and so on, until the candidates at level Ln-1 have been checked or no candidates remain.
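
The level structure described above can be sketched as follows. This is a simplified illustration, not FD_Mine itself: it omits the equivalence-based and closure-based pruning and keeps only a minimality check against FDs found at earlier levels. The names `count_classes` and `levelwise_fds` and the sample relation are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]
FD = Tuple[Tuple[str, ...], str]


def count_classes(rows: List[Row], attrs: Sequence[str]) -> int:
    """Number of distinct value combinations of attrs, i.e. the partition size."""
    return len({tuple(row[a] for a in attrs) for row in rows})


def levelwise_fds(rows: List[Row], attributes: Sequence[str]) -> List[FD]:
    """Find minimal FDs X -> A, visiting left-hand sides by increasing size."""
    found: List[FD] = []
    for size in range(1, len(attributes)):          # level k uses lhs of size k
        for lhs in combinations(attributes, size):
            for rhs in attributes:
                if rhs in lhs:
                    continue                        # trivial dependency
                # Reuse earlier results: skip if a subset of lhs already gives rhs.
                if any(set(l) <= set(lhs) and r == rhs for l, r in found):
                    continue
                # X -> A holds exactly when adding A does not split any class.
                if count_classes(rows, lhs) == count_classes(rows, lhs + (rhs,)):
                    found.append((lhs, rhs))
    return found


# Hypothetical relation in which A and C determine each other.
rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 2},
    {"A": 2, "B": 2, "C": 2},
]
fds = levelwise_fds(rows, ["A", "B", "C"])
```

On this relation the search reports only `A → C` and `C → A`. Candidates such as `AB → C` are skipped at level 2 because a subset of the left-hand side already determines `C`.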

The TANE Algorithm TANE searches the set containment lattice in a level-wise manner. A level Lℓ is the collection of attribute sets of size ℓ. TANE starts with L1 = {{A} | A ∈ R} and computes L2 from L1, L3 from L2, and so on, according to the information obtained during the run. Computing partitions: in the beginning, the partitions with respect to the singleton attribute sets are computed directly from the relation r. A partition Π{A}, where Π denotes a partition, is computed from the column r[A] as follows. First, the values of the column are replaced with integers 1, 2, 3, … so that equal values are replaced by the same integer and different values by different integers. The value t[A] is then the identifier of the equivalence class [t]{A} of Π{A}, and Π{A} is easy to construct.
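
The partition construction for a single attribute can be sketched as below. This is a minimal illustration of the integer-encoding step only, assuming the column is given as a Python list; the names `encode_column` and `partition` and the sample column are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def encode_column(values: List[Hashable]) -> List[int]:
    """Replace values with integers 1, 2, 3, ... so equal values share an integer."""
    codes: Dict[Hashable, int] = {}
    return [codes.setdefault(v, len(codes) + 1) for v in values]


def partition(values: List[Hashable]) -> List[List[int]]:
    """Group 1-based tuple indices into equivalence classes keyed by the integer code."""
    classes: Dict[int, List[int]] = defaultdict(list)
    for tuple_index, code in enumerate(encode_column(values), start=1):
        classes[code].append(tuple_index)
    return list(classes.values())


# Hypothetical column r[A] for four tuples t1..t4.
column_a = ["x", "y", "x", "z"]
codes_a = encode_column(column_a)
partition_a = partition(column_a)
```

For this column the codes are `[1, 2, 1, 3]`, and the partition groups t1 with t3 while t2 and t4 form singleton classes.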

Discovering functional dependencies: Example Suppose that FD_Mine is applied to database D, as shown in the table, with R = {A, B, C, D, E}. [Table: database D, with columns A, B, C, D, E and rows t1 to t7.]

Discovering functional dependencies: Example The next table summarizes the actions of FD_Mine. In iteration 1, since |ΠA| = |ΠAD| = 2, Closure'(A) is set to {D} and A → D is deduced. In the same way, D → A is discovered, so the equivalence A ↔ D is obtained. As a result, only A, B, C, and E need to be combined to generate the next-level candidates {AB, AC, AE, BC, BE, CE}. At the same time, the nontrivial closure of each generated candidate is computed. For example, Closure'(AB) = Closure'(A) ∪ Closure'(B) = {D} ∪ ∅ = {D}. In iteration 2, for candidate AB, only AB → C and AB → E need to be checked, because R − {A, B} − Closure'(AB) = {A, B, C, D, E} − {A, B} − {D} = {C, E}. Since |ΠAB| = |ΠABE| = 6, AB → E is obtained. In the same way, BE → A and CE → A are also discovered at this level, so the equivalence AB ↔ BE is obtained. As a result, only AB, AC, AE, BC, and CE need to be combined to form the level 3 candidates, which are {ABC, ACE}. Since CE → A, ACE is pruned by pruning rule 4. Since AB → E, ABC → E also holds, and since A ↔ D, ABC → D holds as well; ABC is therefore a key and is pruned by pruning rule 2. No other candidates remain, so the algorithm halts.
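
The set arithmetic in this walkthrough can be reproduced with a few lines of code. The sketch below covers only two steps from the example: generating the level 2 candidates after the equivalence A ↔ D, and computing which right-hand sides remain to be checked for candidate AB. The variable and function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set

schema: List[str] = ["A", "B", "C", "D", "E"]

# After A <-> D is found, D is represented by A and dropped from candidate generation.
equivalent_to = {"D": "A"}
remaining = [a for a in schema if a not in equivalent_to]

# Level 2 candidates are all pairs of the remaining attributes.
level2 = ["".join(pair) for pair in combinations(remaining, 2)]


def rhs_to_check(schema: Iterable[str], lhs: Iterable[str], closure: Set[str]) -> Set[str]:
    """Attributes that still need checking: R - X - Closure'(X)."""
    return set(schema) - set(lhs) - closure


# Closure'(AB) = Closure'(A) U Closure'(B) = {D} U {} = {D}
closure_a: Set[str] = {"D"}
closure_b: Set[str] = set()
closure_ab = closure_a | closure_b

to_check_ab = rhs_to_check(schema, "AB", closure_ab)
```

The results match the text: `level2` is `['AB', 'AC', 'AE', 'BC', 'BE', 'CE']`, and for AB only `C` and `E` remain as right-hand sides.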

Discovering functional dependencies: Example Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences, Canada.

Discovering functional dependencies: Example The result after removing attribute D is shown in table (b). [Tables: (a) before, with columns A, B, C, D, E and rows t1 to t7; (b) after, with columns A, B, C, E and rows t1 to t7.]

Experimental Results FD_Mine was applied to fifteen datasets obtained from the UCI Machine Learning Repository, and the results were compared with those of TANE. TANE was selected for comparison because it establishes the theoretical framework for the problem. For the dataset given in the table, Figure (a) shows the semi-lattice for FD_Mine and Figure (b) shows that for TANE. Each node represents a combination of attributes. If an edge is shown between nodes X and XY, then X → Y needs to be checked. Hence, the number of edges is the number of FDs that need to be checked. Both semi-lattices have fewer edges than the full lattice. In addition, the semi-lattice for FD_Mine has fewer edges than that for TANE.
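
As a point of comparison, the number of edges in the unpruned lattice can be counted directly. The sketch below assumes the lattice is taken over all non-empty attribute sets and that each edge joins a set X to X plus one further attribute; both are assumptions, since the slides do not specify how the full lattice is drawn. The function name `full_lattice_edges` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[Tuple[str, ...], str]


def full_lattice_edges(attributes: Sequence[str]) -> List[Edge]:
    """List every edge X -> X + {A}, i.e. every candidate FD X -> A without pruning."""
    edges: List[Edge] = []
    for size in range(1, len(attributes)):
        for lhs in combinations(attributes, size):
            for rhs in attributes:
                if rhs not in lhs:
                    edges.append((lhs, rhs))
    return edges


edge_count = len(full_lattice_edges(["A", "B", "C", "D", "E"]))
```

Under these assumptions a five-attribute schema has 75 edges (20 + 30 + 20 + 5 across levels 1 to 4), which is the number of checks a search without any pruning would perform.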

Experimental Results [Figures: (a) semi-lattice for FD_Mine; (b) semi-lattice for TANE.] Source: Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences, Canada.

Experimental Results Table 5.1 compares the number of FDs that are checked on data by FD_Mine and TANE for 15 UCI datasets. Figure 5.2 shows more detailed results for the Imports-85 dataset. At levels 1 through 5, both algorithms check approximately the same number of FDs, but at levels 6 through 11, FD_Mine checks fewer FDs than TANE because it prunes more unnecessary candidates by using the equivalences and FDs discovered at previous levels. For more results.

Experimental Results Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences, Canada.

Conclusion FD discovery is used to extract the functional dependencies from a dataset. This work addresses two issues: the process of discovering FDs from datasets, and the evaluation of FD discovery algorithms. The empirical comparison of FD_Mine and TANE on 15 UCI datasets shows that FD_Mine examines fewer FDs than TANE, because it prunes more unnecessary candidates by using the equivalences and FDs discovered at previous levels. The experiments also indicate that FD_Mine can discover more FDs than TANE.

References
[1] Yao, H., Hamilton, H., and Butz, C. (2002), FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences, Canada.
[2] Wyss, C., Giannella, C., and Robertson, E. (2001), FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances, U.S.A.
[3] Huhtala, Y., Karkkainen, J., Porkka, P., and Toivonen, H. (1999), TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies, The Computer Journal, V.42, No.2, pp.100-111.
[4] Elmasri, R. and Navathe, S. (2004), Fundamentals of Database Systems, Addison Wesley, Fourth edition.
[5] Mannila, H. (2000), Theoretical Frameworks for Data Mining, SIGKDD Explorations, V.1, No.2, pp.30-32.
[6] Huhtala, Y., Karkkainen, J., Porkka, P., and Toivonen, H. (2000), Efficient Discovery of Functional and Approximate Dependencies Using Partitions, University of Helsinki, Finland.

Thank You