Data Mining Knowledge on rough set theory SUSHIL KUMAR SAHU.

1 Data Mining Knowledge on rough set theory SUSHIL KUMAR SAHU

2 What is Data Mining? Extraction of knowledge from data: the exploration and analysis of large quantities of data to discover meaningful patterns. Goal: discover knowledge.

3 Why data mining? Data mining is used in: pattern matching and restoring the original picture from a noisy one; medicine; business; etc. What data mining does: finds relationships and makes predictions.

4 Types of data mining Relational data mining: the data mining technique for relational databases. Unlike traditional data mining algorithms, which look for patterns in a single table, relational data mining algorithms look for patterns among multiple tables (relational patterns). Web mining: the application of data mining techniques to discover patterns from the Web.

5 Software Mining and Data Mining: Instead of mining individual data sets, software mining focuses on metadata, such as database schemas. Knowledge discovery from software systems addresses structure and behavior as well as the data processed by the software system.

6 OLAP OLAP deals with tools and techniques for data analysis that can give nearly instantaneous answers to queries. OLAP uses multidimensional arrays that allow the user to analyze the data. The data mining server must be integrated with the data warehouse and the OLAP server.

7 Data Mining: Motivation Huge amounts of data. Important need for turning data into useful information. The fast-growing amount of data, collected and stored in large and numerous databases, has exceeded the human ability to comprehend it without powerful tools.

8 Data Mining Techniques: Decision Trees, Neural Networks, Genetic Algorithms, Fuzzy Set Theory, Rough Set Theory

9 DATA MINING TECHNIQUES Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure. Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.

10 Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

11 THE ROUGH SET THEORY One of the new data mining theories is rough set theory, which can be used for (1) reduction of data sets, (2) finding hidden data patterns, (3) generation of decision rules.

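12 What is a rough set? A rough set is a formal approximation of a crisp set in terms of a pair of sets which give the lower and the upper approximation of the original set. The lower approximation contains the objects that certainly belong to the set; the upper approximation contains the objects that possibly belong to it. The tuple composed of the lower and upper approximation is called a rough set. The accuracy of the approximation is α_P(X) = card(lower approximation of X) / card(upper approximation of X); it is perfect if α_P(X) = 1, that is, when the two approximations coincide.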
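The definitions on this slide can be made concrete with a small sketch. The patient table, the attribute names and the helper functions below are hypothetical illustrations, not part of the presentation; the code only assumes that objects with identical attribute values are indiscernible.

```python
from collections import defaultdict

def equivalence_classes(universe, attrs, table):
    """Group objects that share the same values on every attribute in attrs."""
    classes = defaultdict(set)
    for obj in universe:
        key = tuple(table[obj][a] for a in attrs)
        classes[key].add(obj)
    return list(classes.values())

def approximations(universe, attrs, table, target):
    """Return the (lower, upper) approximation of the crisp set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(universe, attrs, table):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

def accuracy(lower, upper):
    """alpha_P(X) = |lower| / |upper|; taken as 1.0 when the upper approximation is empty."""
    return len(lower) / len(upper) if upper else 1.0

# Hypothetical toy data: four patients described by two symptoms.
table = {
    "p1": {"fever": "yes", "cough": "yes"},
    "p2": {"fever": "yes", "cough": "yes"},
    "p3": {"fever": "no", "cough": "yes"},
    "p4": {"fever": "no", "cough": "no"},
}
universe = set(table)
flu = {"p1", "p3"}  # the crisp set X to approximate

lower, upper = approximations(universe, ["fever", "cough"], table, flu)
print(sorted(lower), sorted(upper), accuracy(lower, upper))
```

Here p1 and p2 are indiscernible but only p1 is in the set, so both end up in the upper approximation and neither in the lower one, which is why the accuracy falls below 1.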

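13 Reduct and Core A reduct is a minimal subset of attributes which by itself can fully characterize the knowledge in the database: it preserves the classification given by the full attribute set, and no attribute can be removed from it without losing that property. The set of attributes which is common to all reducts is called the core.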
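For a very small table, reducts and the core can be found by brute force, which may help make the two definitions tangible. The sketch below assumes a consistent decision table (objects that agree on all condition attributes also agree on the decision); the table and the names `a`, `b`, `c`, `d` are made up for illustration.

```python
from itertools import combinations

def is_consistent(rows, attrs, decision):
    """True if objects that agree on attrs always agree on the decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True

def all_reducts(rows, conditions, decision):
    """Enumerate every minimal attribute subset that keeps the table consistent."""
    reducts = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            # A superset of an already found reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if is_consistent(rows, subset, decision):
                reducts.append(subset)
    return reducts

def core(reducts):
    """Attributes shared by every reduct."""
    return set.intersection(*(set(r) for r in reducts)) if reducts else set()

# Hypothetical table: c duplicates b, and the decision d depends on a and b together.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
reducts = all_reducts(rows, ["a", "b", "c"], "d")
print(reducts, core(reducts))
```

Exhaustive search grows exponentially with the number of attributes, which is the motivation for a greedy method such as Quick-reduct.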

14 Data mining process Stage-1: RAW DATA → Stage-2: K-MEANS ALGORITHM → Stage-3: QUICK REDUCT → Stage-4: SYMBOLIC RULES

15 Data preparation: Here the data are prepared from the data warehouse. The data are stored using MATLAB. K-means algorithm: The data attributes obtained from stage 1 are partitioned into k clusters, where each cluster comprises data vectors with similar inherent characteristics.

16 The K-Means Algorithm Process: The dataset is partitioned into K clusters, and the data points are randomly assigned to the clusters so that the clusters have roughly the same number of data points. For each data point, calculate the distance from the data point to each cluster, that is, to the mean (centroid) of the points currently in that cluster. If the data point is closest to its own cluster, leave it where it is. If the data point is not closest to its own cluster, move it into the closest cluster. Repeat the above step until a complete pass through all the data points results in no data point moving from one cluster to another. At this point the clusters are stable and the clustering process ends.
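The process above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the presenter's MATLAB code: it uses squared Euclidean distance, recomputes all centroids once per pass (a batch variant of the point-by-point description), and adds a `max_passes` safety limit that the slide does not mention. The sample points are invented.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_passes=100):
    rng = random.Random(seed)
    # Random initial assignment with roughly equal cluster sizes.
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    centroids = {}
    for _ in range(max_passes):
        # Centroid (mean vector) of every cluster that still has members.
        centroids = {}
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # Move every point to the cluster whose centroid is closest.
        new_labels = [
            min(centroids, key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # a full pass moved nothing: clusters are stable
            break
        labels = new_labels
    return labels, centroids

# Hypothetical data: two well separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Which group receives label 0 depends on the random initial assignment, so only the grouping, not the label values, should be relied on.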

17 Quick-reduct algorithm: The Quick-reduct algorithm attempts to compute a minimal reduct without exhaustively generating all possible subsets. The reduction of attributes is achieved by comparing the equivalence relations generated by sets of attributes, so that the reduced set provides the same predictive capability for the decision feature as the original.

18 QUICKREDUCT(C, D) C: set of all conditional features; D: set of decision features.
(a) R ← {}
(b) do
(c)   T ← R
(d)   ∀ x ∈ (C − R)
(e)     if γ_{R ∪ {x}}(D) > γ_T(D)
(f)       T ← R ∪ {x}
(g)   R ← T
(h) until γ_R(D) == γ_C(D)
(i) return R
where γ_R(D) = card(POS_R(D)) / card(U)
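A runnable sketch of QUICKREDUCT may make the role of the dependency degree γ clearer. It assumes a single decision feature, represents the table as a list of dictionaries, and adds a guard against the case where no single attribute improves γ (the pseudocode would loop forever there). The function names and the example table are illustrative, not taken from the slides.

```python
from collections import defaultdict

def gamma(rows, attrs, decision):
    """Dependency degree: card(POS_attrs(decision)) / card(U)."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    # A block is in the positive region when all its objects share one decision.
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return positive / len(rows)

def quickreduct(rows, conditions, decision):
    target = gamma(rows, conditions, decision)       # gamma_C(D)
    reduct = []                                      # R <- {}
    while gamma(rows, reduct, decision) < target:    # until gamma_R(D) == gamma_C(D)
        best = reduct                                # T <- R
        for x in conditions:
            if x in reduct:
                continue
            if gamma(rows, reduct + [x], decision) > gamma(rows, best, decision):
                best = reduct + [x]                  # T <- R U {x}
        if best == reduct:
            break  # guard (not in the pseudocode): no single attribute helps
        reduct = best                                # R <- T
    return reduct

# Hypothetical decision table with condition attributes a, b, c and decision d.
rows = [
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 1},
    {"a": 2, "b": 0, "c": 1, "d": 1},
    {"a": 2, "b": 1, "c": 1, "d": 1},
    {"a": 3, "b": 0, "c": 0, "d": 0},
    {"a": 3, "b": 1, "c": 1, "d": 0},
]
reduct = quickreduct(rows, ["a", "b", "c"], "d")
print(reduct, gamma(rows, reduct, "d"))
```

Because the search is greedy, the attribute order can influence which reduct is returned when several attributes give the same improvement.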

19 Rule extraction: It uses the following heuristic approach:
– Merge identical rows that have the same condition and decision attributes
– Compute the core of every row
– Merge duplicate rows and compose a table with the reduct values

20 EXAMPLE Substitute LOW=1, MEDIUM=2, HIGH=3, COM=1 and SUB=2. Applying the K-Means clustering algorithm with K=2, the clustered rows are {1, 3, 5, 6} and {2, 4, 7, 8}. The table below is then reconstructed using the clustered rows as the decision value, presented in Table 2.
Table-1 Raw data set
Object  Weight  Door  Size  Cylinder
1       Low     2     Com   4
2       Low     4     Sub   6
3       Medium  4     Com   4
4       High    2     Com   6
5       High    4     Com   4
6       Low     4     Com   4
7       High    4     Sub   6
8       Low     2     Sub   6
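The substitution step and the reported clustering can be checked with a short sketch. The slides do not state which distance measure or scaling was used, so the code assumes squared Euclidean distance on the unscaled encoded values; it does not re-run K-means but only tests whether the two reported clusters are stable, meaning every object is closest to the centroid of its own cluster.

```python
WEIGHT = {"Low": 1, "Medium": 2, "High": 3}
SIZE = {"Com": 1, "Sub": 2}

# Raw example data: object -> (weight, door, size, cylinder).
raw = {
    1: ("Low", 2, "Com", 4),
    2: ("Low", 4, "Sub", 6),
    3: ("Medium", 4, "Com", 4),
    4: ("High", 2, "Com", 6),
    5: ("High", 4, "Com", 4),
    6: ("Low", 4, "Com", 4),
    7: ("High", 4, "Sub", 6),
    8: ("Low", 2, "Sub", 6),
}
# Replace the symbolic values by their numeric codes.
encoded = {obj: (WEIGHT[w], door, SIZE[s], cyl) for obj, (w, door, s, cyl) in raw.items()}

clusters = [{1, 3, 5, 6}, {2, 4, 7, 8}]  # clustering reported in the example

def centroid(members):
    vectors = [encoded[m] for m in members]
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

centroids = [centroid(c) for c in clusters]
# Stable means: each object is nearest to the centroid of the cluster it is in.
stable = all(
    min(range(len(centroids)), key=lambda i: sq_dist(encoded[obj], centroids[i])) == own
    for own, members in enumerate(clusters)
    for obj in members
)
print(encoded[3], centroids, stable)
```

Under these assumptions the reported partition is a fixed point of the K-means pass described on slide 16.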

21 Table-2 Data set after K-means clustering Applying the Quickreduct algorithm to Table 2, the final reduct attributes {WEIGHT, DOOR, SIZE} are obtained. Hence, Table 2 can be reduced to Table 3 using the attribute reduct {WEIGHT, DOOR, SIZE}.
Object  Weight  Door  Size  Cylinder  Mileage
1       1       2     1     4         1
2       1       4     2     6         2
3       2       4     1     4         1
4       3       2     1     6         2
5       3       4     1     4         1
6       1       4     1     4         1
7       3       4     2     6         2
8       1       2     2     6         2

22 Table-3 Attribute Reduction
Object  Weight  Door  Size  Mileage
1       1       2     1     1
2       1       4     2     2
3       2       4     1     1
4       3       2     1     2
5       3       4     1     1
6       1       4     1     1
7       3       4     2     2
8       1       2     2     2
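The reduct reported for Table 2 can be checked against the dependency degree γ defined on the QUICKREDUCT slide. The sketch below is only a check of that one attribute set, not a re-run of the attribute search; the tuple layout and the function name `gamma` are choices made here for illustration.

```python
from collections import defaultdict

ATTRS = ("weight", "door", "size", "cylinder")
# Table 2 rows: (weight, door, size, cylinder, mileage) for objects 1..8.
TABLE_2 = [
    (1, 2, 1, 4, 1),
    (1, 4, 2, 6, 2),
    (2, 4, 1, 4, 1),
    (3, 2, 1, 6, 2),
    (3, 4, 1, 4, 1),
    (1, 4, 1, 4, 1),
    (3, 4, 2, 6, 2),
    (1, 2, 2, 6, 2),
]

def gamma(attrs):
    """Share of objects whose mileage is determined by the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    blocks = defaultdict(list)
    for row in TABLE_2:
        blocks[tuple(row[i] for i in idx)].append(row[-1])
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return positive / len(TABLE_2)

reduct = ("weight", "door", "size")
print(reduct, gamma(reduct))
# Dropping any one attribute from the reduct lowers the dependency degree.
for dropped in reduct:
    rest = tuple(a for a in reduct if a != dropped)
    print(rest, gamma(rest))
```

The three attributes together give a dependency of 1, and removing any one of them gives less than 1, so none of them is redundant within this set. On the same data the CYLINDER column by itself also yields a dependency of 1 (`gamma(("cylinder",))`), so {WEIGHT, DOOR, SIZE} appears to be one reduct of Table 2 rather than the only attribute set with full dependency.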

23 Rule extraction Merge identical objects of Table 3; otherwise, compute the core of every object in Table 3 and present it as in Table-4 (* marks a condition value that is not part of the object's core).
Object  Weight  Door  Size  Mileage
1       1       *     1     1
2       1       *     2     2
3       *       4     1     1
4       3       *     *     2
5       *       4     1     1
6       1       *     1     1
7       3       *     *     2
8       1       *     2     2

24 Merge duplicate objects with the same decision value and compose a table with the reduct values. That is, the merged rows are {1, 6}, {2, 8}, {3, 5} and {4, 7}. Merged table
Object  Weight  Door  Size  Mileage
1       1       *     1     1
2       1       *     2     2
3       *       4     1     1
4       3       *     *     2

25 The decision obtained from the above example Decision rules are often presented as implications and are often called "if ... then ..." rules. Reading one rule off each row of the merged table, we can express the rules as follows:
If WEIGHT = 1 and SIZE = 1 THEN MILEAGE = 1
If WEIGHT = 1 and SIZE = 2 THEN MILEAGE = 2
If DOOR = 4 and SIZE = 1 THEN MILEAGE = 1
If WEIGHT = 3 THEN MILEAGE = 2
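The last two steps, merging duplicate rows and writing each merged row as a rule, can be sketched as follows. The data are the Table-4 rows from slide 23; the helper `to_rule` and the convention that every value other than `*` becomes one clause are assumptions made for this illustration.

```python
CONDITIONS = ("WEIGHT", "DOOR", "SIZE")

# Table-4: object -> (weight, door, size, mileage); "*" marks a value outside the core.
TABLE_4 = {
    1: (1, "*", 1, 1),
    2: (1, "*", 2, 2),
    3: ("*", 4, 1, 1),
    4: (3, "*", "*", 2),
    5: ("*", 4, 1, 1),
    6: (1, "*", 1, 1),
    7: (3, "*", "*", 2),
    8: (1, "*", 2, 2),
}

# Merge duplicates: objects with an identical row collapse into one entry.
merged = {}
for obj, row in TABLE_4.items():
    merged.setdefault(row, []).append(obj)

def to_rule(row):
    """Turn one merged row into an if-then rule, skipping wildcard conditions."""
    *conds, mileage = row
    clauses = [f"{name} = {v}" for name, v in zip(CONDITIONS, conds) if v != "*"]
    return "If " + " and ".join(clauses) + f" THEN MILEAGE = {mileage}"

rules = [to_rule(row) for row in merged]
for objects, rule in zip(merged.values(), rules):
    print(objects, rule)
```

Each merged row yields exactly one rule, so the number of rules equals the number of distinct rows in Table-4.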

26 Classification of Data Mining Systems Techniques used: DB-oriented techniques, statistics, machine learning, pattern recognition, neural networks, rough sets, etc. Applications adapted: finance, marketing, medical, stock, telecommunication, etc.

27 Classification of Data Mining Systems Kinds of DB: relational, data warehouse, transactional DB, advanced DB system, flat files, WWW. Kinds of knowledge: classification, association, clustering, prediction, …

28 Data Mining as a Step of KDD Databases and flat files → Cleaning and Integration → Data Warehouse → Selection and Transformation → Data Mining → Patterns → Evaluation & Presentation → Knowledge

29 WHY MATLAB FOR DATA MINING? As a programming language, MATLAB is much like other procedural languages such as Fortran or C. Graphing capability in MATLAB is among the best in the business, and all MATLAB graphs are completely configurable through software.

30 Data Mining: Problems and Challenges Noisy data, difficult training set, dynamic databases, large databases, incomplete data

31 Performance Issues Cost of the learning set, time and memory constraints, predictive ability

32 Conclusion Data mining is an analytic process designed to explore data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction. Rough set theory is applied in data mining for time-sequence analysis of electrical signals and in medical diagnosis. It is effective because of its low time complexity, low cost, accuracy, and low cost of learning.

34 THANKS!!! QUESTIONS?

