- Sachin Singh
Data Mining - Concepts
Extracting meaningful knowledge from large volumes of ‘raw’ data.
Types
– Association
– Classification
– Temporal
Classification Method
Prediction model
The C4.5 Tree algorithm

Trans_Id  Age  Student  Credit_rating  Buys_Computer
               no       excellent      no
               yes      excellent      no
               yes      fair           yes
               no       excellent      yes
               no       fair           yes
               yes      excellent      yes
               no       fair           no
               no       fair           no
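As a rough illustration of how C4.5 chooses a split, the sketch below computes the gain ratio of each attribute against Buys_Computer. It uses only the Student and Credit_rating values listed in the table; the function names (`entropy`, `gain_ratio`, `best_attribute`) are illustrative choices, and pruning and continuous attributes, which the full algorithm also handles, are left out.

```python
from collections import Counter
from math import log2

# Rows taken from the table: (student, credit_rating, buys_computer).
ROWS = [
    ("no", "excellent", "no"),
    ("yes", "excellent", "no"),
    ("yes", "fair", "yes"),
    ("no", "excellent", "yes"),
    ("no", "fair", "yes"),
    ("yes", "excellent", "yes"),
    ("no", "fair", "no"),
    ("no", "fair", "no"),
]
ATTRIBUTES = {"student": 0, "credit_rating": 1}  # attribute name -> column index
LABEL = 2  # column index of the class label


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, column):
    """Information gain of splitting on `column`, divided by its split information."""
    labels = [row[LABEL] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[column], []).append(row[LABEL])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[column] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows):
    """Name of the attribute with the highest gain ratio."""
    return max(ATTRIBUTES, key=lambda name: gain_ratio(rows, ATTRIBUTES[name]))


if __name__ == "__main__":
    for name, column in ATTRIBUTES.items():
        print(name, round(gain_ratio(ROWS, column), 4))
    print("split on:", best_attribute(ROWS))
```

On these eight rows Credit_rating separates the classes no better than chance, so Student would be the first test in the tree.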
Classification Tree
Analysis of Trees
Current work focuses largely on the generation of trees
– Efficient algorithms
– Very large, disk-resident data sources
– Improving the accuracy of the generated models
Motivation
– Current research area with a need for analysis
Areas of Analysis
Two Sub Problems
– Filtering Sub Problem
– Comparison Sub Problem
Filtering Sub Problem
Typical data warehouses are huge.
They lead to the generation of “bushy” trees.
Not all outcomes are significant.
Trees need to be filtered based on the required outcomes.
Filtering Sub Problem
[Figure: Full Classification Tree and Filtered Classification Tree]
Filtering Sub Problem
Advantages
– Efficient querying, faster results
– Easily managed
– Useful for the comparison sub problem
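One way to picture the filtering step is as a recursive prune that keeps only the paths ending in a required outcome. The sketch below is a minimal illustration, not the algorithm proposed in this work: the nested-dictionary tree layout, the example tree, and the name `filter_tree` are all assumptions made for the example.

```python
def filter_tree(node, wanted):
    """Return a copy of the tree keeping only paths that end in a wanted outcome.

    A leaf is {"outcome": value}; an internal node is
    {"test": attribute, "branches": {partition_value: child}}.
    Returns None when no path under `node` reaches a wanted outcome.
    """
    if "outcome" in node:  # leaf
        return dict(node) if node["outcome"] in wanted else None
    kept = {}
    for value, child in node["branches"].items():
        pruned = filter_tree(child, wanted)
        if pruned is not None:
            kept[value] = pruned
    if not kept:
        return None
    return {"test": node["test"], "branches": kept}


# Hypothetical full tree, used only for illustration.
FULL_TREE = {
    "test": "student",
    "branches": {
        "yes": {"outcome": "yes"},
        "no": {
            "test": "credit_rating",
            "branches": {
                "excellent": {"outcome": "yes"},
                "fair": {"outcome": "no"},
            },
        },
    },
}

# Keep only the paths that lead to Buys_Computer = no.
filtered = filter_tree(FULL_TREE, {"no"})
```

The filtered tree is smaller than the full one, which is what makes it cheaper to query and to compare.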
Comparison Sub Problem
Need to monitor changes in data trends by comparing the classification trees
Levels of change identified
– Change in test (partition) value
– Change in the partitions
– Change in node levels
– Change in outcome (leaves)
Comparison Sub Problem
Issues
– The structure of the trees is unpredictable
– Two trees with no standard structure must be compared
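The levels of change listed for the comparison sub problem can be sketched as a parallel walk over two trees. This is only an illustration under assumptions: the nested-dictionary layout, the change labels ("test", "partition", "level", "outcome"), and the two example trees are hypothetical, and the mapping of those labels onto the levels above is one possible reading.

```python
def compare_trees(old, new, path="root"):
    """Return a list of (path, kind, detail) differences between two trees.

    A leaf is {"outcome": value}; an internal node is
    {"test": attribute, "branches": {partition_value: child}}.
    """
    changes = []
    old_leaf, new_leaf = "outcome" in old, "outcome" in new
    if old_leaf and new_leaf:
        if old["outcome"] != new["outcome"]:
            changes.append((path, "outcome", f'{old["outcome"]} -> {new["outcome"]}'))
        return changes
    if old_leaf != new_leaf:
        # A leaf became a subtree or the reverse, so the depth changed here.
        changes.append((path, "level", "leaf <-> subtree"))
        return changes
    if old["test"] != new["test"]:
        # Different test attribute: the subtrees below are not comparable.
        changes.append((path, "test", f'{old["test"]} -> {new["test"]}'))
        return changes
    old_values, new_values = set(old["branches"]), set(new["branches"])
    if old_values != new_values:
        changes.append((path, "partition", sorted(old_values ^ new_values)))
    for value in sorted(old_values & new_values):
        changes += compare_trees(
            old["branches"][value], new["branches"][value], f"{path}/{value}"
        )
    return changes


# Two hypothetical trees built from data at different points in time.
OLD_TREE = {
    "test": "student",
    "branches": {
        "yes": {"outcome": "yes"},
        "no": {
            "test": "credit_rating",
            "branches": {"excellent": {"outcome": "yes"}, "fair": {"outcome": "no"}},
        },
    },
}
NEW_TREE = {
    "test": "student",
    "branches": {
        "yes": {"outcome": "no"},
        "no": {
            "test": "credit_rating",
            "branches": {
                "excellent": {"outcome": "yes"},
                "fair": {"outcome": "no"},
                "poor": {"outcome": "no"},
            },
        },
    },
}

changes = compare_trees(OLD_TREE, NEW_TREE)
```

The walk stops descending as soon as the test attribute or node type differs, which is one simple way to cope with trees that share no standard structure.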
Solution
XML Trees
– Convert the tree structure into XML files
– XML is inherently a tree structure
– Take advantage of existing XML-related technologies
– Standard specifications
Solution – Proposed File format
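To show what storing a classification tree as XML might look like, the sketch below serializes a small tree with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`node`, `branch`, `leaf`, `test`, `value`, `outcome`) and the example tree are hypothetical and are not necessarily those of the proposed file format.

```python
import xml.etree.ElementTree as ET


def tree_to_xml(node):
    """Convert a nested-dictionary tree into an XML element.

    A leaf is {"outcome": value}; an internal node is
    {"test": attribute, "branches": {partition_value: child}}.
    """
    if "outcome" in node:
        return ET.Element("leaf", {"outcome": node["outcome"]})
    element = ET.Element("node", {"test": node["test"]})
    for value, child in node["branches"].items():
        branch = ET.SubElement(element, "branch", {"value": value})
        branch.append(tree_to_xml(child))
    return element


# Hypothetical tree, used only for illustration.
TREE = {
    "test": "student",
    "branches": {
        "yes": {"outcome": "yes"},
        "no": {
            "test": "credit_rating",
            "branches": {"excellent": {"outcome": "yes"}, "fair": {"outcome": "no"}},
        },
    },
}

xml_text = ET.tostring(tree_to_xml(TREE), encoding="unicode")

# Existing XML tooling can then query the stored tree, for example
# collecting every leaf with a given outcome through a path expression.
root = ET.fromstring(xml_text)
no_leaves = root.findall(".//leaf[@outcome='no']")
```

Once the tree is in XML, filtering and comparison can lean on standard parsers and path queries instead of a custom tree format.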
Approach
Devise algorithms to solve the filtering and comparison problems
Analyze the results of comparison in logical terms
Measure the efficiency of the algorithms through their time and space complexities
Progress