Data Mining with Oracle using Classification and Clustering Algorithms
Presented by Nhamo Mdzingwa
Supervisor: John Ebden
Overview of Presentation
- Recap of Proposal
- Classification of Data Mining & DM Algorithms
- Oracle Data Mining
- Data Mining Process
- Evaluation of Results
- Progress so far
- Updated Timeline
- Plans
Objective
- Investigate two types of data mining algorithms (classification and clustering) available in the Oracle Data Mining (ODM) option of Oracle10g.
- Apply the two algorithms to real data.
- Analyse and evaluate the results in terms of performance.
Classification of Data Mining
- Directed data mining (supervised learning) builds a model that describes one particular attribute (the target) in terms of the rest of the data.
- Undirected data mining (unsupervised learning) builds a model that establishes the relationships among all the input attributes, for example by grouping similar records.
Classification of Data Mining algorithms
DM strategies
- Supervised learning (input attributes and one or more output attributes)
  - Classification: Naive Bayes, Model Seeker, Adaptive Bayes Network
  - Estimation
  - Prediction: Predictive variance
- Unsupervised learning (input attributes but no output attributes)
  - Clustering: k-Means, O-Cluster
  - Association Discovery
  - Visualization
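As an illustration of the supervised Classification branch, the following is a minimal textbook Naive Bayes classifier for categorical attributes in plain Python. It is a sketch only, not Oracle's implementation, which differs in details such as data preparation. The function names `train_naive_bayes` and `predict`, the add-one (Laplace) smoothing, and the toy weather data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    domains = defaultdict(set)  # attribute -> distinct values seen in training
    for row in rows:
        cls = row[target]
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, cls)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict(model, record):
    """Return the class maximising log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)  # log prior P(class)
        for attr, value in record.items():
            # Add-one smoothing so an unseen value does not zero out the product
            count = value_counts[(attr, cls)][value]
            score += math.log((count + 1) / (n_cls + len(domains[attr])))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


# Hypothetical toy data: predict "play" from the other attributes
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
model = train_naive_bayes(rows, target="play")
print(predict(model, {"outlook": "rainy", "windy": "yes"}))    # no
print(predict(model, {"outlook": "overcast", "windy": "no"}))  # yes
```

The model describes one attribute (`play`) in terms of the others, which is the supervised setting: training records must carry the output attribute, and new records are classified from their input attributes alone.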
Algorithms offered in Oracle10g
Classification
1. Adaptive Bayes Network
2. Naive Bayes
3. Model Seeker
Clustering
1. k-Means
2. O-Cluster
3. Predictive variance
Association rules
1. Apriori
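For the Clustering group, the sketch below shows the standard k-Means (Lloyd's) iteration: assign each record to its nearest centroid, move each centroid to the mean of its members, and repeat until no assignment changes. It is not Oracle's k-Means, which is an enhanced variant and will not reproduce these exact steps. The function name `kmeans`, the deterministic initialisation (the first k points), and the sample points are hypothetical choices that keep the example reproducible.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-Means: alternate assignment and centroid-update steps."""
    # Hypothetical deterministic initialisation: the first k points
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, assignment


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (0.5, 1.5), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print(centroids)
```

No output attribute is involved: the algorithm only groups input records, matching the unsupervised definition. Because results depend on the starting centroids, practical runs usually use randomised or smarter initialisation rather than the first k points.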
Evaluation of Results
- Evaluating supervised learning models involves determining their level of predictive accuracy.
- Accuracy is measured on test data sets held out from the training data.
- The confidence and support levels of models created from the same training data are compared to determine accuracy.
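The measures named on this slide can be made concrete with a short sketch: predictive accuracy computed on a held-out test set, and the support and confidence of an association rule A → C, where support(A → C) is the fraction of transactions containing both A and C, and confidence is support(A ∪ C) / support(A). The function names and data below are hypothetical plain-Python illustrations, not the ODM API.

```python
def accuracy(actual, predicted):
    """Fraction of test records whose predicted class equals the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of A -> C: support(A and C together) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies; treated as zero confidence here
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical held-out test labels vs. model predictions
actual = ["yes", "no", "yes", "yes"]
predicted = ["yes", "no", "no", "yes"]
print(accuracy(actual, predicted))  # 0.75

# Hypothetical transactions for the rule {bread} -> {milk}
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(transactions, {"bread", "milk"}))  # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))
```

Accuracy only makes sense when the true output attribute is known for the test records, which is why held-out test data is kept separate from the training data.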
Progress
- Literature survey
- Oracle10g installed on Athena in the Hons Lab
- Exploring the Oracle9i and 10g suites, including JDeveloper
- Registered member of MetaLink (Oracle’s online support service)
Updated Timeline
- Continue literature survey and tutorials: done
- Investigate clustering & classification algorithms (theory): done
- Find suitable computerised case studies of the use of the above algorithms (with or without Oracle): done
- Search for datasets for testing (possibilities: AIDS data & faculty data): in progress
- Apply the algorithms to the data found, then critically analyse & assess the results: second semester
- Write up paper: September vacation and 3rd term
- Final project write-up: due 7/11