Weka: A Useful Tool in Data Mining and Machine Learning
Team 5: Noha Elsherbiny, Huijun Xiong, and Bhanu Peddi
What does it really mean to mine data?
Data mining is an experimental science.
Data mining finds valuable information hidden in large volumes of data.
Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data.
Data mining encompasses varied fields, including:
– Databases
– Statistics
– Machine Learning
– High Performance Computing
– Visualization
– Mathematics
How does WEKA come into play?
"Drowning in data yet starving for knowledge"
No single machine learning scheme is suitable for all data mining problems.
WEKA (Waikato Environment for Knowledge Analysis) addresses this by bundling many schemes in one workbench.
What is in WEKA that makes it special?
It provides many different algorithms for data mining and machine learning.
It is open source and freely available.
It is platform-independent.
It is easy to use for people who are not data mining specialists.
It provides flexible facilities for scripting experiments.
It is kept up to date, with new algorithms being added as they appear in the research literature.
How does one apply WEKA, then?
Apply a learning method to a dataset and analyze its output to learn more about the data.
Use learned models to generate predictions on new instances.
Apply several different learners and compare their performance in order to choose the best one for prediction.
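The three uses above can be sketched in a few lines of plain Python. This is only an illustration of the workflow, not WEKA code: the two toy learners (`MajorityClass`, `OneNearestNeighbour`), the `accuracy` helper, and the data are all made up for this sketch, and WEKA's own learners are far more capable.

```python
from collections import Counter


class MajorityClass:
    """Baseline learner: always predicts the most common training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class OneNearestNeighbour:
    """Predicts the label of the closest training instance (squared Euclidean)."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


def accuracy(model, X, y):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)


# Made-up training and test data: two numeric attributes, one class label.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]]
train_y = ["a", "a", "a", "b", "b"]
test_X = [[1.1, 0.9], [4.8, 5.1], [5.1, 5.0]]
test_y = ["a", "b", "b"]

# 1) apply each learner to the dataset, 2) predict on new instances,
# 3) compare performance and keep the best one.
scores = {}
for learner in (MajorityClass(), OneNearestNeighbour()):
    model = learner.fit(train_X, train_y)
    scores[type(learner).__name__] = accuracy(model, test_X, test_y)

best = max(scores, key=scores.get)
print(scores, best)
```

Evaluating on instances that were held out from training is what makes the comparison between learners meaningful.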
How do you actually use it?
All algorithms take their input in the form of a single relational table in the ARFF format.
The learning methods are called classifiers.
– weka.classifiers.lazy.IBk: k-nearest neighbour learner
– weka.classifiers.trees.J48: decision trees
– weka.classifiers.bayes.NaiveBayes: Naive Bayes with/without kernels
– weka.classifiers.functions.SMO: support vector machines
There are also pre-processing tools, called filters.
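To make the ARFF input format concrete, here is a small sketch of what such a file looks like and how its header and data sections are laid out. The "weather" relation, its attributes, and its rows are made up for illustration, and `parse_arff` is a hypothetical helper that handles only the simplest case; it is not WEKA's own loader.

```python
import io

# A tiny ARFF document: a header (@relation, @attribute) followed by @data rows.
# The "weather" relation and its values are made up for illustration.
ARFF_TEXT = """\
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Handles only unquoted names, nominal/numeric attributes and dense,
    comma-separated data rows; real ARFF allows more (quoting, sparse
    rows, dates, strings).
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, [name for name, _ in attributes], len(rows))
```

Saved as a file (say `weather.arff`, a hypothetical name), a table like this could be handed to a classifier from the command line, for example `java -cp weka.jar weka.classifiers.trees.J48 -t weather.arff`, assuming `weka.jar` sits in the current directory.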
Show us how to use it!
Decision Trees on Weka
Naïve Bayes on Weka
Support Vector Machines on Weka