Waikato Environment for Knowledge Analysis
Contents
- What is WEKA?
- The Explorer:
  - Preprocess data
  - Data Visualization
  - Classification
  - Clustering
  - Association Rules
  - Attribute Selection
- References and Resources
What is WEKA? Waikato Environment for Knowledge Analysis
It is a data mining / machine learning tool developed by the Department of Computer Science at the University of Waikato, New Zealand. The weka is also a bird found only in New Zealand.
Getting started with WEKA
- Free: GNU GPL license
- Multi-platform (Java): Windows, Mac, Linux
- Easy to install
- Example datasets included
- Documentation, tutorials, and MOOCs
What is WEKA? A data mining tool
- User interface, or integrated into your Java code
- Data filters
- Classification
- Clustering
- Visualization
Input data
- Formats: ARFF (Attribute-Relation File Format), CSV, SQL database
- An ARFF file contains the name of the dataset, the attributes' names, types, and values, and the data itself
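For reference, a minimal ARFF file has this shape (the weather-style attributes below are only illustrative, not a complete WEKA sample dataset):

```
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,64,yes
rainy,71,yes
```

The `@relation` line names the dataset, each `@attribute` line declares a name and a type (nominal values in braces, or `numeric`), and everything after `@data` is one comma-separated instance per line.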
The Explorer
- Preprocessing
- Visualization
- Classification
- Clustering
- Finding associations
- Attribute selection
Explorer: Preprocessing
- 49 different filters
- Supervised and unsupervised
- Applied to attributes or to instances
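To make the idea of an unsupervised attribute filter concrete, here is a standalone sketch of min-max normalization, comparable in spirit to WEKA's Normalize filter (this is not WEKA's implementation, just an illustration of what such a filter computes):

```python
def min_max_normalize(column):
    """Rescale one numeric attribute to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant attribute: map every value to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Illustrative temperatures: the smallest maps to 0.0, the largest to 1.0
temperatures = [64, 71, 85]
print(min_max_normalize(temperatures))
```

In WEKA such a filter is applied to a whole dataset from the Preprocess panel; here the same transformation is shown on a single attribute column.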
Demo: Filters
Demo: Visualization
Explorer: Classification
- 76 different classification algorithms
- Decision trees, instance-based classifiers, support vector machines (SVMs), Bayesian networks, …
Demo: Classifier
Explorer: Clustering
- WEKA contains "clusterers" for finding groups of similar instances in a dataset
- Implemented schemes include k-means, EM, Cobweb, X-means, and FarthestFirst
- Clusters can be visualized and compared to "true" clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
The K-Means Clustering Method
Given k, the k-means algorithm proceeds in four steps:
1. Partition the objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when no assignments change
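The four steps can be sketched in a few lines of plain Python (a minimal standalone sketch for 2-D points, not WEKA's SimpleKMeans implementation):

```python
import math
import random

def kmeans(points, k, max_iters=100):
    """Plain k-means on 2-D points, following the four steps above."""
    # Step 1: pick k initial seed points (here: k distinct data points)
    centroids = random.sample(points, k)
    for _ in range(max_iters):
        # Step 3: assign each point to the cluster with the nearest seed point
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: recompute each seed point as the mean of its cluster
        new_centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop when the assignments (hence centroids) no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated groups of hypothetical points
random.seed(1)
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, _ = kmeans(pts, 2)
print(sorted(centroids))  # one centroid near each group
```

Note that k-means only finds a local optimum, so the result can depend on the random initial seeds; WEKA exposes the random seed as a parameter for the same reason.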
Demo: Clustering Data
Explorer: Finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
- Works only with discrete data
- Can identify statistical dependencies between groups of attributes, e.g. milk, butter → bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
Basic Concepts: Frequent Patterns

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

- Itemset: a set of one or more items
- k-itemset: X = {x1, …, xk}
- (Absolute) support, or support count, of X: the number of transactions that contain the itemset X
- (Relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
- An itemset X is frequent if X's support is no less than a minsup threshold
Basic Concepts: Association Rules

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

- Find all rules X → Y with minimum support and confidence
- Support, s: the probability that a transaction contains X ∪ Y
- Confidence, c: the conditional probability that a transaction containing X also contains Y
- Let minsup = 50%, minconf = 50%
- Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
- Association rules (among many more):
  Beer → Diaper (support 60%, confidence 100%)
  Diaper → Beer (support 60%, confidence 75%)
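The support and confidence figures for these rules can be checked directly from the five transactions (a standalone sketch of the definitions, not WEKA's Apriori code):

```python
# The five transactions from the slide, as sets of items
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def support(itemset):
    """Relative support: fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Confidence of lhs -> rhs: P(t contains rhs | t contains lhs)."""
    return support(lhs | rhs) / support(lhs)

print(support({"Beer", "Diaper"}))       # 0.6  -> the 60% support above
print(confidence({"Beer"}, {"Diaper"}))  # 1.0  -> Beer -> Diaper, 100%
print(confidence({"Diaper"}, {"Beer"}))  # 0.75 -> Diaper -> Beer, 75%
```

Apriori's contribution is not these definitions but the pruning strategy: every subset of a frequent itemset must itself be frequent, so candidate itemsets can be grown level by level.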
Demo: Association
Explorer: Attribute selection
- A panel for investigating which (subsets of) attributes are the most predictive
- Attribute selection methods have two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
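Information gain, one of the evaluation methods named above, is simple enough to sketch directly (a standalone illustration on hypothetical toy data, not WEKA's InfoGainAttributeEval):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attribute_values, labels):
    """Entropy reduction from splitting the labels on one attribute."""
    n = len(labels)
    split = {}
    for v, y in zip(attribute_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Hypothetical toy data: a perfectly predictive attribute vs. a useless one
labels  = ["yes", "yes", "no", "no"]
good    = ["a", "a", "b", "b"]      # splits the labels cleanly -> gain 1.0 bit
useless = ["a", "b", "a", "b"]      # tells us nothing           -> gain 0.0 bits
print(info_gain(good, labels))
print(info_gain(useless, labels))
```

An evaluation method like this scores individual attributes; the search method then decides which attributes or subsets to score, which is why WEKA lets you combine the two freely.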
Demo: Attribute Selection
Different ways to use it
- Explorer: preprocessing, clustering, classification, regression analysis, visualization
- Experimenter: analysis and comparison of classifiers
Different ways to use it
- Simple command-line interface
- Can be integrated into your Java code
WEKA's advantages
- Contains many algorithms
- Free (most other data mining tools are very expensive)
- Open source, so adapting it to your own needs is possible
- Constantly under development (not only by the original designers)
Drawbacks
- Limited possibilities for interfacing with other software
- Performance is often sacrificed in favor of portability, design transparency, etc.
- Memory limitations, because the dataset must be loaded into main memory completely
Conclusion
+ Easy to use
+ No programming skills needed
- Limited visualization and statistical tools