UNIVERSITY OF JYVÄSKYLÄ DEPARTMENT OF MATHEMATICAL INFORMATION TECHNOLOGY Tutorial 1: Introduction to WEKA and YALE. TIES443: Introduction to DM.

Presentation transcript:

UNIVERSITY OF JYVÄSKYLÄ DEPARTMENT OF MATHEMATICAL INFORMATION TECHNOLOGY
TIES443: Introduction to DM
Tutorial 1: Introduction to WEKA and YALE
Prototyping DM Techniques with WEKA and YALE Open-Source Software
Mykola Pechenizkiy, Department of Mathematical Information Technology, University of Jyväskylä
Course webpage: TIES443
November 7, 2006

Contents
Brief Review of DM Software
–Commercial
–Open-source: WEKA, YALE, The R Project for Statistical Computing, Pentaho (whole BI solutions)
–Matlab: Sami will tell you more during the 2nd Tutorial
WEKA vs. YALE Comparison
–Exploration
–Experimentation
–Visualization
1st Assignment

Data Mining Software
Many providers of commercial DM software
–SAS Enterprise Miner, SPSS Clementine, Statistica Data Miner, MS SQL Server, Polyanalyst, KnowledgeSTUDIO, …
–IBM Intelligent Miner. Universities can now receive free copies of DB2 and Intelligent Miner for educational or research purposes.
–See for a list: http://
Open Source:
–WEKA (Waikato Environment for Knowledge Analysis)
–YALE (Yet Another Learning Environment)
–Many others: MLC++, Minitab, AlphaMiner, Rattle, KNIME
–The Pentaho BI project: “a pioneering initiative by the Open Source development community to provide organizations with a comprehensive set of BI capabilities that enable them to radically improve business performance, efficiency, and effectiveness.”

Data Mining with WEKA
Copyright: Martin Kramer
The following slides are by Eibe Frank

WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements the “Data Mining” book by Witten & Frank
Main features:
–Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
–Graphical user interfaces (incl. data visualization)
–Environment for comparing learning algorithms

WEKA only deals with “flat” files
age
sex { female,
chest_pain_type { typ_angina, asympt, non_anginal,
cholesterol
exercise_induced_angina { no,
class { present,
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
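
To make the flat-file layout concrete, here is a minimal Python sketch that reads an ARFF-style text into attribute declarations and data rows. The attribute declarations on the slide are truncated, so the relation name, the `numeric` types and the completed value lists below are assumptions for illustration; only the four data rows come from the slide. The helper `parse_arff` is hypothetical and is not WEKA's own reader.

```python
# Minimal ARFF-style reader (illustrative sketch, not WEKA's parser).
# The header below is an assumed completion of the truncated slide text.
ARFF_TEXT = """\
@relation heart_example
@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Return (attributes, rows); a '?' entry (missing value) becomes None."""
    attributes = []  # list of (name, kind); kind is 'numeric' or a list of labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [label.strip() for label in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(ARFF_TEXT)
```

Each row is one instance and each column one attribute, which is what “flat” means here: no nested or relational structure.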

Command line tutorial

Explorer: Pre-processing the Data
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
–Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
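
As a rough illustration of what two such filters do, the Python sketch below applies equal-width discretization and min-max normalization to a numeric column. The function names are hypothetical, and the binning rule is one common choice, not necessarily the one WEKA's filters use by default. The input values are the three known cholesterol readings from the example file.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cholesterol = [233, 286, 229]
bins = equal_width_bins(cholesterol, 3)
scaled = min_max_normalize(cholesterol)
```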

Explorer: building “classifiers”
Classifiers in WEKA are models for predicting nominal or numeric quantities
Implemented learning schemes include:
–Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
“Meta”-classifiers include:
–Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
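
To show what one of the listed scheme families looks like in code, here is a small Python sketch of an instance-based (k-nearest-neighbour) classifier predicting a nominal class. The data and the function `knn_predict` are made up for illustration; this is not WEKA code.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label). Majority vote of the k nearest."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy training set with two well-separated classes (hypothetical values).
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]
prediction = knn_predict(train, (1.1, 1.0), k=3)
```

An instance-based learner stores the training instances and defers all work to prediction time, which is why there is no separate training step in the sketch.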

Explorer: clustering data
WEKA contains “clusterers” for finding groups of similar instances in a dataset
Implemented schemes are:
–k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to “true” clusters (if given)
Evaluation based on log-likelihood if clustering scheme produces a probability distribution
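
As an illustration of the first scheme in the list, the following Python sketch runs plain k-means (Lloyd's algorithm) on a handful of made-up 2-d points. The initial centroids are passed in explicitly so the run is deterministic; how WEKA chooses its starting points is not covered by this sketch.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Plain k-means; returns (centroids, assignments)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assignments


# Toy data: two visible groups (hypothetical values).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = k_means(points, centroids=[points[0], points[3]])
```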

Explorer: finding associations
WEKA contains an implementation of the Apriori algorithm for learning association rules
–Works only with discrete data
Can identify statistical dependencies between groups of attributes:
–milk, butter => bread, eggs (with confidence 0.9 and support 2000)
Apriori can compute all rules that have a given minimum support and exceed a given confidence
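
The two measures in the rule example can be computed directly. The Python sketch below is not Apriori itself (it does no candidate generation or pruning); it only shows how support, taken here as an absolute transaction count as on the slide, and confidence are obtained for one rule. The five transactions are made up.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical market-basket data.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
]
rule_support = support_count(transactions, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(transactions, {"milk", "butter"}, {"bread", "eggs"})
```

Here the rule milk, butter => bread, eggs holds in 2 transactions, and 2 of the 3 transactions containing milk and butter also contain bread and eggs.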

Explorer: attribute selection
Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts:
–A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
–An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary combinations of these two
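
One simple combination of the two parts is “ranking” as the search method with “information gain” as the evaluation method. The Python sketch below scores each nominal attribute by its information gain with respect to the class and sorts the attributes by that score. The tiny dataset and the attribute names are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the expected entropy after splitting on one attribute."""
    labels = [r[class_index] for r in rows]
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[class_index] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical data: (outlook, temperature, class).
rows = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rainy", "hot", "yes"),
    ("rainy", "mild", "yes"),
]
names = ["outlook", "temperature"]
ranking = sorted(
    names, key=lambda n: information_gain(rows, names.index(n)), reverse=True
)
```

In this toy data the first attribute determines the class completely and the second carries no information about it, so the ranking puts `outlook` first.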

Explorer: Data Visualization
Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
–To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
“Zoom-in” function

Performing Experiments
Experimenter makes it easy to compare the performance of different learning schemes
For classification and regression problems
Results can be written into file or database
Evaluation options: cross-validation, learning curve, hold-out
Can also iterate over different parameter settings
Significance-testing built in!
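
The core idea of comparing schemes by cross-validation can be sketched in a few lines of Python. The example below evaluates two toy learners, a majority-class baseline and a 1-nearest-neighbour rule, on the same folds. The data, the learners and the fold assignment (every k-th instance) are assumptions for illustration; the significance testing mentioned above is not reproduced here.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx); fold i tests every k-th instance starting at i."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cross_validate(fit, data, k):
    """fit(train_rows) returns a predict function; result is overall accuracy."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(data), k):
        predict = fit([data[i] for i in train_idx])
        correct += sum(1 for i in test_idx if predict(data[i][0]) == data[i][1])
    return correct / len(data)


def fit_majority(train):
    """Baseline: always predict the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority


def fit_nearest(train):
    """1-nearest-neighbour on a single numeric feature."""
    return lambda x: min(train, key=lambda item: abs(item[0] - x))[1]


# Hypothetical data: (feature value, class label).
data = [(1.0, "neg"), (8.0, "pos"), (2.0, "neg"),
        (9.0, "pos"), (3.0, "neg"), (10.0, "pos")]
accuracy_majority = cross_validate(fit_majority, data, k=3)
accuracy_nearest = cross_validate(fit_nearest, data, k=3)
```

Because both learners are scored on identical folds, their accuracies are directly comparable, which is the precondition for any paired significance test.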

The Knowledge Flow GUI
New graphical user interface for WEKA
Java-Beans-based interface for setting up and running machine learning experiments
Data sources, classifiers, etc. are beans and can be connected graphically
Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
Layouts can be saved and loaded again later
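
The flow of data through connected components can be mimicked with ordinary functions. The Python sketch below chains a source, a filter, a classifier and an evaluator in that order. All four components and their data are hypothetical stand-ins; they say nothing about how the Java beans are actually implemented.

```python
def data_source():
    """Source: emits (features, label) instances (toy values)."""
    return [((2.0,), "low"), ((4.0,), "low"), ((16.0,), "high"), ((18.0,), "high")]


def normalize_filter(instances):
    """Filter: rescale the single feature to the interval [0, 1]."""
    xs = [features[0] for features, _ in instances]
    lo, hi = min(xs), max(xs)
    return [(((f[0] - lo) / (hi - lo),), label) for f, label in instances]


def threshold_classifier(instances):
    """Classifier: learn a cut point halfway between the two class means."""
    low = [f[0] for f, label in instances if label == "low"]
    high = [f[0] for f, label in instances if label == "high"]
    cut = (sum(low) / len(low) + sum(high) / len(high)) / 2
    return lambda features: "high" if features[0] > cut else "low"


def evaluator(model, instances):
    """Evaluator: fraction of instances the model labels correctly."""
    return sum(model(f) == label for f, label in instances) / len(instances)


# "data source" -> "filter" -> "classifier" -> "evaluator"
instances = normalize_filter(data_source())
model = threshold_classifier(instances)
accuracy = evaluator(model, instances)  # measured on the training data itself
```

Each stage only consumes the output of the previous one, so a stage can be swapped for another with the same input and output shape without touching the rest of the chain.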

Conclusion: try it yourself!
– WEKA is available online; its site also has a list of projects based on WEKA
– YALE has different interfaces and underlying ideas, but it also integrates all available DM techniques from WEKA

YALE – Yet Another Learning Environment
Developed at the Artificial Intelligence Unit of the University of Dortmund.
The following slides are compiled from screenshots and related descriptions available from the YALE pages.

Features of YALE
– freely available open-source knowledge discovery environment
– 100% pure Java (runs on every major platform and operating system)
– KD processes are modeled as simple operator trees, which is both intuitive and powerful
– operator trees or subtrees can be saved as building blocks for later re-use
– internal XML representation ensures a standardized interchange format for data mining experiments
– simple scripting language allowing for automatic large-scale experiments
– multi-layered data view concept ensures efficient and transparent data handling

Features of YALE
– Flexibility in using YALE:
    – graphical user interface (GUI) for interactive prototyping
    – command line mode (batch mode) for automated large-scale applications
    – Java API to ease usage of YALE from your own programs
– simple plugin and extension mechanisms; some plugins already exist and you can easily add your own
– powerful plotting facility offering a large set of sophisticated high-dimensional visualization techniques for data and models
– more than 350 machine learning, evaluation, in- and output, pre- and post-processing, and visualization operators, plus numerous meta optimization schemes
– machine learning library WEKA fully integrated
– YALE's potential applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining

Experiment Setup
The initial operator tree consists only of a root node. The lower part of the YALE main frame displays log and error messages. The "Tree View" tab is the most often used editor for YALE experiments. Left: the current operator tree. Right: a table with the parameters of the currently selected operator.

After the learning operator "J48", a breakpoint indicates that the intermediate results can be inspected. Due to the modular concept of YALE, it is always possible to inspect and save intermediate results, e.g. the results for each individual run in a cross-validation.
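To make "the results for each individual run in a cross-validation" concrete, here is a minimal sketch in plain Python. It does not use YALE or J48; the majority-label "learner", the toy labels and all function names are assumptions chosen only to show that each fold yields its own intermediate result before the average is taken.

```python
def k_fold_indices(n_examples, k):
    # Split the indices 0..n_examples-1 into k contiguous, nearly equal folds.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_label(labels):
    # Trivial stand-in "learner": always predict the most frequent training label.
    return max(set(labels), key=labels.count)

labels = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
per_fold_accuracy = []
for test_idx in k_fold_indices(len(labels), k=5):
    train = [labels[i] for i in range(len(labels)) if i not in test_idx]
    prediction = majority_label(train)
    correct = sum(labels[i] == prediction for i in test_idx)
    per_fold_accuracy.append(correct / len(test_idx))  # intermediate result of this run

print(per_fold_accuracy)                                # one value per run
print(sum(per_fold_accuracy) / len(per_fold_accuracy))  # overall estimate
```

Inspecting `per_fold_accuracy` is the plain-Python analogue of stopping at a breakpoint after each run.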

New operators can be added to the experiment either directly from the context menu of the parent operator or through the new operator dialog shown in this screenshot. Several search constraints exist, and a short description of each operator is shown.

The operator trees are coded and represented by a simple XML format. The XML editor tab allows for fast and direct manipulations of the current experiment.
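The idea of an operator tree stored as XML can be sketched with Python's standard library. The tag names, attribute names and operator names below are illustrative assumptions and should not be read as YALE's actual schema; the point is only that nested operators map naturally onto nested XML elements and can be read back unchanged.

```python
import xml.etree.ElementTree as ET

def operator(name, kind, children=(), **params):
    # Build one tree node; tag and attribute names are illustrative only.
    node = ET.Element("operator", {"name": name, "class": kind})
    for key, value in params.items():
        ET.SubElement(node, "parameter", {"key": key, "value": str(value)})
    node.extend(children)
    return node

# A hypothetical experiment: load data, then cross-validate a learner.
tree = operator("Root", "Experiment", children=[
    operator("Input", "ExampleSource", attributes="data.aml"),
    operator("XVal", "XValidation", number_of_validations=10, children=[
        operator("Learner", "J48"),
        operator("Evaluation", "PerformanceEvaluator"),
    ]),
])

xml_text = ET.tostring(tree, encoding="unicode")   # serialize the tree
restored = ET.fromstring(xml_text)                 # parse it back
print([op.get("name") for op in restored.iter("operator")])
```

Because the whole experiment is one text document, a subtree such as the cross-validation block can be copied, saved and reused elsewhere.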

The "Box View" is another viewer for YALE experiments. The box format is an intuitive way of representing the nesting of the operators, but editing is not possible in this view. All views can also be printed and exported to a wide range of graphic formats, including jpg, png, ps and pdf.

The "Monitor" tab provides an overview of the currently used memory and is an important tool for large-scale data mining tasks on huge data sets. The amount of memory used during an experiment run can even be logged in the same way as all other provided logging values.

Data can be imported from several file formats with the Attribute Editor. Other file formats like ARFF, C4.5, CSV, and dBase can be loaded with specialized operators. The Attribute Editor can be used to create meta data descriptions from almost arbitrary file formats. These meta data descriptions can then be used by an input operator, which actually loads the data.

Additional attributes (features) can easily be constructed from your data. YALE provides several approaches to construct the best feature space automatically. These approaches range from feature space transformations like PCA, GHA, ICA or their kernel versions, through standard feature selection techniques, to several evolutionary approaches for feature construction and extraction.

Help features to ease the learning phase for new users: an online tutorial, tool tip texts, a beginner and expert mode, operator info screens, a GUI manual, and the YALE tutorial.

Data Visualization
Each time a data set is presented in the results tab (e.g. after loading it), several views appear: a meta data view describing all attributes, a data view showing the actual data, and a plot view providing a large set of (high-dimensional) plotters for the data set at hand.

The basic scatter plotter: two of the attributes are used as axes, and the class label attribute is used for colorization. The legend at the top maps the colors used to the classes or, in the case of a real-valued color plot column, to the corresponding real values.

The standard scatter plotter even allows jittering, zooming, and displaying example ids. Double-clicking a data point opens a visualizer. The standard example visualizer is presented here.
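Jittering means adding a small random offset to each coordinate so that points with identical values no longer hide one another. A minimal sketch, assuming uniform noise and a hypothetical `jitter` helper (the slide does not say how YALE draws the noise):

```python
import random

def jitter(values, amount, seed=0):
    # Shift every value by uniform noise in [-amount, +amount].
    # A fixed seed keeps the plot reproducible between redraws.
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]

x = [1, 1, 1, 2, 2, 3]           # several points share the same coordinate
x_jittered = jitter(x, amount=0.1)
print(x_jittered)
```

The offset should stay small relative to the spacing of the real values, otherwise the plot starts to misrepresent the data.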

2D scatter plots can be combined into a scatter plot matrix, in which an ordinary scatter plot is drawn for every pair of dimensions. This plotter is only available for fewer than 10 dimensions. For a higher number of dimensions, one of the other high-dimensional data plotters presented below should be used.

A 3D scatter plot exists, similar to the colorized 2D scatter plot discussed above. The viewport can be rotated and zoomed to fit your needs. The built-in 2D and 3D plotters are a quick and easy way to view your numerical and nominal results, even as an online plot at experiment runtime!

SOM (Self-Organizing Map) plotter, which uses a Kohonen net for dimensionality reduction. Plotting of the U-, the P-, and the U*-Matrix is supported with different color schemes. The data points can be colorized by one of the data columns, e.g. with the prediction label.

SOM (Self-Organizing Map) plotter, which uses a Kohonen net for dimensionality reduction. Here, a gray-scale color scheme was used to plot the U-Matrix.

The parallel plotter draws the axes of all dimensions parallel to each other. This is the natural visualization technique for series data but can also be useful for other types of data. The main advantage of parallel plots is that a very high number of dimensions can be visualized with this technique. The dimensions are colorized with the feature weights: the more yellow a dimension is marked, the more important this column is.

Quartile plots (also known as box plots) are often used for experiment results like performance values, but this type of plot can summarize the statistical properties of data columns in general.
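The numbers behind a quartile plot are the minimum, the three quartiles and the maximum of a column. A minimal sketch using Python's `statistics` module; the accuracy values are made up, and the "inclusive" quartile convention is one choice among several (the slide does not say which one YALE uses):

```python
import statistics

def five_number_summary(values):
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Made-up performance values, e.g. accuracies from repeated experiment runs.
accuracies = [0.81, 0.84, 0.79, 0.88, 0.83, 0.85, 0.80, 0.86, 0.82]
summary = five_number_summary(accuracies)
print(summary)
```

The box of the plot spans the first to the third quartile, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.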

Histogram plots (also known as distribution plots)

RadViz is another high-dimensional data plotter, in which the data columns are placed as radial dimension anchors. Each data point is connected to each anchor by a spring whose strength corresponds to the feature value. This leads to a fixed position in the two-dimensional plane. Again, weights are used to mark the more important columns.
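The spring picture has a simple closed form: with anchors on the unit circle and feature values scaled to [0, 1], the resting position of a point is the average of the anchors weighted by its feature values. A minimal sketch under those assumptions (even anchor spacing and the function name are choices made here, not taken from YALE):

```python
import math

def radviz_position(normalized_values):
    # One anchor per data column, spaced evenly on the unit circle.
    m = len(normalized_values)
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    total = sum(normalized_values)
    if total == 0:
        return (0.0, 0.0)  # no spring pulls: the point rests at the centre
    # Spring equilibrium: anchors weighted by the normalized feature values.
    x = sum(v * ax for v, (ax, _) in zip(normalized_values, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(normalized_values, anchors)) / total
    return (x, y)

only_first = radviz_position([1.0, 0.0, 0.0, 0.0])   # pulled fully to the first anchor
balanced = radviz_position([0.5, 0.5, 0.5, 0.5])     # equal pulls cancel out
print(only_first, balanced)
```

Points with one dominant feature end up near that feature's anchor, while points with balanced features collect near the centre.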

A survey plot is a sort of vertical histogram matrix also suitable for a large number of dimensions. Each line corresponds to one data point and can be colorized by one of the columns. The length of each section corresponds to the value of the data point for that dimension. For up to three dimensions the order of the histograms can be selected.

Andrews curves are another way of visualizing high-dimensional data. Each data point is projected onto a set of orthogonal trigonometric functions and displayed as a curve. Andrews curves are known to preserve distances, so they have many uses for data analysis and exploration. Outliers and hidden patterns can often be detected well in these plots.
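In the usual formulation, a data point (x1, x2, x3, ...) becomes the curve f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... for t between -pi and pi. A minimal sketch of that formula; the function name and the sample point are assumptions for illustration:

```python
import math

def andrews_curve(point, t):
    # f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    value = point[0] / math.sqrt(2)
    for i, x in enumerate(point[1:], start=1):
        frequency = (i + 1) // 2                       # 1, 1, 2, 2, 3, 3, ...
        basis = math.sin if i % 2 == 1 else math.cos   # sin, cos, sin, cos, ...
        value += x * basis(frequency * t)
    return value

# Sample one curve at 9 evenly spaced positions in [-pi, pi].
point = [1.0, 2.0, 3.0]
ts = [-math.pi + k * (2 * math.pi / 8) for k in range(9)]
curve = [andrews_curve(point, t) for t in ts]
print(curve)
```

Plotting one such curve per data point puts similar points close together along the whole t range, which is why outliers stand out.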

Visualization of Models and other Results
The result of a learning step is called a model. Some models provide a graphical representation of the learned hypothesis. This screenshot presents a learned decision tree for the widely known "labor negotiations" data set from the UCI repository. Results like learned models, performance values, data sets or selected attributes are displayed when the experiment is completed or a breakpoint is reached.

In cases where no graphical representation of a learned model is available, at least a textual description of the learned model is presented. In this screenshot you see a Stacking model consisting of a rule model (the upper half) and a neural network (starting in the lower half). Both base models are described by simple and understandable texts.

This is a density plot (similar to a contour plot) of the decision function of a Support Vector Machine (SVM). Almost all SVM implementations in YALE provide a table and a plot view of the learned model. In this screenshot, red points refer to support vectors, blue points to normal training examples. Bluish regions will be predicted negative, reddish regions will be predicted positive.

Here only the support vectors are shown, colorized by the predicted function value for the corresponding data point. Examples on the red side will be predicted positive; examples on the blue side will be predicted negative. There is a perfectly linear separation in two of the dimensions, and it seems that the parameters were not chosen optimally, since the number of support vectors is rather high.

The alpha values (Lagrange multipliers) of the SVM are plotted against the function values and colorized with the true label. We applied a slight jittering to make more points visible. This model seems to be "well-learned", since only a few points have an alpha value not equal to zero, and these are the points with function values of approximately 0.

This surface plot presents the result of a meta optimization experiment in which the parameters of one of the operators are optimized. The plot can be rotated and zoomed.
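The simplest form of such a meta optimization is a grid search: run the inner experiment once per parameter combination and record the result, which yields exactly the kind of data a surface plot shows. A minimal sketch; the parameter names `c` and `gamma`, the grid values and the stand-in error function are all assumptions, and YALE's own optimization operators may work differently.

```python
import itertools

def validation_error(c, gamma):
    # Hypothetical stand-in for "run the inner experiment and measure its error".
    return (c - 10) ** 2 / 100 + (gamma - 0.5) ** 2

c_values = [1, 10, 100]
gamma_values = [0.1, 0.5, 1.0]

# One (c, gamma, error) triple per grid point: the raw data behind a surface plot.
surface = [(c, g, validation_error(c, g))
           for c, g in itertools.product(c_values, gamma_values)]
best_c, best_gamma, best_error = min(surface, key=lambda row: row[2])
print(best_c, best_gamma, best_error)
```

Two optimized parameters give a surface; with more parameters the same list of tuples can still be searched for the minimum, but no longer drawn as one plot.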

WEKA & YALE Comparison
You tell me in your report.
Now let's go through the first assignment.
1st Assignment: nment1.pdf
My advice to you is to come back to this assignment and to the WEKA and YALE tools after each forthcoming lecture, to see how things are implemented and can be used in practice.