Commonly Used Datasets for ML

J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

Datasets
There are numerous datasets for testing machine learning algorithms for classification:
  Kaggle
  UCI Machine Learning Repository
  ImageNet
  MNIST handwritten digit database
  Labeled Faces in the Wild
  Many more
Try searching Google for “license plate dataset”!

UCI Dataset: Iris
Source: R.A. Fisher, 1936
Goal: Predict the species of an iris flower from its measurements
Dataset specs: 150 instances, 3 classes, 4 attributes (features): sepal length, sepal width, petal length, petal width

UCI Dataset: Wine
Source: Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy
Goal: Use 13 chemical constituents to determine the origin of wines
Dataset specs: 178 instances, 3 classes, 13 attributes

UCI Dataset: Abalone
Source: Dept. of Primary Industry and Fisheries, Tasmania, Australia
Goal: Predict the age of abalone
Dataset specs: 4177 instances, 29 classes
  8 attributes (features): sex, length, diameter, height, whole weight, shucked weight, viscera weight, shell weight
  1 output: rings (adding 1.5 to the ring count gives the age in years)
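
The rings-to-age rule in the specs above is plain arithmetic. A minimal Python sketch, assuming the ring count has already been read from the dataset; the function name `abalone_age` is hypothetical, and the sample ring count is an arbitrary illustration rather than a record from the file:

```python
def abalone_age(rings: int) -> float:
    """Estimate age in years from the ring count (age = rings + 1.5)."""
    if rings < 0:
        raise ValueError("ring count must be non-negative")
    return rings + 1.5


# Arbitrary illustrative ring count, not taken from the dataset.
print(abalone_age(10))  # 11.5
```

Because the target is a ring count, the same data can be treated either as a classification problem (one class per ring count) or as a regression problem on age.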

UCI Dataset: Mushroom Classification
Source: Mushroom records drawn from The Audubon Society Field Guide to North American Mushrooms (1981)
Goal: Determine whether a mushroom is poisonous or edible
Dataset specs: 8124 instances, 2 classes, 22 attributes

UCI Dataset: Liver Disorder
Source: BUPA Medical Research Ltd.
Goal: Use variables from blood tests and alcohol consumption to determine whether a liver disorder is present
Dataset specs: 345 instances, 2 classes, 6 attributes (the first five are blood test results; the last is alcohol consumption per day)

UCI Dataset: Credit Screening
Source: Chiharu Sano, csano@bonnie.ICS.UCI.EDU
Goal: Determine which applicants are granted credit
Dataset specs: 125 instances, 2 classes, 15 attributes

UCI Dataset: House Price Prediction
Source: CMU StatLib Library
Goal: Predict house prices near Boston
Dataset specs: 506 instances, 13 attributes

MNIST Digit Dataset (1/2)
Source: NIST's Special Database 3 (collected among Census Bureau employees) and Special Database 1 (collected among high-school students)
Goal: Recognize isolated handwritten digits, 0-9
Dataset specs:
  70000 instances
  60000 for training (30000 from SD-3 and 30000 from SD-1), from about 250 writers
  10000 for test (5000 from SD-3 and 5000 from SD-1)
  Normalized to 28x28 gray-scale images, centered by center of mass
  Writers in the training and test sets are disjoint
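
The centering step mentioned in the specs above can be illustrated with a small Python sketch. It is not the original MNIST preprocessing code: it only shifts an image by a whole number of pixels so that its intensity-weighted center of mass lands near the middle of a 28x28 canvas, and it assumes the digit has already been size-normalized. The function name `center_by_mass` and the toy 2x2 image are made up for illustration.

```python
def center_by_mass(image, size=28):
    """Place `image` (a list of rows of non-negative pixel values) on a
    size x size canvas so its center of mass lands at the canvas center."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image has no ink")
    # Intensity-weighted mean row and column index.
    row_c = sum(r * v for r, row in enumerate(image) for v in row) / total
    col_c = sum(c * v for row in image for c, v in enumerate(row)) / total
    # Integer shift that moves the center of mass to the canvas center.
    center = (size - 1) / 2
    dr = round(center - row_c)
    dc = round(center - col_c)
    canvas = [[0] * size for _ in range(size)]
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size:  # drop pixels shifted off the canvas
                canvas[rr][cc] = v
    return canvas


blob = [[1, 1], [1, 1]]  # toy 2x2 "digit", made up for illustration
centered = center_by_mass(blob)
print(len(centered), len(centered[0]))  # 28 28
```

Centering on the center of mass rather than on the bounding box keeps digits with long thin strokes, such as 1 or 7, from drifting toward one side of the frame.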

MNIST Digit Dataset (2/2)
Links: data source, Wikipedia
Examples: misclassified digits (figure showing predictions 1 & 2 and the ground truth)

How to Acquire/Visualize the Datasets?
Acquire the datasets:
  prData.m for acquiring PR data
  dcData.m for acquiring DC data
Visualize the datasets:
  Please refer to Chapter 2 of the DCPR tutorial.
Example (you need to download the Machine Learning Toolbox to try these commands):
  >> ds=prData('iris')
  ds =
      dataName: 'iris'
      inputName: {'sepal length' 'sepal width' 'petal length' 'petal width'}
      outputName: {'Setosa' 'Versicolour' 'Virginica'}
      input: [4x150 double]
      output: [1x150 double]

Iris Dataset Visualization (1/2)
  ds=prData('iris'); classSize=dsClassSize(ds, 1);
  ds=prData('iris'); dsDistPlot(ds);
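
For readers without the toolbox, the per-class count that `dsClassSize` appears to report (judging by its name) can be approximated in plain Python. This is a sketch under the assumption that labels are stored as one integer per instance, as in the 1x150 `output` field; `class_sizes` is a hypothetical helper, not part of the Machine Learning Toolbox, and the label vector is constructed here rather than loaded from the file, using the fact that iris has 50 instances in each of its 3 classes.

```python
from collections import Counter


def class_sizes(labels):
    """Return a dict mapping each class label to its number of instances."""
    return dict(sorted(Counter(labels).items()))


# Constructed label vector: 3 classes with 50 instances each, as in iris.
labels = [1] * 50 + [2] * 50 + [3] * 50
print(class_sizes(labels))  # {1: 50, 2: 50, 3: 50}
```

Checking class sizes first shows whether a dataset is balanced, which affects how plain accuracy figures should be read.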

Iris Dataset Visualization (2/2)
  ds = prData('iris'); dsProjPlot1(ds);
  ds = prData('iris'); dsProjPlot2(ds);
  ds = prData('iris'); dsProjPlot3(ds);