
Python implementation of Decision Tree, Stochastic Gradient Descent, and Cross Validation

Balance Scale Data Set This data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the left, tip to the right, or stay balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. The class is found by comparing (left-distance * left-weight) with (right-distance * right-weight): the side with the larger product wins, and if the two products are equal the scale is balanced.
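The labeling rule translates directly into code. A minimal sketch (the function name, argument order, and the 'L'/'R'/'B' labels follow the UCI class encoding; the example call is illustrative):

def balance_class(left_weight, left_distance, right_weight, right_distance):
    # Compare the torque on each side of the scale
    left_torque = left_weight * left_distance
    right_torque = right_weight * right_distance
    if left_torque > right_torque:
        return 'L'  # tips to the left
    if right_torque > left_torque:
        return 'R'  # tips to the right
    return 'B'      # balanced

balance_class(2, 3, 1, 4)  # 'L', since 2*3 > 1*4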

Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

Read from file
data = pd.read_csv(
    'http://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data',
    sep=',', header=None)
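Since the file has no header row, the columns arrive numbered 0 through 4. If you prefer readable names, you can assign them from the UCI attribute description; this is optional, and the later slides index positionally through .values, which is unaffected:

data.columns = ['class', 'left_weight', 'left_distance', 'right_weight', 'right_distance']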

Print length of dataset
print('Dataset length:', len(data))
Dataset length: 625
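A quick peek at the first rows confirms the layout, with the class label in the first column and the four features after it:

print(data.head())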

Data Slicing The dataset consists of 5 columns: 4 feature attributes and 1 target attribute. The target is the first column (index 0), so the features are columns 1 through 4.
X = data.values[:, 1:5]
Y = data.values[:, 0]
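It is worth checking the resulting shapes and the class distribution before training; a minimal sketch (per the UCI documentation the counts are 288 'L', 288 'R', and 49 'B'):

print(X.shape)  # (625, 4)
print(Y.shape)  # (625,)
print(pd.Series(Y).value_counts())  # counts per class label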

Split dataset between train and test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
X_train and y_train are the training data; X_test and y_test are the test data. test_size=0.3 means the test set is 30% of the whole dataset and the training set is the remaining 70%.
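Without a fixed random_state the split, and therefore any reported accuracy, changes on every run. Passing a seed makes the results reproducible (the value 100 here is arbitrary):

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=100)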

Decision Tree Training The classifier is capped at depth 3; criterion='entropy' is passed explicitly to match the variable name (the default is 'gini').
clf_entropy = DecisionTreeClassifier(criterion='entropy', max_depth=3)
clf_entropy.fit(X_train, y_train)
Result
DecisionTreeClassifier(criterion='entropy', max_depth=3)
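The tree module imported on the first slide can render the fitted tree as plain text, which is a quick way to sanity-check the learned splits (export_text is available in scikit-learn 0.21 and later):

print(tree.export_text(clf_entropy))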

Prediction
y_pred_en = clf_entropy.predict(X_test)
y_pred_en
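The accuracy_score import from the first slide can now score these predictions against the held-out labels; the exact number depends on the random split, so fix random_state above for a stable result:

print('Accuracy:', accuracy_score(y_test, y_pred_en))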

Stochastic Gradient Descent Note that this toy example reuses the names X and y for a new two-point dataset, separate from the balance-scale data.
from sklearn.linear_model import SGDClassifier
X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = SGDClassifier(loss="hinge", penalty="l2")
clf.fit(X, y)
Predict new values:
clf.predict([[2., 2.]])
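SGD is sensitive to feature scaling, so in practice the classifier is usually combined with a scaler. A minimal sketch using a pipeline (the variable name scaled_clf is illustrative):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features, then fit the SGD classifier on the scaled data
scaled_clf = make_pipeline(StandardScaler(), SGDClassifier(loss="hinge", penalty="l2"))
scaled_clf.fit(X, y)
scaled_clf.predict([[2., 2.]])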

Cross Validation
import numpy as np
from sklearn.model_selection import KFold  # the old sklearn.cross_validation module was removed in 0.20
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=2)
kf.get_n_splits(X)
2
print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)

Cross Validation In the current API, the fold indices come from kf.split(X) rather than from iterating the KFold object directly:
for train_index, test_index in kf.split(X):
    print('TRAIN:', train_index, 'test:', test_index)
TRAIN: [2 3] test: [0 1]
TRAIN: [0 1] test: [2 3]
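The same idea can cross-validate the balance-scale tree from earlier. A sketch using cross_val_score; the features and labels are re-created here (as X_bal and Y_bal, illustrative names) to avoid the clash with the toy arrays above:

from sklearn.model_selection import cross_val_score

X_bal = data.values[:, 1:5]
Y_bal = data.values[:, 0]
# 5-fold cross-validation of the depth-3 tree on the full dataset
scores = cross_val_score(clf_entropy, X_bal, Y_bal, cv=5)
print('Mean CV accuracy:', scores.mean())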

END