Balance Scale Data Set This data set was generated to model psychological experimental results. Each example is classified as having the balance scale.

Slides:

Advertisements

Similar presentations

scikit-learn Machine Learning in Python Vandana Bachani

Advertisements

Indian Statistical Institute Kolkata

Semantic Analysis of Movie Reviews for Rating Prediction

Three kinds of learning

KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.

Evaluation of Results (classifiers, and beyond) Biplav Srivastava Sources: [Witten&Frank00] Witten, I.H. and Frank, E. Data Mining - Practical Machine.

ID3 Algorithm Allan Neymark CS157B – Spring 2007.

Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.

From Genomic Sequence Data to Genotype: A Proposed Machine Learning Approach for Genotyping Hepatitis C Virus Genaro Hernandez Jr CMSC 601 Spring 2011.

1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 10a, April 1, 2014 Support Vector Machines.

Categorical data. Decision Tree Classification Which feature to split on? Try to classify as many as possible with each split (This is a good split)

1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 10b, April 4, 2014 Lab: More on Support Vector Machines, Trees, and your projects.

Combining multiple learners Usman Roshan. Bagging Randomly sample training data Determine classifier C i on sampled data Goto step 1 and repeat m times.

How to Solve Balance Puzzles. Balance Puzzles Here are the basic rules: All weights have a positive, whole number value. Each shape has a consistent,

Gang WangDerek HoiemDavid Forsyth. INTRODUCTION APROACH (implement detail) EXPERIMENTS CONCLUSION.

Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification John Blitzer, Mark Dredze and Fernando Pereira University.

Titanic: Machine Learning from Disaster

Kaggle Competition Prudential Life Insurance Assessment

SVM Lab material borrowed from tutorial by David Meyer FH Technikum Wien, Austria see:

Data analysis tools Subrata Mitra and Jason Rahman.

Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr.

1 Systematic Data Selection to Mine Concept-Drifting Data Streams Wei Fan Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery.

Kaggle Competition Rossmann Store Sales.

Instance-Based Learning Evgueni Smirnov. Overview Instance-Based Learning Comparison of Eager and Instance-Based Learning Instance Distances for Instance-Based.

Strong Supervision From Weak Annotation Interactive Training of Deformable Part Models ICCV /05/23.

The Shortest Race. You have to race from one tree (call it A) to another (call it B) and touch a nearby fence (at point P, say) on the way. Where should.

GROUP GOAL Learn and understand python programing language Libraries: Pandas Numpy SKlearn Use machine learning algorithms Decision trees Random Forests.

Kaggle competition Airbnb Recruiting: New User Bookings

General-Purpose Learning Machine

Decision Trees.

Classification with Perceptrons Reading:

SOCIAL COMPUTING Homework 3 Presentation

Basic machine learning background with Python scikit-learn

Labs: Dimension Reduction, Multi-dimensional Scaling, SVM

A brief introduction to neural network

Generalization ..

Labs: Dimension Reduction, Multi-dimensional Scaling, SVM

Chapter 6 Indexes, Scales, And Typologies

CS539: Project 3 Zach Pardos.

CSCI N317 Computation for Scientific Applications Unit Weka

An update on scikit-learn

Labs: Trees, Dimension Reduction, Multi-dimensional Scaling, SVM

Overview of Neural Network Architecture Assignment Code

Classification Boundaries

Nathaniel Choe Rohan Suri

Machine Learning A.Y SEM-II Mr. Dhomse G.P.

Perceptron by Python Wen-Yen Hsu, Hsiao-Lung Chan

Nearest Neighbors CSC 576: Data Mining.

Machine Learning on Data Lecture 9a- Classification

Analysis for Predicting the Selling Price of Apartments Pratik Nikte

Dr. Sampath Jayarathna Cal Poly Pomona

Dr. Sampath Jayarathna Cal Poly Pomona

The Voted Perceptron for Ranking and Structured Classification

PYTHON PANDAS FUNCTION APPLICATIONS

Analysis on Accelerated Learning Cohorts

Introduction to machine learning KH Wong

Machine Learning for Cyber

Presenter: Donovan Orn

An introduction to neural network and machine learning

Machine Learning for Cyber

Machine Learning for Cyber

Data Mining CSCI 307, Spring 2019 Lecture 8

Machine Learning for Cyber

Presentation transcript:

Python implementation of Decision Tree, Stochastic Gradient Descent, and Cross Validation

Balance Scale Data Set This data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. The correct way to find the class is the greater of (left-distance * left- weight) and (right-distance * right-weight). If they are equal, it is balanced.

Import libraries import numpy as np import pandas as pd from sklearn.cross_validation import train_test_split from sklearn.tree import DecisionTreeClassifier from sklearn.metrics import accuracy_score from sklearn import tree

Read from file data = pd.read_csv( ... 'http://archive.ics.uci.edu/ml/machine-learning-databases/balance- scale/balance-scale.data', ... sep= ',', header= None)

Print length of dataset print('Dataset length:', len(name)) Dataset length: 625

Data Slicing Dataset consists of 5 attributes 4 feature attributes and 1 target attribute The index of the target attribute is 1st X = data.values[:,1:5] Y = data.values[:,0]

Split dataset between train and test X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3) X_train and y_train = training data X_test and y_test = test data Test_size = test set will be 30% of whole dataset and training will be 70%

Decision Tree Training clf_entropy = DecisionTreeClassifier(max_depth=3) clf_entropy.fit(X_train, y_train) Result DecisionTreeClassifier(compute_importances=None, criterion='gini', max_depth=3, max_features=None, max_leaf_nodes=None, min_density=None, min_samples_leaf=1, min_samples_split=2, random_state=None, splitter='best')

Prediction y_pred_en = clf_entropy.predict(X_test) y_pred_en

Stochastic Gradient Descent from sklearn.linear_model import SGDClassifier X = [[0., 0.], [1., 1.]] y = [0, 1] clf = SGDClassifier(loss="hinge", penalty="l2") clf.fit(X, y) Predict new values clf.predict([[2., 2.]])

Cross Validation import numpy as np from sklearn.cross_validation import KFold X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]]) y = np.array([1, 2, 3, 4]) kf = KFold(4, n_folds=2) len(kf) 2 print(kf) sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False, random_state=None)

Cross Validation for train_index, test_index in kf: ... print('TRAIN:', train_index, 'test:', test_index) ... TRAIN: [2 3] test: [0 1] TRAIN: [0 1] test: [2 3]

END