Indian Statistical Institute Kolkata
Supervised Learning: Regression, Classification
Linear regression, k-NN classification
Debapriyo Majumdar
Data Mining – Fall 2014
August 11, 2014

An Example: Size of Engine vs Power
[Scatter plot: Power (bhp) vs Engine displacement (cc)]
An unknown car has an engine of size 1800 cc. What is likely to be the power of the engine?

An Example: Size of Engine vs Power
[Scatter plot: Power (bhp), the target variable, vs Engine displacement (cc)]
- Intuitively, the two variables have a relation
- Learn the relation from the given data
- Predict the target variable after learning

Exercise: on a simpler set of data points
  x     y
  1     2
  3     7
  4    10
  2.5   ?
Predict y for x = 2.5

Linear Regression
[Scatter plot of the training set: Power (bhp) vs Engine displacement (cc)]
- Assume: the relation is linear
- Then for a given x (= 1800), predict the value of y

Linear Regression
[Table and scatter plot of the training set: Engine displacement (cc) and Power (bhp) for about ten cars]
Optional exercise: linear regression
- Assume y = a·x + b
- Try to find suitable a and b

Exercise: using Linear Regression
  x     y
  1     2
  3     7
  4    10
  2.5   ?
- Define a regression line of your choice
- Predict y for x = 2.5

Choosing the parameters right
Goal: minimize the deviation from the actual data points
- The data points: (x1, y1), (x2, y2), …, (xm, ym)
- The regression line: f(x) = y = a·x + b
- Least-squares cost function: J = Σi ( f(xi) − yi )²
- Goal: minimize J over choices of a and b
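To make the cost concrete, here is a minimal Python sketch (not from the slides) that evaluates J for a candidate line a·x + b on the small data set from the earlier exercise; the particular candidate values of a and b are arbitrary.

```python
# Minimal sketch (not from the slides): evaluate the least-squares cost
# J(a, b) = sum_i (f(x_i) - y_i)^2 for the line f(x) = a*x + b.

def cost(a, b, xs, ys):
    """Least-squares cost J over all data points."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Data points from the earlier exercise: (1, 2), (3, 7), (4, 10)
xs = [1.0, 3.0, 4.0]
ys = [2.0, 7.0, 10.0]

print(cost(2.0, 0.0, xs, ys))    # J = 5.0 for the candidate line y = 2x
print(cost(2.5, -0.5, xs, ys))   # J = 0.25 -- this candidate fits better
```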

How to Minimize the Cost Function?
[Surface plot of J over the (a, b) plane]
- Goal: minimize J over all values of a and b
- Start from some a = a0 and b = b0; compute J(a0, b0)
- Simultaneously change a and b towards the negative gradient and eventually hope to arrive at an optimum (gradient descent)
- Question: can there be more than one optimum?
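A small gradient-descent sketch, assuming the squared-error cost defined above; the starting point, learning rate, and number of iterations are arbitrary choices, not values given in the slides.

```python
# Gradient-descent sketch (illustrative, not the slides' code):
# minimize J(a, b) = sum_i (a*x_i + b - y_i)^2 by stepping against the gradient.

def gradient_descent(xs, ys, lr=0.01, steps=5000):
    a, b = 0.0, 0.0                                  # start from some (a0, b0)
    for _ in range(steps):
        # Partial derivatives of J with respect to a and b
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys))
        a -= lr * grad_a                             # change a and b simultaneously
        b -= lr * grad_b                             # towards the negative gradient
    return a, b

xs, ys = [1.0, 3.0, 4.0], [2.0, 7.0, 10.0]
a, b = gradient_descent(xs, ys)
print(a, b)              # fitted slope and intercept
print(a * 2.5 + b)       # predicted y for x = 2.5
```

For this squared-error cost, J is convex in (a, b), so there is a single optimum; for other cost functions gradient descent can get stuck in a local optimum, which is the point of the question above.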

Another example
[Plot of the training set: High blood sugar (Y / N) vs Age]
- Given that a person's age is 24, predict if (s)he has high blood sugar
- The target variable takes discrete values (Y / N)
- Many ways of approaching this problem

Classification problem
[Plot: High blood sugar (Y / N) vs Age, with a new point at age 24 marked "?"]
- One approach: what other data points are nearest to the new point?
- Other approaches?

Classification Algorithms
- The k-nearest neighbor classification
- Naïve Bayes classification
- Decision trees
- Linear discriminant analysis
- Logistic regression
- Support vector machines

Classification or Regression?
Given data about some cars: engine size, number of seats, petrol / diesel, has an airbag or not, price
- Problem 1: Given the engine size of a new car, what is likely to be the price?
- Problem 2: Given the engine size of a new car, is it likely that the car runs on petrol?
- Problem 3: Given the engine size, is it likely that the car has airbags?

Classification

Example: Age, Income and Owning a Flat
[Scatter plot of the training set: Monthly income (thousand rupees) vs Age; points labelled "Owns a flat" / "Does not own a flat"]
Given a new person's age and income, predict: does (s)he own a flat?

Example: Age, Income and Owning a Flat
[Same scatter plot: Monthly income (thousand rupees) vs Age]
Nearest neighbor approach: find the nearest neighbors among the known data points and check their labels

Example: Age, Income and Owning a Flat
[Same scatter plot: Monthly income (thousand rupees) vs Age]
The 1-Nearest Neighbor (1-NN) Algorithm:
- Find the closest point in the training set
- Output the label of that nearest neighbor

The k-Nearest Neighbor Algorithm
[Same scatter plot: Monthly income (thousand rupees) vs Age]
The k-Nearest Neighbor (k-NN) Algorithm:
- Find the k closest points in the training set
- Take a majority vote among the labels of those k points
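A compact k-NN sketch in Python, assuming Euclidean distance on (age, income) pairs; the training points and the query below are made-up illustrations, not data from the slides.

```python
# Illustrative k-NN classifier (an assumed implementation, not the slides' code).
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Majority vote among the labels of the k training points closest to query."""
    nearest = sorted(train, key=lambda point: math.dist(point[0], query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Made-up training set: ((age, monthly income in thousand rupees), owns a flat?)
train = [((25, 30), "N"), ((32, 60), "Y"), ((45, 80), "Y"),
         ((28, 25), "N"), ((50, 90), "Y"), ((23, 20), "N")]

print(knn_predict(train, (30, 55), k=3))   # predicted label for a new person
print(knn_predict(train, (30, 55), k=1))   # k = 1 gives the 1-NN rule
```

Note that age and income live on very different scales, which is one reason to think carefully about the distance measure used (next slide).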

Distance measures
How to measure distance to find the closest points?
- Euclidean distance between vectors x = (x1, …, xk) and y = (y1, …, yk): d(x, y) = √( Σi (xi − yi)² )
- Manhattan distance: d(x, y) = Σi |xi − yi|
- Generalized squared interpoint distance: d²(x, y) = (x − y)ᵀ S⁻¹ (x − y), where S is the covariance matrix — the Mahalanobis distance (1936)
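A sketch of the three distance measures using their standard formulas; the vectors and covariance matrix below are made-up examples, and NumPy is assumed to be available.

```python
# Distance measures used to find the closest points (standard formulas).
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def mahalanobis_squared(x, y, S):
    """Generalized squared interpoint distance; S is the covariance matrix."""
    d = x - y
    return d @ np.linalg.inv(S) @ d

x = np.array([25.0, 30.0])
y = np.array([32.0, 60.0])
S = np.array([[100.0, 20.0],      # made-up covariance matrix for illustration
              [20.0, 400.0]])
print(euclidean(x, y), manhattan(x, y), mahalanobis_squared(x, y, S))
```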

Classification setup
- Training data / set: a set of input data points and the given answers (labels) for those points
- Labels: the list of possible answers
- Test data / set: inputs to the classification algorithm for which labels are to be found; used for evaluating the algorithm when the answers are known (but not given to the algorithm)
- Classification task: determine the labels of the data points for which the label is not known or not passed to the algorithm
- Features: the attributes that represent the data

Evaluation
- Test set accuracy is the correct performance measure
- Accuracy = # of correct answers / # of all answers
- Need to know the true test labels
- Option: use the training set itself, e.g. parameter selection (for k-NN) by accuracy on the training set
- Overfitting: a classifier performs too well on the training set compared to new (unlabeled) test data
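The accuracy formula above as a tiny helper (an assumed snippet, with made-up labels for illustration).

```python
# Accuracy = number of correct answers / number of all answers.
def accuracy(predicted, true_labels):
    correct = sum(p == t for p, t in zip(predicted, true_labels))
    return correct / len(true_labels)

print(accuracy(["Y", "N", "Y", "Y"], ["Y", "N", "N", "Y"]))   # 3 of 4 correct -> 0.75
```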

Better validation methods
Leave one out:
- For each data point x of the training set D: construct training set D − {x} and test set {x}; train on D − {x}, test on x
- Overall accuracy = average over all such cases
- Expensive to compute
Hold-out set:
- Randomly choose x% (say 25–30%) of the training data and set it aside as a test set
- Train on the rest of the training data, test on the held-out set
- Easy to compute, but tends to have higher variance
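A sketch of both methods, assuming a `classify(train, x)` routine such as the k-NN predictor sketched earlier; the 25% hold-out fraction follows the slide, the rest is an illustrative assumption.

```python
# Leave-one-out and hold-out validation (illustrative sketch).
import random

def leave_one_out_accuracy(data, classify):
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]               # training set D - {x}
        correct += (classify(train, x) == label)      # test on the single point x
    return correct / len(data)                        # average over all cases

def hold_out_accuracy(data, classify, fraction=0.25):
    shuffled = random.sample(data, len(data))         # random split of the data
    n_test = max(1, int(fraction * len(data)))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(classify(train, x) == label for x, label in test)
    return correct / len(test)

# Example usage with the k-NN training data and predictor sketched earlier:
# print(leave_one_out_accuracy(train, knn_predict))
# print(hold_out_accuracy(train, knn_predict))
```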

The k-fold Cross Validation Method
- Randomly divide the training data into k partitions D1, …, Dk (of roughly equal size)
- For each fold Di: train a classifier with training data D − Di, then test and validate on Di
- Overall accuracy: the average accuracy over all k folds
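A corresponding k-fold sketch under the same assumptions (`classify(train, x)` as above; the simple slicing scheme below is one possible way to form the partitions).

```python
# k-fold cross validation (illustrative sketch; assumes len(data) >= k).
import random

def k_fold_accuracy(data, classify, k=5):
    shuffled = random.sample(data, len(data))
    folds = [shuffled[i::k] for i in range(k)]        # k roughly equal partitions D1..Dk
    accuracies = []
    for i, fold in enumerate(folds):
        train = [p for j, other in enumerate(folds) if j != i for p in other]  # D - Di
        correct = sum(classify(train, x) == label for x, label in fold)
        accuracies.append(correct / len(fold))        # accuracy on fold Di
    return sum(accuracies) / k                        # average accuracy over all folds
```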

References
- Data Mining Map: http://www.saedsayad.com/
- Lecture videos by Prof. Andrew Ng, Stanford University; available on Coursera (Course: Machine Learning)