Supervised Learning & Classification, part I
Reading: W&F ch. 1.1, 1.2, 2.1-2.3, 3.2, 3.3, 4.3, 6.1*

Administrivia
- Notes from last class online now
- Pretest (background assessment) today

The basic ML problem
[Slide figure: the world produces observations; an unknown function f(⋅) maps each one to a label such as "Emphysema", supplied by a supervisor (the "supervised" setting).]

The basic ML problem
Our job: reconstruct f() from observations. Knowing f() tells us:
- Can recognize new (previously unseen) instances: classification or discrimination
- Can synthesize new data (e.g., speech or images): generation
- Can help us understand the process that generated the data: description or analysis
- Can tell us/find things we never knew: discovery or data mining
- Can help us act or perform better: control
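
A minimal sketch of this setup, assuming Python with scikit-learn; the library, the 1-NN classifier choice, and the toy data are illustrative assumptions, not from the lecture:

```python
# We never see f() itself, only (instance, label) pairs it produced;
# the learner builds an estimate of f from those observations.
from sklearn.neighbors import KNeighborsClassifier

# Observations: feature vectors x and the labels f(x) assigned by the "world"
X = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]]
y = ["healthy", "healthy", "emphysema", "emphysema"]

f_hat = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # reconstruct f from data

# Recognizing a new, previously unseen instance (classification)
print(f_hat.predict([[0.95, 0.9]]))  # -> ['emphysema']
```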

A classic example: digits
The post office wants to be able to auto-scan envelopes, recognize addresses, etc.
[Slide image: samples of handwritten digits to be recognized]

Digits to bits
[Slide figure: a digit image is digitized by sensors into a feature vector of pixel values, e.g., (255, 255, 127, 35, 0, ...) and (0, 93, 11, 45, 6, ...)]
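
A small sketch of the digitization step, assuming Python with NumPy; the pixel values echo the numbers on the slide, but the 3x3 image itself is made up:

```python
# Flattening a grayscale image into a feature vector: one number per pixel.
import numpy as np

image = np.array([[255, 255, 127],
                  [ 35,   0,  93],
                  [ 11,  45,   6]], dtype=np.uint8)  # tiny grayscale "digit"

x = image.flatten()   # the feature vector
print(x)              # [255 255 127  35   0  93  11  45   6]
print(x.shape)        # (9,): here the dimension d = 9
```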

Measurements & features
The collection of numbers from the sensors, e.g., (255, 0, 93, 11, 45, 6, ...), is called a feature vector, a.k.a.:
- attribute vector
- measurement vector
- instance
Written x = (x_1, x_2, ..., x_d), where d is the dimension of the vector. Each x_i is drawn from some range, e.g., x_i ∈ ℝ, x_i ∈ {0, ..., 255}, or x_i ∈ {0, 1}.

More on features
Features (attributes, independent variables) can come in different flavors:
- Continuous
- Discrete
- Categorical or nominal
We (almost always) assume that the set of features is fixed & of finite dimension, d. Sometimes quite large, though (d ≥ 100,000 not uncommon).
The set of all possible instances is the instance space or feature space, e.g., ℝ^d, ℤ^d, [0, 255]^d, or {0, 1}^d.
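
A brief sketch of the three flavors on a single instance, assuming Python with NumPy; the field names and the one-hot encoding choice are illustrative, not from the slides:

```python
# One instance with a feature of each flavor.
import numpy as np

instance = {
    "petal_length": 4.7,   # continuous: x_i in the reals
    "num_petals": 6,       # discrete: x_i in {0, 1, 2, ...}
    "color": "purple",     # categorical/nominal: no numeric order
}

# Learners that expect numbers need categoricals encoded, e.g., one-hot:
colors = ["purple", "white", "yellow"]
one_hot = np.eye(len(colors))[colors.index(instance["color"])]
print(one_hot)  # [1. 0. 0.]
```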

Classes
Every example comes w/ a class, a.k.a., label, prediction, dependent variable, etc.
- For classification problems, the class label is categorical
- For regression problems, it's continuous (usually called the dependent or regressed variable)
We'll write the class as y, so a labeled example is a pair (x, y). E.g., the digit feature vectors (255, 255, 127, 35, 0, ...) and (0, 93, 11, 45, 6, ...) come with the class labels "7" and "8".

A very simple example
"Iris" data set, gathered by Fisher in the mid-1930s.
Feature space is (sepal-length, sepal-width, petal-length, petal-width), i.e., x ∈ ℝ^4.
Classes are: I. setosa, I. versicolor, I. virginica
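
The same Fisher iris data ships with scikit-learn, so it is easy to take a quick look; using that copy here is an assumption for illustration:

```python
# Peek at the iris feature space and classes.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)  # sepal length/width, petal length/width (cm)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)     # (150, 4): 150 instances in a 4-dimensional space
```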

Training data
Set of all available data for learning == training data. A.k.a., parameterization set, fitting set, etc.
Denoted D = {(x_1, y_1), ..., (x_N, y_N)}. Can write as a matrix X, w/ a corresponding class vector y: one feature vector per row of X, one label per entry of y.
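
A sketch of this matrix view, again assuming scikit-learn's copy of the iris data: each row of X is one feature vector x_i, and y holds the matching class labels:

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
print(X.shape)     # (150, 4): N = 150 rows, d = 4 columns
print(y.shape)     # (150,): one class label per row
print(X[0], y[0])  # first instance and its class (0 = setosa)
```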

Finally, goals
Now that we have X and y, we have a (mostly) defined job: find the function f̂ that most closely approximates the "true" function f.
Key questions:
- What candidate functions do we consider?
- What does "most closely approximates" mean?
- How do you find the one you're looking for?
- How do you know you've found the "right" one?
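
One hedged sketch of what "most closely approximates" can mean in practice: score a candidate function by how often its predictions disagree with the true labels on data it has never seen. The train/test split and the 1-NN candidate below are illustrative assumptions, not the lecture's prescription:

```python
# Estimate how well a candidate f-hat approximates f via held-out error rate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

f_hat = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
error_rate = (f_hat.predict(X_test) != y_test).mean()
print(f"held-out error rate: {error_rate:.3f}")
```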