Download presentation
Presentation is loading. Please wait.
1
Supervised Learning & Classification, part I Reading: W&F ch 1.1, 1.2, 2.1-2.3, 3.2, 3.3, 4.3, 6.1*
2
Administrivia... Notes from last class online now Pretest (background assessment) today
3
The basic ML problem “Emphysema” f( ⋅ ) World Supervised
4
The basic ML problem Our job: Reconstruct f() from observations Knowing f() tells us: Can recognize new (previously unseen) instances Classification or discrimination Can synthesize new data (e.g., speech or images) Generation Can help us understand the process that generated data Description or analysis Can tell us/find things we never knew Discovery or data mining Can help us act or perform better Control
5
A classic example: digits The post office wants to be able to auto-scan envelopes, recognize addresses, etc. 87131 ???
6
Digits to bits 255, 255, 127, 35, 0, 0... 255, 0, 93, 11, 45, 6... Feature vector Digitize (sensors)
7
Measurements & features The collection of numbers from the sensors:... is called a feature vector, a.k.a., attribute vector measurement vector instance Written where d is the dimension of the vector Each is drawn from some range E.g., or or 255, 0, 93, 11, 45, 6...
8
More on features Features (attributes, independent variables) can come in different flavors: Continuous, Discrete, Categorical or nominal, We (almost always) assume that the set of features is fixed & of finite dimension, d Sometimes quite large, though (d≥100,000 not uncommon) The set of all possible instances is the instance space or feature space E.g., or or or
9
Classes Every example comes w/ a class A.k.a., label, prediction, dependent variable, etc. For classification problems, class label is categorical For regression problems, it’s continuous Usually called dependent or regressed variable We’ll write E.g., 255, 255, 127, 35, 0, 0... 255, 0, 93, 11, 45, 6... “7” “8”
10
A very simple example “Iris” data set Gathered by Fisher in mid-1930’s Feature space is sepal-length, sepal-width, petal-length, petal-width ( ) Classes are: I. setosaI. versicolorI. virginica
11
Training data Set of all available data for learning == training data A.k.a., parameterization set, fitting set, etc. Denoted Can write as a matrix, w/ a corresponding class vector:
12
Finally, goals Now that we have and, we have a (mostly) defined job: Key Questions: What candidate functions do we consider? What does “most closely approximates” mean? How do you find the one you’re looking for? How do you know you’ve found the “right” one? Find the function that most closely approximates the “true” function
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.