Supervised Learning & Classification, part I Reading: DH&S, Ch 1
Administrivia... Pretest answers back today Today’s lecture notes online after class Apple Keynote, PDF, PowerPoint PDF & PPT auto-converted; may be flakey
Your place in history Yesterday: Course administrivia Fun & fluffy philosophy Today: The basic ML problem Branches of ML: the 20,000 foot view Intro to supervised learning Definitions and stuff
Pretest results: trends Courses dominated by math, stat; followed by algorithms; followed by CS530; followed by AI & CS500 Proficiencies: probability > algorithms > linear algebra μ=56% σ=28%
The basic ML problem [Figure: the World → supervised f( ) → “Emphysema”]
The basic ML problem Our job: Reconstruct f() from observations. Knowing f() tells us: Can recognize new (previously unseen) instances: Classification or discrimination. [Figure: new instance → f( ) → ??? (e.g., “Hashimoto-Pritzker”)]
The basic ML problem Our job: Reconstruct f() from observations. Knowing f() tells us: Can synthesize new data (e.g., speech or images): Generation. [Figure: random source → f( ) → new “Emphysema” examples]
The basic ML problem Our job: Reconstruct f() from observations. Knowing f() tells us: Can help us understand the process that generated data: Description or analysis. Can tell us/find things we never knew: Discovery or data mining. How many clusters (“blobs”) are there? Taxonomy of data? Networks of relationships? Unusual/unexpected things? Most important characteristics?
The basic ML problem Our job: Reconstruct f() from observations. Knowing f() tells us: Can help us act or perform better: Control. Turn left? Turn right? Accelerate? Brake? Don’t ride in the rain?
A brief taxonomy All ML (highly abbreviated): Supervised: have “inputs”, have “outputs”, find “best” f(). Unsupervised: have “inputs”, no “outputs”, find “best” f(). Reinforcement Learning: have “inputs”, have “controls”, have “reward”, find “best” f().
A brief taxonomy All ML (highly abbreviated): Supervised splits into Classification (discrete outputs) and Regression (continuous outputs).
A classic example: digits The post office wants to be able to auto-scan envelopes, recognize addresses, etc. [Figure: scanned handwritten digits → ???]
Digits to bits Digitize (sensors) → Feature vector: 255, 255, 127, 35, 0, ..., 0, 93, 11, 45, 6...
Measurements & features The collection of numbers from the sensors (255, 0, 93, 11, 45, 6...) is called a feature vector, a.k.a. attribute vector, measurement vector, or instance.
Measurements & features Written x = (x_1, x_2, ..., x_d), where d is the dimension of the vector. Each x_i is drawn from some range, e.g., x_i ∈ {0, ..., 255}, x_i ∈ ℝ, or x_i ∈ [0, 1].
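As a concrete sketch of the digitization step above (illustrative, not from the slides), here is how a small grayscale digit image might be flattened into a feature vector x = (x_1, ..., x_d); the 8×8 image size and the 0..255 pixel range are assumptions.

```python
import numpy as np

# Illustrative assumption: an 8x8 grayscale digit image, pixel values in 0..255.
image = np.random.randint(0, 256, size=(8, 8))

# Flatten the 2-D pixel grid into a single feature vector x = (x_1, ..., x_d).
x = image.flatten()

d = x.shape[0]        # dimension of the feature vector (here d = 64)
print(d, x[:6])       # prints the dimension and the first few pixel values
```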
More on features Features (attributes, independent variables) can come in different flavors: continuous, discrete, categorical or nominal.
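A minimal sketch of the three feature flavors; the patient-record feature names and the one-hot encoding of the categorical value are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Hypothetical record with one feature of each flavor (names are made up):
temperature = 38.2        # continuous: a real-valued measurement
num_visits = 3            # discrete: an integer count
smoker = "former"         # categorical / nominal: one of a finite, unordered set

# One common way to place a categorical value into a numeric feature vector
# is a one-hot encoding over its possible values.
categories = ["never", "former", "current"]
one_hot = [1.0 if smoker == c else 0.0 for c in categories]

# A fixed-dimension feature vector (here d = 5) in the instance space.
x = np.array([temperature, num_visits] + one_hot)
print(x)    # [38.2  3.   0.   1.   0. ]
```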
More on features We (almost always) assume that the set of features is fixed & of finite dimension, d. Sometimes quite large, though (d ≥ 100,000 not uncommon). The set of all possible instances is the instance space or feature space, X (e.g., X ⊆ ℝ^d).
Classes Every example comes w/ a class. A.k.a. label, prediction, dependent variable, etc. For classification problems, the class label is categorical. For regression problems, it’s continuous (usually called the dependent or regressed variable). We’ll write y_i for the class of example x_i. E.g., one digit’s feature vector (255, 255, 127, 35, 0, ..., 0, 93, 11, 45, 6...) has class “7”, another has class “8”.
Classes, cont’d The set of possible values of the class variable is called the class set, class space, or range. Book writes indiv classes as ω_1, ω_2, ..., ω_c. Presumably whole class set is: Ω = {ω_1, ..., ω_c}. So y ∈ Ω.
A very simple example [Figure: three iris species: I. setosa, I. versicolor, I. virginica] Four measurements per flower: sepal length, sepal width, petal length, petal width. Feature space, X ⊆ ℝ^4.
A very simple example Class space, Ω = {I. setosa, I. versicolor, I. virginica}.
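A minimal sketch, assuming scikit-learn is available, that loads the iris data and shows the 4-dimensional feature space and 3-class class space described above.

```python
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data      # feature matrix: one row (feature vector) per flower, 4 columns
y = iris.target    # class labels, encoded as 0, 1, 2

print(X.shape)               # (150, 4): 150 instances in a 4-D feature space
print(iris.feature_names)    # sepal length/width and petal length/width (cm)
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']: the class space
```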
Training data Set of all available data for learning == training data. A.k.a. parameterization set, fitting set, etc. Denoted D = {(x_1, y_1), ..., (x_N, y_N)}. Can write as a matrix X (one instance per row), w/ a corresponding class vector y:
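A small sketch (with made-up numbers) of the matrix-plus-class-vector representation: each row of X is one feature vector and y holds the matching labels.

```python
import numpy as np

# Hypothetical training set D with N = 4 instances and d = 3 features each.
X = np.array([[5.1, 3.5, 1.4],
              [6.2, 2.9, 4.3],
              [5.9, 3.0, 5.1],
              [4.7, 3.2, 1.3]])
y = np.array(["A", "B", "C", "A"])   # one class label per row of X

N, d = X.shape
assert y.shape[0] == N               # the class vector lines up with the data matrix
print(N, d)                          # 4 3
```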
Finally, goals Now that we have the feature space X, the class space Ω, and training data D, we have a (mostly) well defined job. The supervised learning problem: Find the function f̂ that most closely approximates the “true” function f.
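To make the problem statement concrete, here is a toy sketch (not from the lecture) in which the “true” f is known and one candidate approximation f̂, a 1-nearest-neighbour rule, is built from labelled observations alone; the data and the choice of 1-NN are illustrative assumptions.

```python
import numpy as np

# Toy "true" function: labels a 2-D point by the sign of its first coordinate.
# In practice we never see f directly, only examples labelled by it.
def true_f(x):
    return 1 if x[0] > 0 else 0

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))                  # observed feature vectors
y_train = np.array([true_f(x) for x in X_train])    # their observed labels

# One candidate f_hat: 1-nearest-neighbour, i.e. predict the label of the
# closest training instance. The next slide's questions (which candidate
# functions? what counts as "closest approximation"?) still apply.
def f_hat(x):
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

x_new = np.array([0.4, -1.2])
print(f_hat(x_new), true_f(x_new))   # the approximation vs. the true label
```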
Goals? Key Questions: What candidate functions do we consider? What does “most closely approximates” mean? How do you find the one you’re looking for? How do you know you’ve found the “right” one?