David R. Musicant

Machine Learning
• Definition 1
  – “The subfield of AI concerned with programs that learn from experience”
  – Russell / Norvig, AIMA
• Definition 2
  – “the application of induction algorithms, which is one step in the knowledge discovery process.”
  – Machine Learning definition in glossary from Machine Learning at

Supervised Learning: Classification
• Example: Cancer diagnosis
• Use this training set to learn how to classify patients where diagnosis is not known:
  [Figure: table with columns Input Data and Classification; rows labeled Training Set and Test Set]
• The input data is often easily obtained, whereas the classification is not.

Classification Problem
• Goal: Use training set + some learning method to produce a predictive model.
• Use this predictive model to classify new data.
• Sample applications:
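
The goal above (training set + learning method → predictive model → classify new data) can be sketched in a few lines. This is only an illustration under stated assumptions: the learning method here is a deliberately simple one-feature threshold rule, not the method used in any of the applications that follow, and the function name `learn_threshold_model` and the data values are made up for the example.

```python
from statistics import mean

def learn_threshold_model(training_set):
    """Learn a model from (value, label) pairs and return it as a function.

    Assumption for this sketch: one numeric feature and exactly two classes.
    The model cuts halfway between the two class means.
    """
    labels = sorted({label for _, label in training_set})
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")

    # Mean feature value for each class.
    class_mean = {
        lab: mean(v for v, l in training_set if l == lab) for lab in labels
    }
    low, high = sorted(labels, key=lambda lab: class_mean[lab])
    cut = (class_mean[low] + class_mean[high]) / 2

    def model(value):
        # Predict the class whose mean lies on the same side of the cut.
        return high if value > cut else low

    return model

# Made-up training data: (measurement, known classification).
training_set = [
    (1.0, "benign"), (1.5, "benign"), (2.0, "benign"),
    (4.0, "malignant"), (4.5, "malignant"), (5.0, "malignant"),
]

model = learn_threshold_model(training_set)       # learn from the training set
predictions = [model(x) for x in (1.2, 4.8)]      # classify new data
```

The point of the sketch is the shape of the workflow: learning returns a model, and the model is then applied to data whose classification is not known.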

Application: Breast Cancer Diagnosis
Research by Mangasarian, Street, Wolberg

Breast Cancer Diagnosis Separation
Research by Mangasarian, Street, Wolberg

Application: Document Classification
• The Federalist Papers
  – Written by Alexander Hamilton, John Jay, and James Madison to persuade residents of the State of New York to ratify the U.S. Constitution
  – All written under the pseudonym “Publius”
• Who wrote which of them?
  – Hamilton wrote 56 papers
  – Madison wrote 50 papers
  – 12 disputed papers, generally understood to be written by Hamilton or Madison, but not known which
Research by Bosch, Smith

Federalist Papers Classification
Research by Bosch, Smith. Graphic by Fung

Application: Face Detection
• Training data is a collection of Faces and NonFaces
• Rotation and Mirroring added in to provide robustness
Image obtained from work by Osuna, Freund, and Girosi at
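
As a rough illustration of adding rotated and mirrored copies to a training set, here is a minimal sketch. It makes several assumptions that are not from the original work: images are plain lists of pixel rows, rotation is limited to a single 90-degree turn for simplicity (the actual rotations used are not stated here), and the helper names `mirror`, `rotate90`, and `augment` are hypothetical.

```python
def mirror(image):
    """Flip an image left to right; an image is a list of pixel rows."""
    return [row[::-1] for row in image]

def rotate90(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(examples):
    """Return each (image, label) example plus mirrored and rotated copies.

    The label is unchanged: a mirrored or rotated face is still a face.
    """
    augmented = []
    for image, label in examples:
        for variant in (image, mirror(image), rotate90(image)):
            augmented.append((variant, label))
    return augmented

# A made-up 2x2 "image" standing in for real pixel data.
tiny = [[1, 2],
        [3, 4]]
augmented = augment([(tiny, "Face")])
```

Each original example yields three training examples here, so the classifier sees the same content in more than one orientation.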

Face Detection Results
Image obtained from work by Osuna, Freund, and Girosi at

Nearest Neighbor
• Simple, effective approach for supervised learning problems
• Envision each example as a point in n-dimensional space
  – Picture it with 2 dimensions
• Classify test point same as nearest training point (Euclidean distance)
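
The nearest neighbor rule can be sketched directly from the description above. This is a minimal brute-force version, assuming each example is a tuple of numbers paired with a label; the function name `nearest_neighbor` and the 2-dimensional data points are made up for illustration.

```python
import math

def nearest_neighbor(training_set, test_point):
    """Return the label of the training point closest to test_point.

    training_set is a list of (point, label) pairs; distance is Euclidean.
    """
    _, label = min(training_set, key=lambda ex: math.dist(ex[0], test_point))
    return label

# Made-up 2-dimensional training points.
training_set = [
    ((1.0, 1.0), "A"),
    ((2.0, 1.5), "A"),
    ((6.0, 5.0), "B"),
    ((7.0, 6.5), "B"),
]

label_near_a = nearest_neighbor(training_set, (1.5, 1.0))
label_near_b = nearest_neighbor(training_set, (6.5, 6.0))
```

Nothing here depends on the points being 2-dimensional; `math.dist` (Python 3.8+) works for points of any matching length.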

k-Nearest Neighbor
• Nearest Neighbor can be subject to noise
  – Incorrectly classified training points
  – Training anomalies
• k-Nearest Neighbor
  – Find the k nearest training points (k odd) and vote on the classification
• Training time?
• Testing time?
• Works on numerical data
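
To see how voting among k neighbors can offset a noisy training point, here is a small brute-force sketch. The function name `k_nearest_neighbor` and the data are assumptions made for the example; one training point is deliberately mislabeled to play the role of noise.

```python
import math
from collections import Counter

def k_nearest_neighbor(training_set, test_point, k=3):
    """Classify test_point by majority vote among its k nearest training points.

    training_set is a list of (point, label) pairs; distance is Euclidean.
    With two classes, an odd k guarantees the vote cannot tie.
    """
    neighbors = sorted(
        training_set, key=lambda ex: math.dist(ex[0], test_point)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up data: the point at (1.2, 1.0) sits among the "A" points
# but carries the label "B", imitating an incorrectly classified example.
training_set = [
    ((1.0, 1.0), "A"),
    ((1.5, 1.0), "A"),
    ((1.2, 1.0), "B"),   # noisy label
    ((6.0, 5.0), "B"),
    ((7.0, 6.0), "B"),
]

test_point = (1.25, 1.0)
with_k1 = k_nearest_neighbor(training_set, test_point, k=1)  # follows the noisy point
with_k3 = k_nearest_neighbor(training_set, test_point, k=3)  # outvoted by two "A" points
```

In this particular sketch, "training" amounts to storing the examples, while every classification scans and sorts the whole training set, which is one way to start thinking about the training-time and testing-time questions.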