1 CSE 446 Machine Learning Daniel Weld Xiao Ling Congle Zhang

2 © Carlos Guestrin What is Machine Learning?

3 © Carlos Guestrin Machine Learning: study of algorithms that improve their performance at some task with experience. (Diagram: Data → Machine Learning → Understanding)

4 Why? Is this topic important?

5 © Carlos Guestrin Exponential Growth in Data (Diagram: Data → Machine Learning → Understanding)

6 © Carlos Guestrin Supremacy of Machine Learning
Machine learning is the preferred approach to:
 Speech recognition, natural language processing
 Web search: result ranking
 Computer vision
 Medical outcomes analysis
 Robot control
 Computational biology
 Sensor networks
 …
This trend is accelerating:
 Improved machine learning algorithms
 Improved data capture, networking, faster computers
 Software too complex to write by hand
 New sensors / IO devices
 Demand for self-customization to user, environment

7 Logistics

8 © Carlos Guestrin, D. Weld Syllabus
Covers a wide range of machine learning techniques, from basic to state-of-the-art. You will learn about the methods you have heard about:
 Naïve Bayes, logistic regression, nearest-neighbor, decision trees, boosting, neural nets, overfitting, regularization, dimensionality reduction, error bounds, loss functions, VC dimension, SVMs, kernels, margin bounds, K-means, EM, mixture models, semi-supervised learning, HMMs, graphical models, active learning
Covers algorithms, theory, and applications. It’s going to be fun and hard work.

9 © Carlos Guestrin Prerequisites
 Probabilities: distributions, densities, marginalization…
 Basic statistics: moments, typical distributions, regression…
 Algorithms: dynamic programming, basic data structures, complexity…
 Programming: mostly your choice of language, but Python (NumPy) + Matlab will be very useful
 Ability to deal with “abstract mathematical concepts”
We provide some background, but the class will be fast-paced.

10 Staff
Two great TAs, a fantastic resource for learning; interact with them!
 Xiao Ling, CSE 610. Office hours: TBA
 Congle Zhang, CSE 524. Office hours: TBA
Administrative assistant:
 Alicen Smith, CSE 546

11 Textbooks
Required text:
 Pattern Recognition and Machine Learning; Chris Bishop
Optional:
 Machine Learning; Tom Mitchell
 The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Trevor Hastie, Robert Tibshirani, Jerome Friedman
 Information Theory, Inference, and Learning Algorithms; David MacKay
 Website: Andrew Ng’s AI class videos
 Website: Tom Mitchell’s AI class videos

12 Grading
4 homeworks (55%)
 First one goes out Fri 1/6/12. Start early, start early, start early, start early, start early, start early, start early, start early, start early, start early!
Midterm (15%)
 Circa Feb 10, in class
Final (30%)
 TBD by registrar

13 © Carlos Guestrin Homeworks
Homeworks are hard; start early. Due at the beginning of class.
 Minus 33% credit for each day (or part of a day) late
 All homeworks must be handed in, even for zero credit
Collaboration:
 You may discuss the questions
 Each student writes their own answers
 Write on your homework anyone with whom you collaborate
 Each student must write their own code for the programming part
 Please don’t search for answers on the web, Google, previous years’ homeworks, etc. Ask us if you are not sure whether you can use a particular reference.

14 Communication
 Main discussion board
 Urgent announcements
 Subscribe: e446
 To instructors, always use:

15 Space of ML Problems

                         Type of Supervision (eg, Experience, Feedback)
What is Being Learned?   Labeled Examples          Reward                   Nothing
Discrete Function        Classification            -                        Clustering
Continuous Function      Regression                -                        -
Policy                   Apprenticeship Learning   Reinforcement Learning   -

16 ©2009 Carlos Guestrin Classification: from data to discrete classes

18 ©2009 Carlos Guestrin Spam filtering: data → prediction

19 ©2009 Carlos Guestrin Text classification: company home page vs personal home page vs university home page vs …

20 ©2009 Carlos Guestrin Object detection. Example training images for each orientation (Prof. H. Schneiderman)

21 ©2009 Carlos Guestrin Reading a noun (vs verb) [Rustandi et al., 2005]

22 ©2009 Carlos Guestrin Weather prediction

23 ©2009 Carlos Guestrin The classification pipeline: training → testing
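A minimal sketch of such a train/test pipeline in Python (the course's suggested language). It is illustrative only: the dataset is synthetic, and scikit-learn's Gaussian Naïve Bayes stands in for whichever learner a given application would use.

```python
# Illustrative train/test pipeline on synthetic data (not course code).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # 200 instances, 5 numeric features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary labels

# Training phase: fit a classifier on one split of the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = GaussianNB().fit(X_train, y_train)

# Testing phase: evaluate on held-out instances the learner never saw.
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```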

24 ©2009 Carlos Guestrin Regression: predicting a numeric value

25 ©2009 Carlos Guestrin Stock market

26 ©2009 Carlos Guestrin Weather prediction revisited: predicting temperature

27 ©2009 Carlos Guestrin Modeling sensor data: measure temperatures at some locations, predict temperatures throughout the environment [Guestrin et al. ’04]

28 ©2009 Carlos Guestrin Clustering: discovering structure in data

29 Clustering data: group similar things

30 ©2009 Carlos Guestrin Clustering images: a set of images [Goldberger et al.]

31 ©2009 Carlos Guestrin Clustering web search results

32 ©2009 Carlos Guestrin Reinforcement Learning: training by feedback

33 © Carlos Guestrin Reinforcement Learning

34 ©2011 D. Weld Checkers ???

35 ©2009 Carlos Guestrin Learning to act
Reinforcement learning: an agent
 Makes sensor observations
 Must select actions
 Receives rewards: positive for “good” states, negative for “bad” states
[Ng et al. ’05]

36 In Summary

                         Type of Supervision (eg, Experience, Feedback)
What is Being Learned?   Labeled Examples          Reward                   Nothing
Discrete Function        Classification            -                        Clustering
Continuous Function      Regression                -                        -
Policy                   Apprenticeship Learning   Reinforcement Learning   -

38 Key Concepts

39 Classifier
Hypothesis: a function for labeling examples.
(Figure: examples labeled + and -, plus unlabeled points marked “?” awaiting classification.)

40 Generalization Hypotheses must generalize to correctly classify instances not in the training data. Simply memorizing training examples is a consistent hypothesis that does not generalize.

41 ML = Function Approximation
Learn a hypothesis h(x) approximating the true concept c(x); there may not be any perfect fit.
Classification ~ discrete functions, e.g. h(x) = contains(‘nigeria’, x) ∧ contains(‘wire-transfer’, x)
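For concreteness, here is the slide's example hypothesis transcribed as a Python predicate; the case-insensitive substring matching is my assumption about what contains(…) means.

```python
# The slide's h(x): contains('nigeria', x) AND contains('wire-transfer', x).
def h(x: str) -> bool:
    text = x.lower()
    return "nigeria" in text and "wire-transfer" in text

print(h("Please wire-transfer the funds to Nigeria today"))  # True
print(h("Lecture notes for CSE 446"))                        # False
```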

© Daniel S. Weld 42 Why is Learning Possible? Experience alone never justifies any conclusion about any unseen instance. Learning occurs when PREJUDICE meets DATA! (Figure: learning a “Frobnitz”.)

© Daniel S. Weld 43 Bias
The nice word for prejudice is “bias” (different from “bias” in statistics).
 What kind of hypotheses will you consider? What is the allowable range of functions you use when approximating?
 What kind of hypotheses do you prefer?

© Daniel S. Weld 44 Some Typical Biases
 Occam’s razor: “It is needless to do more when less will suffice” (William of Occam, died 1349 of the Black Plague)
 MDL: minimum description length
 Concepts can be approximated by conjunctions of predicates, by linear functions, or by short decision trees

45 © Carlos Guestrin ML as Optimization
Specify a preference bias, aka a “loss function”; solve using optimization:
 Combinatorial
 Convex
 Linear
 Nasty
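A toy sketch of “specify a loss, then optimize”: gradient descent on the (convex) squared loss for a one-parameter linear hypothesis h(x) = w·x. The data, step size, and iteration count are invented for illustration.

```python
# Minimize mean squared error over w by gradient descent (toy example).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])       # roughly y = x

w, lr = 0.0, 0.05
for _ in range(200):
    grad = 2 * np.mean((w * x - y) * x)  # d/dw of mean((w*x - y)**2)
    w -= lr * grad

print(w)  # converges near the least-squares slope, about 0.99
```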

Overfitting
Hypothesis H is overfit when ∃ H′ such that H has smaller error on training examples, but H has bigger error on test examples.
Causes of overfitting:
 Training set is too small
 Large number of features
A big problem in machine learning. One solution: a validation set.
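A small demonstration of this definition (my construction, not from the slides): polynomial hypotheses of increasing degree are fit to noisy samples of a sine curve. The degree-9 polynomial achieves near-zero training error but a much larger test error than the simpler fits.

```python
# Invented overfitting demo: polynomials of growing degree on noisy data.
import numpy as np

rng = np.random.default_rng(1)
truth = lambda t: np.sin(2 * np.pi * t)
x_train = np.linspace(0, 1, 10)
y_train = truth(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = truth(x_test)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training set only
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train {train_err:.4f}, test {test_err:.4f}")
# Degree 9 interpolates the 10 training points (training error near 0)
# while its test error is far larger: overfitting.
```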

© Daniel S. Weld 48 Overfitting
(Figure: accuracy on training data vs. on test data, plotted against model complexity, e.g. the number of nodes in a decision tree.)

49 The Road Ahead

© Daniel S. Weld 50 (Some) Data Mining Issues
 What feedback (experience) is available?
 How to represent this experience?
 How to avoid overfitting?

51 Categorization
Given:
 A description of an instance, x ∈ X, where X is the instance language or instance space.
 A fixed set of categories: C = {c1, c2, …, cn}
Determine:
 The category of x: c(x) ∈ C, where c(x) is a categorization function whose domain is X and whose range is C.

52 Sample Category Learning Problem
Instance language:
 size ∈ {small, medium, large}
 color ∈ {red, blue, green}
 shape ∈ {square, circle, triangle}
C = {positive, negative}
D:
Example  Size   Color  Shape     Category
1        small  red    circle    positive
2        large  red    circle    positive
3        small  red    triangle  negative
4        large  blue   circle    negative
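As an illustration of learning from this data, here is a Find-S-style sketch that keeps the most specific conjunction of attribute values consistent with the positive examples; the choice of algorithm is mine, not necessarily the one the lecture develops.

```python
# Illustrative Find-S-style learner for the table above.
examples = [
    ({"size": "small", "color": "red", "shape": "circle"}, "positive"),
    ({"size": "large", "color": "red", "shape": "circle"}, "positive"),
    ({"size": "small", "color": "red", "shape": "triangle"}, "negative"),
    ({"size": "large", "color": "blue", "shape": "circle"}, "negative"),
]

hypothesis = None
for attrs, label in examples:
    if label != "positive":
        continue                           # Find-S ignores negatives
    if hypothesis is None:
        hypothesis = dict(attrs)           # start with the first positive
    else:
        for k in hypothesis:
            if hypothesis[k] != attrs[k]:
                hypothesis[k] = "?"        # generalize mismatches to "any"

print(hypothesis)  # {'size': '?', 'color': 'red', 'shape': 'circle'}
```

The learned conjunction (color = red ∧ shape = circle, any size) happens to classify both negative examples correctly as well.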

© Daniel S. Weld 53 A Learning Problem

© Daniel S. Weld 54 Hypothesis Spaces

© Daniel S. Weld 55 Terminology

56 General Learning Issues
 Many hypotheses are consistent with the training data.
 Bias: any criterion other than consistency with the training data that is used to select a hypothesis.
 Classification accuracy: % of instances classified correctly (measured on independent test data).
 Training time: efficiency of the training algorithm.
 Testing time: efficiency of subsequent classification.
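For concreteness, classification accuracy as defined above can be computed with a small helper like this (hypothetical code, not from the course materials):

```python
# Fraction of independent test instances classified correctly.
def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print(accuracy(["+", "-", "+"], ["+", "-", "-"]))  # 0.666...
```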

© Daniel S. Weld 57 Two Strategies for ML
 Restriction bias: use prior knowledge to specify a restricted hypothesis space (e.g., the Naïve Bayes classifier).
 Preference bias: use a broad hypothesis space, but impose an ordering on the hypotheses (e.g., decision trees).

58 © Carlos Guestrin Enjoy! ML is becoming ubiquitous in science, engineering and beyond This class should give you the basic foundation for applying ML and developing new methods