
(Briefly) Active Learning + Course Recap

Active Learning Remember Problem Set 1 Question #1? – Part (c) required generating a set of examples that would identify the target concept in the worst case. – …we were able to find the correct hypothesis (out of hundreds in H) with only 8 queries, logarithmic in |X|! In general, guaranteeing perfect performance with randomly drawn examples requires a number of queries linear in |X|.
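
As a concrete, hedged illustration of why chosen queries beat random ones, here is a minimal sketch assuming a one-dimensional threshold concept (not the problem set's actual concept class): each membership query halves the set of remaining candidate thresholds, so roughly log2 of the number of instances suffices.

```python
# Illustrative sketch (not from the slides): binary-search-style active learning
# of a 1-D threshold concept c(x) = (x >= t) over instances 0..n_instances-1.
# Each chosen query halves the candidate thresholds, so about log2(n) queries
# suffice, versus the roughly linear number needed with random examples.

def active_learn_threshold(oracle, n_instances):
    """Find the smallest x with oracle(x) == True by binary search."""
    lo, hi = 0, n_instances          # the threshold lies somewhere in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(mid):              # positive label: threshold is <= mid
            hi = mid
        else:                        # negative label: threshold is > mid
            lo = mid + 1
    return lo, queries

true_threshold = 137                 # hidden target, known only to the oracle
print(active_learn_threshold(lambda x: x >= true_threshold, 1024))
# (137, 10): ten queries, i.e. log2(1024), instead of hundreds
```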

Active Learning (2) Interesting challenge: choosing which examples are most informative Increasingly important: problems are huge and on-demand labelers are available – “Volunteer armies”: ESP game, Wikipedia – Mechanical Turk Key question: How to identify the most informative queries? – Both a technical question and a human-interface question
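
One common answer to "which query is most informative", offered here only as a hedged sketch rather than the slides' prescription, is uncertainty sampling: ask for the label of the unlabeled example the current model is least sure about.

```python
# Uncertainty-sampling sketch (an assumption about what "informative" means,
# not something these slides specify): query the unlabeled example whose
# predicted probability of the positive class is closest to 0.5.

def most_uncertain(model_proba, unlabeled):
    """model_proba(x) returns P(y=1 | x); pick the x nearest the decision boundary."""
    return min(unlabeled, key=lambda x: abs(model_proba(x) - 0.5))

# Toy usage with an invented scoring function: here P(y=1 | x) is just x itself.
unlabeled_pool = [0.1, 0.4, 0.45, 0.9]
print(most_uncertain(lambda x: x, unlabeled_pool))   # 0.45, the least certain example
```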

Recap

A Few Quotes “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft) “Machine learning is the next Internet” (Tony Tether, Director, DARPA) “Machine learning is the hot new thing” (John Hennessy, President, Stanford) “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo) “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun) “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)

Magic? No, more like gardening Seeds = Algorithms Nutrients = Data Gardener = You Plants = Programs

Types of Learning Supervised (inductive) learning – Training data includes desired outputs Unsupervised learning – Training data does not include desired outputs Reinforcement learning – Rewards from sequence of actions Semi-supervised learning – Training data includes a few desired outputs

Supervised Learning GIVEN: Instances X – E.g., days described by attributes: Sky, Temp, Humidity, Wind, Water, Forecast Hypothesis space H – E.g., MC2: conjunctions of literals Training examples D – positive and negative examples of the target function c: ⟨x1, c(x1)⟩, …, ⟨xm, c(xm)⟩ FIND: A hypothesis h in H such that h(x) = c(x) for all x in D.
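
The FIND condition is just a consistency check. A minimal sketch (the attribute names follow the slide, but the particular hypothesis and examples are invented):

```python
# Sketch of "find h in H such that h(x) = c(x) for all x in D".
# The weather attributes mirror the slide; the data itself is made up.

def consistent(h, examples):
    """h maps an instance dict to True/False; examples are (instance, label) pairs."""
    return all(h(x) == label for x, label in examples)

D = [({"Sky": "Sunny", "Wind": "Strong"}, True),
     ({"Sky": "Rainy", "Wind": "Weak"},   False)]

h = lambda x: x["Sky"] == "Sunny"    # a candidate conjunction with a single literal
print(consistent(h, D))              # True: h agrees with the target labels on all of D
```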

Supervised Learning Algorithms Candidate-Elimination [Figure: instances x1, x2 plotted in the instance space alongside hypotheses h1, h2, h3 in the hypothesis space, ordered from specific to general; h2 is more general than both h1 and h3]
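
To make the version-space idea behind Candidate-Elimination concrete, here is a hedged brute-force stand-in (List-Then-Eliminate over an invented two-attribute space): keep every hypothesis consistent with the data. Candidate-Elimination represents the same set compactly via its specific (S) and general (G) boundaries.

```python
# List-Then-Eliminate sketch: a brute-force stand-in for Candidate-Elimination.
# Attributes, values, and examples are invented; '?' means "don't care".

from itertools import product

ATTR_VALUES = {"Sky": ["Sunny", "Rainy", "?"], "Wind": ["Strong", "Weak", "?"]}

def matches(h, x):
    """A conjunctive hypothesis matches x if every non-'?' constraint agrees."""
    return all(v == "?" or x[a] == v for a, v in h.items())

# Enumerate H: every conjunction over the two attributes.
H = [dict(zip(ATTR_VALUES, vals)) for vals in product(*ATTR_VALUES.values())]

D = [({"Sky": "Sunny", "Wind": "Strong"}, True),
     ({"Sky": "Rainy", "Wind": "Strong"}, False)]

version_space = [h for h in H if all(matches(h, x) == y for x, y in D)]
print(version_space)
# [{'Sky': 'Sunny', 'Wind': 'Strong'}, {'Sky': 'Sunny', 'Wind': '?'}]
# i.e. S = <Sunny, Strong> and G = <Sunny, ?> bound the surviving hypotheses.
```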

Decision Trees Learn disjunctions of conjunctions by greedily splitting on “best” attribute values
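
A small sketch of that greedy step (the toy data is invented; this is not the course's implementation): score each attribute by information gain and split on the winner.

```python
# Greedy split selection for decision-tree learning: choose the attribute
# with the highest information gain.  The tiny dataset is invented.

from collections import Counter
from math import log2

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attr):
    """examples: list of (features_dict, label) pairs."""
    base = entropy([y for _, y in examples])
    remainder = 0.0
    for v in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

data = [({"Sky": "Sunny", "Wind": "Weak"},   True),
        ({"Sky": "Sunny", "Wind": "Strong"}, True),
        ({"Sky": "Rainy", "Wind": "Weak"},   False),
        ({"Sky": "Rainy", "Wind": "Strong"}, False)]

best = max(["Sky", "Wind"], key=lambda a: information_gain(data, a))
print(best, information_gain(data, best))   # Sky 1.0: Sky separates the labels perfectly
```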

Rule Learning Greedily learn rules that cover the examples. Can also be applied to learn first-order rules.
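
One common way to make "greedily learn rules to cover examples" concrete is sequential covering; the sketch below (scoring rule, data, and stopping condition are all illustrative, not the course's algorithm) repeatedly adds the single attribute test with the best precision and removes the examples it covers.

```python
# Sequential-covering sketch: greedily add one-condition rules until no
# positive examples remain uncovered.  Everything here is illustrative.

def learn_rules(examples):
    """examples: list of (features_dict, label); returns a list of (attr, value) rules."""
    rules, remaining = [], list(examples)
    while any(label for _, label in remaining):          # positives still uncovered
        candidates = {(a, v) for x, _ in remaining for a, v in x.items()}
        def precision(rule):
            a, v = rule
            covered = [y for x, y in remaining if x[a] == v]
            return sum(covered) / len(covered)           # fraction of covered that are positive
        best = max(candidates, key=precision)
        rules.append(best)
        a, v = best
        remaining = [(x, y) for x, y in remaining if x[a] != v]   # drop what the rule covers
    return rules

data = [({"Sky": "Sunny", "Wind": "Weak"},   True),
        ({"Sky": "Rainy", "Wind": "Strong"}, True),
        ({"Sky": "Rainy", "Wind": "Weak"},   False)]
print(learn_rules(data))   # e.g. [('Sky', 'Sunny'), ('Wind', 'Strong')] (order may vary)
```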

Neural Networks Non-linear regression/classification technique Especially useful when inputs/outputs are numeric Long training times, quick testing times [Figure: a small network mapping the inputs Age, Gender, and Stage to the output “Probability of being Alive”, shown as 0.6]
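
A minimal feed-forward sketch in the spirit of that figure (the architecture, inputs, and weights below are invented, not the slide's numbers): numeric inputs pass through one non-linear hidden layer to a single probability-like output.

```python
# Tiny feed-forward network sketch: 3 numeric inputs -> sigmoid hidden layer ->
# one output in (0, 1).  Weights are random placeholders; training is not shown.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))       # e.g. (Age, Gender, Stage) -> 4 hidden units
W2 = rng.normal(size=(4, 1))       # 4 hidden units -> 1 output

def forward(x):
    hidden = sigmoid(x @ W1)       # the non-linearity is what makes the model non-linear
    return sigmoid(hidden @ W2)    # read the output as a probability

x = np.array([0.34, 1.0, 2.0])     # an invented, already-scaled input vector
print(forward(x))                  # some value in (0, 1); training would set the weights
```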

Instance-Based Methods E.g., K-nearest neighbor Quick training times, long test times The “curse of dimensionality”
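
A minimal k-nearest-neighbor sketch (the distance measure, data, and k are invented): "training" only stores the examples, which is why training is quick while prediction, a scan over everything stored, is slow.

```python
# k-NN sketch: store the training data, classify by majority vote of the
# k closest stored examples.  Toy 2-D data; squared Euclidean distance.

from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (vector, label) pairs."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda xy: sq_dist(xy[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0.0, 0.0), "neg"), ((0.1, 0.2), "neg"),
         ((0.9, 1.0), "pos"), ((1.0, 0.8), "pos")]
print(knn_predict(train, (0.95, 0.9), k=3))   # "pos"
```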

Support Vector Machines (1) Derived Feature Spaces (the Kernel Trick):
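
A hedged illustration of the kernel trick (a standard example, not the slides' derivation): for 2-D inputs, the quadratic kernel K(x, z) = (x · z)² equals an ordinary dot product in the derived feature space φ(x) = (x1², x2², √2·x1·x2), so we get that richer space without ever computing φ explicitly.

```python
# Kernel-trick check: K(x, z) = (x . z)^2 matches the dot product of the
# explicitly derived features phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2).

from math import sqrt

def kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    return (x[0] ** 2, x[1] ** 2, sqrt(2) * x[0] * x[1])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, z = (1.0, 2.0), (3.0, 0.5)
print(kernel(x, z), dot(phi(x), phi(z)))   # 16.0 16.0: identical values
```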

Support Vector Machines (2) Maximizing Margin:
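
For reference, the standard hard-margin formulation (a textbook statement, not transcribed from the slide): the margin being maximized is 2/||w||, usually posed as the equivalent minimization below.

```latex
% Standard hard-margin SVM objective (reference only, not the slide's equations)
\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{s.t.} \quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \;\; \forall i,
\qquad \text{margin} \;=\; \frac{2}{\lVert \mathbf{w} \rVert}.
```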

Bayes Nets (1) Qualitative part: a directed acyclic graph (DAG); nodes are random variables, edges are direct influence. Quantitative part: a set of conditional probability distributions, one per node given its parents. [Figure: the Burglary/Earthquake network: Burglary and Earthquake are the parents of Alarm, which in turn is the parent of JohnCalls and MaryCalls, with a conditional probability table for P(A | B, E)]
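
A sketch of that network in code. The structure follows the figure; the probabilities are the commonly used textbook illustration values (Russell and Norvig), not numbers taken from this lecture.

```python
# Alarm-network sketch: the joint probability of a full assignment is the
# product of each node's probability given its parents.  CPT numbers are
# standard textbook illustration values, NOT from these slides.

P_B = 0.001                                    # P(Burglary)
P_E = 0.002                                    # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,     # P(Alarm | Burglary, Earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(no burglary, no earthquake, alarm, John calls, Mary calls)
print(joint(False, False, True, True, True))   # about 0.00063
```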

Bayes Nets (2) Flexible modeling approach – Used for supervised (SL), semi-supervised (SSL), and unsupervised (UL) learning Natural for explicitly encoding prior knowledge

Hidden Markov Models Special case of Bayes Nets for sequential data Admit efficient learning, decoding algorithms [Figure: a chain of unobserved states t_i, t_{i+1}, t_{i+2}, t_{i+3}, each emitting an observed word w_i, w_{i+1}, w_{i+2}, w_{i+3}, illustrated on the phrase “cities such as Seattle”; states are unobserved, words are observed]
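
A compact sketch of the decoding the slide refers to, using the Viterbi algorithm; the states, vocabulary, and every probability below are invented toy values.

```python
# Viterbi decoding sketch: recover the most probable hidden state sequence
# for an observed word sequence.  All probabilities are toy values.

def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for word in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s][word], V[-1][prev][1])
                for prev in states)
            layer[s] = (prob, path + [s])
        V.append(layer)
    return max(V[-1].values())[1]          # best path ending at the last word

states  = ["NOUN", "OTHER"]
start_p = {"NOUN": 0.3, "OTHER": 0.7}
trans_p = {"NOUN": {"NOUN": 0.2, "OTHER": 0.8}, "OTHER": {"NOUN": 0.6, "OTHER": 0.4}}
emit_p  = {"NOUN":  {"cities": 0.5, "such": 0.05, "as": 0.05, "Seattle": 0.4},
           "OTHER": {"cities": 0.1, "such": 0.45, "as": 0.45, "Seattle": 0.0}}
print(viterbi(["cities", "such", "as", "Seattle"], states, start_p, trans_p, emit_p))
# ['NOUN', 'OTHER', 'OTHER', 'NOUN']
```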

Computational Learning Theory Based on the data we’ve observed, what can we guarantee? “Probably Approximately Correct” learning Extension to continuous inputs: VC dimension
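
For reference, the standard sample-complexity bound for a finite hypothesis space (the usual textbook PAC result, not copied from the slide): a hypothesis consistent with m training examples has true error at most ε with probability at least 1 − δ once

```latex
% Standard PAC bound for finite H (reference, not from the slide)
m \;\ge\; \frac{1}{\epsilon}\left(\ln \lvert H \rvert + \ln \frac{1}{\delta}\right)
```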

Optimization Techniques Local Search – Hill climbing, simulated annealing Genetic Algorithms – Key innovation: crossover – Also applied to programs (genetic programming)
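
A generic hill-climbing sketch (illustrative only, not the course's code): repeatedly move to the best neighboring state and stop at a local optimum; simulated annealing differs by sometimes accepting worse neighbors to escape such optima.

```python
# Hill-climbing sketch: greedy local search that stops at a local optimum.

def hill_climb(state, neighbors, score):
    while True:
        best = max(neighbors(state), key=score)
        if score(best) <= score(state):
            return state                     # no neighbor improves: local optimum
        state = best

# Toy usage: maximize f(x) = -(x - 7)^2 over the integers by stepping +/- 1.
f = lambda x: -(x - 7) ** 2
print(hill_climb(0, lambda x: [x - 1, x + 1], f))   # 7
```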

Unsupervised Learning K-means Hidden Markov Models Both use the same general algorithm… Expectation Maximization
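
A compact k-means sketch showing the EM-style alternation: assign each point to its nearest center (an E-like step), then move each center to the mean of its points (an M-like step). The 1-D data and the number of centers are invented.

```python
# k-means sketch alternating assignment and re-estimation, EM-style.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        clusters = {c: [] for c in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # update step: each center becomes the mean of its assigned points
        centers = [sum(pts) / len(pts) if pts else centers[c]
                   for c, pts in clusters.items()]
    return centers

data = [1.0, 1.2, 0.8, 7.9, 8.1, 8.0]
print(kmeans(data, centers=[0.0, 5.0]))   # approximately [1.0, 8.0]
```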

Key Lessons (1) You can’t learn without inductive bias From the Wired article assigned the 1st week: “Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.” What do you think?

Key Lessons (2) Overfitting – Can’t just choose the “most powerful” model Choose the “right” model – One that encodes your understanding of the domain and meets your other requirements – E.g., HMMs vs. decision trees for sequential data; decision trees vs. NNs for mushrooms; NNs vs. decision trees for face recognition

Course Advertisement EECS 395/495 Spring Quarter 2009 “Web Information Retrieval and Extraction” – Basics of Web search, extraction – New research & future directions – Discussion, project-based

Thanks!