
Machine Learning Overview
Tamara Berg, CS 560 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan

Announcements
HW3 due Oct 29, 11:59pm

Machine learning

Machine learning
Definition:
- Getting a computer to do well on a task without explicitly programming it
- Improving performance on a task based on experience

Big Data!

What is machine learning?
Computer programs that can learn from data.
Two key components:
- Representation: how should we represent the data?
- Generalization: the system should generalize from its past experience (observed data items) to perform well on unseen data items.

Types of ML algorithms
- Unsupervised: algorithms operate on unlabeled examples
- Supervised: algorithms operate on labeled examples
- Semi/Partially-supervised: algorithms combine both labeled and unlabeled examples

Clustering
- The assignment of objects into groups (aka clusters) so that objects in the same cluster are more similar to each other than objects in different clusters.
- Clustering is a common technique for statistical data analysis, used in many fields including machine learning, data mining, pattern recognition, image analysis, and bioinformatics.

Similarity between data points can be measured in several ways: Euclidean distance, the angle between data vectors (cosine similarity), etc.

K-means clustering
We want to minimize the sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers $m_k$:
$D = \sum_{k} \sum_{i \in \text{cluster } k} \| x_i - m_k \|^2$

K-means
[Figure: x's indicate the initialization for 3 cluster centers]
Iterate until convergence:
1) Compute the assignment of data points to cluster centers
2) Update cluster centers with the mean of their assigned points

Flat vs. hierarchical clustering
Flat algorithms:
- Usually start with a random partitioning of documents into groups
- Refine iteratively
- Main algorithm: k-means
Hierarchical algorithms:
- Create a hierarchy
- Bottom-up: agglomerative
- Top-down: divisive

Hierarchical clustering strategies
Agglomerative clustering:
- Start with each data point in a separate cluster
- At each iteration, merge two of the "closest" clusters
Divisive clustering:
- Start with all data points grouped into a single cluster
- At each iteration, split the "largest" cluster

Produces a hierarchy of clusterings

Divisive clustering
- Top-down (instead of bottom-up as in agglomerative clustering)
- Start with all data points in one big cluster
- Then recursively split clusters
- Eventually each data point forms a cluster on its own

Flat or hierarchical clustering?
- For high efficiency, use flat clustering (e.g. k-means)
- For deterministic results, use hierarchical clustering
- When a hierarchical structure is desired, use a hierarchical algorithm
- Hierarchical clustering can also be applied if K cannot be predetermined (it can start without knowing K)

Clustering in action: an example from computer vision

Recall: Bag of Words representation
Represent a document as a "bag of words"

Bag of features for images
Represent images as a "bag of words"

Bags of features for image classification
1. Extract features
2. Learn "visual vocabulary"
3. Represent images by frequencies of "visual words"

1. Feature extraction

2. Learning the visual vocabulary
Cluster the extracted features; the cluster centers form the visual vocabulary, with each center acting as a "visual word".

Example visual vocabulary (Fei-Fei et al. 2005)

3. Image representation
Each image becomes a histogram over the vocabulary (x-axis: visual words, y-axis: frequency).

Types of ML algorithms
- Unsupervised: algorithms operate on unlabeled examples
- Supervised: algorithms operate on labeled examples
- Semi/Partially-supervised: algorithms combine both labeled and unlabeled examples

Example: Sentiment analysis

Example: Image classification
Input: images; desired output: category labels (apple, pear, tomato, cow, dog, horse)

Example: Seismic data
Classify events as nuclear explosions vs. earthquakes based on two features: body wave magnitude and surface wave magnitude.

The basic classification framework: $y = f(x)$, where $x$ is the input, $f$ is the classification function, and $y$ is the output.
- Learning: given a training set of labeled examples $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, estimate the parameters of the prediction function $f$
- Inference: apply $f$ to a never-before-seen test example $x$ and output the predicted value $y = f(x)$

Naïve Bayes classification
Predict the class that maximizes the posterior: $c^* = \arg\max_c P(c) \prod_i P(x_i \mid c)$, assuming the features $x_i$ are conditionally independent given the class.

Example: Image classification pipeline
Input (image) → representation → classifier (e.g. Naïve Bayes, neural net, etc.) → output (predicted label, e.g. "car")

Example: Training and testing
Key challenge: generalization to unseen examples.
Training set (labels known) vs. test set (labels unknown)

Some classification methods
- Nearest neighbor ($10^6$ examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ...
- Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; ...
- Support Vector Machines and kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; ...
- Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; ...

Classification … more soon

Types of ML algorithms
- Unsupervised: algorithms operate on unlabeled examples
- Supervised: algorithms operate on labeled examples
- Semi/Partially-supervised: algorithms combine both labeled and unlabeled examples

Supervised learning has many successes: recognizing speech, steering a car, classifying documents, classifying proteins, recognizing faces and objects in images, ... Slide Credit: Avrim Blum

However, for many problems, labeled data can be rare or expensive: you need to pay someone to label it, it may require special testing, etc. Unlabeled data is much cheaper and plentiful: speech, images, medical outcomes, customer modeling, protein sequences, web pages [From Jerry Zhu]. Can we make use of cheap unlabeled data? Slide Credit: Avrim Blum

Semi-Supervised Learning
Can we use unlabeled data to augment a small labeled sample to improve learning? Unlabeled data is missing the most important information (the labels!), but it may still have useful regularities that we can exploit. Slide Credit: Avrim Blum