COMP3503 Semi-Supervised Learning, Daniel L. Silver

Presentation transcript:

1 COMP3503 Semi-Supervised Learning Daniel L. Silver

2 Agenda
● Unsupervised + Supervised = Semi-supervised
● Semi-supervised approaches
● Co-Training
● Software

3 DARPA Grand Challenge 2005
● Stanford's Sebastian Thrun holds a $2M check on top of Stanley, a robotic Volkswagen Touareg R5
● 212 km autonomous vehicle race in Nevada
● Stanley completed the course in 6h 54m
● Four other teams also finished
● Great TED talk by him on driverless cars
● Further background on Sebastian Thrun

4 Unsupervised + Supervised = Semi-supervised
● Sebastian Thrun on Supervised, Unsupervised and Semi-supervised learning
● Video: =qkcFRr7LqAw

5 Labeled data is expensive …

6 Semi-supervised learning
● Semi-supervised learning attempts to use unlabeled data as well as labeled data
  ● The aim is to improve classification performance
● Why try to do this? Unlabeled data is often plentiful and labeling data can be expensive
  ● Web mining: classifying web pages
  ● Text mining: identifying names in text
  ● Video mining: classifying people in the news
● Leveraging the large pool of unlabeled examples would be very attractive

7 How can unlabeled data help?

8 Clustering for classification
● Idea: use naïve Bayes on labeled examples and then apply EM (sketched in code below)
  1. Build a naïve Bayes model on the labeled data
  2. Label the unlabeled data based on class probabilities ("expectation" step)
  3. Train a new naïve Bayes model based on all the data ("maximization" step)
  4. Repeat steps 2 and 3 until convergence
● Essentially the same as EM for clustering, with fixed cluster membership probabilities for the labeled data and #clusters = #classes
● Ensures finding model parameters that have equal or greater likelihood after each iteration
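
The numbered procedure above is compact enough to sketch in code. The following is a minimal, hedged illustration, not Nigam et al.'s exact implementation: it assumes scikit-learn's MultinomialNB, dense count-feature matrices, and hard labels in the M step rather than fractional counts.

```python
# Minimal sketch of the EM-with-naive-Bayes loop described above.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def em_naive_bayes(X_labeled, y_labeled, X_unlabeled, n_iter=10):
    nb = MultinomialNB().fit(X_labeled, y_labeled)      # step 1: model on labeled data
    X_all = np.vstack([X_labeled, X_unlabeled])
    # Labeled rows keep their true class throughout ("fixed membership probabilities").
    fixed = np.asarray(y_labeled)
    for _ in range(n_iter):
        # E step: label the unlabeled rows using the current model's class probabilities.
        proba = nb.predict_proba(X_unlabeled)
        pseudo = nb.classes_[proba.argmax(axis=1)]
        # M step: retrain on all the data (hard-EM simplification of fractional counts).
        nb = MultinomialNB().fit(X_all, np.concatenate([fixed, pseudo]))
    return nb
```

With text data, X_labeled and X_unlabeled would typically be built with a shared CountVectorizer fitted on all documents.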

9 Clustering for classification
● Has been applied successfully to document classification
● Certain phrases are indicative of classes
  ● e.g. "supervisor" and "PhD topic" on a graduate student's web page
● Some of these phrases occur only in the unlabeled data, some in both sets
● EM can generalize the model by taking advantage of co-occurrence of these phrases
● Has been shown to work quite well
● A bootstrapping procedure from unlabeled to labeled data
● Must take care to ensure the feedback is positive

10 Also known as Self-training..

11 Also known as Self-training..
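
Written as self-training, the same bootstrapping idea becomes a pseudo-labelling loop. The sketch below is one common variant, assuming a scikit-learn-style classifier with predict_proba; the 0.95 confidence threshold and the choice of MultinomialNB are arbitrary.

```python
# One common self-training variant: repeatedly train, pseudo-label the most
# confident unlabeled examples, and move them into the labeled pool.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    y_lab = np.asarray(y_lab)
    clf = MultinomialNB().fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold       # accept only confident pseudo-labels
        if not confident.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[confident].argmax(axis=1)]])
        X_unlab = X_unlab[~confident]                    # shrink the unlabeled pool
        clf = MultinomialNB().fit(X_lab, y_lab)          # retrain on the enlarged set
    return clf
```

Recent versions of scikit-learn also ship sklearn.semi_supervised.SelfTrainingClassifier, which wraps essentially this loop around any probabilistic base estimator.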

12 Clustering for classification
● Refinement 1:
  ● Reduce the weight of the unlabeled data to increase the influence of the more accurate labeled data
  ● During the maximization step, weight the labeled examples more heavily (sketched below)
● Refinement 2:
  ● Allow multiple clusters per class
  ● The number of clusters per class can be set by cross-validation. What does this mean?
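
A hedged reading of Refinement 1: in the M step, give each pseudo-labelled example a fractional sample weight so that the genuinely labeled examples dominate the parameter estimates. The helper below would replace the M step of the earlier sketch; the 0.1 weight is an arbitrary illustration, not a recommended value.

```python
# Down-weight pseudo-labelled examples in the M step (assumes the first
# n_labeled rows of X_all are the truly labeled examples).
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def weighted_m_step(X_all, y_all, n_labeled, unlabeled_weight=0.1):
    weights = np.ones(X_all.shape[0])
    weights[n_labeled:] = unlabeled_weight   # e.g. ten pseudo-labelled docs count as one labeled doc
    return MultinomialNB().fit(X_all, y_all, sample_weight=weights)
```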

13 Generative Models: see Xiaojin Zhu's slides, p. 28

14 Co-training
● Method for learning from multiple views (multiple sets of attributes), e.g. classifying web pages:
  ● First set of attributes describes the content of the web page
  ● Second set of attributes describes the links from other pages
● Procedure (a code sketch follows):
  1. Build a model from each view using the available labeled data
  2. Use each model to assign labels to the unlabeled data
  3. Select those unlabeled examples that were most confidently predicted by both models (ideally, preserving the ratio of classes)
  4. Add those examples to the training set
  5. Go to Step 1 until the data is exhausted
● Assumption: views are independent; this reduces the probability of the models agreeing on incorrect labels
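
The sketch below is a hedged, simplified version of this procedure: each view's model nominates its own most confident unlabeled examples (rather than requiring both models to be confident), labels them by averaging the two models' predictions, and makes no attempt to preserve the class ratio. Two dense feature matrices, one per view, and scikit-learn's MultinomialNB are assumed.

```python
# Simplified co-training over two views (e.g. page text vs. link text).
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab, per_round=5, rounds=10):
    y_lab = np.asarray(y_lab)
    for _ in range(rounds):
        m1 = MultinomialNB().fit(X1_lab, y_lab)          # model on view 1
        m2 = MultinomialNB().fit(X2_lab, y_lab)          # model on view 2
        if len(X1_unlab) == 0:
            break
        # Each model nominates the unlabeled examples it is most confident about.
        picked = set()
        for m, X in ((m1, X1_unlab), (m2, X2_unlab)):
            conf = m.predict_proba(X).max(axis=1)
            picked.update(np.argsort(conf)[-per_round:].tolist())
        idx = np.array(sorted(picked))
        # Label the selected examples with the averaged prediction of both views.
        avg = (m1.predict_proba(X1_unlab[idx]) + m2.predict_proba(X2_unlab[idx])) / 2
        X1_lab = np.vstack([X1_lab, X1_unlab[idx]])
        X2_lab = np.vstack([X2_lab, X2_unlab[idx]])
        y_lab = np.concatenate([y_lab, m1.classes_[avg.argmax(axis=1)]])
        keep = np.ones(len(X1_unlab), dtype=bool)
        keep[idx] = False                                # remove newly labeled examples
        X1_unlab, X2_unlab = X1_unlab[keep], X2_unlab[keep]
    return m1, m2
```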

15 Co-training
● Assumption: views are independent; this reduces the probability of the models agreeing on incorrect labels
● On datasets where independence holds, experiments have shown that co-training gives better results than using a standard semi-supervised EM approach
● Why is this?

16 Co-EM: EM + Co-training
● Like EM for semi-supervised learning, but the view is switched in each iteration of EM (rough sketch below)
● Uses all the unlabeled data (probabilistically labeled) for training
● Has also been used successfully with neural networks and support vector machines
● Co-training also seems to work when views are chosen randomly!
● Why? Possibly because the combined co-trained classifier is more robust to violations of the assumptions made by each underlying classifier
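
A rough sketch of the view-switching idea, under the same assumptions as the earlier sketches (two dense views, MultinomialNB, hard pseudo-labels instead of fractional counts):

```python
# Co-EM sketch: like the EM loop, but alternate between the two views and
# pseudo-label ALL of the unlabeled data on every iteration.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_em(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab, n_iter=10):
    y_lab = np.asarray(y_lab)
    views_lab, views_unlab = (X1_lab, X2_lab), (X1_unlab, X2_unlab)
    model = MultinomialNB().fit(X1_lab, y_lab)                          # start from view 1
    pseudo = model.classes_[model.predict_proba(X1_unlab).argmax(axis=1)]
    for i in range(n_iter):
        v = (i + 1) % 2                                                 # switch view each iteration
        X_all = np.vstack([views_lab[v], views_unlab[v]])
        model = MultinomialNB().fit(X_all, np.concatenate([y_lab, pseudo]))
        pseudo = model.classes_[model.predict_proba(views_unlab[v]).argmax(axis=1)]
    return model
```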

17 Unsupervised + Supervised = Semi-supervised
● Sebastian Thrun on Supervised, Unsupervised and Semi-supervised learning
● Video: =qkcFRr7LqAw

18 Example: object recognition results from tracking-based semi-supervised learning
● Video accompanies the RSS 2011 paper "Tracking-based semi-supervised learning"
● The classifier used to generate these results was trained using 3 hand-labeled training tracks of each object class plus a large quantity of unlabeled data
● Gray boxes are objects that were tracked in the laser and classified as neither pedestrian, bicyclist, nor car
● The object recognition problem is broken down into segmentation, tracking, and track classification components; segmentation and tracking are by far the largest sources of error
● Camera data is used only for visualization of results; all object recognition is done using the laser range finder

19 Software …
● WEKA version that does semi-supervised learning: https://sites.google.com/a/deusto.es/xabier-ugarte/downloads/weka-37-modification
● LLGC, Learning with Local and Global Consistency: http://research.microsoft.com/en-us/um/people/denzho/papers/LLGC.pdf

20 References
● Introduction to Semi-Supervised Learning: http://mitpress.mit.edu/sites/default/files/titles/content/ _sch_0001.pdf

21 THE END