Zhu, Rogers, Qian, Kalish. Presented by Syeda Selina Akter

• Do humans use unlabeled data in addition to labeled data?
• Can this behavior be explained by mathematical models for semi-supervised machine learning?

Based on the assumption that each class forms a coherent group.

• Each participant receives 2 labeled examples, at x = −1 and x = 1
• Each participant receives unlabeled examples sampled from the true class feature distributions

• Artificial fish
  ◦ Might reflect prior knowledge about the category
• Circles of different sizes
  ◦ Prior knowledge about size
  ◦ Limited range for display on a computer screen

[Figure] Artificial 3D stimuli: shapes change gradually with x.

Block 1 (labeled)
◦ 2 labeled examples, at x = −1 and x = 1
◦ Each example shown 10 times
Block 2 (test)
◦ 21 evenly spaced unlabeled test examples: x = −1, −0.9, …, −0.1, 0, 0.1, …, 0.9, 1

Block 3 (unlabeled-1)
[Figure] Unlabeled examples drawn from a right-shifted Gaussian mixture, with a 1.28σ offset marked.
• The labeled data are off-center with respect to the shifted mixture: not prototypical, but not outliers either

• Ranged examples: x ∈ [−2.5, 2.5]
  ◦ Ensure both groups span the same range
  ◦ So the decision is not biased by the range of the examples

• Blocks 4 and 5
  ◦ Same 21 ranged examples
  ◦ Different 230 random examples from the Gaussian mixtures
• Block 6
  ◦ Same as block 2: 21 evenly spaced test examples from the range [−1, 1]
  ◦ Tests whether the decision boundary changed after seeing the unlabeled examples

• Participants are told the stimuli are microscopic pollens
• Press B or N to classify
• Labeled examples: audio feedback on the correct label
• No audio feedback for unlabeled data
• 12 L-subjects, 10 R-subjects
• Each subject sees the same 6 blocks of data, i.e., the same 815 stimuli (tallied in the sketch below)
• Presentation order is random
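The six-block schedule above can be tallied with a short sketch. The shift and spread of the unlabeled mixture are placeholders, and the even spacing of the ranged examples is an assumption; only the per-block counts come from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    SHIFT, SIGMA = 1.0, 1.0      # placeholder shift/spread of the unlabeled mixture (not given on the slides)

    def sample_shifted_mixture(n):
        """Draw n unlabeled stimuli from a two-component Gaussian mixture shifted to the right."""
        comp = rng.integers(0, 2, size=n)
        means = np.where(comp == 0, -1.0 + SHIFT, 1.0 + SHIFT)
        return rng.normal(means, SIGMA)

    block1 = np.repeat([-1.0, 1.0], 10)        # 2 labeled examples, each shown 10 times
    block2 = np.linspace(-1, 1, 21)            # 21 evenly spaced test examples
    ranged = np.linspace(-2.5, 2.5, 21)        # 21 ranged examples (even spacing assumed)
    blocks_3_4_5 = [np.concatenate([ranged, sample_shifted_mixture(230)]) for _ in range(3)]
    block6 = np.linspace(-1, 1, 21)            # same grid as block 2

    total = len(block1) + len(block2) + sum(map(len, blocks_3_4_5)) + len(block6)
    print(total)                               # 20 + 21 + 3*251 + 21 = 815, matching the slide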

• A logistic regression function is fit to the classification decisions (a minimal fitting sketch follows)
• Decision boundary after test-1 (block 2) is at x = 0.11
• The steep curve indicates decision consistency
• Decision boundary for R-subjects after test-2 (block 6) is at x = 0.48
• Decision boundary for L-subjects after test-2 is at x = −0.10
• Unlabeled data affect the decision boundary
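A minimal sketch of that fit, assuming one participant's binary test responses (the data below are placeholders, not the experiment's): fit a logistic regression and take the decision boundary as the x where the fitted probability crosses 0.5.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # placeholder test data: 21 stimuli and one participant's binary responses
    x = np.linspace(-1, 1, 21)
    y = (x > 0.1).astype(int)

    clf = LogisticRegression().fit(x.reshape(-1, 1), y)
    boundary = -clf.intercept_[0] / clf.coef_[0, 0]   # x where the fitted p(y=1|x) crosses 0.5
    print(round(float(boundary), 2))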

• Stimuli closer to the decision boundary produce longer reaction times
• Test-2 is overall faster than test-1: familiarity with the experiment
• The L-subject and R-subject reaction times support the decision-boundary shift

• Explain the human experiment with the following two-component Gaussian mixture model (sketched below)
• Parameters θ
• Priors on the parameters θ
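For concreteness, the usual parameterization of a two-component Gaussian mixture (a sketch of the standard form; the specific priors placed on θ are not reproduced here):

    p(x \mid \theta) = \sum_{j=1}^{2} \alpha_j \, \mathcal{N}(x;\, \mu_j, \sigma_j^2),
    \qquad \theta = (\alpha_1, \mu_1, \sigma_1^2, \alpha_2, \mu_2, \sigma_2^2),
    \qquad \alpha_1 + \alpha_2 = 1,

with each class y identified with one mixture component, so p(x, y = j \mid \theta) = \alpha_j \, \mathcal{N}(x;\, \mu_j, \sigma_j^2).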

• Expectation Maximization (EM) algorithm
• Maximizes the following objective (a plausible form is written out below)
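A plausible form of that objective, consistent with the prior on θ above and the unlabeled weight λ discussed on the later slides (a hedged reconstruction, with l labeled and u unlabeled examples):

    \ell(\theta) = \log p(\theta)
        + \sum_{i=1}^{l} \log p(x_i, y_i \mid \theta)
        + \lambda \sum_{i=l+1}^{l+u} \log p(x_i \mid \theta)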

E-step and M-step update equations (a sketch of the corresponding updates follows).
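A minimal sketch of those updates under the weighted objective above, omitting the prior p(θ) for brevity: labeled points keep a fixed one-hot responsibility for their given class, unlabeled points get soft responsibilities in the E-step, and the unlabeled contribution is scaled by λ in the M-step. The helper names (fit_ssl_gmm, predict_proba) are mine, not the authors'.

    import numpy as np
    from scipy.stats import norm

    def fit_ssl_gmm(x_lab, y_lab, x_unl, lam=1.0, n_iter=100):
        """EM for a 1-D two-component Gaussian mixture with labeled and unlabeled data.

        x_lab, y_lab : labeled stimuli and their class labels (0 or 1)
        x_unl        : unlabeled stimuli
        lam          : weight of the unlabeled data in the objective
        """
        x_lab = np.asarray(x_lab, dtype=float)
        y_lab = np.asarray(y_lab, dtype=int)
        x_unl = np.asarray(x_unl, dtype=float)

        # crude initialization from the labeled data
        mu = np.array([x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()])
        sd = np.array([x_lab.std() + 1e-3] * 2)
        alpha = np.array([0.5, 0.5])

        r_lab = np.eye(2)[y_lab]                 # labeled points: responsibility fixed to their class
        xs = np.concatenate([x_lab, x_unl])

        for _ in range(n_iter):
            # E-step: soft component responsibilities for the unlabeled points
            dens = np.stack([alpha[j] * norm.pdf(x_unl, mu[j], sd[j]) for j in range(2)], axis=1)
            r_unl = dens / dens.sum(axis=1, keepdims=True)

            # M-step: weighted maximum-likelihood updates, unlabeled responsibilities scaled by lam
            w = np.vstack([r_lab, lam * r_unl])          # shape (n_lab + n_unl, 2)
            n = w.sum(axis=0)                            # effective counts per component
            mu = (w * xs[:, None]).sum(axis=0) / n
            sd = np.sqrt((w * (xs[:, None] - mu) ** 2).sum(axis=0) / n) + 1e-6
            alpha = n / n.sum()

        return alpha, mu, sd

    def predict_proba(x, alpha, mu, sd):
        """Bayes-rule posterior p(y = 1 | x, theta) for the two-component mixture."""
        p0 = alpha[0] * norm.pdf(x, mu[0], sd[0])
        p1 = alpha[1] * norm.pdf(x, mu[1], sd[1])
        return p1 / (p0 + p1)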

• EM finds θ
• Predictions are then made through Bayes' rule (written out below)
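The prediction step written out (standard Bayes rule for the fitted mixture; this is what predict_proba in the sketch above computes):

    p(y = j \mid x, \theta) =
        \frac{\alpha_j \, \mathcal{N}(x;\, \mu_j, \sigma_j^2)}
             {\sum_{k=1}^{2} \alpha_k \, \mathcal{N}(x;\, \mu_k, \sigma_k^2)},

and the decision boundary is the x at which the two class posteriors are both 0.5.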

• GMM fit with EM on blocks 1-2 data (labeled and test blocks only)
• GMM fit with EM on blocks 1-6 data for L-subjects
• GMM fit with EM on blocks 1-6 data for R-subjects
• The model predicts the decision-boundary shift

• λ controls the decision-boundary shift (illustrated below)
• As λ → 0, the effect of the unlabeled blocks diminishes
• The observed boundary distance of 0.58 in the human experiment (0.48 − (−0.10)) is matched at λ = 0.06
• People treat unlabeled examples as far less important than labeled examples
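To illustrate, using the hypothetical fit_ssl_gmm and predict_proba from the EM sketch above with placeholder data (this does not reproduce the paper's 0.58 / λ = 0.06 numbers): sweeping λ moves the fitted boundary from near the supervised solution toward the unlabeled mixture.

    import numpy as np

    # labeled points as in block 1; unlabeled points from a right-shifted mixture (placeholder parameters)
    x_lab = np.repeat([-1.0, 1.0], 10)
    y_lab = np.repeat([0, 1], 10)
    rng = np.random.default_rng(1)
    x_unl = np.concatenate([rng.normal(-0.4, 0.6, 300), rng.normal(1.6, 0.6, 300)])

    grid = np.linspace(-2.5, 2.5, 1001)
    for lam in (0.01, 0.06, 1.0):
        alpha, mu, sd = fit_ssl_gmm(x_lab, y_lab, x_unl, lam=lam)
        boundary = grid[np.argmin(np.abs(predict_proba(grid, alpha, mu, sd) - 0.5))]
        print(lam, round(float(boundary), 2))   # the boundary drifts right as lam grows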

• Reaction time = RT1 + RT2
• RT1: base reaction time
  ◦ Decreases with experience
  ◦ For test 1, RT1 = b1; for test 2, RT1 = b2, with b2 < b1
• RT2: based on the difficulty of the example
  ◦ p(y|x) ≈ 0 or 1: x is easy
  ◦ p(y|x) ≈ 0.5: x is difficult
  ◦ RT2 is given by the entropy of the prediction, h(x) (defined below)
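For reference, the prediction entropy under its standard definition (the slides do not spell it out):

    h(x) = -\sum_{y} p(y \mid x, \theta) \, \log p(y \mid x, \theta),

which is near 0 when p(y|x) is close to 0 or 1 (easy items) and maximal when p(y|x) = 0.5 (difficult items).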

Reaction time model: RT(x) = a · h(x) + b, where b is the base time (b1 on test 1, b2 on test 2). The values of a and b are found with least squares from the human experiment data (a fitting sketch follows).
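A minimal least-squares sketch under that reading of the model; the entropy and reaction-time arrays below are made-up placeholders, whereas in the paper the fit is to the measured human reaction times.

    import numpy as np

    # placeholder inputs: prediction entropies h(x_i) and measured reaction times rt_i
    h = np.array([0.05, 0.30, 0.65, 0.95, 0.70, 0.20])
    rt = np.array([0.61, 0.72, 0.90, 1.05, 0.93, 0.68])   # seconds (made up)

    # design matrix [h(x), 1] so the fit gives RT ≈ a*h(x) + b
    A = np.column_stack([h, np.ones_like(h)])
    (a, b), *_ = np.linalg.lstsq(A, rt, rcond=None)
    print(a, b)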

• The human decision curve is noticeably flatter than the model's prediction curve
• This is not due to averaging decisions across subjects: the curve is flatter for each individual subject too
• Possibly due to differences between human and machine memory: the machine uses all past examples, while human memory may degrade

• Co-training, S3VMs, and other semi-supervised techniques should be explored in humans
• Small number of participants
• Need to explore what happens when the assumption of coherent groups is wrong
• Does the order of the unlabeled stimuli affect learning?
• Explore stimuli with multiple feature dimensions
• Conflicting results (VDR study)
  ◦ More complex settings
  ◦ Too many labeled data

• What is the optimal amount of unlabeled data needed to reflect human learning?
• Control group?
• Null hypothesis?
• How can the study of human learning improve machine learning research?