Unsupervised Domain Adaptation: From Practice to Theory. John Blitzer.

Presentation transcript:

Unsupervised Domain Adaptation: From Practice to Theory. John Blitzer.

Unsupervised Domain Adaptation
Source (Books, labeled): "Running with Scissors". Title: Horrible book, horrible. This book was horrible. I read half, suffering from a headache the entire time, and eventually i lit it on fire. 1 less copy in the world. Don't waste your money. I wish i had the time spent reading this book back. It wasted my life.
Target (Kitchen Appliances, labels unknown: ? ? ?): "Avante Deep Fryer; Black". Title: lid does not work well... I love the way the Tefal deep fryer cooks, however, I am returning my second one due to a defective lid closure. The lid may close initially, but after a few uses it no longer stays closed. I won't be buying this one again.

Target-Specific Features
The same Books (source) and Kitchen Appliances (target) reviews as above; phrases like "defective" and "lid closure" are specific to the target domain and never appear in the labeled source data.

Learning Shared Representations
Source-specific (Books): fascinating, boring, read half, couldn't put it down
Target-specific (Kitchen): defective, sturdy, leaking, like a charm
Shared: fantastic, highly recommended, waste of money, horrible

Shared Representations: A Quick Review
Blitzer et al. (2006, 2007). Shared CCA. Tasks: part-of-speech tagging, sentiment.
Xue et al. (2008). Probabilistic LSA. Task: cross-lingual document classification.
Guo et al. (2009). Latent Dirichlet Allocation. Task: named entity recognition.
Huang et al. (2009). Hidden Markov Models. Task: part-of-speech tagging.
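
To make the shared-representation idea concrete, here is a minimal Python sketch in the spirit of structural correspondence learning (Blitzer et al. 2006): on unlabeled data pooled from both domains, learn to predict each domain-general "pivot" feature from the remaining features, then take an SVD of the stacked weight vectors to get a shared projection. The pivot choice, dimensions, and hyperparameters below are illustrative assumptions, not the published recipe.

```python
# Minimal sketch of pivot-based shared representation learning, in the
# spirit of structural correspondence learning. The pivot list and all
# hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier

def learn_shared_projection(X_unlabeled, pivot_idx, k=50):
    """X_unlabeled: dense (n, d) bag-of-words counts pooled from both domains.
    pivot_idx: indices of frequent, domain-general features (e.g. "horrible").
    Returns a (d, k) projection onto a shared subspace (k <= len(pivot_idx))."""
    d = X_unlabeled.shape[1]
    W = np.zeros((d, len(pivot_idx)))
    for j, p in enumerate(pivot_idx):
        y = (X_unlabeled[:, p] > 0).astype(int)  # does this pivot occur?
        X_masked = X_unlabeled.copy()
        X_masked[:, p] = 0                       # hide the pivot from its own predictor
        clf = SGDClassifier(loss="log_loss", alpha=1e-4, random_state=0)
        W[:, j] = clf.fit(X_masked, y).coef_.ravel()
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]                              # shared features: X @ U[:, :k]
```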

What do you mean, theory? Statistical Learning Theory:

What do you mean, theory? Classical Learning Theory:

What do you mean, theory? Adaptation Learning Theory:
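
The formulas on these three slides did not survive transcription. For reference, the standard shapes of the two kinds of statement, in assumed notation rather than the slides' own, are:

```latex
% Classical learning theory: train and test on the same distribution D.
% With probability at least 1 - \delta over an i.i.d. sample of size m,
\epsilon_D(h) \;\le\; \hat{\epsilon}(h) \;+\; \sqrt{\frac{C(H) + \log(1/\delta)}{m}}
% (C(H) is a complexity measure of the hypothesis class, e.g. VC dimension.)

% Adaptation learning theory: train on D_S, test on D_T. The bound pays
% an extra divergence term plus the best achievable joint error \lambda:
\epsilon_T(h) \;\le\; \hat{\epsilon}_S(h)
  \;+\; \sqrt{\frac{C(H) + \log(1/\delta)}{m}}
  \;+\; \mathrm{div}(D_S, D_T) \;+\; \lambda
```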

Goals for Domain Adaptation Theory
1. A computable (source) sample bound on target error.
2. A formal description of empirical phenomena: why do shared representation algorithms work?
3. Suggestions for future research.

Talk Outline
1. Target Generalization Bounds using Discrepancy Distance [BBCKPW 2009] [Mansour et al. 2009]
2. Coupled Subspace Learning [BFK 2010]

Formalizing Domain Adaptation
Source: labeled data, drawn from the source distribution.
Target: unlabeled data, drawn from the target distribution.
Semi-supervised adaptation (some target labels available): not in this talk.
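
In symbols, the setting the slide describes (standard notation, assumed):

```latex
% Unsupervised adaptation: a labeled source sample and an unlabeled
% target sample,
S = \{(x_i, y_i)\}_{i=1}^{m}, \;\; x_i \sim D_S,
\qquad
T = \{x_j\}_{j=1}^{n}, \;\; x_j \sim D_T.
% Goal: choose h \in H with small target error
\epsilon_T(h) \;=\; \mathbb{E}_{(x, y) \sim D_T}\bigl[L(h(x), y)\bigr].
```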

A Generalization Bound

A New Adaptation Bound. Bound from [MMR09].
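
The bound itself was lost in transcription; the discrepancy-distance bound of Mansour et al. (2009) has the following shape (a reconstruction in standard notation, not the slide verbatim):

```latex
% For any h in H, with h_T^* = argmin_{h' in H} eps_T(h') the
% best-in-class target hypothesis and disc_L the discrepancy distance:
\epsilon_T(h) \;\le\; \epsilon_T(h_T^*)
  \;+\; \mathbb{E}_{x \sim D_S}\bigl[L(h(x),\, h_T^*(x))\bigr]
  \;+\; \mathrm{disc}_L(D_S, D_T).
```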

Discrepancy Distance: when good source models go bad

Binary Hypothesis Error Regions [Figure: error regions of two binary hypotheses as overlapping areas of the input space, built up over several animation steps]

Discrepancy Distance: when good source models go bad [Figure: source and target regions where the discrepancy is low vs. high]
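
The discrepancy distance itself is defined as the largest change, between source and target, in the disagreement of any two hypotheses in the class:

```latex
\mathrm{disc}_L(D_S, D_T) \;=\; \max_{h,\, h' \in H}
\Bigl|\; \mathbb{E}_{x \sim D_S}\bigl[L(h(x), h'(x))\bigr]
  \;-\; \mathbb{E}_{x \sim D_T}\bigl[L(h(x), h'(x))\bigr] \;\Bigr|.
```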

Computing Discrepancy Distance: learn pairs of hypotheses to discriminate source from target.
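
In practice a quantity like this can be estimated by how well a classifier separates source examples from target examples. A minimal Python sketch of that idea follows (a common proxy in the spirit of the slide, not necessarily the talk's exact procedure):

```python
# Hedged sketch: estimate a domain-divergence proxy by training a linear
# classifier to discriminate source from target examples. Its held-out
# error drives the estimate; hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def domain_divergence_proxy(X_source, X_target):
    X = np.vstack([X_source, X_target])
    dom = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    X_tr, X_te, d_tr, d_te = train_test_split(X, dom, test_size=0.5,
                                              random_state=0, stratify=dom)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, d_tr)
    err = 1.0 - clf.score(X_te, d_te)   # domain-classification error
    return 2.0 * (1.0 - 2.0 * err)      # 0 when domains are indistinguishable
```

When the two domains are indistinguishable, the classifier does no better than chance (err near 0.5) and the proxy goes to zero; perfectly separable domains give the maximum value of 2.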

Hypothesis Classes & Representations. Linear hypothesis class; induced classes from projections.

A Proxy for the Best Model. Linear hypothesis class; induced classes from projections.

Problems with the Proxy

Goals
1. A computable bound
2. Description of shared representations
3. Suggestions for future research

Talk Outline
1. Target Generalization Bounds using Discrepancy Distance [BBCKPW 2009] [Mansour et al. 2009]
2. Coupled Subspace Learning [BFK 2010]

Assumption: Single Linear Predictor. The predictor's weights decompose into source-specific, shared, and target-specific parts; the target-specific part can't be estimated from source data alone... yet.
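
In assumed notation, the decomposition the slide is pointing at:

```latex
% One linear predictor whose weight vector splits over shared,
% source-specific, and target-specific features:
w \;=\; w_{\mathrm{shared}} + w_{\mathrm{src}} + w_{\mathrm{tgt}},
\qquad y \;\approx\; \langle w, x \rangle.
% Features carrying w_tgt never occur in labeled source data, so
% source labels alone cannot identify that component.
```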

Visualizing Single Linear Predictor [Figure: the phrases "fascinating" (source-specific), "works well" (target-specific), and "don't buy" (shared) plotted in feature space, built up over several animation steps]

Dimensionality Reduction Assumption
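
The assumption's formula was lost; in the coupled-subspace setting it says, roughly, that good predictors lie near a shared low-dimensional subspace (a reconstruction, not the slide verbatim):

```latex
% Assumed form: the optimal predictor is (approximately) expressible
% through a projection \Pi onto k << d dimensions shared by the domains:
w^{*} \;\approx\; \Pi^{\top} v \quad \text{for some } v \in \mathbb{R}^{k},
\qquad \Pi \in \mathbb{R}^{k \times d}.
```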

Visualizing Dimensionality Reduction [Figure: the same three phrases projected onto a low-dimensional subspace]

Representation Soundness [Figure: the same three phrases under the learned projection, built up over several animation steps]

Perfect Adaptation [Figure: with a sound shared projection, the source-trained predictor scores target-specific phrases like "works well" correctly]

Algorithm
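
The algorithm's details did not transcribe. Below is a minimal Python sketch of the coupled-subspace recipe as the outline describes it: learn correlated projections from unlabeled data in both domains via CCA (introduced on the next slide), then fit a linear predictor on projected source data. The function names, the two-view split, k, and the ridge regularizer are all illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of coupled subspace learning: CCA projections learned
# from unlabeled data (both domains), then a linear model on projected
# labeled source data. Details are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import Ridge

def fit_coupled(X_src, y_src, X_unlabeled, view1, view2, k=50):
    """view1, view2: disjoint column-index arrays splitting the features.
    k must not exceed the smaller view's dimension."""
    cca = CCA(n_components=k, max_iter=1000)
    cca.fit(X_unlabeled[:, view1], X_unlabeled[:, view2])
    Z = np.hstack(cca.transform(X_src[:, view1], X_src[:, view2]))
    return cca, Ridge(alpha=1.0).fit(Z, y_src)

def predict_coupled(cca, model, X, view1, view2):
    Z = np.hstack(cca.transform(X[:, view1], X[:, view2]))
    return model.predict(Z)
```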

Generalization

Canonical Correlation Analysis (CCA) [Hotelling 1935]
1) Divide the feature space into two disjoint views. Example: "Do not buy the Shark portable steamer. The trigger mechanism is defective." with the words not, buy, trigger, defective, mechanism split across the two views.
2) Find maximally correlated projections of the two views.
See also Ando and Zhang (ACL 2005); Kakade and Foster (COLT 2006).
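
A tiny runnable illustration of step 2 on synthetic two-view data (the view sizes and generative setup are assumptions for illustration):

```python
# Two views driven by shared latent factors; CCA recovers projections
# of each view that correlate strongly, as in step 2 of the slide.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d1, d2 = 200, 6, 5                      # documents, view sizes (assumed)
latent = rng.normal(size=(n, 2))           # shared factors, e.g. sentiment
X1 = latent @ rng.normal(size=(2, d1)) + 0.1 * rng.normal(size=(n, d1))
X2 = latent @ rng.normal(size=(2, d2)) + 0.1 * rng.normal(size=(n, d2))

Z1, Z2 = CCA(n_components=2).fit(X1, X2).transform(X1, X2)
print(np.corrcoef(Z1[:, 0], Z2[:, 0])[0, 1])   # close to 1.0
```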

Square Loss: Kitchen Appliances [Plot: squared loss (y-axis) by source domain (x-axis), target: Kitchen Appliances]

Using Target-Specific Features
books → kitchen: mush, bad quality, warranty, evenly, super easy, great product, dishwasher
kitchen → books: trite, the publisher, the author, introduction to, illustrations, good reference, critique

Comparing Discrepancy & Coupled Bounds [Plot: squared loss (y-axis) vs. number of source instances (x-axis), target: DVDs; curves compare the true target error with the coupled bound and the discrepancy bound, added over successive animation steps]

Idea: Active Learning Piyush Rai et al. (2010)

Goals
1. A computable bound
2. Description of shared representations
3. Suggestions for future research

Conclusion
1. Theory can help us understand domain adaptation better.
2. Good theory suggests new directions for future research.
3. There's still a lot left to do:
   Connecting supervised and unsupervised adaptation
   Unsupervised adaptation for problems with structure

Thanks
Collaborators: Shai Ben-David, Koby Crammer, Dean Foster, Sham Kakade, Alex Kulesza, Fernando Pereira, Jenn Wortman.
References:
Ben-David et al. A Theory of Learning from Different Domains. Machine Learning, 2010.
Mansour et al. Domain Adaptation: Learning Bounds and Algorithms. COLT 2009.