Frustratingly Easy Domain Adaptation

Frustratingly Easy Domain Adaptation
Hal Daumé III

Introduction
Task: develop learning algorithms that can easily be ported from one domain to another, e.g., from newswire to biomedical documents. This setting is particularly interesting in NLP.
Idea: transform the domain adaptation learning problem into a standard supervised learning problem to which any standard algorithm (e.g., maxent, SVM) may be applied.
The transformation is simple: augment the feature space of both the source and target data and use the result as input to a standard learning algorithm.

Problem Formalization
Notation: X is the input space (typically either a real vector or a binary vector) and Y is the output space. Ds denotes the distribution over source examples and Dt the distribution over target examples.
We have access to a sample Ds ∼ Ds of source examples from the source domain and a sample Dt ∼ Dt of target examples from the target domain. Assume that Ds is a collection of N examples and Dt is a collection of M examples (where, typically, N ≫ M).
Goal: learn a function h : X → Y with low expected loss with respect to the target domain.
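Using the notation above, the goal can be stated as minimizing the expected loss under the target distribution; a standard formulation (the symbol ℓ for the loss function is my own notation, not from the slide) is:

```latex
h^{*} = \operatorname*{arg\,min}_{h : X \to Y} \;
        \mathbb{E}_{(x, y) \sim \mathcal{D}^{t}} \left[ \ell\big(h(x), y\big) \right]
```

The difficulty is that we observe only M target examples (M ≪ N), so a learner trained on the plentiful source data alone may fit the wrong distribution.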

Adaptation by Feature Augmentation
Take each feature in the original problem and make three versions of it: a general version, a source-specific version, and a target-specific version.
Augmented source data = general and source-specific: Φs(x) = ⟨x, x, 0⟩
Augmented target data = general and target-specific: Φt(x) = ⟨x, 0, x⟩
Here 0 = ⟨0, 0, …, 0⟩ is the zero vector of the same dimension as x, so the augmented feature space has three times the original dimensionality.
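A minimal sketch of the augmentation in Python with NumPy (function names and the toy data are my own, for illustration); any standard learner, e.g. a linear SVM, could then be trained on the stacked augmented matrix:

```python
import numpy as np

def augment_source(X):
    """Map each source row x to <general, source-specific, zeros>."""
    Z = np.zeros_like(X)
    return np.hstack([X, X, Z])

def augment_target(X):
    """Map each target row x to <general, zeros, target-specific>."""
    Z = np.zeros_like(X)
    return np.hstack([X, Z, X])

# Toy data: N source examples, M target examples (N >> M).
Xs = np.random.rand(100, 5)   # source inputs
Xt = np.random.rand(10, 5)    # target inputs

# Stack the augmented datasets; a standard supervised learner
# (maxent, SVM, ...) can now be applied to X_aug as-is.
X_aug = np.vstack([augment_source(Xs), augment_target(Xt)])
print(X_aug.shape)  # (110, 15): the feature space is tripled
```

Because the general copy is shared, the learner can put weight on it for features that behave the same in both domains, while the source- and target-specific copies absorb domain-specific behavior.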

Results
Tasks (see paper)

Experimental Results
See paper