Frustratingly Easy Domain Adaptation Hal Daume III
Introduction
Task: develop learning algorithms that can be easily ported from one domain to another, e.g., from newswire to biomedical documents. This problem is particularly interesting in NLP.
Idea: transform the domain adaptation learning problem into a standard supervised learning problem to which any standard algorithm may be applied (e.g., maxent, SVM). The transformation is simple: augment the feature space of both the source and target data, and use the result as input to a standard learning algorithm.
Problem Formalization
Notation: let X be the input space (typically either a real vector or a binary vector) and Y the output space. Write Ds for the distribution over source examples and Dt for the distribution over target examples. We have access to a sample Ds ∼ Ds of source examples drawn from the source domain and a sample Dt ∼ Dt of target examples drawn from the target domain. Assume Ds is a collection of N examples and Dt is a collection of M examples (where, typically, N ≫ M).
Goal: learn a function h : X → Y with low expected loss with respect to the target domain.
Adaptation by Feature Augmentation
Take each feature in the original problem and make three versions of it: a general version, a source-specific version, and a target-specific version. Augmented source data = general + source-specific versions; augmented target data = general + target-specific versions. The unused block is filled with zeros, so each augmented example lives in a space three times the original dimension.
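A minimal sketch of this augmentation map in Python with NumPy (the function name `augment` and the array layout are my own; the paper presents the same idea as mapping source examples to ⟨x, x, 0⟩ and target examples to ⟨x, 0, x⟩):

```python
import numpy as np

def augment(X, domain):
    """Map each feature vector x into the augmented space.

    Source examples become <x, x, 0>; target examples become <x, 0, x>.
    The three blocks are the general, source-specific, and
    target-specific copies of the original features.
    """
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])   # general + source-specific
    elif domain == "target":
        return np.hstack([X, zeros, X])   # general + target-specific
    raise ValueError("domain must be 'source' or 'target'")

# Toy example: two source examples and one target example, 3 features each.
Xs = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 1.0]])
Xt = np.array([[2.0, 1.0, 0.0]])

Xs_aug = augment(Xs, "source")   # shape (2, 9)
Xt_aug = augment(Xt, "target")   # shape (1, 9)
```

The augmented source and target matrices can then be stacked and handed to any standard supervised learner, since both now live in the same 3·|X|-dimensional space.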
Experimental Results
Tasks and results: see paper.