Adaptive2 Language Model
Hyun Goo Kang, Stanford University
CS224N Final Project
Domain Sensitivity
People use language very differently across domains.
Statistical NLP models perform better when trained on data from the right domain.
Domain Adaptive Model
Yet we cannot simply ignore language data from less-related domains.
Solution: Adaptive1 Model. Include all the data, but weight each domain differently: given multiple domains of data, find the weights per domain that maximize performance.
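A minimal sketch of this idea, assuming per-domain unigram language models and a grid search over mixture weights (the toy corpora, the smoothing, and the search strategy are illustrative assumptions, not details from the poster):

```python
import math
from collections import Counter

def train_unigram(tokens, vocab, alpha=1.0):
    # Add-alpha smoothed unigram LM over a fixed vocabulary (hypothetical choice).
    counts = Counter(tokens)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def mixture_perplexity(dev_tokens, domain_lms, weights):
    # Perplexity of a weighted mixture of per-domain LMs on held-out data.
    log_prob = 0.0
    for w in dev_tokens:
        p = sum(lam * lm[w] for lam, lm in zip(weights, domain_lms))
        log_prob += math.log(p)
    return math.exp(-log_prob / len(dev_tokens))

# Toy corpora standing in for two domains (made-up data).
news = "the market rose the market fell stocks rose".split()
sports = "the team won the team lost the game".split()
dev = "the market fell the game".split()

vocab = set(news) | set(sports) | set(dev)
domain_lms = [train_unigram(news, vocab), train_unigram(sports, vocab)]

# Grid-search the domain weights that minimize dev-set perplexity.
best = min(
    ((lam, 1.0 - lam) for lam in [i / 10 for i in range(1, 10)]),
    key=lambda ws: mixture_perplexity(dev, domain_lms, ws),
)
print("best domain weights:", best)
```

The same weighting idea extends to stronger base models; the unigram mixture just keeps the example self-contained.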
Limitation: in the real world, domains are not well-defined.
Solution: Adaptive2 Model. Given any mix of data, we form "domains" with document clustering techniques, which work well across domains. Given user data (or test data), we then adapt our model to the right domains, as in the sketch below.
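One way this could look in code, assuming TF-IDF features, k-means as the clustering technique, and a softmax over cluster similarity as the adaptation step (all of these specifics, along with the example documents and query, are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Unlabeled training documents: a mixed corpus with no domain labels (made-up data).
docs = [
    "stocks rose as the market rallied on earnings",
    "the central bank left interest rates unchanged",
    "the team won the championship game in overtime",
    "the striker scored twice in the second half",
]

# Step 1: form pseudo-domains by clustering document vectors.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Step 2: given user/test data, adapt by weighting the pseudo-domains
# according to how similar the user data is to each cluster centroid.
user_text = ["how did the market react to the earnings report"]
u = vectorizer.transform(user_text)
sims = cosine_similarity(u, kmeans.cluster_centers_)[0]
weights = np.exp(sims) / np.exp(sims).sum()  # softmax over cluster similarity

print("cluster assignments:", kmeans.labels_)
print("adapted domain weights:", weights.round(3))
```

The adapted weights would then play the same role as the hand-defined domain weights in the Adaptive1 setting.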
Results