Hal Daumé III Microsoft Research University of Maryland


Hal Daumé III. Microsoft Research & University of Maryland. me@hal3.name, @haldaume3, he/him/his. (image credit: Lyndon Wong)

We’ve all probably seen figures like this… (this one in particular is thanks to Kyunghyun Cho)
Left-to-right, monotonic structure
Efficient learning: a series of text classification problems
Separation between learning and inference
Search is still an issue in principle, but works well in practice; better search algorithms can improve the quality of generation.
THIS STUFF WORKS
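To make that left-to-right structure concrete, here is a minimal sketch (mine, not from the talk) of greedy autoregressive decoding; `next_token_probs` is a hypothetical stand-in for any learned next-token model, and beam search would simply replace the greedy argmax.

```python
# Minimal sketch of left-to-right (monotonic) autoregressive generation.
# `next_token_probs` is a hypothetical callable returning a dict mapping each
# vocabulary item to p(y_t | y_<t, x); greedy decoding takes the argmax, so
# each step is effectively one text-classification decision.

def greedy_decode(next_token_probs, source, max_len=50, eos="</s>"):
    output = []
    for _ in range(max_len):
        probs = next_token_probs(source, output)  # distribution over the vocabulary
        token = max(probs, key=probs.get)         # greedy choice; beam search slots in here
        if token == eos:
            break
        output.append(token)
    return output
```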

New Tasks / New Models: given that these neural autoregressive models work, what is left to do?

New Tasks. Sudha Rao, Trista Cao. Upcoming presentation at the Widening NLP Workshop at ACL’19.

New Tasks. Rao, Cao. [Louis & Nenkova, IJCNLP’11; Gao, Zhong, Preotiuc-Pietro & Li, AAAI’19]

New Tasks. Rao, Cao.

New Tasks / New Models: given that these neural autoregressive models work, what is left to do?

New Models. Sean Welleck, Kianté Brantley; also featuring Kyunghyun Cho (not pictured); to appear at ICML 2019 next week. [Figure: a binary tree over the words “you”, “i”, “wish”, “could”, “study”, “lol”, “a”, “work”, “lot”, “.”, with <stop> leaves, illustrating generation in a non-left-to-right order.]

Linearizing the hierarchical prediction. Welleck, Brantley + Kyunghyun Cho, ICML’19. [Figure: the binary word tree flattened into a single sequence of predictions, with finished subtrees predicting <stop>.]
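As a rough illustration (my own sketch, not the paper’s code), linearizing a binary word tree into a single prediction sequence can be done with a breadth-first traversal, emitting <stop> wherever a subtree ends:

```python
from collections import deque

# Hypothetical tree node: a word plus optional left/right children.
class Node:
    def __init__(self, word, left=None, right=None):
        self.word, self.left, self.right = word, left, right

def linearize(root, stop="<stop>"):
    """Flatten a binary word tree into one prediction sequence (level order)."""
    seq, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            seq.append(stop)          # an ended subtree becomes a <stop> prediction
            continue
        seq.append(node.word)
        queue.append(node.left)
        queue.append(node.right)
    return seq

# e.g. a tiny tree rooted at "could" with children "wish" and "lol":
print(linearize(Node("could", Node("wish"), Node("lol"))))
# ['could', 'wish', 'lol', '<stop>', '<stop>', '<stop>', '<stop>']
```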

Imitation learning w/ equivocating expert. Welleck, Brantley + Kyunghyun Cho, ICML’19. Target: “i wish you could study lol .” [Figure: a binary tree being built over the target; at the highlighted node the expert equivocates, since several remaining words (“???”) are all valid.]

Imitation learning w/ equivocating expert (continued). Welleck, Brantley + Kyunghyun Cho, ICML’19. Target: “i wish you could study lol .” [Figure: one valid expert choice at that node: “study”.]

Imitation learning w/ equivocating expert (continued). Welleck, Brantley + Kyunghyun Cho, ICML’19. Target: “i wish you could study lol .” [Figure: an equally valid alternative choice at the same node: “could”.]

Quicksort-esque expert policy. Welleck, Brantley + Kyunghyun Cho, ICML’19. Example: “The cat sat on the mat .” [Figure: the expert picks a pivot word, here “on”, from the remaining multiset {the, on, mat, ., sat, cat, The}, then recurses on the words to its left {The, cat, sat} and to its right {the, mat, .}; empty sets become <stop> leaves.]
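A rough sketch of the idea (my reading, not the paper’s implementation): pick any remaining word as the pivot, then recurse on the words to its left and to its right in the original sentence, just like choosing a pivot in quicksort.

```python
import random

def expert_tree(words, stop="<stop>"):
    """Quicksort-style expert: pick a pivot word, recurse on the words to its
    left and right in the sentence; empty spans become <stop> leaves."""
    if not words:
        return stop
    i = random.randrange(len(words))       # any remaining word is a valid pivot
    return (words[i],
            expert_tree(words[:i], stop),  # left context
            expert_tree(words[i+1:], stop))  # right context

sentence = "The cat sat on the mat .".split()
print(expert_tree(sentence))
# one possible result:
# ('on', ('cat', ('The', '<stop>', '<stop>'), ('sat', '<stop>', '<stop>')),
#        ('the', '<stop>', ('mat', '<stop>', ('.', '<stop>', '<stop>'))))
```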

Model structure on top of quicksort. Welleck, Brantley + Kyunghyun Cho, ICML’19. [Figure: at each node the model predicts a distribution over words; the loss compares that distribution against the valid items for the node (e.g., {The, sat, cat} for the left child of “on”, where the expert chose “sat”).]

Formalizing the expert policy. Welleck, Brantley + Kyunghyun Cho, ICML’19. [Slide: the formal definition of the expert policy, shown alongside the tree figure with the valid items (e.g., {The, sat, cat}) and the per-node loss.]
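The equation on this slide is not recoverable from the transcript; as a hedged reconstruction in my own notation (following the uniform-oracle idea), the expert at state $s_t$ with valid-word multiset $V_t$ could be written as:

```latex
% Hedged reconstruction, not copied from the slide: the uniform expert spreads
% its mass evenly over the words still valid at this node, and emits <stop>
% once no valid words remain.
\pi^{*}_{\text{uniform}}(a \mid s_t) =
  \begin{cases}
    \frac{1}{|V_t|} & \text{if } a \in V_t,\\
    1               & \text{if } a = \langle\text{stop}\rangle \text{ and } V_t = \emptyset,\\
    0               & \text{otherwise.}
  \end{cases}
```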

Distributing mass across equivocations. Welleck, Brantley + Kyunghyun Cho, ICML’19. Uniform Oracle; Coaching Oracle [He et al., 2012]; Annealed Coaching Oracle. [Figure: the three oracles place different amounts of probability on the valid items at a node.]
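Again only as a hedged sketch in my own notation: the coaching oracle reweights the uniform oracle by the learned policy $\pi_\theta$ (so mass concentrates on valid words the learner already prefers), and the annealed variant interpolates between the two with a coefficient $\beta$ that is annealed over training:

```latex
% Hedged sketch of the other two oracles (notation mine):
\pi^{*}_{\text{coaching}}(a \mid s_t) \propto
    \pi^{*}_{\text{uniform}}(a \mid s_t)\; \pi_{\theta}(a \mid s_t),
\qquad
\pi^{*}_{\text{annealed}}(a \mid s_t) =
    \beta\, \pi^{*}_{\text{uniform}}(a \mid s_t)
    + (1-\beta)\, \pi^{*}_{\text{coaching}}(a \mid s_t).
```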

Training via imitation learning. Welleck, Brantley + Kyunghyun Cho, ICML’19.
This is a special case of imitation learning with an optimal oracle:
Extensively studied and used in NLP [Goldberg & Nivre, 2012; Vlachos & Clark, 2014; and many more]
Extensively studied and used in robotics and control [Ross et al., 2011; and much more recent work from Abbeel, Levine, et al.]
Learning-to-search* for non-monotonic sequential generation:
Roll-in by an oracle/learned policy
Roll-out by an oracle policy
Easy to swap roll-in and roll-out policies
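A compressed sketch of what one learning-to-search training step might look like under these choices (hypothetical interfaces, not the released code): roll in through the tree, and supervise the learner with the oracle’s distribution at every visited state.

```python
import random

def train_step(policy, oracle, state, rollin_mix=0.5):
    """One hedged learning-to-search step over hypothetical interfaces:
    `state` exposes done()/step(action); `policy` and `oracle` expose
    dist(state), sample(state), and (for the policy) loss(state, target)."""
    total_loss = 0.0
    while not state.done():
        target = oracle.dist(state)                # oracle mass over valid words / <stop>
        total_loss += policy.loss(state, target)   # e.g. KL(target || policy's prediction)
        # roll-in: sometimes follow the learner, sometimes the oracle
        actor = policy if random.random() < rollin_mix else oracle
        state = state.step(actor.sample(state))
    return total_loss
```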

Results on unconditional generation. Welleck, Brantley + Kyunghyun Cho, ICML’19. Implicit probabilistic model: sampling 👍, normalized probability 👎. Difficult to analyze quantitatively, but we tried; all the models were trained on utterances from a dialogue dataset [ConvAI PersonaChat].

Results on unconditional generation. Welleck, Brantley + Kyunghyun Cho, ICML’19. Implicit probabilistic model: sampling 👍, normalized probability 👎. We can also do a bit more analysis:

Results on unconditional generation (continued). Welleck, Brantley + Kyunghyun Cho, ICML’19. We can also do a bit more analysis:

Word descrambling. Welleck, Brantley + Kyunghyun Cho, ICML’19.

Machine translation. Welleck, Brantley + Kyunghyun Cho, ICML’19. Non-monotonic generation lags behind left-to-right, monotonic generation in MT, though how much it lags depends on how you measure quality.

Machine translation (continued). Welleck, Brantley + Kyunghyun Cho, ICML’19.

Summary and discussion. Welleck, Brantley, Rao, Cao.
Lots of fun stuff to do in moving to new tasks and new models
Promising results in non-monotonic generation, but we still haven’t “cracked” it
Should we improve modeling/representations? Should we improve training algorithms?
Some contemporaneous work: [Gu et al., arXiv’19; Stern et al., arXiv’19]
Code at https://github.com/wellecks/nonmonotonic_text
Thanks! Questions?