Learning to Predict Structures with Applications to Natural Language Processing
Ivan Titov


Before we start...
- Fill in the survey:
  - Name (please make it legible!)
  - Matriculation number
  - Department: CS or CoLi
  - BSc or MSc, and your semester
  - (These two points will affect the set of topics and papers)
  - Your major and research interests
  - Previous classes attended (or attending now):
    - Machine learning?
    - Statistical NLP?
    - Information extraction?

Outline
- Introduction to the Topic
- Seminar Plan
- Requirements and Grading

Learning Machines to Do What?
- This seminar is about supervised learning methods:
  1. Take a set of labeled examples (image, "dog"), (image, "dog"), (image, "cat"), ...
  2. Define a parameterized class of functions:
     - Represent inputs as vectors of features (e.g., SIFT features for images)
     - Consider linear scoring functions f(x, y) = w_y · φ(x), with a distinct weight vector w_y for each class y: cats, dogs, ...
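
To make the one-weight-vector-per-class setup concrete, here is a minimal Python sketch; the classes, feature extractor, and weights are invented for illustration and are not from the slides.

import numpy as np

# Minimal sketch of multiclass linear scoring with one weight vector per class.
# CLASSES, phi, and the weights are illustrative placeholders.

CLASSES = ["cat", "dog"]
FEATURE_DIM = 4  # pretend phi(x) produces 4 features

def phi(x):
    """Hypothetical feature extractor: map an input to a fixed-length feature vector."""
    return np.asarray(x, dtype=float)

def predict(x, weights):
    """Return the class y whose linear score w_y . phi(x) is highest."""
    feats = phi(x)
    scores = {y: weights[y] @ feats for y in CLASSES}
    return max(scores, key=scores.get)

# Toy usage: random weights stand in for learned parameters.
rng = np.random.default_rng(0)
weights = {y: rng.normal(size=FEATURE_DIM) for y in CLASSES}
print(predict([0.2, 1.0, 0.0, 0.5], weights))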

Supervised Classification
- Linear functions: f(x, y) = w_y · φ(x)
- We want to select "good" weights w such that the classifier does not make mistakes on new examples, i.e. for every x it predicts the correct y
- To do this we minimize some error measure on the finite training set
  - Having just a small error on the training set is not sufficient
  - For example, we may also want the classifier to be "confident" on the training set
- But what if y is not a class label but a graph?
  - You cannot have an individual weight vector w_y for every possible y

Example Problems
- Syntactic parsing
- 3D layout prediction for an image
- Protein structure prediction (disulfide bonds)
We cannot learn a distinct model for each y; we need to understand:
- how to break y into parts
- how to predict these parts
- how these parts interact with each other

Skeleton of a general SP method
- Decide how to represent your structures for learning
  - x (input) is the sentence, y (output) is the dependency tree
  - Note that we have not defined φ or w yet

Skeleton of a general SP method (continued)
- Then you define a joint feature vector φ(x, y), for example counts of head-dependent arc features of the tree
- The inner product w · φ(x, y) gives the score of the tree y for the sentence x
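
The following sketch illustrates this idea under stated assumptions: the arc-count feature template, the example sentence, and the weights are made up, not taken from the slides. It builds φ(x, y) for a small dependency tree and computes the score as an inner product with sparse weights; it also previews the point of the next slide, namely that an implausible tree receives a lower score under reasonable weights.

from collections import Counter

# Illustrative sketch of an arc-factored joint feature vector phi(x, y)
# and its score w . phi(x, y). Features and weights are invented.

def phi(sentence, heads):
    """Count head-word -> dependent-word arc features of a dependency tree.
    sentence: list of words; heads[i] is the index of word i's head (-1 = root)."""
    feats = Counter()
    for i, h in enumerate(heads):
        head_word = "ROOT" if h == -1 else sentence[h]
        feats[("arc", head_word, sentence[i])] += 1
    return feats

def score(feats, weights):
    """Inner product w . phi(x, y) over the sparse feature representation."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

# Toy usage: a plausible tree scores higher than an implausible one.
sentence = ["John", "saw", "Mary"]
good_tree = [1, -1, 1]   # "saw" is the root; "John" and "Mary" attach to "saw"
bad_tree = [2, 0, -1]    # an implausible attachment pattern
weights = {("arc", "ROOT", "saw"): 1.5, ("arc", "saw", "John"): 1.0, ("arc", "saw", "Mary"): 0.8}
print(score(phi(sentence, good_tree), weights))  # 3.3
print(score(phi(sentence, bad_tree), weights))   # 0.0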

Skeleton of a general SP method (continued)
- What if we have some bad parse tree y' for the same sentence?
- Its inner product w · φ(x, y') gives its score; a good w should assign it a lower score than the correct tree

Structured Prediction
- You use your model: predict ŷ = argmax over y in Y(x) of w · φ(x, y), where Y(x) is the set of dependency trees whose nodes are the words of the current sentence x
- We want to select "good" weights w such that the model does not make mistakes on new examples, i.e. for every x it predicts the correct y
- To do this we again minimize some error measure on the finite training set
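
One simple, mistake-driven way to do such training is the structured perceptron, which the seminar covers properly later; below is a minimal sketch in which phi(x, y), decode(x, w), and the data are placeholders to be supplied by the concrete task.

from collections import Counter

# Sketch of the structured perceptron update. phi(x, y) should return a sparse
# feature Counter and decode(x, w) the highest-scoring structure under w.

def perceptron_train(data, phi, decode, epochs=5):
    """data: list of (x, gold_y) pairs. Returns a weight Counter w."""
    w = Counter()
    for _ in range(epochs):
        for x, gold_y in data:
            pred_y = decode(x, w)           # the current model's best guess
            if pred_y != gold_y:            # mistake-driven update
                for f, v in phi(x, gold_y).items():
                    w[f] += v               # promote features of the correct structure
                for f, v in phi(x, pred_y).items():
                    w[f] -= v               # demote features of the wrongly predicted one
    return w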

Structured Prediction (challenges)
1. Selecting a feature representation
   - It should be sufficient to discriminate correct trees from incorrect ones
   - It should be possible to decode with it (see (3))
2. Learning
   - Which error function to optimize on the training set (for example, a margin-based or log-likelihood objective)
   - How to make it efficient (see (3))
3. Decoding
   - Dynamic programming for simpler representations? (see the Viterbi sketch below for the sequence-labelling case)
   - Approximate search for more powerful ones?
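
For the simplest structures, label sequences as in the HMM and CRF topics later in the seminar, exact dynamic-programming decoding is the classic Viterbi algorithm. Here is a minimal sketch; the local score function over (previous tag, tag, position) triples and the toy lexicon are invented for illustration.

# Minimal Viterbi sketch: exact dynamic-programming decoding for sequence labelling.
# score(prev_tag, tag, i, x) stands in for the model's local (transition + emission) score.

def viterbi(x, tags, score):
    """Return the highest-scoring tag sequence for the input sequence x."""
    n = len(x)
    best = [{t: (score(None, t, 0, x), [t]) for t in tags}]  # position 0
    for i in range(1, n):
        column = {}
        for t in tags:
            prev_t = max(best[i - 1], key=lambda p: best[i - 1][p][0] + score(p, t, i, x))
            s, path = best[i - 1][prev_t]
            column[t] = (s + score(prev_t, t, i, x), path + [t])
        best.append(column)
    return max(best[-1].values())[1]

# Toy usage with a hand-made scoring function favouring DT -> NN transitions.
def toy_score(prev, tag, i, x):
    lexicon = {("the", "DT"): 2.0, ("dog", "NN"): 2.0, ("barks", "VBZ"): 2.0}
    return lexicon.get((x[i], tag), 0.0) + (0.5 if (prev, tag) == ("DT", "NN") else 0.0)

print(viterbi(["the", "dog", "barks"], ["DT", "NN", "VBZ"], toy_score))

The run time is O(n · |tags|^2), which is what makes exact decoding affordable for such "simpler representations".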

Decoding: example
- Decoding: find the dependency tree y which has the highest score w · φ(x, y)
- Does it remind you of something?

Decoding: example (continued)
- Select the highest-scoring directed tree
- For this sentence only a subset of arcs is relevant, and it can be represented as a weighted directed graph
- This is the directed maximum spanning tree (MST) problem, solved by the Chu-Liu-Edmonds algorithm
- (In the slide's graph, an arc label (a, b) gives the weight a for the arc directed right and b for the arc directed left)
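
To make the arc-factored setup concrete, here is a deliberately simplified decoding sketch: it scores every candidate head for every word and greedily picks the best head per word. This is not the full Chu-Liu-Edmonds algorithm (which additionally contracts cycles to guarantee a well-formed tree), and the arc scorer and toy score table are placeholders.

# Simplified arc-factored decoding sketch (NOT full Chu-Liu-Edmonds: the
# cycle-contraction step that guarantees a tree is omitted for brevity).
# arc_score(head, dep, sentence) stands in for w . phi restricted to one arc.

def greedy_heads(sentence, arc_score):
    """For each word, pick the highest-scoring head (index -1 denotes the artificial root)."""
    heads = []
    for dep in range(len(sentence)):
        candidates = [-1] + [h for h in range(len(sentence)) if h != dep]
        heads.append(max(candidates, key=lambda h: arc_score(h, dep, sentence)))
    return heads

# Toy usage with a hand-made arc scorer.
def toy_arc_score(head, dep, sentence):
    table = {("ROOT", "saw"): 2.0, ("saw", "John"): 1.5, ("saw", "Mary"): 1.5}
    head_word = "ROOT" if head == -1 else sentence[head]
    return table.get((head_word, sentence[dep]), 0.0)

print(greedy_heads(["John", "saw", "Mary"], toy_arc_score))  # [1, -1, 1]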

Decoding: example (continued)
- What if we switch to a slightly more powerful representation: counts of all subgraphs of size 3 instead of 2?
- This decoding problem is NP-complete!
- However, you can use approximations:
  - Relaxations that find exact solutions in most cases
  - Considering only some (not all) subgraphs of size 3
- This trade-off is typical for structured prediction:
  - Use a powerful model but approximate decoding, or
  - A simpler model but exact search

Outline
- Introduction to the Topic
- Seminar Plan
- Requirements and Grading

Goals of the seminar
- Give an overview of state-of-the-art methods for structured prediction
  - If you encounter a structured prediction problem, you should be able to figure out what to use and where to look
  - This knowledge is applicable outside natural language processing
- Learn about interesting applications of these methods in NLP
- Improve your skills:
  - Giving talks
  - Presenting papers
  - Writing reviews

Plan
- Next class (November 5):
  - Introduction continued: basic structured prediction methods
    - Perceptron => Structured Perceptron
    - Naive Bayes => Hidden Markov Model
    - Comparison: HMM vs Structured Perceptron for Markov networks
- Decide on the paper to present (before Wednesday, November 2!)
  - On the basis of the survey and the number of registered students, I will adjust my list; it will be online today
- Starting from November 13: paper presentations by you

Topics (method-wise classification)
- Hidden Markov Models vs Structured Perceptron
  - Example: the same class of functions but different learning methods (discriminative vs generative)
- Probabilistic Context-Free Grammars (PCFGs) vs Weighted CFGs
  - Similar to the above, but for parsing (predicting trees)
- Maximum-Entropy Markov Models vs Conditional Random Fields
  - Talk about label bias, compare with generative models, ...
- Local Methods vs Global Methods
  - Example: Minimum Spanning Tree algorithm vs shift-reduce parsing for dependency parsing

Topics (method-wise classification, continued)
- Max-margin methods
  - Max-Margin Markov Networks vs Structured SVM
- Search-based models
  - Incremental perceptron vs SEARN
- Inference with Integer Linear Programming
  - Encoding non-local constraints on the structure of the outputs
- Inducing feature representations:
  - Latent-annotated PCFGs: initial attempts vs split-merge methods
  - Incremental Sigmoid Belief Networks: ISBNs vs Max-Ent models

Topics (application-wise classification)
- Sequence labelling type tasks:
  - Part-of-speech tagging (e.g., John lost his pants → NNP VBD POS NNS)
- Parsing tasks:
  - Phrase-structure trees
  - Dependency structures
- Semantic Role Labeling
- Information extraction?

Requirements
- Present a talk to the class
  - In most presentations you will need to cover 2 related papers and compare the approaches
  - This way we will have (hopefully) more interesting talks
- Write critical "reviews" of 2 selected papers (... pages each)
  - Note: changed from 3!!
- A term paper (12-15 pages) for those getting 7 points
  - Make sure you are registered for the right "version" in HISPOS!
- Read papers and participate in discussion
  - If you do not read the papers it will not work, and we will have a boring seminar
  - Do not hesitate to ask questions!

Grades
- Class participation grade: 60%
  - Your talk and the discussion after your talk
  - Your participation in the discussion of other talks
  - 2 reviews
- Term paper grade: 40%
  - Only if you get 7 points; otherwise you do not need one
  - Term paper

Presentation
- Present the methods in an accessible way
  - Do not (!) present something you do not understand
  - Do not dive into unimportant details
- Compare the proposed methods
- Have a critical view of the paper: discuss shortcomings, any of your own ideas, any points you still do not understand (e.g., evaluation), any assumptions which seem wrong to you...
- To give a good presentation, in some cases you may need to read one (or at most two) additional papers (e.g., those referenced in the paper)
- You can check the web for slides on similar topics and use their ideas, but you should not reuse the slides
- See the links to the tutorials on how to make a good presentation
- Send me your slides 1 week before the talk
  - I will give my feedback within 2 days of receiving them
  - Often, we may need to meet and discuss the slides together

Term paper
- Goal:
  - Describe the papers you presented in class
  - Your ideas, analysis, comparison (more later)
- It should be written in the style of a research paper
- Length: 12-15 pages
- Grading criteria:
  - Clarity
  - Paper organization
  - Technical correctness
  - New ideas are meaningful and interesting
- Submitted as a PDF to my email

Critical review
- A short critical (!) essay reviewing papers presented in class
  - One paragraph presenting the essence of the paper (in your own words!)
  - Other parts highlighting both the positive sides of the paper (what you like) and its shortcomings
- The review should be submitted before the paper's presentation in class
  - (The exception is the additional reviews submitted for the seminars you skipped; more about this later)
- No copy-paste from the paper
- Length: ... pages

Your ideas / analysis
- Comparison of the methods used in the paper with other material presented in the class or any other related work
- Any ideas on improving the approach
- ...

Attendance policy
- You can skip ONE class without any explanation
- Otherwise, you will need to write an additional critical review (for the paper which was presented while you were absent)

Office Hours
- I would be happy to see you and discuss things after the talk, from 16:00-17:00 on Fridays (may change if the seminar timing changes):
  - Office 3.22, C 7.4
- Otherwise, send me an email and I will find a time
  - This is even preferable
- Please do meet me:
  - If you do not understand something (anything?) (especially important if it is your presentation, but otherwise welcome too)
  - If you have suggestions or questions

Other stuff
- Timing of the class
  - Survey (Doodle poll?)
- Select a topic to present and papers to review by Wednesday, November 2 (we will use Google Docs)
  - Note: earlier talks are easier...
- We need a volunteer for November 12