Slide 1 (EMNLP-CoNLL, 28 June 2007): Probabilistic Models of Nonprojective Dependency Trees
David A. Smith, Center for Language and Speech Processing, Computer Science Dept., Johns Hopkins University
Noah A. Smith, Language Technologies Institute and Machine Learning Dept., School of Computer Science, Carnegie Mellon University

Slide 2: See Also
- On the Complexity of Non-Projective Data-Driven Dependency Parsing. R. McDonald and G. Satta, IWPT 2007.
- Structured Prediction Models via the Matrix-Tree Theorem. T. Koo, A. Globerson, X. Carreras, and M. Collins, EMNLP-CoNLL 2007. Coming up next!

Slide 3: Nonprojective Syntax
[Figure: two dependency trees with crossing (nonprojective) edges. Latin: "ista meam norit gloria canitiem" (that-NOM my-ACC may-know glory-NOM going-gray-ACC), "That glory shall last till I go gray." English: "I 'll give a talk tomorrow on bootstrapping," where "on bootstrapping" attaches to "talk" across "tomorrow."]
How would we parse this?

Slide 4: Edge-Factored Models (McDonald et al., 2005)
- A non-negative score for each possible edge; the score of a tree is the sum of its edge scores, maximized over legal trees.
[Figure: matrix of edge scores, with parents as rows and children as columns.]
- Score edges in isolation.
- Find the maximum spanning tree with Chu-Liu-Edmonds.
- NP-hard to add sibling or degree constraints, or hidden node variables.
- What about training? Unlabeled parsing for now.
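To make edge-factored decoding concrete, here is a minimal sketch (not from the talk; the sentence and scores are invented, and it assumes networkx is available with its Chu-Liu-Edmonds-style maximum_spanning_arborescence routine): every possible head-modifier edge gets a toy score, and the highest-scoring dependency tree is the maximum spanning arborescence rooted at ROOT.

```python
# Minimal sketch of edge-factored decoding (toy scores, hypothetical setup;
# assumes networkx provides maximum_spanning_arborescence).
import random
import networkx as nx

words = ["ROOT", "I", "'ll", "give", "a", "talk", "tomorrow"]
random.seed(0)
# s[h][m]: score for the edge head h -> modifier m (ROOT = index 0).
s = {h: {m: random.random() for m in range(1, len(words)) if m != h}
     for h in range(len(words))}

G = nx.DiGraph()
for h, row in s.items():
    for m, score in row.items():
        G.add_edge(h, m, weight=score)

# Highest-scoring dependency tree = maximum spanning arborescence; since no
# edges point into ROOT, the arborescence is forced to be rooted there.
tree = nx.maximum_spanning_arborescence(G)
for h, m in sorted(tree.edges(), key=lambda e: e[1]):
    print(f"{words[h]:>8} -> {words[m]}")
```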

Slide 5: If Only It Were Projective…
[Figure: dependency tree for "I 'll give a talk tomorrow on bootstrapping."]
An Inside-Outside algorithm would give us:
- the normalizing constant for globally normalized models,
- posterior probabilities of edges,
- sums over hidden variables.
But we can't use Inside-Outside for nonprojective parsing!

Slide 6: Graph Theory to the Rescue!
Tutte's Matrix-Tree Theorem (1948): the determinant of the Kirchhoff (a.k.a. Laplacian) adjacency matrix of a directed graph G, with row and column r removed, equals the sum of the scores of all directed spanning trees of G rooted at node r.
Exactly the Z we need! O(n³) time!

Slide 7: Building the Kirchhoff (Laplacian) Matrix
- Negate the edge scores off the diagonal.
- Put the column (child) sums of the scores on the diagonal.
- Strike out the root row and column.
- Take the determinant.
N.B.: This allows multiple children of the root, but see Koo et al. (2007).
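To make the recipe concrete, here is a small numpy sketch (my own toy example, not the authors' code): it builds the Kirchhoff matrix from a random non-negative score table s[h, m] (index 0 = ROOT) and reads off Z as the determinant of the root-deleted minor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                               # number of words
s = rng.random((n + 1, n + 1))      # s[h, m]: score of edge head h -> child m
np.fill_diagonal(s, 0.0)            # no self-loops
s[:, 0] = 0.0                       # nothing may point at ROOT (index 0)

# Kirchhoff/Laplacian matrix: negated scores off the diagonal,
# column (child) sums of the scores on the diagonal.
K = -s.copy()
np.fill_diagonal(K, s.sum(axis=0))

# Strike the root row and column, then take the determinant: by the
# Matrix-Tree Theorem this is Z, the total score of all spanning trees
# rooted at ROOT.
K_hat = K[1:, 1:]
Z = np.linalg.det(K_hat)
print("Z =", Z)
```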

Slide 8: Why Should This Work?
- Chu-Liu-Edmonds analogy: every node selects its best parent; if that creates cycles, contract them and recur.
- The theorem is clear for a 1×1 matrix; prove the general case by induction.
- The classical statement is for the undirected case; the directed case needs special handling of the root.

Slide 9: When You Have a Hammer… (the Matrix-Tree Theorem)
- Sequence-normalized log-linear models (Lafferty et al. '01)
- Minimum Bayes-risk parsing (cf. Goodman '96)
- Hidden-variable models
- O(n) inference with length constraints (cf. N. Smith & Eisner '05)
- Minimum-risk training (D. Smith & Eisner '06)
- Tree (Rényi) entropy (Hwa '01; S & E '07)

Slide 10: Analogy to Other Models
[Table comparing model families for sequence, projective, and non-projective structures. Recoverable cells, projective row: PCFGs; shift-reduce (action-based); projective global log-linear; max-margin or error-driven (e.g. McDonald, Collins). Non-projective row: ?; parent-predicting (K. Hall '07); this work.]

Slide 11: More Machinery: The Gradient
Since ∂ log Z / ∂ s(i,j) can be read off the inverse of the Kirchhoff matrix (∂ log det K / ∂ K_ab = (K⁻¹)_ba), invert the Kirchhoff matrix K in O(n³) time via LU factorization.
The edge gradient is also the edge posterior probability.
Use the chain rule to backpropagate into s(i,j), whatever its internal structure may be.
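Continuing the numpy sketch from above (again my own illustration, with the same indexing), the edge posteriors fall out of the inverse of the root-deleted Kirchhoff matrix: the gradient identity gives ∂ log Z / ∂ s(h,m) = K_hat⁻¹[m,m] - K_hat⁻¹[m,h] for a non-root head h, and K_hat⁻¹[m,m] when the head is ROOT; multiplying by s(h,m) yields the probability that the edge appears in a tree.

```python
# Edge posteriors from the inverse Kirchhoff minor (reuses s, n, K_hat above).
K_inv = np.linalg.inv(K_hat)            # O(n^3), e.g. via LU factorization

post = np.zeros_like(s)                 # post[h, m] = P(edge h -> m | sentence)
for m in range(1, n + 1):
    # Root edge: d log Z / d s(0, m) = (K_hat^-1)[m, m]
    post[0, m] = s[0, m] * K_inv[m - 1, m - 1]
    for h in range(1, n + 1):
        if h == m:
            continue
        # Non-root edge: d log Z / d s(h, m) = (K_hat^-1)[m, m] - (K_hat^-1)[m, h]
        post[h, m] = s[h, m] * (K_inv[m - 1, m - 1] - K_inv[m - 1, h - 1])

# Sanity check: each word has exactly one parent, so each column sums to 1.
print(post[:, 1:].sum(axis=0))
```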

Slide 12: Nonprojective Conditional Log-Linear Training
[Table: parsing accuracy for MIRA vs. conditional log-linear (CL) training on Arabic, Czech, Danish, and Dutch; the numbers are not recoverable from this transcript.]
- Danish and Dutch from CoNLL 2006; Arabic and Czech from CoNLL 2007.
- Features from McDonald et al., 2005.
- Compared with MSTParser's MIRA max-margin training.
- Log-linear weights trained with stochastic gradient descent.
- Same number of iterations and stopping criteria as MIRA.
- Significance assessed with a paired permutation test.
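For readers who want to see what "trained with stochastic gradient descent" amounts to here, the following is a hypothetical sketch of one update for a conditional log-linear, edge-factored model: the gradient of the log-likelihood is the feature vector of the gold tree minus the posterior-expected feature vector, with the posteriors computed as in the earlier snippet (assumed here to be wrapped in a function edge_posteriors(s)); the feature extractor, gold heads, and learning rate are all invented for illustration.

```python
import numpy as np

def sgd_step(w, feats, gold_heads, edge_posteriors, learning_rate=0.1):
    """One stochastic-gradient update on a single sentence (sketch).

    w           : weight vector
    feats[h][m] : feature vector for the edge head h -> modifier m
    gold_heads  : gold_heads[m] is the gold head of word m (1..n); index 0 unused
    """
    n = len(gold_heads) - 1
    s = np.zeros((n + 1, n + 1))
    for h in range(n + 1):
        for m in range(1, n + 1):
            if h != m:
                s[h, m] = np.exp(w @ feats[h][m])   # log-linear edge scores
    post = edge_posteriors(s)                       # as in the earlier sketch

    grad = np.zeros_like(w)
    for m in range(1, n + 1):
        grad += feats[gold_heads[m]][m]             # observed (gold) features
        for h in range(n + 1):
            if h != m:
                grad -= post[h, m] * feats[h][m]    # expected features
    return w + learning_rate * grad                 # ascend the log-likelihood
```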

Slide 13: Minimum Bayes-Risk Parsing
[Table: MAP vs. minimum Bayes-risk (mBr) parsing accuracy for MIRA and CL training on Arabic, Czech, Danish, and Dutch; the numbers are not recoverable from this transcript.]
- Select the tree not with the highest probability but with the most expected correct edges.
- Plug the edge posteriors into MST decoding.
- MIRA doesn't estimate probabilities.
- N.B. One could do mBr inside MIRA.
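Under the same assumptions as the earlier decoding sketch (networkx available; post computed as above), minimum Bayes-risk parsing is just MST decoding with the posteriors as edge weights; a hypothetical helper:

```python
import networkx as nx

def mbr_parse(post, words):
    """Return the tree with the most expected correct edges (sketch):
    maximum spanning arborescence over the edge posteriors."""
    G = nx.DiGraph()
    for h in range(len(words)):
        for m in range(1, len(words)):
            if h != m:
                G.add_edge(h, m, weight=post[h, m])
    tree = nx.maximum_spanning_arborescence(G)
    return sorted(tree.edges(), key=lambda e: e[1])   # (head, modifier) pairs
```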

Slide 14: Edge Clustering
[Figure: "Franz loves Milena" parsed with conventional dependency labels (SUBJ, OBJ) or, alternatively, with cluster labels (A, B, C / X, Y, Z).]
- (Supervised) labeled dependency parsing.
- Simple idea: conjoin each model feature with a cluster.
- Sum out all possible edge labelings if we don't care about labels per se.

Slide 15: Edge Clustering
No significant gains or losses from clustering.

28 June 2007EMNLP-CoNLL16 What’s Wrong with Edge Clustering? Edge labels don’t interact Unlike clusters on PCFG nonterminals (e.g. Matsuzaki et al.’05) Cf. small/no gains for unlabeled accuracy from supervised labeled parsers NP-A NP-BNP-A FranzlovesMilena AB No interaction Interaction in rewrite rule

Slide 17: Constraints on Link Length
- Impose maximum left/right child distances L and R (cf. Eisner & N. Smith '05); example with L=1, R=2.
- The Kirchhoff matrix is band-diagonal once the root row and column are removed.
- Inversion takes O(min(L³R², L²R³) · n) time.
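A toy numpy illustration (my own, with made-up limits) of why the constraint produces a banded matrix: zeroing out every edge whose head-child distance exceeds the limit leaves the root-deleted Kirchhoff matrix with nonzero entries only in a narrow band.

```python
import numpy as np

L_max, R_max = 1, 2                 # max left-child / right-child distances
n = 6
rng = np.random.default_rng(2)
s = rng.random((n + 1, n + 1))
np.fill_diagonal(s, 0.0)
s[:, 0] = 0.0                       # nothing may point at ROOT
for h in range(1, n + 1):           # root edges left unconstrained here
    for m in range(1, n + 1):
        if (m < h and h - m > L_max) or (m > h and m - h > R_max):
            s[h, m] = 0.0           # dependency too long: score 0

K = -s.copy()
np.fill_diagonal(K, s.sum(axis=0))
K_hat = K[1:, 1:]                   # strike the root row and column

# All nonzero entries sit within a band of half-width max(L_max, R_max).
rows, cols = np.nonzero(K_hat)
print(np.abs(rows - cols).max() <= max(L_max, R_max))   # True
```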

Slide 18: Conclusions
- O(n³) inference for edge-factored nonprojective dependency models.
- Performance closely comparable to MIRA.
- Learned edge clustering doesn't seem to help unlabeled parsing.
- Many other applications still to hit.

Slide 19: Thanks
Jason Eisner, Keith Hall, Sanjeev Khudanpur, the anonymous reviewers, and Ryan McDonald, Michael Collins, and colleagues for sharing drafts.