An Attempt at Unsupervised Learning of Hierarchical Dependency Parsing via the Dependency Model with Valence (DMV)


Motivation
Dependency Parsing:
- Search Query Refinement
- Statistical Machine Translation
Unsupervised Learning:
- Availability of Large Quantities of Data

DMV
- Pick a direction (left or right).
- Generate the first child, or stop.
- Generate more children, until stop.
- Repeat in the other direction.
- Recurse on each child…
Model parameters: Porder, Pstop, Pattach
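The generative story above can be sketched as a toy sampler. All tag names and probability tables below are invented for illustration; the real DMV conditions Pstop on the head's tag, the direction, and adjacency (whether a child was already generated in that direction), and Pattach on the head's tag and direction.

```python
import random

random.seed(0)

# Hypothetical toy parameters over two tags, 'V' and 'N'.
P_STOP = {  # P(stop | head, direction, adjacent)
    ('V', 'left', True): 0.4, ('V', 'left', False): 0.8,
    ('V', 'right', True): 0.3, ('V', 'right', False): 0.8,
    ('N', 'left', True): 0.6, ('N', 'left', False): 0.9,
    ('N', 'right', True): 0.8, ('N', 'right', False): 0.95,
}
P_ATTACH = {  # P(child tag | head tag, direction)
    ('V', 'left'): {'N': 1.0}, ('V', 'right'): {'N': 0.7, 'V': 0.3},
    ('N', 'left'): {'N': 1.0}, ('N', 'right'): {'N': 1.0},
}

def choose(dist):
    """Sample a key from a {key: probability} table."""
    r, acc = random.random(), 0.0
    for tag, p in dist.items():
        acc += p
        if r < acc:
            return tag
    return tag  # guard against float rounding

def generate(head, depth=0, max_depth=3):
    """Recursively generate dependents of `head` in each direction."""
    tree = {'head': head, 'left': [], 'right': []}
    for direction in ('left', 'right'):
        adjacent = True  # no child generated yet in this direction
        while depth < max_depth:
            if random.random() < P_STOP[(head, direction, adjacent)]:
                break  # stop generating in this direction
            child = choose(P_ATTACH[(head, direction)])
            tree[direction].append(generate(child, depth + 1, max_depth))
            adjacent = False
    return tree
```

Each call to `generate('V')` samples one dependency tree rooted in a verb; `max_depth` simply bounds the recursion in this sketch.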

EM
Inside-Outside Algorithm:
- Inside: Pi(i,X,j) = P(X derives words i…j)
- Outside: Po(i,X,j) = P(S derives words 0…i, then X, then words j…l)
- Re-Estimation: expected frequency of sub-tree (i,X,j) is proportional to Pi(i,X,j) * Po(i,X,j), normalized by the total sentence probability Pi(0,S,l).
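The inside pass is a probabilistic CKY over the chart cells (i,X,j). A minimal sketch, on a toy hand-written PCFG rather than the split-head grammar a DMV parser would actually use (the grammar and probabilities below are invented for illustration); the outside pass fills an analogous chart top-down and is omitted here:

```python
from collections import defaultdict

# Hypothetical toy grammar: binary rules over nonterminals, lexical
# rules over words, both with probabilities.
BINARY = {('S', ('NP', 'VP')): 0.9, ('S', ('NP', 'V')): 0.1,
          ('NP', ('Det', 'N')): 1.0, ('VP', ('V', 'NP')): 1.0}
LEX = {('Det', 'the'): 1.0, ('N', 'dog'): 0.5, ('N', 'cat'): 0.5,
       ('V', 'saw'): 1.0}

def inside(words):
    """Fill chart[(i, X, j)] = P(X derives words[i:j]) bottom-up."""
    n = len(words)
    chart = defaultdict(float)
    for i, w in enumerate(words):           # width-1 spans: lexical rules
        for (tag, word), p in LEX.items():
            if word == w:
                chart[(i, tag, i + 1)] += p
    for span in range(2, n + 1):            # wider spans: binary rules
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # split point
                for (lhs, (b, c)), p in BINARY.items():
                    chart[(i, lhs, j)] += (
                        p * chart[(i, b, k)] * chart[(k, c, j)])
    return chart

chart = inside(['the', 'dog', 'saw', 'the', 'cat'])
```

Here `chart[(0, 'S', 5)]` is the total sentence probability Pi(0,S,l) used as the normalizer in re-estimation.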

Evaluation
Gold standard: head-percolation of Penn Treebank parses.
Metric: % edges correct, directed (undirected in parentheses), in the best (P)CFG parse…
- Zero Knowledge: 14.4 (29.9)
- Adjacent Word Heuristic: 33.6
- Klein & Manning: 43.2 (63.7)
- Oracle: 75.5 (77.5)
  - Pattach: 60.0 (63.3)
  - Pstop: 53.9 (57.7)
  - PstopA: 50.0 (54.8)
  - PstopN: 12.5 (30.8)
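The directed/undirected pair of numbers can be computed as below (a sketch with a hypothetical head-array encoding: heads[i] is the 1-based index of word i+1's parent, with 0 marking the root):

```python
def attachment_scores(gold, pred):
    """Return (directed, undirected) attachment accuracy.

    An edge is directed-correct when the predicted head matches the
    gold head; undirected-correct when the same pair of words is
    linked in either direction.
    """
    assert len(gold) == len(pred)
    n = len(gold)
    directed = sum(g == p for g, p in zip(gold, pred))
    undirected = 0
    for i in range(n):
        w, p = i + 1, pred[i]
        # Correct as-is, or correct with the arc direction flipped.
        if gold[i] == p or (p != 0 and gold[p - 1] == w):
            undirected += 1
    return directed / n, undirected / n
```

For example, for "the dog barks" with gold heads [2, 3, 0], the prediction [2, 1, 0] reverses the dog→the arc, so it scores 2/3 directed but 3/3 undirected.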

EM Didn’t Work Out…
EM always made things worse, even when initialized with very good solutions:
- Initialized with Zero Knowledge: 18.4 (38.4) after 1 iteration, then worsens.
- Initialized with an ad-hoc harmonic Pattach: 21.5 (47.1) after 1 iteration, then worse; and similarly even for the Oracle solution…
Summary:
- DMV: a useful, simple, extensible model.
- EM: more thorough debugging needed.
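One standard debugging check for an EM implementation: the data log-likelihood must never decrease across iterations, so a drop signals a bug, while monotone likelihood with falling accuracy points instead at a mismatch between the model's objective and the evaluation. A minimal sketch of the check on a toy two-component Bernoulli mixture (not the DMV; data and starting parameters are invented):

```python
import math

DATA = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]  # toy binary observations

def log_likelihood(data, pi, p0, p1):
    ll = 0.0
    for x in data:
        l0 = pi * (p0 if x else 1 - p0)
        l1 = (1 - pi) * (p1 if x else 1 - p1)
        ll += math.log(l0 + l1)
    return ll

def em_step(data, pi, p0, p1):
    n0 = n1 = s0 = s1 = 0.0
    for x in data:
        l0 = pi * (p0 if x else 1 - p0)
        l1 = (1 - pi) * (p1 if x else 1 - p1)
        r0 = l0 / (l0 + l1)          # E-step: posterior of component 0
        n0 += r0
        n1 += 1 - r0
        s0 += r0 * x
        s1 += (1 - r0) * x
    return n0 / len(data), s0 / n0, s1 / n1  # M-step: re-estimate

pi, p0, p1 = 0.5, 0.3, 0.8
lls = []
for _ in range(20):
    lls.append(log_likelihood(DATA, pi, p0, p1))
    pi, p0, p1 = em_step(DATA, pi, p0, p1)
```

If the analogous trace for the DMV trainer were not monotone, the inside-outside re-estimation itself would be the first suspect.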