LING 581: Advanced Computational Linguistics Lecture Notes February 16th.


Bikel Collins

From treebank search to stochastic parsers trained on the WSJ Penn Treebank
Java re-implementation of Collins’ parser
Paper
– Daniel M. Bikel. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4).
– intricacies.pdf
Software
– parser

Bikel Collins

The wrapper is syntactic sugar for various commands
Scripting language is Tcl/Tk (“tickle T K”)
Assume variables
– set prefix "/Users/sandiway/research/"
– set dbprefix "$prefix/dbparser"
– set tbvprefix "/Applications/treebankviewer.app/Contents/MacOS"
POS tagging (MXPOST, in directory jmx)
– $prefix/jmx/mxpost $prefix/jmx/tagger.project /tmp/err.txt
Parsing
– $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt stdout
Training
– $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg stdout

Bikel Collins

POS tagging (MXPOST, in directory jmx)
– tagger_input
– $prefix/jmx/mxpost $prefix/jmx/tagger.project /tmp/err.txt
Parsing
– set ddf "wsj obj.gz"
– set properties "collins.properties"
– parser_input
– $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt stdout
Training
– set mrg "wsj mrg"
– set properties "collins.properties"
– $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg stdout

Unix file descriptors
0  Standard input (stdin)
1  Standard output (stdout)
2  Standard error (stderr)

GUI components
    frame .input
    text .input.t -height 4 -yscrollcommand {.input.s set}
    scrollbar .input.s -command {.input.t yview}
    frame .tagged
    text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
    scrollbar .tagged.s -command {.tagged.t yview}

Code
    proc tagger_input {} {
        set lines [.input.t get 1.0 end]
        set infile [open "/tmp/test.txt" w]
        puts -nonewline $infile [string trimright $lines]
        close $infile
    }

    proc parser_input {} {
        set lines [.tagged.t get 1.0 end]
        set infile [open "/tmp/test2.txt" w]
        puts -nonewline $infile [string trimright $lines]
        close $infile
    }
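The three file descriptors matter because the wrapper has to hand text to each tool and keep the tool's results apart from its error messages. A minimal sketch of that separation follows, written with Python's subprocess module rather than Tcl. The child process is a made-up stand-in (it just upper-cases its input), not MXPOST or the parser.

```python
import subprocess
import sys

# Stand-in child process (an assumption for illustration, not MXPOST):
# it upper-cases whatever arrives on stdin, writes the result to stdout,
# and writes a log message to stderr.
CHILD = (
    "import sys\n"
    "text = sys.stdin.read()\n"
    "sys.stdout.write(text.upper())\n"
    "sys.stderr.write('read %d characters' % len(text))\n"
)

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    input="the dog barked",   # sent to descriptor 0 (stdin)
    capture_output=True,      # collect descriptors 1 and 2 separately
    text=True,
    check=True,
)

print("stdout:", result.stdout)   # descriptor 1: THE DOG BARKED
print("stderr:", result.stderr)   # descriptor 2: read 14 characters
```

Keeping descriptor 2 separate is what lets a wrapper show only the tagged or parsed text in its output pane while diagnostics go to a file such as /tmp/err.txt.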

Bikel Collins

There’s also a simple tree viewer I wrote, but it may not run on your system…

Bikel Collins

Relevant files and directories
bikeldemo
– wrapper2.tcl (prefix set to /Users/sandiway)
jmx
– mxpost (shell script)
– mxpost.jar (Java code)
dbparser
– dbparser/bin/parse (shell script)
– dbparser/bin/train (shell script)
– dbparser/dbparser.jar (Java code)
– dbparser/userguide/guide.pdf

EVALB

How to evaluate parsing accuracy?
– count bracketing matches
– (LR) Bracketing recall = (number of correct constituents) / (number of constituents in the gold file)
– (LP) Bracketing precision = (number of correct constituents) / (number of constituents in the parsed file)
Program called evalb
– written in C
– get it to compile on your system (Makefile)
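As a rough illustration of the two measures, the sketch below counts labeled bracket matches between one gold tree and one parsed tree. It is not evalb itself: the helper names (read_tree, brackets, bracket_scores) are invented for this example, POS-tag nodes are simply left out of the count, and none of evalb's parameter-file options are modeled.

```python
from collections import Counter


def read_tree(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()


def brackets(tree):
    """Return a Counter of (label, start, end) spans, leaving out POS-tag nodes."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        end = start
        for child in children:
            if isinstance(child, str):
                end += 1                  # each word advances the position by one
            else:
                end = walk(child, end)
        if not all(isinstance(c, str) for c in children):
            spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_scores(gold_text, parsed_text):
    """Return (recall, precision) for one gold/parsed pair."""
    gold = brackets(read_tree(gold_text))
    parsed = brackets(read_tree(parsed_text))
    correct = sum((gold & parsed).values())   # brackets found in both trees
    return correct / sum(gold.values()), correct / sum(parsed.values())


gold = ("(S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (DT a) (NN cat)) "
        "(PP (IN in) (NP (DT the) (NN park)))))")
parsed = ("(S (NP (DT the) (NN dog)) (VP (VBD saw) (NP (NP (DT a) (NN cat)) "
          "(PP (IN in) (NP (DT the) (NN park))))))")

recall, precision = bracket_scores(gold, parsed)
print(f"recall={recall:.3f} precision={precision:.3f}")
# prints: recall=1.000 precision=0.857
```

Here the parsed tree attaches the PP inside the object NP, which adds one bracket the gold tree lacks: all 6 gold brackets are found (recall 6/6) but only 6 of the 7 proposed brackets are correct (precision 6/7).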

Homework Task 2

Part 1
– Run the examples you showed on your slides from Homework Task 1 using the Bikel Collins parser.
– Evaluate how close the parses are to the “gold standard”
Part 2
– WSJ corpus: sections 00 through 24
– Evaluation: on section 23
– Training: normally (20 sections)
– How does the Bikel Collins parser vary in accuracy if you randomly pick 1, 2, 3, …, 20 sections to do the training with? Plot a graph of the evalb scores.
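One possible way to set up the random picks for Part 2 is sketched below. It assumes the normal training sections are the 20 sections numbered 02 through 21 (the slide gives only the count), and the function name training_subsets is invented for this example. The fixed seed is there only so the same picks can be reproduced later.

```python
import random


def training_subsets(sections, seed=0):
    """For k = 1..len(sections), randomly pick k distinct sections."""
    rng = random.Random(seed)             # fixed seed so runs can be repeated
    return {
        k: sorted(rng.sample(sections, k))
        for k in range(1, len(sections) + 1)
    }


# Assumption: training sections are 02-21; section 23 is held out for evaluation.
SECTIONS = [f"{n:02d}" for n in range(2, 22)]

subsets = training_subsets(SECTIONS)
for k, chosen in subsets.items():
    print(k, " ".join(chosen))
```

For each k, the chosen sections would be combined into one training file, the parser trained on it and run on section 23, and the evalb recall and precision plotted against k.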