LING 581: Advanced Computational Linguistics

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Machine Learning PoS-Taggers COMP3310 Natural Language Processing Eric.
Advertisements

LING 581: Advanced Computational Linguistics Lecture Notes February 2nd.
Prepared by Abdullah Mueen and Eamonn Keogh
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
LING 581: Advanced Computational Linguistics Lecture Notes January 19th.
LING 581: Advanced Computational Linguistics Lecture Notes May 5th.
LING 581: Advanced Computational Linguistics Lecture Notes February 16th.
PCFG Parsing, Evaluation, & Improvements Ling 571 Deep Processing Techniques for NLP January 24, 2011.
LING 581: Advanced Computational Linguistics Lecture Notes January 26th.
Lecture 37 CSE 331 Nov 4, Homework stuff (Last!) HW 10 at the end of the lecture Solutions to HW 9 on Monday Graded HW 9 on.
Lecture 38 CSE 331 Dec 3, A new grading proposal Towards your final score in the course MAX ( mid-term as 25%+ finals as 40%, finals as 65%) .
1 SIMS 290-2: Applied Natural Language Processing Marti Hearst Sept 22, 2004.
LING 581: Advanced Computational Linguistics Lecture Notes January 19th.
SI485i : NLP Set 9 Advanced PCFGs Some slides from Chris Manning.
C++ Crash Course Class 1 What is programming?. What’s this course about? Goal: Be able to design, write and run simple programs in C++ on a UNIX machine.
Files COP3275 – PROGRAMMING USING C DIEGO J. RIVERA-GUTIERREZ.
1 計算機程式設計 Introduction to Computer Programming Lecture01: Introduction and Hello World 9/10/2012 Slides modified from Yin Lou, Cornell CS2022: Introduction.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
LING 581: Advanced Computational Linguistics Lecture Notes February 16th.
LING 581: Advanced Computational Linguistics Lecture Notes February 19th.
©2012 Paula Matuszek CSC 9010: Text Mining Applications Lab 3 Dr. Paula Matuszek (610)
Introduction to Testing CSIS 1595: Fundamentals of Programming and Problem Solving 1.
TODAY IN ALGEBRA…  Return Ch. 1 Test  Learning Goal: Review solutions to Ch. 1 test and how to do test corrections for credit up to 75%  See STATs for.
A Relevant and Descriptive Title Your Name and Your Partner’s Name Mrs. Ouellette, Honors Biology Licking Heights High School A Relevant and Descriptive.
Online Course Overview MATH 1720 (Once you have read each slide simply hit your return key to move to the next slide.) (Once you have read each slide simply.
1 Project 2: Using Variables and Expressions. 222 Project 2 Overview For this project you will work with three programs Circle Paint Ideal_Weight What.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
LING/C SC/PSYC 438/538 Lecture 19 Sandiway Fong 1.
Software Engineering Algorithms, Compilers, & Lifecycle.
LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 17 th.
Multiple Sequence Alignment with PASTA Michael Nute Austin, TX June 17, 2016.
LING 581: Advanced Computational Linguistics Lecture Notes March 2nd.
Measuring Performance Based on slides by Henri Casanova.
Lecture 3 – MapReduce: Implementation CSE 490h – Introduction to Distributed Computing, Spring 2009 Except as otherwise noted, the content of this presentation.
2008/11/19: Lecture 18 CMSC 104, Section 0101 John Y. Park
Understanding Communication with a Robot? Activity (60 minutes)
Modified from Stanford CS276 slides Lecture 4: Index Construction
A Relevant and Descriptive Title
CIS 700 Advanced Machine Learning Structured Machine Learning:   Theory and Applications in Natural Language Processing Shyam Upadhyay Department of.
LING/C SC/PSYC 438/538 Lecture 20 Sandiway Fong.
LING/C SC 581: Advanced Computational Linguistics
LING 388: Computers and Language
Using the two data sets below: 1) Draw a scatterplot for each.
LING 581: Advanced Computational Linguistics
Sussex Neuroscience Coding Club title slide
LING 581: Advanced Computational Linguistics
LING 388: Computers and Language
LING/C SC 581: Advanced Computational Linguistics
LING 581: Advanced Computational Linguistics
LING 581: Advanced Computational Linguistics
LING 581: Advanced Computational Linguistics
CS110: Discussion about Spark
LING/C SC 581: Advanced Computational Linguistics
System Combination LING 572 Fei Xia 01/31/06.
LING/C SC 581: Advanced Computational Linguistics
Elec 2607 Digital Switching Circuits
Arrays I Handling lists of data.
LING 388: Computers and Language
LING 408/508: Computational Techniques for Linguists
Hank Childs, University of Oregon
CMSC201 Computer Science I for Majors Final Exam Information
LING/C SC 581: Advanced Computational Linguistics
2008/11/19: Lecture 18 CMSC 104, Section 0101 John Y. Park
1.3.7 High- and low-level languages and their translators
LING/C SC 581: Advanced Computational Linguistics
LING/C SC 581: Advanced Computational Linguistics
LING/C SC 581: Advanced Computational Linguistics
Lecture 17 – Practice Exercises 3
Running & Testing Programs :: Translators
LING/C SC 581: Advanced Computational Linguistics
Presentation transcript:

LING 581: Advanced Computational Linguistics Lecture Notes March 13th

Administrivia We need to reboot the homework (my fault: explained below) You get full credit for the Homework (treat as done) Let's redo the homework as new Homework 7

Preparing the Training Data

Goal: get the graph What does the graph for the F-score (Labeled Recall/Labeled Precision) look like? F-score report your results # of sentences/sections used for training

A problem with the Homework Symptom: Lots of nulls instead of parses returned by Bikel-Collins Cause: Bikel-Collins doesn't do so well when the POS tags are not supplied also fails to tag --: ( --) Solution: Let's supply the gold tags

Initial fix: nevalb.c Modify evalb.c to not skip the null parses -e n       number of errors to kill (default=10)   MAX_ERROR 10 in COLLINS.PRM (100 errors ok) Modify evalb.c to not skip the null parses nevalb.c counts them as if the parser had simply returned (W1 .. Wn) this kills the recall for those cases (but- well - precision is 100%) Unfortunately, the number of nulls is impossible to ignore – despite increasing the number of sections dedicated to training …

Creating section 23 input Format now is ((Word1 (Tag1)) .. (Wordn (Tagn)))

Creating section 23 input Can use .tagged_sents() from the ptb corpus that we already have in nltk extract_23.py python3 extract_23.py > wsj_23.lsp

Splitting section 23 input Manageable chunks of 650 sentences each? NO! dbp/bin/parse 1500 dbp/settings/collins.properties ptb/wsj-12.obj.gz ptb/wsj_23-aa java -server -Xms1500m -Xmx1500m -cp /Users/sandiway/courses/581/ling581-18/dbp/dbparser.jar -Dparser.settingsFile=dbp/settings/collins.properties danbikel.parser.Parser -is ptb/wsj-12.obj.gz -sa ptb/wsj_23-aa

Parsing Platform-dependent: about 20 mins; 109 sentences in (5W CPU) Problem is memory used don't want to activate garbage collection don't want to activate memory compression

Re-combine section 23 output cat wsj_23-a[a-w].parsed > wsj_23-12.parsed Then do evalb:

Repeat parsing how many times? The following schedule is 20 repeats But you may use more sparse repeats, depending on your access to fast laptop(s) or lab machines Also see next slide Sections #s

Only sentences <= 40 words long $ python3 del_gt40.py wsj_23.gold Stats: 175 out of 2416, remain 2241 Create: 40wsj_23.gold $ python3 del_gt40.py wsj_23.lsp  Create: 40wsj_23.lsp $ split -l 100 40wsj_23.lsp 40wsj_23- Assume: wsj_23.txt (supplied on course webpage) del_gt40.py

Only sentences <= 40 words long 40wsj_23-aa

Only sentences <= 40 words long run-ad.sh cat 40wsj_23-a[a-w].parsed > 40wsj_23-12.parsed EVALB/nevalb -p EVALB/COLLINS.prm -e 100 ptb/40wsj_23-200.gold ptb/40wsj_23-200.parsed Just testing 1st 200 on nevalb