LING 581: Advanced Computational Linguistics Lecture Notes February 12th.

Slides:

Advertisements

Similar presentations

School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.

Advertisements

Expectation Maximization Dekang Lin Department of Computing Science University of Alberta.

LING 581: Advanced Computational Linguistics Lecture Notes February 2nd.

CPSC 422, Lecture 16Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 16 Feb, 11, 2015.

Albert Gatt Corpora and Statistical Methods Lecture 11.

Sequence Classification: Chunking Shallow Processing Techniques for NLP Ling570 November 28, 2011.

Modeling the Evolution of Product Entities Priya Radhakrishnan 1, Manish Gupta 1,2, Vasudeva Varma 1 1 Search and Information Extraction Lab, IIIT-Hyderabad,

Part-of-speech tagging. Parts of Speech Perhaps starting with Aristotle in the West (384–322 BCE) the idea of having parts of speech lexical categories,

LING 581: Advanced Computational Linguistics Lecture Notes February 9th.

Semantic Role Labeling Abdul-Lateef Yussiff

Probabilistic Parsing Chapter 14, Part 2 This slide set was adapted from J. Martin, R. Mihalcea, Rebecca Hwa, and Ray Mooney.

Recognizing Implicit Discourse Relations in the Penn Discourse Treebank Ziheng Lin, Min-Yen Kan, and Hwee Tou Ng Department of Computer Science National.

LING 581: Advanced Computational Linguistics Lecture Notes January 19th.

Shallow Parsing CS 4705 Julia Hirschberg 1. Shallow or Partial Parsing Sometimes we don’t need a complete parse tree –Information extraction –Question.

LING 581: Advanced Computational Linguistics Lecture Notes February 2nd.

LING 581: Advanced Computational Linguistics Lecture Notes February 23rd.

Using Treebanks tgrep2 Lecture 2: 07/12/2011. Using Corpora For discovery For evaluation of theories For identifying tendencies – distribution of a class.

Introduction to treebanks Session 1: 7/08/

LING 581: Advanced Computational Linguistics Lecture Notes February 16th.

PCFG Parsing, Evaluation, & Improvements Ling 571 Deep Processing Techniques for NLP January 24, 2011.

LING 581: Advanced Computational Linguistics Lecture Notes January 26th.

Partial Prebracketing to Improve Parser Performance John Judge NCLT Seminar Series 7 th December 2005.

1 More Xkwic and Tgrep LING 5200 Computational Corpus Linguistics Martha Palmer March 2, 2006.

Probabilistic Parsing Ling 571 Fei Xia Week 5: 10/25-10/27/05.

CS224N Interactive Session Competitive Grammar Writing Chris Manning Sida, Rush, Ankur, Frank, Kai Sheng.

TopicTrend By: Jovian Lin Discover Emerging and Novel Research Topics.

SI485i : NLP Set 9 Advanced PCFGs Some slides from Chris Manning.

LING/C SC/PSYC 438/538 Lecture 27 Sandiway Fong. Administrivia 2 nd Reminder – 538 Presentations – Send me your choices if you haven’t already.

Tree Kernels for Parsing: (Collins & Duffy, 2001) Advanced Statistical Methods in NLP Ling 572 February 28, 2012.

HW7 Extracting Arguments for % Ang Sun March 25, 2012.

Czech-English Word Alignment Ondřej Bojar Magdalena Prokopová

1 Semi-Supervised Approaches for Learning to Parse Natural Languages Rebecca Hwa

Lecture 10 NLTK POS Tagging Part 3 Topics Taggers Rule Based Taggers Probabilistic Taggers Transformation Based Taggers - Brill Supervised learning Readings:

AQUAINT Workshop – June 2003 Improved Semantic Role Parsing Kadri Hacioglu, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Wayne Ward,

LING 581: Advanced Computational Linguistics Lecture Notes February 19th.

Transformation-Based Learning Advanced Statistical Methods in NLP Ling 572 March 1, 2012.

A Systematic Exploration of the Feature Space for Relation Extraction Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois,

Conversion of Penn Treebank Data to Text. Penn TreeBank Project “A Bank of Linguistic Trees” (as of 11/1992) University of Pennsylvania, LINC Laboratory.

Introduction to Syntactic Parsing Roxana Girju November 18, 2004 Some slides were provided by Michael Collins (MIT) and Dan Moldovan (UT Dallas)

NLP. Introduction to NLP Background –From the early ‘90s –Developed at the University of Pennsylvania –(Marcus, Santorini, and Marcinkiewicz 1993) Size.

Automatic Grammar Induction and Parsing Free Text - Eric Brill Thur. POSTECH Dept. of Computer Science 심 준 혁.

Part-of-speech tagging

Shallow Parsing for South Asian Languages -Himanshu Agrawal.

LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 5 th.

LING/C SC/PSYC 438/538 Lecture 18 Sandiway Fong. Adminstrivia Homework 7 out today – due Saturday by midnight.

Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.

NLP. Parsing ( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (,,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (,,) ) (VP (MD will) (VP (VB join) (NP (DT.

NLP. Introduction to NLP #include int main() { int n, reverse = 0; printf("Enter a number to reverse\n"); scanf("%d",&n); while (n != 0) { reverse =

Part-of-Speech Tagging CSCI-GA.2590 – Lecture 4 Ralph Grishman NYU.

LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 3 rd.

Prototype-Driven Grammar Induction Aria Haghighi and Dan Klein Computer Science Division University of California Berkeley.

LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 17 th.

LING 581: Advanced Computational Linguistics Lecture Notes February 24th.

LING 581: Advanced Computational Linguistics Lecture Notes March 2nd.

Coping with Problems in Grammars Automatically Extracted from Treebanks Carlos A. Prolo Computer and Info. Science Dept. University of Pennsylvania.

LING/C SC/PSYC 438/538 Lecture 20 Sandiway Fong.

Prototype-Driven Learning for Sequence Models

LING 581: Advanced Computational Linguistics

LING/C SC 581: Advanced Computational Linguistics

LING 581: Advanced Computational Linguistics

LING 581: Advanced Computational Linguistics

LING/C SC 581: Advanced Computational Linguistics

LING 581: Advanced Computational Linguistics

LING/C SC 581: Advanced Computational Linguistics

Constraining Chart Parsing with Partial Tree Bracketing

LING/C SC 581: Advanced Computational Linguistics

LING/C SC 581: Advanced Computational Linguistics

LING/C SC 581: Advanced Computational Linguistics

Progress report on Semantic Role Labeling

LING/C SC/PSYC 438/538 Lecture 3 Sandiway Fong.

Presentation transcript:

LING 581: Advanced Computational Linguistics Lecture Notes February 12th

Today's Topics 1.Passives in the WSJ PTB 2.MXPOST 3.BIKEL COLLINS 4.EVALB

How many passives? Passives: < /^NP.*-SBJ-([0-9]+)/#1%i << ( VP < (VBN $.. < ( -NONE- < /\*-([0-9]+)/#1%i))))

How many passives? Example: -TTL (title)

How many passives? Example: – Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a nonexecutive director of this British industrial conglomerate. < /^NP.*-SBJ-([0-9]+)/#1%i << ( VP < VBN << < ( -NONE- < /\*-([0-9]+)/#1%i)))

How many passives? Example (false positive): – New England Electric, based in Westborough, Mass., had offered $ 2 billion to acquire PS of New Hampshire, well below the $ 2.29 billion value United Illuminating places on its bid and the $ 2.25 billion Northeast says its bid is worth.

How many passives? Need some form of the verb be: – < /^NP.*-SBJ-([0-9]+)/#1%i << ( VP < (/^VB/ < /^(be|am|are|is|was|were|been|'m|'s|'re)$/) < (VP < VBN << < ( -NONE- < /\*-([0-9]+)/#1%i))))

Today's Topics 1.Passives in the WSJ PTB 2.MXPOST 3.BIKEL COLLINS 4.EVALB

MXPOST A statistical POS tagger: A maximum entropy model for Part-of-Speech Tagging (Rathnaparkhi, A.) Find, download and unpack jmx.tar CLASSPATH environment variable: – export CLASSPATH=/Users/sandiway/Downloads/jmx/mxpost.jar Test file (~/Desktop/test.txt): –Colorless green ideas sleep furiously.

MXPOST Run (assuming I'm in my jmx directory):./mxpost tagger.project/ < ~/Desktop/test.txt Read items from tagger.project//word.voc Read 45 items from tagger.project//tag.voc Read items from tagger.project//tagfeatures.contexts Read contexts, numFeatures from tagger.project//tagfeatures.fmap Read model tagger.project//model : numPredictions=45, numParams= Read tagdict from tagger.project//tagdict *This is MXPOST (Version 1.0)* *Copyright (c) 1997 Adwait Ratnaparkhi* Colorless_JJ green_JJ ideas_NNS sleep_VBP furiously_RB._. Sentence: 0 Length: 6 Elapsed Time: seconds. trained on WSJ sections 00-18

MXPOST Task: to determine t i for w i Training: use features based on history: – { w i, w i+1, w i-1, w i+2, w i-2, t i-1, t i-2 } 1."rare" = (count < 5) 2.feature must occur 10 times or more to be used

MXPOST Example:

MXPOST Example:

MXPOST Another model (I downloaded) trained on wsj-02-21:./mxpost ~/research/jmx/wsj mxpost/ < ~/Desktop/test.txt Read 1061 items from /Users/sandiway/research/jmx/wsj mxpost//word.voc Read 45 items from /Users/sandiway/research/jmx/wsj mxpost//tag.voc Read 4190 items from /Users/sandiway/research/jmx/wsj mxpost//tagfeatures.contexts Read 4190 contexts, 9286 numFeatures from /Users/sandiway/research/jmx/wsj mxpost//tagfeatures.fmap Read model /Users/sandiway/research/jmx/wsj mxpost//model : numPredictions=45, numParams=9286 Read tagdict from /Users/sandiway/research/jmx/wsj mxpost//tagdict *This is MXPOST (Version 1.0)* *Copyright (c) 1997 Adwait Ratnaparkhi* Colorless_JJ green_JJ ideas_NNS sleep_IN furiously_RB._. Sentence: 0 Length: 6 Elapsed Time: seconds.

Today's Topics 1.Passives in the WSJ PTB 2.MXPOST 3.BIKEL COLLINS 4.EVALB

Bikel Collins Parsing – Command – Input file format (sentences)

Statistical Parser Development Methodology – partition corpus into a training and a test set – compare parser output on test set against withheld reference parses Possible criteria – all or nothing: match the reference (“gold”) tree exactly – partial credit: use some “goodness of fit” metric

Statistical Parser Development Standard Operating Procedure: training sections 2–21 evaluation 0 24 section 23 – one million words of 1989 Wall Street Journal (WSJ) articles – nearly 50,000 sentences (49,208), sectioned almost 40K sentences (39,832) 2.4K sentences (2,416) Penn Treebank (WSJ)

Computing Tree Similarity Evaluation: – we’ve got 2400 sentences to compare How do we automate parser output scoring? PARSEVAL: bracketing span match A Procedure for Quantitatively Comparing the Syntactic Coverage of English Parsers (Black et al., 1991) Computer program: EVALB (Sekine & Collins, 1997) PARSEVAL: bracketing span match A Procedure for Quantitatively Comparing the Syntactic Coverage of English Parsers (Black et al., 1991) Computer program: EVALB (Sekine & Collins, 1997)

Today's Topics 1.Passives in the WSJ PTB 2.MXPOST 3.BIKEL COLLINS 4.EVALB

PARSEVAL Proposal: given trees T 1 and T 2 Preprocessing stage: – Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation – Remove unary branching nodes (including part of speech tags) Compute scores: – # Crossing Parentheses – Recall – Precision in the literature, the F-score is reported, i.e. F = 2 ⋅ precision ⋅ recall ÷ (precision + recall) in the literature, the F-score is reported, i.e. F = 2 ⋅ precision ⋅ recall ÷ (precision + recall)

PARSEVAL Proposal: given trees T 1 and T 2 Preprocessing stage: – Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation – Remove unary branching nodes (including part of speech tags) would go there ⊢ go there has been laughing ⊢ laughing does sing it correctly ⊢ sing it correctly is a cup is blue has a dollar does the laundry (not deleted if copula or main verb) is not in here ⊢ is in here she opted to retire ⊢ she opted retire Lori’s mother ⊢ Lori mother

PARSEVAL Proposal: given trees T 1 and T 2 Preprocessing stage: – Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation – Remove unary branching nodes (including part of speech tags)

PARSEVAL Proposal: given trees T 1 and T 2 Preprocessing stage: – Remove auxiliary verbs, not, infinitival marker to, empty categories, possessive ‘s, and punctuation – Remove unary branching nodes (including part of speech tags) Black et al. (1991) describes other special case rules for sequences of prepositions etc. Black et al. (1991) describes other special case rules for sequences of prepositions etc.

PARSEVAL Proposal: given trees T 1 and T 2 Compute scores: – # Crossing Parentheses – Recall – Precision

Proposal: given trees T 1 and T 2 Compute scores: – # Crossing Parentheses PARSEVAL Gold (T 2 ): Parser (T 1 ): Score = 1

# Crossing Parentheses 0 The 1 prospect 2 of 3 cutting 4 back 5 spending 6 PARSEVAL Gold (T 2 ): Parser (T 1 ): easy to compute

PARSEVAL Proposal: given trees T 1 and T 2 (gold) Compute scores: – # Crossing Parentheses – Recall – Precision

PARSEVAL Proposal: given trees T 1 and T 2 (gold) Compute scores: – Recall – Precision |nodes(T 1 ) ∩ nodes(T 2 )| |nodes(T 2 )| |nodes(T 1 ) ∩ nodes(T 2 )| |nodes(T 2 )| |nodes(T 1 ) ∩ nodes(T 2 )| |nodes(T 1 )| |nodes(T 1 ) ∩ nodes(T 2 )| |nodes(T 1 )|

Proposal: given trees T 1 and T 2 (gold) Compute scores: – Recall – Precision PARSEVAL Gold (T 2 ): Parser (T 1 ): nodes(T 1 ) ∩ nodes(T 2 ) Recall: 3/4 = 75% Precision: 3/5 = 60% F-score: 2 ⋅ 3/5 ⋅ 3/4 ÷ (3/5+3/4) = 69%

Recall and Precision: 0 The 1 prospect 2 of 3 cutting 4 back 5 spending 6 PARSEVAL Gold (T 2 ): Parser (T 1 ): also easy to compute

EVALB Implements the basic PARSEVAL proposal (some small differences) Preprocessing stage Compute scores:

EVALB [6] THE PARAMETER (.prm) FILE The.prm file sets options regarding the scoring method. COLLINS.prm gives the same scoring behaviour as the scorer used in (Collins 97). The options chosen were: 1) LABELED 1 to give labelled precision/recall figures, i.e. a constituent must have the same span *and* label as a constituent in the goldfile. 2) DELETE_LABEL TOP Don't count the "TOP" label (which is always given in the output of tgrep) when scoring. 3) DELETE_LABEL -NONE- Remove traces (and all constituents which dominate nothing but traces) when scoring. For example.... (VP (VBD reported) (SBAR (-NONE- 0) (S (-NONE- *T*-1)))) (..))) would be processed to give.... (VP (VBD reported)) (..))) 4) DELETE_LABEL, -- for the purposes of scoring remove punctuation DELETE_LABEL : DELETE_LABEL `` DELETE_LABEL '' DELETE_LABEL. 5) DELETE_LABEL_FOR_LENGTH -NONE- -- don't include traces when calculating the length of a sentence (important when classifying a sentence as 40 words) 6) EQ_LABEL ADVP PRT Count ADVP and PRT as being the same label when scoring.

EVALB To run the scorer: > evalb -p Parameter_file Gold_file Test_file For example to use the sample files: > evalb -p sample.prm sample.gld sample.tst

EVALB Gold standard : (S (A (P this)) (B (Q is) (A (R a) (T test)))) (S (A-SBJ-1 (P this)) (B-WHATEVER (Q is) (A (R a) (T test)))) (S (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test)))) (S (A (P this)) (B (Q is) (A (R a) (T test))) (-NONE- *)) (S (A (P this)) (B (Q is) (A (R a) (T test))) (: *)) Test : (S (A (P this)) (B (Q is) (A (R a) (T test)))) (S (A (P this)) (B (Q is) (C (R a) (T test)))) (S (A (P this)) (B (Q is) (A (R a) (U test)))) (S (C (P this)) (B (Q is) (A (R a) (U test)))) (S (A (P this)) (B (Q is) (R a) (A (T test)))) (S (A (P this) (Q is)) (A (R a) (T test))) (S (P this) (Q is) (R a) (T test)) (P this) (Q is) (R a) (T test) (S (A (P this)) (B (Q is) (A (A (R a) (T test))))) (S (A (P this)) (B (Q is) (A (A (A (A (A (R a) (T test)))))))) (S (A (P this)) (B (Q was) (A (A (R a) (T test))))) (S (A (P this)) (B (Q is) (U not) (A (A (R a) (T test))))) (TOP (S (A (P this)) (B (Q is) (A (R a) (T test))))) (S (A (P this)) (NONE *) (B (Q is) (A (R a) (T test)))) (S (A (P this)) (S (NONE abc) (A (NONE *))) (B (Q is) (A (R a) (T test)))) (S (A (P this)) (B (Q is) (A (R a) (TT test)))) (S (A (P This)) (B (Q is) (A (R a) (T test)))) (S (A (P That)) (B (Q is) (A (R a) (T test)))) (S (A (P this)) (B (Q is) (A (R a) (T test)))) (S (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test))) (A (P this)) (B (Q is) (A (R a) (T test)))) (S (A (P this)) (B (Q is) (A (R a) (T test))) (-NONE- *)) (S (A (P this)) (B (Q is) (A (R a) (T test))) (: *))

EVALB Results: Sent. Matched Bracket Cross Correct Tag ID Len. Stat. Recal Prec. Bracket gold test Bracket Words Tags Accracy ============================================================================ ============================================================================ === Summary === -- All -- Number of sentence = 24 Number of Error sentence = 5 Number of Skip sentence = 2 Number of Valid sentence = 17 Bracketing Recall = Bracketing Precision = Complete match = Average crossing = 0.06 No crossing = or less crossing = Tagging accuracy = len<=40 -- Number of sentence = 23 Number of Error sentence = 5 Number of Skip sentence = 2 Number of Valid sentence = 16 Bracketing Recall = Bracketing Precision = Complete match = Average crossing = 0.06 No crossing = or less crossing = Tagging accuracy = 96.88

EVALB [5] HOW TO CREATE A GOLDFILE FROM THE PENN TREEBANK The gold and parsed files are in a format similar to this: (TOP (S (INTJ (RB No)) (,,) (NP (PRP it)) (VP (VBD was) (RB n't) (NP (NNP Black) (NNP Monday))) (..))) To create a gold file from the treebank: tgrep -wn '/.*/' | tgrep_proc.prl will produce a goldfile in the required format. ("tgrep -wn '/.*/'" prints parse trees, "tgrep_process.prl" just skips blank lines). For example, to produce a goldfile for section 23 of the treebank: tgrep -wn '/.*/' | tail | tgrep_process.prl | sed 2416q > sec23.gold You don’t have the ancient program tgrep…

EVALB However you can use tsurgeon from the Stanford tregex you downloaded to accomplish the same thing Example: – file: wsj_0927.mrg

EVALB./tsurgeon.sh -treeFile wsj_0927.mrg -s ( (S (NP-SBJ-1 (NNP H.) (NNP Marshall) (NNP Schwarz)) (VP (VBD was) (VP (VBN named) (S (NP-SBJ (-NONE- *-1)) (NP- PRD (NP (NP (NN chairman)) (CC and) (NP (NN chief) (JJ executive) (NN officer))) (PP (IN of) (NP (NP (NNP U.S.) (NNP Trust) (NNP Corp.)) (,,) (NP (NP (DT a) (JJ private-banking) (NN firm)) (PP (IN with) (NP (NP (NNS assets)) (PP (IN under) (NP (NN management))) (PP (IN of) (NP (QP (IN about) ($ $) (CD 17) (CD billion)) (-NONE- *U*)))))))))))) (..))) ( (S (NP-SBJ (NP (NNP Mr.) (NNP Schwarz)) (,,) (ADJP (NP (CD 52) (NNS years)) (JJ old)) (,,)) (VP (MD will) (VP (VB succeed) (NP (NNP Daniel) (NNP P.) (NNP Davison)) (NP-TMP (NNP Feb.) (CD 1)) (,,) (SBAR-TMP (RB soon) (IN after) (S (NP-SBJ (NNP Mr.) (NNP Davison)) (VP (VBZ reaches) (NP (NP (NP (DT the) (NN company) (POS 's)) (JJ mandatory) (NN retirement) (NN age)) (PP (IN of) (NP (CD 65))))))))) (..))) ( (S (NP-SBJ-1 (NP (NNP Mr.) (NNP Schwarz)) (,,) (SBAR (WHNP-2 (WP who)) (S (NP-SBJ (-NONE- *T*-2)) (VP (VBZ is) (NP-PRD (NP (NN president)) (PP (IN of) (NP (NNP U.S.) (NNP Trust))))))) (,,)) (VP (MD will) (VP (VB be) (VP (VBN succeeded) (NP (-NONE- *-1)) (PP-LOC (IN in) (NP (DT that) (NN post))) (PP (IN by) (NP-LGS (NP (NNP Jeffrey) (NNP S.) (NNP Maurer)) (,,) (NP (CD 42)) (,,) (SBAR (WHNP-3 (WP who)) (S (NP-SBJ (-NONE- *T*-3)) (VP (VBZ is) (NP-PRD (NP (JJ executive) (NN vice) (NN president)) (PP (IN in) (NP (NP (NN charge)) (PP (IN of) (NP (NP (DT the) (NN company) (POS 's)) (NN asset-management) (NN group)))))))))))))) (..))) ( (S (NP-SBJ (NP (NNP U.S.) (NNP Trust)) (,,) (NP (NP (DT a) (JJ 136-year-old) (NN institution)) (SBAR (WHNP-2 (WDT that)) (S (NP-SBJ (-NONE- *T*-2)) (VP (VBZ is) (NP-PRD (NP (CD one)) (PP (IN of) (NP (NP (DT the) (JJS earliest) (NN high- net) (JJ worth) (NNS banks)) (PP-LOC (IN in) (NP (DT the) (NNP U.S.)))))))))) (,,)) (VP (VBZ has) (VP (VBN faced) (NP (NP (VBG intensifying) (NN competition)) (PP (IN from) (NP (NP (JJ other) (NNS firms)) (SBAR (WHNP-3 (WDT that)) (S (NP- SBJ (-NONE- *T*-3)) (VP (VBP have) (VP (VP (VBN established) (NP (-NONE- *RNR*-1))) (,,) (CC and) (VP (ADVP-MNR (RB heavily)) (VBN promoted) (NP (-NONE- *RNR*-1))) (,,) (NP-1 (NP (JJ private-banking) (NNS businesses)) (PP (IN of) (NP (PRP$ their) (JJ own))))))))))))) (..))) You can then redirect standard output to a file …

EVALB Example Assemble section 23 into one file cat TREEBANK_3/parsed/mrg/wsj/23/*.mrg > wsj_23.mrg – at this point not one tree per physical line Run tsurgeon./tsurgeon.sh -treeFile wsj_23.mrg -s > wsj_23.gold – File wsj_23.gold contains one tree per line

Task Run section 23 of the WSJ Treebank on the Bikel Collins parser – Extract the sentences from section 23 (perl etc.) Then run EVALB on the section to see how the Bikel Collins parser scores. Report back next time.