
A Multi-Strategy Approach to Parsing of Grammatical Relations in Child Language Transcripts
Kenji Sagae, Language Technologies Institute, Carnegie Mellon University
Thesis Committee: Alon Lavie (co-chair), Brian MacWhinney (co-chair), Lori Levin, Jaime Carbonell, John Carroll (University of Sussex)

2 Natural Language Parsing: Sentence → Syntactic Structure
One of the core problems in NLP
Input: The boy ate the cheese sandwich
Output (constituent structure): (S (NP (Det The) (N boy)) (VP (V ate) (NP (Det the) (N cheese) (N sandwich))))
Output (feature structure): (ROOT (predicate eat) (surface ate) (tense past) (category V) (SUBJ (category N) (agreement 3s) (surface boy) (DET (surface the) (category Det))) (OBJ (category N) (definite +) (DET (surface the) (category Det)) (predicate sandwich) (surface sandwich) (MOD (category N) (surface cheese) (predicate cheese))))
Output (labeled dependencies): ((1 2 The DET) (2 3 boy SUBJ) (3 0 ate ROOT) (4 6 the DET) (5 6 cheese MOD) (6 3 sandwich OBJ))
Grammatical Relations (GRs): subject, object, adjunct, etc.
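As a concrete reference for the rest of the deck, here is a minimal sketch (my own type names, not from the thesis) of the labeled-dependency output shown above, with one (dependent index, head index, word, GR label) record per word:

```python
from typing import List, NamedTuple

class Dependency(NamedTuple):
    """One labeled dependency: a dependent token, its head, and the GR label."""
    dep_index: int    # 1-based position of the dependent word
    head_index: int   # position of the head word (0 = root of the utterance)
    word: str         # surface form of the dependent
    label: str        # grammatical relation: SUBJ, OBJ, DET, MOD, ROOT, ...

# "The boy ate the cheese sandwich" as labeled dependencies, as shown above.
analysis: List[Dependency] = [
    Dependency(1, 2, "The", "DET"),
    Dependency(2, 3, "boy", "SUBJ"),
    Dependency(3, 0, "ate", "ROOT"),
    Dependency(4, 6, "the", "DET"),
    Dependency(5, 6, "cheese", "MOD"),
    Dependency(6, 3, "sandwich", "OBJ"),
]
```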

3 Using Natural Language Processing in Child Language Research CHILDES Database (MacWhinney, 2000) –200 megabytes of child-parent dialog transcripts –Part-of-speech and morphology analysis Tools available Not enough for many research questions –No syntactic analysis Can we use NLP to analyze CHILDES transcripts? –Parsing –Many decisions: representation, approach, etc.

4 Parsing CHILDES: Specific and General Motivation Specific task: automatic analysis of syntax in CHILDES corpora –Theoretical importance (study of child language development) –practical importance (measurement of syntactic competence) In general: Develop techniques for syntactic analysis, advance parsing technologies –Can we develop new techniques that perform better than current approaches? Rule-based Data-driven

5 Research Objectives Identify a suitable syntactic representation for CHILDES transcripts –Must address the needs of child language research Develop a high accuracy approach for syntactic analysis of spoken language transcripts –parents and children at different stages of language acquisition The plan: a multi-strategy approach –ML: ensemble methods –Parsing: several approaches possible, but combination is an underdeveloped area

6 Research Objectives Develop methods for combining analyses from different parsers and obtain improved accuracy –Combining rule-based and data-driven approaches Evaluate the accuracy of developed systems Validate the utility of the resulting systems to the child language community –Task-based evaluation: Automatic measurement of grammatical complexity in child language

7 Overview of the Multi-Strategy Approach for Syntactic Analysis
Transcripts → Parser A, Parser B, Parser C, Parser D, Parser E → Parser Combination → Syntactic Structures

8 Thesis Statement
The development of a novel multi-strategy approach for syntactic parsing allows for identification of Grammatical Relations in transcripts of parent-child dialogs at a higher level of accuracy than previously possible.
Through the combination of different NLP techniques (rule-based or data-driven), the multi-strategy approach can outperform each strategy in isolation and produce significantly improved accuracy.
The resulting syntactic analyses are at a level of accuracy that makes them useful to child language research.

9 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Related work Conclusion

10 CHILDES GR Scheme (Sagae, MacWhinney and Lavie, 2004)
Grammatical Relations (GRs) –Subject, object, adjunct, etc. –Labeled dependencies
Addresses needs of child language researchers –Informative and intuitive, basis for DSS and IPSyn
(Figure: a labeled dependency arc linking a Dependent to its Head, annotated with the Dependency Label)

11 CHILDES GR Scheme Includes Important GRs for Child Language Study

12 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Related work Conclusion Evaluation Data Rule-based GR parsing Data-driven GR parsing

13 The Task: Sentence → GRs Input: We eat the cheese sandwich Output:

14 Evaluation of GR Parsing Dependency accuracy Precision/Recall of GRs

15 Evaluation: Calculating Dependency Accuracy
1 2 We SUBJ, 2 0 eat ROOT, 3 5 the DET, 4 5 cheese MOD, 5 2 sandwich SUBJ

16 Evaluation: Calculating Dependency Accuracy
GOLD: 1 2 We SUBJ, 2 0 eat ROOT, 3 5 the DET, 4 5 cheese MOD, 5 2 sandwich SUBJ
PARSED: 1 2 We SUBJ, 2 0 eat ROOT, 3 4 the DET, 4 2 cheese OBJ, 5 2 sandwich PRED
Accuracy = number of correct dependencies / total number of dependencies = 2 / 5 = 40%

17 Evaluation: Precision and Recall of GRs
Precision and recall are calculated separately for each GR type, on aggregate counts over the entire test corpus.
Example for SUBJ:
Precision = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in PARSED)
Recall = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in GOLD)
F-score = 2 × (Precision × Recall) / (Precision + Recall)

18 Evaluation: Precision and Recall of GRs
GOLD: 1 2 We SUBJ, 2 0 eat ROOT, 3 5 the DET, 4 5 cheese MOD, 5 2 sandwich OBJ
PARSED: 1 2 We SUBJ, 2 0 eat ROOT, 3 4 the DET, 4 2 cheese OBJ, 5 2 sandwich SUBJ
Precision = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in PARSED) = 1 / 2 = 50%
Recall = (# SUBJ matches between PARSED and GOLD) / (total # of SUBJs in GOLD) = 1 / 1 = 100%
F-score = 2 × (Precision × Recall) / (Precision + Recall) = 2(50 × 100) / (50 + 100) = 66.67
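These metrics can be computed directly from two lists of labeled dependencies. Below is a minimal sketch (function names are mine, not from the thesis) that reproduces the worked example above: accuracy 2/5 = 40%, SUBJ precision 50%, recall 100%, F-score 66.67.

```python
from typing import List, Tuple

# A labeled dependency is (dep_index, head_index, word, label), as in the slides.
Dep = Tuple[int, int, str, str]

def dependency_accuracy(gold: List[Dep], parsed: List[Dep]) -> float:
    """Fraction of dependencies whose head and label both match the gold analysis
    (assumes both lists cover the same tokens in the same order)."""
    correct = sum(1 for g, p in zip(gold, parsed) if g == p)
    return correct / len(gold)

def gr_precision_recall_f(gold: List[Dep], parsed: List[Dep], gr: str):
    """Precision, recall and F-score for one GR type, over aggregate counts."""
    matches = sum(1 for g, p in zip(gold, parsed) if g == p and g[3] == gr)
    in_parsed = sum(1 for p in parsed if p[3] == gr)
    in_gold = sum(1 for g in gold if g[3] == gr)
    precision = matches / in_parsed if in_parsed else 0.0
    recall = matches / in_gold if in_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = [(1, 2, "We", "SUBJ"), (2, 0, "eat", "ROOT"), (3, 5, "the", "DET"),
        (4, 5, "cheese", "MOD"), (5, 2, "sandwich", "OBJ")]
parsed = [(1, 2, "We", "SUBJ"), (2, 0, "eat", "ROOT"), (3, 4, "the", "DET"),
          (4, 2, "cheese", "OBJ"), (5, 2, "sandwich", "SUBJ")]

print(dependency_accuracy(gold, parsed))            # 0.4  (2 correct out of 5)
print(gr_precision_recall_f(gold, parsed, "SUBJ"))  # (0.5, 1.0, 0.666...)
```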

19 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Evaluation Data Rule-based GR parsing Data-driven GR parsing

20 CHILDES Data: the Eve Corpus (Brown, 1973) A corpus from CHILDES –Manually annotated with GRs Training: ~ 5,000 words (adult) Development: ~ 1,000 words –600 adult, 400 child Test: ~ 2,000 words –1,200 adult, 800 child

21 Not All Child Utterances Have GRs
Utterances in training and test sets are well-formed: I need tapioca in the bowl. / That's a hat. / In a minute.
What about: *Warm puppy happiness a blanket. / *There briefcase. / ?I drinking milk. / ?I want Fraser hat.
Separate Eve-child test set (700 words)

22 The WSJ Corpus (Penn Treebank) 1 million words Widely used –Sections 02-21: training –Section 22: development –Section 23: evaluation Large corpus with syntactic annotation –Out-of-domain Constituent structures –Convert to unlabeled dependencies using head-percolation table

23 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Evaluation Data Rule-based GR parsing Data-driven GR parsing

24 Rule-Based Parsing The parser’s knowledge is encoded in manually written rules –Grammar, lexicon, etc. Only analyses that fit the rules are possible Accurate in specific domains, difficult to achieve wide coverage in open domain –Coverage, ambiguity, domain knowledge

25 Rule-Based Parsing of CHILDES data (Sagae, Lavie & MacWhinney, 2001, 2004) LCFlex (Rosé and Lavie, 2001) –Rules: CFG backbone augmented with unification constraints –Manually written, 153 rules –Robustness: limited insertions ([Do] [you] want to go outside?), limited skipping (No um maybe later.) –PCFG disambiguation model, trained on 2,000 words

26 High Precision from a Small Grammar Eve test corpus – 2,000 words 31% of the words can be parsed Accuracy (over all 2,000 words): 29% Precision: 94% High Precision, Low Recall Improve recall using parser’s robustness –Insertions, skipping –Multi-pass approach

27 Robustness and Multi-Pass Parsing
No insertions, no skipping: 31% parsed, 29% recall, 94% precision
Insertion of NP and/or auxiliary: 38% parsed, 35% recall, 92% precision
Skipping of 1 word: 52% parsed, 47% recall, 90% precision
Skipping of 1 word, insertion of NP, aux: 63% parsed, 55% recall, 88% precision
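A sketch of the multi-pass strategy: try the strictest configuration first and relax (insertions, then skipping) only when nothing parses. The settings format and the parse_with callback are illustrative, not LCFlex's actual interface.

```python
# Illustrative multi-pass wrapper around a robust rule-based parser.  The
# parse_with callback stands in for a call to the parser (e.g. LCFlex)
# configured with the given robustness settings; it should return an analysis,
# or None when the grammar cannot cover the sentence under those settings.

PASSES = [
    {"insert": [],            "skip": 0},  # strictest: no insertions, no skipping
    {"insert": ["NP", "aux"], "skip": 0},  # allow limited insertions
    {"insert": [],            "skip": 1},  # allow skipping one word
    {"insert": ["NP", "aux"], "skip": 1},  # most permissive pass
]

def multi_pass_parse(sentence, parse_with):
    """Return the analysis from the earliest (highest-precision) pass that succeeds."""
    for settings in PASSES:
        analysis = parse_with(settings, sentence)
        if analysis is not None:
            return analysis
    return None  # left unparsed; a data-driven parser could take over here

# Toy stand-in that only "succeeds" when skipping is allowed:
result = multi_pass_parse("No um maybe later .".split(),
                          lambda s, sent: ("analysis", s) if s["skip"] >= 1 else None)
print(result)  # ('analysis', {'insert': [], 'skip': 1}) -- third pass succeeded
```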

28 Use Robustness to Improve Recall

29 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Evaluation Data Rule-based GR parsing Data-driven GR parsing

30 Data-driven Parsing Parser learns from a corpus of annotated examples Data-driven parsers are robust Two approaches –Existing statistical parser –Classifier-based parsing

31 Accurate GR Parsing with Existing Resources (Mostly) Large training corpus: Penn Treebank (Marcus et al., 1993) –Head-table converts constituents into dependencies Use an existing parser (trained on the Penn Treebank) –Charniak (2000) Convert output to unlabeled dependencies Use a classifier for dependency labeling
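A sketch of the constituents-to-dependencies step: a head-percolation table picks a head child for each constituent, and every non-head child's lexical head becomes a dependent of the constituent's lexical head. The tiny head table and tree encoding below are illustrative, not the actual table used with the Penn Treebank.

```python
HEAD_RULES = {
    "S":  ("left", ["VP", "NP"]),   # search children left-to-right for these categories
    "VP": ("left", ["V", "VP"]),
    "NP": ("right", ["N", "NP"]),   # NPs take their rightmost noun as head
}

def lexical_head(tree):
    """Head word of a subtree, found by percolating heads upward."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                      # preterminal: (POS, word)
    direction, categories = HEAD_RULES.get(label, ("left", []))
    ordered = list(children) if direction == "left" else list(reversed(children))
    for cat in categories:
        for child in ordered:
            if child[0] == cat:
                return lexical_head(child)
    return lexical_head(ordered[0])             # fallback: first child in search order

def dependencies(tree, deps=None):
    """(dependent_word, head_word) pairs: each non-head child's lexical head
    depends on the lexical head of its parent constituent."""
    if deps is None:
        deps = []
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return deps                             # preterminals introduce no dependencies
    head = lexical_head(tree)
    for child in children:
        child_head = lexical_head(child)
        if child_head != head:                  # (a real converter compares token positions)
            deps.append((child_head, head))
        dependencies(child, deps)
    return deps

tree = ("S",
        ("NP", ("Det", "The"), ("N", "boy")),
        ("VP", ("V", "ate"),
               ("NP", ("Det", "the"), ("N", "cheese"), ("N", "sandwich"))))
print(dependencies(tree))
# [('boy', 'ate'), ('The', 'boy'), ('sandwich', 'ate'), ('the', 'sandwich'), ('cheese', 'sandwich')]
```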

32 Unlabeled Dependency Identification
(Figure: unlabeled dependency structure for "We eat the cheese sandwich", e.g. "sandwich" attaching to "eat")

33 Domain Issues Parser training data is in a very different domain –WSJ vs Parent-child dialogs Domain specific training data would likely be better Performance is acceptable –Shorter, simpler sentences –Unlabeled dependency accuracy WSJ test data: 92% Eve test data: 90%

34 Dependency Labeling Training data is required –Eve training set (5,000 words) Labeling dependencies is easier than finding unlabeled dependencies Use a classifier –TiMBL (Daelemans et al., 2004) –Extract features from unlabeled dependency structure –GR labels are target classes

35 Dependency Labeling

36 Features Used for GR Labeling Head and dependent words –Also their POS tags Whether the dependent comes before or after the head How far the dependent is from the head The label of the lowest node in the constituent tree that includes both the head and dependent

37 Features Used for GR Labeling Consider the words “we” and “eat” Features: we, pro, eat, v, before, 1, S Class: SUBJ
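A sketch of the feature extraction for dependency labeling, reproducing the feature vector on this slide (we, pro, eat, v, before, 1, S); the function and field names are mine.

```python
def gr_label_features(dep_word, dep_pos, head_word, head_pos,
                      dep_index, head_index, lowest_common_constituent):
    """Features for labeling one already-identified dependency,
    following the feature set described on this slide."""
    return {
        "dep_word": dep_word,                   # e.g. "we"
        "dep_pos": dep_pos,                     # e.g. "pro"
        "head_word": head_word,                 # e.g. "eat"
        "head_pos": head_pos,                   # e.g. "v"
        "direction": "before" if dep_index < head_index else "after",
        "distance": abs(head_index - dep_index),
        "lowest_common_constituent": lowest_common_constituent,  # e.g. "S"
    }

# "we" (token 1, pro) depending on "eat" (token 2, v), joined under an S node;
# the classifier's target class for this instance would be SUBJ.
print(gr_label_features("we", "pro", "eat", "v", 1, 2, "S"))
```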

38 Good GR Labeling Results with Small Training Set Eve training set –5,000 words for training Eve test set –2,000 words for testing Accuracy of dependency labeling (on perfect dependencies): 91.4% Overall accuracy (Charniak parser + dependency labeling): 86.9%

39 Some GRs Are Easier Than Others Overall accuracy: 86.9% Easily identifiable GRs –DET, POBJ, INF, NEG: Precision and recall above 98% Difficult GRs –COMP, XCOMP: below 65% I think that Mary saw a movie (COMP) She tried to see a movie (XCOMP)

40 Precision and Recall of Specific GRs
(Table: precision, recall, and F-score for SUBJ, OBJ, COORD, JCT, MOD, PRED, ROOT, COMP, and XCOMP)

41 Parsing with Domain-Specific Data Good results with a system based on the Charniak parser Why domain-specific data? –No Penn Treebank –Handle dependencies natively –Multi-strategy approach

42 Classifier-Based Parsing (Sagae & Lavie, 2005) Deterministic parsing –Single path, no backtracking –Greedy –Linear run-time Simple shift-reduce algorithm –Single pass over the input string Variety: Left-to-right, right-to-left (order matters) Classifier makes parser decisions –Classifier not tied to parsing algorithm Variety: Different types of classifiers can be used

43 A Simple, Fast and Accurate Approach Classifier-based parsing with constituents –Trained and evaluated on WSJ data: 87.5% –Very fast, competitive accuracy Simple adaptation to labeled dependency parsing –Similar to Malt parser (Nivre, 2004) –Handles CHILDES GRs directly

44 GR Analysis with Classifier-Based Parsing Stack S –Items may be POS-tagged words or dependency trees –Initialization: empty Queue W –Items are POS-tagged words –Initialization: Insert each word of the input sentence in order (first word is in front)

45 Shift and Reduce Actions Shift –Remove (shift) the word in front of queue W –Insert shifted item on top of stack S Reduce –Pop the two topmost items from stack S –Push new item onto stack S New item forms a new dependency Choose LEFT or RIGHT Choose dependency label

46 Parser Decisions Shift vs. Reduce If Reduce – RIGHT or LEFT –Dependency label We use a classifier to make these decisions

47 Classes and Features Classes –SHIFT –LEFT-SUBJ –LEFT-JCT –RIGHT-OBJ –RIGHT-JCT –… Features: derived from parser configuration –Crucially: two topmost items in S, first item in W –Additionally: other features that describe the current configuration (look-ahead, etc)
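A compact sketch of the deterministic shift-reduce loop from slides 44-47. The classify callback stands in for the trained classifier (SVM or MBL in the thesis); the exact action names and LEFT/RIGHT convention here are illustrative, not necessarily the thesis' own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Node:
    index: int                                   # 1-based token position
    word: str
    pos: str
    deps: List[Tuple["Node", str]] = field(default_factory=list)  # (dependent, GR label)

def parse(tagged_sentence, classify: Callable) -> Node:
    """Deterministic shift-reduce dependency parsing driven by a classifier.

    classify(stack, queue) inspects the configuration (crucially the two topmost
    stack items and the front of the queue) and returns "SHIFT" or an action like
    "LEFT-SUBJ" / "RIGHT-OBJ".  Here, LEFT makes the left (second) item a dependent
    of the right (top) item, RIGHT the reverse; the classifier is assumed to shift
    only when the queue is non-empty and to reduce only when the stack has two items.
    """
    queue = [Node(i + 1, w, p) for i, (w, p) in enumerate(tagged_sentence)]
    stack: List[Node] = []
    while queue or len(stack) > 1:
        action = classify(stack, queue)
        if action == "SHIFT":
            stack.append(queue.pop(0))           # move the next word onto the stack
        else:
            direction, label = action.split("-", 1)
            right, left = stack.pop(), stack.pop()
            head, dep = (right, left) if direction == "LEFT" else (left, right)
            head.deps.append((dep, label))       # each reduce creates one new dependency
            stack.append(head)                   # the head goes back on the stack
    return stack[0]                              # root of the dependency tree

# A hand-written stand-in for the trained classifier, for "We eat the cheese sandwich":
def toy_classify(stack, queue):
    if len(stack) < 2:
        return "SHIFT"
    pair = (stack[-2].word, stack[-1].word)
    decisions = {("We", "eat"): "LEFT-SUBJ", ("cheese", "sandwich"): "LEFT-MOD",
                 ("the", "sandwich"): "LEFT-DET", ("eat", "sandwich"): "RIGHT-OBJ"}
    return decisions.get(pair, "SHIFT")

root = parse([("We", "pro"), ("eat", "v"), ("the", "det"),
              ("cheese", "n"), ("sandwich", "n")], toy_classify)
print(root.word, [(d.word, l) for d, l in root.deps])  # eat [('We', 'SUBJ'), ('sandwich', 'OBJ')]
```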

48 Parsing CHILDES with a Classifier-Based Parser Parser uses SVM Trained on Eve training set (5,000 words) Tested on Eve test set (2,000 words) Labeled dependency accuracy: 87.3% –Uses only domain-specific data –Same level of accuracy as GR system based on Charniak parser

49 Precision and Recall of Specific GRs
(Table: precision, recall, and F-score for SUBJ, OBJ, COORD, JCT, MOD, PRED, ROOT, COMP, and XCOMP)


51 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Related Work Conclusion Weighted voting Combination as parsing Handling young child utterances

52 Combine Different Parsers to Get More Accurate Results Rule-based Statistical parsing + dependency labeling Classifier-based parsing –Obtain even more variety SVM vs MBL Left-to-right vs right-to-left

53 Simple (Unweighted) Voting Each parser votes for each dependency Word-by-word Every vote has the same weight

54 Simple (Unweighted) Voting
He eats cake
Parser A: 1 2 He SUBJ, 2 0 eats CMOD, 3 1 cake OBJ
Parser B: 1 2 He SUBJ, 2 0 eats ROOT, 3 1 cake OBJ
Parser C: 1 3 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ
GOLD: 1 2 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ


56 Simple (Unweighted) Voting
He eats cake
Parser A: 1 2 He SUBJ, 2 0 eats CMOD, 3 1 cake OBJ
Parser B: 1 2 He SUBJ, 2 0 eats ROOT, 3 1 cake OBJ
Parser C: 1 3 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ
GOLD: 1 2 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ
VOTED: 1 2 He SUBJ, 2 0 eats ROOT, 3 1 cake OBJ

57 Weighted Voting Each parser has a weight –Reflects confidence in parser’s GR identification Instead of adding number of votes, add the weight of votes Takes into account that some parsers are better than others

58 Weighted Voting
He eats cake
Parser A (weight 0.4): 1 2 He SUBJ, 2 0 eats CMOD, 3 1 cake OBJ
Parser B (weight 0.3): 1 2 He SUBJ, 2 0 eats ROOT, 3 1 cake OBJ
Parser C (weight 0.8): 1 3 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ
GOLD: 1 2 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ
VOTED: 1 3 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ

59 Label-Weighted Voting Not just one weight per parser, but one weight for each GR for each parser Takes into account specific strengths of each parser

60 Label-Weighted Voting
He eats cake
Parser A: 1 2 He SUBJ (0.7), 2 0 eats CMOD (0.3), 3 1 cake OBJ (0.5)
Parser B: 1 2 He SUBJ (0.8), 2 0 eats ROOT (0.9), 3 1 cake OBJ (0.3)
Parser C: 1 3 He SUBJ (0.6), 2 0 eats ROOT (0.7), 3 2 cake OBJ (0.9)
GOLD: 1 2 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ
VOTED: 1 2 He SUBJ, 2 0 eats ROOT, 3 2 cake OBJ
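A sketch of label-weighted voting: each parser's vote for a (dependent, head, label) triple is added with that parser's weight for that GR label, and the highest-weighted choice wins for each word. The weights below are the illustrative ones from the slide; the function name is mine.

```python
from collections import defaultdict

def label_weighted_vote(parser_outputs, label_weights):
    """parser_outputs: one {dep_index: (head_index, label)} dict per parser.
    label_weights: per-parser dict mapping GR label -> that parser's weight for it.
    Returns the highest-weighted (head, label) choice for each word."""
    votes = defaultdict(float)                      # (dep, head, label) -> summed weight
    for output, weights in zip(parser_outputs, label_weights):
        for dep, (head, label) in output.items():
            votes[(dep, head, label)] += weights.get(label, 0.0)
    best = {}
    for (dep, head, label), w in votes.items():
        if dep not in best or w > best[dep][2]:
            best[dep] = (head, label, w)
    return {dep: (head, label) for dep, (head, label, _) in best.items()}

# "He eats cake": three parsers, with the per-label weights shown on the slide.
outputs = [
    {1: (2, "SUBJ"), 2: (0, "CMOD"), 3: (1, "OBJ")},   # Parser A
    {1: (2, "SUBJ"), 2: (0, "ROOT"), 3: (1, "OBJ")},   # Parser B
    {1: (3, "SUBJ"), 2: (0, "ROOT"), 3: (2, "OBJ")},   # Parser C
]
weights = [
    {"SUBJ": 0.7, "CMOD": 0.3, "OBJ": 0.5},            # Parser A
    {"SUBJ": 0.8, "ROOT": 0.9, "OBJ": 0.3},            # Parser B
    {"SUBJ": 0.6, "ROOT": 0.7, "OBJ": 0.9},            # Parser C
]
print(label_weighted_vote(outputs, weights))
# {1: (2, 'SUBJ'), 2: (0, 'ROOT'), 3: (2, 'OBJ')}  -- matches GOLD on the slide
```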

61 Voting Produces Very Accurate Results Parsers –Rule-based –Statistical based on Charniak parser –Classifier-based Left-to-right SVM Right-to-left SVM Left-to-right MBL Simple Voting: 88.0% Weighted Voting: 89.1% Label-weighted Voting: 92.1%

62 Precision and Recall of Specific GRs
(Table: precision, recall, and F-score for SUBJ (0.98), OBJ (0.94), COORD, JCT, MOD, PRED, ROOT, COMP, and XCOMP)

63 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Weighted voting Combination as parsing Handling young child utterances

64 Voting May Not Produce a Well-Formed Dependency Tree Voting on a word-by-word basis No guarantee of well-formedness Resulting set of dependencies may form a graph with cycles, or may not even be fully connected –Technically not fully compliant with CHILDES GR annotation scheme

65 Parser Combination as Reparsing Once several parsers have analyzed a sentence, use their output to guide the process of reparsing the sentence Two reparsing approaches –Maximum spanning tree –CYK (dynamic programming)

66 Dependency Parsing as Search for Maximum Spanning Tree First, build a graph –Each word in the input sentence is a node –Each dependency proposed by any of the parsers is a weighted edge –If multiple parsers propose the same dependency, add their weights into a single edge Then, simply find the MST –Maximizes the votes –Structure guaranteed to be a dependency tree –May have crossing branches
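A sketch of MST reparsing: pool every proposed dependency into one weighted directed graph, summing weights when parsers agree, and extract the maximum spanning arborescence. It assumes networkx's Edmonds-based maximum_spanning_arborescence is available; index 0 stands for the artificial root.

```python
import networkx as nx

def reparse_mst(parser_outputs, parser_weights):
    """parser_outputs: one {dep_index: head_index} dict per parser.
    parser_weights: one weight per parser.
    Returns {dep_index: head_index} for the maximum spanning arborescence
    over the pooled, weighted dependency graph."""
    graph = nx.DiGraph()
    for output, w in zip(parser_outputs, parser_weights):
        for dep, head in output.items():
            if graph.has_edge(head, dep):
                graph[head][dep]["weight"] += w     # agreeing parsers pool their weights
            else:
                graph.add_edge(head, dep, weight=w)
    tree = nx.maximum_spanning_arborescence(graph, attr="weight")  # Edmonds' algorithm
    return {dep: head for head, dep in tree.edges()}

# "He eats cake" with the three parsers from the voting slides (head 0 = root):
outputs = [{1: 2, 2: 0, 3: 1},    # Parser A
           {1: 2, 2: 0, 3: 1},    # Parser B
           {1: 3, 2: 0, 3: 2}]    # Parser C
print(reparse_mst(outputs, [0.4, 0.3, 0.8]))
# heads chosen: 1 -> 3, 2 -> 0, 3 -> 2 (total pooled edge weight 3.1)
```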

67 Parser Combination with the CYK Algorithm The CYK algorithm uses dynamic programming to find all parses for a sentence given a CFG –Probabilistic version finds most probable parse Build a graph, as with MST Parse the sentence using CYK –Instead of a grammar, consult the graph to determine how to fill new cells in the CYK table –Instead of probabilities, we use the weights from the graph

68 Precision and Recall of Specific GRs
(Table: precision, recall, and F-score for SUBJ (0.98), OBJ (0.94), COORD, JCT, MOD, PRED, ROOT (0.97), COMP, and XCOMP (0.88))

69 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Weighted voting Combination as parsing Handling young child utterances

70 Handling Young Child Utterances with Rule-Based and Data-Driven Parsing
Eve-child test set: I need tapioca in the bowl. / That's a hat. / In a minute. / *Warm puppy happiness a blanket. / *There briefcase. / ?I drinking milk. / ?I want Fraser hat.

71 Three Types of Sentences in One Corpus No problem –High accuracy No GRs –But data-driven systems will output GRs Missing words, agreement errors, etc. –GRs are fine, but a challenge for data-driven systems trained on fully grammatical utterances

72 To Analyze or Not To Analyze: Ask the Rule-Based Parser Utterances with no GRs are annotated in the test corpus as such Rule-based parser set to high precision –Same grammar as before If a sentence cannot be parsed with the rule-based system, output No GR –88% precision, 89% recall –Sentences are fairly simple

73 The Rule-Based Parser also Identifies Missing Words If the sentence can be analyzed with the rule-based system, check if any insertions were necessary –If inserted be or possessive marker ’s, insert the appropriate lexical item in the sentence Parse the sentence with data-driven systems, run combination

74 High Accuracy Analysis of Challenging Utterances Eve-child test –No rule-based first pass: 62.9% accuracy Many errors caused by GR analysis of words with no GRs –With rule-based pass: 88.0% accuracy 700 words from Naomi corpus –No rule-based: 67.4% –Rule-based, then combo: 86.8%

75 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Related work Conclusion

76 Index of Productive Syntax (IPSyn) (Scarborough, 1990) A measure of child language development Assigns a numerical score for grammatical complexity (from 0 to 112 points) Used in hundreds of studies

77 IPSyn Measures Syntactic Development IPSyn: Designed for investigating differences in language acquisition –Differences in groups (for example: bilingual children) –Individual differences (for example: delayed language development) –Focus on syntax Addresses weaknesses of Mean Length of Utterance (MLU) –MLU surprisingly useful until age 3, then reaches ceiling (or becomes unreliable) IPSyn is very time-consuming to compute

78 Computing IPSyn (manually) Corpus of 100 transcribed utterances –Consecutive, no repetitions Identify 56 specific language structures (IPSyn Items) –Examples: Presence of auxiliaries or modals Inverted auxiliary in a wh-question Conjoined clauses Fronted or center-embedded subordinate clauses –Count occurrences (zero, one, two or more) Add counts

79 Automating IPSyn Existing state of manual computation –Spreadsheets –Search each sentence for language structures –Use part-of-speech tagging to narrow down the number of sentences for certain structures For example: Verb + Noun, Determiner + Adjective + Noun Automatic computation is possible with accurate GR analysis –Use GRs to search for IPSyn items

80 Some IPSyn Items Require Syntactic Analysis for Reliable Recognition (and some don’t) Determiner + Adjective + Noun Auxiliary verb Adverb modifying adjective or nominal Subject + Verb + Object Sentence with 3 clauses Conjoined sentences Wh-question with inverted auxiliary/modal/copula Relative clauses Propositional complements Fronted subordinate clauses Center-embedded clauses

81 Automating IPSyn with Grammatical Relation Analyses Search for language structures using patterns that involve POS tags and GRs (labeled dependencies) Examples –Wh-embedded clauses: search for wh-words whose head (or transitive head) is a dependent in a GR of types [XC]SUBJ, [XC]PRED, [XC]JCT, [XC]MOD, COMP or XCOMP –Relative clauses: search for a CMOD where the dependent is to the right of the head
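A sketch of this kind of GR-pattern search over the (dependent index, head index, word, label) lists used earlier; the two checks follow the examples on this slide, and the wh-word list is a stand-in.

```python
WH_WORDS = {"what", "who", "which", "where", "when", "why", "how"}   # stand-in list
EMBEDDING_GRS = {"CSUBJ", "XSUBJ", "CPRED", "XPRED", "CJCT", "XJCT",
                 "CMOD", "XMOD", "COMP", "XCOMP"}

def has_relative_clause(deps):
    """Relative clause: a CMOD whose dependent is to the right of its head."""
    return any(label == "CMOD" and dep > head for dep, head, word, label in deps)

def has_wh_embedded_clause(deps):
    """Wh-embedded clause: a wh-word whose head (or transitive head) is itself
    the dependent in one of the embedding GRs above."""
    arcs = {dep: (head, label) for dep, head, word, label in deps}
    for dep, head, word, label in deps:
        if word.lower() in WH_WORDS:
            cur = head
            while cur in arcs:                        # walk up the chain of heads
                if arcs[cur][1] in EMBEDDING_GRS:
                    return True
                cur = arcs[cur][0]
    return False

# "the cookie that you ate": "ate" is a CMOD attached to the right of "cookie".
deps1 = [(1, 2, "the", "DET"), (2, 0, "cookie", "ROOT"), (3, 5, "that", "CPZR"),
         (4, 5, "you", "SUBJ"), (5, 2, "ate", "CMOD")]
print(has_relative_clause(deps1))       # True

# "I know what you want": "what" hangs under a COMP clause.
deps2 = [(1, 2, "I", "SUBJ"), (2, 0, "know", "ROOT"), (3, 5, "what", "OBJ"),
         (4, 5, "you", "SUBJ"), (5, 2, "want", "COMP")]
print(has_wh_embedded_clause(deps2))    # True
```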

82 Evaluation Data Two sets of transcripts with IPSyn scoring from two different child language research groups Set A –Scored fully manually –20 transcripts –Ages: about 3 yrs. Set B –Scored with CP first, then manually corrected –25 transcripts –Ages: about 8 yrs. (Two transcripts in each set were held out for development and debugging)

83 Evaluation Metrics: Point Difference Point difference –The absolute point difference between the scores provided by our system, and the scores computed manually –Simple, and shows how close the automatic scores are to the manual scores –Acceptable range Smaller for older children

84 Evaluation Metrics: Point-to-Point Accuracy Point-to-point accuracy –Reflects overall reliability over each scoring decision made in the computation of IPSyn scores –Scoring decisions: presence or absence of language structures in the transcript –Point-to-point accuracy = C(correct decisions) / C(total decisions) –Commonly used for assessing inter-rater reliability among human scorers (for IPSyn, about 94%)
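A sketch of the two metrics, assuming each system's output is available as a total IPSyn score plus a per-item decision list; the names and toy numbers are illustrative.

```python
def point_difference(auto_score: int, manual_score: int) -> int:
    """Absolute difference between the automatic and the manual IPSyn score."""
    return abs(auto_score - manual_score)

def point_to_point_accuracy(auto_decisions, manual_decisions) -> float:
    """Agreement over individual scoring decisions (the presence/absence counts
    of each language structure), as used for inter-rater reliability."""
    agreements = sum(1 for a, m in zip(auto_decisions, manual_decisions) if a == m)
    return agreements / len(manual_decisions)

# Toy example: per-item counts (0, 1, or 2) for six IPSyn items.
auto   = [2, 1, 0, 2, 2, 0]
manual = [2, 1, 1, 2, 2, 0]
print(point_to_point_accuracy(auto, manual))   # 0.833... (5 of 6 decisions agree)
print(point_difference(83, 85))                # 2
```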

85 Results IPSyn scores from –Our GR-based system (GR) –Manual scoring (HUMAN) –Computerized Profiling (CP) Long, Fey and Channell, 2004

86 GR-based IPSyn Is Quite Accurate
(Table: average point difference to HUMAN and point-to-point reliability (%) for GR and CP, in total and on sets A and B)

87 GR-Based IPSyn Close to Human Scoring Automatic scores very reliable Validates usefulness of –GR annotation scheme –Automatic GR analysis Validates analysis over a large set of children of different ages

88 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Related work Conclusion

89 Related Work GR schemes, GR evaluation: –Carroll, Briscoe & Sanfilippo, 1998 –Lin, 1998 –Yeh, 2000 –Preiss, 2003 Rule-based robust parsing –Heeman & Allen, 2001 –Lavie, 1996 –Rosé & Lavie, 2001 Parsing –Carroll & Briscoe, 2002 –Briscoe & Carroll, 2002 –Buchholz, 2002 –Tomita, 1987 –Magerman, 1995 –Ratnaparkhi, 1997 –Collins, 1997 –Charniak, 2000 Deterministic parsing –Yamada & Matsumoto, 2003 –Nivre & Scholz, 2004 Parser Combination –Henderson & Brill, 1999 –Brill & Wu, 1998 –Yeh, 2000 –Sarkar, 2001 Automatic measurement of grammatical complexity –Long, Fey & Channell, 2004

90 Outline The CHILDES GR scheme GR Parsing of CHILDES transcripts Combining different strategies Automated measurement of syntactic development in child language Related work Conclusion

91 Major Contributions An annotation scheme based on GRs for syntactic structure in CHILDES transcripts A linear-time classifier-based parser for constituent structures The development of rule-based and data-driven approaches to GR analysis –Precision/recall trade-off using insertions and skipping –Data-driven GR analysis using existing resources Charniak parser, Penn Treebank –Parser variety in classifier-based dependency parsing

92 Major Contributions (2) The use of different voting schemes for combining dependency analyses –Surpasses state-of-the-art in WSJ dependency parsing –Vastly outperforms individual parsing approaches A novel reparsing combination scheme –Maximum spanning trees, CYK An accurate automated tool for measurement of syntactic development in child language –Validates annotation scheme and quality of GR analyses

93 Possible Future Directions Classifier-based parsing –Beam search keeping linear time –Tree classification (Kudo & Matsumoto, 2004) Parser combination –Parser variety, reparsing combination with constituent trees Automated measurement of grammatical complexity –Take precision/recall into account –A data-driven approach to replace search rules Other languages


98 More on Dependency Voting On WSJ data: 93.9% unlabeled accuracy On Eve data –No RB: 91.1% COMP: 50% –No charn, No RB: 89.1% COMP: 50%, COORD: 84%, ROOT: 95% –No charn: 90.5% COMP: 67% –No RL, no MBL: 91.8%

99 Full GR Results
XJCT (2/2), OBJ (90/91), NEG (26/25), SUBJ (180/181), INF (19/19), POBJ (48/51), XCOMP (23/23), QUANT (4/4), VOC (2/2), TAG (1/1), CPZR (10/9), PTL (6/6), COORD (33/33), COMP (18/18), AUX (74/78), CJCT (6/5), PRED (54/55), DET (45/47), MOD (94/89), ROOT (239/238), PUNCT (286/286), COM (45/44), ESUBJ (2/2), CMOD (3/3), JCT (78/84)
