Automatic Measurement of Syntactic Development in Child Language Kenji Sagae Language Technologies Institute Student Research Symposium September 2005.

Presentation transcript:

Automatic Measurement of Syntactic Development in Child Language Kenji Sagae Language Technologies Institute Student Research Symposium September 2005 Joint work with Alon Lavie and Brian MacWhinney

2 Using Natural Language Processing in Child Language Research
- CHILDES Database (MacWhinney, 2000)
  - Several megabytes of child-parent dialog transcripts
  - Part-of-speech and morphology analysis
  - Tools available
- Recently proposed syntactic annotation scheme (Sagae et al., 2004)
  - Grammatical Relations (GRs)
  - POS analysis not enough for many research questions
  - Very small amount of annotated data
- Parsing
  - Can we use current NLP tools to analyze CHILDES GRs?
  - Allows, for example, automatic measurement of syntactic development

3 Outline
- The CHILDES GR annotation scheme
- Automatic GR analysis
- Measurement of syntactic development

4 CHILDES GR Scheme (Sagae et al., 2004)
- Addresses needs of child language researchers
- Grammatical Relations (GRs)
  - Subject, object, adjunct, etc.
  - Labeled dependencies
[Figure: a labeled dependency, with the dependent, the head, and the dependency label marked]
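A labeled dependency of this kind can be held in a small record. The sketch below is only illustrative: the class name `GR`, its field names, and the convention of 1-based word positions with 0 for the root are assumptions made for the example, not the CHILDES file format. The analysis of the example sentence is written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GR:
    """One labeled dependency (hypothetical representation)."""
    dependent: int  # 1-based position of the dependent word
    head: int       # 1-based position of the head word; 0 marks the root
    label: str      # GR label such as SUBJ, OBJ, DET


words = ["we", "eat", "the", "cheese", "sandwich"]

# Hand-written analysis, for illustration only.
analysis = [
    GR(1, 2, "SUBJ"),
    GR(2, 0, "ROOT"),
    GR(3, 5, "DET"),
    GR(4, 5, "MOD"),
    GR(5, 2, "OBJ"),
]


def describe(gr, words):
    """Render a GR as LABEL(dependent -> head)."""
    head = "ROOT" if gr.head == 0 else words[gr.head - 1]
    return f"{gr.label}({words[gr.dependent - 1]} -> {head})"


for gr in analysis:
    print(describe(gr, words))
```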

5 CHILDES GR Scheme Includes Important GRs for Child Language Study

6 Automatic Syntactic (GR) Analysis
- Input: a sentence
- Output: dependency structure (GRs)
- Three steps
  - Text preprocessing
  - Unlabeled dependency identification
  - Dependency labeling

7 Step 1: Text Preprocessing Prepares Utterances for Parsing
- CHAT transcription system
  - Explicitly marks certain extra-grammatical material: disfluency, retracing and repetitions
- CLAN tools (MacWhinney, 2000)
  - Remove extra-grammatical material
  - Provide POS and morphological analyses
- CHAT and CLAN tools are publicly available

8 Step 2: Unlabeled Dependency Identification
- Why?
  - Large training corpus: Penn Treebank (Marcus et al., 1993)
  - Head-table converts constituents into dependencies
- Use an existing parser (trained on the Penn Treebank)
  - Charniak (2000)
  - Convert output to dependencies
- Alternatively, a dependency parser
  - For example: MALT parser (Nivre and Scholz, 2004), Yamada and Matsumoto (2003)
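The head-table conversion can be sketched as follows. This is a toy illustration under stated assumptions: the three-entry `HEAD_TABLE`, the nested-tuple tree format, and the "rightmost match, else rightmost child" rule are invented for the example and are much simpler than the head tables used with Penn Treebank parsers. Each phrase picks one child as its head, and the head words of the other children become dependents of that head word.

```python
# Toy head table (assumption): for each phrase label, the child labels to
# look for, in priority order.
HEAD_TABLE = {
    "S": ["VP", "NP"],
    "VP": ["VBP", "VB", "VP"],
    "NP": ["NN", "NNS", "PRP", "NP"],
}


def to_dependencies(tree):
    """Convert a constituent tree to unlabeled dependencies.

    Trees are nested tuples: (label, child, ...) for phrases and
    (pos_tag, word) for leaves. Returns (root, deps), where root is the
    1-based position of the sentence head and deps is a sorted list of
    (dependent, head) position pairs.
    """
    deps = []
    position = [0]

    def visit(node):
        if isinstance(node[1], str):  # leaf: (pos_tag, word)
            position[0] += 1
            return position[0]
        label, children = node[0], node[1:]
        child_heads = [visit(child) for child in children]
        head_index = len(children) - 1  # fallback: rightmost child
        for wanted in HEAD_TABLE.get(label, []):
            matches = [i for i, c in enumerate(children) if c[0] == wanted]
            if matches:
                head_index = matches[-1]
                break
        head = child_heads[head_index]
        for i, child_head in enumerate(child_heads):
            if i != head_index:
                deps.append((child_head, head))
        return head

    root = visit(tree)
    return root, sorted(deps)


tree = ("S",
        ("NP", ("PRP", "we")),
        ("VP", ("VBP", "eat"),
               ("NP", ("DT", "the"), ("NN", "cheese"), ("NN", "sandwich"))))

root, deps = to_dependencies(tree)
print(root, deps)
```

In this toy run the object noun phrase is headed by "sandwich" and the sentence by "eat", so "we" and "sandwich" both attach to "eat".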

9 Unlabeled Dependency Identification
[Figure: the sentence "We eat the cheese sandwich", with the head words "sandwich" and "eat" marked]

10 Domain Issues
- Parser training data is in a very different domain
  - WSJ vs. parent-child dialogs
- Domain-specific training data would be better
  - But would have to be created (manually)
- Performance is acceptable
  - Shorter, simpler sentences
  - Unlabeled dependency accuracy
    - WSJ test data: 92%
    - CHILDES data (2,000 words): 90%
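Unlabeled dependency accuracy is the fraction of words whose predicted head matches the gold-standard head. A minimal sketch, assuming each analysis is given as a list with one head position per word (0 for the root); the function name and the example data are made up for illustration.

```python
def unlabeled_accuracy(gold_heads, predicted_heads):
    """Fraction of words whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("analyses must cover the same words")
    if not gold_heads:
        raise ValueError("no words to score")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# "we eat the cheese sandwich": head position of each word, 0 = root.
gold = [2, 0, 5, 5, 2]
predicted = [2, 0, 5, 2, 2]  # "cheese" wrongly attached to "eat"

print(unlabeled_accuracy(gold, predicted))
```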

11 Final Step: Dependency Labeling
- Training data is required
  - Labeling dependencies is easier than finding unlabeled dependencies
  - Less training data is needed for labeling than for full labeled dependency parsing
- Use a classifier
  - TiMBL (Daelemans et al., 2004)
  - Extract features from unlabeled dependency structure
  - GR labels are target classes
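TiMBL is a memory-based learner: it stores training instances and labels a new instance by looking at the most similar stored ones. The sketch below is not TiMBL and does not use its interface. It is a plain nearest-neighbor classifier with an unweighted overlap distance (the number of feature positions that differ), written only to show the idea. The function names are hypothetical, and apart from the first stored instance the examples are invented.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of feature positions where two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(instance, memory, k=1):
    """Label an instance by majority vote among its k nearest stored examples."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Stored (features, GR label) pairs. Features: dependent word, dependent POS,
# head word, head POS, direction, distance, lowest common constituent label.
memory = [
    (("we", "pro", "eat", "v", "before", 1, "S"), "SUBJ"),
    (("the", "det", "sandwich", "n", "before", 2, "NP"), "DET"),
    (("sandwich", "n", "eat", "v", "after", 3, "VP"), "OBJ"),
]

query = ("they", "pro", "see", "v", "before", 1, "S")
print(classify(query, memory))
```

The query shares five of seven feature values with the stored subject instance, so that instance is its nearest neighbor.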

12 Dependency Labeling

13 Features Used for GR Labeling
- Head and dependent words
  - Also their POS tags
- Whether the dependent comes before or after the head
- How far the dependent is from the head
- The label of the lowest node in the constituent tree that includes both the head and dependent

14 Features Used for GR Labeling
- Consider the words “we” and “eat”
- Features: we, pro, eat, v, before, 1, S
- Class: SUBJ
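The feature vector for a dependent-head pair can be assembled as sketched below. The nested-tuple tree format, the function names, and the POS tags given to words other than "we" and "eat" are assumptions made for this example. The lowest node covering both words is found by taking the smallest phrase span that contains both positions.

```python
def phrase_spans(tree):
    """List (label, first, last) for every phrase node, lowest nodes first."""
    spans = []
    position = [0]

    def visit(node):
        if isinstance(node[1], str):  # leaf: (pos_tag, word)
            position[0] += 1
            return position[0], position[0]
        bounds = [visit(child) for child in node[1:]]
        first, last = bounds[0][0], bounds[-1][1]
        spans.append((node[0], first, last))
        return first, last

    visit(tree)
    return spans


def lowest_common_label(tree, i, j):
    """Label of the smallest phrase that covers word positions i and j."""
    lo, hi = min(i, j), max(i, j)
    covering = [s for s in phrase_spans(tree) if s[1] <= lo and s[2] >= hi]
    return min(covering, key=lambda s: s[2] - s[1])[0]


def gr_features(words, tags, tree, dep, head):
    """Feature tuple for the dependency dep -> head (1-based positions)."""
    return (
        words[dep - 1], tags[dep - 1],
        words[head - 1], tags[head - 1],
        "before" if dep < head else "after",
        abs(head - dep),
        lowest_common_label(tree, dep, head),
    )


words = ["we", "eat", "the", "cheese", "sandwich"]
tags = ["pro", "v", "det", "n", "n"]
tree = ("S",
        ("NP", ("pro", "we")),
        ("VP", ("v", "eat"),
               ("NP", ("det", "the"), ("n", "cheese"), ("n", "sandwich"))))

print(gr_features(words, tags, tree, dep=1, head=2))
```

For "we" and "eat" this reproduces the feature list on the slide: we, pro, eat, v, before, 1, S.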

15 Good GR Labeling Results with Small Training Set
- 5,000 words for training
- 2,000 words for testing
- Accuracy of dependency labeling (on perfect dependencies): 91.4%
- Overall accuracy (Charniak parser + dependency labeling): 86.9%

16 Some GRs Are Easier Than Others
- Overall accuracy: 86.9%
- Easily identifiable GRs
  - DET, POBJ, INF, NEG: precision and recall above 98%
- Difficult GRs
  - COMP, XCOMP: below 65%
  - Less than 4% of the GRs seen in the training and test sets

17 Precision and Recall of Specific GRs
[Table: precision, recall, and F-score for SUBJ, OBJ, COORD, JCT, MOD, PRED, ROOT, COMP, and XCOMP; values not available]
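Per-GR precision, recall, and F-score follow the usual definitions. The sketch below assumes gold and predicted labels are aligned one-to-one over the same dependencies, as in the labeling-only setting; when the parser also chooses the heads, a prediction would count as correct only if both head and label match. The function name and the label sequences are invented for illustration.

```python
def per_label_prf(gold, predicted):
    """Return {label: (precision, recall, f_score)} for aligned label lists."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        n_predicted = sum(p == label for p in predicted)
        n_gold = sum(g == label for g in gold)
        precision = tp / n_predicted if n_predicted else 0.0
        recall = tp / n_gold if n_gold else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f_score)
    return scores


gold = ["SUBJ", "OBJ", "COMP", "SUBJ", "XCOMP", "OBJ"]
predicted = ["SUBJ", "OBJ", "OBJ", "SUBJ", "COMP", "OBJ"]

scores = per_label_prf(gold, predicted)
for label, (p, r, f) in scores.items():
    print(f"{label:6s} P={p:.2f} R={r:.2f} F={f:.2f}")
```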

18 Index of Productive Syntax (IPSyn) (Scarborough, 1990)
- A measure of child language development
- Assigns a numerical score for grammatical complexity (from 0 to 112 points)
- Used in hundreds of studies

19 IPSyn Measures Syntactic Development
- IPSyn: designed for investigating differences in language acquisition
  - Differences in groups (for example: bilingual children)
  - Individual differences (for example: delayed language development)
  - Focus on syntax
- Addresses weaknesses of Mean Length of Utterance (MLU)
  - MLU surprisingly useful until age 3, then reaches ceiling (or becomes unreliable)
- IPSyn is very time-consuming to compute

20 IPSyn Is More Informative Than MLU in Children Over Age 3yrs

21 Computing IPSyn (manually)
- Corpus of 100 transcribed utterances
  - Consecutive, no repetitions
- Identify 56 specific language structures (IPSyn items)
  - Examples:
    - Presence of auxiliaries or modals
    - Inverted auxiliary in a wh-question
    - Conjoined clauses
    - Fronted or center-embedded subordinate clauses
- Count occurrences (zero, one, two or more)
- Add counts
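The arithmetic of the final two steps can be written out directly: each item contributes 0, 1, or 2 points depending on how many times it was found, and the item scores are summed. With 56 items the maximum is 56 × 2 = 112. In the sketch below the item names and the function name are hypothetical, and finding the occurrences is assumed to have been done already.

```python
N_ITEMS = 56
MAX_POINTS_PER_ITEM = 2


def ipsyn_score(item_counts):
    """Sum per-item points, crediting at most two occurrences per item."""
    if len(item_counts) > N_ITEMS:
        raise ValueError("IPSyn has only 56 items")
    return sum(min(count, MAX_POINTS_PER_ITEM) for count in item_counts.values())


# Occurrences found in a 100-utterance sample (invented numbers).
counts = {
    "aux_or_modal": 5,              # two or more -> 2 points
    "inverted_aux_wh_question": 1,  # one -> 1 point
    "conjoined_clauses": 0,         # none -> 0 points
}

print(ipsyn_score(counts))
```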

22 Automating IPSyn
- Existing state of manual computation
  - Spreadsheets
  - Search each sentence for language structures
  - Use part-of-speech tagging to narrow down the number of sentences for certain structures
    - For example: Verb + Noun, Determiner + Adjective + Noun
- Can’t we just use part-of-speech tagging?
  - Only one other automated implementation of IPSyn exists, and it uses only words and POS tags

23 Automating IPSyn without Syntactic Analysis
- Use patterns of words and parts-of-speech to find language structures
  - Computerized Profiling, or CP (Long, Fey and Channell, 2004)
- Works well for many IPSyn items
  - Det + Adjective + Noun sequence
- But does not work very well for several important items
  - Fronted or center-embedded subordinate clauses
  - Inverted auxiliary in a wh-question
- Cuts down manual work significantly (good)
- Fully automatic IPSyn scores only somewhat accurate (not so good)
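A POS-sequence pattern such as Det + Adjective + Noun amounts to a sliding-window match over the tag sequence. The sketch below illustrates that idea only; it is not how Computerized Profiling is implemented, and the function name and the tag names are assumptions.

```python
def find_pos_sequence(tagged_utterance, pattern):
    """Return the start indices where the POS pattern occurs contiguously."""
    tags = [tag for _, tag in tagged_utterance]
    width = len(pattern)
    return [
        start
        for start in range(len(tags) - width + 1)
        if tags[start:start + width] == list(pattern)
    ]


utterance = [("I", "pro"), ("want", "v"), ("the", "det"),
             ("big", "adj"), ("truck", "n")]

print(find_pos_sequence(utterance, ("det", "adj", "n")))
```

A flat pattern like this has no way to tell where a clause starts or ends, which is why items such as fronted or center-embedded subordinate clauses are hard to capture this way.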

24 Some IPSyn Items Require Syntactic Analysis for Reliable Recognition (and some don’t)
- Determiner + Adjective + Noun
- Auxiliary verb
- Adverb modifying adjective or nominal
- Subject + Verb + Object
- Sentence with 3 clauses
- Conjoined sentences
- Wh-question with inverted auxiliary/modal/copula
- Relative clauses
- Propositional complements
- Fronted subordinate clauses
- Center-embedded clauses

25 Automating IPSyn with Grammatical Relation Analyses
- Search for language structures using patterns that involve POS tags and GRs (labeled dependencies)
- Still room for under- and over-generalization, but patterns are easier to write and more reliable
- Examples
  - Wh-embedded clauses: search for wh-words whose head (or transitive head) is a dependent in a GR of types [XC]SUBJ, [XC]PRED, [XC]JCT, [XC]MOD, COMP or XCOMP
  - Relative clauses: search for a CMOD where the dependent is to the right of the head
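The two example patterns can be sketched over GR triples as follows. Several things here are assumptions for illustration: GRs are given as (dependent, head, label) triples with 1-based positions and 0 for the root, [XC]SUBJ is read as "XSUBJ or CSUBJ" (and likewise for PRED, JCT, MOD), the wh-word list is a small hand-picked set, and the example analyses are written by hand rather than produced by the system.

```python
EMBEDDED_LABELS = {
    "XSUBJ", "CSUBJ", "XPRED", "CPRED", "XJCT", "CJCT",
    "XMOD", "CMOD", "COMP", "XCOMP",
}
WH_WORDS = {"what", "who", "where", "when", "why", "how", "which"}


def has_relative_clause(grs):
    """True if some CMOD dependent stands to the right of its head."""
    return any(label == "CMOD" and dep > head for dep, head, label in grs)


def has_wh_embedded_clause(words, grs):
    """True if a wh-word's head, or any head above it, is the dependent
    in a GR whose label marks an embedded clause."""
    head_of = {dep: (head, label) for dep, head, label in grs}
    for position, word in enumerate(words, start=1):
        if word.lower() not in WH_WORDS:
            continue
        node = head_of.get(position, (0, None))[0]
        seen = set()
        while node and node not in seen:  # climb the chain of heads
            seen.add(node)
            head, label = head_of.get(node, (0, None))
            if label in EMBEDDED_LABELS:
                return True
            node = head
    return False


# "This is the car I saw."
relative = [(1, 2, "SUBJ"), (2, 0, "ROOT"), (3, 4, "DET"),
            (4, 2, "PRED"), (5, 6, "SUBJ"), (6, 4, "CMOD")]

# "I know what you want."
embedded_words = ["I", "know", "what", "you", "want"]
embedded = [(1, 2, "SUBJ"), (2, 0, "ROOT"), (3, 5, "OBJ"),
            (4, 5, "SUBJ"), (5, 2, "COMP")]

# "What do you want?" (a main-clause question, not an embedded clause)
question_words = ["what", "do", "you", "want"]
question = [(1, 4, "OBJ"), (2, 4, "AUX"), (3, 4, "SUBJ"), (4, 0, "ROOT")]

print(has_relative_clause(relative))
print(has_wh_embedded_clause(embedded_words, embedded))
print(has_wh_embedded_clause(question_words, question))
```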

26 Evaluation Data
- Two sets of transcripts with IPSyn scoring from two different child language research groups
- Set A
  - Scored fully manually
  - 20 transcripts
  - Ages: about 3 yrs.
- Set B
  - Scored with CP first, then manually corrected
  - 25 transcripts
  - Ages: about 8 yrs.
(Two transcripts in each set were held out for development and debugging)

27 Evaluation Metrics: Point Difference
- Point difference
  - The absolute point difference between the scores provided by our system and the scores computed manually
  - Simple, and shows how close the automatic scores are to the manual scores
- Acceptable range
  - Smaller for older children

28 Evaluation Metrics: Point-to-Point Accuracy
- Point-to-point accuracy
  - Reflects overall reliability over each scoring decision made in the computation of IPSyn scores
  - Scoring decisions: presence or absence of language structures in the transcript

$$\text{Point-to-Point Acc} = \frac{C(\text{Correct Decisions})}{C(\text{Total Decisions})}$$

- Commonly used for assessing inter-rater reliability among human scorers (for IPSyn, about 94%)
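Both evaluation metrics are simple to compute once the scores and the individual scoring decisions are available. In the sketch below each decision is modeled as a boolean (structure credited or not), which is an assumption about the data layout; the function names and the numbers are invented for illustration.

```python
def point_difference(system_score, manual_score):
    """Absolute difference between the automatic and manual IPSyn scores."""
    return abs(system_score - manual_score)


def point_to_point_accuracy(system_decisions, manual_decisions):
    """C(correct decisions) / C(total decisions) over aligned decisions."""
    if len(system_decisions) != len(manual_decisions):
        raise ValueError("decision lists must be aligned")
    if not manual_decisions:
        raise ValueError("no decisions to compare")
    correct = sum(s == m for s, m in zip(system_decisions, manual_decisions))
    return correct / len(manual_decisions)


system = [True, True, False, True]
manual = [True, False, False, True]

print(point_difference(78, 81))
print(point_to_point_accuracy(system, manual))
```

Two transcripts can have the same total score while disagreeing on individual decisions, so the point difference and the point-to-point accuracy are complementary.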

29 Results
- IPSyn scores from
  - Our GR-based system (GR)
  - Manual scoring (HUMAN)
  - Computerized Profiling (CP)

30 GR-based IPSyn Is Quite Accurate
[Table: average point difference to HUMAN and point-to-point reliability (%) for GR and CP, in total and separately for set A and set B; values not available]

31 Comparing Our GR-IPSyn and CP-IPSyn

32 Error Analysis: Four Problematic Items Cause Half of the Errors
- Four (of 56) IPSyn items account for about half of all mistakes made by our GR-based system
  (a) Propositional complement: 16.9%. “I said you can go now.”
  (b) Copula/Modal/Aux for emphasis or ellipsis: 12.3%. “I thought he ate his cake, but he didn’t.”
  (c) Relative clause: 10.6%. “This is the car I saw.”
  (d) Bitransitive predicate: 5.8%. “I gave her the book.”
- (a), (c), (d): Incorrect GR analysis
- (b): Imperfect search pattern

33 Conclusion and Future Work
- We can annotate transcripts of child language with Grammatical Relations using current NLP tools and a small amount of manually annotated data
- The reliability of an automated version of IPSyn that uses CHILDES GRs is close to that of human scoring
- GR analysis still needs work
  - More training data
  - Other parsing techniques
- Use of GR-based IPSyn by child language researchers should reveal additional problem areas

34 References
Charniak, E. 2000. A maximum-entropy-inspired parser. Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics. Seattle, WA.
Daelemans, W., Zavrel, J., van der Sloot, K., and van den Bosch, A. 2004. TiMBL: Tilburg Memory Based Learner, version 5.1, Reference Guide. ILK Research Group Technical Report Series.
Long, S. H., Fey, M. E., and Channell, R. W. 2004. Computerized Profiling (version 9.6.0). Cleveland, OH: Case Western Reserve University.
MacWhinney, B. 2000. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum Associates.
Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19.
Nivre, J., and Scholz, M. 2004. Deterministic dependency parsing of English text. Proceedings of the International Conference on Computational Linguistics. Geneva, Switzerland.
Sagae, K., MacWhinney, B., and Lavie, A. 2004. Adding syntactic annotations to transcripts of parent-child dialogs. Proceedings of the Fourth International Conference on Language Resources and Evaluation. Lisbon, Portugal.
Scarborough, H. S. 1990. Index of Productive Syntax. Applied Psycholinguistics, 11.

35 Where POS Tagging Is Not Enough
- Sentences with the same POS sequence may have different structure
  (a) Before [,] he told the man he was cold.
  (b) Before he told the story [,] he was cold.
- Some syntactic structures are difficult to recognize using only POS tags and words
- Search patterns may under- and over-generate
- Using syntactic analysis is easier and more reliable