Parsing & Language Acquisition: Parsing Child Language Data CMSC 35100 Natural Language Processing February 7, 2006

Roadmap
Motivation
–Understanding language acquisition: child and child-directed speech
Challenges
–Casual conversation; ungrammatical speech
Robust parsing
–Dependency structure & dependency labels
Child language assessment
–Contrast with human & automatic scoring
–Error analysis
Conclusions

Motivation: Child-directed Speech
View child as learner
Child-directed speech
–Known to differ significantly from adult-directed speech, e.g. in acoustic-prosodic measures: slower rate, increased duration, more pauses, wider pitch range and height
–Children attend more carefully
Input to learner
–Track vocabulary, morphology, phonology, syntax
–Relate exposure to acquisition

Motivation: Child Speech
Insight into the language acquisition process
–Assess linguistic development
–Evaluate hypotheses for acquisition, e.g. Rizzi's "truncated structure hypothesis"
–Phonology, morphology, lexicon, syntax, etc.

Focus: Syntax
Prior research emphasizes:
–Lexicon, phonology, morphology
–(Relatively) easy to extract from recordings, manual transcriptions
Syntactic development
–Significant markers in linguistic development: acquisition of inflection, agreement, aux-inversion, etc.
–Rich source of information

Challenges
Analyzing syntactic input & development
–Requires parsing of child and child-directed speech
–Manual analysis prohibitively costly at scale
Resources for training: treebanks
–Newspaper text, adult conversational speech
Current speech:
–Child-directed: very conversational; ellipsis, vocatives, onomatopoeia
–Child: fragments, possibly ungrammatical

Resources
CHILDES
–Corpus (100s of MB) of child language data, in many languages
–All manually transcribed; marked for disfluency, repetition, retracing
–Some manually morphologically analyzed and POS-tagged
–Some audio/video available
–Not syntactically analyzed
Specialized morphological analyzer, POS tagger

Examples
*CHI: more cookie .
%mor: qn|more n|cookie .
*MOT: how about another graham cracker ?
%mor: adv:wh|how prep|about^adv|about det|another n|graham n|cracker ?
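The %mor tier above encodes one analysis per token, with `|` separating the POS tag from the form and `^` separating ambiguous readings. A minimal sketch of reading this tier (a hypothetical helper, not part of the CHILDES tooling):

```python
def parse_mor_tier(tier):
    """Parse a %mor tier like 'qn|more n|cookie .' into token analyses.

    Each token becomes a list of (POS, form) alternatives; '^' separates
    ambiguous readings and '|' separates the POS tag from the form.
    """
    tokens = []
    for item in tier.split():
        if item in {".", "?", "!"}:          # utterance terminator
            continue
        readings = []
        for reading in item.split("^"):
            pos, _, form = reading.partition("|")
            readings.append((pos, form))
        tokens.append(readings)
    return tokens

print(parse_mor_tier("qn|more n|cookie ."))
# → [[('qn', 'more')], [('n', 'cookie')]]
```

For the ambiguous token, `parse_mor_tier("prep|about^adv|about")` yields both readings of "about" for a downstream disambiguator to resolve.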

Parsing for Assessment
Syntactic analysis of child speech
–Assign to a particular developmental level
–Based on presence and frequency of constructions
Measures:
–MLU (mean length of utterance): reaches ceiling around age 3, then uninformative
–IPSyn: explicitly measures syntactic structure
–Scores 100 utterances on 56 structures: NP, VP, questions, sentence structure
–0 = absent; 1 = found once; 2 = found more than once
–Some structures identifiable from POS/morphology and patterns; others not: aux-inversion, conjunction, subordinate clauses, etc.
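The IPSyn scheme above reduces to a simple tally once the per-structure detectors exist: each of the 56 structures earns 0, 1, or 2 points. A hedged sketch, with occurrence counts standing in for the detectors:

```python
def ipsyn_score(structure_counts):
    """Total IPSyn points: each structure scores min(occurrences, 2).

    structure_counts: mapping from structure name to number of times
    that structure was found in the 100-utterance sample.
    """
    return sum(min(count, 2) for count in structure_counts.values())

# Invented counts for three structures, for illustration only:
counts = {"NP:det+noun": 7, "VP:aux-inversion": 1, "S:subord-clause": 0}
print(ipsyn_score(counts))  # → 3  (2 + 1 + 0)
```

The hard part, as the slides note, is detecting the structures themselves; the scoring is trivial once detection is done.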

Syntactic Analysis Approach
Extract grammatical relations (GRs)
–Analyze each sentence into labeled dependencies
–Labels: Aux, Neg, Det, Inf, (ECX)Subj, Obj, Pred, Mod, (CX)Jct, (X)Comp, etc.
Decomposition:
–Text processing: remove tagged disfluencies, repetitions, retracings; morphological analysis and (specialized) POS tagging
–Unlabeled dependency analysis
–Dependency labeling
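The three-stage decomposition can be sketched as a pipeline of stage functions; the stand-ins below are toys invented to show the data flow, not the real cleaner, tagger, parser, or labeler:

```python
def analyze_grs(utterance, clean, tag, parse, label):
    """Compose the stages: text processing, unlabeled parsing, labeling."""
    tokens = clean(utterance)   # strip disfluencies, repetitions, retracings
    tagged = tag(tokens)        # morphological analysis + POS tagging
    deps = parse(tagged)        # unlabeled (dependent, head) index pairs
    return label(deps, tagged)  # labeled grammatical relations

# Toy stand-ins for each stage:
clean = lambda u: [w for w in u.split() if w != "uh"]            # drop a filler
tag   = lambda ts: [(w, "n" if w.endswith("ie") else "v") for w in ts]
parse = lambda tagged: [(1, 0)]                                  # dep -> head
label = lambda deps, tagged: [(tagged[d][0], tagged[h][0], "OBJ")
                              for d, h in deps]

print(analyze_grs("want uh cookie", clean, tag, parse, label))
# → [('cookie', 'want', 'OBJ')]
```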

Unlabeled Dependency Parsing
Identify unlabeled dependencies
–Parse with the Charniak parser, trained on the Penn Treebank
–Convert to dependencies based on a head table
Different domain: 90.1% accuracy vs. 92% on WSJ
–Sentences are shorter! (< 15 words)
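Head-table conversion works by picking a head child for each constituent and attaching every other child's head word to it. A sketch with a hypothetical (much reduced) head-rule table, not the actual table used with the Charniak parser:

```python
# Hypothetical head rules: for each phrase label, child labels to try
# in priority order when choosing the head child.
HEAD_RULES = {"S": ["VP", "NP"], "VP": ["VBD", "VB", "VP"], "NP": ["NN", "NP"]}

def find_head(label, children):
    """Index of the head child, by rule; default to the rightmost child."""
    for cand in HEAD_RULES.get(label, []):
        for i, (child_label, _) in enumerate(children):
            if child_label == cand:
                return i
    return len(children) - 1

def to_dependencies(tree, deps):
    """tree: (POS, word) leaf or (label, [children]). Appends
    (dependent, head) word pairs to deps; returns the head word."""
    label, rest = tree
    if isinstance(rest, str):                 # leaf
        return rest
    heads = [to_dependencies(child, deps) for child in rest]
    h = find_head(label, rest)
    for i, word in enumerate(heads):
        if i != h:
            deps.append((word, heads[h]))     # non-head attaches to head
    return heads[h]

tree = ("S", [("NP", [("NN", "baby")]), ("VP", [("VBD", "slept")])])
deps = []
print(to_dependencies(tree, deps), deps)
# → slept [('baby', 'slept')]
```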

Dependency Labeling
Assign labels to the dependency structure
–Easier than finding the dependency structure itself
–Labels more separable
30-way classification:
–Train on 5K words with manual dependency labels
–TiMBL; features include head & dependent words, POS, linear order, separation, label of the subsuming dependency
91% label accuracy; parse + labels: ~87%
–Some labels trivial: Det, Inf at 98%; (X)Comp ~60%
–Competitive
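TiMBL is essentially k-nearest-neighbor classification over symbolic features, so a 1-NN with a simple overlap metric illustrates the labeling step. The feature tuples and memory cases below are invented for illustration:

```python
def overlap(a, b):
    """Count of matching feature positions (a crude similarity metric)."""
    return sum(x == y for x, y in zip(a, b))

def label_dependency(features, memory):
    """Memory-based labeling: return the label of the most similar
    stored case. memory: list of (feature_tuple, label) pairs."""
    return max(memory, key=lambda case: overlap(features, case[0]))[1]

# Feature tuple: (head word, dependent word, head POS, dep POS, order)
memory = [
    (("eat", "cookie", "v", "n", "after"), "OBJ"),
    (("eat", "baby", "v", "n", "before"), "SUBJ"),
    (("cookie", "the", "n", "det", "before"), "DET"),
]
print(label_dependency(("want", "juice", "v", "n", "after"), memory))
# → OBJ
```

A real run would use the full feature set from the slide (word forms, POS, separation, the subsuming dependency's label) and k > 1 with feature weighting.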

Automating Assessment
Prior work: Computerized Profiling (CP)
–Exploits word/POS patterns
–Limited for older children with more sophisticated language
–Generally used in a semi-automatic approach
Syntactic analysis improves matters:
–"Before, he told the man he was cold." vs. "Before he told the story, he was cold."
–Same POS pattern, similar words, but different structure
–GRs distinguish them by clausal type (e.g. COMP) and direction of the dependency
Construct syntactically informed patterns

Evaluation
Point difference:
–Unsigned difference in total scores (manual vs. automatic)
Point-to-point accuracy:
–Number of correct language-structure decisions divided by total number of decisions
Test data:
–A) 20 transcripts, ages 2-3; manually scored; MLU 2.9
–B) 25 transcripts, ages 8-9; semi-automatic CP scoring; MLU 7.0
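The two metrics defined above can be computed directly from per-structure scores; the score dictionaries below are invented for illustration:

```python
def point_difference(manual, auto):
    """Unsigned difference between total IPSyn scores."""
    return abs(sum(manual.values()) - sum(auto.values()))

def point_to_point(manual, auto):
    """Fraction of per-structure scoring decisions that agree."""
    agree = sum(manual[s] == auto[s] for s in manual)
    return agree / len(manual)

manual = {"NP1": 2, "VP3": 1, "S5": 0, "Q2": 2}
auto   = {"NP1": 2, "VP3": 2, "S5": 0, "Q2": 2}
print(point_difference(manual, auto))  # → 1
print(point_to_point(manual, auto))    # → 0.75
```

Note that errors can cancel in the point difference (one structure over-scored, another under-scored), which is why point-to-point accuracy is reported alongside it.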

Results
Contrast with human assessment and with CP
–Point difference: 3.3 for GR-based scoring vs. 8.3 for CP
–CP worse on older children: 6.2 vs. 10.2
–Less effective on more complex sentences
–Point-to-point accuracy: 92% GR vs. ~85% CP
–No pattern of misses vs. false detections
GR-based automatic scoring: high agreement

Error Analysis
4 of 56 IPSyn structures account for 50% of errors
–Propositional complements, relative clauses, bitransitive predicates, emphasis/ellipsis
Causes:
–Syntactic analysis errors, especially COMP (the least accurate label)
–Bad search patterns: more reliable than POS/word patterns, but still hand-crafted

Conclusion
Automatic analysis of child language data
–Syntax, beyond morphology & POS
Two-phase dependency analysis
–Unlabeled structure, followed by label assignment
–Accurate even with out-of-domain training
Enables more nuanced assessment
–Especially as learner syntax becomes complex