Download presentation
Presentation is loading. Please wait.
Published byRosa Chambers Modified over 8 years ago
1
Parsing & Language Acquisition: Parsing Child Language Data CSMC 35100 Natural Language Processing February 7, 2006
2
Roadmap Motivation –Understanding language acquisition Child and child-directed Challenges –Casual conversation; ungrammatical speech Robust parsing –Dependency structure & dependency labels Child language assessment –Contrast w/human & auto –Error analysis Conclusions
3
Motivation: Child-directed Speech View child as learner Child-directed speech –Known to differ significantly from adult-directed E.g. acoustic-prosodic measures –Slower, increased duration, pause, pitch range, height Children attend more carefully –Input to learner Track vocabulary, morphology, phonology, syntax –Relate exposure to acquisition
4
Motivation: Child Speech Insight into language acquisition process –Assess linguistic development –Evaluate hypotheses for acquisition E.g. Rizzi’s “truncated structure hypothesis” –Phonology, morphology, lexicon, syntax,etc
5
Focus: Syntax Prior research emphasizes: –Lexicon, phonology, morphology –(Relatively) Easy to extract from Recordings, manual transcriptions Syntactic development –Significant markers in linguistic development Acquisition of inflection, agreement, aux-inversion,. –Rich source of information
6
Challenges Analyzing syntactic input & development –Requires parsing of child- and child-directed speech –Manual analysis prohibitively costly for lots Resources for training: Treebanks –Newspaper, adult conversational speech Current speech: –Child-directed: very conversational, ellipsis, Vocatives, onomatopoeia –Child: fragments, possibly ungrammatical
7
Resources CHILDES –Corpus (100s MB) of child language data Many languages –All manually transcribed Marked for disfluency, repetition, retracing Some manually morphologically analyzed, POS Some audio/video available –Not syntactically analyzed Specialized morphological analyzer, POS
8
Examples *CHI: more cookie. %mor: qn|more n|cookie. *MOT: how about another graham cracker? %mor: adv:wh|how prep|about^adv|about det|another n|graham n|cracker?
9
Parsing for Assessment Syntactic analysis of child speech –Assign to particular developmental level –Based on presence, frequency of constructions Measures: –MLU – Mean length of utterance Reaches ceiling around age 3, uninformative –IPSyn – Explicitly measures syntactic structure Scores 100 utts on 56 structures –NP,VP, questions, sentence structure –0=absent; 1=found once; 2=found > once Some identifiable from POS/morph and patterns –Others not: aux-inversion, conjunction, subord clauses, etc
10
Syntactic Analysis Approach Extract grammatical relations (GRs) –Analyze sentence to labeled dependencies Aux, Neg, Det, Inf, (ECX)Subj, Objs, Preds, Mods (CX)Jct, (X)Comp, etc Decomposition: –Text processing: Removes tagged disfluencies, reps, retracings Morphological analysis, POS tagging (special) –Unlabeled dependency analysis –Dependency labeling
11
Unlabeled Dependency Parsing Identify unlabeled dependency –Parse with Charniak parser Trained on Penn treebank –Convert to dependencies based on head table Different domain: 90.1% vs 92% WSJ –Shorter! < 15 wds
12
Dependency Labeling Assign labels to dependency structure –Easier than finding dep structure itself –Labels more separable 30 way classification: –Train on 5K words w/manual dependency labels TiMBL, features include: –Head & dep words, POS; order, separation, label of subsumer 91% accuracy Parse+labels: ~87% –Some trivial: Det, INF: 98%; (X)Comp: ~60% – Competitive
13
Automating Assessment Prior work: Computerized Profiling (CP) –Exploits word/POS patterns –Limited for older children; more sophisticated Generally used in semi-automatic approach Syntactic analysis improves Before, he told the man he was cold. Before he told the story, he was cold –Same POS pattern, similar words; diff’t structure –GR with clausal type (e.g. COMP), dep left Construct syntactically informed patterns
14
Evaluation Point difference: –Unsigned difference in scores (man vs auto) Point-to-point accuracy: –Number of language structure decisions correct Divided by number of decisions Test data: –A) 20 trans: ages 2-3; manual; MLU 2.9 –B) 25 trans: ages 8-9; semi-auto CP; MLU 7.0
15
Results Contrast w/ human assessment; CP –Point difference: 3.3 total GR; 8.3 CP CP worse on older children: 6.2 vs 10.2 –Less effective on more complex sentences –Point-to-point: 92% GR; ~85% CP –No pattern of miss vs false detect GR automatic scoring: high agreement
16
Error Analysis 4 of 56 IPSyn structures -> 50% of errors –Propositional complements, rel. clauses, bitransitive predicates Result of syntactic analysis errors –Esp. COMP – least accurate –Emphasis/Ellipsis Bad search patterns –More reliable than POS/word, but still hand-crafted
17
Conclusion Automatic analysis of child language data –Syntax, beyond morphology & POS Two-phrase dependency analysis –Unlabeled structure, followed by label assignment –Accurate even with out-of-domain training Enables more nuanced assessment –Especially as learner syntax becomes complex
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.