
1 Parsing & Language Acquisition: Parsing Child Language Data CSMC 35100 Natural Language Processing February 7, 2006

2 Roadmap
Motivation
– Understanding language acquisition: child and child-directed speech
Challenges
– Casual conversation; ungrammatical speech
Robust parsing
– Dependency structure & dependency labels
Child language assessment
– Contrast with human & automatic scoring
– Error analysis
Conclusions

3 Motivation: Child-directed Speech
View the child as a learner
Child-directed speech
– Known to differ significantly from adult-directed speech
– E.g. acoustic-prosodic measures: slower, increased duration, pauses, wider pitch range and height
– Children attend to it more carefully
Input to the learner
– Track vocabulary, morphology, phonology, syntax
– Relate exposure to acquisition

4 Motivation: Child Speech
Insight into the language acquisition process
– Assess linguistic development
– Evaluate hypotheses for acquisition, e.g. Rizzi's "truncated structure hypothesis"
– Phonology, morphology, lexicon, syntax, etc.

5 Focus: Syntax
Prior research emphasizes:
– Lexicon, phonology, morphology
– (Relatively) easy to extract from recordings, manual transcriptions
Syntactic development
– Significant markers in linguistic development: acquisition of inflection, agreement, aux-inversion, etc.
– Rich source of information

6 Challenges
Analyzing syntactic input & development
– Requires parsing of child and child-directed speech
– Manual analysis prohibitively costly for large amounts of data
Resources for training: treebanks
– Newspaper text, adult conversational speech
Current speech:
– Child-directed: very conversational; ellipsis, vocatives, onomatopoeia
– Child: fragments, possibly ungrammatical

7 Resources
CHILDES
– Corpus (100s of MB) of child language data, many languages
– All manually transcribed; marked for disfluency, repetition, retracing
– Some manually morphologically analyzed and POS-tagged
– Some audio/video available
– Not syntactically analyzed
– Specialized morphological analyzer, POS tagger

8 Examples
*CHI: more cookie.
%mor: qn|more n|cookie.
*MOT: how about another graham cracker?
%mor: adv:wh|how prep|about^adv|about det|another n|graham n|cracker?
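
A minimal sketch (not part of the original slides) of reading the %mor tier shown above: each token is a pos|form pair, with "^" separating alternative analyses of the same word, as in prep|about^adv|about.

# Minimal sketch: split a CHAT %mor tier into (POS, form) readings.
# Assumes only the conventions visible in the example above: "pos|form"
# tokens, with "^" separating alternative analyses of the same word.

def parse_mor_tier(tier: str):
    """Return, for each token, a list of (pos, form) readings."""
    readings = []
    for token in tier.rstrip(".?!").split():
        alternatives = []
        for analysis in token.split("^"):           # e.g. "prep|about^adv|about"
            pos, _, form = analysis.partition("|")  # e.g. ("prep", "about")
            alternatives.append((pos, form))
        readings.append(alternatives)
    return readings

print(parse_mor_tier("adv:wh|how prep|about^adv|about det|another n|graham n|cracker?"))
# [[('adv:wh', 'how')], [('prep', 'about'), ('adv', 'about')],
#  [('det', 'another')], [('n', 'graham')], [('n', 'cracker')]]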

9 Parsing for Assessment
Syntactic analysis of child speech
– Assign to a particular developmental level
– Based on presence and frequency of constructions
Measures:
– MLU (Mean Length of Utterance): reaches ceiling around age 3, then uninformative
– IPSyn: explicitly measures syntactic structure
  Scores 100 utterances on 56 structures: NP, VP, questions, sentence structure
  0 = absent; 1 = found once; 2 = found more than once
  Some structures identifiable from POS/morphology and word patterns; others not: aux-inversion, conjunction, subordinate clauses, etc.
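
Both measures are simple to state; the sketch below is illustrative only (real MLU is usually counted in morphemes, and this scorer only mimics the IPSyn 0/1/2 credit rule, not the official procedure).

# Illustrative sketch of the two measures described above. MLU here just
# counts %mor tokens per utterance; the IPSyn-style scorer applies the
# 0/1/2 rule (0 = absent, 1 = found once, 2 = found more than once).

from collections import Counter

def mlu(mor_tiers):
    """Mean length of utterance over a list of %mor tiers."""
    lengths = [len(t.rstrip(".?!").split()) for t in mor_tiers]
    return sum(lengths) / len(lengths)

def ipsyn_style_score(structure_hits):
    """structure_hits: one entry per detected structure occurrence.
    Each structure contributes at most 2 points to the total."""
    counts = Counter(structure_hits)
    return sum(min(c, 2) for c in counts.values())

print(mlu(["qn|more n|cookie .", "n|cookie ."]))                          # (2 + 1) / 2 = 1.5
print(ipsyn_style_score(["NP_det_noun", "NP_det_noun", "aux_inversion"]))  # 2 + 1 = 3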

10 Syntactic Analysis Approach
Extract grammatical relations (GRs)
– Analyze each sentence into labeled dependencies: Aux, Neg, Det, Inf, (ECX)Subj, Objs, Preds, Mods, (CX)Jct, (X)Comp, etc.
Decomposition:
– Text processing (sketched below): removes tagged disfluencies, repetitions, retracings; morphological analysis, (specialized) POS tagging
– Unlabeled dependency analysis
– Dependency labeling
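
The text-processing step can be pictured as a small normalization pass over CHAT-annotated utterances. This is a simplified sketch: real CHAT markup is richer, and only a few common markers ([/] repetition, [//] retracing, &-prefixed fillers, <...> scoping) are handled here.

# Simplified sketch of the text-processing step: strip filled pauses and
# repeated/retraced material from a CHAT-style utterance before parsing.

import re

def clean_chat_utterance(line: str) -> str:
    line = re.sub(r"<[^>]*>\s*\[//?\]", "", line)   # drop scoped repeated/retraced spans
    line = re.sub(r"\S+\s*\[//?\]", "", line)       # drop a single repeated/retraced word
    line = re.sub(r"&\S+", "", line)                # drop fillers such as &uh, &-um
    return " ".join(line.split())                   # normalize whitespace

print(clean_chat_utterance("&uh <I want> [//] I need more cookie ."))
# -> "I need more cookie ."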

11 Unlabeled Dependency Parsing
Identify unlabeled dependencies
– Parse with the Charniak parser, trained on the Penn Treebank
– Convert constituency output to dependencies based on a head table (sketched below)
– Different domain: 90.1% vs. 92% on WSJ
– Sentences are shorter: < 15 words
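
The constituency-to-dependency conversion can be sketched with a small head-percolation table over nltk trees. The table entries, the example tree, and the output are illustrative only, not the actual head table applied to the Charniak parser output in this work.

# Sketch of head-table conversion: pick a head child for each constituent,
# percolate lexical heads upward, and attach each non-head child's head word
# to the head word of its parent constituent.

from nltk import Tree

HEAD_TABLE = {                     # phrase label -> preferred head-child labels
    "S":  ["VP", "NP"],
    "VP": ["VBD", "VBZ", "VBP", "VB", "MD", "VP"],
    "NP": ["NN", "NNS", "PRP", "NP"],
    "PP": ["IN"],
}

def head_child_index(tree):
    """Index of the head child per the head table; default to the leftmost child."""
    for label in HEAD_TABLE.get(tree.label(), []):
        for i, child in enumerate(tree):
            if isinstance(child, Tree) and child.label() == label:
                return i
    return 0

def to_dependencies(tree, deps):
    """Percolate lexical heads bottom-up, collecting (head_word, dependent_word)
    pairs for every non-head child. Returns the head word of `tree`."""
    if tree.height() == 2:                          # preterminal: POS over one word
        return tree[0]
    child_heads = [to_dependencies(child, deps) for child in tree]
    h = head_child_index(tree)
    for i, dep_head in enumerate(child_heads):
        if i != h:
            deps.append((child_heads[h], dep_head))
    return child_heads[h]

deps = []
parse = Tree.fromstring("(S (NP (PRP I)) (VP (VBP want) (NP (JJR more) (NN cookie))))")
print(to_dependencies(parse, deps), deps)
# want [('cookie', 'more'), ('want', 'cookie'), ('want', 'I')]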

12 Dependency Labeling
Assign labels to the dependency structure
– Easier than finding the dependency structure itself
– Labels more separable
30-way classification:
– Train on 5K words with manual dependency labels
– TiMBL; features include head & dependent words and POS, order, separation, label of the subsumer
– 91% labeling accuracy; parse + labels: ~87%
– Some labels trivial: Det, Inf: 98%; (X)Comp: ~60%
– Competitive
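
TiMBL is a memory-based (nearest-neighbour) learner, so a rough stand-in for the labeling step is a k-NN classifier over the feature types listed above: head and dependent words and POS, linear order, and separation. The feature names, toy training examples, and use of scikit-learn here are illustrative, not the actual setup.

# Rough stand-in for the memory-based dependency-labeling step. The toy
# training data below is illustrative; the real classifier was trained on
# ~5K words with manual GR labels and predicts one of ~30 labels.

from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def gr_features(head, dep, head_pos, dep_pos, head_idx, dep_idx):
    return {
        "head_word": head, "dep_word": dep,
        "head_pos": head_pos, "dep_pos": dep_pos,
        "dep_before_head": dep_idx < head_idx,   # linear order
        "distance": abs(head_idx - dep_idx),     # separation in words
    }

train_X = [
    gr_features("want", "I", "VBP", "PRP", 1, 0),
    gr_features("want", "cookie", "VBP", "NN", 1, 3),
    gr_features("cookie", "more", "NN", "JJR", 3, 2),
]
train_y = ["SUBJ", "OBJ", "MOD"]                 # a few of the ~30 GR labels

model = make_pipeline(DictVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(train_X, train_y)

print(model.predict([gr_features("see", "ball", "VB", "NN", 1, 2)]))
# ['OBJ'] with this toy training set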

13 Automating Assessment
Prior work: Computerized Profiling (CP)
– Exploits word/POS patterns
– Limited for older children with more sophisticated language
– Generally used in a semi-automatic approach
Syntactic analysis improves on this:
– "Before, he told the man he was cold." vs. "Before he told the story, he was cold."
– Same POS pattern, similar words; different structure
– Distinguished by a GR with clausal type (e.g. COMP) and the position of the dependent
Construct syntactically informed patterns (see the sketch below)
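
One way to read the "syntactically informed patterns" point: instead of matching a word/POS sequence, a pattern inspects the GR labels directly, so the two sentences above are distinguished by whether a clausal complement relation is actually present. The GR tuple format, the labels in the toy analyses, and the helper below are hypothetical.

# Hypothetical sketch of a GR-based search pattern, in contrast to a
# word/POS pattern. A GR is represented here as (label, head_word, dep_word);
# the two analyses below are illustrative, not gold-standard annotations.

def has_propositional_complement(grs):
    """Credit the IPSyn structure only when a clausal complement (COMP/XCOMP)
    relation is present, regardless of the surface POS pattern."""
    return any(label in {"COMP", "XCOMP"} for (label, _, _) in grs)

# "Before, he told the man he was cold."   -> "he was cold" complements "told"
grs_complement = [("JCT", "told", "Before"), ("SUBJ", "told", "he"),
                  ("OBJ", "told", "man"), ("COMP", "told", "cold")]
# "Before he told the story, he was cold." -> the before-clause is a clausal adjunct
grs_adjunct = [("CJCT", "cold", "told"), ("SUBJ", "cold", "he"),
               ("OBJ", "told", "story")]

print(has_propositional_complement(grs_complement))  # True
print(has_propositional_complement(grs_adjunct))     # False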

14 Evaluation
Point difference:
– Unsigned difference in scores (manual vs. automatic)
Point-to-point accuracy:
– Number of language-structure decisions correct, divided by the total number of decisions
Test data:
– A) 20 transcripts: ages 2-3; manually scored; MLU 2.9
– B) 25 transcripts: ages 8-9; semi-automatic CP; MLU 7.0
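
Both measures are easy to state in code; the sketch below uses made-up numbers purely to show the arithmetic.

# Sketch of the two evaluation measures: mean unsigned point difference
# between manual and automatic total IPSyn scores, and point-to-point
# accuracy over the individual per-structure scoring decisions.

def point_difference(manual_scores, auto_scores):
    """Mean unsigned difference between manual and automatic total scores."""
    diffs = [abs(m - a) for m, a in zip(manual_scores, auto_scores)]
    return sum(diffs) / len(diffs)

def point_to_point(manual_decisions, auto_decisions):
    """Fraction of individual structure decisions (0/1/2) that agree."""
    agree = sum(m == a for m, a in zip(manual_decisions, auto_decisions))
    return agree / len(manual_decisions)

print(point_difference([62, 71, 55], [60, 74, 55]))      # (2 + 3 + 0) / 3 = 1.67
print(point_to_point([2, 1, 0, 2, 1], [2, 1, 1, 2, 1]))  # 4 / 5 = 0.8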

15 Results
Contrast with human assessment and with CP
– Point difference: 3.3 overall for GR-based scoring vs. 8.3 for CP
– CP worse on older children: 6.2 vs. 10.2
– Less effective on more complex sentences
– Point-to-point accuracy: 92% GR-based vs. ~85% CP
– No pattern of misses vs. false detections
GR-based automatic scoring: high agreement

16 Error Analysis
4 of the 56 IPSyn structures account for 50% of errors
– Propositional complements, relative clauses, bitransitive predicates
Result of syntactic analysis errors
– Especially COMP: the least accurate label
– Emphasis/ellipsis
Bad search patterns
– More reliable than POS/word patterns, but still hand-crafted

17 Conclusion
Automatic analysis of child language data
– Syntax, beyond morphology & POS
Two-phase dependency analysis
– Unlabeled structure, followed by label assignment
– Accurate even with out-of-domain training
Enables more nuanced assessment
– Especially as learner syntax becomes more complex

