Parsing & Language Acquisition: Parsing Child Language Data CSMC 35100 Natural Language Processing February 7, 2006.

Roadmap
Motivation
–Understanding language acquisition
–Child and child-directed speech
Challenges
–Casual conversation; ungrammatical speech
Robust parsing
–Dependency structure & dependency labels
Child language assessment
–Contrast w/ human & auto
–Error analysis
Conclusions

Motivation: Child-directed Speech
View child as learner
Child-directed speech
–Known to differ significantly from adult-directed speech
E.g. acoustic-prosodic measures
–Slower, increased duration, pause, pitch range, height
Children attend more carefully
–Input to learner
Track vocabulary, morphology, phonology, syntax
–Relate exposure to acquisition

Motivation: Child Speech
Insight into language acquisition process
–Assess linguistic development
–Evaluate hypotheses for acquisition
E.g. Rizzi’s “truncated structure hypothesis”
–Phonology, morphology, lexicon, syntax, etc.

Focus: Syntax
Prior research emphasizes:
–Lexicon, phonology, morphology
–(Relatively) easy to extract from recordings, manual transcriptions
Syntactic development
–Significant markers in linguistic development
Acquisition of inflection, agreement, aux-inversion, etc.
–Rich source of information

Challenges
Analyzing syntactic input & development
–Requires parsing of child and child-directed speech
–Manual analysis prohibitively costly for large amounts of data
Resources for training: Treebanks
–Newspaper, adult conversational speech
Current speech:
–Child-directed: very conversational, ellipsis, vocatives, onomatopoeia
–Child: fragments, possibly ungrammatical

Resources
CHILDES
–Corpus (100s MB) of child language data
Many languages
–All manually transcribed
Marked for disfluency, repetition, retracing
Some manually morphologically analyzed, POS tagged
Some audio/video available
–Not syntactically analyzed
Specialized morphological analyzer, POS tagger

Parsing for Assessment
Syntactic analysis of child speech
–Assign to particular developmental level
–Based on presence, frequency of constructions
Measures:
–MLU: Mean length of utterance
Reaches ceiling around age 3, uninformative
–IPSyn: Explicitly measures syntactic structure
Scores 100 utts on 56 structures
–NP, VP, questions, sentence structure
–0=absent; 1=found once; 2=found > once
Some identifiable from POS/morph and patterns
–Others not: aux-inversion, conjunction, subord clauses, etc.
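
The two measures above can be sketched in a few lines. This is a minimal illustration, not the official scoring procedure: it assumes MLU is counted over whitespace tokens (in practice it is often counted over morphemes), and it assumes some upstream detector has already produced one structure name per detection. The function names and the structure names "N1" and "V2" are hypothetical.

```python
from collections import Counter

def mean_length_of_utterance(utterances):
    """Average utterance length; ASSUMPTION: length = whitespace tokens."""
    lengths = [len(u.split()) for u in utterances if u.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0

def ipsyn_item_score(occurrences):
    """0 = absent, 1 = found once, 2 = found more than once."""
    return min(occurrences, 2)

def ipsyn_total(detections):
    """Sum the per-structure scores.

    detections: iterable of structure names, one entry per detection
    (hypothetical output of a pattern matcher run over the transcript).
    """
    counts = Counter(detections)
    return sum(ipsyn_item_score(n) for n in counts.values())
```

For example, three detections of one structure and one detection of another contribute 2 + 1 = 3 points, since each structure is capped at 2.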

Syntactic Analysis Approach
Extract grammatical relations (GRs)
–Analyze sentence to labeled dependencies
Aux, Neg, Det, Inf, (ECX)Subj, Objs, Preds, Mods, (CX)Jct, (X)Comp, etc.
Decomposition:
–Text processing:
Removes tagged disfluencies, reps, retracings
Morphological analysis, POS tagging (special)
–Unlabeled dependency analysis
–Dependency labeling

Unlabeled Dependency Parsing
Identify unlabeled dependency
–Parse with Charniak parser
Trained on Penn Treebank
–Convert to dependencies based on head table
Different domain: 90.1% vs 92% WSJ
–Shorter! < 15 wds
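
The conversion step can be illustrated with a toy head-percolation routine. This is a sketch under stated assumptions: the three-entry HEAD_TABLE below is invented for the example and is far smaller than any real head table, trees are plain nested tuples rather than parser output, and dependencies are returned as word pairs (a real system would use word positions to keep repeated words apart).

```python
# ASSUMPTION: a tiny illustrative head table, not the one used in the talk.
# For each phrase label, child categories are tried in priority order.
HEAD_TABLE = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "NP": ["NN", "NNS", "PRP", "NP"],
}

def is_leaf(node):
    # A leaf is (POS, word); an internal node is (label, child, child, ...).
    return isinstance(node[1], str)

def to_dependencies(tree):
    """Return (head word of the tree, list of (dependent, head) pairs)."""
    if is_leaf(tree):
        return tree[1], []
    label, children = tree[0], tree[1:]
    results = [to_dependencies(child) for child in children]
    head_idx = 0  # fallback when the table has no match: leftmost child
    for category in HEAD_TABLE.get(label, []):
        matches = [i for i, c in enumerate(children) if c[0] == category]
        if matches:
            head_idx = matches[0]
            break
    head_word = results[head_idx][0]
    deps = []
    for i, (word, sub_deps) in enumerate(results):
        deps.extend(sub_deps)
        if i != head_idx:
            # Every non-head child's head word depends on the head child's.
            deps.append((word, head_word))
    return head_word, deps

tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"), ("NP", ("DT", "a"), ("NN", "cat"))))
root, deps = to_dependencies(tree)
```

Here the verb ends up as the root, each determiner attaches to its noun, and both nouns attach to the verb.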

Dependency Labeling
Assign labels to dependency structure
–Easier than finding dep structure itself
–Labels more separable
30-way classification:
–Train on 5K words w/ manual dependency labels
TiMBL, features include:
–Head & dep words, POS; order, separation, label of subsumer
91% accuracy
Parse+labels: ~87%
–Some trivial: Det, INF: 98%; (X)Comp: ~60%
–Competitive
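
To make the labeling step concrete, here is a rough stand-in for a memory-based classifier. It does not use TiMBL or its API: it is a plain 1-nearest-neighbor lookup with a feature-overlap count, written only to show how the listed features (head and dependent words, POS, order, separation) could drive label choice. The label of the subsumer is left out for brevity, and all function names, POS tags, and training examples are hypothetical.

```python
def edge_features(words, tags, dep, head):
    """Features for one unlabeled edge; dep and head are 0-based positions."""
    return (
        words[dep], words[head],           # dependent and head words
        tags[dep], tags[head],             # their POS tags
        "left" if dep < head else "right", # order of dependent vs head
        abs(head - dep),                   # separation in words
    )

def overlap(a, b):
    """Number of feature positions on which two vectors agree."""
    return sum(x == y for x, y in zip(a, b))

def label_edge(features, memory):
    """Return the label of the stored example with the highest overlap."""
    _, best_label = max(memory, key=lambda ex: overlap(features, ex[0]))
    return best_label

# Hypothetical training memory built from one labeled sentence.
train_words, train_tags = ["a", "cat", "sleeps"], ["det", "n", "v"]
memory = [
    (edge_features(train_words, train_tags, 0, 1), "DET"),
    (edge_features(train_words, train_tags, 1, 2), "SUBJ"),
]

words, tags = ["the", "dog", "barks"], ["det", "n", "v"]
det_label = label_edge(edge_features(words, tags, 0, 1), memory)
subj_label = label_edge(edge_features(words, tags, 1, 2), memory)
```

Even though none of the test words appear in the memory, the POS, order, and separation features are enough to recover the labels in this toy case, which is consistent with the slide's point that labels are comparatively separable.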

Automating Assessment
Prior work: Computerized Profiling (CP)
–Exploits word/POS patterns
–Limited for older children, whose language is more sophisticated
Generally used in semi-automatic approach
Syntactic analysis improves
Before, he told the man he was cold.
Before he told the story, he was cold.
–Same POS pattern, similar words; diff’t structure
–GR with clausal type (e.g. COMP), dep left
Construct syntactically informed patterns
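
The contrast between the two example sentences can be sketched as a search over grammatical relations. The analyses below are partial and hand-written for illustration, not parser output; the CJCT label for the fronted clause, the ROOT label, and the omission of "before" are assumptions made to keep the example small. The pattern checks for a clausal GR whose dependent precedes its head.

```python
from typing import NamedTuple

class GR(NamedTuple):
    dep: int    # 1-based position of the dependent word
    head: int   # 1-based position of the head word (0 = root)
    label: str

# ASSUMPTION: which labels count as clausal, based on the (CX)Jct and
# (X)Comp families named earlier.
CLAUSAL = {"COMP", "XCOMP", "CJCT", "XJCT"}

def has_left_clausal_dependent(grs):
    """True if some clausal GR has its dependent to the left of its head."""
    return any(
        g.label in CLAUSAL and g.head != 0 and g.dep < g.head for g in grs
    )

# "Before, he told the man he was cold."
# positions: before=1 he=2 told=3 the=4 man=5 he=6 was=7 cold=8
sent_a = [
    GR(2, 3, "SUBJ"), GR(3, 0, "ROOT"), GR(4, 5, "DET"), GR(5, 3, "OBJ"),
    GR(6, 7, "SUBJ"), GR(7, 3, "COMP"), GR(8, 7, "PRED"),
]

# "Before he told the story, he was cold."
# positions: before=1 he=2 told=3 the=4 story=5 he=6 was=7 cold=8
sent_b = [
    GR(2, 3, "SUBJ"), GR(3, 7, "CJCT"), GR(4, 5, "DET"), GR(5, 3, "OBJ"),
    GR(6, 7, "SUBJ"), GR(7, 0, "ROOT"), GR(8, 7, "PRED"),
]
```

In the first sentence the clause headed by "was" follows its head "told", so the pattern does not fire; in the second, the clause headed by "told" precedes its head "was", so it does. A word/POS pattern cannot separate the two, since the tag sequences match.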

Evaluation
Point difference:
–Unsigned difference in scores (manual vs auto)
Point-to-point accuracy:
–Number of language structure decisions correct
Divided by number of decisions
Test data:
–A) 20 transcripts: ages 2-3; manual; MLU 2.9
–B) 25 transcripts: ages 8-9; semi-auto CP; MLU 7.0
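
Both evaluation measures reduce to simple arithmetic. The sketch below assumes each per-structure score counts as one decision, which may not match exactly how decisions were counted in the reported experiments; the function names and sample scores are made up for illustration.

```python
def point_difference(manual_total, auto_total):
    """Unsigned difference between manual and automatic total scores."""
    return abs(manual_total - auto_total)

def point_to_point_accuracy(manual_items, auto_items):
    """Fraction of per-structure decisions on which both scorers agree.

    ASSUMPTION: one decision per structure, compared position by position.
    """
    if len(manual_items) != len(auto_items):
        raise ValueError("score lists must have the same length")
    if not manual_items:
        raise ValueError("no decisions to compare")
    agree = sum(m == a for m, a in zip(manual_items, auto_items))
    return agree / len(manual_items)

# Made-up scores for four structures.
manual = [2, 1, 0, 2]
auto = [2, 0, 0, 2]
diff = point_difference(sum(manual), sum(auto))
accuracy = point_to_point_accuracy(manual, auto)
```

The two measures are complementary: a miss and a false detection on different structures cancel in the point difference but both count against point-to-point accuracy.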

Results
Contrast w/ human assessment; CP
–Point difference: 3.3 total GR; 8.3 CP
CP worse on older children: 6.2 vs 10.2
–Less effective on more complex sentences
–Point-to-point: 92% GR; ~85% CP
–No pattern of miss vs false detect
GR automatic scoring: high agreement

Error Analysis
4 of 56 IPSyn structures -> 50% of errors
–Propositional complements, rel. clauses, bitransitive predicates
Result of syntactic analysis errors
–Esp. COMP, the least accurate label
–Emphasis/Ellipsis
Bad search patterns
–More reliable than POS/word, but still hand-crafted

Conclusion
Automatic analysis of child language data
–Syntax, beyond morphology & POS
Two-phase dependency analysis
–Unlabeled structure, followed by label assignment
–Accurate even with out-of-domain training
Enables more nuanced assessment
–Especially as learner syntax becomes complex
