Published by Buck Matthews. Modified over 8 years ago.
Using Latent Dirichlet Allocation for Child Narrative Analysis

Khairun-nisa Hassanali¹, Yang Liu¹ and Thamar Solorio²
nisa@hlt.utdallas.edu, yangl@hlt.utdallas.edu, solorio@uab.edu
¹ The University of Texas at Dallas
² University of Alabama at Birmingham
BIONLP 2013

1. Summary

- We explore the use of Latent Dirichlet Allocation (LDA) for detecting topics from child narratives
- We use LDA topics in two classification tasks:
  - Automatic prediction of Language Impairment (LI)
  - Automatic prediction of coherence
- Findings:
  - LDA is useful for detecting topics that correspond to the narrative structure
  - Improved performance in the automatic prediction of LI and coherence

2. Introduction

- Child language narratives are used for language analysis, measurement of language development, and the detection of LI
- Automatic detection of LI is faster and allows for exploring more features beyond norm-referenced tests
- Given a child language transcript, answer the following questions:
  - Does the transcript belong to a typically developing (TD) child or a child with LI?
  - Is the narrative produced by the child understandable or coherent?

3. Data

- Transcripts of adolescents aged 14 years
- Storytelling task based on the picture book "Frog, where are you?"
- 118 speakers (99 TD children, 18 LI children)
- 118 transcripts (87 coherent, 31 incoherent)
- Transcripts are annotated for language impairment and coherence

4. Topic Words Extracted by LDA

- Used LDA to generate topic words
- K = 20, alpha = 0.8
- Used transcripts of TD children
- Identified topics corresponding to narrative structure
- Identified subtopics

| No | Topic Words Used by TD Population | Topic Described |
|----|-----------------------------------|-----------------|
| 1 | went, frog, sleep, glass, put, caught, jar, yesterday, out, house | Introduction |
| 2 | frog, up, woke, morning, called, gone, escaped, next, kept, realized | Frog goes missing |
| 3 | window, out, fell, dog, falls, broke, quickly, opened, told, breaking | Dog falls out of the window |
| 4 | tree, bees, knocked, running, popped, chase, dog, inside, now, flying | Dog chases the bees |
| 5 | deer, rock, top, onto, sort, big, up, behind, rocks, picked | Deer behind the rock |
| 6 | searched, boots, room, bedroom, under, billy, even, floor, tilly, tried | Search for frog in the room |
| 7 | dog, chased, owl, tree, bees, boy, came, hole, up, more | Boy is chased by owl from a tree with beehives |
| 8 | jar, gone, woke, escaped, night, sleep, asleep, dressed, morning, frog | Frog goes missing |
| 9 | deer, top, onto, running, ways, up, rocks, popped, suddenly, know | Boy runs into the deer |
| 10 | looking, still, dog, quite, cross, obviously, smashes, have, annoyed | Displeasure of boy with dog |

5. Using LDA Topic Related Features for Detection of LI and Coherence

- Used LDA topics to generate a summary and extended vocabulary
- Used extended vocabulary to detect presence or absence of topics
- Automatic classification of LI:
  - Count of bigrams of the words in the summary
  - Presence or absence of LDA topic keywords
  - Presence or absence of words in the summary
- Automatic classification of coherence:
  - Presence or absence of LDA topics

6. Experiments

- Automatic classification of LI and coherence
- Naïve Bayes classifier performed the best
- Leave-one-out cross-validation
- Use of topic-based features, in addition to baseline features, led to improved performance for both tasks

Automatic Prediction of LI

| Features | Precision | Recall | F-1 |
|----------|-----------|--------|-----|
| Gabani et al.'s (2011) (baseline) | 0.824 | 0.737 | 0.778 |
| Narrative (Hassanali et al., 2012a) | 0.385 | 0.263 | 0.313 |
| Topic features | 0.308 | 0.211 | 0.25 |
| Narrative + Gabani's | 0.889 | 0.842 | 0.865 |
| Narrative + Gabani's + topic features | 0.85 | 0.895 | 0.872 |

Automatic Prediction of Coherence

| Feature | Coherent P | Coherent R | Coherent F-1 | Incoherent P | Incoherent R | Incoherent F-1 |
|---------|------------|------------|--------------|--------------|--------------|----------------|
| Narrative (baseline) (Hassanali et al.) | 0.869 | 0.839 | 0.854 | 0.588 | 0.645 | 0.615 |
| Narrative + automatic topic features | 0.895 | 0.885 | 0.89 | 0.688 | 0.71 | 0.699 |

7. Conclusion

- Explored the use of LDA in the context of child language analysis
- Used LDA topics from child narratives to create an extended vocabulary and summary
- The LDA topic keywords covered the main components of the narrative
- Improved performance in the automatic prediction of LI and coherence when using LDA topic keywords to create features, in addition to baseline features

This research is sponsored by
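
The topic-presence features and the leave-one-out Naïve Bayes evaluation described in this poster can be illustrated with a minimal sketch. It is not the authors' implementation: the poster does not say how presence of a topic was decided or which Naïve Bayes variant was used, so the `min_hits` threshold, the Bernoulli model with add-one smoothing, all function names, and the six toy transcripts are assumptions made purely for illustration. Only the topic keywords are copied from the poster's topic table.

```python
import math

# Two topics with keywords copied from the poster's topic table.
TOPICS = {
    "frog goes missing": ["frog", "up", "woke", "morning", "called",
                          "gone", "escaped", "next", "kept", "realized"],
    "dog chases the bees": ["tree", "bees", "knocked", "running", "popped",
                            "chase", "dog", "inside", "now", "flying"],
}

def topic_features(transcript, topics=TOPICS, min_hits=2):
    """Binary vector: 1 if at least `min_hits` keywords of a topic occur.
    `min_hits` is a hypothetical threshold, not a value from the poster."""
    words = set(transcript.lower().split())
    return [int(len(words & set(kws)) >= min_hits) for kws in topics.values()]

def train_nb(X, y):
    """Bernoulli Naive Bayes with add-one smoothing (assumed variant)."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = math.log(len(rows) / len(X))
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[label] = (prior, probs)
    return model

def predict_nb(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def score(label):
        prior, probs = model[label]
        return prior + sum(math.log(p if v else 1 - p) for v, p in zip(x, probs))
    return max(sorted(model), key=score)

def leave_one_out(X, y):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(X)):
        model = train_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict_nb(model, X[i]))
    return preds

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-1 for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Invented toy transcripts and labels, NOT data from the study.
toy = [
    ("the frog escaped in the morning and the dog was running from the bees", "coherent"),
    ("the boy woke up and called the frog then the bees chase the dog", "coherent"),
    ("the frog was gone when the boy woke up", "coherent"),
    ("he went there and then it was big", "incoherent"),
    ("the boy had a thing and it was there", "incoherent"),
    ("there was a dog and then um the end", "incoherent"),
]
X = [topic_features(text) for text, _ in toy]
y = [label for _, label in toy]
preds = leave_one_out(X, y)
print(precision_recall_f1(y, preds, positive="coherent"))
```

The last toy transcript mentions "dog" only once, so it falls below the assumed threshold and its topic feature stays 0. In a real experiment the feature vector would cover all LDA topics and be combined with the baseline features, as the results tables report.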