Published by Cleopatra Bridges. Modified over 9 years ago.
1
Data collection and experimentation
2
Why should we talk about data collection?
-It is a central part of most, if not all, aspects of current speech technology
-The higher grades (A, B; as tested in the home exam assignments and the project) require a measure of data collection
3
What is data collection?
In speech technology, the gathering of human communicative behaviours that can be used for implementing e.g. spoken dialogue systems
What do we gather?
-Speech
-Text
-Voices
-Gestures
-Patterns!
4
All vs one?
-Recognition: we want to have seen all possibilities
-Synthesis: we want one, consistent behaviour
5
Group exercise
-Same groups as before
-Design one or more data collection(s) that will become the basis for a spoken dialogue system intended to inform users of the television program
-Take note of why you make your design choices
-We’ll talk about it here in 30 minutes
6
Application
-Remote control
  -Select programme
  -Menu options - tree
-TV guide
  -More free speech
  -But connected to GUI options (e.g. for lists)
Data
-Room environment
-Age recognition data
  -Recognize age
  -Recognize identity of a specific mother
-Usage probabilities
-Asking people - ratings
-Language?
  -Programmes are English, Swedish
-Read TV guide
-But people speak differently (“trean”)
-Monitor corpus (updated)
-“Beta” version - iterative process (human-human, Wizard of Oz, beta)
-Demography: adults, elderly, kids?
-Keywords
  -Cloud
  -Times
  -Some commands
7
What is a corpus?
Wikipedia:
-A collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.
Multimodal corpus work: manual annotation, validation and computer driven analysis. Jens Edlund, 2012-09-01-05
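As a small illustration of what "machine-readable" and "studying frequencies" can mean in practice, here is a minimal sketch that counts word frequencies over a handful of transcribed utterances. The toy utterances, the function name `word_frequencies`, and the naive tokenisation are all assumptions made for this example; they are not taken from the lecture or from any particular corpus.

```python
from collections import Counter
import re


def word_frequencies(utterances):
    """Count lower-cased word tokens across a list of transcribed utterances."""
    counts = Counter()
    for utterance in utterances:
        # Naive tokenisation (an assumption): runs of word characters and apostrophes.
        counts.update(re.findall(r"[\w']+", utterance.lower()))
    return counts


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    "what is on channel three tonight",
    "what is on tonight",
    "switch to channel three",
]

freqs = word_frequencies(toy_corpus)
print(freqs.most_common(3))
```

A real corpus would need proper tokenisation and transcription conventions, but even counts this simple show which words a recogniser for the TV guide task would have to cover.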
8
Why collect a corpus?
-“[...] for the purpose of studying linguistic structures, frequencies, etc.”
-Sample - cannot analyze all
-Training data for duplicating behaviours
-Analysis of how humans do things
-Generalisability, representativeness
  -Same results in different corpora
  -Use constraints, standards, theories to form the corpus
  -If findings are expected - corroborate theory - we're better off
9
How is a corpus collected?
Often high formal demands:
-Structure
-Balance
Audio, visual, audiovisual - choice of modalities
-Requires equipment
-Silent lab
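One way to make the "balance" demand concrete is to count recordings per speaker group and flag groups that fall well below an even split. The sketch below does that for a single metadata field. The field names, the sample records, the function name `balance_report`, and the 0.5 tolerance are hypothetical choices for illustration, not a standard from the lecture.

```python
from collections import Counter


def balance_report(recordings, key, tolerance=0.5):
    """Return {group: (count, is_underrepresented)} for one metadata field.

    A group is flagged when its count is below `tolerance` times the count
    an even split would give. The threshold is an arbitrary assumption.
    """
    if not recordings:
        return {}
    counts = Counter(r[key] for r in recordings)
    expected = len(recordings) / len(counts)
    return {group: (n, n < tolerance * expected) for group, n in counts.items()}


# Hypothetical session metadata, invented for illustration only.
recordings = [
    {"speaker": "s1", "age_group": "adult"},
    {"speaker": "s2", "age_group": "adult"},
    {"speaker": "s3", "age_group": "adult"},
    {"speaker": "s4", "age_group": "adult"},
    {"speaker": "s5", "age_group": "elderly"},
    {"speaker": "s6", "age_group": "elderly"},
    {"speaker": "s7", "age_group": "child"},
]

report = balance_report(recordings, "age_group")
for group, (count, flagged) in report.items():
    print(group, count, "under-represented" if flagged else "ok")
```

Running a check like this during collection, rather than afterwards, makes it possible to recruit more speakers for a thin group while the recording setup is still available.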
10
Where are corpora collected?
11
When are corpora collected?
Often collected once, then static
-But monitor corpora exist
-And the web is, as always, changing things
12
Examples of corpora?
13
Thank you! Questions?