Data collection and experimentation. Why should we talk about data collection? It is a central part of most, if not all, aspects of current speech technology.

1 Data collection and experimentation

2 Why should we talk about data collection?
- It is a central part of most, if not all, aspects of current speech technology.
- The higher grades (A, B; as tested in the home exam assignments and the project) require a measure of data collection.

3 What is data collection?
In speech technology: the gathering of human communicative behaviours that can be used to implement, e.g., spoken dialogue systems.
What do we gather?
- Speech
- Text
- Voices
- Gestures
- Patterns!

4 All vs one?
- Recognition: we want to have seen all possibilities
- Synthesis: we want one consistent behaviour

5 Group exercise
- Same groups as before
- Design one or more data collections that will become the basis for a spoken dialogue system intended to inform users about the television programme
- Take note of why you make your design choices
- We'll talk about it here in 30 minutes

6 Application
- Remote control
  - Select programme
  - Menu options: tree
- TV guide
  - More free speech
  - But connected to GUI options (e.g. for lists)
Data
- Room environment
- Age recognition data
  - Recognize age
  - Recognize identity of a specific mother
- Usage probabilities
- Asking people: ratings
- Language? Programmes are English, Swedish
- Read TV guide
- But people speak differently (“trean”)
- Monitor corpus (updated)
- “Beta” version: iterative process (h/h, WoZ, beta)
- Demography: adults, elderly, kids?
- Keywords
  - Cloud
  - Times
  - Some commands

7 What is a corpus?
Wikipedia:
- A collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.
Multimodal corpus work: manual annotation, validation and computer driven analysis. Jens Edlund, 2012-09-01-05

8 Why collect a corpus?
- "[...] for the purpose of studying linguistic structures, frequencies, etc."
- Sample: we cannot analyze everything
- Training data for duplicating behaviours
- Analysis of how humans do things
- Generalisability, representativeness
  - Same results in different corpora
  - Use constraints, standards, theories to form the corpus
  - If the findings are expected, i.e. they corroborate theory, we're better off

9 How is a corpus collected?
Often high formal demands:
- Structure
- Balance
Audio, visual, audiovisual: choice of modalities
- Requires equipment
- Silent lab
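
The balance demand on slide 9 can be checked against recording metadata once collection is under way. The following is a minimal sketch, not part of the original slides: the `balance_report` helper, the metadata fields, and the four sample recordings are all hypothetical and only illustrate the idea of comparing category shares.

```python
from collections import Counter


def balance_report(recordings, key):
    """Return each category's share of the recordings for one metadata field."""
    counts = Counter(rec[key] for rec in recordings)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical metadata for four recordings (illustrative only, not real data).
recordings = [
    {"speaker": "s1", "age_group": "adult"},
    {"speaker": "s2", "age_group": "adult"},
    {"speaker": "s3", "age_group": "elderly"},
    {"speaker": "s4", "age_group": "child"},
]

shares = balance_report(recordings, "age_group")
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.0%}")
```

A skewed report like this one (half the recordings from adults) would suggest recruiting more speakers from the under-represented groups before the corpus is considered balanced.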

10 Where are corpora collected?

11 When are corpora collected?
Often collected once, then static
- But monitor corpora exist
- And the web is, as always, changing things

12 Examples of corpora?

13 Thank you! Questions?

