1
Guy Aston guy@sslmit.unibo.it Compiling a corpus of transcribed speech
2
Anyqs
A corpus for classroom use in training interpreters
Transcribed spontaneous speech (hard to come by)
Understandable without detailed contextual information (standard format)
Contemporary
Quite a lot (currently 1.4M words)
Easy to encode in TEI and to index with XAIRA
3
No way is this publicly available
The BBC site contains transcripts of all Any Questions programmes in the last 3 years, which you can download freely for personal non-commercial use.
But/and you cannot adapt, alter or create a derivative work except for your own personal, non-commercial use.
4
What the BBC’s original looks like …
PRESENTER: Jonathan Dimbleby
PANELLISTS: Lord Falconer
Malcolm Rifkind
Anne McElvoy
Chris Huhne
FROM: Medical Women's Federation, Central London
DIMBLEBY Welcome to London where we are on the edge of Regent's Park at the Royal College of Obstetricians and Gynaecologists..... On our panel: the former Lord Chancellor Charlie Falconer.... And Anne McElvoy, executive editor and columnist at the Evening Standard. [CLAPPING] Our first question please.
HICKS Tom Hicks. Should Ian Blair resign?
5
Marking it up in XML…
In the Header
  Programme details
  Date
  Participants and roles
  Setting
In the Text
  Topic boundaries (new question)
  Utterance boundaries and their speakers
  Sentence boundaries (based on punctuation in transcript)
  Non-verbal events (clapping, laughter, coughs)
  POS tagging – CLAWS7
  Alignment with audio – maybe some day ???
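A minimal Python sketch of the utterance and event markup listed above. The element and attribute names (text, u, who, event, desc) are assumptions for illustration, not necessarily the tags used in the corpus, and the rule that a turn starts with an all-caps surname on the same line is a guess at the transcript layout. Header fields, sentence splitting and POS tagging are left out.

```python
import re
import xml.etree.ElementTree as ET

# Assumed conventions: a speaker turn starts with an all-caps surname,
# and non-verbal events appear in square brackets, e.g. [CLAPPING].
TURN = re.compile(r"^([A-Z][A-Z'\-]+)\s+(.*)$")
EVENT = re.compile(r"\[([A-Z ]+)\]")


def transcript_to_xml(lines):
    """Turn transcript lines into a <text> element holding <u> utterances."""
    text = ET.Element("text")
    for line in lines:
        match = TURN.match(line.strip())
        if not match:
            continue  # header lines such as "PRESENTER: ..." are skipped here
        speaker, speech = match.groups()
        u = ET.SubElement(text, "u", who=speaker.capitalize())
        # Splitting on a capturing group yields: speech, event, speech, ...
        parts = EVENT.split(speech)
        u.text = parts[0].strip()
        for i in range(1, len(parts), 2):
            event = ET.SubElement(u, "event", desc=parts[i].lower())
            event.tail = parts[i + 1].strip()
    return text


sample = [
    "PRESENTER: Jonathan Dimbleby",
    "DIMBLEBY Welcome to London. [CLAPPING] Our first question please.",
    "HICKS Tom Hicks. Should Ian Blair resign?",
]
root = transcript_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))
```

Mapping each who value to a full person record in the header is what would make speaker-based searches possible.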
6
Overall document structure Any questions [Date] [Profile] [Text]
7
Profile
<person name="surname" sex="f | m | u" role="presenter | questioner | panellist | audience" background="Con | Lab | Lib | journalist | academic |..."> fullname...
8
Text Welcome to London … … Tom Hicks. Should Ian Blair resign ? … …
9
The magic lines in the corpus header person u person
10
Meaning you can find occurrences for speakers with a certain
  sex
  role
  background
Try it!
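A rough Python sketch of the same kind of lookup done outside XAIRA: join each utterance to its speaker's profile, then filter on sex, role or background. The sample XML is made up for illustration. The doc, profile and who names and the attribute values given to the two speakers are assumptions; only the person attributes follow the Profile slide.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the real corpus files are not reproduced here.
SAMPLE = """
<doc>
  <profile>
    <person name="Dimbleby" sex="m" role="presenter" background="journalist"/>
    <person name="Hicks" sex="m" role="questioner"/>
  </profile>
  <text>
    <u who="Dimbleby">Our first question please.</u>
    <u who="Hicks">Tom Hicks. Should Ian Blair resign?</u>
  </text>
</doc>
"""


def utterances_by(root, **wanted):
    """Return the text of utterances whose speaker matches all given attributes."""
    people = {p.get("name"): p.attrib for p in root.iter("person")}
    hits = []
    for u in root.iter("u"):
        person = people.get(u.get("who"), {})
        if all(person.get(key) == value for key, value in wanted.items()):
            hits.append("".join(u.itertext()).strip())
    return hits


root = ET.fromstring(SAMPLE.strip())
print(utterances_by(root, role="questioner"))
print(utterances_by(root, sex="m", role="presenter"))
```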
11
Things to do with it (1): emphasis
Agreement (most frequent adverb collocates 1L)
Agree (871): entirely / actually / rather / completely / absolutely / broadly
Disagree (122): profoundly / fundamentally / strongly / completely
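For readers unfamiliar with "1L": it means the word immediately to the left of the node word. A minimal Python sketch of counting 1L collocates follows. It counts every word in that slot; restricting to adverbs, as on the slide, would need the POS tags, which this sketch does not use. The token list is invented for illustration.

```python
from collections import Counter


def left_collocates(tokens, node):
    """Count the words occurring immediately before `node` (position 1L)."""
    node = node.lower()
    counts = Counter()
    for previous, current in zip(tokens, tokens[1:]):
        if current.lower() == node:
            counts[previous.lower()] += 1
    return counts


# Invented example sentence, not taken from the corpus.
tokens = "I entirely agree and I completely agree but I profoundly disagree".split()
print(left_collocates(tokens, "agree").most_common())
print(left_collocates(tokens, "disagree").most_common())
```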
12
Things to do with it (2): subjunctives in speech
It were (215)
  As it were (173)
  If it were (32)
  I wish it were (3)
13
Things to do with it (3): As it were
A particularly Any Questions feature? A particularly male one?
Any Questions
  Male speakers     164   151 / Mwords
  Female speakers     9    30 / Mwords
BNC spoken
  Male speakers     291   0.6 / 1000 sentences
  Female speakers    68   0.2 / 1000 sentences
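The per-million-word rates are raw counts divided by the number of words spoken by each group. A small Python sketch of that arithmetic, assuming the word totals by sex from the Words slide at the end (1087585 male, 309285 female) are the right denominators:

```python
def per_million(occurrences, words):
    """Normalise a raw count to occurrences per million words."""
    return occurrences / words * 1_000_000


# Counts of "as it were" from this slide; word totals from the Words slide.
male_rate = per_million(164, 1087585)
female_rate = per_million(9, 309285)
print(round(male_rate), round(female_rate))
```

With these denominators the rates come out at about 151 and 29 per million words, in line with the 151 and 30 on the slide, which may have been rounded or counted slightly differently.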
14
deeply: alarmed, concerned, depressing, disillusioned, distressing, offended, regrettable, sceptical, shocking, unfair, upset, worrying
profoundly: disagree, wrong
15
Let alone 24 occurrences
16
Things to do with it (4): Preferred lexis of patriotism?
occurrences/1000   Lab   Con   Lib   (Ukip)
UK                  61    15    30   (1)
United Kingdom      21    23    15   (0)
Britain            104   139    56   (8)
17
Thank you! for any answers on how to get permission …
18
Utterances / Sentences
Role
  Lab          2141 / 6845
  Con          1787 / 6309
  Lib          1096 / 4098
  Presenter    7936 / 13318
  Questioner   1180 / 2241
  Other        3144 / 11535
Sex
  Male         1086994
  Female       303986
  Unknown      39 / 70
Total          17284 / 44346
19
Words – 1397872
Role/Background
  Lab           267019 (19.1%)
  Con           249101 (17.8%)
  Lib           163156 (11.7%)
  Other panel   407118 (29.1%)
  Total panel   1085394 (77.6%)
  Presenters    270484 (19.3%)
  Questioners   41950 (3.0%)
  Audience      44 (0.0%)
Sex
  Male          1087585 (77.8%)
  Female        309285 (22.1%)
  Unknown       1002 (0.1%)