Published by Kyle Martin. Modified over 10 years ago.
1
Corpora: Resources for the study of language
Paul Thompson, Applied Linguistics (p.a.thompson@reading.ac.uk)
2
160 lectures, 39 seminars
Transcripts, video and audio
199 XML files: transcripts with detailed annotation
Metadata included in header
160 lecture transcripts are tagged for Part-of-Speech
www.reading.ac.uk/AcaDepts/ll/base_corpus/
Funded by AHRB, Euralex, BALEAP and university sources
3
A corpus of assessed student writing at university level
Texts collected at Warwick, Reading and Oxford Brookes University
Funded by Economic and Social Research Council of England (ESRC), RES-000-23-0800
4
6.5 million words
2,896 texts
2,761 assignments
XML files, POS-tagged
30+ disciplines
4 levels of study
5
Query interface: Sketch Engine
Commercial service: Applied Linguistics pays an annual subscription
8
Level   Raw   Rel %
3       225   121.7
2       275   107.7
1       255    96.0
PG       66    62.1
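The slide does not say how the Rel % column is calculated. As a minimal sketch, assuming it follows the usual "relative text type frequency" idea (the density of hits within one level of study divided by the density of hits in the whole corpus, times 100), the calculation could look like the code below. The function name and the token counts in the usage line are hypothetical and are not taken from the slides.

```python
def relative_text_type_frequency(hits_in_type, hits_total,
                                 tokens_in_type, tokens_total):
    """Return a Rel % score for one text type (for example, one level of study).

    Assumed definition (not confirmed by the slides):
        100 * (hits_in_type / tokens_in_type) / (hits_total / tokens_total)
    A score of 100 means the item is exactly as frequent in this text type
    as in the corpus overall; above 100 means it is over-represented.
    """
    if hits_total <= 0 or tokens_in_type <= 0 or tokens_total <= 0:
        raise ValueError("totals and sizes must be positive")
    density_in_type = hits_in_type / tokens_in_type
    density_overall = hits_total / tokens_total
    return 100.0 * density_in_type / density_overall


# Hypothetical figures: 50 of 100 hits fall in a text type that
# makes up a quarter of a one-million-token corpus.
score = relative_text_type_frequency(50, 100, 250_000, 1_000_000)
print(round(score, 1))
```

Under this reading, a level can have the highest raw count without having the highest Rel %, because the raw count is not adjusted for how much text that level contributes.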
9
BASE: Linking audio and video to the transcripts, either online or on hard drives
Insertion of timestamp data into transcripts
Example
Why? Access to temporal, spatial, paralinguistic, phonological information
Studies of speech rate, for example
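To illustrate why timestamps matter for studies of speech rate, here is a minimal sketch that computes words per minute from timestamped transcript segments. The segment layout (start time in seconds, end time in seconds, text) and the function name are assumptions made for illustration; the slides do not describe the actual timestamp format used in BASE.

```python
def speech_rate_wpm(segments):
    """Return words per minute over a list of timestamped segments.

    Each segment is a hypothetical (start_seconds, end_seconds, text) tuple.
    Words are counted by splitting on whitespace, which is a simplification.
    """
    total_words = 0
    total_seconds = 0.0
    for start, end, text in segments:
        if end <= start:
            raise ValueError("segment end must be later than its start")
        total_words += len(text.split())
        total_seconds += end - start
    if total_seconds == 0:
        raise ValueError("no timed speech to measure")
    return total_words / (total_seconds / 60.0)


# Hypothetical lecture extract: ten words spoken over sixty seconds.
extract = [
    (0.0, 30.0, "one two three four five"),
    (30.0, 60.0, "six seven eight nine ten"),
]
print(speech_rate_wpm(extract))
```

Because only time inside segments is counted, pauses between segments are left out of the rate; whether they should be included depends on the research question.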
10
Comparison between languages
Historical linguistics
Stylistics
Studies of language in use
Specialised language use [e.g., doctor-patient interactions]
Investigations of multimodality
11
PhD thesis corpus
Electronic submission
Academic speech events
Seminars, tutorials, etc.
Student use of computers in preparing assignments [video and text]
Reading and writing of undergraduates
12
Hosting corpus resources at Reading or another university, preferably on Linux servers, with customisable interfaces
BASE, BAWE, and other corpora that Reading possesses
For use by all departments at Reading and also elsewhere
Varied levels of user access
Centralised support needed: lack of continuity with project staff