Corpora: Resources for the study of language
Paul Thompson, Applied Linguistics

160 lectures, 39 seminars
Transcripts, video and audio
199 XML files:
  Transcripts with detailed annotation
  Metadata included in header
160 lecture transcripts are tagged for part of speech
Funded by AHRB, Euralex, BALEAP and university sources

A corpus of assessed student writing at university level
Texts collected at Warwick, Reading and Oxford Brookes universities
Funded by the Economic and Social Research Council (ESRC) RES

6.5 million words
2,896 texts
2,761 assignments
XML files, POS-tagged
30+ disciplines
4 levels of study
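As an illustration of what header metadata and POS tagging in an XML file make possible, here is a minimal Python sketch. The element and attribute names (`header`, `discipline`, `level`, `w`, `pos`) and the sample text are assumptions made up for this example; they are not the actual schema of the corpus files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample file: the tag names below are illustrative only,
# not the real markup used in the corpus.
SAMPLE = """<text>
  <header>
    <discipline>Biology</discipline>
    <level>2</level>
  </header>
  <body>
    <w pos="DT">The</w> <w pos="NN">cell</w>
    <w pos="VBZ">divides</w> <w pos="RB">rapidly</w>
  </body>
</text>"""


def summarise(xml_string):
    """Return header metadata, token count and POS-tag frequencies."""
    root = ET.fromstring(xml_string)

    # Read every child of the header into a plain dictionary.
    header = root.find("header")
    metadata = {child.tag: (child.text or "").strip() for child in header}

    # Each <w> element is one token; its "pos" attribute holds the tag.
    tokens = root.findall("./body/w")
    pos_counts = Counter(w.get("pos") for w in tokens)

    return metadata, len(tokens), pos_counts


metadata, n_tokens, pos_counts = summarise(SAMPLE)
print(metadata, n_tokens, dict(pos_counts))
```

With metadata in the header, the same loop could be run over many files to compare tag frequencies by discipline or level of study.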

Query interface: Sketch Engine
Commercial service: Applied Linguistics pays an annual subscription

Level   Raw   Rel %
PG      6662.1

BASE: Linking audio and video to the transcripts, either online or on hard drives
Insertion of timestamp data into transcripts
Example
Why?
  Access to temporal, spatial, paralinguistic, phonological information
  Studies of speech rate, for example
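To show why timestamps matter for a speech-rate study, here is a minimal Python sketch. It assumes each transcript segment carries a start and end time in seconds; the `Segment` class and the sample utterances are hypothetical and do not reflect the BASE transcript format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped stretch of transcribed speech (hypothetical format)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def words_per_minute(segments):
    """Speech rate over the timed segments, in words per minute."""
    total_words = sum(len(s.text.split()) for s in segments)
    total_seconds = sum(s.end - s.start for s in segments)
    if total_seconds <= 0:
        raise ValueError("segments must cover a positive duration")
    return total_words / (total_seconds / 60.0)


# Invented sample data: 16 words spoken over 12 seconds.
segments = [
    Segment(0.0, 6.0, "so today we are going to look at corpora"),
    Segment(6.0, 12.0, "and in particular at spoken academic English"),
]
rate = words_per_minute(segments)
print(f"{rate:.1f} words per minute")
```

Counting whitespace-separated words is a simplification; a real study would need to decide how to treat pauses, fillers and overlapping speech.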

Comparison between languages
Historical linguistics
Stylistics
Studies of language in use
Specialised language use [e.g., doctor-patient interactions]
Investigations of multimodality

PhD thesis corpus
  Electronic submission
Academic speech events
  Seminars, tutorials, etc.
Student use of computers in preparing assignments [video and text]
Reading and writing of undergraduates

Hosting corpus resources at Reading or another university – preferably on Linux servers – with customisable interfaces
BASE, BAWE, and other corpora that Reading possesses
For use by all departments at Reading and also elsewhere
Varied levels of user access
Centralised support needed – lack of continuity with project staff