Introduction to ELAN Mary Chambers ELAP, Department of Linguistics, SOAS.

What is ELAN?
- EUDICO Linguistic Annotator
- Annotation tool developed by MPI (Max Planck Institute for Psycholinguistics): create, edit, view and search annotations for video and audio data
- Links text annotations with audio and/or video data
- One audio stream, up to four video streams
- Annotations are on tiers; these can be independent or linked to other tiers
- No limit to the number of tiers
- Tiers can be hidden or rearranged for ease of use
- ELAN files can be exported in a variety of formats (including to Shoebox/Toolbox for interlinearisation, then reimported)

Demonstration...

Tiers, types, and stereotypes...
- Imagine an annotated text with two speakers, with a transcription and free translation
- There are 4 tiers: a transcription tier and a free translation tier for each speaker
- There are 2 types of tier: tx (text) and ft (free translation)
- Each 'type' is further categorised according to its stereotype: the way tiers of this type combine with other tiers...

Tiers  Each speaker can have their own set of tiers, so overlapping speech is not a problem.  Tiers can contain many kinds of annotations, some of the most obvious are:  IPA transcription  practical orthographic transcription  free translations into languages of wider communication  morphemes and gloss  gesture annotation  grammar notes  any other information which seems relevant

Linguistic types  Every annotation tier must be assigned a linguistic type which tells Elan what type of information the tier contains.  Stereotypes:  None: The annotation on the tier is linked directly to the time axis (eg. intonation units/sentences - a transcription or a reference number).  Time Subdivision: The annotation on the parent tier can be sub- divided into smaller units, which, in turn, can be linked to time intervals (eg. words). There cannot be gaps between units.  Symbolic subdivision: Similar to Time Subdivision, except that the smaller units cannot be linked to a time interval (eg. morphemes within words).  Included In: like Time Subdivision but there can be gaps (eg. words, with silence between them).  Symbolic association: one-to-one association with a parent tier, eg. transcription with ref field, gloss and morpheme, free translation with sentence.

Tier dependencies: parents and children

Document X                              Types:
  Text/utterances (speaker A)           (none)
    Words                               (time subdivision)
      Morphemes                         (symbolic subdivision)
        Parts of speech                 (symbolic association)
        Glosses                         (symbolic association)
    Free translations                   (symbolic association)
  Text/utterances (speaker B)
    Words
      Morphemes
        Parts of speech
        Glosses
    Free translations
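The parent/child structure for one speaker can be sketched as a small tree. This is an illustration only: the tier names (`tx@A`, `wd@A` and so on) are hypothetical, and the nesting of parts of speech and glosses under morphemes is an assumption based on the stereotype descriptions rather than something the slide states outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tier:
    name: str
    stereotype: str
    children: List["Tier"] = field(default_factory=list)

    def add(self, name: str, stereotype: str) -> "Tier":
        """Create a child tier that depends on this tier and return it."""
        child = Tier(name, stereotype)
        self.children.append(child)
        return child


def build_speaker_tiers(speaker: str) -> Tier:
    """Build the tier set for one speaker (tier names are hypothetical)."""
    text = Tier(f"tx@{speaker}", "None")
    words = text.add(f"wd@{speaker}", "Time Subdivision")
    morphemes = words.add(f"mb@{speaker}", "Symbolic Subdivision")
    morphemes.add(f"ps@{speaker}", "Symbolic Association")
    morphemes.add(f"ge@{speaker}", "Symbolic Association")
    text.add(f"ft@{speaker}", "Symbolic Association")
    return text


def outline(tier: Tier, depth: int = 0) -> List[str]:
    """Return the tier and its descendants as indented lines."""
    lines = ["  " * depth + f"{tier.name} ({tier.stereotype})"]
    for child in tier.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    for speaker in ("A", "B"):
        print("\n".join(outline(build_speaker_tiers(speaker))))
```

Because each speaker gets an independent top-level tier, the two trees never share a parent, which is why overlapping speech causes no conflict.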

Is it worth it?
- Time-alignment is time-consuming!
- Tiers, types, and stereotypes only have to be set up once
- Output is time-aligned transcription in XML, which can be used for many purposes:
  - Archival
  - Import to Toolbox for interlinearisation
  - Import to DVD-authoring software
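Because the output is XML, time-aligned annotations can be read with any standard XML library. The sketch below uses Python's `xml.etree.ElementTree` on a tiny invented sample. The element and attribute names follow the ELAN annotation format (EAF) as far as I understand it and should be checked against a real file; the sample text, tier name and times are made up, and the code assumes every time slot has a value, which may not hold in real files.

```python
import xml.etree.ElementTree as ET

# Invented sample in the general shape of an ELAN file (times in milliseconds).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="tx@A" LINGUISTIC_TYPE_REF="tx">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def aligned_annotations(eaf_xml: str):
    """Return (tier, start_ms, end_ms, text) for every time-aligned annotation."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot id to its time value; assumes TIME_VALUE is always present.
    times = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((
                tier.get("TIER_ID"),
                times[ann.get("TIME_SLOT_REF1")],
                times[ann.get("TIME_SLOT_REF2")],
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
    return rows


if __name__ == "__main__":
    for row in aligned_annotations(SAMPLE_EAF):
        print(row)
```

A table of rows like this is a convenient starting point for subtitles, search indexes or archive metadata.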

Different workflows are possible
- ELAN files can be imported/exported in a variety of formats, including Shoebox/Toolbox
  - Toolbox → ELAN
  - ELAN → Toolbox
  - Transcriber → ELAN → Toolbox
  - Back and forth?

Working with Toolbox
- This is not entirely straightforward, but it is not too difficult if you are already quite familiar with the workings of Toolbox and the structure of its files.
- If you know you want to export to Toolbox, it is better to set up a ref type and tier (stereotype: None) from the very beginning. At first this tier will contain only time information (i.e. its annotations will be empty), but later it will contain a Toolbox ref number. The transcription tier will be a symbolic association dependent on the ref tier.
- The Toolbox export process puts the time and speaker information in separate fields. After working in Toolbox, ELAN can import the file, and the time and speaker information will be preserved.
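The idea of keeping time and speaker information in separate fields, so that it survives a round trip, can be illustrated with a toy reader and writer for backslash-marked records. This is a sketch of the principle only, not ELAN's actual exporter: the marker names `begin`, `end` and `spkr` are hypothetical placeholders, and a real export should be inspected for the markers it actually uses.

```python
from typing import Dict, List

# Hypothetical field markers, in output order; real exports may use other names.
MARKERS = ("ref", "begin", "end", "spkr", "tx")


def to_toolbox(records: List[Dict[str, str]]) -> str:
    """Write each record as a block of backslash-marked fields, blank line between blocks."""
    blocks = []
    for rec in records:
        lines = [f"\\{marker} {rec[marker]}".rstrip() for marker in MARKERS]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def from_toolbox(text: str) -> List[Dict[str, str]]:
    """Read the blocks back into dictionaries keyed by marker."""
    records = []
    for block in text.strip().split("\n\n"):
        rec = {}
        for line in block.splitlines():
            marker, _, value = line.lstrip("\\").partition(" ")
            rec[marker] = value
        records.append(rec)
    return records


if __name__ == "__main__":
    # The ref field starts empty, as on the slide, and is filled in later.
    exported = to_toolbox([
        {"ref": "", "begin": "0.000", "end": "1.500", "spkr": "A", "tx": "first utterance"},
    ])
    edited = from_toolbox(exported)
    edited[0]["ref"] = "001"  # stands in for numbering done in Toolbox
    print(to_toolbox(edited))
```

Since the interlinearisation work only touches the text fields, the time and speaker fields pass through unchanged and are still available on reimport.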

Any questions?