ELAN as a tool for oral history
CLARIN Oral History Workshop, Oxford
Sebastian Drude, CLARIN ERIC
18 April 2016



ELAN as a tool for oral history
1. Introducing the ELAN tool
2. Overview of basic features
3. Advanced features
4. Desiderata
5. Oral history

1. Introducing the ELAN tool
- Around 15 years of development
- Developed at the Max Planck Institute for Psycholinguistics in Nijmegen and by partners
- Main developer: Han Sloetjes (more than 10 years)
- Used in particular in the context of:
  - Language documentation (DOBES)
  - Language acquisition studies (MPI, CHILDES)
  - Gesture studies
  - Sign language studies


2. Overview of basic features
- Manual annotation of audio and video recordings
- Several video streams are possible (dialogues…)
- Unlimited tiers for annotating all kinds of information (“linguistic types” go beyond linguistics)
- Tiers can be repeated for any number of participants
- Hierarchy of tiers; tiers can inherit time marks
- Different input methods (support for various scripts)
- Multi-tier regular expression search
- Several import and export formats; printing
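The multi-tier regular expression search can be pictured with a small sketch. This is not ELAN's own search implementation; the Annotation class, the tier names and the overlap rule are assumptions made for illustration. An annotation counts as a hit when it matches the pattern for its own tier and overlaps in time with a matching annotation on every other queried tier.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical time-aligned annotation (times in milliseconds)."""
    start_ms: int
    end_ms: int
    value: str


def overlaps(a, b):
    """True if the two annotations share some stretch of time."""
    return a.start_ms < b.end_ms and b.start_ms < a.end_ms


def multi_tier_search(tiers, queries):
    """Return annotations on the first queried tier that match its pattern
    and overlap a matching annotation on every other queried tier."""
    (first_tier, first_pattern), *rest = queries.items()
    hits = []
    for ann in tiers[first_tier]:
        if not re.search(first_pattern, ann.value):
            continue
        if all(
            any(overlaps(ann, other) and re.search(pattern, other.value)
                for other in tiers[tier])
            for tier, pattern in rest
        ):
            hits.append(ann)
    return hits


# Illustrative data only: two tiers of an imagined interview.
tiers = {
    "transcription": [
        Annotation(0, 2000, "we moved to Oxford in 1950"),
        Annotation(2000, 4500, "my father worked at the factory"),
    ],
    "notes": [
        Annotation(0, 2000, "topic: migration"),
        Annotation(2000, 4500, "topic: work"),
    ],
}

# A year in the transcription, co-occurring with a "migration" note.
hits = multi_tier_search(tiers, {"transcription": r"\b19\d\d\b",
                                 "notes": r"migration"})
```

Here the search finds only the first transcription segment, because it is the only one that both mentions a year and overlaps a note about migration.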


2. Overview of basic features
Tier ‘Stereotypes’
- none: segments can be freely set (root mother tier): e.g., transcription on “sentence” level
- symbolic association: segments have the same start and end points as in the mother tier: e.g., translation on “sentence” level, glosses for individual word forms or morphs
- included in: segments can only be created within the boundaries of a segment on the mother tier: e.g., certain constituents of a sentence

2. Overview of basic features
Tier ‘Stereotypes’
- time subdivision: a segment on the mother tier is completely covered by one or several adjacent segments; inner boundaries can be freely set: e.g., individual word forms
- symbolic subdivision: a segment on the mother tier is completely covered by one or several adjacent segments; inner boundaries are not related to times: e.g., individual morphs within word forms
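The four dependent stereotypes can be read as constraints on how child segments relate to one segment on the mother tier. The sketch below is one possible reading of the slide text, not ELAN code. The Segment class and the satisfies function are hypothetical, and modelling symbolic segments as having no times of their own is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical segment; symbolic segments carry no times of their own."""
    value: str
    start_ms: Optional[int] = None
    end_ms: Optional[int] = None


def satisfies(stereotype: str, parent: Segment, children: List[Segment]) -> bool:
    """Check the children of one mother-tier segment against a stereotype."""
    timed = all(c.start_ms is not None and c.end_ms is not None for c in children)
    untimed = all(c.start_ms is None and c.end_ms is None for c in children)

    if stereotype == "symbolic_association":
        # Exactly one child, which takes over the parent's start and end.
        return len(children) == 1 and untimed
    if stereotype == "symbolic_subdivision":
        # An ordered chain of children; inner boundaries have no times.
        return len(children) >= 1 and untimed
    if stereotype == "included_in":
        # Children lie inside the parent; gaps between them are allowed.
        return timed and all(
            parent.start_ms <= c.start_ms < c.end_ms <= parent.end_ms
            for c in children
        )
    if stereotype == "time_subdivision":
        # Children are adjacent and cover the parent completely.
        if not children or not timed:
            return False
        ordered = sorted(children, key=lambda c: c.start_ms)
        covers = (ordered[0].start_ms == parent.start_ms
                  and ordered[-1].end_ms == parent.end_ms)
        adjacent = all(a.end_ms == b.start_ms
                       for a, b in zip(ordered, ordered[1:]))
        return covers and adjacent
    raise ValueError(f"unknown stereotype: {stereotype}")


sentence = Segment("the house burned", 1000, 3000)
words = [Segment("the", 1000, 1400), Segment("house", 1400, 2100),
         Segment("burned", 2100, 3000)]
with_gap = [Segment("the", 1000, 1400), Segment("house", 1500, 3000)]
morphs = [Segment("burn"), Segment("-ed")]
```

With this data, the word list is a valid time subdivision of the sentence, the list with a gap is only valid as "included in", and the untimed morphs form a symbolic subdivision.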

2. Overview of basic features
- Twin tool ANNEX (online visualization tool)

3. Advanced features
- Additional modes: segmentation mode, transcription mode
- Interaction with online services:
  - Alignment of transcriptions and audio (MAUS)
  - Automated recognition of speech vs. silence, speaker differentiation
  - Automated recognition of moving hands and heads
- ELAN Annotation Format in workflow tools

4. Desiderata
- Flexibility creates a need for standards:
  - Which type of information is in a tier? (tier naming)
  - How is the information encoded? (conventions, abbreviations)
- Independent of ELAN:
  - Automated transcription (speech recognition)
  - Applying NLP tools to (originally) oral data (named entity recognition, topic extraction…)
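One way to make the tier-naming question concrete is to check tier names against an agreed convention. The slides do not prescribe a convention. The pattern "<kind>@<participant>" and the kind abbreviations below are assumptions chosen purely for illustration.

```python
import re

# Hypothetical convention: "<kind>@<participant>", e.g. "tx@INT".
TIER_NAME = re.compile(r"^(?P<kind>[a-z]+)@(?P<participant>[A-Za-z0-9]+)$")

# Hypothetical abbreviations a project might agree on.
KNOWN_KINDS = {"tx": "transcription", "ft": "free translation", "nt": "notes"}


def audit_tier_names(names):
    """Map each tier name that breaks the convention to a short reason."""
    problems = {}
    for name in names:
        match = TIER_NAME.match(name)
        if match is None:
            problems[name] = "does not follow <kind>@<participant>"
            continue
        kind = match.group("kind")
        if kind not in KNOWN_KINDS:
            problems[name] = "unknown kind '" + kind + "'"
    return problems


problems = audit_tier_names(
    ["tx@INT", "ft@INT", "Transcription speaker 1", "xx@A"]
)
```

A check of this kind could be run over a whole collection before archiving, so that the information type of each tier can be read off its name.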

5. Oral history
- Useful for manual annotation: transcription, notes, tags
- Using advanced features, automation (e.g. alignment of existing transcripts)
- What kinds of annotation are typically needed? Template for specific needs for O.H.?
- Are there technical opportunities to improve ELAN for O.H.?
- Specific metadata for O.H.?
