Tools for Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013 Week 1: Overview.


OVERVIEW, GOALS OF CLASS

Tools for documentation
- Physical tools:
  - Hardware
  - Software
  - Stimuli
- Conceptual tools: what makes a good documentary corpus
- Procedural tools: how to go about documenting a language
- Tools for disseminating results

Overview
- Week 1: overview, hardware, software
- Week 2: elicitation techniques, grammar writing
- Week 3: narratives, conversation, corpus building
- Week 4: lexicon, archiving

About the class
- “How to describe/document a language”
- *No practical component* (in that we won’t be working with speakers)
- However, there will be time (I hope!) to talk about your own field data
- And we will be doing some exercises with existing data
  - I will provide datasets for exercises (if you don’t have data of your own to use)
  - You can also use data from the field methods class here at the Institute.

A few assumptions for this class
- Not talking about community-oriented materials here (I see documentary materials as feeding into that, though)
- Assuming that the language doesn’t have a lot of other materials apart from what the linguist will be producing
- Assuming that the linguist will be the one doing most of the writing
- Implicitly assuming a grammar/dictionary/texts model (more on this below)
- None of these assumptions is crucial; they’re just there so we can limit the topic a bit.

PRINCIPLES OF DOCUMENTATION

What is language documentation?
- Documentary Linguistics as its own subfield.
- Doing things with linguistic data:
  - Getting the data
  - Preserving it
  - Processing it
  - (Analyzing it)
- Cf. Woodbury (2002): Language documentation is the creation, annotation, preservation, and dissemination of transparent records of a language.
- Important for both theoretical and empirical branches of linguistics: typology, historical linguistics, etc.

What shapes the language record?
- The linguist (i.e. you!)
  - Their interests
  - Their abilities
- The speakers and their interests!
- External circumstances
  - funding
  - time available
  - lucky breaks
  - unlucky breaks

Language Documentation as a Language Legacy
- Particularly relevant for endangered languages.
- Your work might be the only substantive record of a language:
  - few speakers
  - field might view the language as “done”
  - speakers might view the language as “done”

Planned Documentation vs “Collect it all”
- “making a record of the language” : ‘comprehensive grammar’
- You can’t collect everything. All documentation is sampling.
- Unstructured, unanalyzed corpora usually aren’t very useful:
  - They are hard to use;
  - They don’t get worked on;
  - They usually aren’t big enough to test hypotheses computationally;
  - They require native speakers (or people who are already very familiar with the language)
    -> fine for languages with a major presence, but what about the quarter of the world’s languages with fewer than 10,000 speakers?

What counts as documentation?
- When is a collection big enough to count as language documentation?
- Is an article in Linguistic Inquiry language documentation?
  - creation
  - annotation
  - preservation
  - dissemination
  - but only a very small fragment of a language.

How much time/space does a documentary corpus take?
- Depends on the resources:
  - Time
  - Speakers
  - Money
  - Levels of interest

Grammar, Dictionary, Texts
- “The Boasian Trilogy”
- Structure, Lexicon, Culture
- Way to present the analysis and also allow others to recreate it (or challenge it) from the underlying data.
- Conceived broadly:
  - Capture language structure
  - Capture language in use
  - Capture lexicon and meaning

Sampling: Documentation as snapshots
- A big part of documentation is constructing a good set of “samples”.
- To do that, you will need to consider what the purpose of the documentary record is. That is, why are you collecting data on the language?
  - “to make a lasting record of the language”
  - “to reclaim the language for future speakers”
  - “to write a reference grammar”
  - “to document the culture in the traditional language”
  - “to investigate a particular aspect of the language”
  - all of the above…
  - …

Sampling
- Are your “snapshots” representative?
  - Speakers
  - Subjects/Topics
  - Grammatical constructions
  - Lexicon
  - …
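
One rough way to check whether the snapshots are representative is to tally how the sessions recorded so far are spread across speakers and topics. The sketch below assumes a simple session log of (speaker, topic) pairs; the speaker labels, topics, and the `coverage` helper are invented for illustration and are not part of the class materials.

```python
from collections import Counter

# Hypothetical session log: one (speaker, topic) pair per recording session.
# All labels here are invented for illustration.
sessions = [
    ("speaker_A", "narrative"),
    ("speaker_A", "elicitation"),
    ("speaker_A", "elicitation"),
    ("speaker_B", "conversation"),
    ("speaker_A", "narrative"),
]

def coverage(records, index):
    """Return each category's share of the records for the field at `index`."""
    counts = Counter(record[index] for record in records)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

speaker_share = coverage(sessions, 0)  # share of sessions per speaker
topic_share = coverage(sessions, 1)    # share of sessions per topic

for name, share in sorted(speaker_share.items()):
    print(f"{name}: {share:.0%} of sessions")
```

A heavily skewed tally (most sessions from one speaker, or one genre) does not make the corpus wrong, but it is a prompt to ask whether the skew matches the purpose of the documentation.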

Planned versus opportunistic collection
- Planned:
  - translated sentences
  - grammaticality judgments
  - etc.
- Unplanned (or planning gone wrong):
  - Speakers reinterpret your prompts and construct narratives from them.
  - New speaker comes to a session and wants to tell stories.
  - You find a new (to you) morpheme in your data and want to find out how it works.
  - You overhear a new construction in conversation.

What constitutes a documentary corpus?
- ***Everything***
  - sound files
  - videos
  - transcripts
  - (elicitation prompts – part of the annotation)
  - photographs
  - maps
  - (artifacts)
  - metadata (data about the data)
  - metametadata
  - …
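
To make "metadata (data about the data)" concrete, here is a minimal sketch of a per-session record that ties a sound file to its transcript and elicitation prompts. The field names, file names, and date are all invented for illustration; they do not follow any particular archive's schema.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class SessionRecord:
    """Metadata for one recording session (all field names are illustrative)."""
    session_id: str
    date: str                 # ISO 8601 date string
    speakers: list            # speaker identifiers or pseudonyms
    audio_file: str           # file name of the primary recording
    transcript_file: str = "" # empty until the session is transcribed
    prompts: list = field(default_factory=list)  # elicitation prompts used
    notes: str = ""

# Example values are invented placeholders.
record = SessionRecord(
    session_id="example-001",
    date="2013-06-24",
    speakers=["speaker_A"],
    audio_file="example-001.wav",
    prompts=["numerals 1-10"],
)

# Plain-text JSON stays readable without any special software.
metadata_json = json.dumps(asdict(record), indent=2, ensure_ascii=False)
print(metadata_json)
```

Keeping one such record per session, in a plain-text format, means the link between recordings, transcripts, and prompts survives even if the software used to create them does not.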

WORKFLOW AND DATA TYPES

Workflow:
1. What do you need to do to document a language?
2. What order do you need to do it in?
3. (How will you know if it’s been done right?)

Scaled workflow
- Project as a whole (timescale of years)
  - e.g. “Bardi language documentation”
- Immediate tasks (timescale of weeks or months)
  - e.g. “Bardi learner’s guide”
- Subtasks (timescale of days or weeks)
  - e.g. “write the section on numbers”
- Data gathering (timescale of a single session)
  - e.g. “get data on numerals in use”

Workflow while on fieldwork

HARDWARE

Sample field kit:
- Equipment:
  - Laptop
  - Audio recorder
  - Video recorder
  - + microphones
  - + backup means of recording (e.g. from laptop, second recorder)
- Media:
  - backup devices [hard drive, DVDs, etc.]
  - memory cards for recorders
  - paper! pens!
- Other:
  - ways of keeping the equipment clean
  - carry bag
  - stills camera (cell phone, iPad, etc.)
  - batteries, other power equipment
  - tripod
- Stimuli/research prompts

Audio
- The field has converged on solid-state recorders using SD cards:
  - Zoom Handy Recorder H2 or H4 (or H6, coming soon!)
  - Edirol R-09
  - Marantz PMD 660 or 670
- And/or laptops (or laptop plus external sound card/preprocessor)
- Why these recorders:
  - small/portable
  - AA batteries
  - high-quality, lossless formats
  - easy to use
  - easy to transfer data
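
Lossless recording has a storage cost worth planning for when choosing memory cards and backup media. The sketch below estimates the size of uncompressed PCM audio (as stored in WAV files) from sample rate, bit depth, and channel count. The default settings (44.1 kHz, 16-bit, stereo) and the 32 GB card are assumptions chosen for illustration, not recommendations from the slides, and the small WAV header is ignored.

```python
def pcm_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size in bytes of uncompressed PCM audio (file header not counted)."""
    bytes_per_sample = bit_depth // 8
    return seconds * sample_rate * bytes_per_sample * channels

def hours_on_card(card_gb, **settings):
    """Approximate recording hours that fit on a card of `card_gb` gigabytes (10^9 bytes)."""
    return card_gb * 1e9 / pcm_bytes(3600, **settings)

one_hour = pcm_bytes(3600)
print(f"One hour at the assumed settings: {one_hour / 1e6:.0f} MB")
print(f"Hours on an assumed 32 GB card: {hours_on_card(32):.1f}")
```

Higher sample rates or bit depths scale the figure linearly, so the same function can be used to compare settings before a field trip.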

Not recommended:
- Dictaphones
- Cassette recorders
- DAT

Video
- Less consensus on models
- Major component of the documentation or side-project?
- Options:
  - smart phone
  - iPad
  - stills camera with video function
  - dedicated video camera
    - SD card
    - mic jack
- Problems:
  - MPEG vs other proprietary video formats
  - large files
  - memory-intensive

Microphones
- headset vs lapel vs meeting microphone
- dynamic vs cardioid
- wired vs wireless
- XLR vs 1/8” jack
- The built-in mics in the Edirol, Handy, etc., are also OK
- You get what you pay for, approximately.
- Remember that microphone placement and volume monitoring are much more important than the quality of the microphone (far more recordings are ruined through the former than the latter).

Computer
- Laptop
  - Lots of memory
  - Lots of hard drive space
  - Usually don’t need ruggedization features
  - Get the cheapest possible and assume it won’t last for more than a season, or try for a higher-end model
- Special considerations for high-altitude, high-humidity, or low-temperature work:
  - High altitude: hard drives fail; use solid state
  - High humidity: condensation issues
  - Low temperatures: battery issues
  - (See Lanz 2010)

Tablets?
- Most language software won’t run on iPads or other tablets.
- Great for stimuli, backup recorder, camera, etc.
- Too much data
