Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013 Week 1: Software.

Slides:



Advertisements
Similar presentations
Texts and Digital Objects What seems to have changed.
Advertisements

Legal Meetings: Extended Instructions on Movica and Screencast.
Endangered Languages and Web-Based Archiving Megan J. Crowhurst The University of Texas at Austin & CELP Contributors: Chris Beier, Heidi Johnson, Lev.
Mobile Applications and Time Management Deanna Arbuckle, MRC, CRC University of Dayton
SharePoint Forms All you ever wanted to know about forms but were afraid to ask.
LIFTing LEGO with RELISH: Lexicon Interchange FormaT in Use Helen Aristar-Dry Institute for Language Information and Technology Eastern Michigan U.
MICHAEL MARINO CSC 101 Whats New in Office Office Live Workspace 3 new things about Office Live Workspace are: Anywhere Access Store Microsoft.
Microsoft Office Illustrated Fundamentals Unit C: Getting Started with Unit C: Getting Started with Microsoft Office 2010 Microsoft Office 2010.
Unknown/uncontrolled data applications Bad/broken end-user applications Inefficient business processes Backlog of IT requests No data access control/backup.
Language data and XML: archiving and interoperability Simon Musgrave Linguistics Program Monash University
Easy, like an attachment. But can your doc stand on its own? Yes. Only teachers can upload files to course site. So definitely a push- tool. Maybe.
Lesson 15 Presentation Programs.
Managing Citations Using Free or Fee-Based Software: What's the difference? Cynthia Bail uOttawa Library Friday, January 30, :10 p.m. SuperConference.
ICT Workshop Departmental-use Dr. Hoda Baytiyeh-Naja The American University of Beirut.
Software Tools for Language Documentation DocLing 2013 Peter K. Austin Department of Linguistics, SOAS.
Using Audacity Audacity is a free, easy-to- use audio editor and recorder for a variety of operating systems.
From Cassettes to CDs Digitizing Audio. Topics Overview Tools Required Media Types Preparing the Computer Recording the Audio Editing the Audio Creating.
Copyright © 2014 Pearson Education, Inc. Publishing as Prentice Hall
File Management Chapter 3
Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013 Week 3: Thursday (corpora)
Website design basics QUME Learning objectives Understand the basic elements of a Web page and how it is produced Be aware of different approaches.
IT’s Gone Mobile: How to do your Job Anywhere Jason Hand IT Specialist, Central NM Electric Cooperative Jason Hand Cell:
Instructional Technology & Design Office or Zotero & Mendeley Workshop Presented by Kate Rojas.
Chapter 3 Software Two major types of software
Web Browsers It is an application software that is used to display and interact with text, images and other information located on web pages at web sites.
Computer Software.
REDCap Overview Institute for Clinical and Translational Science Heath Davis Fred McClurg Brian Finley.
Google Confidential and Proprietary Chrome 101 & Max Stacy Behmer Google Certified Teacher & Google Education Trainer
Project Proposal: Academic Job Market and Application Tracker Website Project designed by: Cengiz Gunay Client: Cengiz Gunay Audience: PhD candidates and.
Computer Concepts 2014 Chapter 7 The Web and .
RMG Study Group Session I: Git, Sphinx, webRMG Connie Gao 9/20/
WIKI IN EDUCATION Giti Javidi. W HAT IS WIKI ? A Wiki can be thought of as a combination of a Web site and a Word document. At its simplest, it can be.
Item Web 2.0 application relevant to teacher’s work.
© Paradigm Publishing, Inc. 5-1 Chapter 5 Application Software Chapter 5 Application Software.
WorkPlace Pro Utilities.
TECHNOLOGY RESOURCES TO SUPPORT RESEARCH (Or: How to Tame Alot of Paper) Ashley Shaw
Eureka! User friendly access to the MPI linguistic data archive Max Planck Institute for Psycholinguistics Alexander Koenig Jacquelijn Ringersma Claus.
The Archive of the Indigenous Languages of Latin America Goals and Visions.
Max Planck Institute for Psycholinguistics Tool development report H. Brugman MPI Nijmegen.
Cataloguing Electronic resources Prepared by the Cataloguing Team at Charles Sturt University.
University of Sunderland CDM105 Session 5 Web Authoring Tools The past and present A history of web authoring tools and an overview of Macromedia Dreamweaver.
Sources of Information for the Research Paper
Software Writer:-Rashedul Hasan Editor:- Jasim Uddin.
HTML Hyper Text Markup Language A simple introduction.
Copyright 2006 Thomson Corporation ISI Web of Knowledge EndNote ® Web and EndNote ® Integrated solutions for research and publishing October 2006.
 Saundra Speed  Mariela Esparza  Kevin Escalante.
Chapter 8 Browsing and Searching the Web. Browsing and Searching the Web FAQs: – What’s a Web page? – What’s a URL? – How does a browser work? – How do.
CHAPTER TEN AUTHORING.
Access 2013 Platform Overview Access Low up-front investment Easy to evolve and iterate Easy adoption One version of the truth Easy to collaborate.
Teachers Discovering Computers Integrating Technology and Digital Media in the Classroom 5 th Edition Let’s Review Lesson 2! Who Wants to Be a Computer.
Bardi Computer Dictionaries. Why? Can add sound and pictures easily Can add sound and pictures easily Makes it easier to look up Bardi and English words.
INSTRUCTIONAL TECHNOLOGY FOR EDUCATORS: Day 3 Richard Hartshorne, Ph.D. Associate Professor, Educational Technology Skype: rhartsho49er
PAN-European Exploitation of the Results of the Libraries Programme - EXPLOIT German Libraries Institute Berlin EXPLOIT 1 Electronic library materials.
Blogs and Wikis Tim Bornholtz. Purpose Many new technologies are available on the internet that enable people to publish and edit content without expensive.
DocLing2016 Software Tools Peter K. Austin Department of Linguistics SOAS, University of London
Microsoft Office 2008 for Mac – Illustrated Unit D: Getting Started with Safari.
Integrating and Extending Workflow 8 AA301 Carl Sykes Ed Heaney.
Getting Started Telligent or SharePoint (or Hybrid)?
Introduction of Wget. Wget Wget is a package for retrieving files using HTTP and FTP, the most widely-used Internet protocols. Wget is non-interactive,
By: Jamie Morgan  A wiki is a web page or collection of web pages which you and your students can access to contribute or modify content without having.
PDF Recovery Tool Fix Portable Document File Format.
The importance of being Connected
G Suite Elevator Pitches
Teaching and Learning with Technology
Internet.
Office Edition Overview (Dec. 2018).
TYPES OF INFORMATION SOURCES
Scholarship Tools Managing Bibliographies
Microsoft Office Illustrated Fundamentals
PDF FORM FILLING Brian Heywood – August 2019.
Presentation transcript:

Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013 Week 1: Software

GENERAL PRINCIPLES

General Principles “Interoperability”: aka “never need to retype!” Ideally, it should be possible to move data easily and seamlessly between programs (or use a single program for everything). In reality… The ‘single program’ solution produces unacceptable compromises (bloatware, can’t handle all data types, etc) Multiple programs require the programmers to agree on data standards, which will never happen. Several programs now solve *most* of the problems. Workarounds are possible but require care.

Choosing software Ease of use with your data How long will it take to learn the program? Is it worth it? Can you type your language easily (e.g. right-to-left input, non- Ascii symbols, etc)? Can you keep track of all the information you need to? Does the program work well within your workflow? Can you search your data easily? Can you get your data into and out of the program? Can you back it up? Can other people use it? Do you need to share your data? Do you need to have more than one person working on your data?

SOFTWARE TYPES Today:

Software types Each point of the workflow (i.e. each task in the documentation) will need certain software Creation [and editing] Annotation [and analysis] Preservation Dissemination [and publication]

Overview of programs by function Organize field notes and recordings Process recordings (audio and video) Transcription Corpus creation Parsing/interlinearization Analysis/tagging Lexicon/dictionary making Writing other secondary materials journal articles reference grammar, dissertation community materials Web publication Backup tools

Organizing field notes/recordings Some way of recording what is on each recording, what you have in the collection Needed for archiving *Very* helpful for when you’re working with your collection. Database program. Filemaker, Bento Access Base, etc Other options: spreadsheet program like Excel or Base. Notetaking software such as Evernote Metadata tool such as IMDI iTunes Metadata extractor that takes collections and lets you annotate the file names.

Sample Filemaker db

Not recommended Using the file name annotation in Apple finder, etc

Information to collect Session info who, what, when, where, how language Speaker info demographic info Media info sampling rate, etc Material status info access restrictions, processing comments, links

Processing recordings Audio and video editing software. Resampling changing format (e.g. wav > mp3) clipping Audacity is popular. Praat can also do all the needed things (but can be cumbersome) Quicktime or VLC for video conversions and clipping (other video editing suites)

Transcription Elan (de facto standard?) ( tools/elan/) Transcriber ( Transana ( Clan (used for the Childes database and talkbank) ( Time-aligned transcripts should be mandatory!

Elan example

Corpus creation, management, and searching aka: some way to organize your electronic fieldnotes and transcripts. Toolbox, Elan TLex has a corpus option, based on text files TextStat ( berlin.de/en/textstat/) berlin.de/en/textstat/

Parsing/Interlinearization Most commonly used platforms: Toolbox Flex See for a great list of options. Elan can be set up to parse semi-manually. Lingsync (

Toolbox pros and cons works on text files with minimal markup well established in the field can build lexicon basic corpus tools can interface with Elan small footprint files are sharable through e.g. dropbox flat file database structure Elan interface requires some hand-processing of texts (or knowledge of python) parser doesn’t work well for complex morphology, tone no ‘on the fly’ update mac use is fiddly

Flex Pros and Cons Structured data entry on the fly update built-in analysis libraries (e.g. dictionary topics, morpheme types) Slow! Bloatware! Windows only File sharing difficult built-in analysis libraries encourage paint-by-numbers documentation.

Summary: Data analysis (Depends on the data and task) Praat for phonetics ( Corpus program, concordance generator (*very* useful!) Toolbox FLEx TextStat Some way of keeping track of notes, analyses, etc e.g. Evernote e has a very long list of options. e Tiddlywiki (Text editor, e.g. Textwrangler, NotePad++, etc)

Dictionary Lexique Pro ( Toolbox or FLEx For combining with texts, parsing, etc Kirrkirr Great for presenting final dictionaries Not an editing program TshwaneLex (tschwanedje.com) For editing and printing/sharing dictionaries (not free) WeSay (wesay.org) (Lexus)

Pros and Cons: Lexique Widely used Can edit entries easily Can add pictures and sound Can sort by meaning English or Language list Easy to update Can publish on internet Can use with Toolbox Tends to crash Doesn’t always work Limited display No fuzzy searching Linking is temperamental PC only hard to have multiple users hard to create fully bilingual dictionaries

Pros and Cons: TshwaneLex Flexible/customizable Reliable PC and Mac Can include images and sound Multiple exports Can import corpus data can create fully bilingual dictionaries not free steep learning curve Mac version has some bugs can’t use same file for interlinearization

Pros and Cons: FLEx Powerful Free Can integrate with written corpus Can export to web, lexique, etc PC only Clunky, slow Can’t integrate sound file transcriptions Only one dictionary model Lots of assumptions about linguistics built in Difficult to export in multiple formats (via \ code)

Software suggestions Toolbox (or Flex or Lexique) TshwaneLex if making a stand-alone dictionary Transcription software (e.g. Elan) a text editor (e.g. textwrangler [osx], edit pad [pc], etc) Concordance program (textstat, monoconc, etc)

Sample workflow Transcribe with Elan Export clips to Praat using sendpraat preliminary annotation in Elan Export to Toolbox interlinearize compile lexical materials (annotate for analysis within file) Reimport transcripts to Elan Use Elan for corpus search Export to web via CuPed Pros: Relatively streamlined Requires few programs Exploits advantages of each program can be done offline portable/can be backed up easily flexible to export to other formats if needed Cons: potential version control issues

What about using Word & Excel? Pros: No learning curve May be the most available software May already have data Cons: Proprietary Designed for business, not for dictionary writing Word – no explicit structure Word – hard to search Excel – “flat” format, not relational Multiple people can’t work on same document

Disseminating results Many possibilities, depending on the type of results: Word processing software for journal articles, dissertation, etc: Word, LyX, (Xe)LaTeX Desktop publishing software for school books, etc (e.g. indesign) CuPED for sharing Elan transcripts. Web editing software App authoring software Wunderkammer for mobile phone dictionaries Google earth for site mapping etc.

Backing up your data Cloud solutions Dropbox Sugarsync Google Drive mac.com etc University backup services Physical media portable hard drive DVDs -> Send to archive

Cloud services Pros: Cheap (often free) “set and forget”: can work with existing directory structures Cons: Requires internet connection Might be difficult if you’re backing up audio and video Not necessarily secure Syncing issues: easy to overwrite data, not always easy to get it back. If something happens to the service provider, you’re screwed.

Physical media Pros Solves many of the ‘cons’ of cloud services Cons Unknown durability time-consuming Doesn’t solve the problem of physical destruction of data if kept in the same place as your main working copies!

Archiving Archive your original recordings as soon as possible! You don’t need to finish analysis before archiving part of the collection. More about archiving in week 4.

Recommendations Cloud based back up for analysis files, frequently access files. Have at least one (ideally two or more) physical backups of all your data, stored in different places (e.g. one at home, one with your parents, one in the community) Archive materials as soon as possible. Sound like overkill?

Causes of Data Loss