The E-MELD Project
Helen Aristar Dry & Anthony Aristar
The LINGUIST List, Eastern Michigan U & Wayne State U
Nov 21, 2005, University of Texas at Austin


E-MELD: Electronic Metastructure for Endangered Languages Documentation
- 5-year NSF project
- LINGUIST List, ELF, LDC
- Goal: to aid in
  - the preservation of endangered languages data
  - the development of infrastructure for electronic archives

Summary of the problem (2001)
EL resources were/are:
- Difficult to find
- Difficult to use
- Difficult to preserve
Needed:
- More uniformity in naming, cataloguing, and annotating, i.e., interoperable standards
- More knowledge of how to create digital resources that last

Problems with EL resources
- Difficult to find
  - At distributed sites
  - Language names ambiguous
  - No central catalog of resources or cataloging information (metadata)
  - Lack of interoperability among archives
- Difficult to display accurately
  - Idiosyncratic character encoding
  - Specific fonts needed

Problems with EL resources, 2
- Difficult to compare
  - Non-standard terminology
  - Idiosyncratic markup & annotation schemes
- Difficult to manipulate or reuse
  - Specific software needed (incl. specific software version), e.g. MS Word 1.0
  - Meaning represented via formatting, which was not documented, e.g. bold represents “headword”
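
The last point, meaning carried only by formatting, can be illustrated with a small sketch. It assumes a hypothetical legacy lexicon entry in which bold (written here as `<b>`) marks the headword and italics (`<i>`) mark the part of speech; the entry, the invented word form, the element names, and the function `to_descriptive` are all illustrative assumptions, not an E-MELD format.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical legacy convention (assumed for illustration only):
#   <b>...</b> = headword, <i>...</i> = part of speech, remainder = gloss.
LEGACY_ENTRY = "<b>pako</b> <i>n.</i> house"  # invented word form

ENTRY_PATTERN = re.compile(
    r"<b>(?P<headword>.+?)</b>\s*<i>(?P<pos>.+?)</i>\s*(?P<gloss>.+)"
)


def to_descriptive(entry: str) -> ET.Element:
    """Turn a formatting-based entry into explicitly labeled elements."""
    match = ENTRY_PATTERN.fullmatch(entry.strip())
    if match is None:
        raise ValueError(f"entry does not follow the assumed convention: {entry!r}")
    element = ET.Element("entry")
    for field in ("headword", "pos", "gloss"):
        # Each piece of meaning gets its own named element.
        ET.SubElement(element, field).text = match.group(field).strip()
    return element


if __name__ == "__main__":
    print(ET.tostring(to_descriptive(LEGACY_ENTRY), encoding="unicode"))
```

Once the convention is written into the markup itself, the entry no longer depends on a particular word processor or on undocumented knowledge of what bold meant.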

Problems with EL resources, 3
Impermanent: vulnerable to
- Deterioration of the physical media
- Hardware obsolescence
- Software obsolescence

[Slide from Dietrich Schüller, Director, Phonogrammarchiv, Austrian Academy of Sciences]

Toward a Solution: E-MELD Components
- Involve the linguistics community in developing standards
- Promote consensus about:
  - Language identification
  - Metadata
  - Annotation and markup
- Teach and facilitate implementation of “best practices” in the creation of digital language documentation

Promoting consensus: annual workshops
- 2001, Santa Barbara, CA: The Need for Standards
- E-MELD 2002, Ann Arbor, MI: Digitizing Lexical Information
- E-MELD 2003, Lansing, MI: Digitizing Texts
- E-MELD 2004, Detroit, MI: Databases and Best Practice
- E-MELD 2005, Cambridge, MA: Linguistic Ontologies & Terminology

2006 E-MELD Workshop on Digital Language Documentation
Michigan State University, June 20-22, 2006
In conjunction with the 2006 Summer Meeting of the Linguistic Society of America
Topic: Electronic Archiving and Digital Tools: Current State & Future Directions
Please come!

Finding resources: metadata
- OLAC metadata standards (OLAC is a subcommunity of OAI)
- OLAC search engine on LL site
- OLAC metadata editor on LL site
- XSL stylesheets for transformation / presentation of OLAC metadata
- Ethnologue/LL language codes proposed as ISO standard
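
To make the idea of a catalog record concrete, here is a minimal sketch of a Dublin Core-style description that identifies the language by code rather than by an ambiguous name. It is a simplification, not a schema-valid OLAC record: the `record` wrapper element, the function `make_record`, the sample title and creator, and the use of `nav` as the code for Navajo are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; OLAC metadata builds on Dublin Core.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def make_record(title: str, creator: str, language_code: str) -> ET.Element:
    """Build a simplified record; 'record' is a hypothetical wrapper element."""
    record = ET.Element("record")
    fields = (("title", title), ("creator", creator), ("language", language_code))
    for name, value in fields:
        # Qualified name in ElementTree's {namespace}tag notation.
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


if __name__ == "__main__":
    # Sample values are invented for illustration.
    rec = make_record("Field recordings, session 1", "A. Linguist", "nav")
    print(ET.tostring(rec, encoding="unicode"))
```

Because the language is given as a code, a search service can match records from different archives even when they spell or name the language differently.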

Using resources: comparing and finding annotation
- Ontologies developed (as interlanguage between markups and as search aids)
  - GOLD: General Ontology for Linguistic Description (morphosyntax)
  - OPF: Ontology of Phonetic Features (based on Ladefoged & Maddieson)
- ODIN Project: mining interlinear glossed text on the web (Will Lewis et al.)

Using resources: Tools
- Tools to encourage use of the ontology:
  - OntoElan: text annotation (modification of MPI’s Elan)
  - OntoGloss: stand-off annotation tool
  - FIELD: lexical input
- Tool to encourage use of Unicode:
  - CharWrite: input of Unicode characters
- Facility to encourage use of OLAC metadata:
  - Stylesheet library
  - ORE

Facilitating ‘Best Practices’ in resource creation
- Creation of reference website: School of Best Practices in Digital Language Documentation
- Addressed to the individual linguist who creates language documentation

What should the linguist do?
To ensure that digital data endure long into the future:
1. Create an archival copy: put the materials into an enduring file format.
2. Deposit the materials with an archive that will make a practice of periodically migrating them to new storage media as needed.
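
Step 1 can be sketched for the simplest case, a text file in a legacy character encoding. The sketch assumes that Unicode plain text (UTF-8, normalized to NFC) serves as the enduring format and that the source encoding is known; the function names `to_archival_text` and `write_archival_copy` are hypothetical, and real materials (audio, formatted lexicons) would need format-specific handling.

```python
import unicodedata
from pathlib import Path


def to_archival_text(raw: bytes, source_encoding: str) -> str:
    """Decode legacy bytes and normalize the result to Unicode NFC."""
    # decode() raises UnicodeDecodeError if the bytes are invalid
    # for the stated encoding.
    text = raw.decode(source_encoding)
    return unicodedata.normalize("NFC", text)


def write_archival_copy(source: Path, target: Path, source_encoding: str) -> None:
    """Write a UTF-8 copy next to the original, which is left untouched."""
    text = to_archival_text(source.read_bytes(), source_encoding)
    target.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    legacy_bytes = "Nivaclé".encode("latin-1")  # stand-in for a legacy file's content
    print(to_archival_text(legacy_bytes, "latin-1"))
```

Normalizing to a single Unicode form means the same character is always stored the same way, which keeps later searching and comparison reliable.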

Organization of the School
- Entrance Hall: orientation
- Classroom: lessons & tutorials
- Reading Room: bibliography
- Work Room: online work
- Tool Room: links to tools
- Help (incl. Ask an Expert)
- Case Studies: documentation of 10 ELs digitized according to best practices

Currently the School has documentation from 12 ELs:
Mocovi        Kayardild
Monguor       Potawatomi
Tofa          Ega
Saliba        Navajo
Biao Mien     W. Sissala
(Chorote)     (Nivacle)

Current Initiatives
- Identify and record metadata for legacy documentation
- Improve the ontology (GOLD): incorporate suggestions from the 2005 E-MELD workshop
- Finish prototyped software

Future: finish prototyped software
- OntoElan: ontology-aware modification of MPI’s Elan annotation tool
- OntoGloss: ontology-aware stand-off annotation tool
- CharWrite: downloadable tool for web-input of Unicode characters
- FIELD: Field Input Environment for Linguistic Data
All but OntoGloss are available through the School of Best Practices website.

Current Initiatives: School of BP
Make the School even more practical:
- Distinguish between good, better, and best practice
- Emphasize:
  - Explicit ‘how-to’ pages
  - Different paths for different user types
  - Advice from experts, e.g. “equipment on a budget” page, Ask-An-Expert

Practices in resource creation
- Good practice: ensure preservation
- Better practice: ensure long-term intelligibility
  - “We don’t want to create another Rosetta Stone” (Whalen, 2003)
- Best practice: promote interoperability

School of Best Practices in Digital Language Documentation

Future Directions
- MultiTree
- LL-MAP

What is MultiTree?
- 3-year grant
- Database of all hypothesized language relations
- Ultimately linked to GIS database
- Interface to allow linguists to input updates
- Panel of experts to assess input

LL-MAP
- Collect geographically linked linguistic data
- Build this into a GIS system, allowing layers of information to be built into a single map
Then:
- Build tools for querying, annotating and discussing this data
- Build tools which allow new language data from linguists and anthropologists to be incorporated into this system