Toward Best Practice for Language Resource Conversion


EMELD 2003 Working Group on Resource Conversion
EMELD Resource Conversion WG 20030713

Working Group
- Baden Hughes, Chilin Shih (co-chairs)
- Helen Aristar-Dry, Steven Bird, Reinhard Hiss, Will Lewis, Barbara Need, Steven Weinberger

Objectives
- Consider the methodology for, and make recommendations about, the conversion of legacy (possibly non-digital) language resources into enduring BP formats
- Examine ongoing conversion processes and identify issues in the conversion of digital language resources in working contexts

Methodology
- Focus on high-level principles which pervade general language resource conversion problems rather than format-specific resource conversion issues
- Acceptance that appropriate technical expertise probably already exists “somewhere” but needs to be adapted to the EMELD context

Subject Matter
- Content and Structure
- Metadata
- Text
- Audio
- Video
- Still Images
- Physical Media
- Hardware / Software

Core Values
- Bird & Simons (2003) “Seven Dimensions …”: content, format, discovery and preservation
- Motivation to ensure persistence and longevity of archive-quality digital objects

Principles …1
Ignorance is not bliss!
- Not every user needs to be a technical expert, but every user should be assisted to understand their context and functional requirements and to access sufficient information to make an informed choice
- Conversion issues will affect institutions and individuals at many levels, particularly in terms of the resources available to address them

Principles …2
Conversion and Archiving
- The best available copy should be archived according to BP
- Format neutrality in respect to use involves effort but is essential to ensure long-term viability
- Archiving practice will imply resource conversion for preservation purposes
- Consistency in conversion methodology is inherently better than random variation

Principles …3
Conversion and Re-Use
- Requirements for re-use vary between agents and purposes
- Inherent in most (all?) conversion processes is some degree of information loss, thus the absolute minimum number of format conversions should be undertaken
- Where possible, converted materials should include information about their digital lineage
- Additional information pertaining to the language resource may be located separately from the resource itself and needs to be preserved
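
One way to picture the digital lineage principle is a small sidecar record that lists each format conversion a resource has undergone. The sketch below is illustrative only and is not an EMELD specification: the class names, field names, the resource identifier and the tool name are all hypothetical, and the SHA-256 checksum is simply one assumed way of tying the record to the current content.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ConversionStep:
    """One format conversion in a resource's history (hypothetical schema)."""
    source_format: str
    target_format: str
    tool: str
    date: str    # ISO 8601 date string
    lossy: bool  # True if the step is known to discard information


@dataclass
class LineageRecord:
    """Sidecar record intended to travel with a converted resource."""
    resource_id: str
    sha256: str  # checksum of the current content
    steps: List[ConversionStep] = field(default_factory=list)

    def add_step(self, step: ConversionStep, new_content: bytes) -> None:
        # Record the conversion and refresh the checksum for the new content.
        self.steps.append(step)
        self.sha256 = hashlib.sha256(new_content).hexdigest()

    def lossy_step_count(self) -> int:
        # Each lossy step is a point where information may have been discarded.
        return sum(1 for s in self.steps if s.lossy)

    def to_json(self) -> str:
        # Serialise the whole record, including nested steps, as JSON text.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def new_record(resource_id: str, content: bytes) -> LineageRecord:
    """Start a lineage record for content that has not yet been converted."""
    return LineageRecord(resource_id, hashlib.sha256(content).hexdigest())


if __name__ == "__main__":
    # Hypothetical example: a word list converted once from a word-processor
    # format to plain text.
    record = new_record("wordlist-001", b"original bytes")
    step = ConversionStep("doc", "txt", "hypothetical-converter 1.0",
                          "2003-07-13", lossy=True)
    record.add_step(step, b"converted bytes")
    print(record.to_json())
```

Keeping the count of lossy steps visible also reflects the recommendation to undertake as few format conversions as possible.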

A Pragmatic Approach to BP …1
- The lineage of digital language resources may have included processes which are less than optimal practice
- BP may not realistically be achievable in all contexts (constraints such as time, money, equipment, expertise, inclination …)
- Some practices have inherently higher potential to cause conversion and archiving issues
- Significant incentives need to be offered to induce change in language data management practices towards BP. Would you prefer to choose BP, or be forced to adopt BP when you lose data?

A Pragmatic Approach to BP …2
- Software choice will impact on the longevity of language resource data. Ideological debates about software development methodologies are often misleading when considering longevity and preservation
- Absolute ranking of practice on a scale of worst to best is not transparent (context sensitive, moving target …)

Ongoing Work Items …1
- Identify and review core documents on BP formats, including accessible recommendations for different audiences
- Identify and review software tools which enable conversion according to BP principles (this is not necessarily a democratic system!)
- Develop accessible case studies of typical language resource conversion problems, critique them and provide advice on how to achieve BP in these contexts

Ongoing Work Items …2
- Examine how physical media choices can affect the retention or loss of information, and the implications for the language resource conversion process
- Promulgate resource conversion as a pervasive issue to be considered in many other BP contexts

Observations Relevant to Other Working Groups
Resource Archiving
- Good archiving practice will consider resource conversion as a fundamental issue
- Infrastructural constraints may significantly increase the risk of information loss
Resource Creation
- BP at the data collection point reduces the risk of information loss in any conversion process
- Conversion implications need to be considered when selecting an appropriate tool for the data and functionality types required

Observations Relevant to EMELD
- EMELD needs to consider the longevity and persistence implications for ongoing archiving functions, particularly in reference to the “long term”; this may include adequate financial resourcing

Logistical Recommendations
- Creation of Communities of Expertise within the EMELD framework to advise on working group topics (cf. Ask-A-Linguist), including experts from outside linguistics
- Creation of Working Group email lists for ongoing work in these areas
- A user reviews and solutions section for tools and processes within the EMELD School site