Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA 1 Resource Conversion William Lewis CSU Fresno.



Preliminaries
Eventually any resource will become obsolete, so resource conversion is inevitable.
One should plan from the start for eventual conversion.
Encode your resource such that it
- is migratable
- is reusable
- will endure

Best Practice?
Simons (this symposium) argues there are 3 relevant formats for encoding data:
1. Working form
2. Presentation form
3. Archival form
(1) is tied to particular software. (2) is generally generated from (1), but is itself often “semantically” sparse.

Best Practice!
Encoding a resource in archival form (3) ensures that
- the resource is reusable, facilitating interoperability
- the resource can be migrated to other formats (including presentation formats)
- the resource endures

Data Survivability
Converting to archival XML form provides for data reuse and ensures survivability:
[Diagram: Working Form -> Conversion Process -> Archival Form -> HTML, PDF, other XML form]
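
The step from archival form to a presentation format can be pictured as a small script that reads an archival XML record and emits HTML. The following is a minimal sketch only: the element names (`lexicon`, `entry`, `headword`, `pos`, `gloss`) and the sample values are placeholders invented for illustration, not the schema used in the talk.

```python
import xml.etree.ElementTree as ET
from html import escape

# Placeholder archival record; element names and values are assumptions.
ARCHIVAL_XML = """
<lexicon>
  <entry>
    <headword>sample-word</headword>
    <pos>n.</pos>
    <gloss>placeholder gloss &amp; note</gloss>
  </entry>
</lexicon>
"""

def archival_to_html(xml_text: str) -> str:
    """Render each <entry> of the archival XML as one HTML paragraph."""
    root = ET.fromstring(xml_text.strip())
    parts = []
    for entry in root.iter("entry"):
        # Escape text so that characters such as '&' stay valid in HTML.
        head = escape(entry.findtext("headword", default=""))
        pos = escape(entry.findtext("pos", default=""))
        gloss = escape(entry.findtext("gloss", default=""))
        parts.append(f"<p><b>{head}</b> <i>{pos}</i> {gloss}</p>")
    return "\n".join(parts)

html_out = archival_to_html(ARCHIVAL_XML)
print(html_out)
```

Because the archival record keeps the headword, part of speech, and gloss as separate elements, the same record could feed a different renderer (PDF, another XML form) without going back to the working form.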

Data Conversion (“text”)
Highly dependent on flexibility of working form and related software
- Converting from a proprietary, binary format is most difficult; to be avoided
- Converting from plain text output is easiest; might be avoided due to potential data loss
- Converting from enriched text form (Unicode compliant) or XML-coded data is best, but may not always be possible

Data Conversion
[Diagram: Working Form -> Conversion Process -> Archival Form]

Data Conversion
[Diagram: Working Form -> Intermediary Form (enriched text) -> Conversion Process (CP) -> Archival Form]

Intermediary Conversion
Resources in ... : Use ...
- Word processor, spreadsheet: Print function or “Save as…”
- Proprietary flat file DB: Print function or other file convert
- Relational DB: Data query (direct to XML?)
- XML or enriched text (inc. Shoebox): As is
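
For the relational DB case, a data query that writes straight to XML might look like the sketch below. It assumes an SQLite database, a hypothetical `lexicon` table, and column names that happen to be valid XML element names; none of these come from the talk.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(conn, table, row_tag="record"):
    """Dump every row of `table` as an XML element, one child per column.

    Assumes the table and column names are valid XML element names.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        rec = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            child = ET.SubElement(rec, col)
            child.text = "" if value is None else str(value)
    return root

# Hypothetical in-memory database standing in for the working form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lexicon (headword TEXT, pos TEXT, gloss TEXT)")
conn.execute(
    "INSERT INTO lexicon VALUES (?, ?, ?)",
    ("sample-word", "n.", "placeholder gloss"),
)
xml_text = ET.tostring(table_to_xml(conn, "lexicon"), encoding="unicode")
print(xml_text)
```

Querying the database directly avoids the print or "Save as" route, so field boundaries survive into the intermediary form instead of having to be recovered from layout.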

Intermediary Conversion
Important:
- Ensure that conversion to Intermediary Form suffers no data loss, or that the data loss suffered is minimal
- There is danger in “Save as” (and Print to file), in that data loss is possible
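
One kind of data loss that can occur on export is characters that the target encoding cannot represent. As a rough illustration (one possible check, not a procedure described in the talk), the sketch below reports which characters of a text would be lost in a given encoding. The function name `chars_lost` and the sample string are invented for the example; the string is an arbitrary sequence of characters, not a real word.

```python
def chars_lost(text: str, encoding: str) -> set:
    """Return the characters in `text` that `encoding` cannot represent."""
    lost = set()
    for ch in set(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            lost.add(ch)
    return lost

# Arbitrary sample containing two IPA characters outside Latin-1.
sample = "t\u0268k\u02b0a"
lost_latin1 = chars_lost(sample, "latin-1")
lost_utf8 = chars_lost(sample, "utf-8")
print(sorted(lost_latin1), sorted(lost_utf8))
```

Running such a check before exporting shows whether a given export encoding would silently drop or replace phonetic characters.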

Final Conversion
Intermediary to Archival Form (Best Practice XML):
- Font/character transforms
- Macros or methods for enriching and aligning data elements
- Tables or “glossaries” defining how content and form should be interpreted
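
A font/character transform can be as simple as a lookup table applied to every character. The sketch below assumes an imagined legacy font that stored IPA glyphs at ASCII code points. The mapping is hypothetical, and a real table would have to come from the documentation of the font actually used.

```python
import unicodedata

# Hypothetical mapping from legacy-font code points to Unicode IPA.
LEGACY_TO_UNICODE = {
    "N": "\u014b",  # LATIN SMALL LETTER ENG
    "S": "\u0283",  # LATIN SMALL LETTER ESH
    "?": "\u0294",  # LATIN LETTER GLOTTAL STOP
    "@": "\u0259",  # LATIN SMALL LETTER SCHWA
}
TABLE = str.maketrans(LEGACY_TO_UNICODE)

def to_unicode_ipa(legacy_text: str) -> str:
    """Replace legacy code points, then normalize to NFC."""
    converted = legacy_text.translate(TABLE)
    return unicodedata.normalize("NFC", converted)

converted_sample = to_unicode_ipa("aN@?")
print(converted_sample)
```

Keeping the mapping in a table rather than in code is in the spirit of the “glossaries” bullet: the table itself documents how the legacy form should be interpreted.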

Data Conversion: Case Study
Converting the Hopi Dictionary (Hill et al. 1998) from working form (legacy format)
Purpose:
- Build software to extract relevant data from working form
- Generate reusable archival format
  - For dissemination on the Web
  - For use by others
  - To preserve data should DB software become unusable

Hopi Dictionary
Example entry from Hopi Dictionary: [figure not preserved in transcript]

Hopi Dictionary Conversion
Until now:
- Generated text file from DB
- Manually converted IPA fonts in MS Word
- Generated PDFs for dissemination

Hopi Dictionary Conversion
New process:
- Convert DB format to “enriched” text
- Software transforms for fonts from text format (Unicode-compliant IPA)
- Identify the grammatical concepts used in entries, linked to GOLD (Farrar & Langendoen, this symposium)
- Generate XML, structured using modified EMELD IGT format (Bow, Hughes & Bird 2003)
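
The last step of the new process, going from enriched text to XML, can be sketched as follows. The example assumes Shoebox-style backslash field markers (`\lx`, `\ps`, `\ge`) in the enriched text. The output element names and the sample entries are placeholders and do not reproduce the modified EMELD IGT format or any actual dictionary data.

```python
import xml.etree.ElementTree as ET

# Assumed marker-to-element glossary; both sides are illustrative choices.
MARKER_TO_TAG = {"lx": "headword", "ps": "pos", "ge": "gloss"}

# Placeholder enriched text: one field per line, blank line between entries.
ENRICHED_TEXT = """\\lx sample-word
\\ps n.
\\ge placeholder gloss

\\lx other-word
\\ps v.
\\ge another placeholder
"""

def enriched_to_xml(text: str) -> ET.Element:
    """Build a <lexicon> tree with one <entry> per \\lx field."""
    root = ET.Element("lexicon")
    entry = None
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # skip blank or unmarked lines
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":  # \lx opens a new entry
            entry = ET.SubElement(root, "entry")
        if entry is None or marker not in MARKER_TO_TAG:
            continue  # skip fields before the first entry or unknown markers
        ET.SubElement(entry, MARKER_TO_TAG[marker]).text = value.strip()
    return root

lexicon = enriched_to_xml(ENRICHED_TEXT)
print(ET.tostring(lexicon, encoding="unicode"))
```

This sketch silently skips unknown markers, which is itself a form of data loss; a real converter would more likely log or reject them so that nothing disappears unnoticed.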

Archival Hopi Dictionary Record
[figure not preserved in transcript]

Recipe for Resource Conversion
- Choose a data format that is easily archived
  - Where the software provides for data migration, or,
  - The data format itself is easily converted
- Use existing software to bring you as close to Archival Form as possible (Intermediary Form)
- Clearly identify
  - Content and structural semantics (“terms”)
  - Fonts used (and transforms)
  - Data alignment
- Construct transforms/macros/software to convert to Archival Form