William Lewis CSU Fresno


Resource Conversion
William Lewis, CSU Fresno
Jan 9, 2004, Symposium on Best Practice, LSA, Boston, MA

Preliminaries
- Eventually any resource will become obsolete, so resource conversion is inevitable
- One should plan from the start for eventual conversion
- Encode your resource such that it
  - is migratable
  - is reusable
  - will endure

Simons (this symposium) argues that there are three relevant formats for encoding data:
1. Working form
2. Presentation form
3. Archival form
(1) is tied to particular software. (2) is generally generated from (1), but is itself often “semantically” sparse.

Encoding a resource in archival form (3) ensures that
- the resource is reusable, facilitating interoperability
- the resource can be migrated to other formats (including presentation formats)
- the resource endures

Data Survivability
Converting to archival XML form provides for data reuse and ensures survivability:
Working Form → Conversion Process → Archival Form → HTML, PDF, other XML form
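
The last step of the diagram above (archival form to presentation form) can be sketched as follows. This is a minimal illustration, not the format used in the talk: the element names (lexicon, entry, headword, pos, gloss) and the sample record are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical archival record; the element names are illustrative only.
ARCHIVAL_XML = """<lexicon>
  <entry id="e1">
    <headword>sample</headword>
    <pos>n</pos>
    <gloss lang="en">an example &amp; illustration</gloss>
  </entry>
</lexicon>"""


def archival_to_html(xml_text):
    """Render each <entry> of the archival XML as one HTML definition-list item."""
    root = ET.fromstring(xml_text)
    parts = ["<dl>"]
    for entry in root.iter("entry"):
        headword = entry.findtext("headword", default="")
        pos = entry.findtext("pos", default="")
        gloss = entry.findtext("gloss", default="")
        # Escape text so that characters such as & and < stay valid in HTML.
        parts.append(f"<dt>{escape(headword)} <i>{escape(pos)}</i></dt>")
        parts.append(f"<dd>{escape(gloss)}</dd>")
    parts.append("</dl>")
    return "\n".join(parts)


html_output = archival_to_html(ARCHIVAL_XML)
print(html_output)
```

Because the presentation form is generated from the archival form, a second renderer (for example one producing another XML form) can be added without touching the archived data.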

Data Conversion (“text”)
- Highly dependent on flexibility of working form and related software
- Converting from a proprietary, binary format is most difficult; to be avoided
- Converting from plain text output is easiest; might be avoided due to potential data loss
- Converting from enriched text form (Unicode compliant) or XML coded data is best, but may not always be possible

Data Conversion
Working Form → Conversion Process → Archival Form

Data Conversion
Working Form → Intermediary Form (enriched text) → Archival Form
CP (conversion process)

Intermediary Conversion
Resources in:
- Word Processor
- Spreadsheet
- Proprietary Flat File DB
- Relational DB
- XML or enriched text (inc. Shoebox)
Use:
- Print Function or “Save as…”
- Print Function or other file convert
- Data Query (direct to XML?)
- As is
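
For the relational DB case, the "Data Query (direct to XML?)" route might look like the sketch below. It assumes an SQLite database with a hypothetical `entries` table whose column names are valid XML element names; the table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn, query, root_tag="lexicon", row_tag="entry"):
    """Run a query and wrap each result row in an XML element.

    Column names become child element names, so they are assumed
    to be valid XML names. NULL values are left out of the output.
    """
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        entry = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            if value is not None:
                ET.SubElement(entry, column).text = str(value)
    return root


# Hypothetical working database, built in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (headword TEXT, pos TEXT, gloss TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [("sample", "n", "an example"), ("other", "v", None)],
)

root = query_to_xml(conn, "SELECT headword, pos, gloss FROM entries ORDER BY rowid")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```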

Intermediary Conversion
- Important: ensure that conversion to Intermediary Form suffers no data loss, or that the data loss suffered is minimal
- There is danger in “Save As” (and “Print to file”), in that data loss is possible
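
One way to check for data loss is to compare the records before and after conversion. The sketch below assumes both sides have already been read into lists of field-name-to-value dictionaries; the field names and sample data are hypothetical, and the checks shown (record count, dropped fields, Unicode replacement characters) are only a starting point.

```python
from collections import Counter

# U+FFFD is inserted by many decoders when a byte sequence cannot be decoded.
REPLACEMENT_CHAR = "\ufffd"


def field_counts(records):
    """Count how often each field name occurs across all records."""
    return Counter(name for record in records for name in record)


def report_loss(source_records, converted_records):
    """Return human-readable signs that data was lost during conversion."""
    problems = []
    if len(source_records) != len(converted_records):
        problems.append(
            f"record count changed: {len(source_records)} -> {len(converted_records)}"
        )
    # Counter subtraction keeps only fields that occur less often after conversion.
    dropped = field_counts(source_records) - field_counts(converted_records)
    for name, count in sorted(dropped.items()):
        problems.append(f"field '{name}' missing from {count} record(s)")
    for index, record in enumerate(converted_records):
        for name, value in record.items():
            if REPLACEMENT_CHAR in value:
                problems.append(f"record {index}, field '{name}': undecodable character")
    return problems


# Hypothetical data: the export dropped one field and mangled one character.
source = [
    {"lx": "sample", "ps": "n", "ge": "an example"},
    {"lx": "other", "ge": "another"},
]
converted = [
    {"lx": "sample", "ge": "an example"},
    {"lx": "other", "ge": "an\ufffdther"},
]
problems = report_loss(source, converted)
for problem in problems:
    print(problem)
```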

Final Conversion
Intermediary to Archival Form (Best Practice XML):
- Font/character transforms
- Macros or methods for enriching and aligning data elements
- Tables or “glossaries” defining how content and form should be interpreted
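
A font/character transform is often just a lookup table applied to text that was typed in a legacy font. The sketch below assumes a made-up legacy font in which a few ASCII keys displayed IPA symbols; the mapping table is hypothetical and would have to be replaced by the real table for the font in question.

```python
import unicodedata

# Hypothetical legacy-font table: key typed in the old font -> Unicode character shown.
LEGACY_TO_UNICODE = {
    "N": "\u014b",  # LATIN SMALL LETTER ENG
    "?": "\u0294",  # LATIN LETTER GLOTTAL STOP
    "@": "\u0259",  # LATIN SMALL LETTER SCHWA
    "S": "\u0283",  # LATIN SMALL LETTER ESH
}
_TRANSLATION = str.maketrans(LEGACY_TO_UNICODE)


def legacy_to_unicode(text):
    """Map legacy-font characters to Unicode, then normalize to NFC."""
    return unicodedata.normalize("NFC", text.translate(_TRANSLATION))


converted = legacy_to_unicode("paN?@")
print(converted)
```

The transform should only be applied to runs of text that were actually set in the legacy font; applied to ordinary prose, this table would also rewrite real question marks and capital letters.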

Data Conversion: Case Study
Converting the Hopi Dictionary (Hill et al 1998) from working form (legacy format)
Purpose:
- Build software to extract relevant data from working form
- Generate reusable archival format
  - For dissemination on the Web
  - For use by others
  - To preserve data should DB software become unusable

Hopi Dictionary
Example entry from Hopi Dictionary (figure not reproduced in transcript)

Hopi Dictionary Conversion
Until now:
- Generated text file from DB
- Manually converted IPA fonts in MSWord
- Generated PDFs for dissemination

Hopi Dictionary Conversion
New Process:
- Convert DB format to “enriched” text
- Software transforms for fonts from text format (Unicode compliant IPA)
- Identify the grammatical concepts used in entries, linked to GOLD (Farrar & Langendoen, this symposium)
- Generate XML, structured using modified EMELD IGT format (Bow, Hughes & Bird 2003)
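
The enriched-text-to-XML step can be sketched as below, assuming the enriched text uses Shoebox-style backslash markers (\lx, \ps, \ge). This is not the Hopi conversion software or the EMELD IGT format: the element names and the concept labels are placeholders invented for illustration, and the labels are not actual GOLD identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings; replace them with the project's own tables.
MARKER_TO_ELEMENT = {"lx": "headword", "ps": "pos", "ge": "gloss"}
POS_TO_CONCEPT = {"n": "Noun", "v": "Verb"}  # placeholder labels, not real GOLD terms


def parse_records(text):
    """Split backslash-coded text into records; each \\lx line starts a new record.

    Fields that appear before the first \\lx line are ignored.
    """
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            if marker == "lx":
                current = []
                records.append(current)
            if current is not None:
                current.append((marker, value.strip()))
        elif current:
            # Continuation line: append it to the previous field rather than drop it.
            marker, value = current[-1]
            current[-1] = (marker, f"{value} {line}")
    return records


def records_to_xml(records):
    """Build an XML tree, keeping unknown markers as generic <field> elements."""
    root = ET.Element("lexicon")
    for fields in records:
        entry = ET.SubElement(root, "entry")
        for marker, value in fields:
            child = ET.SubElement(entry, MARKER_TO_ELEMENT.get(marker, "field"))
            if marker not in MARKER_TO_ELEMENT:
                child.set("marker", marker)
            if marker == "ps" and value in POS_TO_CONCEPT:
                child.set("concept", POS_TO_CONCEPT[value])
            child.text = value
    return root


SAMPLE = r"""
\lx sample
\ps n
\ge an example
that continues on a second line

\lx other
\ps v
\xx an unmapped field
"""

root = records_to_xml(parse_records(SAMPLE))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Unknown markers are kept rather than discarded, in line with the goal of avoiding data loss during conversion.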

Archival Hopi Dictionary Record (figure not reproduced in transcript)

Recipe for Resource Conversion
- Choose a data format that is easily archived
  - Where the software provides for data migration, or
  - The data format itself is easily converted
- Use existing software to bring you as close to Archival Form as possible (Intermediary Form)
- Clearly identify
  - Content and structural semantics (“terms”)
  - Fonts used (and transforms)
  - Data alignment
- Construct transforms/macros/software to convert to Archival Form