Archiving and linguistic databases Jeff Good, MPI EVA LSA Annual Meeting Oakland, California January 6, 2005 Available at:

2 Goals: Cover important conceptual issues in designing a linguistic database. Discuss some steps to take in building a database. Discuss practical issues in creating archivable versions of databases.

3 What is a database? Here, at least, I'm considering it to be any digitally encoded data that is structured in a well-defined way. A dictionary or a text corpus could be considered a database in this sense. A journal article would not be a database in this sense.

4 Databases overview: One could, in principle, encode a database in files produced by a word processor. However, more specialized tools like database and spreadsheet software allow one to encode the logical structure of a set of data. With a logical encoding, it becomes easy to quickly generate different useful views of a single underlying data set.
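As a rough illustration of generating several views from one logically encoded data set, here is a minimal Python sketch. The field names (headword, pos, gloss) and the sample records are borrowed from the word-list example in this talk; the two view functions and their output layouts are assumptions made for illustration, not part of the original slides.

```python
# Illustrative records; field names follow the word-list example in this talk.
RECORDS = [
    {"headword": "chien", "pos": "n.", "gloss": "dog"},
    {"headword": "chat", "pos": "n.", "gloss": "cat"},
]


def dictionary_view(records):
    """Return one 'headword pos gloss' line per record, sorted by headword."""
    ordered = sorted(records, key=lambda r: r["headword"])
    return [f"{r['headword']} {r['pos']} {r['gloss']}" for r in ordered]


def finder_list_view(records):
    """Return one 'gloss: headword' line per record, sorted by gloss."""
    ordered = sorted(records, key=lambda r: r["gloss"])
    return [f"{r['gloss']}: {r['headword']}" for r in ordered]


if __name__ == "__main__":
    print("\n".join(dictionary_view(RECORDS)))
    print("\n".join(finder_list_view(RECORDS)))
```

Both views are derived from the same underlying records, so a correction made once in the data shows up in every view.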

5 Database views: A given underlying logical structure must be given some surface structure to be viewed by humans. The following example of multiple views of a Kanarese paradigm comes from Penton et al. (2004).

6 [Figure: multiple views of a Kanarese paradigm, from Penton et al. (2004)]

7 Logical structure The logical structure of the Kanarese paradigm

8 Logical structure: Linguists do not generally think explicitly about the logical structure of the types of data they work with. However, we do frequently work with data formats for which there are standardized ways of presenting their logical structure, for example, a word list entry.
Example entry: chien n. dog
Logical structure: headword pos. gloss
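The mapping from a printed word-list entry to its logical structure can be sketched in Python as follows. The `Entry` type and `parse_entry` function are hypothetical names introduced here, and the parsing rule (first token is the headword, second is the part of speech, the rest is the gloss) is an assumption that only holds for entries laid out exactly like the example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Logical structure of a word-list entry: headword, pos, gloss."""
    headword: str
    pos: str
    gloss: str


def parse_entry(line: str) -> Entry:
    """Split a 'headword pos. gloss' line into its three logical fields.

    Assumes a single-token headword and a single-token part of speech;
    everything after them is treated as the gloss.
    """
    headword, pos, gloss = line.split(maxsplit=2)
    return Entry(headword, pos, gloss)


if __name__ == "__main__":
    print(parse_entry("chien n. dog"))
```

Real word lists usually need a more careful parser (multiword headwords, multiple senses), which is one reason to store the fields separately in the first place.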

9 Building a database: Things to consider when building a database. What is the logical structure of my data? What kinds of views (or products) do I intend to produce with the database? Do I have special computing needs limiting my software choices (e.g., need special character support, primarily working online/offline, only have limited computing power)?

10 Building a database: There are many tools that can produce linguistic databases, though not all are suited to encoding every kind of logical structure. For complex logical structures, specialized database software (e.g., FileMaker Pro or an SQL database) may be required. For simple databases, software that is good at producing tables (e.g., Microsoft Excel or Microsoft Word) can be used. An XML editor can be used for producing XML databases.

11 Archiving: Your choice of a tool will also be influenced by the products you wish to produce. The one product that needs to be considered at the outset by any project is the archival format of the database.

12 Archiving: For now, the only electronic archival formats for databases are text files formatted with a machine-readable encoding of the logical structure of the data in the database. The overarching goal of an archive format: a self-documenting, machine-readable encoding of logical structure. In theory, best practice is to use XML. In practice, the existing tool support isn't sufficient for the needs of the ordinary working linguist.

13 Archiving: Self-documenting, machine-readable word-list record in XML. [XML markup not preserved in this transcript; the record's field values are: chien, n., dog]
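Since the markup of the XML record is not available here, the following Python sketch shows one way such a record might be produced. The element names (`entry`, `headword`, `pos`, `gloss`) are assumptions based on the logical structure named earlier in the talk, not the markup from the original slide.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(headword: str, pos: str, gloss: str) -> str:
    """Serialize one word-list record as an XML string.

    The tag names are hypothetical; they label each field so that the
    record documents its own structure.
    """
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    ET.SubElement(entry, "gloss").text = gloss
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(entry_to_xml("chien", "n.", "dog"))
```

Because every field is wrapped in a named element, a program (or a future reader) can recover the logical structure without knowing which software produced the file.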

14 Archiving: Same kind of data, not best practice but still good practice, in tab-delimited text with carriage returns separating records:
headword	pos	gloss
chien	noun	dog
chat	noun	cat
...
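A tab-delimited export of this kind can be sketched with Python's standard `csv` module. The function name `records_to_tsv` is hypothetical, and the sketch uses a newline as the record separator, which is an assumption; a project might prefer a different line ending depending on its platform.

```python
import csv
import io

# Column order for the export; the header row documents the fields.
FIELDS = ["headword", "pos", "gloss"]


def records_to_tsv(records):
    """Write records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"headword": "chien", "pos": "noun", "gloss": "dog"},
        {"headword": "chat", "pos": "noun", "gloss": "cat"},
    ]
    print(records_to_tsv(sample), end="")
```

Running an export like this on a regular schedule is one way to keep an archive-format copy alongside a working FileMaker or Excel file.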

15 Archiving: Some common bad practices. Not regularly producing an archive format for your database (e.g., working solely with a FileMaker or Excel file). Not documenting the structure of your database and the notational conventions used within it.

16 Summary: Come to an understanding of the logical structure of your data before building a database. Consider the kinds of views you will need of your data when choosing a tool for building a database. From the outset, develop a plan for regularly producing a version of your database in an archive format.

17 Reference: Penton, David, Catherine Bow, Steven Bird, and Baden Hughes. 2004. Towards a general model for linguistic paradigms. Proceedings of the E-MELD 2004 Workshop on Linguistic Databases and Best Practice, Detroit, Michigan. Available at:
Acknowledgements: I would like to thank all the presenters and participants at the 2004 E-MELD workshop on Linguistic Databases and Best Practice. The bulk of the content of this talk consists of my own interpretation of the discussion at that workshop.
Available at: