Digital Antiquity: Sustaining Database Semantics. Keith W. Kintigh, School of Human Evolution and Social Change, Arizona State University.

Presentation transcript:

Sustaining Database Semantics
Keith W. Kintigh, School of Human Evolution and Social Change, Arizona State University
In the session organized by Stuart Jeffrey, "Taking the Long View: Putting Sustainability at the Heart of Data Creation"
CAA, Granada, 7 April 2010

Background
- Today, digital databases (spreadsheets) are often the only loci of irreplaceable records of systematically collected archaeological observations
- In the US, databases are often not curated at all and are rapidly being lost
- Digital repositories, e.g., ADS and tDAR, can provide preservation and access

What Semantic Metadata Are Necessary to Adequately Sustain/Document a Database?
- Sufficient information for an archaeologist not familiar with the specifics of a project to make sensible analytical use of the data
  - Necessary for comparative and synthetic research
  - Necessary to reevaluate conclusions based on systematic evidence
- Our ethical (legal) obligation is to preserve our data and to make the data usable

Adequate Preservation is Rarely Achieved in Museum Contexts
- Too frequently only the media are curated, so there is no long-term preservation of the data themselves
- Semantic metadata are often on paper, e.g., an existing coding manual or coding keys
- But adequate semantic documentation is more comprehensive than analysts would typically think to write down

Documenting Databases
- Internally encoded: structure, table names, column names, and data types
- Usually not internally encoded:
  - For each column, the nature of the column values (not just string, etc.)
    - Arbitrary (lot number, provenience label)
    - Measurement (units of measure and methods)
    - Coded or abbreviated value (nominal variables)
  - For coded nominal values within columns, the label and description of every value and how it is distinguished from others (101 = rabbit)
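To make the distinction above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It lists everything a database file records about itself: table names, column names, and declared types. The table `fauna` and its columns are hypothetical, invented for illustration; the helper `internal_schema` is likewise an assumption and not part of tDAR.

```python
import sqlite3


def internal_schema(conn):
    """Return {table: [(column, declared_type), ...]} from SQLite's own catalog."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical faunal table: a lot number, a coded taxon, and a measurement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fauna (lot INTEGER, taxon INTEGER, length REAL)")
conn.execute("INSERT INTO fauna VALUES (12, 101, 34.5)")

schema = internal_schema(conn)
print(schema)
```

Nothing in this result says that `taxon` holds coded nominal values, that 101 means rabbit, or what units `length` is measured in. That information has to be supplied as separate semantic metadata.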

More Subtle Points
- Are all values in a coding key used?
  - Fish vs. species of fish; birds, reptiles, etc.
  - Can lead to the conclusion that a species of bird, for example, is absent when in fact taxa were not recorded to this level (i.e., missing data)
- Academic traditions influence what is needed in more subtle ways
  - What constitutes an adequate description varies
  - What works for an Americanist might not work for a European Medievalist
- Probably no absolute adequacy
- We can do better and we must move forward
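The first point can be checked mechanically once a coding key is available in digital form. The sketch below is only an illustration under stated assumptions: the coding key, its `parent` field, and the observed values are all hypothetical, and the rule used (an unused fine-grained code whose broader category was recorded is flagged as ambiguous) is one simple heuristic, not a method described in the talk.

```python
# Hypothetical coding key: code -> label and (optional) broader category code.
coding_key = {
    100: {"label": "fish", "parent": None},
    101: {"label": "rabbit", "parent": None},
    200: {"label": "bird", "parent": None},
    201: {"label": "turkey", "parent": 200},
    202: {"label": "quail", "parent": 200},
    300: {"label": "reptile", "parent": None},
    301: {"label": "lizard", "parent": 300},
}

# Hypothetical values actually found in the taxon column.
observed = [101, 100, 200, 200, 101]


def unused_codes(key, values):
    """Codes defined in the key that never occur in the data."""
    return sorted(set(key) - set(values))


def ambiguous_absences(key, values):
    """Unused codes whose broader category WAS recorded.

    For these, absence from the data may simply mean that the analyst
    did not identify specimens to this level (missing data).
    """
    used = set(values)
    return sorted(
        code for code in set(key) - used if key[code]["parent"] in used
    )


unused = unused_codes(coding_key, observed)
ambiguous = ambiguous_absences(coding_key, observed)
print(unused)
print(ambiguous)
```

Here turkey and quail are flagged because birds were recorded only at the general level, so their absence cannot be read as evidence that those species were not present.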

Our Approach

Digital Antiquity
- Digital Antiquity is a newly established multi-institutional organization based in the US devoted to enhancing preservation of and access to the digital records of archaeological investigations:
  - to permit scholars to more effectively create and communicate knowledge of the long-term human past;
  - to enhance the management of archaeological resources; and
  - to provide for the long-term preservation of irreplaceable records of archaeological investigations.
- The business model targets technical, financial, and sociological sustainability in 4-5 years

Digital Antiquity’s Software
- Aspiring to be an on-line, open source, trusted digital repository for archaeological data and documents
- Provides preservation and free, on-line discovery and access for archaeological data and documents
- Web-based ingest interface: the contributor uploads data and is prompted for detailed metadata
- Advanced tools for data integration across inconsistently recorded databases

Database Ingest
- Elicit Project and Information Resource metadata
  - Location, time, keywords, credit, etc.

Upload the Database

Database Documentation
- For each column in the database
  - Indicate data type (measurement or coded integer)
  - Indicate the material class and nature of the variable
- For each measurement, elicit units (e.g., m, kg)
- For each coded value (string or number)
  - Provide a digital “Coding Sheet” specific to that analyst and dataset that associates codes with labels and descriptions
  - Associate each coded value label with an ontology node that has a standard definition
- The original values do not change
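One way to picture the documentation elicited above is as a small record per column, with a coding sheet attached to coded columns. The sketch below is a hedged illustration only: the class names `ColumnDoc` and `CodeEntry`, their fields, and the example entries are assumptions made for this example and do not describe tDAR's actual data model. Only the 101 = rabbit code comes from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class CodeEntry:
    """One row of a coding sheet: what a stored code means."""
    label: str
    description: str
    ontology_node: Optional[str] = None  # hypothetical node name; None if unmapped


@dataclass
class ColumnDoc:
    """Semantic documentation for one column (field names are assumptions)."""
    name: str
    kind: str  # "measurement", "coded", or "arbitrary"
    material_class: str = ""
    units: Optional[str] = None
    coding_sheet: Dict[Union[int, str], CodeEntry] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """List what is still missing from this column's documentation."""
        issues = []
        if self.kind == "measurement" and not self.units:
            issues.append(f"{self.name}: measurement column has no units")
        if self.kind == "coded":
            if not self.coding_sheet:
                issues.append(f"{self.name}: coded column has no coding sheet")
            for code, entry in self.coding_sheet.items():
                if entry.ontology_node is None:
                    issues.append(f"{self.name}: code {code!r} has no ontology node")
        return issues


def describe(column: ColumnDoc, raw_value):
    """Look up a label for a stored value; the stored value itself is untouched."""
    return raw_value, column.coding_sheet[raw_value].label


taxon = ColumnDoc(
    name="taxon", kind="coded", material_class="fauna",
    coding_sheet={
        101: CodeEntry("rabbit", "Illustrative entry", "Rabbit"),
        102: CodeEntry("hare", "Illustrative entry, not yet mapped"),
    },
)
length = ColumnDoc(name="length", kind="measurement", material_class="fauna")

print(taxon.problems())
print(length.problems())
print(describe(taxon, 101))
```

The design choice worth noting is that labels and ontology nodes live beside the data rather than replacing it, which matches the point that the original values do not change.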

Column Registration

Coding Sheets

Ontologies
- An ontology is a map of the semantic relationships among a set of concepts
- In tDAR, ontologies are ordinarily hierarchical (tree-like) and represent an arbitrary number of levels of class-subclass relationships
- For a given variable, a user community develops an ontology to enable integration; it is not centrally controlled
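A hierarchical ontology of this kind can be sketched as a simple child-to-parent table. The node names below are illustrative assumptions, not an ontology taken from tDAR, and the helper functions are hypothetical.

```python
# Tree-like ontology stored as child -> parent; the root maps to None.
# Node names are illustrative only.
PARENT = {
    "Animal": None,
    "Mammal": "Animal",
    "Bird": "Animal",
    "Cattle": "Mammal",
    "Pig": "Mammal",
    "Duck": "Bird",
}


def path_to_root(node, parent=PARENT):
    """Return the node followed by each of its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def is_subclass_of(node, ancestor, parent=PARENT):
    """True if `ancestor` is the node itself or lies on its path to the root."""
    return ancestor in path_to_root(node, parent)


def children(node, parent=PARENT):
    """Direct subclasses of a node, in alphabetical order."""
    return sorted(child for child, p in parent.items() if p == node)


print(path_to_root("Pig"))
print(is_subclass_of("Duck", "Mammal"))
print(children("Mammal"))
```

Because each node stores only its parent, the tree can have any number of levels, which is all the class-subclass structure described above requires.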

Define Ontology

Map Coding Sheet to Ontology

Integration: Standard Approach
- Standardization at or before the time of data ingest (least common denominator)
- This will fundamentally not work in archaeology
  - For legacy data sets, the least common denominator (lcd) is very low
  - Different regional traditions in terminology, materials (lithics, ceramics), and their analyses
  - Enforced standardization is a non-starter for the profession in the US

tDAR Data Integration
- Because the digital encoding of the semantics is known to the repository, we have the ability to combine datasets created by different investigators using incommensurate coding schemes into a dataset in which the observations are analytically comparable

tDAR Process
- Query to identify relevant databases
- User selects databases to move into the user workspace
- Select columns to integrate
- Specify filtering and aggregation of ontology values
- Perform aggregation
- Obtain integrated database with commensurate observations
- Download the result and analyze it
- In place (beta, needs documentation)
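The filtering and aggregation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not tDAR's implementation: the two datasets, their coding schemes, the ontology, and the function names are all hypothetical. Each dataset keeps its own codes; a per-dataset mapping ties each code to an ontology node, and the user picks the nodes at which values should be reported.

```python
# Illustrative ontology as child -> parent (root maps to None).
PARENT = {
    "Animal": None, "Mammal": "Animal", "Bird": "Animal",
    "Cattle": "Mammal", "Pig": "Mammal", "Duck": "Bird", "Goose": "Bird",
}


def aggregate(node, targets, parent=PARENT):
    """Walk up from `node` to the first node in `targets`.

    Returns None if no target is reached, which filters the observation out.
    """
    while node is not None:
        if node in targets:
            return node
        node = parent.get(node)
    return None


def integrate(datasets, targets, parent=PARENT):
    """Combine datasets into rows of (dataset, provenience, ontology value)."""
    rows = []
    for name, (records, code_to_node) in datasets.items():
        for provenience, code in records:
            value = aggregate(code_to_node[code], targets, parent)
            if value is not None:
                rows.append((name, provenience, value))
    return rows


# Two hypothetical datasets with incommensurate coding schemes.
site_a = ([("A1", 1), ("A2", 7)], {1: "Cattle", 7: "Duck"})
site_b = (
    [("T3", "BOS"), ("T4", "ANS"), ("T5", "MAM")],
    {"BOS": "Cattle", "ANS": "Goose", "MAM": "Mammal"},
)

# Keep cattle as recorded, aggregate all birds to "Bird", drop everything else.
integrated = integrate({"site_a": site_a, "site_b": site_b}, {"Cattle", "Bird"})
for row in integrated:
    print(row)
```

In this sketch the observation coded only as a generic mammal is dropped, because it cannot be assigned to either requested category. Provenience values pass through unchanged, and the taxon column is replaced by ontology values after aggregation.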

Query

Add Results to Workspace

Select Databases to Integrate

Define Integration Conditions

Filtering and Aggregation

Initial Datasets
- Knowth
- Durrington Walls

Integrated Dataset

Output
- Output database
  - 3 columns: area, FUSD, FUSP
  - Observations from both datasets (with any filtering eliminating cases)
  - Provenience and stratum values are the same as in the original databases
  - Taxon values are values in the ontology with aggregation performed
- Database is downloaded and analysed by the user

Output File

To Come in tDAR Integration
- User-dictated integration is in place
- Query-oriented, ad hoc data integration
  - Based on a query, tDAR identifies databases that satisfy the data requirements of the query, i.e., that are relevant and record the needed variables
  - Interact, as necessary, with the user
  - Perform integration on the fly, i.e., using ontologies, align key portions of the metadata for the selected columns
  - Output is an integrated dataset with maximum resolution and minimal changes
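The first step of the planned query-oriented integration, identifying databases that record the needed variables, could look roughly like the sketch below. Since this feature is described as still to come, the code is purely a hypothetical illustration: the registry layout, the variable names, and `relevant_databases` are all assumptions.

```python
# Hypothetical registry: database -> {column name: variable it records}.
registry = {
    "db_a": {"taxon": "fauna_taxon", "elem": "fauna_element"},
    "db_b": {"species": "fauna_taxon"},
    "db_c": {"ware": "ceramic_ware"},
}


def relevant_databases(registry, needed):
    """Return {database: {variable: column}} for databases recording every needed variable."""
    matches = {}
    for db, columns in registry.items():
        # Invert the column metadata so columns can be found by variable.
        by_variable = {variable: column for column, variable in columns.items()}
        if needed.issubset(by_variable):
            matches[db] = {v: by_variable[v] for v in sorted(needed)}
    return matches


taxon_only = relevant_databases(registry, {"fauna_taxon"})
taxon_and_element = relevant_databases(registry, {"fauna_taxon", "fauna_element"})
print(taxon_only)
print(taxon_and_element)
```

The result also names which column in each database holds each variable, so differently named columns (`taxon`, `species`) can be aligned before integration.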

Acknowledgments
- Andrew W. Mellon Foundation
- National Science Foundation
- Collaborators at ASU: K. Selcuk Candan, Tiffany Clark, Hasan Davulcu, John Howard, Shelby Manney, Ben Nelson, Margaret Nelson, Yan Qi, Katherine Spielmann
- Digital Antiquity Board of Directors
  - Sander van der Leeuw, Arizona State University (ASU) [chair]
  - Carol Ackerson, Girl Scouts Arizona Cactus-Pine Council
  - Jeffrey Altschul, SRI Foundation
  - Kim Bullerdick, Owner, BI, L.L.C.
  - Jaime Casap, Google, Inc.
  - John Howard, University College, Dublin
  - Keith Kintigh, ASU
  - Tim Kohler, Washington State University
  - Fred Limp, University of Arkansas
  - Harry Papp, L. Roy Papp & Associates
  - Julian Richards, University of York
  - Dean Snow, The Pennsylvania State University

Questions?