Prototyping Digital Libraries Handling Heterogeneous Data Sources – An ETANA-DL Case Study. Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan. Virginia Tech, Blacksburg, VA, USA (and CWRU). ECDL 2004, Bath, England, September 2004.
Acknowledgements (Selected). Sponsors: NSF grant ITR ; AOL, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech. Faculty/Staff: Lillian Cassel, Debra Dudley, Roger Ehrich, Manuel Perez, Naren Ramakrishnan. VT (Former) Students: Aaron Krowne, Ming Luo, Fernando Das Neves, Ricardo Torres, Hussein Suleman.
Acknowledgements (contd.): Karen Borstad, MPP; Douglas Clark, Walla Walla College; Joanne Eustis, CWRU; Nick Fischio, CWRU; Paul Gherman, Vanderbilt U.; Andrew Graham, U. Toronto; Tim Harrison, U. Toronto; Larry Herr, Canadian University College; Christopher Holland, LRP; Paul Jacobs, Mississippi State U.; Douglas Knight, Vanderbilt U.; Stan LaBianca, Andrews U.; David McCreery, Willamette U.; Eric Meyers, Duke U.; Adam Porter, Illinois College; Jack Sasson, Vanderbilt U.; Tom Schaub, Indiana U. of Penn.; Randall Younker, Andrews U.
Outline: Problems; Background; Approach; ETANA-DL; ETANA-DL Prototype System; Modeling ETANA-DL; ETANA-DL Services; Analysis; Conclusions; Future Work
Problems: interoperability among heterogeneous archaeological systems; delay in publication of primary archaeological data; lack of sustainable solutions to long-term preservation of valuable information; lack of services useful to the archaeology community, including “traditional DL services”; difficulty in understanding complex archaeological information systems; difficulty in requirements elicitation for archaeological systems.
Open Archives Initiative: promotes interoperability among DLs through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Data Providers possess metadata and share it (internally or externally) via well-defined OAI protocols (e.g., database servers). Service Providers harvest data from Data Providers and provide higher-level services to users.
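The Data Provider / Service Provider split can be illustrated with a minimal harvesting sketch. The `ListRecords` verb, the `metadataPrefix` argument, and the `oai_dc` format come from OAI-PMH 2.0; the base URL, the record identifier, and the hand-written sample response are assumptions made up for illustration, not ETANA-DL endpoints or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a Service Provider would GET to harvest records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.findtext(f".//{DC_NS}title")
        pairs.append((identifier, title))
    return pairs


# Hypothetical response, as a Data Provider might return it.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:bone-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Bone field record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that step is left out here to keep the sketch short.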
Traditional Digital Libraries [Diagram: Users; Digital Library (monolithic and/or custom-built web-based application); Digital Objects (programs, documents, images, video).]
Introduction to ODL (Open Digital Libraries): a framework for componentized digital libraries; design principles for components; protocols for inter-component communication; built upon OAI.
Open Digital Libraries Approach [Diagram: Users; ETANA-DL; Sites. Components shown: User Interface, Search, Browse, Recent, Filter (twice), Union. Collections shown: Bone, Seed, Figurine, Pottery.]
Basic ODL Model: An application for Archaeology [Diagram elements: OAI Data Provider (Nimrin); OAI-PMH; ETANA-DL Union Catalog; OAI-PMH; ETANA-DL Search Engine (ODL Service Provider Component); ODL Protocol; User Interface (WWW Interface); ODL Protocol.]
Componentized services example [Diagram elements: User; User Interface; Search Handler Servlet; IRDB Search Engine; Index DB. Flows shown: Query; Parsed Query; Query in the IRDB query language; Results in XML; Parsed XML; Results.]
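The handler-plus-engine arrangement on this slide can be sketched as follows. This is a toy illustration under stated assumptions: `ToySearchEngine`, `handle_search`, and the `<results>`/`<hit>` XML layout are invented names, and a plain list of terms stands in for the IRDB query language, which the slides do not show.

```python
import xml.etree.ElementTree as ET


class ToySearchEngine:
    """Stand-in for the search engine component; not the real IRDB engine."""

    def __init__(self, documents):
        # {doc_id: text}; plays the role of the index DB.
        self.documents = documents

    def search(self, terms):
        """Return an XML string listing documents that contain every term."""
        root = ET.Element("results")
        for doc_id, text in sorted(self.documents.items()):
            words = text.lower().split()
            if all(term in words for term in terms):
                ET.SubElement(root, "hit", id=doc_id).text = text
        return ET.tostring(root, encoding="unicode")


def handle_search(user_query, engine):
    """Search-handler role: parse the query, call the engine, parse the XML."""
    terms = user_query.lower().split()   # parsed query
    xml_results = engine.search(terms)   # results in XML
    return [(hit.get("id"), hit.text) for hit in ET.fromstring(xml_results)]
```

Because the handler only exchanges a query and an XML string with the engine, either side can be replaced without touching the other, which is the point of the componentized design.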
5S Model – Informally: digital libraries are complex information systems that help satisfy info needs of users (societies); provide info services (scenarios); organize info in usable ways (structures); present info in usable ways (spaces); communicate info with users (streams).
Solution – our approach: (1) Applying and extending Digital Library (DL) techniques to address interoperability, availability of primary data, and data preservation. (2) Modeling archaeological information systems using 5S theory to better understand the domain and to design the system and its supported services. (3) Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks, in order to elicit requirements and provide useful services.
ETANA-DL: Archaeological Digital Library. Applies and extends the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). Design considerations: componentized; distributed architecture; extensible; portable.
ETANA Digital Library Core Components: DigBase (DB). Central repository that stores metadata. Union catalog for the collections in ETANA-DL. Various kinds of digital objects: excavation records, images, text collections, etc. General services: Search, Browse, Annotate, Recommend, etc. Archaeology-specific services: artifact analysis, visualizations, artifact interpretation, workflows, etc.
ETANA Digital Library Core Components: DigKit (DK). A suite of tools for collecting and recording archaeological data in the field, which can be used for a new dig. Metadata will migrate to DigBase (DB). Real-time collaborative archaeology: metadata in DB will be rapidly available to others.
Architecture [Diagram components: Archaeological Site; Data Mapping Component; OAI Data Provider; OAI; XOAI; DigKit; ETANA-DL; Union Catalog; DigBase DB; Index; Inverted Files; Browse Engine; Browse DB; DB used by Services; Search Component; Browse; Other ETANA-DL Services; Web Interface; Configure.]
Modeling ETANA-DL – An Archaeological DL Meta-model [Diagram, grouped by 5S model. Stream model: Text, Video, Audio, Drawing, Photo, 3D. Structure model: Region, *Site, *Partition, *Sub-partition, *Locus, *Container, *Artifact; Metadata; Taxonomies (Temporal, Spatial, Artifact-specific). Space model: Geographic space, Metric space, User interface. Society model: Archaeologist, General public, Service Manager. Scenario model: Services (Information satisfaction, Value added, Repository building, Domain specific).]
Modeling ETANA-DL – The ETANA-DL model [Diagram, grouped by 5S model. Stream model: Field record, locus sheet; Figurine image (photo); Site/field plan (drawing); Preliminary/Final Report (application/pdf). Structure model: Jordan, Umayri (*Field, *Square, *Locus, *Pail, *Bone); Jordan Valley, Nimrin (*Quadrant, *Square, *Locus, *Bag, *Seed); Southern Israel, Halif (*Field, *Area, *Locus, *Basket, *Figurine); Taxonomies (Spatial; Archaeological periods; Bone type; Seed species). Space model: Site-specific coordinate system, Vector space, Web interface. Society model: Archaeologist, Generic public, ETANA-DL Service Manager. Scenario model: Services (Searching, Browsing; Annotation, binding; Harvesting, Converting; Object comparison, marking item for analysis).]
Modeling ETANA-DL – Mapping heterogeneous data to the structural model
Site     Partition     Sub-partition            Locus         Container
Lahav    Field I       Area A8                  Locus A8074   Basket 224
Nimrin   Quadrant NW   Quadrant Value N25/W50   Locus 96      Bag 240
Umayri   Field A       Square 7J59              Locus 001     Pail 12
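The mapping on this slide amounts to renaming each site's local labels to the levels of the common structure. A minimal sketch, shown for Lahav and Umayri only: the local labels and sample values come from the slide, while `SITE_LEVELS`, `to_common_record`, and the dictionary layout are hypothetical and not the actual ETANA-DL Data Mapping Component.

```python
# Local label used at each site for every level of the common structure.
# Labels are taken from the mapping slide; the layout is an assumption.
SITE_LEVELS = {
    "Lahav": {
        "partition": "Field",
        "subpartition": "Area",
        "locus": "Locus",
        "container": "Basket",
    },
    "Umayri": {
        "partition": "Field",
        "subpartition": "Square",
        "locus": "Locus",
        "container": "Pail",
    },
}


def to_common_record(site, local_record):
    """Rename site-specific keys to the common structural model."""
    levels = SITE_LEVELS[site]
    common = {"site": site}
    for common_key, local_key in levels.items():
        common[common_key] = local_record[local_key]
    return common


lahav = to_common_record(
    "Lahav", {"Field": "I", "Area": "A8", "Locus": "A8074", "Basket": "224"}
)
umayri = to_common_record(
    "Umayri", {"Field": "A", "Square": "7J59", "Locus": "001", "Pail": "12"}
)
```

Supporting a further site under this scheme would mean adding one entry to the table rather than writing new code, which is one way to read the "extensible" design consideration.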
Data Mapping
ETANA-DL Schema Design [Diagram: ETANA-DL Object (Owner, Subpartition, Partition, Locus, ID, Container, Collection, ...) with type-specific schemas Bone (Count, Animal, ...), Seed (Species, Name, ...), Figurine (Description, Dimensions, ...).]
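One way to read the schema slide is as a shared base record extended by artifact-specific fields. The sketch below assumes that reading: the grouping of attributes under Bone and Figurine is a guess from the flattened diagram, and the class names, field types, and sample values are invented for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class EtanaObject:
    """Fields assumed to be common to every harvested object."""
    object_id: str
    owner: str
    collection: str
    partition: str
    subpartition: str
    locus: str
    container: str


@dataclass
class Bone(EtanaObject):
    """Bone-specific attributes (assumed grouping)."""
    animal: str
    count: int


@dataclass
class Figurine(EtanaObject):
    """Figurine-specific attributes (assumed grouping)."""
    description: str
    dimensions: str


# Made-up example; only the Umayri location values echo the mapping slide.
bone = Bone(
    object_id="umayri-bone-1", owner="Umayri", collection="bones",
    partition="A", subpartition="7J59", locus="001", container="12",
    animal="sheep", count=3,
)
bone_as_dict = asdict(bone)
```

Keeping the location fields in the base class lets generic services such as searching and browsing work on any artifact type, while type-specific services read the extra fields.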
ETANA-DL Services: Categories. Information satisfaction: Searching, Browsing, Recommendation. Archaeology (domain) specific: Object comparison, Marking items. Value-added: Annotation, Items of interest (Binding service), Recent searches/discussions, User management.
Searching: Search Interface
Searching: Search Results
Searching: Advanced Search
Searching: Advanced Search Results
Multi-Dimensional Browsing: site structure; temporal; object-specific; user context
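Browsing along several dimensions at once can be thought of as filtering records by whichever dimension values the user has selected so far. A minimal sketch, assuming flat dictionaries with `site`, `period`, and `object_type` keys; the sample records, period names, and function names are made up and do not reflect ETANA-DL's browse engine.

```python
def browse(records, **selected):
    """Keep records whose value matches every selected dimension."""
    return [
        r for r in records
        if all(r.get(dim) == val for dim, val in selected.items())
    ]


def facet_counts(records, dimension):
    """Count records per value of one dimension, e.g. to label browse links."""
    counts = {}
    for r in records:
        value = r.get(dimension)
        counts[value] = counts.get(value, 0) + 1
    return counts


# Invented sample data for illustration only.
SAMPLE_RECORDS = [
    {"site": "Umayri", "period": "Iron Age", "object_type": "Bone"},
    {"site": "Nimrin", "period": "Iron Age", "object_type": "Seed"},
    {"site": "Nimrin", "period": "Bronze Age", "object_type": "Bone"},
]

nimrin_bones = browse(SAMPLE_RECORDS, site="Nimrin", object_type="Bone")
per_site = facet_counts(SAMPLE_RECORDS, "site")
```

Storing the `selected` mapping per user is one simple way a browsing context could later be restored or used to scope a search.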
Searching within a Context
Searching within a Context: Search Results
Restoring Browsing Contexts
Object Comparison: Selecting Objects for Comparison
Object Comparison: Editing Attributes
Object Comparison: Comparing Objects
Object Comparison: Comparison Results
Marking items
Viewing marked items
Remarking items
Discussion Board (Annotation): View Messages
Discussion Board (Annotation): Post Messages/Replies
Collections Description
Other services: Items of Interest (Binding service); Recent searches/discussions; Recommendation; User management (account creation, login)
Items of Interest: Binding Service
Recent Searches/Discussions
Recommendation
User Management: New User Account
User Management: Login
User Management: Navigations
Heterogeneous data handling
Columns: Site; Artifact type; Original data source; Number of attributes in original record; Number of attributes in harvested record; Number of records harvested
Lahav: Figurine; tab-delimited text file
Nimrin: Bone field record; table in Oracle DB
Nimrin: Seed field record; table in Oracle DB
Umayri: Bone field record; 2 tables in Access DB
Total: 10537
Heterogeneous data handling
Columns: Site; Data Analysis (in hours); Data Mapping (in hours); Data Provider Implementation (in hours); Service Provider Implementation (in hours)
Rows: Lahav; Nimrin 48 41; Umayri; Total
Heterogeneous data handling
Rapid prototyping: Lines of Code
Columns: Type of service; LOC for implementing service; LOC reused from components; Total LOC; Reuse percentage
Rows: Componentized; Non-componentized; Total
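The column headers suggest how the reuse figure is derived. The relation below is an assumption based on the headers alone, since the table's values did not survive; it is the usual definition, not one confirmed by the slides.

```latex
\text{Total LOC} = \text{LOC for implementing service} + \text{LOC reused from components},
\qquad
\text{Reuse percentage} = \frac{\text{LOC reused from components}}{\text{Total LOC}} \times 100
```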
Rapid prototyping: Service development times [Chart: componentized services vs. non-componentized services.]
User Analysis: initial comments from all 3 projects, plus others interested in ETANA-DL. Positive feedback, users liked: data integration; prototype cross-collection information access services; information structuring; utility of supported services. Negative feedback, user concerns: need for service enhancements; usability.
Conclusions: applied 5S to the archaeological domain; identified requirements for future versions of the system; developed an extensible and componentized approach for handling heterogeneous archaeological data from disparate sources; rapidly generated a prototype archaeological DL; made primary archaeological data available without significant delay.
Future Work: componentizing current DL services; creating next-generation DL services from an expanding set of requirements; integrating richer content; (semi-)automatic data mapping; automating the ingest of DL content; enhancing interface capabilities; formal usability studies.
Visual Browsing: visual browse by sites
Visual Browsing: Topographical Drawings. Full site; north-west quadrant; Square: N40/W20
Visual Browsing: Square information. Loci layout; Square: N40/W20; Locus: 86
Visual Browsing: locus sheet
Publications
1. U. Ravindranathan, R. Shen, M. A. Gonçalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: A Digital Library for Integrated Handling of Heterogeneous Archaeological Data. To be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
2. U. Ravindranathan, R. Shen, M. A. Gonçalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: Managing Complex Information Applications – An Archaeology Digital Library. Demo to be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
3. U. Ravindranathan, R. Shen, M. A. Gonçalves, W. Fan, E. A. Fox, J. W. Flanagan. Prototyping Digital Libraries Handling Heterogeneous Data Sources – The ETANA-DL Case Study. European Conference on Digital Libraries (ECDL 2004), Bath, U.K., September 12-17, 2004 (submitted).
Questions/Feedback ??