
Complex Data Transformations in Digital Libraries with Spatio-Temporal Information
B. Martins, N. Freire, J. Borbinha
Instituto Superior Técnico, Technical University of Lisbon
2008 International Conference on Asia-Pacific Digital Libraries

Introduction and Motivation
The DIGMAP project addressed the development of a digital library for materials related to old maps
  – Collecting metadata from different providers (e.g. OAI-PMH servers)
  – Processing the metadata and enriching it with inferred spatio-temporal information
Challenges in handling heterogeneous metadata
  – Transforming the original sources into the DIGMAP format (i.e., TEL profile)
  – Dealing with data inconsistency, non-uniformity, incorrectness and incompleteness
  – Handling the spatio-temporal information (e.g. dates and geospatial coordinates)
Challenges in DIGMAP service interoperability
  – Using the results from DIGMAP services to enrich the metadata
DIGMAP required appropriate XML processing technology for dealing with the above challenges

The Proposed Solution
Use XML processing languages like XSLT and XQuery
Extend the XPath 2.0 function library
  – Functions for managing geospatial information
  – Functions for managing temporal information
  – Functions for text processing
  – Other miscellaneous functions
All the advantages of declarative languages like XSLT and XQuery, together with powerful methods for handling complex transformations

Outline
Introduction
Proposed Extensions to the XPath Function Library
Implementation Issues
Test Cases Within the DIGMAP Project
Conclusions and Future Work

The Proposed Extensions
Extensions for geospatial data handling
  – Combining spatial elements according to geospatial predicates such as distance or intersection
  – Input given in GML, KML or textual strings with geospatial coordinates
Extensions for temporal reasoning
  – Combining temporal information according to the predicates of Allen's algebra for temporal intervals
  – Input given in GML or string encodings (e.g. the ISO 8601 formats)
Extensions for text mining
  – Keyword matching and textual similarity
  – Standard text mining operations (e.g. language recognition)
Other miscellaneous extensions
  – Handling JDBC calls and calls to external Web services

Geospatial Data Handling
Operators for performing geospatial analysis based on the OGC Simple Features and Filter Encoding specifications
  – Distance, union, intersection or difference between two geometries
  – Validity of a given spatial filter
  – Check if two geometries are spatially related (e.g. containment or overlap)
  – Check if two geometries fall below a given distance threshold
  – Area, length, buffer, centroid, boundary or envelope of a geometry
  – Geometric computations (e.g. translation or scaling) over a geometry
  – Conversion between GML, KML, C-Square, Geohash or WKT encodings
  – Transformations on the coordinate systems used in geometries
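As an illustration of the distance-threshold check listed above, the following is a minimal sketch in plain Java. It is not the DIGMAP code, which the slides say wraps GeoTools and JTS; the class name GeoDistance, the method names, and the choice of the haversine formula on a spherical Earth are all assumptions made for this example.

```java
// Hypothetical helper for illustration only; not part of DIGMAP, GeoTools or JTS.
final class GeoDistance {
    // Mean Earth radius in kilometres (spherical approximation, an assumption of this sketch).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in decimal degrees (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // True when the two points lie no further apart than the given threshold.
    static boolean withinKm(double lat1, double lon1, double lat2, double lon2, double thresholdKm) {
        return haversineKm(lat1, lon1, lat2, lon2) <= thresholdKm;
    }

    public static void main(String[] args) {
        // Approximate coordinates for Lisbon and Porto.
        System.out.println(withinKm(38.72, -9.14, 41.15, -8.61, 300.0));
    }
}
```

A point-to-point check like this only covers the simplest case; distances between arbitrary geometries (lines, polygons) are what a library such as JTS would normally be used for.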

Temporal Data Handling
Operators for temporal analysis based on Allen's interval algebra
  – Distance, union, intersection or difference between temporal intervals
  – Check if two intervals are related (e.g. containment or overlap)
Other operators for temporal data handling
  – Compute lengths for temporal intervals (e.g. return seconds or years)
  – Conversion between GML and string encodings
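The thirteen relations of Allen's interval algebra can be sketched as a small classifier. The version below is only an illustration under stated assumptions: intervals are pairs of integer years with start strictly before end, and the class name AllenRelations and its method are hypothetical, not the names used in the DIGMAP extensions.

```java
// Hypothetical sketch of Allen's interval relations over integer years.
final class AllenRelations {
    enum Relation {
        BEFORE, MEETS, OVERLAPS, STARTS, DURING, FINISHES, EQUALS,
        AFTER, MET_BY, OVERLAPPED_BY, STARTED_BY, CONTAINS, FINISHED_BY
    }

    // Returns the relation that interval A = [aStart, aEnd] holds with respect to B = [bStart, bEnd].
    static Relation relate(int aStart, int aEnd, int bStart, int bEnd) {
        if (aStart >= aEnd || bStart >= bEnd) {
            throw new IllegalArgumentException("each interval needs start < end");
        }
        if (aEnd < bStart) return Relation.BEFORE;
        if (aEnd == bStart) return Relation.MEETS;
        if (bEnd < aStart) return Relation.AFTER;
        if (bEnd == aStart) return Relation.MET_BY;
        // From here on the two intervals share more than an endpoint.
        if (aStart == bStart && aEnd == bEnd) return Relation.EQUALS;
        if (aStart == bStart) return aEnd < bEnd ? Relation.STARTS : Relation.STARTED_BY;
        if (aEnd == bEnd) return aStart > bStart ? Relation.FINISHES : Relation.FINISHED_BY;
        if (aStart > bStart && aEnd < bEnd) return Relation.DURING;
        if (aStart < bStart && aEnd > bEnd) return Relation.CONTAINS;
        return aStart < bStart ? Relation.OVERLAPS : Relation.OVERLAPPED_BY;
    }

    public static void main(String[] args) {
        // A map dated 1500-1600 compared with a period 1550-1650.
        System.out.println(relate(1500, 1600, 1550, 1650));
    }
}
```

Exactly one relation holds for any pair of valid intervals, which is why the method can return a single enum value rather than a set.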

Textual Data Handling
Keyword matching and textual similarity
  – Tokenization and keyword-based search
  – Phonetic similarity (Soundex and Double Metaphone)
  – String similarity (e.g. Edit Distance, Jaro, Jaro-Winkler, Q-grams, …)
Standard text mining operations
  – Language recognition
  – Keyword extraction (statistically significant keywords)
  – Named entity recognition (regexp, dictionaries or machine learning)
  – Text classification (machine learning)
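Of the string similarity measures named above, edit distance is the easiest to show in a few lines. The sketch below implements the standard Levenshtein distance with two rolling rows and derives a normalised similarity from it. The slides attribute the actual similarity functions to SimPack; the class EditDistance and the normalisation by the longer string's length are assumptions of this example.

```java
// Hypothetical sketch; the DIGMAP extensions reportedly wrap SimPack instead.
final class EditDistance {
    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1.0 for identical strings, 0.0 when every position differs.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    public static void main(String[] args) {
        // Two spellings of the same place name, as might appear in different catalogues.
        System.out.println(similarity("Lisboa", "Lisbon"));
    }
}
```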

Miscellaneous Functions
Calling external Web services (REST and SOAP)
Conversion from XML to JavaScript Object Notation (JSON)
Handling Java DataBase Connectivity (JDBC) calls
Reading malformed HTML
Converting MARC formats into XML (MarcXml or MarcXchange)
…

Implementation Issues
Proposed extensions implemented on top of SAXON
  – SAXON is an open source XSLT/XQuery processor
  – Extension functions coded in Java (static methods)
  – Extension functions called by binding the Java class to a specific namespace
  – SAXON takes care of converting the arguments to make the functions fit
Most extensions are wrappers over existing open-source libraries
  – GeoTools and Java Topology Suite (JTS) for the geospatial functions
  – Lucene and Nux for keyword matching
  – SimPack for textual similarity
  – NGramJ and LingPipe for text mining
  – MARC4J for metadata crosswalks (i.e. handling MARC formats)
  – Apache AXIS for external Web service calls
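The mechanism described above, a static Java method exposed through a namespace bound to its class, might look roughly as follows. This is a sketch under assumptions: the class DateExt and the function firstYear are invented for illustration and do not come from the DIGMAP code base, and whether the "java:" namespace binding is available depends on the SAXON version and edition in use.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extension class. In a deployment it would likely need to be declared
// public, in a file named after it, so the processor can reach it via reflection.
//
// Assumed XQuery usage with SAXON's Java binding (class in the default package):
//   declare namespace ext = "java:DateExt";
//   ext:firstYear(string($record/date))
final class DateExt {
    // A run of exactly four digits, not embedded in a longer digit sequence.
    private static final Pattern YEAR = Pattern.compile("(?<!\\d)(\\d{4})(?!\\d)");

    // Returns the first four-digit number found in the text, or -1 if there is none.
    public static int firstYear(String text) {
        Matcher m = YEAR.matcher(text);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstYear("Lisboa, 1755."));
    }
}
```

Because the method takes a String and returns an int, the processor can map the XPath string argument and the integer result without any SAXON-specific types appearing in the Java code, which is the argument conversion the slide refers to.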

Test Cases Within DIGMAP
Conversion between different metadata standards
  – Converting UNIMARC, MARC21 and other formats into the DIGMAP format
  – Geospatial coordinates were often given originally in general textual fields
  – DIGMAP currently indexes over metadata records from different sources
Wrappers around DIGMAP XML service interfaces
  – The DIGMAP Gazetteer uses formats like Alexandria DL Gazetteer Service format, KML, geoRSS, …
  – The DIGMAP GeoParser uses formats like SpatialML, geoRSS, OGC GeoParser, …
  – Converting between the different formats and calling the services for processing the metadata records
Internal development of several DIGMAP services
  – Data integration within the DIGMAP Gazetteer
  – Convert different input sources into the Alexandria DL Gazetteer Content Standard
  – Handling duplicates and small corrections to the data
The proposed approach was found to be expressive and computational performance was within acceptable bounds
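To make the point about coordinates hidden in general textual fields concrete, here is a small sketch that pulls degree-and-minute values out of free text and converts them to signed decimal degrees. The notation it accepts (a hemisphere letter, degrees, a degree sign, optional minutes) is an assumption for this example; the slides do not say which textual conventions the DIGMAP sources used, and the class CoordinateText is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract coordinates such as "W 9°30'" from a free-text field.
final class CoordinateText {
    // Hemisphere letter, degrees, degree sign (\u00B0), optional minutes followed by an apostrophe.
    private static final Pattern COORD =
        Pattern.compile("([NSEW])\\s*(\\d{1,3})\u00B0\\s*(?:(\\d{1,2})')?");

    // Returns every coordinate found, in order, as signed decimal degrees
    // (south and west are negative).
    static List<Double> parseAll(String text) {
        List<Double> values = new ArrayList<>();
        Matcher m = COORD.matcher(text);
        while (m.find()) {
            double degrees = Integer.parseInt(m.group(2));
            double minutes = m.group(3) == null ? 0.0 : Integer.parseInt(m.group(3));
            double value = degrees + minutes / 60.0;
            String hemisphere = m.group(1);
            if (hemisphere.equals("S") || hemisphere.equals("W")) {
                value = -value;
            }
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseAll("Scale 1:500000 (W 9\u00B030' N 38\u00B045')"));
    }
}
```

A real crosswalk would also need to handle seconds, decimal notations and ranges, and to validate that latitudes and longitudes fall within their legal bounds.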

An Example XQuery
An XQuery that reads gazetteer data from an HTML source and converts it into the Alexandria DL Gazetteer Content format

Conclusions
Data transformations in Digital Libraries can be very complex
  – Standard XML processing technology is often not enough
  – But simple extensions can add the required extra functionality
We propose using extension functions to the XPath 2.0 library
  – Declarative syntax of XSLT and XQuery is not affected
  – Extension functions add the required extra functionality
Used in DIGMAP collection building and service composition
  – Converting between different metadata formats
  – Handling the spatio-temporal information included in the metadata
  – Calling DIGMAP services to enrich the metadata records

Currently Ongoing Work
Implementing a visual interface for encoding the metadata transformations
Visual "pipelines" converted into XQuery instructions
Hide the complexity of the XSLT/XQuery languages from non-expert users

Thanks for your attention.