Data Grid Research Group Dept. of Computer Science and Engineering The Ohio State University Columbus, Ohio 43210, USA David Chiu and Gagan Agrawal Enabling.

Presentation transcript:

Enabling Ad Hoc Queries over Low-Level Scientific Data Sets
David Chiu and Gagan Agrawal
Data Grid Research Group, Dept. of Computer Science and Engineering
The Ohio State University, Columbus, Ohio 43210, USA

Enabling Ad Hoc Queries over Low Level Scientific Data Sets (SSDBM ’09: New Orleans, LA. Jun 2-4)

Scientific Data Sets: Increased tremendously over the years
The collection of scientific data has increased over the years with new instruments, simulations, etc.
Data sets are stored in repositories around the globe
Just within U.S. entities in the geospatial domain:
‣ NOAA: oceanic, climate, water quality, ...
‣ NASA: ozone, air quality, tropical, ...
‣ NRCS: land quality, watershed, ...


Data Repositories
[Figure labels: Web or Data Grid Infrastructure; Mass Storage Systems (MSS)]

Scientific Data Sets
Data sets are typically low level, i.e.,
‣ Unstructured or semi-structured
However, the data is well documented
‣ Accompanying XML-based metadata describing the data sets is typically required in today’s repositories

Data Repositories
[Figure labels: Mass Storage Systems (MSS); Grid/Web Services & portals; Web or Data Grid Infrastructure]

Data Repositories on a Global Scale
[Figure labels: US, EU, AU, ...]

What Do the Users Want?
[Figure labels: US, EU, AU, ...]
‣ High level queries...
  - Keywords
  - Natural language
‣ Don’t just give me the data, but...
  - Transform it
  - Manipulate it
  - Compose it with other processes and data sets
‣ And do this with the least amount of work required from me!

System Goals
To enable queries over low level data sets, which involves:
‣ identification of relevant data sets
‣ automatic planning for the composition of dependent services (processes) for derivation
... while being non-intrusive to existing schemes, i.e., the system
‣ avoids imposing a standardized format for storing data sets
‣ accommodates heterogeneous metadata
‣ fits into existing MSS and scientific computing infrastructures (Data Grid & the Web)

Challenges
That’s good and all, but not without challenges...
‣ supporting high level user queries
‣ dealing with metadata from multiple entities
‣ efficiently identifying relevant data sets
‣ planning and executing accurate service compositions on the spot
And, without question, the need for DOMAIN KNOWLEDGE & SEMANTICS

Proposed System Overview
[Figure]

The Semantics Layer: A Need for Domain Level Knowledge
Assume the following service retrieves a satellite image pertaining to (x, y) with a resolution given by r:
    get_sat_image(double x, double y, double r)
    inputsTo: longitude, latitude, grid_size
    outputsTo: satellite image
Questions to ask the system:
‣ How to deduce that this service can be used?
‣ How to determine what information is needed for input?
‣ Did the user provide enough information to invoke this service?
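The three questions on this slide can be illustrated with a small sketch. It assumes each service is described by a mapping from its parameters to domain concepts; the `Service` class, the helper names, and the pairing of x, y, r with longitude, latitude, grid_size are illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service description: parameters annotated with domain concepts."""
    name: str
    inputs: dict[str, str]  # parameter name -> concept it takes its input from
    output: str             # concept the service outputs to


# Assumed pairing of the slide's parameters with its concept labels.
GET_SAT_IMAGE = Service(
    name="get_sat_image",
    inputs={"x": "longitude", "y": "latitude", "r": "grid_size"},
    output="satellite_image",
)


def can_produce(service: Service, wanted_concept: str) -> bool:
    """Question 1: can this service be used to derive the wanted concept?"""
    return service.output == wanted_concept


def required_concepts(service: Service) -> list[str]:
    """Question 2: which concepts are needed as input?"""
    return sorted(service.inputs.values())


def missing_inputs(service: Service, provided: dict[str, float]) -> list[str]:
    """Question 3: which required concepts did the user not provide?"""
    return [c for c in required_concepts(service) if c not in provided]


if __name__ == "__main__":
    user_values = {"longitude": -83.0, "latitude": 40.0}
    print(can_produce(GET_SAT_IMAGE, "satellite_image"))
    print(missing_inputs(GET_SAT_IMAGE, user_values))
```

An empty result from `missing_inputs` would mean the user supplied enough information to invoke the service directly.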

In the Semantics Layer: Applying Domain Information
‣ Domain concepts can be derived from executing a service
‣ Domain concepts can also be derived from retrieving an existing data set
‣ Service parameters represent different domain concepts
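One way to picture these three relationships is as labeled edges in a small graph. The sketch below stores them as triples; the relation names `derived_from` and `input_from`, and every node name, are hypothetical placeholders rather than the ontology used in the talk.

```python
# Each triple is (subject, relation, object). All names are illustrative.
TRIPLES = [
    ("satellite_image", "derived_from", "get_sat_image"),  # concept <- service
    ("water_level", "derived_from", "readings_file"),      # concept <- data set type
    ("get_sat_image", "input_from", "longitude"),          # service parameter <- concept
    ("get_sat_image", "input_from", "latitude"),
    ("get_sat_image", "input_from", "grid_size"),
]


def objects(subject: str, relation: str) -> list[str]:
    """Follow one kind of edge outward from a node, preserving insertion order."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


if __name__ == "__main__":
    print(objects("satellite_image", "derived_from"))
    print(objects("get_sat_image", "input_from"))
```

Keeping the relations as plain data means new services or data set types can be added without changing the lookup code.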

Data Registration Service: Indexing Data Sets
Handling heterogeneous metadata
For instance, just within the geospatial domain:

    Country    Metadata Standard
    US         CSDGM
    AU, NZ     ANZLIC
    EU         ???
    CDN        ???
    ...


Data Registration Service: Indexing Data Sets
Metadata to DB transformations...
‣ transform to spatial index
‣ insert
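The registration step can be sketched roughly as follows: pull a few attributes out of an XML metadata record and insert them into a database table that can later be searched by location. This is only an illustration under stated assumptions. The sample record, the file path, and the table layout are invented; the bounding-box element names are assumed to follow CSDGM conventions and should be checked against the standard; and a plain SQLite table with range predicates stands in for the spatial index mentioned on the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata record (element names assumed to follow CSDGM).
SAMPLE_METADATA = """
<metadata>
  <idinfo>
    <citation><citeinfo><title>Hypothetical water level readings</title></citeinfo></citation>
    <spdom>
      <bounding>
        <westbc>-83.5</westbc>
        <eastbc>-82.5</eastbc>
        <northbc>42.0</northbc>
        <southbc>41.0</southbc>
      </bounding>
    </spdom>
  </idinfo>
</metadata>
"""


def extract_record(xml_text: str, location: str) -> tuple:
    """Extract the title and bounding box from one metadata document."""
    root = ET.fromstring(xml_text.strip())
    box = root.find("./idinfo/spdom/bounding")
    return (
        location,
        root.findtext("./idinfo/citation/citeinfo/title"),
        float(box.findtext("westbc")),
        float(box.findtext("eastbc")),
        float(box.findtext("southbc")),
        float(box.findtext("northbc")),
    )


def find_covering(conn: sqlite3.Connection, lon: float, lat: float) -> list:
    """Return locations of registered data sets whose bounding box contains the point."""
    rows = conn.execute(
        "SELECT location FROM data_sets "
        "WHERE west <= ? AND east >= ? AND south <= ? AND north >= ?",
        (lon, lon, lat, lat),
    )
    return [row[0] for row in rows]


# Register the sample record in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_sets "
    "(location TEXT, title TEXT, west REAL, east REAL, south REAL, north REAL)"
)
conn.execute(
    "INSERT INTO data_sets VALUES (?, ?, ?, ?, ?, ?)",
    extract_record(SAMPLE_METADATA, "/mss/hypothetical/water_level.dat"),
)

if __name__ == "__main__":
    print(find_covering(conn, -83.0, 41.5))
```

Supporting a second metadata standard would, in this sketch, mean adding another `extract_record` variant that produces the same tuple, which keeps the index itself independent of any one format.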


Indexing Services
Services (inputs, outputs) are also registered in much the same way

System Overview
[Figure]

Supporting High Level Queries
‣ In supporting high level queries, recall our ontology for modeling domain semantics
‣ The entire system is domain-concept-driven
‣ So, we should decompose queries into concepts first


Supporting High Level Queries
Original query:
‣ “return water level from station=32125 on 10/31/2008”
The elements of our query have been parsed against the ontology
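A rough sketch of what decomposing this query into concepts might look like is given below. The keyword table, the `decompose` function, and the month/day/year reading of the date are assumptions made for illustration; the slides do not show the actual parser, and a simple dictionary stands in here for the ontology.

```python
import re

# Hypothetical phrase -> concept table standing in for the ontology.
CONCEPT_TERMS = {
    "water level": "water_level",
    "station": "station",
}

# Dates are assumed to be written month/day/year.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def decompose(query: str):
    """Split a query into a target concept and concepts given concrete values."""
    text = query.lower()
    target = None
    given = {}
    for term, concept in CONCEPT_TERMS.items():
        assigned = re.search(re.escape(term) + r"\s*=\s*(\S+)", text)
        if assigned:
            # "term=value": the user supplied a value for this concept.
            given[concept] = assigned.group(1)
        elif term in text and target is None:
            # A bare mention: treat the first one as what the user wants back.
            target = concept
    date = DATE_PATTERN.search(text)
    if date:
        month, day, year = date.groups()
        given["date"] = f"{year}-{int(month):02d}-{int(day):02d}"
    return target, given


if __name__ == "__main__":
    print(decompose("return water level from station=32125 on 10/31/2008"))
```

The split between a target concept and concepts with supplied values matters later: the target is where planning starts, and the supplied values decide which service parameters are already satisfied.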

Proposed System Overview
[Figure]

The Planning Layer
Service Composition: An Example


The Planning Layer
Service Composition: An Example
[Figure: a subset of the ontology (unrolled)]

The Planning Layer
Service Composition

    begin compSrvc(concept, Q[...])
        W := ()
        // perform DFS starting from concept
        let v := concept be the currently visited node
        if v is a data type then
            W := (W, index.getData(v, Q))
        else // v is a service
            let (p_1, ..., p_n) be v’s params
            // recursive call on each p_i
            W := (W, (v, compSrvc(p_1, Q), ..., compSrvc(p_n, Q)))
        end if
        return W
    end
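The pseudocode can be approximated in runnable form as below. This is a sketch in the spirit of compSrvc, not the authors' code: the ontology fragment, the index contents, and the file paths are invented, the graph is assumed to be acyclic, and the handling of concept nodes (use a value from the query if present, otherwise expand whatever the concept can be derived from) is an assumption the slide leaves implicit. It also returns a list of alternative plans rather than the single sequence W.

```python
import itertools

# Hypothetical ontology fragment.
DERIVED_FROM = {
    "water_level": ["get_water_level"],     # concept derived from a service
    "station_readings": ["readings_file"],  # concept derived from a data type
    "date": [],                             # only available from the query
}
SERVICE_PARAMS = {"get_water_level": ["station_readings", "date"]}
DATA_TYPES = {"readings_file"}

# Hypothetical index contents, as produced by data registration.
INDEX = {
    "readings_file": [
        {"path": "/mss/hypothetical/st32125.dat", "station": "32125"},
        {"path": "/mss/hypothetical/st99999.dat", "station": "99999"},
    ]
}


def get_data(data_type, query):
    """Stand-in for index.getData(v, Q): registered files matching the query."""
    return [
        rec["path"]
        for rec in INDEX.get(data_type, [])
        if "station" not in query or rec["station"] == query["station"]
    ]


def comp_srvc(node, query):
    """Depth-first search from `node`; returns every plan that derives it."""
    if node in DATA_TYPES:
        # Leaf: each matching data set is one way to satisfy this node.
        return [("data", path) for path in get_data(node, query)]
    if node in SERVICE_PARAMS:
        # Service: plan each parameter recursively, then take one option per
        # parameter. If any parameter has no plan, the service has none.
        options = [comp_srvc(p, query) for p in SERVICE_PARAMS[node]]
        return [("service", node, combo) for combo in itertools.product(*options)]
    # Concept (assumed handling): a query value, plus anything that derives it.
    plans = []
    if node in query:
        plans.append(("value", node, query[node]))
    for source in DERIVED_FROM.get(node, []):
        plans.extend(comp_srvc(source, query))
    return plans


if __name__ == "__main__":
    q = {"station": "32125", "date": "2008-10-31"}
    for plan in comp_srvc("water_level", q):
        print(plan)
```

Because a service contributes plans only when every parameter has at least one plan, a query that omits a required value yields no plan instead of an incomplete one.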


The Planning Layer
Service Composition: An Example
[Figure: ontology (unrolled); a derived execution plan. Callout: “This is what data registration provides”]

Planning Times
[Figure]

Conclusion
Our system...
‣ proposes to unify heterogeneous metadata
‣ extracts certain metadata attributes and indexes low level data sets and services for fast access from distributed repositories
‣ automatically composes these services and data sets to answer user queries
Questions - Comments?
‣ David Chiu
‣ Gagan Agrawal