Data Grid Research Group Dept. of Computer Science and Engineering The Ohio State University Columbus, Ohio 43210, USA David Chiu and Gagan Agrawal Enabling.

1 Data Grid Research Group Dept. of Computer Science and Engineering The Ohio State University Columbus, Ohio 43210, USA David Chiu and Gagan Agrawal Enabling Ad Hoc Queries over Low-Level Scientific Data Sets

2 Enabling Ad Hoc Queries over Low Level Scientific Data Sets. SSDBM ’09: New Orleans, LA. Jun 2-4, 2009
Scientific Data Sets: Increased tremendously over the years
The collection of scientific data has increased over the years with new instruments, simulations, etc.
Data sets are stored in repositories around the globe
Just within U.S. entities in the geospatial domain:
‣ NOAA: oceanic, climate, water quality,...
‣ NASA: ozone, air quality, tropical,...
‣ NRCS: land quality, watershed,...

5 Data Repositories
[Diagram labels: Web or Data Grid Infrastructure; Mass Storage Systems (MSS)]

6 Scientific Data Sets
Data sets are typically low level, i.e.,
‣ Unstructured or semi-structured
0101071895 0.34 -2.45 0.50 -0.65 -0.62 -0.71 0.00 -0.96
0101071896 -1.71 0.49 0.27 -0.79 -1.53 0.60 0.09 -2.21
0101071897 -0.53 0.14 4.32 1.95 -1.55 -1.68 -1.32 -0.69
0101071898 1.90 -2.64 -1.70 1.11 -2.18 -1.08 -0.53 -0.25
0101071899 0.44 0.97 1.65 -0.71 -2.02 -2.10 -0.50 -2.03
0101071900 -1.65 1.19 -1.34 0.57 -1.37 7.00 -0.48 -1.77
However, data is well-documented
‣ Accompanying XML-based metadata describing data sets is typically required in today’s repositories
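
To make the "low level" point on slide 6 concrete, the following minimal Python sketch reads rows like the ones shown. It assumes only that each row is whitespace-delimited with a leading key followed by numeric fields; the slide does not say what the columns mean, so `parse_rows` and its return shape are illustrative, and in practice the accompanying metadata would be needed to interpret the values.

```python
from typing import List, Tuple

# Two rows copied from slide 6. The meaning of each column is not stated on
# the slide, so the code treats them as an opaque key plus numeric fields.
SAMPLE = """\
0101071895 0.34 -2.45 0.50 -0.65 -0.62 -0.71 0.00 -0.96
0101071896 -1.71 0.49 0.27 -0.79 -1.53 0.60 0.09 -2.21
"""


def parse_rows(text: str) -> List[Tuple[str, List[float]]]:
    """Split each non-empty line into a leading key and its numeric fields."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = fields[0]
        values = [float(f) for f in fields[1:]]
        rows.append((key, values))
    return rows


rows = parse_rows(SAMPLE)
```

Nothing in the file itself says what a row represents, which is why the system described in the talk leans on the metadata rather than on the raw data.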

7 Data Repositories
[Diagram labels: Mass Storage Systems (MSS); Grid/Web Services & portals; Web or Data Grid Infrastructure]

8 Data Repositories in the Global Scale
[Diagram labels: US, EU, AU, ...]

9 What Do the Users Want?
[Diagram labels: US, EU, AU, ...]
High level query...
- Keywords
- Natural language
Don’t just give me the data, but...
- Transform it
- Manipulate it
- Compose it with other processes and data sets
And do this with the least amount of work required from me!

10 System Goals
To enable queries over low level data sets, which involves:
‣ identification of relevant data sets
‣ automatic planning for the composition of dependent services (processes) for derivation
... while being non-intrusive to existing schemes, i.e.,
‣ avoids a standardized format for storing data sets
‣ accommodates heterogeneous metadata
‣ this system should fit into existing MSS and scientific computing infrastructures (Data Grid & the Web)

11 Challenges
That’s good and all, but...
Not without challenges...
‣ supporting high level user queries
‣ dealing with metadata from multiple entities
‣ efficiently identifying relevant data sets
‣ planning and executing accurate service compositions on the spot

12 Challenges
That’s good and all, but...
Not without challenges...
‣ supporting high level user queries
‣ dealing with metadata from multiple entities
‣ efficiently identifying relevant data sets
‣ planning and executing accurate service compositions on the spot
And without question, the need for DOMAIN KNOWLEDGE & SEMANTICS

13 Proposed System Overview

14 The Semantics Layer: A Need for Domain Level Knowledge
Assume the following service retrieves a satellite image pertaining to (x,y) with resolution respective to r:
get_sat_image(double x, double y, double r)
‣ inputsTo: longitude, latitude, grid_size
‣ outputsTo: satellite image
Questions to ask the system:
‣ How to deduce that this service can be used?
‣ How to determine what information is needed for input?
‣ Did the user provide enough information to invoke this service?
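
The three questions on slide 14 can be illustrated with a small Python sketch. It assumes that `x`, `y` and `r` map to longitude, latitude and grid_size in the order the slide lists them; the `Service` class and the helpers `services_for` and `missing_inputs` are hypothetical names for illustration, not part of the authors' system.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Service:
    name: str
    inputs: Dict[str, str]  # parameter name -> domain concept (the inputsTo links)
    output: str             # domain concept the service derives (the outputsTo link)


# Parameter-to-concept pairing assumed from the order shown on the slide.
GET_SAT_IMAGE = Service(
    name="get_sat_image",
    inputs={"x": "longitude", "y": "latitude", "r": "grid_size"},
    output="satellite image",
)


def services_for(concept: str, registry: List[Service]) -> List[Service]:
    """Question 1: which registered services can derive this concept?"""
    return [s for s in registry if s.output == concept]


def missing_inputs(service: Service, provided: Set[str]) -> Set[str]:
    """Questions 2 and 3: which required concepts did the user not supply?"""
    return set(service.inputs.values()) - provided


candidates = services_for("satellite image", [GET_SAT_IMAGE])
still_needed = missing_inputs(GET_SAT_IMAGE, {"longitude", "latitude"})
```

In this sketch an empty result from `missing_inputs` would mean the user supplied enough information to invoke the service.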

15 In the Semantics Layer: Applying Domain Information
‣ Domain concepts can be derived from executing a service
‣ Domain concepts can also be derived from retrieving an existing data set
‣ Service parameters represent different domain concepts

16 Data Registration Service: Indexing Data Sets
Handling heterogeneous metadata
For instance, just within the geospatial domain,
Country    Metadata Standards
US         CSDGM
AU, NZ     ANZLIC
EU         ???
CDN        ???
...

17 Data Registration Service: Indexing Data Sets
Handling heterogeneous metadata

18 Data Registration Service: Indexing Data Sets
Metadata to DB transformations... (transform to spatial index)

19 Data Registration Service: Indexing Data Sets
Metadata to DB transformations... insert
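
A rough Python sketch of the registration idea on slides 16 to 19: metadata arriving in different standards is normalized by a per-standard adapter and inserted into one index. Everything here is an assumption for illustration. The standards are called "A" and "B", the field names are invented rather than taken from CSDGM or ANZLIC, and a plain SQLite table with a bounding-box comparison stands in for a real spatial index.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, float, float, float, float]  # (uri, west, east, south, north)


# Hypothetical adapters; the field names are illustrative only.
def from_standard_a(uri: str, md: Dict[str, str]) -> Record:
    return (uri, float(md["west"]), float(md["east"]),
            float(md["south"]), float(md["north"]))


def from_standard_b(uri: str, md: Dict[str, str]) -> Record:
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in md["bbox"].split(","))
    return (uri, min_lon, max_lon, min_lat, max_lat)


ADAPTERS: Dict[str, Callable[[str, Dict[str, str]], Record]] = {
    "A": from_standard_a,
    "B": from_standard_b,
}


def register(db: sqlite3.Connection, standard: str, uri: str, md: Dict[str, str]) -> None:
    """Normalize one metadata record and insert it into the shared index."""
    db.execute("INSERT INTO datasets VALUES (?, ?, ?, ?, ?)", ADAPTERS[standard](uri, md))


def covering(db: sqlite3.Connection, lon: float, lat: float) -> List[str]:
    """Return URIs of data sets whose bounding box contains the point."""
    cur = db.execute(
        "SELECT uri FROM datasets "
        "WHERE west <= ? AND ? <= east AND south <= ? AND ? <= north ORDER BY uri",
        (lon, lon, lat, lat),
    )
    return [row[0] for row in cur]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE datasets (uri TEXT, west REAL, east REAL, south REAL, north REAL)")
register(db, "A", "file:///a.dat", {"west": "-84", "east": "-82", "south": "39", "north": "41"})
register(db, "B", "file:///b.dat", {"bbox": "150,-35,152,-33"})
```

Supporting another standard in this sketch means adding one adapter, which is one way to read the "non-intrusive" goal: the stored data sets and their native metadata are left as they are.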

20 Data Registration Service: Indexing Data Sets

23 Indexing Services
Services (inputs, outputs) are also registered in much the same way

24 System Overview

25 Supporting High Level Queries
‣ In supporting high level queries, recall our ontology for modeling domain semantics
‣ Entire system is domain-concept-driven
‣ So, we should decompose queries into concepts first

26 Supporting High Level Queries

27 Supporting High Level Queries
Original Query:
‣ “return water level from station=32125 on 10/31/2008”
The elements of our query have been parsed against the ontology
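
As an illustration of decomposing the slide 27 query into concepts, here is a minimal Python sketch. It is not the authors' parser: `VOCABULARY` is a hypothetical one-entry phrase-to-concept table, the concept name `water_level` is invented, and the date is assumed to be in month/day/year order.

```python
import re
from datetime import date, datetime
from typing import Dict, Optional, Tuple

# Hypothetical vocabulary mapping query phrases to ontology concept names.
VOCABULARY = {"water level": "water_level"}


def decompose(query: str) -> Tuple[Optional[str], Dict[str, object]]:
    """Return the target concept and the values the user supplied."""
    lowered = query.lower()
    target = next((c for phrase, c in VOCABULARY.items() if phrase in lowered), None)

    given: Dict[str, object] = {}
    # key=value pairs such as station=32125
    for key, value in re.findall(r"(\w+)=(\w+)", query):
        given[key] = value
    # a date written as MM/DD/YYYY (assumed ordering)
    match = re.search(r"\b(\d{1,2}/\d{1,2}/\d{4})\b", query)
    if match:
        given["date"] = datetime.strptime(match.group(1), "%m/%d/%Y").date()
    return target, given


target, given = decompose("return water level from station=32125 on 10/31/2008")
```

The target concept is where planning would start, and the supplied values are what the planner can check against the inputs of candidate services.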

28 Proposed System Overview

29 The Planning Layer: Service Composition: An Example

30 In the Semantics Layer: Applying Domain Information
‣ Domain concepts can be derived from executing a service
‣ Domain concepts can also be derived from retrieving an existing data set
‣ Service parameters represent different domain concepts

31 The Planning Layer: Service Composition: An Example
A subset of the ontology (unrolled)

32 The Planning Layer: Service Composition
begin compSrvc(concept, Q[...])
    W := ()
    //perform DFS starting from concept
    let v := concept be the currently visited node
    if v is a data type then
        W := (W, index.getData(v, Q))
    else //v is a service
        let (p_1,...,p_n) be v’s params
        //recursive call on each p_i
        W := (W, (v, compSrvc(p_1, Q),..., compSrvc(p_n, Q)))
    end if
    return W
end
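
The compSrvc pseudocode on slide 32 can be sketched in runnable Python as follows. This is one reading of the slide, not the authors' implementation: the ontology contents, the node names, the tuple shape of the plan, and the `get_data` stand-in for `index.getData` are all hypothetical, and the sketch assumes the unrolled ontology is acyclic.

```python
from typing import Dict, List, Tuple

# Hypothetical unrolled ontology: a service node maps to its parameter nodes.
# Any node that is not a key here is treated as a data type.
SERVICE_PARAMS: Dict[str, List[str]] = {
    "derive_level": ["gauge_readings", "interpolate"],
    "interpolate": ["elevation_model"],
}

# Hypothetical index built by data registration: data type -> (uri, attributes).
Index = Dict[str, List[Tuple[str, Dict[str, str]]]]
INDEX: Index = {
    "gauge_readings": [
        ("file:///g1.dat", {"station": "32125"}),
        ("file:///g2.dat", {"station": "99999"}),
    ],
    "elevation_model": [("file:///dem.dat", {})],
}


def get_data(data_type: str, query: Dict[str, str], index: Index) -> List[str]:
    """Stand-in for index.getData(v, Q): registered files matching the query."""
    return [
        uri
        for uri, attrs in index.get(data_type, [])
        if all(attrs[k] == v for k, v in query.items() if k in attrs)
    ]


def comp_srvc(node: str, query: Dict[str, str], index: Index) -> tuple:
    """Depth-first composition: data types resolve to files, services recurse."""
    if node not in SERVICE_PARAMS:  # v is a data type
        return ("data", node, get_data(node, query, index))
    # v is a service: plan each parameter recursively
    return ("service", node, [comp_srvc(p, query, index) for p in SERVICE_PARAMS[node]])


plan = comp_srvc("derive_level", {"station": "32125"}, INDEX)
```

The accumulator `W` from the pseudocode disappears here because each call returns its own sub-plan. The nested tuples play the role of the derived execution plan, with the leaves filled in by the index that data registration provides.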

33 The Planning Layer: Service Composition: An Example
A subset of the ontology (unrolled)

34 The Planning Layer: Service Composition: An Example
[Diagram labels: Ontology (unrolled); A Derived Execution Plan]
This is what data registration provides

35 Planning Times

36 Conclusion
Our system...
‣ proposes to unify heterogeneous metadata
‣ extracts certain metadata attributes and indexes low level data sets and services for fast access from distributed repositories
‣ automatically composes these services and data sets to answer user queries
Questions - Comments?
‣ David Chiu chiud@cse.ohio-state.edu
‣ Gagan Agrawal agrawal@cse.ohio-state.edu

