1 Enabling Ad Hoc Queries over Low-Level Scientific Data Sets
David Chiu & Gagan Agrawal
Data Grid Research Group, Dept. of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA

2 D. Chiu & G. Agrawal. Enabling Ad Hoc Queries over Low-Level Scientific Data Sets. SSDBM ’09
Presentation Outline
Motivation
‣ Current Trends in Scientific Data Management
‣ Problem Discussion
Data Registration Indexing
‣ Metadata Extraction
‣ Transformation
Service Composition
Conclusion

3 Scientific Data Sets: Increased tremendously over the years
The collection of scientific data has increased over the years with new instruments, simulations, etc.
Data sets are stored in repositories around the globe.
Just within U.S. entities in the geospatial domain:
‣ NOAA: oceanic, climate, water quality, ...
‣ NASA: ozone, air quality, tropical, ...
‣ NRCS: land quality, watershed, ...

6 Data Repositories
Mass Storage Systems (MSS)
Web or Data Grid Infrastructure

7 Scientific Data Sets
Data sets are typically low level, i.e.,
‣ Unstructured or semi-structured
0101071895 0.34 -2.45 0.50 -0.65 -0.62 -0.71 0.00 -0.96
0101071896 -1.71 0.49 0.27 -0.79 -1.53 0.60 0.09 -2.21
0101071897 -0.53 0.14 4.32 1.95 -1.55 -1.68 -1.32 -0.69
0101071898 1.90 -2.64 -1.70 1.11 -2.18 -1.08 -0.53 -0.25
0101071899 0.44 0.97 1.65 -0.71 -2.02 -2.10 -0.50 -2.03
0101071900 -1.65 1.19 -1.34 0.57 -1.37 7.00 -0.48 -1.77.
However, data is well-documented
‣ Accompanying XML-based metadata describing data sets is typically required in today’s repositories
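
To make "low level" concrete, the sketch below reads rows shaped like the sample on this slide. It assumes, without confirmation from the slides, that the first column is an opaque record key and the remaining columns are floating-point measurements; the names Record and parse_rows are hypothetical.

```python
from typing import NamedTuple


class Record(NamedTuple):
    key: str        # first column, kept as text (its meaning is not stated in the slides)
    values: tuple   # remaining columns, parsed as floats


def parse_rows(text):
    """Parse whitespace-delimited rows; blank lines are skipped."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        records.append(Record(fields[0], tuple(float(v) for v in fields[1:])))
    return records


# Two rows copied from the slide.
SAMPLE = """\
0101071895 0.34 -2.45 0.50 -0.65 -0.62 -0.71 0.00 -0.96
0101071896 -1.71 0.49 0.27 -0.79 -1.53 0.60 0.09 -2.21
"""

rows = parse_rows(SAMPLE)
```

Nothing in the file itself says what the columns mean, which is why the accompanying metadata matters.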

8 Data Repositories
Mass Storage Systems (MSS)
Grid/Web Services & portals
Web or Data Grid Infrastructure

9 Data Repositories on a Global Scale
US, EU, AU, ...

10 What Do the Users Want?
US, EU, AU, ...
‣ I don’t care where data is located.
‣ I also want to share my own data with others!
‣ Don’t just give me the data, but...
  - Transform it
  - Manipulate it
  - Compose it with other processes and data sets
‣ And do this with the least amount of work required from me!

11 System Goals
To enable queries over low level data sets, which involves:
‣ identification of relevant data sets
‣ automatic planning for the composition of dependent services (processes) for derivation
... while being non-intrusive to existing schemes, i.e.,
‣ avoids a standardized format for storing data sets
‣ accommodates heterogeneous metadata
‣ this system should fit into existing MSS and scientific computing infrastructures (Data Grid & the Web)

13 That’s good and all, but... Challenges
Not without challenges...
‣ dealing with metadata from multiple entities
‣ efficiently identifying relevant data sets
‣ planning and executing accurate service compositions on the spot
And without question, the need for DOMAIN KNOWLEDGE & SEMANTICS

14 The AUSPICE System
AUSPICE: Automatic Service Planning and Execution in Cloud/Grid Environments

15 The Semantics Layer: A Need for Domain Level Knowledge
Assume the following service retrieves a satellite image pertaining to (x, y) at a resolution determined by r:
get_sat_image(double x, double y, double r)
‣ longitude, latitude, grid_size inputsTo get_sat_image
‣ get_sat_image outputsTo satellite image
Questions to ask the system:
‣ How to deduce that this service can be used?
‣ How to determine what information is needed for input?
‣ Did the user provide enough information to invoke this service?
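
The three questions above can be sketched as simple checks against a semantically annotated service description. This is only an illustration: the Service class and both helper functions are hypothetical, and the pairing of x, y, r with longitude, latitude, grid_size is assumed from the order in which they appear on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    inputs: tuple   # domain concept for each parameter, in parameter order (assumed)
    output: str     # domain concept the service produces


GET_SAT_IMAGE = Service(
    name="get_sat_image",
    inputs=("longitude", "latitude", "grid_size"),
    output="satellite_image",
)


def can_derive(service, concept):
    """Question 1: can this service be used to obtain the requested concept?"""
    return service.output == concept


def missing_inputs(service, provided):
    """Questions 2 and 3: which required concepts did the user not supply?"""
    return [c for c in service.inputs if c not in provided]


user_values = {"longitude": -83.0, "latitude": 40.0}  # made-up query values
still_needed = missing_inputs(GET_SAT_IMAGE, user_values)
```

An empty result from missing_inputs would mean the service can be invoked directly; otherwise each missing concept has to come from the user, a data set, or another service.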

16 In the Semantics Layer: Applying Domain Information
Domain concepts can be derived from executing a service
Domain concepts can also be derived from retrieving an existing data set
Service parameters represent different domain concepts

17 Data Registration Service: Indexing Data Sets
Handling heterogeneous metadata
For instance, just within the geospatial domain:
Country   Metadata Standard
US        CSDGM
AU, NZ    ANZLIC
EU        ???
CDN       ???
...

18 Data Registration Service: Indexing Data Sets
Handling heterogeneous metadata

19 Data Registration Service: Indexing Data Sets
Metadata Transformation... (transform to spatial index)
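
As one possible reading of this step, the sketch below pulls a bounding box out of a CSDGM-style record so that it could feed a spatial index. The element path idinfo/spdom/bounding and the westbc, eastbc, northbc, southbc tags follow the CSDGM short names; the sample coordinates and the function name extract_bbox are made up, and the slides do not show the actual transformation used.

```python
import xml.etree.ElementTree as ET

# Minimal CSDGM-style fragment with invented coordinates.
CSDGM_SAMPLE = """
<metadata>
  <idinfo>
    <spdom>
      <bounding>
        <westbc>-83.2</westbc>
        <eastbc>-82.8</eastbc>
        <northbc>40.2</northbc>
        <southbc>39.8</southbc>
      </bounding>
    </spdom>
  </idinfo>
</metadata>
"""


def extract_bbox(xml_text):
    """Return (west, south, east, north) as floats, or raise ValueError."""
    root = ET.fromstring(xml_text)
    bounding = root.find("idinfo/spdom/bounding")
    if bounding is None:
        raise ValueError("no bounding element found")
    coords = []
    for tag in ("westbc", "southbc", "eastbc", "northbc"):
        text = bounding.findtext(tag)
        if text is None:
            raise ValueError(f"missing <{tag}>")
        coords.append(float(text))
    return tuple(coords)


bbox = extract_bbox(CSDGM_SAMPLE)
```

A record in another standard would need its own extractor that returns the same tuple, which is one way to keep the index independent of the metadata format.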

20 Data Registration Service: Indexing Data Sets
Metadata to DB transformations... insert
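
A minimal sketch of the insert step, assuming the extracted attributes land in a relational table. The schema, the function names, and the use of SQLite are all assumptions for illustration; the slides do not name the database or its layout.

```python
import sqlite3


def open_index():
    """Create an in-memory index table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasets ("
        " path TEXT PRIMARY KEY, concept TEXT,"
        " west REAL, south REAL, east REAL, north REAL)"
    )
    return conn


def register(conn, path, concept, bbox):
    """Insert one data set; bbox is (west, south, east, north)."""
    conn.execute(
        "INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?)",
        (path, concept, *bbox),
    )


def find(conn, concept, lon, lat):
    """Return paths of data sets for a concept whose box contains the point."""
    rows = conn.execute(
        "SELECT path FROM datasets"
        " WHERE concept = ? AND west <= ? AND east >= ?"
        " AND south <= ? AND north >= ? ORDER BY path",
        (concept, lon, lon, lat, lat),
    )
    return [row[0] for row in rows]


conn = open_index()
register(conn, "a.dat", "water_level", (-83.2, 39.8, -82.8, 40.2))
register(conn, "b.dat", "water_level", (10.0, 10.0, 11.0, 11.0))
matches = find(conn, "water_level", -83.0, 40.0)
```

A plain range comparison is enough for a handful of rows; a dedicated spatial index would be the natural replacement at scale.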

21-23 Data Registration Service: Indexing Data Sets

24 In the Semantics Layer: Applying Domain Information
Data registration simplifies identification process within

25 Indexing Services
Services (inputs, outputs) are also registered in much the same way

26 The Planning Layer. Service Composition: An Example
A subset of the ontology (unrolled)

27 The Planning Layer: Service Composition
begin compSrvc(concept, Q[...])
  W := ()
  // perform DFS starting from concept
  let v := concept be the currently visited node
  if v is a data type then
    W := (W, index.getData(v, Q))
  else // v is a service
    let (p_1, ..., p_n) be v’s params
    // recursive call on each p_i
    W := (W, (v, compSrvc(p_1, Q), ..., compSrvc(p_n, Q)))
  end if
  return W
end
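
The pseudocode above can be sketched in Python roughly as follows. This is one interpretation, not the authors' implementation: the ontology fragment, the service and data type names, and fake_index are invented for illustration, and the explicit handling of concept nodes (expanding a concept into every data type or service that can derive it) fills in a step the slide leaves implicit. The sketch also assumes the ontology has no cycles.

```python
from itertools import product

# Hypothetical ontology fragment; none of these names come from the slides.
DERIVED_FROM = {                      # concept -> data types or services that yield it
    "water_level": ["extract_level"],
    "gauge_readings": ["gauge_file"],
}
SERVICE_PARAMS = {"extract_level": ["gauge_readings"]}  # service -> parameter concepts
DATA_TYPES = {"gauge_file"}


def comp_srvc(node, query, get_data):
    """Depth-first enumeration of the ways `node` can be derived."""
    if node in DATA_TYPES:
        # Leaf: ask the index for registered data sets matching the query.
        return [("data", path) for path in get_data(node, query)]
    if node in SERVICE_PARAMS:
        # Service: derive every parameter, then combine the alternatives.
        per_param = [comp_srvc(p, query, get_data) for p in SERVICE_PARAMS[node]]
        return [("service", node, list(args)) for args in product(*per_param)]
    # Concept: collect plans from each source that can derive it.
    plans = []
    for source in DERIVED_FROM.get(node, []):
        plans.extend(comp_srvc(source, query, get_data))
    return plans


def fake_index(data_type, query):
    """Stand-in for index.getData: pretends one file matches each query."""
    return [f"{data_type}_{query['station']}.dat"]


plans = comp_srvc("water_level", {"station": "32125"}, fake_index)
```

If any parameter of a service has no derivation, product yields nothing and the service drops out of the result, so only fully satisfiable plans are returned.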

28 The Planning Layer. Service Composition: An Example
Ontology (unrolled)
A Derived Execution Plan
This is what data registration provides

29 Planning Times

30 Conclusion
The AUSPICE System...
‣ unifies heterogeneous metadata
‣ extracts certain metadata attributes and indexes low level data sets and services for fast access from distributed repositories
‣ automatically composes these services and data sets to answer user queries
Questions - Comments?
‣ David Chiu chiud@cse.ohio-state.edu
‣ Gagan Agrawal agrawal@cse.ohio-state.edu

31 System Overview

32 Supporting High Level Queries
In supporting high level queries, recall our ontology for modeling domain semantics
Entire system is domain-concept-driven
So, we should decompose queries into concepts first

33 Supporting High Level Queries

34 Supporting High Level Queries
Original Query:
‣ “return water level from station=32125 on 10/31/2008”
The elements of our query have been parsed against the ontology
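
A rough sketch of what decomposing this query into concepts might look like. The vocabulary table, the regular expressions, and the MM/DD/YYYY date reading are assumptions made for this example; the slides do not describe how the actual parser works.

```python
import re
from datetime import date, datetime

# Hypothetical mapping from query phrases to ontology concepts.
CONCEPT_TERMS = {"water level": "water_level"}


def decompose(query):
    """Return (target concept, dict of concept values bound by the query)."""
    q = query.lower()
    target = next((c for term, c in CONCEPT_TERMS.items() if term in q), None)
    bound = {}
    station = re.search(r"station\s*=\s*(\d+)", q)
    if station:
        bound["station"] = station.group(1)
    when = re.search(r"\b(\d{1,2}/\d{1,2}/\d{4})\b", q)
    if when:
        # Assumes month/day/year order, as in the slide's example.
        bound["date"] = datetime.strptime(when.group(1), "%m/%d/%Y").date()
    return target, bound


target, bound = decompose("return water level from station=32125 on 10/31/2008")
```

The target concept would then be the starting node for planning, with the bound values passed along as the query parameters.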

35 Proposed System Overview

36 The Planning Layer. Service Composition: An Example

