Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lehrstuhl Informatik III: Datenbanksysteme Grid-based Data Stream Processing in e-Science 1 Richard Kuntschke 1, Tobias Scholl 1, Sebastian Huber 1, Alfons.

Similar presentations


Presentation on theme: "Lehrstuhl Informatik III: Datenbanksysteme Grid-based Data Stream Processing in e-Science 1 Richard Kuntschke 1, Tobias Scholl 1, Sebastian Huber 1, Alfons."— Presentation transcript:

1 Lehrstuhl Informatik III: Datenbanksysteme Grid-based Data Stream Processing in e-Science 1 Richard Kuntschke 1, Tobias Scholl 1, Sebastian Huber 1, Alfons Kemper 1, Angelika Reiser 1, Hans-Martin Adorf 2, Gerard Lemson 3, and Wolfgang Voges 3 1 Lehrstuhl Informatik III: 1 Datenbanksysteme 1 Fakultät für Informatik 1 Technische Universität München 2 Max-Planck-Institut 2 für Astrophysik 3 Max-Planck-Institut 3 für extraterrestrische 3 Physik

2 Lehrstuhl Informatik III: Datenbanksysteme 2 Grid-based Data Stream Processing in e-Science Important Challenges in e-Science In general: Large and exponentially growing amounts of data Distributed data archives No unique identifiers Uncertainty In astrophysics: Spectral Energy Distributions (SEDs) Used to classify celestial objects (active galactic nuclei, brown dwarfs, neutron stars,...) Generation requires spatial (astrometric) matching

3 Lehrstuhl Informatik III: Datenbanksysteme 3 Grid-based Data Stream Processing in e-Science Spatial (Astrometric) Matching Current solutions … … load all data into main memory Uses a lot of memory Infeasible if memory size is insufficient … process all data at once and deliver the complete result at the end Inefficient No results until all processing has completed

4 Lehrstuhl Informatik III: Datenbanksysteme 4 Grid-based Data Stream Processing in e-Science Our Contributions StarGlobe Grid-based P2P Data Stream Management System implemented on top of Globus In-network processing Early filtering Parallelization Pipelining Load-balancing Mobile user-defined operators Astrophysical Example Workflow Astrometric matching Performance evaluation

5 Lehrstuhl Informatik III: Datenbanksysteme 5 Grid-based Data Stream Processing in e-Science The StarGlobe Architecture filter transform Load mobile operators Fct-Provider filter transform Stream 1 Publish Query 2 Subscribe

6 Lehrstuhl Informatik III: Datenbanksysteme 6 Grid-based Data Stream Processing in e-Science Traditional Approach: Bring Data to Code union NN_10 Data-Prov. B Data-Prov. CData-Prov. D Data-Prov. A

7 Lehrstuhl Informatik III: Datenbanksysteme 7 Grid-based Data Stream Processing in e-Science New Approach: Bring Code to Data Data-Prov. A scan NN_10 Data-Prov. B scan NN_10 Data-Prov. C scan NN_10 Data-Prov. D scan NN_10 union NN_10 Fct-Provider NN_10

8 Lehrstuhl Informatik III: Datenbanksysteme 8 Grid-based Data Stream Processing in e-Science Mobile User-Defined Operators Load user-defined operators from function provider servers in the network Common interface for integrating external operators Push-based iterator Flexibility

9 Lehrstuhl Informatik III: Datenbanksysteme 9 Grid-based Data Stream Processing in e-Science StreamIterator Interface open(Config, StreamWriter) Configuration parameters Writer for result stream next(StreamIteratorEvent) Next element in input stream Writing output to result stream using StreamWriter.write() close()

10 Lehrstuhl Informatik III: Datenbanksysteme 10 Grid-based Data Stream Processing in e-Science Communication between StreamProcessor and StreamIterator

11 Lehrstuhl Informatik III: Datenbanksysteme 11 Grid-based Data Stream Processing in e-Science Astrophysical Example Workflow

12 Lehrstuhl Informatik III: Datenbanksysteme 12 Grid-based Data Stream Processing in e-Science Distributed Query Evaluation Plan

13 Lehrstuhl Informatik III: Datenbanksysteme 13 Grid-based Data Stream Processing in e-Science Distributed Query Evaluation Plan

14 Lehrstuhl Informatik III: Datenbanksysteme 14 Grid-based Data Stream Processing in e-Science Distributed Query Evaluation Plan

15 Lehrstuhl Informatik III: Datenbanksysteme 15 Grid-based Data Stream Processing in e-Science Evaluation of Early Filtering

16 Lehrstuhl Informatik III: Datenbanksysteme 16 Grid-based Data Stream Processing in e-Science Conclusion Synergies between research in computer science and other scientific disciplines, e.g., astrophysics StarGlobe Handling large data volumes efficiently Early filtering, parallelization, pipelining Returning first results early on Pipelining Flexible support of domain-specific application logic Mobile user-defined operators Results also applicable to other domains

17 Lehrstuhl Informatik III: Datenbanksysteme 17 Grid-based Data Stream Processing in e-Science The SED Scenario 1.Catalog query 2.Spatial (astrometric) matching 3.Assembly of raw photometry 4.Photometric transformation 5.SED classification

18 Lehrstuhl Informatik III: Datenbanksysteme 18 Grid-based Data Stream Processing in e-Science Peer Architecture Built on top of Open Grid Services Architecture (OGSA) Peers implemented as communicating grid services Availability of services according to capabilities of peers FluX query engine for subscription evaluation

19 Lehrstuhl Informatik III: Datenbanksysteme 19 Grid-based Data Stream Processing in e-Science StreamProcessor Input stream Output stream write(StreamIteratorEvent) StreamIteratorEvent Communication between StreamProcessor and StreamIterator StreamIterator StreamIteratorEvent open(Config,Writer) photon StreamIteratorEvent next(StreamIteratorEvent) photon StreamIteratorEvent next(StreamIteratorEvent) StreamIteratorEvent

20 Lehrstuhl Informatik III: Datenbanksysteme 20 Grid-based Data Stream Processing in e-Science Sequence Diagram


Download ppt "Lehrstuhl Informatik III: Datenbanksysteme Grid-based Data Stream Processing in e-Science 1 Richard Kuntschke 1, Tobias Scholl 1, Sebastian Huber 1, Alfons."

Similar presentations


Ads by Google