Download presentation
Presentation is loading. Please wait.
Published byIsaac Bates Modified over 10 years ago
1
Lehrstuhl Informatik III: Datenbanksysteme Grid-based Data Stream Processing in e-Science 1 Richard Kuntschke 1, Tobias Scholl 1, Sebastian Huber 1, Alfons Kemper 1, Angelika Reiser 1, Hans-Martin Adorf 2, Gerard Lemson 3, and Wolfgang Voges 3 1 Lehrstuhl Informatik III: 1 Datenbanksysteme 1 Fakultät für Informatik 1 Technische Universität München 2 Max-Planck-Institut 2 für Astrophysik 3 Max-Planck-Institut 3 für extraterrestrische 3 Physik
2
Lehrstuhl Informatik III: Datenbanksysteme 2 Grid-based Data Stream Processing in e-Science Important Challenges in e-Science In general: Large and exponentially growing amounts of data Distributed data archives No unique identifiers Uncertainty In astrophysics: Spectral Energy Distributions (SEDs) Used to classify celestial objects (active galactic nuclei, brown dwarfs, neutron stars,...) Generation requires spatial (astrometric) matching
3
Lehrstuhl Informatik III: Datenbanksysteme 3 Grid-based Data Stream Processing in e-Science Spatial (Astrometric) Matching Current solutions … … load all data into main memory Uses a lot of memory Infeasible if memory size is insufficient … process all data at once and deliver the complete result at the end Inefficient No results until all processing has completed
4
Lehrstuhl Informatik III: Datenbanksysteme 4 Grid-based Data Stream Processing in e-Science Our Contributions StarGlobe Grid-based P2P Data Stream Management System implemented on top of Globus In-network processing Early filtering Parallelization Pipelining Load-balancing Mobile user-defined operators Astrophysical Example Workflow Astrometric matching Performance evaluation
5
Lehrstuhl Informatik III: Datenbanksysteme 5 Grid-based Data Stream Processing in e-Science The StarGlobe Architecture filter transform Load mobile operators Fct-Provider filter transform Stream 1 Publish Query 2 Subscribe
6
Lehrstuhl Informatik III: Datenbanksysteme 6 Grid-based Data Stream Processing in e-Science Traditional Approach: Bring Data to Code union NN_10 Data-Prov. B Data-Prov. CData-Prov. D Data-Prov. A
7
Lehrstuhl Informatik III: Datenbanksysteme 7 Grid-based Data Stream Processing in e-Science New Approach: Bring Code to Data Data-Prov. A scan NN_10 Data-Prov. B scan NN_10 Data-Prov. C scan NN_10 Data-Prov. D scan NN_10 union NN_10 Fct-Provider NN_10
8
Lehrstuhl Informatik III: Datenbanksysteme 8 Grid-based Data Stream Processing in e-Science Mobile User-Defined Operators Load user-defined operators from function provider servers in the network Common interface for integrating external operators Push-based iterator Flexibility
9
Lehrstuhl Informatik III: Datenbanksysteme 9 Grid-based Data Stream Processing in e-Science StreamIterator Interface open(Config, StreamWriter) Configuration parameters Writer for result stream next(StreamIteratorEvent) Next element in input stream Writing output to result stream using StreamWriter.write() close()
10
Lehrstuhl Informatik III: Datenbanksysteme 10 Grid-based Data Stream Processing in e-Science Communication between StreamProcessor and StreamIterator
11
Lehrstuhl Informatik III: Datenbanksysteme 11 Grid-based Data Stream Processing in e-Science Astrophysical Example Workflow
12
Lehrstuhl Informatik III: Datenbanksysteme 12 Grid-based Data Stream Processing in e-Science Distributed Query Evaluation Plan
13
Lehrstuhl Informatik III: Datenbanksysteme 13 Grid-based Data Stream Processing in e-Science Distributed Query Evaluation Plan
14
Lehrstuhl Informatik III: Datenbanksysteme 14 Grid-based Data Stream Processing in e-Science Distributed Query Evaluation Plan
15
Lehrstuhl Informatik III: Datenbanksysteme 15 Grid-based Data Stream Processing in e-Science Evaluation of Early Filtering
16
Lehrstuhl Informatik III: Datenbanksysteme 16 Grid-based Data Stream Processing in e-Science Conclusion Synergies between research in computer science and other scientific disciplines, e.g., astrophysics StarGlobe Handling large data volumes efficiently Early filtering, parallelization, pipelining Returning first results early on Pipelining Flexible support of domain-specific application logic Mobile user-defined operators Results also applicable to other domains
17
Lehrstuhl Informatik III: Datenbanksysteme 17 Grid-based Data Stream Processing in e-Science The SED Scenario 1.Catalog query 2.Spatial (astrometric) matching 3.Assembly of raw photometry 4.Photometric transformation 5.SED classification
18
Lehrstuhl Informatik III: Datenbanksysteme 18 Grid-based Data Stream Processing in e-Science Peer Architecture Built on top of Open Grid Services Architecture (OGSA) Peers implemented as communicating grid services Availability of services according to capabilities of peers FluX query engine for subscription evaluation
19
Lehrstuhl Informatik III: Datenbanksysteme 19 Grid-based Data Stream Processing in e-Science StreamProcessor Input stream Output stream write(StreamIteratorEvent) StreamIteratorEvent Communication between StreamProcessor and StreamIterator StreamIterator StreamIteratorEvent open(Config,Writer) photon StreamIteratorEvent next(StreamIteratorEvent) photon StreamIteratorEvent next(StreamIteratorEvent) StreamIteratorEvent
20
Lehrstuhl Informatik III: Datenbanksysteme 20 Grid-based Data Stream Processing in e-Science Sequence Diagram
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.