1
Framework for Querying Distributed Objects Managed by a Grid Infrastructure
Ruslan Fomkin and Tore Risch
Uppsala DataBase Laboratory
Uppsala University, Sweden
2
Outline
- Introduction
  - our project
  - test application
  - Grid infrastructure
- The Framework
- Status
- Related work
- Ongoing and future work
VLDB DMG ' Ruslan Fomkin and Tore Risch, UDBL
3
Parallel Object Query System for Expensive Computations (POQSEC)
Flexible, scalable, and efficient parallel distributed query processor for scientific analyses over scientific data
- Scientific data: complex structure, stored in files distributed in Grids
- Scientific analyses
  - can be represented as declarative queries
  - include numerical computations
  - run as batch or long-running queries
- Utilization of external resources of the Grid
4
Software layers
- POQSEC provides
  - scientific query management
- Grid provides
  - computation management
  - file management
  - NorduGrid Middleware
- Application area provides
  - computational libraries
  - data management libraries
  - ROOT library
[Diagram layers: POQSEC, ROOT, NorduGrid, Data, Clusters]
5
High Energy Physics application
Analysis of collision events for presence of Higgs bosons
- Data
  - produced by ATLAS simulation software (CERN)
  - stored in files distributed in the Grid
  - managed by the ROOT library (CERN)
- Analysis is the selection of those events that satisfy predicates containing numerical operations
6
Query representation of the analysis
General query
    SELECT ev
    FROM Event ev
    WHERE jetvetocut(ev) AND
          zvetocut(ev) AND
          topcut(ev) AND
          misseecuts(ev) AND
          leptoncuts(ev) AND
          threeleptoncut(ev);
Example of predicate (cut)
    CREATE FUNCTION zvetocut(Event ev) -> Event AS
    WHERE NOTANY(oppositeLeptons(ev)) OR
          abs(invMass(oppositeLeptons(ev)) - zMass) >= minZMass;
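The cut above can be read as an ordinary boolean predicate over one event. The following Python sketch is an illustration only, not POQSEC code: the `Lepton` class, the pairing by opposite charge, the value of `MIN_Z_MASS`, and the choice to require the mass condition for every pair are all assumptions made for the example. Only the Z boson mass is a known physical constant.

```python
import math
from dataclasses import dataclass
from itertools import combinations

Z_MASS = 91.1876    # GeV, nominal Z boson mass
MIN_Z_MASS = 10.0   # GeV, hypothetical stand-in for minZMass


@dataclass
class Lepton:
    """Hypothetical lepton record: charge and four-momentum in GeV."""
    charge: int
    e: float
    px: float
    py: float
    pz: float


def opposite_leptons(leptons):
    """All pairs of leptons with opposite charge (assumed meaning of oppositeLeptons)."""
    return [(a, b) for a, b in combinations(leptons, 2) if a.charge == -b.charge]


def inv_mass(a, b):
    """Invariant mass of a lepton pair from the summed four-momenta."""
    e = a.e + b.e
    px, py, pz = a.px + b.px, a.py + b.py, a.pz + b.pz
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def z_veto_cut(leptons):
    """True if there is no opposite pair, or every pair lies away from the Z mass."""
    pairs = opposite_leptons(leptons)
    return all(abs(inv_mass(a, b) - Z_MASS) >= MIN_Z_MASS for a, b in pairs)


def select_events(events, cuts):
    """Conjunction of cuts, mirroring the WHERE clause of the general query."""
    return [ev for ev in events if all(cut(ev) for cut in cuts)]
```

An event with no opposite-charge pair passes the cut because `all` over an empty list is true, which matches the `NOTANY` branch of the predicate.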
7
NorduGrid Advanced Resource Connector (NG)
Middleware between user and computational resources
- Computational resource owners retain full control
  - resources managed by Local Batch System
  - different policies
  - NG is yet another user
- Accessing resources through NG is limited
  - difficult to predict allocation of resources
  - job specification
  - transferring data through files
8
Related Work
- The Distributed Query Processing systems Polar* and OGSA-DQP (UK)
  - integrated part of a Grid infrastructure
  - resources preallocated by a user
- STORM (Ohio)
  - distributed query processor over flat files
9
Related work
- SDSS Batch Query System, CasJobs (Microsoft Research)
  - batch database system supporting scientific queries
  - cluster of SQL servers where data are stored
- ATLAS Distributed Analysis (US ATLAS)
  - high-level interface for ATLAS and LHC scientists
  - analyses as snippets of programming code
10
The Framework
Basic tool for utilizing the Grid
- Submission mechanism
  - submit scientific query
  - parallelize query into several jobs
  - generate job scripts
- Babysitter
  - submit jobs to Grid
  - monitor execution
  - download result
- Exchange mechanism
  - deliver result objects through files
11
Client and coordinator part
[Diagram components: Grid Client Node, Query Coordinator, Coordinator server, Job queue, POQSEC Client, Grid Meta-Database, Submission Database, Babysitter, Local Storage, NG Client]
- POQSEC client
  - personal database with application schema
  - ROOT wrapper
- Query coordinator manages executions of queries
  - Coordinator server
    - receives queries
    - creates jobs
  - Grid Meta-Database
    - computational resources
    - data files
  - Submission Database
    - received submissions
    - created jobs
  - Babysitter
    - interactions with NG
12
Query submission
[Diagram: client and coordinator components as on the previous slide, with steps 1 and 2 marked]
- User submits
  - query
  - file name selection
  - number of jobs to parallelize
  - CPU time for single job
- Coordinator server creates jobs
  - partitioning data between jobs
  - xRSL scripts
  - subquery scripts for execution
13
Query submission
[Diagram: client and coordinator components as before, with steps 3 and 4 marked; two Computing Elements (CE), each with an NG Grid Manager]
- Babysitter submits jobs to NG Client
- NG Client finds a Computing Element (CE) and submits each job
14
Query execution
- NG Grid Manager downloads files
- Submits job to Local Batch System (LBS)
- LBS allocates CE nodes for each job according to its policies and current CE load
- LBS starts executors (not synchronized)
- Executors process data and save results
SE: storage element
CE: computing element
- Executor (one per job)
  - evaluates subquery
  - application schema
  - ROOT wrapper
[Diagram: two CEs, each with Storage, an NG Grid Manager, and CE nodes running an Executor with wrapper; two SEs; steps 5 and 9 marked]
15
Query result
[Diagram: client and coordinator components, CEs with Storage, NG Grid Manager and executors on CE nodes; steps 10, 11 and 12 marked]
- Babysitter polls NG for the status of jobs and updates the status in the Submission DB
- When a job is finished, it requests NG to download the result
- User can retrieve the result when all jobs are ready
16
Summary
- We provide
  - declarative query interface for representing scientific queries
  - parallel query execution in Grid (generating scripts)
  - babysitter to keep track of job execution
  - result delivery through files
- Importance of parallelization
  - preliminary results show significant improvements

Standalone desktop    Grid, one job         Grid, four jobs
3 hours 10 minutes    3 hours 45 minutes    24 minutes
17
Ongoing and future work
- Estimating the execution time of a query
  - probing on small samples
- Dealing with underestimation of execution time
- Automatic query parallelization and resource brokering
  - adaptive, based on current load and job statistics
- Dealing with failures in the Grid
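One way to read "probing on small samples" is to time the query on a few events and extrapolate linearly to the full data set, padded by a margin against underestimation. The sketch below shows that reading only; it is not the authors' planned method. The linear model, the `safety_factor` default, and the function name are assumptions.

```python
import time


def estimate_cpu_seconds(run_subquery, sample_events, total_events,
                         safety_factor=1.5, timer=time.perf_counter):
    """Extrapolate total run time from a timed probe over a small sample."""
    if not sample_events:
        raise ValueError("sample must not be empty")
    start = timer()
    for event in sample_events:
        run_subquery(event)
    elapsed = timer() - start
    per_event = elapsed / len(sample_events)
    # The margin above the linear extrapolation guards against underestimation.
    return per_event * total_events * safety_factor
```

The result could be used as the CPU time requested for a job, with the safety factor tuned from statistics of completed jobs.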
18
Thank you! Your questions?