Ruslan Fomkin and Tore Risch Uppsala DataBase Laboratory

Presentation transcript:

Framework for Querying Distributed Objects Managed by a Grid Infrastructure
Ruslan Fomkin and Tore Risch
Uppsala DataBase Laboratory, Uppsala University, Sweden

Outline
- Introduction
  - our project
  - test application
  - Grid infrastructure
- The Framework
- Status
- Related work
- Ongoing and future work
VLDB DMG '05 02.08.2005 Ruslan Fomkin and Tore Risch, UDBL

Parallel Object Query System for Expensive Computations (POQSEC)
- Flexible, scalable, and efficient parallel distributed query processor for scientific analyses over scientific data
- Scientific data
  - complex structure
  - in files distributed in Grids
- Scientific analyses
  - can be represented as declarative queries
  - include numerical computations
  - batch or long running queries
- Utilization of external resources of the Grid

Software layers
- POQSEC provides
  - scientific query management
- Grid provides
  - computation management
  - file management
  - NorduGrid Middleware
- Application area provides
  - computational libraries
  - data management libraries
  - ROOT library
Diagram labels: POQSEC, ROOT, NorduGrid, Data, Clusters

High Energy Physics application
- Analysis of collision events for presence of Higgs bosons
- Data
  - produced by ATLAS simulation software (CERN)
  - stored in files distributed in Grid
  - managed by ROOT library (CERN)
- Analysis is selection of those events that satisfy predicates containing numerical operations

Query representation of the analysis
- General query
    SELECT ev
    FROM Event ev
    WHERE jetvetocut(ev) AND zvetocut(ev) AND topcut(ev) AND
          misseecuts(ev) AND leptoncuts(ev) AND threeleptoncut(ev);
- Example of predicate (cut)
    CREATE FUNCTION zvetocut(Event ev) -> Event AS
    WHERE NOTANY(oppositeLeptons(ev)) OR
          abs(invMass(oppositeLeptons(ev)) - zMass) >= minZMass;
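The selection above is a conjunction of cut predicates applied to each event. A minimal Python sketch of that idea follows. It is not the POQSEC implementation: the event layout (a dict with a `pair_masses` list), the value of `MIN_Z_MASS`, and the function names are assumptions made for illustration. The slide does not say how the comparison applies when an event has several opposite-lepton pairs; the sketch assumes every pair must lie outside the mass window.

```python
Z_MASS = 91.19      # GeV, approximate Z boson mass
MIN_Z_MASS = 10.0   # GeV, hypothetical threshold (not given on the slide)


def z_veto_cut(pair_masses):
    """Return True if the event passes the Z veto.

    pair_masses: invariant masses (GeV) of the opposite-lepton pairs
    of one event. An event with no such pairs passes (the NOTANY branch).
    Assumption: with several pairs, all of them must be far from Z_MASS.
    """
    if not pair_masses:
        return True
    return all(abs(m - Z_MASS) >= MIN_Z_MASS for m in pair_masses)


def select_events(events, cuts):
    """Keep the events that satisfy every cut (the AND chain in the query)."""
    return [ev for ev in events if all(cut(ev) for cut in cuts)]


# Hypothetical events: only the fields needed by this one cut.
events = [
    {"id": 1, "pair_masses": []},             # no pairs: passes
    {"id": 2, "pair_masses": [90.0]},         # near the Z mass: vetoed
    {"id": 3, "pair_masses": [120.0, 60.0]},  # all pairs far away: passes
]
cuts = [lambda ev: z_veto_cut(ev["pair_masses"])]
selected_ids = [ev["id"] for ev in select_events(events, cuts)]
```

Further cuts such as `jetvetocut` or `topcut` would be appended to `cuts` in the same way; their definitions are not given in the slides.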

NorduGrid Advanced Resource Connector (NG)
- Middleware between user and computational resources
- Computational resources
  - owners retain full control
  - managed by Local Batch System
  - different policies
  - NG is yet another user
- Accessing resources through NG is limited
  - difficult to predict allocation of resources
  - job specification
  - transferring data through files

Related work
- The Distributed Query Processing system Polar* and OGSA-DQP (UK)
  - integrated part of a Grid infrastructure
  - resources preallocated by a user
- STORM (Ohio)
  - distributed query processor over flat files

Related work (continued)
- SDSS Batch Query System, CasJobs (Microsoft Research)
  - batch database system supporting scientific queries
  - cluster of SQL servers where data are stored
- ATLAS Distributed Analysis (US ATLAS)
  - high-level interface for ATLAS and LHC scientists
  - analyses as snippets of programming code

The Framework
- Basic tool for utilizing Grid
- Submission mechanism
  - submit scientific query
  - parallelize query to several jobs
  - generate job scripts
- Babysitter
  - submit jobs to Grid
  - monitor execution
  - download result
- Exchange mechanism
  - deliver result objects through files

Client and coordinator part
Diagram labels: Grid Client Node, Query Coordinator, Coordinator server, Job queue, POQSEC Client, Grid Meta-Database, Submission Database, Babysitter, Local Storage, NG Client
- POQSEC client
  - personal database with application schema
  - ROOT wrapper
- Query coordinator manages executions of queries
  - Coordinator server
    - receives queries
    - creates jobs
  - Grid Meta-Database
    - computational resources
    - data files
  - Submission Database
    - received submissions
    - created jobs
  - Babysitter
    - interactions with NG

Query submission
Diagram: client and coordinator components, with steps 1 and 2 marked
- User submits
  - query
  - file name selection
  - number of jobs to parallelize
  - CPU time for single job
- Coordinator server creates jobs
  - partitioning data between jobs
  - xRSL scripts
  - subquery scripts for execution
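The job-creation step can be pictured as splitting the selected input files among the requested number of jobs. The sketch below is one simple way to do that in Python, under stated assumptions: the slides do not say which partitioning scheme POQSEC uses, so round-robin is a stand-in, and the function names, the `poqsec-N` job names, and the dict layout are hypothetical. Each dict stands in for the information that would go into an xRSL script and a subquery script; no xRSL text is generated here.

```python
def partition_files(files, n_jobs):
    """Split input files round-robin into at most n_jobs non-empty groups."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    groups = [files[i::n_jobs] for i in range(n_jobs)]
    # Drop empty groups when there are fewer files than requested jobs.
    return [group for group in groups if group]


def make_job_specs(files, n_jobs, cpu_minutes):
    """Build one job description per file group.

    cpu_minutes is the CPU time requested for a single job, as supplied
    by the user at submission time.
    """
    return [
        {"job_name": f"poqsec-{i}", "input_files": group, "cpu_minutes": cpu_minutes}
        for i, group in enumerate(partition_files(files, n_jobs))
    ]


# Hypothetical file names; real names come from the user's file selection.
specs = make_job_specs(["a.root", "b.root", "c.root", "d.root", "e.root"], 2, 60)
```

Round-robin keeps the groups within one file of each other in size, which only balances the load if files hold similar numbers of events.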

Query submission (continued)
Diagram: client and coordinator components plus two CEs, each with an NG Grid Manager; steps 3 and 4 marked
- Babysitter submits jobs to NG Client
- NG Client finds Computing Element (CE) and submits each job

Query execution
Diagram labels: CE, Storage, SE, NG Grid Manager, Executor, wrapper, CE node; step markers 5 and 9
Legend: SE = storage element, CE = computing element
- NG Grid Manager downloads files
- Submits job to Local Batch System (LBS)
- LBS allocates CE nodes for each job according to its policies and current CE load
- LBS starts executors (not synchronized)
- Executors process data and save results
- Executor (one per job)
  - evaluates subquery
  - application schema
  - ROOT wrapper

Query result
Diagram: client and coordinator components and CE nodes, with steps 10, 11 and 12 marked
- Babysitter polls NG for the status of jobs and updates the status in the Submission DB
- When a job is finished, it requests NG to download the result
- User can retrieve the result when all jobs are ready
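The babysitter behaviour described above amounts to a polling loop. The following Python sketch shows the shape of such a loop and nothing more. The `get_status` and `download` callables, the `"FINISHED"` status string, and the dict used as a Submission DB are all assumptions for illustration; they are not NorduGrid client calls, and failed jobs are not handled.

```python
import time


def babysit(job_ids, get_status, download, submission_db,
            poll_seconds=0.0, max_polls=100):
    """Poll job statuses until every job has finished or max_polls is reached.

    get_status(job_id) -> str and download(job_id) are supplied by the
    caller (hypothetical stand-ins for interactions with NG).
    Returns True when all jobs finished and their results were downloaded.
    """
    pending = set(job_ids)
    for _ in range(max_polls):
        for job_id in sorted(pending):   # sorted() copies, so discard is safe
            status = get_status(job_id)
            submission_db[job_id] = status   # record the latest status
            if status == "FINISHED":
                download(job_id)
                pending.discard(job_id)
        if not pending:
            return True                  # the user can now retrieve the result
        time.sleep(poll_seconds)
    return False


# Usage with scripted statuses in place of a real Grid.
scripted = {"j1": iter(["RUNNING", "FINISHED"]), "j2": iter(["FINISHED"])}
downloaded = []
db = {}
all_done = babysit(["j1", "j2"], lambda j: next(scripted[j]), downloaded.append, db)
```

Passing the status and download functions in as arguments keeps the loop testable without Grid access.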

Summary
- We provide
  - declarative query interface for representing scientific queries
  - parallel query execution in Grid (generating scripts)
  - babysitter to keep track of job execution
  - result delivery through files
- Importance of parallelization
  - preliminary results show significant improvements

  Configuration         Execution time
  Standalone desktop    3 hours 10 minutes
  Grid, one job         3 hours 45 minutes
  Grid, four jobs       24 minutes

Ongoing and future work
- Estimating the execution time of a query
  - probing on small samples
- Dealing with underestimation of execution time
- Automatic parallelization of queries and resource brokering
  - adaptive, based on current load and job statistics
- Dealing with failures in Grid
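Estimating execution time by probing on a small sample can be illustrated with a linear extrapolation. The Python sketch below is only one possible reading of that item: the slides do not describe the estimation method, so the linear-cost model, the `safety_factor` margin against underestimation, and the function name are all assumptions.

```python
def extrapolate_seconds(sample_seconds, sample_size, total_events, safety_factor=1.0):
    """Scale the time measured on a sample up to the full data set.

    Assumes cost grows linearly with the number of events. safety_factor
    (>= 1.0) inflates the estimate to reduce the risk of requesting too
    little CPU time for a job.
    """
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if safety_factor < 1.0:
        raise ValueError("safety_factor must be at least 1.0")
    return sample_seconds * total_events / sample_size * safety_factor


# Hypothetical numbers: 100 sampled events took 2 seconds.
plain = extrapolate_seconds(2.0, 100, 10_000)
padded = extrapolate_seconds(2.0, 100, 10_000, safety_factor=1.5)
```

A linear model ignores fixed costs such as file download and job start-up, so a real estimate would likely need an additional constant term.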

Thank you! Your questions?