 CMS data challenges. The nature of the problem.  What is GMA ?  And what is R-GMA ?  Performance test description  Performance test results  Conclusions.

Slides:



Advertisements
Similar presentations
21 Sep 2005LCG's R-GMA Applications R-GMA and LCG Steve Fisher & Antony Wilson.
Advertisements

WP2: Data Management Gavin McCance University of Glasgow.
IEEE NSS 2003 Performance of the Relational Grid Monitoring Architecture (R-GMA) CMS data challenges. The nature of the problem. What is GMA ? And what.
CMS Grid Batch Analysis Framework
Data Management Expert Panel. RLS Globus-EDG Replica Location Service u Joint Design in the form of the Giggle architecture u Reference Implementation.
McFarm: first attempt to build a practical, large scale distributed HEP computing cluster using Globus technology Anand Balasubramanian Karthik Gopalratnam.
June 22-23, 2005 Technology Infusion Team Committee1 High Performance Parallel Lucene search (for an OAI federation) K. Maly, and M. Zubair Department.
G O B E Y O N D C O N V E N T I O N WORF: Developing DB2 UDB based Web Services on a Websphere Application Server Kris Van Thillo, ABIS Training & Consulting.
Stuart K. PatersonCHEP 2006 (13 th –17 th February 2006) Mumbai, India 1 from DIRAC.Client.Dirac import * dirac = Dirac() job = Job() job.setApplication('DaVinci',
31 January 2007Craig E. Ward1 Large-Scale Simulation Experimentation and Analysis Database Programming Using Java.
Apache Airavata GSOC Knowledge and Expertise Computational Resources Scientific Instruments Algorithms and Models Archived Data and Metadata Advanced.
CMS Report – GridPP Collaboration Meeting VIII Peter Hobson, Brunel University22/9/2003 CMS Applications Progress towards GridPP milestones Data management.
JGMA: A Reference Implementation of the Grid Monitoring Architecture Mat Grove Distributed Systems Group University of Portsmouth
CMS Report – GridPP Collaboration Meeting VI Peter Hobson, Brunel University30/1/2003 CMS Status and Plans Progress towards GridPP milestones Workload.
WP3 RGMA Deployment Laurence Field / RAL Steve Fisher / RAL.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
Use of R-GMA in BOSS Henry Nebrensky (Brunel University) VRVS 26 April 2004 Some slides stolen from various talks at EDG 2 nd Review (
Introduction on R-GMA Shi Jingyan Computing Center IHEP.
Computer and Automation Research Institute Hungarian Academy of Sciences Presentation and Analysis of Grid Performance Data Norbert Podhorszki and Peter.
CDF Grid Status Stefan Stonjek 05-Jul th GridPP meeting / Durham.
Performance of the Relational Grid Monitoring Architecture (R-GMA) CMS data challenges. The nature of the problem. What is GMA ? And what is R-GMA ? Performance.
F.Fanzago – INFN Padova ; S.Lacaprara – LNL; D.Spiga – Universita’ Perugia M.Corvo - CERN; N.DeFilippis - Universita' Bari; A.Fanfani – Universita’ Bologna;
SZTAKI in DataGrid 2003 What to do this year. Topics ● Application monitoring (GRM) ● Analysis and Presentation (Pulse) ● Performance of R-GMA.
Real Time Monitor of Grid Job Executions Janusz Martyniak Imperial College London.
Grid Technologies  Slide text. What is Grid?  The World Wide Web provides seamless access to information that is stored in many millions of different.
Application code Registry 1 Alignment of R-GMA with developments in the Open Grid Services Architecture (OGSA) is advancing. The existing Servlets and.
Grid infrastructure analysis with a simple flow model Andrey Demichev, Alexander Kryukov, Lev Shamardin, Grigory Shpiz Scobeltsyn Institute of Nuclear.
Stuart Wakefield Imperial College London Evolution of BOSS, a tool for job submission and tracking W. Bacchi, G. Codispoti, C. Grandi, INFN Bologna D.
November SC06 Tampa F.Fanzago CRAB a user-friendly tool for CMS distributed analysis Federica Fanzago INFN-PADOVA for CRAB team.
CERN IT Department CH-1211 Genève 23 Switzerland t Monitoring: Tracking your tasks with Task Monitoring PAT eLearning – Module 11 Edward.
ALICE, ATLAS, CMS & LHCb joint workshop on
13 May 2004EB/TB Middleware meeting Use of R-GMA in BOSS for CMS Peter Hobson & Henry Nebrensky Brunel University, UK Some slides stolen from various talks.
Giuseppe Codispoti INFN - Bologna Egee User ForumMarch 2th BOSS: the CMS interface for job summission, monitoring and bookkeeping W. Bacchi, P.
Claudio Grandi INFN Bologna CHEP'03 Conference, San Diego March 27th 2003 BOSS: a tool for batch job monitoring and book-keeping Claudio Grandi (INFN Bologna)
WP3 Information and Monitoring Steve Fisher / RAL 23/9/2003.
Ch 14 QQ T F 1.A database table consists of fields and records. T F 2.Good data validation techniques can help improve data integrity. T F 3.An index is.
An information and monitoring system for static and dynamic information about grid resources, applications, networks … RDBMS Servlet aware of API during.
CLRC and the European DataGrid Middleware Information and Monitoring Services The current information service is built on the hierarchical database OpenLDAP.
What is SAM-Grid? Job Handling Data Handling Monitoring and Information.
EGEE is a project funded by the European Union under contract IST R-GMA: Production Services for Information and Monitoring in the Grid John.
Distributed database system
DATABASE CONNECTIVITY TO MYSQL. Introduction =>A real life application needs to manipulate data stored in a Database. =>A database is a collection of.
WP3 RGMA Deployment Laurence Field / RAL Steve Fisher / RAL.
Grid Deployment Enabling Grids for E-sciencE BDII 2171 LDAP 2172 LDAP 2173 LDAP 2170 Port Fwd Update DB & Modify DB 2170 Port.
Website: Answering Continuous Queries Using Views Over Data Streams Alasdair J G Gray Werner.
1 Service Creation, Advertisement and Discovery Including caCORE SDK and ISO21090 William Stephens Operations Manager caGrid Knowledge Center February.
The impact of R-GMA (upon WP1 and WP4). EDG (Paris) 6 Mar James MagowanImpact of R-GMA Grid Monitoring Architecture (GMA) We use it not only for.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Distributed Logging Facility Castor External Operation Workshop, CERN, November 14th 2006 Dennis Waldron CERN / IT.
Java Programming: Advanced Topics 1 Enterprise JavaBeans Chapter 14.
INFSO-RI Enabling Grids for E-sciencE Using of GANGA interface for Athena applications A. Zalite / PNPI.
EGEE is a project funded by the European Union under contract IST Information and Monitoring Services within a Grid R-GMA (Relational Grid.
The DCS Databases Peter Chochula. 31/05/2005Peter Chochula 2 Outline PVSS basics (boring topic but useful if one wants to understand the DCS data flow)
AMGA-Bookkeeping Carmine Cioffi Department of Physics, Oxford University UK Metadata Workshop Oxford, 05 July 2006.
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
CERN 21 January 2005Piotr Nyczyk, CERN1 R-GMA Basics and key concepts Monitoring framework for computing Grids – developed by EGEE-JRA1-UK, currently used.
In the Name Of Almighty Allah. Java Application Connection To Mysql Created by Hasibullah (Sahibzada) Kabul Computer Science Faculty Afghanistan.
CMS Production Management Software Julia Andreeva CERN CHEP conference 2004.
DGAS Distributed Grid Accounting System INFN Workshop /05/1009, Palau Giuseppe Patania Andrea Guarise 6/18/20161.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Outline Introduction and motivation, The architecture of Tycho,
Real Time Fake Analysis at PIC
Performance of the Relational Grid Monitoring Architecture (R-GMA)
BOSS: the CMS interface for job summission, monitoring and bookkeeping
BOSS: the CMS interface for job summission, monitoring and bookkeeping
BOSS: the CMS interface for job summission, monitoring and bookkeeping
Scalability Tests With CMS, Boss and R-GMA
RELATIONAL GRID MONITORING ARCHITECHTURE
Grid Systems: What do we need from web service standards?
Presentation transcript:

 CMS data challenges. The nature of the problem.  What is GMA ?  And what is R-GMA ?  Performance test description  Performance test results  Conclusions

 As part of the preparations for data taking CMS is performing DATA CHALLENGES.  Large number of simulated events to optimise detectors and prepare software  Enormous processing requirements BUT each event is independent of all the others each event can be generated on a machine without any interaction with any other

Work split between farms. How to handle the book-keeping ? a data-base automatically updated Implemented via a job wrapper BOSS Output to and is intercepted and the information is recorded in a mySQL production database. Event generation and job accounting decoupled

Database Machine Submission Machine UI Worker Node (WN) WN

Database Machine Submission Machine UI

Producer Consumer Registry (Directory services) register producer locate producer address of producer data Ask for data

Developed for E(uropean) D(ata) G(rid) Extends the GMA in two important ways 1.Introduces a time stamp on the data. 2.A relational implementation 3.Hides the registry behind the API Can be used for information and monitoring Each V irtual O rganisation appears to have one RDBMS

The user interface to R-GMA is via SQL statements (not all SQL statements and structures are supported) Information is advertised via a table create Information is published via insert Information is read via select … from table The first read request registers the consumer as interested in this data. Relational queries are supported NOTE : sql is the interface – it should not be supposed an actual database lies behind it.

R-GMA can be dropped into the framework with very little disruption 1. Set up calls for mySQL are replaced by those for R-GMA producers 2. An archiver (joint consumer/producer) runs on a single machine which collects the data from all the running jobs and writes it to a local database (and possible republishes it). The data can then be queried either by direct mySQL calls or via R-GMA consumer (a distributed database has been created)

Database BOSS LAN Connection R-GMA WAN Connection

 The architecture of GMA clearly provides a putative solution to the wide area monitoring problem. BUT Does a specific implementation provide a practical solution Before entrusting CMS production to R-GMA, we must be confident that it will perform. What load will it fail at and why ?

35 chars.

Multi-threaded job each thread produces messages. Length 35 chars, suitable distribution. Threads starting time distribution can be altered. One machine delivers the R-GMA load of a farm. R-GMA servlet R-GMA consumer

One machine per grid cluster providing loads of greater than the cluster R-GMA consumer R-GMA servlet R-GMA servlet R-GMA servlet R-GMA servlet

R-GMA can survive loads of around 20% of the current CMS requirements and does provides a grid method for monitoring. An overload of a factor 2 jobs causes problems after about five minutes running. We believe these instabilities are soluble. When production starts in earnest we will compare reality with our model.