Grid Data Management
Tony Doyle, University of Glasgow


Outline
- Introduction
- Physics Analysis
- Data Hierarchy
- GRID Services
- Virtual Data Scenario
- GRID Data Management
- Service Graph
- Development Tools: Unified Modelling Language
- Compiler Efficiency
- Database Access Benchmark
- System Monitoring Prototype

From Files to Objects: PEvent, PEventObjVector, PEventObj, PSiDetector, PSiDigit, PMDT_Detector, PMDT_Digit, PCaloRegion, PCaloDigit, PTruthVertex, PTruthTrack

Physics Analysis
- ESD: Data or Monte Carlo
- Event Tags → Event Selection
- Analysis Object Data (AOD) + Calibration Data → Analysis, Skims → Physics Objects → Physics Analysis
- Raw Data: Tier 0/1 (collaboration-wide); Tier 2 (analysis groups); Tiers 3/4 (physicists)
- Increasing data flow at each step down the hierarchy

Data Hierarchy: RAW, ESD, AOD, TAG
- RAW (~2 MB/event): recorded by DAQ; triggered events; detector digitisation.
- ESD (~100 kB/event): reconstructed, pseudo-physical information: clusters, track candidates (electrons, muons), etc.
- AOD (~10 kB/event): selected physical information: transverse momentum, association of particles, jets, (best) identification of particles; physical information for the relevant objects.
- TAG (~1 kB/event): analysis information relevant for fast event selection.
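The per-event sizes above make back-of-the-envelope storage estimates easy. The sketch below uses the tier sizes from the slide; the one-million-event sample size is an illustrative assumption, not a figure from the talk.

```python
# Per-event sizes of the four data tiers, as given on the slide.
TIER_BYTES_PER_EVENT = {
    "RAW": 2_000_000,   # ~2 MB/event
    "ESD": 100_000,     # ~100 kB/event
    "AOD": 10_000,      # ~10 kB/event
    "TAG": 1_000,       # ~1 kB/event
}

def dataset_size_gb(tier, n_events):
    """Approximate storage needed for n_events at a given tier, in GB."""
    return TIER_BYTES_PER_EVENT[tier] * n_events / 1e9

for tier in TIER_BYTES_PER_EVENT:
    print(f"{tier}: {dataset_size_gb(tier, 1_000_000):.1f} GB per million events")
```

The three-orders-of-magnitude spread between RAW and TAG is what makes the tiered model work: fast selection runs over TAGs, and only selected events touch the larger tiers.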

GRID Services
- Grid Services: resource discovery, scheduling, security, monitoring, data access, policy.
- Athena/Gaudi Services: application manager, job options service, event persistency service, detector persistency, histogram service, user interfaces, visualisation.
- Database: event model, object federations.
Extensible interfaces and protocols being specified and developed (DataGRID Toolkit):
- Tools: 1. UML  2. Java
- Protocols: 1. XML  2. MySQL  3. LDAP

Virtual Data Scenario
Example analysis scenario: a physicist issues a query from Athena for a Monte Carlo dataset.
Issues: How expressive is this query? What is its nature (declarative)? Creating new queries and language. Algorithms are already available in local shared libraries.
An Athena service consults an ATLAS Virtual Data Catalog. Consider the possibilities:
- TAG file exists on the local machine (e.g. Glasgow): analyse it.
- ESD file exists in a remote store (e.g. Edinburgh): access the relevant event files, then analyse them.
- RAW file no longer exists (e.g. RAL): regenerate, re-reconstruct, re-analyse!
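The scenario is essentially a cost-ordered fallback chain. The sketch below illustrates that decision order only; the catalogue dictionary, dataset names and site names are hypothetical stand-ins for the ATLAS Virtual Data Catalog, not its real interface.

```python
# Sketch of the fallback logic in the virtual-data scenario:
# local TAG first, then remote ESD, then full regeneration.
def resolve_dataset(name, catalogue):
    """Return the cheapest available action for a dataset, per the scenario."""
    entry = catalogue.get(name, {})
    if "tag_local" in entry:
        return f"analyse local TAG at {entry['tag_local']}"
    if "esd_remote" in entry:
        return f"fetch event files from {entry['esd_remote']}, then analyse"
    # RAW no longer exists anywhere: the most expensive path.
    return "regenerate, re-reconstruct, re-analyse"

catalogue = {
    "mc_sample_A": {"tag_local": "Glasgow"},
    "mc_sample_B": {"esd_remote": "Edinburgh"},
}
print(resolve_dataset("mc_sample_A", catalogue))  # analyse local TAG at Glasgow
print(resolve_dataset("mc_sample_C", catalogue))  # regenerate, re-reconstruct, re-analyse
```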

GRID Data Management
Goal: develop middleware infrastructure to manage petabyte-scale data. Service levels are reasonably well defined; the next step is to identify key areas within the software structure (UK).

Five areas for development
- Data Accessor: hides specific storage-system requirements (Mass Storage Management group).
- Replication: improves access by wide-area caching; the Globus toolkit offers sockets and a communication library, Nexus.
- Meta Data Management: data catalogues, monitoring information (e.g. access patterns), grid configuration information, policies; MySQL over the Lightweight Directory Access Protocol (LDAP) is being investigated.
- Security: ensuring consistent levels of security for data and metadata.
- Query Optimisation: cost minimisation based on response time and throughput (Monitoring Services group).
Identifiable UK contributions (RAL). Identifying key areas.

Four tasks defined in the current UK WP2
- Service Discovery: locate grid services (Wolfgang Hoschek, Gavin McCance + ...).
- SQL Database Service: store, query and retrieve metadata (Wolfgang Hoschek, Gavin McCance + ...).
- Query Optimisation: cost model (Kurt Stockinger + ...).
- Data Mining: semi-automatic discovery of event patterns, associations and anomalies; grid metadata and HEP applications.
UK + CERN = UK++

Service Graph
Nodes (hierarchical model): sds.cern.ch, sds.anl.gov, sds.infn.it, sds.ral.uk, sds.padova-infn.it, sds.trieste-infn.it, sds.bologna-infn.it.
Optimisation? Combine all information on nodes from e.g. ScotGRID locally and advertise it via Globus. All nodes Grid-aware: is this allowed?

Unified Modelling Language
Standard method to define the architecture: UML. Standard tool: TogetherSoft? Free for academic use; runs under Linux.
Experience so far: "I tried to generate an import/export module for MySQL under Linux by copying the db2.config file and replacing the various column types with the ones available in MySQL. This works apart from the fact that the primary-key generation fails and a schema is generated (which MySQL doesn't support). The Access97 type of primary-key generation is fine for MySQL. I have seen that Access uses a specialised DB import/export class. How can I generate one for MySQL? A DB driver for MySQL under Linux?"
Determine the correct tools by testing.

Compiler Efficiency
Numerically intensive simulations: minimal input and output data.
ATLAS Monte Carlo (gg → H → bb): 228 sec / 3.5 MB per event on an 800 MHz Linux box.
Compiler tests (LINPACK):

Compiler        Speed (MFlops)
Fortran (g77)   27
C (gcc)         43
Java (jdk)      41

Industry-standard compilers + OO methods.
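The table compares compilers on LINPACK-style floating-point throughput. The sketch below is a toy MFLOPS estimate in the same spirit, not the LINPACK benchmark itself; the loop size is arbitrary and the absolute number depends entirely on the host machine (and interpreted Python will be far slower than any of the compiled figures above).

```python
import time

def estimate_mflops(n=200_000):
    """Time n multiply-add pairs and report millions of flops per second."""
    t0 = time.perf_counter()
    acc = 0.0
    x = 1.000001
    for _ in range(n):
        acc = acc * x + 1.0   # one multiply + one add per iteration
    elapsed = time.perf_counter() - t0
    flops = 2 * n             # each multiply-add counted as two operations
    return flops / elapsed / 1e6

print(f"~{estimate_mflops():.0f} MFlops (interpreted Python)")
```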

System Monitoring Prototype
Tools:
1. Linux kernel info: /proc/stat
2. Enquire: Java client-server
3. Histograms: Java Analysis Studio
4. TCP/IP: local WAN
Instantaneous CPU usage. Scalable architecture. Individual node information.

Industrial Partnership
ping service / ping monitor, across WAN and LAN.
Adoption of OPEN industry standards + OO methods (industry, research council).
Monitoring tools exist. A standard?

System Monitoring Prototype
Input from /proc/stat: instantaneous CPU, disk and memory information. Individual node information is input to a single Grid node.
Fields: user, nice, system, idle (cpu); disk, disk_rio, disk_wio, disk_rblk, disk_wblk; page, swap, intr, ctxt, btime, processes.
Combined information goes into e.g. a distributed MySQL database.
Why start here? A well-understood, simple system is needed to start tests and calibrate commercially available solutions.
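The first four cpu counters (user, nice, system, idle) are cumulative jiffy counts, so instantaneous CPU usage comes from differencing two samples. A minimal sketch of that calculation follows; the two sample lines are hard-coded illustrations rather than live readings, so the example runs anywhere.

```python
# Sketch: instantaneous CPU usage from two /proc/stat "cpu" samples,
# as in the monitoring prototype described above.

def parse_cpu_line(line):
    """Return (user, nice, system, idle) jiffy counters from a /proc/stat cpu line."""
    fields = line.split()
    return tuple(int(v) for v in fields[1:5])

def cpu_usage(sample_a, sample_b):
    """Fraction of non-idle time between two cumulative samples."""
    a = parse_cpu_line(sample_a)
    b = parse_cpu_line(sample_b)
    user, nice, system, idle = (y - x for x, y in zip(a, b))
    busy = user + nice + system
    total = busy + idle
    return busy / total if total else 0.0

usage = cpu_usage("cpu 100 0 50 850", "cpu 160 0 90 950")
print(round(usage, 2))  # 0.5: half the jiffies since the last sample were busy
```

On a live node, each sample would come from `open("/proc/stat").readline()`; the daemon would push the derived usage figures into the central database described above.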

Database Access Benchmark
e.g. MySQL database daemon: basic 'crash-me' and associated tests; access times for basic insert, modify, delete and update database operations.
e.g. on a 256 MB, 800 MHz Red Hat 6.2 Linux box:
- 350k data insert operations: 149 seconds
- 10k query operations: 97 seconds
Many applications require database functionality. Currently favoured HEP database application (e.g. BaBar, ZEUS software).
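The shape of such a test is easy to reproduce. The sketch below times bulk inserts and point queries, using Python's built-in sqlite3 as a self-contained stand-in for a MySQL daemon; the operation counts are scaled down from the slide's 350k/10k figures, and the table schema is an illustrative assumption.

```python
import sqlite3
import time

def benchmark(n_inserts=3500, n_queries=100):
    """Time basic insert and query operations against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, tag TEXT)")

    t0 = time.perf_counter()
    for i in range(n_inserts):
        cur.execute("INSERT INTO events (tag) VALUES (?)", (f"event-{i}",))
    conn.commit()
    insert_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    for i in range(n_queries):
        cur.execute("SELECT id FROM events WHERE tag = ?", (f"event-{i}",))
        cur.fetchone()
    query_time = time.perf_counter() - t0

    conn.close()
    return insert_time, query_time

ins, qry = benchmark()
print(f"{ins:.3f}s for inserts, {qry:.3f}s for queries")
```

A real calibration run would point the same loop at the MySQL daemon over its network protocol, which is where the slide's wall-clock figures come from.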

WP2 - Open Issues
- Many... early days.
- Working standards?
- Scope of UK contribution: Service Discovery, SQL Database Service, Query Optimisation, Data Mining.
- Development tools? UML (TogetherSoft); database (MySQL, GDMP); system monitoring standard; grid-enabled files/objects.
- Input/contributions welcome.

From Files to Objects: GridEvent, ObjVector, GridEventObj, GridNetwork, NetworkDigit, GridCPU, Digit, GridDisk, GridMemory. Teamwork.