
UK Tony Doyle - University of Glasgow: Grid Data Management


1 UK Tony Doyle - University of Glasgow
Grid Data Management

Outline:
Introduction
  Physics Analysis
  Data Hierarchy
  GRID Services
  Virtual Data Scenario
GRID Data Management
Service Graph
Development Tools
  Unified Modelling Language
  Compiler Efficiency
  Database Access Benchmark
  System Monitoring Prototype

From Files To Objects: PEvent, PEventObjVector, PEventObj, PSiDetector, PSiDigit, PMDT_Detector, PMDT_Digit, PCaloRegion, PCaloDigit, PTruthVertex, PTruthTrack

2 Physics Analysis
[Diagram: data flow through the computing tiers, with an arrow labelled "INCREASING DATA FLOW"]
Tiers:
  Tier 0, 1: Collaboration wide
  Tier 2: Analysis Groups
  Tier 3, 4: Physicists
Diagram labels: Raw Data; Calibration Data; ESD: Data or Monte Carlo; Event Tags; Event Selection; Analysis, Skims; Analysis Object Data (AOD); Physics Analysis; Physics Objects

3 Data Hierarchy
RAW, ESD, AOD, TAG

Level  Size            Content
RAW    ~2 MB/event     Recorded by DAQ. Triggered events. Detector digitisation.
ESD    ~100 kB/event   Reconstructed information. Pseudo-physical information: clusters, track candidates (electrons, muons), etc.
AOD    ~10 kB/event    Selected information. Physical information: transverse momentum, association of particles, jets, (best) id of particles. Physical info for relevant objects.
TAG    ~1 kB/event     Analysis information. Relevant information for fast event selection.
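The per-event sizes above translate directly into dataset volumes. The following is a minimal arithmetic sketch, not part of the original slides: the sizes come from the slide (decimal units assumed), while the event count of 10^9 is a purely hypothetical figure chosen for illustration.

```python
# Per-event sizes quoted on the slide, in bytes (decimal units assumed).
EVENT_SIZE_BYTES = {
    "RAW": 2_000_000,  # ~2 MB/event
    "ESD": 100_000,    # ~100 kB/event
    "AOD": 10_000,     # ~10 kB/event
    "TAG": 1_000,      # ~1 kB/event
}


def dataset_size_bytes(n_events: int) -> dict:
    """Total bytes needed at each level of the hierarchy for n_events."""
    return {level: size * n_events for level, size in EVENT_SIZE_BYTES.items()}


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(n_bytes)
    for unit in units[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {units[-1]}"


if __name__ == "__main__":
    n_events = 10**9  # hypothetical event count, not from the slides
    for level, total in dataset_size_bytes(n_events).items():
        print(f"{level}: {human(total)}")
```

Under that assumed event count, RAW comes to 2.0 PB and TAG to 1.0 TB, a factor of 2000 between the top and bottom of the hierarchy.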

4 GRID Services
Grid Services: Resource Discovery; Scheduling; Security; Monitoring; Data Access; Policy
Athena/Gaudi Services: Application manager; Job Options service; Event persistency service; Detector persistency; Histogram service; User interfaces; Visualization
Database: Event model; Object federations
Extensible interfaces and protocols being specified and developed:
  Tools: 1. UML 2. Java
  Protocols: 1. XML 2. MySQL 3. LDAP
  (grouped by a brace on the slide as: DataGRID Toolkit)

5 Virtual Data Scenario
Example analysis scenario:
Physicist issues a query from Athena for a Monte Carlo dataset.
Issues:
  How expressive is this query?
  What is the nature of the query: declarative
  Creating new queries and language
Algorithms are already available in local shared libraries.
An Athena service consults an ATLAS Virtual Data Catalog.
Consider possibilities:
  TAG file exists on local machine (e.g. Glasgow): analyze it.
  ESD file exists in a remote store (e.g. Edinburgh): access relevant event files, then analyze that.
  RAW file no longer exists (e.g. RAL): regenerate, re-reconstruct, re-analyze!
GRID Data Management
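The three possibilities on this slide amount to a simple decision procedure over catalog entries. The sketch below illustrates that logic only; it is not the Athena or ATLAS Virtual Data Catalog interface, and `CatalogEntry`, `Location` and `plan_analysis` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Location(Enum):
    LOCAL = "local"
    REMOTE = "remote"
    MISSING = "missing"


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: which data level lives where."""
    level: str          # "TAG", "ESD" or "RAW"
    location: Location
    site: str           # e.g. "Glasgow", "Edinburgh", "RAL"


def plan_analysis(entries: List[CatalogEntry]) -> List[str]:
    """Pick the cheapest option, following the slide's three cases."""
    by_level = {entry.level: entry for entry in entries}

    # Case 1: a TAG file is already on the local machine.
    tag = by_level.get("TAG")
    if tag is not None and tag.location is Location.LOCAL:
        return [f"analyze TAG at {tag.site}"]

    # Case 2: an ESD file exists in a remote store.
    esd = by_level.get("ESD")
    if esd is not None and esd.location is Location.REMOTE:
        return [f"access relevant event files at {esd.site}", "analyze"]

    # Case 3: nothing usable exists any more, so rebuild from scratch.
    return ["regenerate", "re-reconstruct", "re-analyze"]


if __name__ == "__main__":
    catalog = [
        CatalogEntry("ESD", Location.REMOTE, "Edinburgh"),
        CatalogEntry("RAW", Location.MISSING, "RAL"),
    ]
    print(plan_analysis(catalog))
```

The ordering encodes the idea that each later case is more expensive than the one before it, which is why the function returns as soon as a cheaper option is available.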

6 GRID Data Management
Goal: develop middle-ware infrastructure to manage petabyte-scale data.
Service levels reasonably well defined.
Identify Key Areas Within Software Structure (UK)

7 Five areas for development
Data Accessor - hides specific storage system requirements. Mass Storage Management group.
Replication - improves access by wide-area caching. Globus toolkit offers sockets and a communication library, Nexus.
Meta Data Management - data catalogues, monitoring information (e.g. access pattern), grid configuration information, policies. MySQL over Lightweight Directory Access Protocol (LDAP) being investigated.
Security - ensuring consistent levels of security for data and meta data.
Query optimisation - cost minimisation based on response time and throughput. Monitoring Services group.
Identifiable UK Contributions: RAL
Identifying Key Areas: RAL
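Cost minimisation based on response time and throughput can be illustrated with a toy replica-selection model. This sketch assumes a cost of the form latency + size / throughput, which is an assumption made here and not a formula given in the slides; the `Replica` class, the site labels and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    """Hypothetical description of one copy of a file on the grid."""
    site: str
    latency_s: float             # response time before data starts to flow
    throughput_mb_per_s: float   # sustained transfer rate


def transfer_cost_s(replica: Replica, size_mb: float) -> float:
    """Assumed cost model: fixed response time plus size over throughput."""
    return replica.latency_s + size_mb / replica.throughput_mb_per_s


def cheapest_replica(replicas: Iterable[Replica], size_mb: float) -> Replica:
    """Return the replica with the lowest estimated transfer time."""
    return min(replicas, key=lambda r: transfer_cost_s(r, size_mb))


if __name__ == "__main__":
    replicas = [
        Replica("near-but-slow", latency_s=0.1, throughput_mb_per_s=1.0),
        Replica("far-but-fast", latency_s=5.0, throughput_mb_per_s=100.0),
    ]
    for size_mb in (1.0, 1000.0):
        best = cheapest_replica(replicas, size_mb)
        print(size_mb, "MB ->", best.site)
```

Even this simple model shows why both quantities matter: small requests favour the low-latency site, while large transfers favour the high-throughput one.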

8 Four tasks defined in current UK WP2
Service Discovery - locate grid services (Wolfgang Hoschek, Gavin McCance +...)
SQL Database Service - store, query and retrieve metadata (Wolfgang Hoschek, Gavin McCance +...)
Query Optimisation - cost model (Kurt Stockinger +…)
Data Mining - semi-automatic discovery of event patterns, associations and anomalies: Grid metadata and HEP applications
UK + CERN = UK++
Identifying Key Areas (UK)

9 Service Graph
Hierarchical Model
Nodes: sds.cern.ch; sds.anl.gov; sds.infn.it; sds.ral.uk; sds.padova-infn.it; sds.trieste-infn.it; sds.bologna-infn.it
Optimisation? - combine all info on nodes from e.g. ScotGRID locally and advertise via Globus
All nodes Grid Aware
Allowed?

10 Unified Modelling Language
Standard method to define the architecture = UML
Standard tool = TogetherSoft? Free for academic use. Runs under linux.
Query raised on MySQL support:
  "I tried to generate an import/export module for MySQL under linux by copying the db2.config file and replacing the various column types by the ones that are available in MySQL. This works apart from the fact that the primary key generation fails and a schema is generated (which MySQL doesn't support). The Access97 type of primary key generation is fine for MySQL. I have seen that Access uses a specialized DB import/export class. How can I generate one for MySQL?"
DB Driver for MySQL under linux?
Determine correct tools by testing.

11 Compiler Efficiency
Numerically intensive simulations: minimal input and output data
ATLAS Monte Carlo (gg → H → bb): 228 sec/3.5 Mb event on 800 MHz linux box
Compiler Tests: LINPACK

Compiler        Speed (MFlops)
Fortran (g77)   27
C (gcc)         43
Java (jdk)      41

Industry Standard Compilers + OO Methods

12 System Monitoring Prototype
Tools:
  1. Linux Kernel Info = /proc/stat
  2. Enquire = Java client-server
  3. Histograms = Java Analysis Studio
  4. TCP/IP = Local WAN
Instantaneous CPU Usage
Scalable Architecture
Individual Node Info.
http://ppewww.ph.gla.ac.uk/~skilli/grid1.html

13 Industrial Partnership
Diagram labels: ping service; ping monitor; WAN; LAN
Adoption of OPEN Industry Standards + OO Methods
Industry; Research Council
Monitoring Tools Exist
Standard?

14 System Monitoring Prototype
Input from /proc/stat: instantaneous CPU, disk, memory
Individual Node Info. is input to single Grid node

  (cpu columns: user nice system idle)
  cpu 469607 1593 823764 6044637
  disk 51306 0 0 0
  disk_rio 11002 0 0 0
  disk_wio 40304 0 0 0
  disk_rblk 87872 0 0 0
  disk_wblk 322378 0 0 0
  page 29693 49417
  swap 33 1447
  intr 18916942 7339601 27941 0 2 2 0 3 0 1 0 9331361 0 869060 1 619454 729516 0 0
  ctxt 62664003
  btime 984922120
  processes 107015

Combined Info into e.g. distributed MySQL database
Why start here? Need well-understood simple system to start tests and calibrate commercially available solutions.
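The /proc/stat sample on this slide can be turned into CPU usage figures with a few lines of parsing. The sketch below is not the prototype's Java code: it is a Python illustration that works on a copy of the slide's sample text rather than a live file, and the function names are hypothetical. The counters are cumulative since boot, so an instantaneous figure needs the difference between two samples.

```python
# Subset of the /proc/stat sample shown on the slide.
SAMPLE = """\
cpu 469607 1593 823764 6044637
disk 51306 0 0 0
page 29693 49417
swap 33 1447
ctxt 62664003
btime 984922120
processes 107015
"""


def parse_proc_stat(text: str) -> dict:
    """Map each line's keyword to its list of integer counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            stats[fields[0]] = [int(value) for value in fields[1:]]
    return stats


def cpu_fractions(stats: dict) -> dict:
    """Share of time in user, nice, system and idle since boot."""
    names = ("user", "nice", "system", "idle")
    counters = stats["cpu"][:4]
    total = sum(counters)
    return {name: count / total for name, count in zip(names, counters)}


def busy_fraction_between(before: dict, after: dict) -> float:
    """Non-idle share of CPU time between two samples."""
    deltas = [a - b for a, b in zip(after["cpu"][:4], before["cpu"][:4])]
    total = sum(deltas)
    return (total - deltas[3]) / total if total else 0.0


if __name__ == "__main__":
    stats = parse_proc_stat(SAMPLE)
    for name, fraction in cpu_fractions(stats).items():
        print(f"{name}: {fraction:.1%}")
```

For the slide's sample, the four cpu counters sum to 7339601 ticks, of which about 82% are idle. On a live Linux node the same functions could be fed the contents of /proc/stat read twice a short interval apart.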

15 Database Access Benchmark
e.g. MySQL database daemon
Basic 'crash-me' and associated tests
Access times for basic insert, modify, delete, update database operations, e.g. (on 256 Mbyte, 800 MHz Red Hat 6.2 linux box):

Operation                      Time
350k data insert operations    149 seconds
10k query operations           97 seconds

Many applications require database functionality
Currently favoured HEP DataBase application, e.g. BaBar, ZEUS software
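The timings above convert to rates of roughly 2349 inserts per second and 103 queries per second. The sketch below shows that arithmetic together with a small timing harness of the same shape. Note the substitution: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in, not MySQL and not the 'crash-me' suite, and the `events` table and function names are hypothetical, so its timings are not comparable with the slide's.

```python
import sqlite3
import time


def ops_per_second(n_ops: int, seconds: float) -> float:
    """Convert an operation count and elapsed time into a rate."""
    return n_ops / seconds


def time_inserts_and_queries(n_inserts: int, n_queries: int):
    """Time bulk inserts and single-row lookups on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO events (id, payload) VALUES (?, ?)",
        ((i, f"event-{i}") for i in range(n_inserts)),
    )
    conn.commit()
    insert_s = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(n_queries):
        conn.execute(
            "SELECT payload FROM events WHERE id = ?", (i % n_inserts,)
        ).fetchone()
    query_s = time.perf_counter() - start

    conn.close()
    return insert_s, query_s


if __name__ == "__main__":
    # Rates implied by the figures on the slide.
    print(f"inserts/s: {ops_per_second(350_000, 149):.0f}")
    print(f"queries/s: {ops_per_second(10_000, 97):.0f}")
    # Stand-in measurement; operation counts are scaled down for a quick run.
    insert_s, query_s = time_inserts_and_queries(10_000, 1_000)
    print(f"sqlite3 stand-in: {insert_s:.3f} s inserts, {query_s:.3f} s queries")
```

Pointing the same harness at a MySQL server would need a MySQL client library and connection details, which are outside what the slides specify.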

16 WP2 - Open Issues
Many… Early Days
Working Standards?
Scope Of UK Contribution: Service Discovery; SQL Database Service; Query Optimisation; Data Mining
Development Tools? UML; TogetherSoft; Database; MySQL; GDMP; System Monitoring; Standard; Grid-Enabled; Files; Objects..
Input/Contributions welcome….
From Files To Objects: GridEvent; Obj Vector; GridEvent Obj; Grid Network; Network Digit; Grid CPU; Digit; Grid Disk; Digit; Grid Memory; Digit
Teamwork

