Database Readiness Workshop Summary
Dirk Duellmann, CERN IT, for the LCG 3D project
SC4 / pilot WLCG Service Workshop, 11th February, Mumbai

Why an LCG Database Deployment Project?
LCG today provides an infrastructure for distributed access to file-based data and for file replication. Physics applications (and grid services) require a similar service for data stored in relational databases:
– several applications and services already use an RDBMS
– several sites already have experience in providing RDBMS services
Goals of a common project as part of LCG:
– increase the availability and scalability of LCG and experiment components
– allow applications to access data in a consistent, location-independent way
– allow existing database services to be connected via data replication mechanisms
– simplify the shared deployment and administration of this infrastructure during 24x7 operation
Scope set by the PEB: online, offline and tier sites.

3D Participants and Responsibilities
LCG 3D is a joint project between
– service users: experiments and grid software projects
– service providers: LCG tier sites, including CERN
The project itself has, as all projects, limited resources (2 FTE):
– mainly coordinates requirement discussions and the testbed and production configuration, setup and support
– relies on experiments/projects to define and validate their application functionality and requirements
– relies on sites for the local implementation and deployment of the testbed and production setups

LCG 3D Service Architecture
– T0: autonomous, reliable Oracle service
– T1: database backbone - all data replicated, reliable Oracle service, fed from T0 via Oracle Streams
– T2: local database cache - subset of the data, local service only, implemented via http caches (Squid) or cross-database copies into MySQL/SQLite files
– Online DB: autonomous, reliable service at the pit or in the CERN computer centre, connected to the offline T0 service

Online-Offline Connection
A well-documented schema was reported at the last LCG 3D workshop.
(Artwork by Richard Hawkings; slide: A. Vaniachine)

LCG Database Deployment Plan
After the October '05 workshop, a database deployment plan was presented to the LCG GDB and MB, with two production phases:
March - September '06: partial production service
– production service (in parallel to the existing testbed)
– hardware requirements defined by experiments/projects
– based on Oracle 10gR2
– subset of LCG Tier 1 sites: ASCC, CERN, BNL, CNAF, GridKA, IN2P3, RAL
September '06 onwards: full production service
– adjusted hardware requirements (defined at the summer '06 workshop)
– remaining Tier 1 sites join: PIC, NIKHEF, NDGF, TRIUMF

Proposed Tier 1 Hardware Setup
Proposal for the first 6 months:
– 2-3 dual-CPU database nodes with 2 GB of memory or more
  – set up as a RAC cluster, preferably one per experiment
  – ATLAS: 3 nodes with 300 GB storage (after mirroring)
  – LHCb: 2 nodes with 100 GB storage (after mirroring)
  – shared storage (e.g. FibreChannel) proposed to allow for clustering
– 2-3 dual-CPU Squid nodes with 1 GB of memory or more
  – Squid software packaged by CMS will be provided by 3D (a sample cache configuration is sketched below)
  – 100 GB storage per node
  – need to clarify service responsibility (DB or admin team?)
Target software release: Oracle 10gR2
– RedHat Enterprise Server to ensure Oracle support
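
As an illustration of what a Frontier cache node of that size might run, the fragment below sketches a minimal squid.conf; the cache path, network range and exact sizes are assumptions for illustration, not the official 3D/CMS package settings.

  # Minimal Squid cache configuration for a Frontier node (illustrative only)
  http_port 3128

  # ~100 GB disk cache, matching the proposed per-node storage
  cache_dir ufs /var/spool/squid 100000 16 256
  cache_mem 512 MB
  maximum_object_size 128 MB

  # Only accept requests from the local site network (example range)
  acl site_network src 10.0.0.0/16
  http_access allow site_network
  http_access deny all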

DB Readiness Workshop last Monday
Readiness of the production services at T0/T1:
– status reports from Tier 0 and Tier 1 sites
– technical problems with the proposed setup (RAC clusters)?
– open questions from sites to experiments?
Readiness of experiment (and grid) database applications:
– application list, code release, data model and deployment schedule
– successful validation at T0 and, if required, at T1?
– any new deployment problems seen by experiment users which need a service change?
Review of site/experiment milestones from the database project plan:
– (re-)align with other work plans, e.g. experiment challenges and SC4

T0 Database Service Evolution
Until summer 2005:
– Solaris-based shared physics DB cluster (2 nodes for HA): low CPU power, hard to extend, shared by all experiments
– (many) Linux disk servers used as DB servers: high maintenance load, no resource sharing, no redundancy
Now consolidating on extensible database clusters:
– no sharing across experiments
– higher-quality building blocks: midrange PCs (RedHat ES), FibreChannel-attached disk arrays
As of last month, all LHC services have moved.
(Slide: Maria Girone)

Service Throttling - Resource Usage Reports
Ran into a degraded service after a single remote user submitted many (idle) jobs:
– defined account profiles for the larger applications; DB accounts are shared among many users
– switched on idle-session sniping (default: 3 h idle time; see the SQL sketch below)
Producing weekly resource overviews for the experiment database coordinators:
– allows the experiment to prioritise resources and identify unexpected usage patterns
– which jobs/users were affected by which limit?
(Slide: Maria Girone)
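
The throttling described above maps naturally onto Oracle resource profiles. The fragment below is a minimal sketch of how such limits could be set up; the profile and account names are invented for illustration, and only the 3 h idle default is taken from the talk.

  -- Resource limits are only enforced when RESOURCE_LIMIT is enabled
  ALTER SYSTEM SET RESOURCE_LIMIT = TRUE;

  -- Profile for large shared application accounts (names and values illustrative)
  CREATE PROFILE physics_app LIMIT
    IDLE_TIME         180        -- minutes: snipe sessions idle for more than 3 h
    SESSIONS_PER_USER 200        -- cap concurrent sessions of a shared account
    CONNECT_TIME      UNLIMITED;

  -- Attach the profile to a shared experiment account (hypothetical name)
  ALTER USER cms_cond_writer PROFILE physics_app;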

CERN Hardware Evolution for 2006
Current state: mostly 2-node clusters (offline clusters for ATLAS and CMS, an online test cluster, 2x2-node clusters for grid services, 2-node clusters for 3D and non-LHC use) plus a validation pilot on disk servers.
Proposed structure for 2006: dedicated RACs of up to 4 nodes per LHC experiment, 2-node clusters for grid services, 3D and non-LHC (PDB replacement), plus 2-node validation/test and pilot clusters; Compass and online setups still to be decided.
A linear ramp-up of hardware resources is budgeted; the next major service extension is planned for Q3 this year.
(Slide: Maria Girone)

CERN RAC Expansion for Q2
New mid-range servers received and installed; they have passed the acceptance tests by IT-FIO.
Waiting for additional disk arrays and fibre-channel switches; delivery expected end of February.
Planning the setup in collaboration with IT-FIO, proceeding in two steps:
– February: extension of the existing RACs with additional CPUs; cabling work for the fibre-channel and IP networks has started
– March: creation of new RACs, e.g. dedicated experiment validation servers, once the disk arrays and switches have arrived
(Slide: Maria Girone)

Tier 0 Preparations
Database service extension going according to schedule:
– resource prioritisation and extension planning need experiment involvement (and real deployment experience)
– significant lead time for hardware orders - experiment/project requests are needed early!
Streams and Frontier setups also proceeding well:
– a new downstream-capture proposal is under test (the idea is sketched below); it looks promising for avoiding some of the couplings observed in the testbed during site problems
Need a production setup for database monitoring (Oracle Grid Control 10gR2):
– Tier 1s may use another, local Grid Control instance
– two agents reporting into the common 3D and the local Grid Control
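
For readers unfamiliar with downstream capture: the Streams capture process runs at (or near) the destination rather than at the source, reading redo logs shipped by the source database, so a stuck replica no longer blocks the Tier 0. A rough Oracle 10g-style sketch of the destination-side setup is shown below; all queue, schema and database names are invented, and the real 3D configuration is not described in this talk.

  -- On the downstream (capture) database: create the Streams queue and a
  -- capture process that reads redo shipped from the source (names illustrative)
  BEGIN
    DBMS_STREAMS_ADM.SET_UP_QUEUE(
      queue_table => 'strmadmin.capture_qt',
      queue_name  => 'strmadmin.capture_q');

    DBMS_CAPTURE_ADM.CREATE_CAPTURE(
      queue_name         => 'strmadmin.capture_q',
      capture_name       => 'cond_capture',
      source_database    => 'T0PROD',        -- global name of the Tier 0 source DB
      use_database_link  => TRUE,
      logfile_assignment => 'implicit');     -- use redo shipped via log transport

    DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
      schema_name     => 'COOL_COND',        -- hypothetical conditions schema
      streams_type    => 'capture',
      streams_name    => 'cond_capture',
      queue_name      => 'strmadmin.capture_q',
      include_dml     => TRUE,
      include_ddl     => TRUE,
      source_database => 'T0PROD');
  END;
  /
  -- On the source (Tier 0) database, redo shipping to the downstream host is
  -- configured with a LOG_ARCHIVE_DEST_n entry pointing at its service name.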

3D Database Hardware Structure
(Slide: Gordon D. Brown, e-Science, RAL)

Tier 1 Progress
Sites are largely on schedule for a service start at the end of March:
– hardware either already installed (BNL, CNAF, IN2P3) or delivery of the order expected shortly (GridKA, RAL)
– some problems with Oracle cluster technology encountered - and solved!
– active participation from the sites; a DBA community is building up
First DBA meeting, focusing on RAC installation, setup and monitoring, hosted by Rutherford and scheduled for the second half of March.
Need to involve the remaining Tier 1 sites now:
– establishing contact with PIC, NIKHEF, NDGF and TRIUMF so they can follow workshops and meetings

Service Issues
Oracle issues:
– X.509 (proxy) certificates - will they be supported by Oracle?
– software and support licenses for Tier 1 sites
– Instant Client distribution within LCG
– being followed up with the commercial Oracle contact (IT-DES group) and the IT license officer
Application server support:
– during the initial phase (March-September), CMS proposed to support the tomcat/frontier/squid setup
– other experiments' requirements will be discussed

Databases in Middleware & CASTOR
Deployment took place already for the services used in SC3:
– existing setups at the sites
– existing experience with SC workloads -> extrapolate to real production
LFC, FTS - Tier 0 and above:
– low volume, but high availability requirements
– CERN: run on a 2-node Oracle cluster; elsewhere on single-box Oracle or MySQL
CASTOR 2 - CERN and some T1 sites:
– need to understand scaling up to LHC production rates
– CERN: run on 3 Oracle servers
Currently not driving the requirements for the database service.
Need to consolidate database configurations and procedures - may reduce effort and diversity at CERN and the Tier 1 sites.

LCG Application Software Status
COOL and POOL have released versions based on CORAL:
– include the retry and failover needed for reliable database service use; these features still need to be tested by the experiments (a client-side configuration is sketched after this list)
– POOL includes a production version of the FroNTier plug-in; control of SQUID caching may still be required to implement more realistic caching policies
– these releases (or bug-fix releases of them) are the target for 2006 deployment
LCG software is expected to be stable by the end of February for distributed deployment as part of SC4 or the experiment challenges.
Caveats:
– COOL still has important functionality items on its development plan for this year
– conditions schema stability will need careful planning for COOL and FroNTier
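
To make the retry/failover point concrete, the fragment below sketches how a client application might configure CORAL's connection service. It is an illustrative sketch only: the method and header names reflect my reading of the CORAL RelationalAccess API of that era, and the logical connection alias is invented, so treat all identifiers as assumptions rather than the experiments' actual code.

  // Illustrative CORAL client sketch (identifiers are assumptions)
  #include "RelationalAccess/ConnectionService.h"
  #include "RelationalAccess/IConnectionServiceConfiguration.h"
  #include "RelationalAccess/ISessionProxy.h"
  #include "RelationalAccess/AccessMode.h"

  int main() {
    coral::ConnectionService svc;

    // Retry a failing connection every 10 s for up to 60 s, and fall back to
    // an alternative replica listed in the local lookup catalogue.
    svc.configuration().setConnectionRetrialPeriod(10);
    svc.configuration().setConnectionRetrialTimeOut(60);
    svc.configuration().enableReplicaFailOver();

    // "CondDB" is a logical alias resolved to an Oracle, FroNTier or SQLite replica.
    coral::ISessionProxy* session = svc.connect("CondDB", coral::ReadOnly);
    // ... run queries through session->nominalSchema() ...
    delete session;
    return 0;
  }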

Experiment Applications Status
Conditions data is driving the database service size at T0 and T1:
– event TAGs may become significant - need replication tests and concrete experiment deployment models
Framework integration and DB workload generators exist:
– successfully used in various COOL and POOL/FroNTier tests
– T0 performance and T0->T1 replication tests look OK
Conditions online -> offline replication is only starting now:
– may need additional emphasis on online tests to avoid surprises
– CMS and ATLAS are executing online test plans
Progress in defining concrete conditions data models:
– CMS showed the most complete picture (for the Magnet Test)
– still quite some uncertainty about volumes and numbers of clients

Test Status: 3D Testbed
Replication tests in progress:
– Offline -> T1:
  – COOL ATLAS: Stefan Stonjek (CERN, RAL, Oxford)
  – COOL LHCb: Marco Clemencic (CERN, RAL, GridKA?)
  – FroNTier CMS: Lee Lueking (CERN and several T1/T2 sites)
  – ARDA AMGA: Birger Koblitz (CERN -> CERN)
  – AMI: Solveig Albrandt (IN2P3 -> CERN)
– Online -> offline:
  – CMS conditions: Saima Iqbal (functional testing)
  – ATLAS: Gancho Dimitrov (server setup, networking configuration with the pit network)
  – LHCb: planning with LHCb online
Coordination during the weekly 3D meetings.
Status: successful functional tests - now ramping up volume and load.

Summary
Database production service and schedule defined (unchanged since GDB/MB approval):
– phase 1, end of March: ASCC, BNL, CERN, CNAF, IN2P3, RAL
– full deployment, end of September: PIC, NIKHEF, NDGF, TRIUMF; consolidation with the grid-service Oracle setups
Setup progressing on schedule at the Tier 0 and Tier 1 sites.
Application performance tests progressing.
First larger-scale conditions replication tests show promising results for both the Streams and FroNTier technologies.
Concrete conditions data models are still missing for key detectors.

"My Conclusions"
There is little reason to believe that a distributed database service will move into stable production any quicker than any of the other grid services.
We should start larger-scale production operation now to resolve the unavoidable deployment issues.
We need the cooperation of experiments and sites to make sure that concrete requests can be validated quickly against a concrete distributed service.