3D Project Status - Dirk Duellmann, CERN IT, for the LCG 3D project. Meeting with LHCC Referees, March 21st 2006.


Slide 2: Why an LCG Database Deployment Project?
LCG today provides an infrastructure for distributed access to file-based data and for file replication.
Physics applications (and grid services) require similar services for data stored in relational databases:
– several applications and services already use an RDBMS
– several sites already have experience in providing RDBMS services
Goals for a common project as part of LCG:
– increase the availability and scalability of LCG and experiment components
– allow applications to access data in a consistent, location-independent way
– allow existing database services to be connected via data replication mechanisms
– simplify shared deployment and administration of this infrastructure during 24x7 operation
Scope set by the PEB: Online - Offline - Tier sites

Slide 3: 3D Participants and Responsibilities
LCG 3D is a joint project between:
– service users: experiments and grid s/w projects
– service providers: LCG tier sites, including CERN
The project itself has (as all projects) limited resources (2 FTE):
– it mainly coordinates requirement discussions and testbed and production configuration, setup and support
– it relies on experiments/projects to define and validate their application function and requirements
– it relies on sites for the local implementation and deployment of the testbed and production setups

Slide 4: DB Readiness Workshop - Feb 6th
Readiness of the production services at T0/T1:
– status reports from tier 0 and tier 1 sites
– technical problems with the proposed setup (RAC clusters)?
– open questions from sites to experiments?
Readiness of experiment (and grid) database applications:
– application list, code release, data model and deployment schedule
– successful validation at T0 and (if required) at T1?
– any new deployment problems seen by experiment users which need a service change
Review of site/experiment milestones from the database project plan:
– (re-)align with other work plans, e.g. experiment challenges, SC4

Slide 5: Online-Offline Connection
A well-documented schema was reported at the last LCG 3D workshop.
(Artwork by Richard Hawkings; slide: A. Vaniachine)

Slide 6: (diagram only, no transcribed text)

Slide 7: (diagram only, no transcribed text)

Slide 8: Offline FroNTier Resources/Deployment
Tier-0: 2-3 redundant FroNTier servers. Tier-1: 2-3 redundant Squid servers. Tier-N: 1-2 Squid servers.
Typical Squid server requirements:
– CPU/MEM/DISK/NIC = 1 GHz / 1 GB / 100 GB / Gbit
– network: visible to the worker LAN (private network) and the WAN (internet)
– firewall: two ports open, for URI (FroNTier launchpad) access and SNMP monitoring (typically 8000 and 3401 respectively)
Squid non-requirements:
– special hardware (although high-throughput disk I/O is good)
– cache backup (if a disk dies or is corrupted, start from scratch and reload automatically)
Squid is easy to install and requires little on-going administration; a configuration sketch follows below.
(Diagram: DB - JDBC - Tomcat(s) with the FroNTier launchpad and Squid(s) at Tier 0; Squids at Tier 1 and Tier N, connected via http.)
(Slide: Lee Lueking)
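To make the port and sizing notes above concrete, the following is a minimal, hypothetical squid.conf fragment. The port numbers follow the typical values quoted on the slide, while the ACL name and cache sizes are illustrative assumptions rather than the actual 3D/CMS configuration.

    # Hypothetical squid.conf fragment - illustrative only, not the official FroNTier setup
    # Port used by clients and downstream Squids to reach this cache (slide: typically 8000)
    http_port 8000
    # SNMP monitoring port (slide: typically 3401) and a simple community ACL
    snmp_port 3401
    acl snmppublic snmp_community public
    snmp_access allow snmppublic
    snmp_access deny all
    # Cache sizing roughly in line with the ~1 GB memory / 100 GB disk figures above (assumed values)
    cache_mem 512 MB
    cache_dir ufs /var/spool/squid 100000 16 256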

Slide 9: Experiment Architectures - Commonalities and Differences
Focus on the common part of the experiment plans:
– Oracle database setup: online (all), offline (all), tier 1 (ATLAS/LHCb); MySQL is also mentioned for T0 and T1, as an addition, not as a replacement
– distribution online-offline: ATLAS, CMS and LHCb use Oracle Streams; ALICE collects and transfers data via ALICE software
– distribution tier 0 - tier 1: ATLAS and LHCb use Oracle Streams; CMS uses FroNTier (ATLAS interested); ALICE uses file transfer and conditions lookup via the file catalog
– tier 1 - tier 2: ATLAS based on MySQL/SQLite files; ALICE, CMS and LHCb have no tier 2 database service (apart from grid services)

Slide 10: LCG 3D Service Architecture
(Diagram) Roles of the tiers:
– Online DB: autonomous, reliable service
– T0: autonomous, reliable service
– T1: database backbone - all data replicated, reliable service
– T2: local database cache - subset of the data, only a local service
Distribution mechanisms: Oracle Streams, http cache (Squid), cross-DB copy and MySQL/SQLite files.
Read-only access at Tier 1/2 (at least initially).

Slide 11: LCG 3D Replication Testbed
Since last summer: databases at ASCC, CERN, CNAF, GridKA, (FNAL), IN2P3 and RAL.
Many replication tests are in progress.
Offline -> T1:
– COOL/Streams ATLAS: Stefan Stonjek (CERN, RAL, Oxford?)
– COOL/Streams LHCb: Marco Clemencic (CERN, RAL, GridKA?)
– POOL/FroNTier CMS: Lee Lueking (CERN and several T1/T2 sites)
– AMGA/Streams: Birger Koblitz (CERN -> CERN)
– AMI/Streams: Solveig Albrandt (IN2P3 -> CERN)
– LFC/Streams: work plan proposed, starting with IN2P3
– VOMS/Streams: work plan proposed, starting with CNAF
Online -> Offline:
– Conditions/Streams CMS: Saima Iqbal (functional testing)
– COOL/Streams ATLAS: (Gancho Dimitrov) server setup, networking configuration with the pit network
– LHCb: planning with LHCb online
Coordination between sites and experiments takes place during the weekly 3D meetings.
Status: successful functional tests, now ramping up volume and load; experiment involvement is needed to define the target scale.

Slide 12: Last Year - Distribution Technology Studies and Results
FroNTier has been integrated into the LCG software framework:
– the POOL plug-in is used by CMS
– a work plan for COOL/FroNTier exists for ATLAS (with ATLAS manpower)
FroNTier/POOL in the 3D testbed has been successfully used by CMS from several tier 1 and tier 2 sites:
– Squid caches were indeed quick to set up and easy to deploy
Streams functionality has been confirmed by a number of tests (a configuration sketch follows below):
– with COOL in ATLAS and LHCb: LHCb showed Streams working, with an overhead that is negligible compared to the intrinsic (COOL client) performance
– with POOL in CMS: calibration data written online can be streamed to the offline database and picked up as C++ objects through POOL/ORA (direct DB connection) or POOL/FroNTier (Squid-cached connection)
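For orientation only, schema-level Oracle Streams replication of the kind exercised in these tests is typically configured with the DBMS_STREAMS_ADM package, roughly as sketched below. The administrator schema, queue, schema and database names are made-up placeholders, and the real 3D setup (downstream capture, propagation to several Tier 1 sites, apply processes at the destinations) involves considerably more steps than this fragment.

    -- Hypothetical sketch, run on the source database as the Streams administrator; all names are placeholders.
    BEGIN
      -- Create a Streams queue to hold the captured changes
      DBMS_STREAMS_ADM.SET_UP_QUEUE(
        queue_table => 'strmadmin.cond_queue_table',
        queue_name  => 'strmadmin.cond_queue');

      -- Capture DML changes made to the conditions schema
      DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
        schema_name     => 'COND_OWNER',
        streams_type    => 'capture',
        streams_name    => 'cond_capture',
        queue_name      => 'strmadmin.cond_queue',
        include_dml     => TRUE,
        include_ddl     => FALSE,
        source_database => 'SOURCEDB.EXAMPLE.ORG');

      -- Propagate the captured changes to a queue at a Tier 1 site over a database link
      DBMS_STREAMS_ADM.ADD_SCHEMA_PROPAGATION_RULES(
        schema_name            => 'COND_OWNER',
        streams_name           => 'cond_propagation',
        source_queue_name      => 'strmadmin.cond_queue',
        destination_queue_name => 'strmadmin.cond_queue@TIER1DB.EXAMPLE.ORG',
        include_dml            => TRUE,
        include_ddl            => FALSE,
        source_database        => 'SOURCEDB.EXAMPLE.ORG');
    END;
    /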

Slide 13: Distribution Technologies - Move to Production Deployment
Moving on from functionality tests to production deployment and scalability:
– finalise the technology report covering database service, Streams and FroNTier
Further decoupling between Tier 0 and Tier 1 sites:
– Streams data capture is now done on a separate machine based on log files, so e.g. WAN problems have no impact on the T0 database
Further decoupling among Tier 1 sites:
– prototyping with decoupled Oracle queues for functional sites (changes leave the CERN capture machine with little latency) and for problem sites (the change queue is kept on disk for a few days)
T0 FroNTier servers - high availability and scaling:
– 3-node production setup with DNS load balancing and client failover
– backend: the 4-node experiment database cluster

Slide 14: Building Block for Tier 0/1 - Oracle Database Clusters (RAC)
Two or more dual-CPU nodes with shared storage (e.g. a FibreChannel SAN).
CPU and I/O capacity scale independently.
Transparent failover and s/w patching.
The LHC database services are deployed on RAC, and all 3D production sites agreed to set up a RAC cluster; a client-side connection sketch follows below.
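As an illustration of how clients see such a cluster, failover and load balancing across the nodes are normally expressed in the Oracle Net client configuration (tnsnames.ora), roughly as below. The alias, host names and service name are invented placeholders, not the actual CERN or Tier 1 configuration.

    # Hypothetical tnsnames.ora entry for a two-node RAC service; all names are placeholders.
    # LOAD_BALANCE spreads new connections across the nodes, FAILOVER tries the next
    # address if a node is down, and FAILOVER_MODE enables Transparent Application
    # Failover so that in-flight queries can resume on a surviving instance.
    LCG3D_EXAMPLE =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (LOAD_BALANCE = ON)
          (FAILOVER = ON)
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node1.example.org)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node2.example.org)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = lcg3d_example.example.org)
          (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
        )
      )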

Slide 15: CERN Hardware Evolution for 2006
(Table, garbled in transcription: current state and proposed 2006 structure of the CERN physics database clusters for ALICE, ATLAS, CMS, LHCb, Grid, 3D, non-LHC and validation services. The current state includes 2-node clusters, a 2x2-node setup and a pilot on a disk server; the proposal foresees 2- to 4-node clusters per service, including a 2-node PDB replacement and 2-node validation/test and pilot setups, with open points for COMPASS and online.)
A linear ramp-up is budgeted for the hardware resources; the next major service extension is planned for Q3 this year.
(Slide: Maria Girone)

Slide 16: FroNTier Production Configuration at Tier 0
Squid runs in http-accelerator mode (as a reverse proxy server); a configuration sketch follows below.
(Slide: Luis Ramos)
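As a rough sketch of what http-accelerator (reverse proxy) mode means in Squid terms, a fragment of the following shape forwards cache misses to the FroNTier/Tomcat origin server. The host name, ports and peer name are illustrative assumptions, and the 2006 production service may well have used the older Squid 2.5 httpd_accel_* directives rather than this Squid 2.6-style syntax.

    # Hypothetical reverse-proxy (accelerator) fragment, Squid 2.6-style; names are placeholders.
    # Accept client requests on port 8000 and treat them as accelerated (reverse-proxy) traffic
    http_port 8000 accel defaultsite=frontier.example.org
    # Send cache misses to the FroNTier/Tomcat origin server instead of resolving URLs directly
    cache_peer frontier.example.org parent 8080 0 no-query originserver name=frontier_origin
    cache_peer_access frontier_origin allow all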

Slide 17: LCG Database Deployment Plan
After the October '05 workshop, a database deployment plan was presented to the LCG GDB and MB, with two production phases.
April - September '06: partial production service
– production service running in parallel to the existing testbed
– h/w requirements defined by experiments/projects
– based on Oracle 10gR2
– subset of the LCG tier 1 sites: ASCC, CERN, BNL, CNAF, GridKA, IN2P3, RAL
October '06 onwards: full production service
– adjusted h/w requirements (defined at the summer '06 workshop)
– remaining tier 1 sites join: PIC, NIKHEF, NDGF, TRIUMF

Slide 18: Tier 1 Hardware Setup
Proposed setup for the April deployment phase:
– 2-3 dual-CPU database nodes with 2 GB of memory or more, set up as a RAC cluster (preferably one per experiment): ATLAS 3 nodes with 300 GB storage (after mirroring), LHCb 2 nodes with 100 GB storage (after mirroring); shared storage (e.g. FibreChannel) is proposed to allow for clustering
– 2-3 dual-CPU Squid nodes with 1 GB of memory or more: the Squid s/w packaged by CMS will be provided by 3D; 100 GB storage per node; the service responsibility (DB or admin team?) still needs to be clarified
Target s/w release: Oracle 10gR2 on RedHat Enterprise Server, to ensure Oracle support.
Production setups for Castor and grid services will be required in addition; their consolidation is to be scheduled in the SC4 work plan.

Slide 19: Tier 1 Progress
Sites are largely on schedule for a service start at the end of March:
– h/w is either already installed (BNL, CNAF, IN2P3) or delivery of the order is expected shortly (GridKA, RAL)
– some problems with the Oracle cluster technology were encountered, and solved
– active participation from the sites; a DBA community is building up
The first DBA meeting, focusing on RAC installation, setup and monitoring, is hosted by Rutherford and scheduled for the second half of March (a minimal instance check is sketched below).
The remaining Tier 1 sites need to be involved now:
– contact has been established with PIC, NIKHEF/SARA, NDGF and TRIUMF so that they can follow workshops and meetings
Next workshop on the 23rd of March, hosted by RAL:
– focus: finalising the DB server and monitoring setup at T0 and T1
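As a small example of the kind of monitoring check such a DBA meeting deals with, the following query against the standard GV$INSTANCE view lists the status of all instances of a RAC database; it is a generic sanity check, not part of the 3D monitoring suite.

    -- List all RAC instances of the database and their status (generic check, illustrative)
    SELECT inst_id, instance_name, host_name, status
    FROM   gv$instance
    ORDER  BY inst_id;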

Slide 20: Progress on Open Issues
Open issues:
– X.509 (proxy) certificates: supported by Oracle? Fallback solutions are being investigated.
Closed issues:
– s/w and support licenses for Tier 1: close to a formal agreement
– Instant Client distribution within LCG: agreed with Oracle
– FroNTier application server support: for the initial phase (March-September), CMS proposed to support the tomcat/frontier/squid setup

Slide 21: Databases in Middleware and Castor
Deployment already took place for the services used in SC3, so existing setups and experience with SC workloads are available:
– LFC and FTS (Tier 0 and above): low volume, but high availability requirements; at CERN they run on a 2-node Oracle cluster, outside CERN on a single-box Oracle or MySQL server
– CASTOR 2 (CERN and some T1 sites): scaling to LHC rates still needs to be understood
The middleware databases need to be consolidated with the (larger) experiment database setups.

Slide 22: LCG Software Progress
LCG applications are now based on CORAL:
– CORAL includes the re-try and failover behaviour required for a reliable db service; the database lookup is based on an XML list of databases (see the sketch after this list)
– integration with the LFC is being prototyped with the CAT team (India)
– POOL includes a production version of the FroNTier plug-in with concrete caching policies
S/w is on schedule for 2006 deployment:
– the LCG s/w is expected to be stable by the end of February for distributed deployment as part of SC4 or the experiment challenges
Caveats:
– COOL still has important functionality items on its development plan for this year
– schema changes will need careful planning for COOL and FroNTier
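For illustration, the XML database lookup used by CORAL maps a logical service name to an ordered list of physical replicas, roughly as in the fragment below. The logical name, connection strings and attribute values are invented placeholders, and the exact file format should be checked against the documentation of the CORAL release actually deployed.

    <?xml version="1.0" ?>
    <!-- Hypothetical CORAL database lookup fragment; all names are placeholders. -->
    <servicelist>
      <logicalservice name="CondDB_Example">
        <!-- Replicas are tried in order: local Tier 1 Oracle first, then CERN, then a FroNTier cache -->
        <service name="oracle://t1-rac.example.org/COND_SCHEMA" accessMode="read" authentication="password" />
        <service name="oracle://cern-rac.example.org/COND_SCHEMA" accessMode="read" authentication="password" />
        <service name="frontier://squid.example.org:8000/Frontier" accessMode="read" />
      </logicalservice>
    </servicelist>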

Slide 23: Experiment Applications Status
Conditions data is driving the database service size at T0 and T1:
– event TAGs may also become significant; they need replication tests and concrete experiment deployment models
Framework integration and DB workload generators exist:
– functionality has been tested in various COOL and POOL/FroNTier tests
– T0 performance and replication tests (T0 -> T1) look OK
Conditions online -> offline replication is starting now:
– CMS and ATLAS are executing their online test plans
Progress in defining concrete conditions data models:
– CMS showed the most complete picture (for the Magnet Test)
– there is still some uncertainty about volumes and numbers of clients

Slide 24: Summary
Significant progress in all areas of the project; the production schedule is defined:
– phase 1, end of March: ASCC, BNL, CERN, CNAF, IN2P3, RAL
– full deployment, end of September: PIC, NIKHEF, NDGF, TRIUMF
DB clusters are deployed at Tier 0 and Tier 1.
The distribution setup covers the common components of the experiment plans: Oracle service (online/T0/T1), Streams (online/T0/T1), FroNTier (T1/T2), MySQL/SQLite files (T2+).
Setup is progressing on schedule at the tier 0 and tier 1 sites.
Applications have moved to performance testing:
– first larger-scale conditions replication tests show promising results for the Streams and FroNTier technologies
Main risk: the remaining uncertainty on the conditions payload.

Slide 25: Milestones / Schedules
Project milestones:
– 3D replication technology write-up - May '06 (test responsible, based on the individual test documents)
– database service definition - June '06 (site responsible, based on the LCG TDR document)
– backup/recovery strategy for T0 and T1 - August '06
– database lookup service (LFC based) - August '06
Experiment deployment plans:
– concrete conditions data models defined for the main detectors (e.g. the detectors accounting for 80% of volume/access)
– conditions deployed at the Tier 1s
– conditions replicated between online and offline

Slide 26: Conclusions
There is little reason to believe that a distributed database service will move into stable production any quicker than any of the other grid services.
We should therefore start now with larger-scale production operation, to resolve the unavoidable deployment issues.
We need the cooperation of the experiments (and sites) to make sure that concrete requests can be quickly validated against the pre-production service.