Database Technologies and Distribution Techniques
Dirk Duellmann, CERN
HEPiX, Rome, April 4th 2006

Slide 2: Outline
A little bit of history
– Database vendors: RDBMS, ODBMS, ORDBMS, RDBMS, ?
Databases in the Grid
– HA/scaling -> database clusters (in Luca's talk)
– Distribution -> Streams
– Redundancy -> Data Guard
– Scaling by distribution -> DB caching services
Outlook...

Slide 3: One Cycle of HEP Use of Databases
'90: RDBMS + files (DESY, FNAL, CERN, ... with Oracle)
– Why? Simpler than older database models!
'95: ODBMS for "all data" (BaBar, COMPASS, HARP with Objectivity)
– Why? Ease of OO language binding!
'00: Espresso, a HEP prototype implementation of an ODBMS
– Why? ODBMS still a niche market! Need to control the code.
'01: ORDBMS (Oracle)
– Why? HEP does not have the manpower to write a DB!
'03: RDBMS + files (COMPASS, HARP, LHC with Oracle, MySQL, SQLite)
– Why? DB vendor abstraction!
– Assume only the commonly available feature set
– Control object-to-table mapping inside HEP code (e.g. POOL)
More detail -> Jamie's CHEP '06 talk

Slide 4: What Can/Has Been Learned?
Changes/decisions were driven by...
– Changing trust in commercial companies and their long-term viability
– Changing HEP focus (OO application design rather than reliable service)
... more than by technology differences
Several quite different technologies have been used with (some) success
– Several experiment software frameworks moved along with the changes
– We learned how to implement proper C++ object bindings (also from ODBMS)
Databases host critical data that needs DB features
– Database technology is (still?) too complex/expensive to host more/all HEP data
– Making DBs simpler/cheaper (by dropping functionality/reliability) does not help

Slide 5: LCG 3D Service Architecture
[Diagram: Oracle (O) and MySQL (M) servers, files (F) and Squids (S) across the tiers]
– T0: autonomous, reliable service; online DBs likewise autonomous, reliable services
– T1 (DB backbone): all data replicated, reliable service; kept up to date from T0 via Oracle Streams
– T2 (local DB cache): subset of the data, only a local service; fed via http cache (Squid) and cross-DB copy to MySQL/SQLite files
– R/O access at Tier 1/2 (at least initially)

Slide 6: Building Block for Tier 0/1 - Database Clusters (-> Luca's talk)
Two or more dual-CPU nodes with shared storage (e.g. an FC SAN)
Scale CPU and I/O operations independently
Transparent failover and s/w patches
LHC database services are deployed on RAC
All 3D production sites agreed to set up RAC clusters
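
Transparent failover is configured on the client side through Oracle Net. The following is a minimal sketch of such an entry, assuming two hypothetical cluster nodes and a service name lhcdb; it illustrates the mechanism, not the CERN production configuration.

    # tnsnames.ora sketch: load-balance across both nodes and fail running
    # SELECTs over to the surviving node (Transparent Application Failover)
    LHCDB =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (LOAD_BALANCE = yes)
          (FAILOVER = on)
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac1.example.org)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac2.example.org)(PORT = 1521)))
        (CONNECT_DATA =
          (SERVICE_NAME = lhcdb)
          (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))))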

Slide 7: How to Keep Databases Up to Date? Asynchronous Replication via Streams
[Diagram: CERN as source; CNAF, RAL, Sinica, FNAL, IN2P3 and BNL as destinations]
A change such as insert into emp values (03, 'Joan', ...) is captured at the source as a logical change record (LCR), propagated over the network to each Tier-1 site, and applied there: capture -> propagation -> apply.
Slide: Eva Dafonte Perez
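
As a concrete illustration of the capture and propagation steps, here is a minimal Oracle 10g Streams sketch using the DBMS_STREAMS_ADM package; the replicated schema COND, the administrator schema strmadmin and the database link t1db are hypothetical.

    -- create the staging queue that capture writes LCRs into
    BEGIN
      DBMS_STREAMS_ADM.SET_UP_QUEUE(
        queue_table => 'strmadmin.streams_queue_table',
        queue_name  => 'strmadmin.streams_queue');
    END;
    /
    -- capture DML changes for one schema at the source site
    BEGIN
      DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
        schema_name  => 'COND',
        streams_type => 'capture',
        streams_name => 'capture_cond',
        queue_name   => 'strmadmin.streams_queue',
        include_dml  => TRUE,
        include_ddl  => FALSE);
    END;
    /
    -- propagate the captured LCRs to a Tier-1 queue over a db link
    BEGIN
      DBMS_STREAMS_ADM.ADD_SCHEMA_PROPAGATION_RULES(
        schema_name            => 'COND',
        streams_name           => 'prop_cond_to_t1',
        source_queue_name      => 'strmadmin.streams_queue',
        destination_queue_name => 'strmadmin.streams_queue@t1db',
        include_dml            => TRUE,
        include_ddl            => FALSE);
    END;
    /

A matching ADD_SCHEMA_RULES call with streams_type => 'apply' on each Tier-1 database completes the chain.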

Slide 8: Offline FroNTier Resources/Deployment
Tier-0: 2-3 redundant FroNTier servers
Tier-1: 2-3 redundant Squid servers
Tier-N: 1-2 Squid servers
Typical Squid server requirements:
– CPU/MEM/DISK/NIC = 1 GHz / 1 GB / 100 GB / Gbit
– Network: visible to the worker LAN (private network) and the WAN (internet)
– Firewall: two ports open, for URI (FroNTier launchpad) access and SNMP monitoring (typically 8000 and 3401 respectively)
Squid non-requirements:
– Special hardware (although high-throughput disk I/O is good)
– Cache backup (if a disk dies or is corrupted, start from scratch and reload automatically)
Squid is easy to install and requires little ongoing administration.
[Diagram: Tier-N Squid(s) -> Tier-1 Squid -> Tier-0 Squid and FroNTier launchpad (Tomcat(s)), which queries the DB; http between the tiers, JDBC to the database]
Slide: Lee Lueking
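
A minimal squid.conf sketch matching these requirements; the port numbers come from the slide, while the network range and cache sizes are hypothetical.

    # serve FroNTier requests and SNMP monitoring on the usual ports
    http_port 8000
    snmp_port 3401
    # ~100 GB disk cache, matching the sizing above
    cache_dir ufs /var/spool/squid 100000 16 256
    cache_mem 256 MB
    # allow the worker-node LAN, plus read-only SNMP access
    acl workers src 10.0.0.0/8
    http_access allow workers
    acl snmppublic snmp_community public
    snmp_access allow snmppublic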

Slide 9: FroNTier Production Configuration at Tier 0
Squid runs in http-accelerator mode (as a reverse proxy server) in front of the FroNTier launchpad.
Slide: Luis Ramos
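
In Squid 2.5-era configuration, accelerator mode would be enabled with directives along these lines; the FroNTier host name is hypothetical.

    # forward all incoming requests to the FroNTier launchpad behind this Squid
    httpd_accel_host frontier.cern.ch
    httpd_accel_port 8000
    httpd_accel_with_proxy off
    httpd_accel_uses_host_header off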

Slide 10: LCG Database Deployment Plan
Two production phases.
April - September '06: partial production service
– Production service (parallel to the existing testbed)
– H/W requirements defined by experiments/projects
– Based on Oracle 10gR2
– Subset of Tier-1 sites: ASCC, CERN, BNL, CNAF, GridKA, IN2P3, RAL
– Transfer-rate tests scheduled with those sites:
  – April: complete Streams/FroNTier setup with production DBs
  – May: ramp up to the maximum distribution rate on the production setup
October '06 onwards: full production service
– Adjusted h/w requirements (defined at the summer '06 workshop)
– Other Tier-1 sites join: PIC, NIKHEF, NDGF, TRIUMF

Slide 11: Validation & Throttling
Different target communities:
– Application developer: how can I speed up benchmark X?
– Production manager: how can I make sure the production uses all resources?
– DB admins: how can I make sure the service stays up overnight?
DB application optimization is often perceived as black magic
– Complex s/w stack with many internal optimizations (and bottlenecks!): CPU and I/O, but also network connections, cache use, query plans, table design, indices, bind variables, library cache latches, etc.
Databases react highly nonlinearly to access-pattern changes
– Need throttling to ensure service availability (-> Oracle resource profiles; see the sketch below)
– Need developer and DBA together to do validation tests at scale
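
As an illustration of throttling with Oracle resource profiles, a minimal sketch; the profile name, account name and limit values are hypothetical.

    -- cap what any single grid account can consume
    CREATE PROFILE grid_reader LIMIT
      SESSIONS_PER_USER      200       -- concurrent sessions
      CPU_PER_CALL           300000    -- hundredths of a second per call
      LOGICAL_READS_PER_CALL 1000000   -- block reads per call
      IDLE_TIME              60;       -- minutes before idle sessions are cut

    ALTER USER cond_reader PROFILE grid_reader;

    -- kernel resource limits are only enforced when this is set
    ALTER SYSTEM SET resource_limit = TRUE;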

Slide 12: Oracle Enterprise Manager (now: Grid Control)
Web-based user interface
– Agent-based system collecting status from the DB server and OS host; plans for storage and switch plugins
– DBA-level detail and direct changes of DB parameters
Customizable reports, metrics and policies
– E.g.: which machines run which DB and OS version and patch level
OEM deployment today
– Used locally at several HEP sites (e.g. the CERN setup has some 200 targets inside OEM)
– Evaluating wide-area use in the LCG 3D testbed; a new release is currently under test
Very useful as a diagnostic tool
– Need to gain trust to use OEM also as an alerting or s/w upgrade tool
User access: integration into local fabric monitoring, e.g. Lemon
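
Day-to-day interaction with the monitoring agents goes through the emctl utility; a sketch of typical checks, assuming a standard 10g Grid Control agent installation.

    emctl status agent   # is the agent up? when did it last upload?
    emctl upload agent   # push pending metric data to the management server
    emctl start agent    # restart the agent after host maintenance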

Slide 13: Additional Redundancy for Disaster Recovery
Oracle Data Guard: a copy of the database is kept current by shipping and applying redo logs.
[Diagram: production database at the primary site, standby database at the standby site]
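
The two essential pieces, sketched for Oracle 10g; the standby service name is hypothetical, and a real setup needs considerably more (standby creation, password files, listener entries).

    -- primary site: ship redo to the standby service
    ALTER SYSTEM SET log_archive_dest_2 =
      'SERVICE=standby_db LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';

    -- standby site: apply incoming redo continuously in the background
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;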

Slide 14: Database Futures
Multicore for DB servers (many threads)
– HEP setups today mostly use dual-CPU cluster nodes; R/W application scaling is often limited by cluster-interconnect traffic
– Multicore will allow more CPUs per box; memory size and bandwidth (buffer cache) need to grow accordingly
64-bit will allow large server memory
– More apps can "run in memory", reducing disk I/O
– Real benefits depend on the size of the hot data vs the server cache
  – E.g. size/number of conditions-data versions shared by concurrent database clients
  – Needs validation with realistic experiment data models
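
For example, on a 64-bit 10g server the cache can simply be grown with the automatic SGA sizing parameter; the 16 GB figure is hypothetical.

    -- takes effect at the next instance restart
    ALTER SYSTEM SET sga_target = 16G SCOPE = SPFILE;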

Slide 15: Conclusions
The new (and the old) model: RDBMS as part of a hybrid approach
– Object features and vendor abstraction (controlled by HEP code)
– Databases store key data components in HEP computing (not more, yet)
More recent: Linux DB clusters allow affordable, scalable services for LHC startup
– CERN: 2 Sun nodes -> ~50 dual-CPU Linux nodes -> ~100 nodes by 2007
– Recovery of even cheaper IDE DB servers is more expensive
Grid & DBs require new approaches: WAN-connected databases
– Need to keep their key promises: consistency and reliability
– Oracle Streams and FroNTier under validation; complementary rather than competing
Multicore and 64-bit promise to further reduce disk and interconnect I/O
Need larger-scale deployment to validate (distributed) DB services
– Large effort for experiments and sites (while the focus is still on files...)