Elizabeth Gallas - Oxford ADC Weekly September 13, 2011

Presentation transcript:

Database Operations
Elizabeth Gallas - Oxford
ADC Weekly, September 13, 2011

Elizabeth Gallas - Databases, Sep 2011

Overview
- Brief notes
- Oracle 11g validation
- ATLR
- Replication
- User incidents (since S&C Week)
- Frontier
- ADCR

Brief Notes
- LFC migration: see Graeme's talks …
- ATLARC / TAG services
  - Popular: Event Picking and other TAG services/reports
  - Increasing requests for queries/cross-checks using the TAG DB
- AMI database
  - Master server: issues at Lyon in late July → full recovery, no data loss (early August)
- DBA help with issues: DQ2, Panda, DDM, AKTR, AGIS …
  - Indexing
  - Query optimization
  - Development improvements
- AGIS schema
  - Running in production mode on the integration (INTR) server → needs to move to production ASAP
- Oracle 11g testing

Oracle 11g Validation
- All production DBs will upgrade to Oracle 11g
  - Scheduled: very early January 2012
- Testing reduces risks!
  - Participation of developers is essential
  - DBAs and resources are ready to help (platforms available since late May)
- DBAs initiated the validation campaign in August
  - As announced in Roman's talk (S&C Week, July)
- ATLARC may upgrade to 11g in October 2011
  - Take early advantage: features, performance improvements
- The latest status was summarized yesterday in Gancho's talk at the ADC Development meeting:
  https://twiki.cern.ch/twiki/bin/viewauth/Atlas/DBOpsValidation11g

ATLR Status
- August: no holiday …
- DB usage is "evolving" (growing) …
  - Developers are finding increased utility for Conditions data
  - We have powerful tools to access this data
  - People are using it in new ways, a great thing!
- Release 17: increased DB access
  - Studying logs to quantify the differences
- Tier-0: increased capacity … other bottlenecks loosened (file staging) …
  - Database access is now limiting Tier-0 job throughput
  - The recent Technical Stop was used to test Frontier usage by Tier-0 (coordinated with Frontier experts)
    - No problems using CERN Frontier; improved DB access time
    - BUT: some jobs had more DB retrievals for MUONALIGN (see Hans' talk at the ADC Development meeting yesterday)
- Trigger Reprocessing
  - Early August: a bug (improper disconnects) caused problems: fixed
  - Currently: Trigger experts are speeding up the validation cycle
    - Use OFFSITE resources (Tier-1s): timescale ASAP
    - Development effort to later (also) use Frontier: test "in the next month"
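The Tier-0 point above, that loosening one bottleneck (file staging) exposes the next one (database access), can be illustrated with a minimal Python sketch. It assumes a simplified model in which job throughput is capped by the slowest stage; the stage names and jobs/hour capacities are hypothetical, not Tier-0 measurements.

```python
def bottleneck(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return (stage, rate) for the slowest stage. Under the simplified
    model assumed here, the pipeline cannot complete jobs faster than this."""
    stage, rate = min(stage_rates.items(), key=lambda item: item[1])
    return stage, rate


# Hypothetical jobs/hour capacities, for illustration only.
before = {"cpu": 1000.0, "file_staging": 600.0, "db_access": 800.0}
# Loosen the staging bottleneck; everything else unchanged.
after = {**before, "file_staging": 1200.0}

print(bottleneck(before))  # ('file_staging', 600.0)
print(bottleneck(after))   # ('db_access', 800.0)
```

Once staging is no longer the limit, the next-slowest stage (here, database access) sets the ceiling, which is why faster DB access (for example via Frontier) becomes the lever for Tier-0 throughput.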

Oracle Streams
- Recent request to run Trigger Reprocessing at BNL
  - Need to export ATLAS_CONF_TRIGGER_REPR to BNL
  - Decided to add it to Oracle Streams
    - By default, it will go to all Tier-1s
    - Added benefit … available if/when these jobs use Frontier
- Steps: adding this schema to Oracle Streams
  - Must ensure stability of all schemas under replication:
    https://twiki.cern.ch/twiki/bin/view/Atlas/DatabaseSchemasUnderReplication
  - This schema: 200 MB (not a volume issue)
  - Owner account locking
  - Trigger expert (Joerg) working with DBAs: small schema changes required to meet the requirements
- If all goes according to plan, an intervention this week will add this schema to the replication to all Tier-1s
  - Wednesday 10:00-12:30
  - Requires replication to be stopped during the intervention

Incidents: User Access to Conditions
- 2 Frontier crashes at the CERN Frontier site in 1 week
  - Follow-up: users working independently on different projects
    - Developer: looking into SCT noise
    - Developer: adding info to Lumi Data Summary Metadata Reports
  - Why did Frontier crash? Under investigation (memory issue?)
- Frontier "load" last week: "intense queries" from L1 Calo studies
  - Query time is usually <2 s; these took 20-30 s
  - Follow-up with the developer:
    - The query is a reasonable request
    - Executed in reasonable time given the nature of the request
    - Look for ways to improve the queries
  - → Raise the number of Frontier DB connections from 10 to 20
- Additional notes:
  - → These incidents are the reasoning behind a dedicated Frontier launchpad for Tier-0
  - The incidents are NOT a problem on the Oracle side, just for Frontier
  - Tracking down these issues reflects a lot of improvements in Frontier monitoring and in understanding Frontier logging
    - An ongoing effort
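Why a few 20-30 s queries matter, and what raising the connection limit from 10 to 20 buys, can be estimated with a back-of-the-envelope bound: if every query holds one DB connection for its full duration, a pool of N connections completes at most N / t queries per second. The Python sketch below assumes exactly that simplified model (no idle time, no queueing overhead) and takes 25 s as a representative midpoint of the 20-30 s range; it is not a model of Frontier's actual internals.

```python
def max_throughput(connections: int, query_seconds: float) -> float:
    """Upper bound on completed queries per second for a fixed-size
    connection pool, assuming each query holds one connection for its
    whole duration and no connection is ever idle (a simplification)."""
    if connections <= 0 or query_seconds <= 0:
        raise ValueError("connections and query_seconds must be positive")
    return connections / query_seconds


# 2 s ~ usual query time; 25 s ~ midpoint of the "intense" 20-30 s queries.
for conns in (10, 20):
    for secs in (2.0, 25.0):
        rate = max_throughput(conns, secs)
        print(f"{conns} connections, {secs:g} s queries: {rate:.1f} queries/s")
```

Under these assumptions, 10 connections support at most 5 queries/s for 2 s queries but only 0.4 queries/s when each query takes 25 s; doubling to 20 connections doubles both ceilings (10 and 0.8 queries/s). A handful of long queries can therefore tie up a small pool quickly.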

Tier-1s / Frontier Status
- Oracle+Frontier servers: RAL, Lyon, KIT, BNL, TRIUMF and CERN
- Frontier meetings: Aug 11, Aug 25, Sep 9
  https://www.racf.bnl.gov/docs/services/frontier/meetings/minutes
  - Meetings skipped in weeks with Tier-1 Service Coordination meetings
- Current failover strategy:
  - Some Frontier launchpads are still not open (open access is recommended)
  - Frontier fail-over only to sites with an open access configuration and a resilient server deployment
- Need an updated Frontier: https://savannah.cern.ch/bugs/index.php?86408
  - Needed for failover to work
  - Was thought NOT to be urgent … changed our minds when specific sites had issues / hurricanes … raised the urgency
  - To be included in LCG 60(d)
- Improving Frontier monitoring and follow-up on frequent/intense queries
  - Still work and investigations to be done; this takes time
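The failover rule on this slide (fail over only to sites with an open access configuration and a resilient server deployment) amounts to a simple filter. A minimal Python sketch, assuming each site can be described by two boolean flags; the `FrontierSite` type, its flags, and the placeholder site names are hypothetical and say nothing about the real configuration of any Tier-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontierSite:
    """Hypothetical description of a Frontier site, for illustration only."""
    name: str
    open_access: bool   # launchpad accepts clients from outside the site
    resilient: bool     # server deployment considered resilient


def failover_candidates(sites: list[FrontierSite], home: str) -> list[str]:
    """Names of sites a client at `home` may fail over to: every other site
    that is both open-access and resiliently deployed, in input order."""
    return [s.name for s in sites
            if s.name != home and s.open_access and s.resilient]


# Placeholder sites, not real Tier-1 settings.
sites = [
    FrontierSite("site-A", open_access=True, resilient=True),
    FrontierSite("site-B", open_access=False, resilient=True),
    FrontierSite("site-C", open_access=True, resilient=False),
    FrontierSite("site-D", open_access=True, resilient=True),
]
print(failover_candidates(sites, home="site-A"))  # ['site-D']
```

The sketch makes the practical consequence visible: every launchpad that stays closed shrinks the list of usable failover targets for everyone else.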

ADCR Status
- ADCR database
  - Early August: alerts of storage and Oracle ASM problems
  - Made a controlled switch to standby hardware
  - Added to standby for robustness and capacity: 2 storage arrays, a 3rd node
- Current status: SR (service request) open with Oracle on the primary hardware, in progress
- → From Gancho: ADCR on standby hardware … performing better …
  - Doubling of the buffer pool cache (now 13 GB), thus fewer IOPS …
  - Adding 2 storage arrays: ADCR has 72 disks (instead of 4 arrays = 48 disks)
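A quick arithmetic check of the ADCR storage and memory figures, written as a short Python sketch. The only inputs are the slide's numbers; the 12-disks-per-array figure follows from "4 arrays = 48 disks", and the previous buffer cache size of about 6.5 GB is inferred from "doubling" to 13 GB rather than stated on the slide.

```python
# Storage: the slide gives 4 arrays = 48 disks before, and 2 arrays added.
disks_per_array = 48 // 4                      # 12 disks per array
arrays_after = 4 + 2                           # two storage arrays added
disks_after = arrays_after * disks_per_array   # 72, matching the slide
spindle_ratio = disks_after / 48               # 1.5x as many disks to spread I/O

# Memory: "doubling" to 13 GB implies roughly 6.5 GB before (inferred).
buffer_cache_gb_now = 13
buffer_cache_gb_before = buffer_cache_gb_now / 2

print(disks_per_array, disks_after, spindle_ratio, buffer_cache_gb_before)
```

Both changes push in the same direction: a larger buffer cache serves more reads from memory (fewer IOPS reach disk), and 1.5 times as many disks share the I/O that remains.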