Elizabeth Gallas - Oxford ADC Weekly September 13, 2011

Presentation on theme: "Elizabeth Gallas - Oxford ADC Weekly September 13, 2011"— Presentation transcript:

1 Database Operations
Elizabeth Gallas - Oxford ADC Weekly, September 13, 2011

2 Overview
- Brief notes
- Oracle 11g validation
- ATLR
- Replication
- User incidents (since S&C Week)
- Frontier
- ADCR
Sep 2011 Elizabeth Gallas - Databases

3 Brief Notes
- LFC migration
  - See Graeme’s talks …
- ATLARC / TAG Services
  - Popular: Event Picking & other TAG Services/Reports
  - Increasing requests for queries/cross checks using TAG DB
- AMI Database
  - Master Server: issues at Lyon late in July → full recovery, no data loss (early August)
- DBA issue help: DQ2, Panda, DDM, AKTR, AGIS …
  - Indexing
  - Query optimization
  - Development improvements
- AGIS Schema
  - Running in production mode on integration (INTR) server → Needs to move to production ASAP
- Oracle 11g testing

4 Oracle 11g Validation
- All production DBs will upgrade to Oracle 11g
  - Scheduled: very early January 2012
- Testing reduces risks!
  - Participation of developers – essential
  - DBAs & resources ready to help (platforms available since late May)
- DBAs initiated validation campaign in August
  - As announced in Roman’s talk (S&C Week – July)
- ATLARC may upgrade to 11g in October 2011
  - Take early advantage: features, performance improvements
- Latest was summarized yesterday in Gancho’s talk at the ADC Development meeting

5 ATLR Status
- … August: no holiday … DB usage is “evolving” (growing) …
- Developers finding increased utility for Conditions data
  - We have powerful tools to access this data
  - People using it in new ways, a great thing!
- Release 17: increased DB access
  - Studying logs to quantify differences
- Tier-0: increased capacity … other bottlenecks loosened (file staging) …
  - Database access now limiting Tier-0 job throughput
  - → Recent Technical Stop used for testing Frontier usage by Tier-0 (coordinated with Frontier experts)
  - No problems using CERN Frontier; improved DB access time
  - BUT: some jobs had more DB retrievals for MUONALIGN (see Hans’ talk in ADC Development meeting yesterday)
- Trigger Reprocessing:
  - Early August: bug (improper disconnects) problems: fixed
  - Currently: Trigger experts speeding up validation cycle
  - Use OFFSITE resources (Tier-1s): timescale: ASAP
  - Development effort to later (also) use Frontier: test “in the next month”

6 Oracle Streams
- Recent request to run Trigger Reprocessing at BNL
  - Need to export ATLAS_CONF_TRIGGER_REPR to BNL
  - Decided to add to Oracle Streams
  - By default, it will go to all Tier-1s
  - Added benefit … available if/when these jobs use Frontier
- Steps: adding this Schema to Oracle Streams
  - Must ensure stability of all schemas under replication
  - This Schema: 200 MB (not a volume issue)
  - Owner account locking
- Trigger expert (Joerg) working with DBAs:
  - Small schema changes required to meet requirements
- If all goes according to plan, intervention this week to add this Schema to the replication to all Tier-1s
  - Wednesday 10:00 – 12:30
  - Requires replication to be stopped during intervention

7 Incidents: User Access to Conditions
- 2 Frontier crashes at CERN Frontier site in 1 week
  - Follow up: Users working independently on different projects
    - Developer: looking into SCT noise
    - Developer: adding info to Lumi Data Summary Metadata Reports
- Why did Frontier crash? Under investigation (memory issue?)
- Frontier “load” last week: “intense queries” from L1 Calo studies
  - Query time usually <2 sec, these were seconds
  - Follow up with developer
    - Query is a reasonable request
    - Executed in reasonable time given nature of request
    - Look for ways to improve queries
  - → Raise number of Frontier DB connections from 10 to 20
- Additional Notes:
  - → Incidents: reasoning behind dedicated Frontier launchpad for Tier-0
  - Incidents NOT a problem on Oracle side, just for Frontier
  - Tracking down these issues reflects a lot of improvements in Frontier monitoring and understanding of Frontier logging
  - An ongoing effort
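As a rough illustration of the follow-up on "intense queries", the sketch below flags queries that run longer than the usual level of about 2 seconds quoted on the slide. The record layout, the function name `flag_intense_queries`, and the sample values are hypothetical; they are not taken from Frontier logs or from any Frontier monitoring tool.

```python
from dataclasses import dataclass

# Assumption: the slide says query time is "usually <2 sec",
# so 2.0 seconds is used here as the flagging threshold.
USUAL_MAX_SECONDS = 2.0


@dataclass
class QueryRecord:
    client: str      # hypothetical label for whoever issued the query
    seconds: float   # measured query duration


def flag_intense_queries(records, threshold=USUAL_MAX_SECONDS):
    """Return records strictly slower than the threshold, slowest first."""
    slow = [r for r in records if r.seconds > threshold]
    return sorted(slow, key=lambda r: r.seconds, reverse=True)


if __name__ == "__main__":
    # Made-up sample durations, for illustration only.
    sample = [
        QueryRecord("client_a", 0.4),
        QueryRecord("client_b", 35.0),
        QueryRecord("client_c", 1.9),
    ]
    for rec in flag_intense_queries(sample):
        print(rec.client, rec.seconds)
```

A list like this only identifies candidates for follow-up with the developer; as noted on the slide, a slow query can still be a reasonable request.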

8 Tier-1s / Frontier Status
- Oracle+Frontier servers: RAL, Lyon, KIT, BNL, TRIUMF and CERN
- Frontier Meetings: Aug 11, Aug 25, Sep 9
  - Skipping weeks with Tier-1 Service Coordination meetings
- Current failover strategy:
  - Some Frontier launchpads still not open (as recommended)
  - Frontier fail-over only to sites with open access configuration and resilient server deployment
- Need updated Frontier
  - Needed for failover to work
  - WAS thought NOT to be urgent … changed our minds … when specific sites had issues / hurricanes … raise urgency
  - To be included in LCG 60(d)
- Improving Frontier Monitoring and follow up on frequent/intense queries
  - Still work and investigations to be done – takes time
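The failover rule stated above (fail over only to sites with an open access configuration and a resilient server deployment) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the `FrontierSite` type, the `failover_candidates` function, and the placeholder site names and flags are all hypothetical and do not describe the actual status of any Tier-1 or the real Frontier client configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontierSite:
    name: str           # placeholder site label, not a real site status
    open_access: bool   # launchpad open to outside clients
    resilient: bool     # resilient server deployment in place


def failover_candidates(sites, failed_site):
    """Names of sites eligible as failover targets for `failed_site`.

    A site qualifies only if it is not the failed site itself and
    meets both conditions from the slide: open access and resilience.
    """
    return [
        s.name
        for s in sites
        if s.name != failed_site and s.open_access and s.resilient
    ]


if __name__ == "__main__":
    # Hypothetical sites and flags, for illustration only.
    sites = [
        FrontierSite("SITE_A", open_access=True, resilient=True),
        FrontierSite("SITE_B", open_access=False, resilient=True),
        FrontierSite("SITE_C", open_access=True, resilient=False),
        FrontierSite("SITE_D", open_access=True, resilient=True),
    ]
    print(failover_candidates(sites, failed_site="SITE_A"))
```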

9 ADCR Status
- ADCR Database
  - Early August: alerts of storage and Oracle ASM problems.
  - Made controlled switch to standby hardware.
  - Added to standby for robustness, capacity:
    - 2 storage arrays
    - 3rd node
  - Current status: SR open to Oracle on primary hardware - in progress.
- → From Gancho: ADCR on standby hardware … performing better …
  - Doubling of buffer pool cache (now 13 GB), thus less IOPS …
  - Adding 2 storage arrays: ADCR has 72 disks (instead of 4 arrays = 48 disks)
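The disk counts quoted above are consistent if every storage array holds the same number of disks. The short check below works through that arithmetic; the equal-size assumption and the function name `total_disks_after_expansion` are illustrative and are not stated in the slides.

```python
def total_disks_after_expansion(original_arrays, original_disks, added_arrays):
    """Total disk count after adding arrays, assuming equal-sized arrays."""
    # Assumption: all arrays hold the same number of disks.
    disks_per_array = original_disks // original_arrays
    return (original_arrays + added_arrays) * disks_per_array


if __name__ == "__main__":
    # Figures from the slide: 4 arrays = 48 disks, 2 arrays added.
    print(total_disks_after_expansion(4, 48, 2))  # 72
```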

