RCF Status Extended outage of the Mass Storage System (HPSS) last Wednesday –Latest transaction logs of namespace DB were erroneously deleted in the production.

RCF Status

Extended outage of the Mass Storage System (HPSS) last Wednesday
– Latest transaction logs of the namespace DB were erroneously deleted in the production system by an administrator
– Last backup was taken ~1 h 30 min before the mishap
– Investigated two possible routes to fix the problem
  1. Fix the database w/o restoring it from backup
  2. Restore the DB as far as possible from backup
  → Option 1 would have been too time consuming (would have had to get IBM involved, wait for their response and check DB consistency)
  → Option 2 was chosen because it restores the service much faster, in anticipation of continuing data taking (which had already resumed at that time). Extracted information from the HPSS cache on which files needed to be re-transferred
  → In summary
    • Length of service outage: ~13 hours
    • Data loss
      PHENIX: no files were lost (files are kept in the buffer at the CH until safe on tape)
      STAR: 118 files (files are deleted from the buffer once stored in the HPSS cache)
– Fortunately all this happened during an APEX day …
– Have implemented measures to prevent this mistake from happening again
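The difference between the two buffer policies can be illustrated with a minimal sketch. It assumes, as a simplification, that every file written into HPSS after the last namespace DB backup loses its namespace entry when the DB is restored, so such a file survives only if the experiment's buffer still holds a copy. The `FileRecord` type, its fields, the file names and the times are all hypothetical and are not taken from the RCF tooling.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    stored_at: float        # hours; when the file was written into HPSS (hypothetical field)
    buffer_copy_kept: bool  # True if the experiment's buffer still holds the file


def classify(records, last_backup_at):
    """Split files written after the last DB backup into (retransfer, lost)."""
    retransfer, lost = [], []
    for rec in records:
        if rec.stored_at <= last_backup_at:
            continue  # assumed to be covered by the restored backup
        if rec.buffer_copy_kept:
            retransfer.append(rec.name)  # can be copied into HPSS again
        else:
            lost.append(rec.name)        # no remaining copy outside HPSS
    return retransfer, lost


# Made-up example: backup at hour 10.0, three files written around it.
records = [
    FileRecord("run_a.dat", stored_at=9.0, buffer_copy_kept=False),
    FileRecord("run_b.dat", stored_at=10.8, buffer_copy_kept=True),
    FileRecord("run_c.dat", stored_at=11.2, buffer_copy_kept=False),
]
retransfer, lost = classify(records, last_backup_at=10.0)
print("re-transfer:", retransfer)
print("lost:", lost)
```

Under this model, a policy that keeps the buffer copy until the file is safe on tape only ever produces re-transfers, while a policy that deletes the buffer copy once the file reaches the HPSS cache can produce losses.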

Raw Data Volume collected & archived since 11/26/07
[Plot; time axis from 11/26 to 01/14; series: PHENIX Raw Data, STAR Raw Data; value labels: 470 TB, 125 TB]

RAW Data Collected in RHIC Runs
[Plot; annotations: Run8 (d-Au only), Run3 (d-Au only)]

Results from Data Taking
[Plots; recoverable labels: STAR; PHENIX; PHENIX / RCF Network Link (10 Gbps max.); STAR / RCF Network Links (2 * 1 Gbps max.); Data Migration to Tape; HPSS Outage. Scale values: 3 Gigabits/second; 1500 Megabits/second; 1500 GB/hour; 6,000 GB]
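Because this slide mixes Gigabits/second, Megabits/second and GB/hour, a small conversion helper makes the rates comparable. This is only a sketch: it assumes decimal units (1 GB = 8000 megabits) and does not assert which plotted quantity each figure belongs to. The function names are illustrative.

```python
def gb_per_hour_to_mbps(gb_per_hour):
    """Convert GB/hour to megabits/second, assuming 1 GB = 8000 megabits."""
    return gb_per_hour * 8000 / 3600


def mbps_to_gb_per_hour(mbps):
    """Convert megabits/second to GB/hour, assuming 1 GB = 8000 megabits."""
    return mbps * 3600 / 8000


print(round(gb_per_hour_to_mbps(1500), 1))  # 1500 GB/hour in Mbps
print(mbps_to_gb_per_hour(3000))            # 3 Gbps in GB/hour
print(mbps_to_gb_per_hour(1500))            # 1500 Mbps in GB/hour
```

With these assumptions, 1500 GB/hour corresponds to roughly 3333 Mbps, 3 Gigabits/second to 1350 GB/hour, and 1500 Megabits/second to 675 GB/hour.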
