Published by Suzanna Marcia Fitzgerald. Modified over 9 years ago.
RCF Status

Extended outage of the Mass Storage System (HPSS) last Wednesday
– The latest transaction logs of the namespace DB were erroneously deleted in the production system by an administrator
– The last backup was taken ~1.5 hours before the mishap
– Investigated two possible routes to fix the problem
  1. Fix the database without restoring it from backup
  2. Restore the DB as far as possible from backup
  Option 1 would have been too time consuming (would have had to get IBM involved, wait for their response and check DB consistency)
  Option 2 was chosen because it restores the service much faster, in anticipation of continuing data taking (already resumed at that time). Extracted information from the HPSS cache on which files needed to be re-transferred
In summary
  Length of service outage: ~13 hours
  Data loss
    PHENIX: no files were lost (files are kept in the buffer at CH until safe on tape)
    STAR: 118 files (files are deleted from the buffer once stored in the HPSS cache)
– Fortunately all this happened during an APEX day …
– Have implemented measures to avoid this mistake happening again
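The difference between the two buffer policies can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the slides: every file has a time at which the namespace DB recorded it, and a flag saying whether a copy is still in the experiment's buffer. The names `FileRecord` and `classify_after_restore` are hypothetical, and the values in the demo are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    name: str
    registered_at: datetime  # when the namespace DB recorded the file (assumed field)
    still_in_buffer: bool    # whether the experiment buffer still holds a copy


def classify_after_restore(records, backup_time):
    """Split files recorded after the backup into re-transferable and lost.

    Files recorded at or before backup_time are covered by the restored
    namespace DB and are skipped.
    """
    retransfer, lost = [], []
    for rec in records:
        if rec.registered_at <= backup_time:
            continue  # known to the restored DB, nothing to do
        if rec.still_in_buffer:
            retransfer.append(rec.name)  # can be sent again from the buffer
        else:
            lost.append(rec.name)        # buffer copy already deleted
    return retransfer, lost


if __name__ == "__main__":
    # Hypothetical timestamps and file names, for illustration only.
    backup = datetime(2000, 1, 1, 12, 0)
    records = [
        FileRecord("before_backup.dat", backup - timedelta(minutes=30), False),
        FileRecord("kept_in_buffer.dat", backup + timedelta(minutes=20), True),
        FileRecord("deleted_early.dat", backup + timedelta(minutes=40), False),
    ]
    print(classify_after_restore(records, backup))
```

Under this model, a policy that keeps buffer copies until files are safe on tape leaves every post-backup file re-transferable, while a policy that deletes buffer copies once files reach the HPSS cache can leave some files unrecoverable.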
Raw Data Volume collected & archived since 11/26/07
[Figure: raw data volume from 11/26 to 01/14. Labels, in source order: 470 TB, 125 TB; series: PHENIX Raw Data, STAR Raw Data]
RAW Data Collected in RHIC Runs
[Figure labels: Run8 (d-AU only), Run3 (d-AU only)]
Results from Data Taking
[Figure: data-rate plots for STAR and PHENIX. Labels, in source order: 3 Gigabits/second; PHENIX / RCF Network Link (10 Gbps max.); Data Migration to Tape; 1500 Megabits/second; STAR / RCF Network Links (2 * 1 Gbps max.); 1500 GB/hour; 6,000 GB; HPSS Outage]
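To compare the quoted rates on a common scale, a small conversion sketch may help. It assumes decimal units (1 GB = 8 gigabits) and assumes that the 3 Gigabits/second label belongs to the PHENIX link and the 1500 Megabits/second label to the STAR links; that pairing is read off the label order and is not confirmed by the slide. The function names are hypothetical.

```python
def gb_per_hour_to_gbps(gb_per_hour):
    """Convert decimal gigabytes per hour to gigabits per second."""
    return gb_per_hour * 8 / 3600


def utilization(rate_gbps, capacity_gbps):
    """Return the fraction of link capacity used by the given rate."""
    return rate_gbps / capacity_gbps


if __name__ == "__main__":
    # Tape migration rate quoted on the slide, expressed in Gbit/s.
    print(round(gb_per_hour_to_gbps(1500), 2))
    # Assumed pairing: 3 Gbps on the 10 Gbps PHENIX link.
    print(utilization(3.0, 10.0))
    # Assumed pairing: 1500 Mbps on the 2 * 1 Gbps STAR links.
    print(utilization(1.5, 2 * 1.0))
```

With these assumptions, 1500 GB/hour corresponds to roughly 3.3 Gbit/s, the PHENIX link would be running at about 30% of its maximum, and the STAR links at about 75%.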