Status: Central Storage Services
CD/LSC/CSI/CSG
June 26, 2007
Storage Services
1. File-based storage: NFS/CIFS (BlueArc) – fast on-site access; AFS – global access, authenticated filesystem
2. Block-based storage: Fibre Channel connection to the SAN
3. Archival storage: backups
NAS Status
Newest service. Two production clusters:
1. Fermi-Blue (1st-generation cluster)
2. RHEA (2nd-generation cluster)
NAS Status
- 3/06 – NAS heads ordered (Fermi-Blue)
- 5/06 – Pilot deployment: SLF, DSG, KITS, PPD and FESS department servers
- Year 1 projection: 10 TB deployed storage
NAS Status – Projected Rollout
[Timeline chart] Phase 1 (Year 1): department servers, array consolidation; Phase 2 (Year 2): rollout to Farms servers; Phase 3 (Year 3): rollout to Farms workers
NAS Status – Actual Rollout
[Timeline chart] Phase 1: department servers, array consolidation; Phase 2: rollout to Farms servers; Phase 3: rollout to Farms workers (actual timing detailed on the following slides)
NAS Status – Actual Year 1 Deployment
Q2 2006: Pilot program
- Early adopters
- Timing: CMS “home” area evaluation; Fermigrid NFS issues
NAS Status – Actual Year 1 Deployment (cont.)
Q3 2006: Production
- Phase 1 in full production
- CMS + Fermigrid go to production (Phase 2)
- Additional NAS heads purchased (RHEA)
- Year 1 projection revised to 200 TB of deployed storage
NAS Status – Actual Year 1 Deployment (cont.)
Q4 2006:
- CMS and Fermigrid deploy to worker nodes (Phase 3)
Q1/Q2 2007:
- D0/CDF/Miniboone begin consolidation of servers into the central NAS service
- Requests for space from LHC, ILC and SDSS
NAS Status – NAS Storage Growth, Year 1
[Chart: NAS storage growth over Year 1]
NAS Status
Q3 2007 storage deployment: 425–905 TB
NAS Status – Current Customers
- Experiments: CMS, CDF, D0, FermiGrid/OSG, Miniboone, ILC, LHC, SDSS, Sciboone(?)
- Departments: CD, Directorate, FESS, ES&H, PPD, VMS
- Services: Scientific Linux (FERMI), CVS, KITS, Alphaflow, Enstore
NAS Status – Benefits
- Stability – a savings multiplier: effort redirected toward supporting applications; reduced downtime; increased productivity
- Consolidation (30+ servers/storage arrays): reduced equipment support costs; reduced power and cooling
NAS Status – Benefits (cont.)
- Ease of use: familiar storage solution, minimal training
- Flexible: choice of storage tiers and price points
NAS Status – Challenges
- Growth higher than expected – LUN limit: each cluster is limited to 256 LUNs and each LUN to 2 TB; an upgrade to 64 TB LUN support is expected by end of 2008 (see the capacity sketch below)
- Criticality of service: central location; offsite DR required?
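A back-of-the-envelope sketch of the ceiling those limits impose (the 256-LUN, 2 TB, and 64 TB figures are from the slide; the totals are simple arithmetic added here for illustration):

```python
# Per-cluster capacity ceiling implied by the stated LUN limits.
luns_per_cluster = 256           # current limit per cluster (from the slide)
tb_per_lun_now = 2               # current per-LUN limit (from the slide)
tb_per_lun_upgraded = 64         # limit expected after the EOY 2008 upgrade (from the slide)

ceiling_now = luns_per_cluster * tb_per_lun_now            # 512 TB per cluster today
ceiling_upgraded = luns_per_cluster * tb_per_lun_upgraded  # 16,384 TB (~16 PB) after the upgrade

print(f"current ceiling:  {ceiling_now} TB per cluster")
print(f"upgraded ceiling: {ceiling_upgraded} TB per cluster")
```

Against the 425–905 TB deployment projected for Q3 2007, a 512 TB per-cluster ceiling is clearly the binding constraint, which is why the 64 TB LUN upgrade matters.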
NAS Status – Challenges (cont.)
- Backup of large data areas is an issue: areas > 5 TB, millions of files
- Logistics: power, floor space
NAS Status – FY08 Plans
- Expansion of the service
- Participate in Tier 3 evaluation
- Development of better reporting tools
NAS Status
More info: http://computing.fnal.gov/nasan/bluearc.html
Questions?
SAN Status
- 272 Fibre Channel ports; 128 ports added to the fabric in ’07 (CMS contribution)
- Qlogic switches
- 2 Gb Fibre Channel connections (rough bandwidth figure below)
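As a rough illustration of what those numbers imply (not on the original slide; the ~200 MB/s payload per 2 Gb FC port is an assumption added here):

```python
# Rough aggregate fabric capacity implied by the port count and link speed.
# Assumption (not from the slide): a 2 Gb FC port carries roughly 200 MB/s of payload.
ports = 272
raw_gbit_per_sec = ports * 2                # 544 Gb/s of raw link capacity
payload_gb_per_sec = ports * 200 / 1000     # ~54 GB/s of theoretical aggregate payload
print(raw_gbit_per_sec, round(payload_gb_per_sec, 1))
```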
SAN Status
23 storage arrays:
- 12 centrally managed
  - Database array (3PAR) purchased and tested; D0ora2 deployment 7/2/2007
  - Start retiring 1st-generation Tier 2 storage arrays (Infortrend)
- 11 externally managed
SAN Status
346 TB total: 156 TB centrally managed, 190 TB externally managed
SAN Status
- SAN fabric opened up to external members: CMS, CDF, D0, Miniboone
- Must retire the LSI storage array: end of support (year-end 2007); impacts IMAP/POP, AFS, DSG (CDF)
SAN Status – FY07 Plans
- Additional HDS array: NAS storage for SDSS, Windows migration, DSG; block storage for the LSI migration
SAN Status – FY07 Plans (cont.)
- Purchase 2 Nexsan SATAbeasts: replace 4 Infortrend arrays; backup cache disk, DSG RMAN disks; test as possible Tier 3 candidates
SAN Status – FY08 Plans
- Additional capacity for 3PAR: for sparing, DSG migration
- Additional capacity for NAS
- Decommission remaining Infortrend arrays
- Other Tier 3 alternatives (next-generation HDS, DDN)
- Virtualization across arrays
SAN Status
Questions?
Site Backup Status
- Service entering its 4th year in 10/07
- 2 backup servers: Chasm (infrastructure and business), Canyon (experiment)
- 1 library (600 slots) – rough capacity sketch below
- 8 SAIT-1 tape drives
- 2 Infortrend storage arrays
- TiBS backup software
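For scale, a rough sketch of the library's raw capacity (the 600-slot figure is from the slide; the ~500 GB native capacity per SAIT-1 cartridge is an assumption added here):

```python
# Rough native capacity of the 600-slot library, assuming ~500 GB per SAIT-1 cartridge.
slots = 600
tb_per_cartridge = 0.5                          # assumed SAIT-1 native capacity
native_capacity_tb = slots * tb_per_cartridge   # ~300 TB before compression
print(f"~{native_capacity_tb:.0f} TB native library capacity")
```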
Site Backup Status
- 22+ TB of data
- 12,700+ backup volumes: 5,506 UNIX/Windows, 7,171 AFS, 25 NDMP
- 452+ clients
- 18.5% increase in the past 6 months (3.7 TB) – growth sketch below
- No single volume > 100 GB
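For scale, a minimal sketch of what that rate implies if it holds (the 18.5% and 22 TB figures are from the slide; the annualized projection is an extrapolation added here, not a plan of record):

```python
# Annualize the reported six-month growth rate and project one year ahead.
six_month_growth = 0.185      # 18.5% growth over the past 6 months (from the slide)
current_tb = 22.0             # ~22 TB currently under backup (from the slide)

annual_growth = (1 + six_month_growth) ** 2 - 1     # ~0.40, i.e. ~40% per year
one_year_out_tb = current_tb * (1 + annual_growth)  # ~31 TB a year from now

print(f"annualized growth: {annual_growth:.1%}")
print(f"projected size in 12 months: {one_year_out_tb:.1f} TB")
```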
Site Backup Status – Typical Daily Backup Timeline (canyon)
[Timeline chart, 24-hour window] Incremental/network backups start 6:00 PM; merges 2:00 AM; retries 1:00 PM; debug 1:40 PM
Site Backup Status – Issues
- Resolving client backup issues:
  - High client volatility (reconfiguration/renaming/reinstalls)
  - Large deltas in data
  - Contacting admins
  - Slow client network performance
Site Backup Status – Issues (cont.)
- Merge problems: can be difficult to debug; tape drive, software, or a combination
- Cache disk: multiple disk failures
Site Backup Status – Issues (cont.)
- SAIT-1 drive performance issues:
  - Tapes written on one drive are slow to read on another
  - Long debug time (> 1 hour)
  - Usually requires multiple replacements
  - Sony and Spectra investigating
  - Too few
Site Backup Status – FY07 Plans
[Diagram: Chasm and Canyon backup servers, IP network, disk cache, SAIT-1 drives, SAN, LTO-4, NDMP]
- Migrate more backups to NDMP: relieve pressure on chasm
- Migrate clients from canyon to chasm: relieve pressure on canyon
Site Backup Status – FY07 Plans (cont.)
- Upgrade cache disks: replace aging Infortrend disks with a higher-performing array; RAID 6
Site Backup Status – Challenges
- Desire from users to expand backups: larger backup volumes, larger backup sets
Site Backup Status – FY08 Plans
- Upgrade servers to Solaris 10: faster IP stack and filesystem
- Upgrade server hardware: faster bus speed; utilize faster cache disk; take advantage of the faster filesystem; feed faster tape drives
- Migrate canyon backups to LTO-4
Site Backup Status – FY08 Plans (cont.)
- Investigate a disk-based library:
  - TiBS-specific implementation
  - Use common disks as a disk library
  - Synchronous copy to tape (also)
  - Faster restores, possibly faster backups
  - May increase overall backup system throughput
Site Backup Status – FY08 Plans (cont.)
- Investigate a virtual tape library (VTL):
  - Agnostic solution (not TiBS-specific)
  - Asynchronous copy to tape
  - Emulates tape drives and libraries
  - Faster restores and backups
  - Will increase overall backup system throughput
  - Some systems offer data deduplication, inline or post-process (see the sketch below)
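For background, a minimal sketch of what block-level data deduplication does (illustrative Python only, not the implementation of any VTL product under consideration; fixed-size chunking is an assumption, many products use variable-size chunking):

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed fixed 64 KiB chunks; real systems often chunk variably

def dedupe(stream, chunk_store):
    """Store each unique chunk once; return the 'recipe' of chunk IDs
    needed to reconstruct the stream later."""
    recipe = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        chunk_id = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(chunk_id, chunk)   # duplicate chunks cost no extra space
        recipe.append(chunk_id)
    return recipe

def restore(recipe, chunk_store):
    """Rebuild the original byte stream from its recipe."""
    return b"".join(chunk_store[chunk_id] for chunk_id in recipe)
```

In an inline system this hashing happens as data arrives; a post-process system lands the data first and deduplicates later, trading ingest speed against temporary extra disk.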
Site Backup Status
More information: http://computing.fnal.gov/site-backups
Questions?
AFS Status
- 12 AFS servers, ~17 TB of storage
- Largest customers: Minos and Web
- Roughly 8–10% increase per year (based on the number of volumes)
- Must migrate servers off of the LSI storage array and onto HDS Tier 2 storage
AFS Status – FY07 Plans
- Migrate data to HDS Tier 2 disks: migration partially complete (1.8 TB installed); Tier 2 storage re-allocated to NAS due to high demand
- Test a Solaris 10 AFS server with ZFS
AFS Status – FY08 Plans
- Upgrade servers to Solaris 10: faster OS (filesystem and IP stack); newer, lower-power CPUs; dual power supplies
- Upgrade OpenAFS: multi-domain support; support for > 2 GB files; promote RO copies to RW copies
AFS Status
More information: http://computing.fnal.gov/nasan/afs.html
Questions?