Storage for Data Management and Physics Databases
Luca Canali, IT-DM
After-C5 Presentation, CERN, May 8th, 2009
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
Outline
An overview of the storage challenges for data management and physics databases
– Description of the main architectural details
– Production experience
– Our working model for storage sizing, architecture and rollout
Our activities in testing new storage solutions
– Results detailed for FC and iSCSI
Data management and storage
Service managers handle a 'deep technology stack'. Example, DBAs at CERN:
– More involved in the lower layers of the stack than 'traditional DBAs'
– Running the DB service (matching user requirements, helping application developers optimize their applications)
– Keeping the systems up, tuning SQL execution, backup, security, replication
– Database software installation, monitoring, patching
– DBAs are also involved in HW provisioning, setup, tuning and monitoring: servers, storage, network
Enterprise-class vs. commodity HW
RAC + ASM: grid-like, scale out
A real-world example, RAC7
Why storage is a very interesting area in the coming years
The storage market is very conservative
– A few vendors share the market for large enterprise solutions
– Enterprise storage typically carries a high premium
Opportunities
– Commodity HW and grid-like solutions provide an order-of-magnitude gain in cost/performance
– New products coming to the market promise many changes: solid state disks, high-capacity disks, high-performance and low-cost interconnects
Commodity HW for critical data handling services
High-end and low-end storage can easily be bought and used out of the box; low-end storage for critical services, however, requires customization
Recipe for production rollout:
– Understand the requirements
– Consult storage and HW procurement experts
– Decide on a suitable architecture
– Test and measure (and learn from production)
– Deploy the 'right' hardware and software to achieve the desired level of high availability and performance, and share the experience with the Tier1s and online
HW layer – HD, the basic element
Hard disk technology
– The basic building block of storage for 40 years
– Main intrinsic limitation: latency
HD specs
HDs are limited
– In particular, seek time is unavoidable (7.2k to 15k rpm, ~2-10 ms)
– 100-200 IOPS
– Throughput ~100 MB/s, typically limited by the interface
– Capacity range 300 GB - 2 TB
– Failures: mechanical, electrical, magnetic, firmware issues
– In our experience with ~2000 disks in production we see about 1 disk failure per week (see the quick check below)
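The weekly failure count can be turned into an annualized failure rate. A quick back-of-the-envelope check in Python, using only the fleet size and weekly failure count quoted on this slide (the implied per-disk MTBF is simply the number that falls out of them):

    # Rough annualized failure rate (AFR) implied by ~2000 disks and ~1 failure/week
    disks = 2000
    failures_per_week = 1

    failures_per_year = failures_per_week * 52
    afr = failures_per_year / disks                  # fraction of the fleet failing per year
    mtbf_hours = disks * 7 * 24 / failures_per_week  # implied per-disk MTBF

    print(f"AFR  ~ {afr:.1%} per disk per year")     # ~2.6%
    print(f"MTBF ~ {mtbf_hours:,.0f} hours")         # ~336,000 hours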
Enterprise disks
Performance
– Enterprise disks offer more 'performance'
– They spin faster and have better interconnect protocols (e.g. SAS vs SATA)
– Typically of low capacity
– Our experience: often not competitive in cost/performance vs. SATA
Reliability
– Evidence that low-end and high-end disks do not differ significantly
HD wrap-up
HD is an old but evergreen technology
– In particular, disk capacities have increased by an order of magnitude in just a few years
– At the same time prices have gone down (below 0.1 USD per GB for consumer products)
– 1.5 TB consumer disks and 450 GB enterprise disks are common
– 2.5'' drives are becoming standard to reduce power consumption
Scaling out the disk
The challenge for storage systems
– Scale out the disk performance to meet demand: throughput, IOPS, latency, capacity
Sizing storage systems
– Must focus on the critical metric(s)
– Avoid the 'capacity trap' (a sizing sketch follows below)
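A minimal sizing sketch of the 'capacity trap': if an array is sized on capacity alone, the IOPS requirement may be missed. The workload targets below are made-up placeholders; the per-disk figures are roughly in line with the HD specs slide:

    import math

    # Hypothetical workload requirements (placeholders, not taken from the slides)
    required_iops = 10_000      # small random read IOPS needed by the application
    required_tb   = 20          # usable (mirrored) capacity needed, in TB

    # Per-disk figures, roughly consistent with the HD specs slide
    iops_per_disk  = 150        # small random reads per disk
    usable_tb_disk = 0.5        # usable TB per disk after mirroring (1 TB raw, 2 copies)

    disks_for_iops     = math.ceil(required_iops / iops_per_disk)
    disks_for_capacity = math.ceil(required_tb / usable_tb_disk)

    print(f"disks needed for IOPS:     {disks_for_iops}")      # 67
    print(f"disks needed for capacity: {disks_for_capacity}")  # 40
    print(f"buy: {max(disks_for_iops, disks_for_capacity)} disks "
          f"(sizing on capacity alone would under-provision IOPS)")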
RAID and redundancy
Storage arrays with RAID are the traditional approach
– Implement RAID to protect data
– Parity based: RAID5, RAID6
– Stripe and mirror: RAID10
Scalability problem of RAID
– For very large configurations the time between two failures can become close to the RAID volume rebuild time (!); a rough estimate is sketched below
– That is also why RAID6 is becoming more popular than RAID5
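A rough estimate of the 'second failure during rebuild' risk, using the fleet failure rate quoted earlier (~2000 disks, ~1 failure per week) and an exponential failure model; the rebuild durations and group sizes below are assumptions, not measurements:

    import math

    # Per-disk MTBF implied by the production figure (~2000 disks, ~1 failure/week)
    MTBF_HOURS = 2000 * 168 / 1          # ~336,000 hours

    def p_second_failure(surviving_disks, rebuild_hours, mtbf=MTBF_HOURS):
        """Probability that at least one more disk in the same redundancy
        domain fails while the RAID volume is still rebuilding."""
        return 1 - math.exp(-surviving_disks * rebuild_hours / mtbf)

    # Small RAID5 group: 16 disks, one failed, assumed 24 h rebuild
    print(f"16-disk RAID5 group: {p_second_failure(15, 24):.2%}")      # ~0.1%

    # Very large configuration treated as one domain, assumed 48 h rebuild
    print(f"2000-disk system   : {p_second_failure(1999, 48):.2%}")    # ~25%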
Beyond RAID
Google and Amazon do not use RAID
Main idea:
– Divide the data into 'chunks'
– Write multiple copies of each chunk
– Examples: Google File System, Amazon S3
Additional advantages:
– Removes the constraint of storing the redundancy locally inside one storage array
– Facilitates moving, refreshing and relocating data chunks
– Allows the deployment of low-cost arrays
Our experience
Physics DB storage uses Oracle ASM
– Volume manager and cluster file system integrated with Oracle
– Soon to serve also as a general-purpose cluster file system (we are involved in the 11gR2 beta testing)
– Oracle files are divided into chunks
– Chunks are distributed evenly across the storage
– Chunks are written in multiple copies (2 or 3, depending on file type and configuration); a conceptual sketch of this placement follows below
– Allows the use of low-cost storage arrays: no RAID support is needed (JBOD is enough)
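A conceptual sketch of this placement model, not Oracle's actual algorithm: a file is cut into chunks, the chunks are spread round-robin across the disks, and each copy of a chunk lands in a different failure group (function and data names are purely illustrative):

    import itertools

    def place_chunks(n_chunks, failgroups, copies=2):
        """Toy ASM-style placement: spread chunks evenly over JBOD disks,
        keeping every copy of a chunk in a different failure group."""
        groups = list(failgroups)                               # e.g. ["fg1", "fg2"]
        rr = {g: itertools.cycle(failgroups[g]) for g in groups}
        placement = {}
        for chunk in range(n_chunks):
            chosen = [groups[(chunk + i) % len(groups)] for i in range(copies)]
            placement[chunk] = [(g, next(rr[g])) for g in chosen]
        return placement

    # Two failure groups of two disks each, 2 copies per chunk ("normal redundancy")
    fg = {"fg1": ["diskA", "diskB"], "fg2": ["diskC", "diskD"]}
    for chunk, locations in place_chunks(8, fg).items():
        print(chunk, locations)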
The interconnect
Several technologies are available, with different characteristics
– SAN
– NAS
– iSCSI
– Direct attach
The interconnect
The throughput challenge
– It takes about 1 day to copy or back up 10 TB over a 1 Gbps network (see the check below)
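The one-day figure is easy to verify; a quick check, assuming the full nominal link rate is usable with no protocol overhead:

    data_tb   = 10
    link_gbps = 1

    seconds = data_tb * 1e12 * 8 / (link_gbps * 1e9)
    print(f"{seconds / 3600:.1f} hours")   # ~22.2 hours, i.e. roughly one day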
Fibre Channel SAN
FC SAN is currently the most used architecture for enterprise-level storage
– Fast and with low overhead on the server CPU
Used for the physics DBs and at the Tier1s
SAN networks with up to 64 ports at low cost
– Measured: 8 Gbps transfer rate (4+4 Gbps dual-ported HBAs for redundancy and load balancing)
– Proof of concept: LAN-free FC backup reached full utilization of the tape heads
– Scalable: proof-of-concept 'Oracle supercluster' of 410 SATA disks and 14 dual quad-core servers
Case study: the largest cluster I have ever installed, RAC5
The test used 14 servers
Multipathed Fibre Channel
8 FC switches: 4 Gbps (10 Gbps uplink)
Many spindles
26 storage arrays (16 SATA disks each)
Case study: I/O metrics for the RAC5 cluster
Measured, sequential I/O
– Read: 6 GB/s
– Read-write: 3+3 GB/s
Measured, small random I/O
– Read: 40K IOPS (8 KB read operations)
Note:
– 410 SATA disks, 26 HBAs on the storage arrays
– Servers: 14 x (4+4 Gbps) HBAs, 112 cores, 224 GB of RAM
– Per-disk figures are worked out below
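As a sanity check, the aggregate numbers break down per disk and land comfortably within the per-disk figures from the HD specs slide:

    disks            = 410
    random_read_iops = 40_000   # measured, 8 KB reads
    seq_read_gb_s    = 6        # measured, GB/s

    print(f"{random_read_iops / disks:.0f} IOPS per disk")       # ~98, within the 100-200 range
    print(f"{seq_read_gb_s * 1000 / disks:.0f} MB/s per disk")   # ~15, well below ~100 MB/s per drive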
Testing storage
ORION
– Oracle provides a testing utility that has proven to give the same results as more complex SQL-based tests
– Sharing experience: it is not only a DBA tool, it can be used to test storage for other purposes
– Used for stress testing (in our experience it identified controller problems in RAC5 in 2008)
In the following, some example results
– Metrics measured for various disk types
– FC results
– iSCSI 1 Gbps and 10 GigE results
Metrics of interest
Basic I/O metrics measured by ORION
– IOPS for random I/O (8 KB)
– MBPS for sequential I/O (in chunks of 1 MB)
– Latency associated with the I/O operations
Simple to use
– Getting started: ./orion_linux_em64t -run simple -testname mytest -num_disks 2 (a small wrapper is sketched below)
– More info: https://twiki.cern.ch/twiki/bin/view/PSSGroup/OrionTests
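A thin wrapper can make repeated runs easier to script. This is only a sketch: it assumes the orion_linux_em64t binary from this slide is in the current directory and uses only the flags shown above:

    import subprocess

    def run_orion(testname, num_disks, binary="./orion_linux_em64t"):
        """Run a 'simple' ORION test (random 8 KB IOPS, 1 MB sequential MBPS, latency)."""
        cmd = [binary, "-run", "simple",
               "-testname", testname,
               "-num_disks", str(num_disks)]
        print("running:", " ".join(cmd))
        return subprocess.run(cmd).returncode   # ORION writes its own result files

    if __name__ == "__main__":
        run_orion("mytest", num_disks=2)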
ORION output, an example
ORION results, small random read IOPS

Disks used                    Array               IOPS           IOPS/disk   Mirrored capacity
128x SATA                     Infortrend 16-bay   12000          100         24 TB
120x Raptor 2.5''             Infortrend 12-bay   17600          150         18 TB
144x WD 'Green disks'         Infortrend 12-bay   10300 / 12600  70 / 90     72 TB / 22 TB
96x Raptor 3.5'' (cmsonline)  Infortrend 16-bay   16000          160         6.5 TB
80x SAS (Pics)                Netapp RAID-DP      17000          210         7.5 TB
iSCSI
iSCSI is interesting for cost reduction
– It gets rid of the 'specialized' FC network
Many concerns about performance though, due to
– IP interconnect throughput
– CPU usage
– Adoption seems to be limited to low-end systems at the moment
– The 10 GigE tests are very promising
iSCSI 1 Gbps, Infortrend
Scalability tests, IOPS, FC vs. iSCSI
Data: D. Wojcik
iSCSI 1 Gbps, Infortrend
Scalability tests, IOPS, FC vs. iSCSI
Data: D. Wojcik
iSCSI tests, 10 GigE
Recent ORION tests on 10 GigE iSCSI
– 'CERN-made' disk servers that export storage as iSCSI over 10 GigE
– Details of the HW on the next slide
– ORION tests with up to 3 disk arrays (of 14 drives each)
– Almost linear scalability (quantified below)
– Up to 42 disks tested -> 4000 IOPS at saturation
– 85% CPU idle during the test
– IOPS of a single disk: ~110
– Overall, these are preliminary test data
Data: A. Horvath
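The 'almost linear' claim can be quantified directly from the numbers on this slide, taking ideal scaling as disks x single-disk IOPS:

    disks            = 42
    single_disk_iops = 110     # measured on a single drive
    measured_iops    = 4000    # at saturation, 3 arrays of 14 drives

    ideal = disks * single_disk_iops
    print(f"ideal: {ideal} IOPS")                                # 4620
    print(f"scaling efficiency: {measured_iops / ideal:.0%}")    # ~87%, close to linear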
iSCSI on 10 GigE, HW details
Test HW installed by IT-FIO and IT-CS
– 2 Clovertown quad-core processors at 2.00 GHz
– Blackford mainboard
– 8 GB of RAM
– 16 SATA-2 drives of 500 GB, 7,200 rpm
– 3ware 9650SE-16ML RAID controller
– Intel 10GigE dual-port server adapter, PCI Express (EXPX9502CX4, Oplin)
– HP ProCurve 10GigE switch
Data: H. Meinhard
NAS
IT-DES experience of using NAS for databases
A Netapp filer can use several protocols, the main one being NFS
– Compared to FC, throughput is limited by Gigabit Ethernet; trunking or the use of 10 GigE is also possible
Overall a different solution from SAN and iSCSI
– The filer contains a server with its own CPU and OS
– In particular, the proprietary WAFL filesystem is capable of creating read-only snapshots
– The proprietary Data ONTAP OS runs on the filer box
– Higher cost due to the high-end features
The quest for ultimate latency reduction
Solid state disks provide unique specs
– Seek times are at least one order of magnitude better than those of the best HDs
– A single disk can provide >10k random read IOPS
– High read throughput
SSD (flash) problems
Flash-based SSDs still suffer from major problems for enterprise solutions
– Cost/GB: more than 10 times that of 'normal' HDs
– Small capacity compared to HDs
– Several issues with write performance
– Limited number of erase cycles
– Entire cells need to be written (an issue for transactional workloads)
– Some workarounds for write performance and cell lifetime are being implemented; quality differs across vendors and grades
– A field in rapid evolution
Conclusions
Storage technologies are in a very interesting evolution phase
On one side, 'old-fashioned' storage technologies give more capacity and performance for a lower price every year
– Currently used in production by the physics DB services (offline and online) and the Tier1s
New ideas and implementations are emerging for scaling out very large data sets without RAID
– Google File System, Amazon S3, Sun's ZFS
– Oracle's ASM (which is in production at CERN and the Tier1s)
10 GigE Ethernet and SSDs are new players in the storage game with high potential
– The 10 GigE iSCSI tests with FIO and CS are very promising
Acknowledgments
Many thanks to Dawid, Jacek, Maria, Andras, Andreas, Helge, Tim Bell, Bernd, Eric, Nilo