Ken Cantrell, NetApp Mark Rogov, EMC David Fair, SNIA ESF Chair, Intel March 8, 2016
SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
About The Speakers Mark Rogov EMC Advisory Systems Dr. David Fair SNIA ESF Chair & Intel Ethernet Networking Marketing Manager Ken Cantrell NetApp Manager Perf
Storage Performance Benchmarking METRICS AND TERMINOLOGY FILE COMPONENTS WORKLOAD DEFINITIONS TODAY FUTURE WEBCASTSJULY 30, 2015 SOLUTION UNDER TEST R/WR/WINTROTECHRAIDFUNEND BLOCK COMPONENTS OCT 20, 2015
Session 1 – Terminology and Context GRAPH FUN CONTEXT MAKES METRICS MATTER CONTEXT MAKES METRICS MATTER OPS COUNT EVERY PROTOCOL OPERATION PER SECOND MB/S PAYLOAD SUM OF EVERY OPERATION PER SECOND TERMINOLOGY IOPS COUNT EVERY IO OPERATION PER SECOND RESPONSE TIME TIME TARGET TAKES TO REPLY TO AN IO R/WR/WINTROTECHRAIDFUNEND
Session 2 – The Slowest Component Matters Most DISK BOUND CLIENT BOUND SLOW COMPONENT MATTERS MOST BOTTLENECKS ALWAYS EXIST 3 PERFORMANCE PRINCIPLES I NCREASE P ARALLELISM D O L ESS W ORK D O W ORK F ASTER R/WR/WINTROTECHRAIDFUNEND
S PLIT E QUALLY Enterprise Storage Capacity Shipped In 3Q’ EXABYTES W ORLD P OPULATION 4.5 G IGABYTES PER INDIVIDUAL METRICS AND TERMINOLOGY 1541 COPIES OF OUR FIRST WEBCAST P OWER P OINT OR R/WR/WINTROTECHRAIDFUNEND
Eventually, All Data Goes To Block Storage BLOCK STORAGESOLUTIONS UNDER TEST HYPERVISOR VMVM VMVM VMVM VMVM VMVM VMVM VMVM VMVM WORKLOADS R/WR/WINTROTECHRAIDFUNEND
Agenda R EADING, W RITING ; W HAT IS THE D IFFERENCE ? H OW DOES THIS TECH WORK ANYWAY ? W HAT IF YOU NEED MORE THAN ONE ? P ERFORMANCE ? S UMMARY I NTRODUCTION END FUN RAID TECH INTRO
Let’s Take A Drive… And Test It! R/WR/WINTROTECHRAIDFUNEND IOPS
Detour! What Does “Random” Mean? A QUICK BROWN FOX JUMPED OVER A LAZY DOG KEYS ARE ALL OVER R/WR/WINTROTECHRAIDFUNEND I MAGINE THAT THE K EYBOARD IS A DISK DRIVE A QU I C K
What Does “Sequential” Mean? EVERY KEY IS NEXT TO PREVIOUS R/WR/WINTROTECHRAIDFUNEND I MAGINE THAT THE K EYBOARD IS A DISK DRIVE
“Sequential Read” Example “SEQUENTIAL READ” R/WR/WINTROTECHRAIDFUNEND
Let’s Take A Drive… And Test It! R/WR/WINTROTECHRAIDFUNEND IOPS
Let’s Take Two Drives… And Test Them! R/WR/WINTROTECHRAIDFUNEND IOPS HDD
And Add More SSDs R/WR/WINTROTECHRAIDFUNEND SINGLE DRIVE IOPS HDD
Agenda R EADING, W RITING ; W HAT IS THE D IFFERENCE ? H OW DOES THIS TECH WORK ANYWAY ? W HAT IF YOU NEED MORE THAN ONE ? P ERFORMANCE ? S UMMARY I NTRODUCTION END FUN RAID R/W INTRO
How Does This Tech Work? F LASH HDD OR D ISK D RIVE R/WR/WINTROTECHRAIDFUNEND
Spinning Drives And Sectors SPIN SEEK READ WRITE R/WR/WINTROTECHRAIDFUNEND
Flash And NAND Gates WRITE 0 W RITE INPUT WRITE 1 W RITE INPUT E RASE INPUT E RASE INPUT C OMMON INPUT C OMMON INPUT R/WR/WINTROTECHRAIDFUNEND
P AGE Flash Construction 2KiB 4KiB 8KiB … F LASH BLOCK W RITE —1 PAGE AT A TIME P AGES PER B LOCK ( DEP ON MODEL ) 128, 256, …, E TC. B LOCK F LASH D EVICE NUMBER OF BLOCKS DEFINES C APACITY R EDIRECT O N O VER -W RITE M OST F LASH R/WR/WINTROTECHRAIDFUNEND 0 X 00 0 X 01 0 X 02 0 X AA L OGICAL T O P HYSICAL R EDIRECTION M AP
Garbage Collection B LOCK F LASH D EVICE NUMBER OF BLOCKS DEFINES C APACITY G ARBAGE C OLLECTION R/WR/WINTROTECHRAIDFUNEND 0 X 00 0 X 01 0 X 02 0 X AA L OGICAL T O P HYSICAL R EDIRECTION M AP B LOCK E RASE
Sequential Vs. Random SPIN SEEK SSD OR F LASH HDD OR D ISK D RIVE W RITE R EAD EVERYTHING IS RANDOM IO FOR F LASH E RASE + W RITE R EAD W RITE R EAD S EQUENTIAL W RITE R EAD R ANDOM S EEK /S PIN + W RITE S EEK /S PIN + R EAD SLOWER PERFORMANCE R/WR/WINTROTECHRAIDFUNEND
Agenda R EADING, W RITING ; W HAT IS THE D IFFERENCE ? H OW DOES THIS TECH WORK ANYWAY ? W HAT IF YOU NEED MORE THAN ONE ? P ERFORMANCE ? S UMMARY I NTRODUCTION END FUN TECH R/W INTRO
Just One? PROTECTION CAPACITY R/WR/WINTROTECHRAIDFUNEND
RAID—Redundant Array Of Inexpensive Disks CLIENTS / HOSTS STORAGE CONTROLLER PHYSICAL STORAGE FRONT - END CONNECT BACK - END CONNECT RAID R/WR/WINTROTECHRAIDFUNEND
FRONT ENDBACK END CLIENTS / HOSTS PHYSICAL STORAGE VIRTUAL / LOGICAL R/WR/WINTROTECHRAIDFUNEND PROTECTION CAPACITY 100% NONE 100% RAID-0 (Striping Without Parity)
RAID-1 (Mirroring) FRONT ENDBACK END CLIENTS / HOSTS PHYSICAL STORAGE VIRTUAL / LOGICAL R/WR/WINTROTECHRAIDFUNEND PROTECTION CAPACITY 50% 1 D RIVE 100% 50%
RAID-3, -4, -5 [-6, -DP]* Striping With Parity* FRONT ENDBACK END PHYSICAL STORAGE VIRTUAL / LOGICAL P* * RAID-6/-DP requires more than one parity N IS NUMBER OF DATA DRIVES P IS NUMBER OF PARITY DRIVES R/WR/WINTROTECHRAIDFUNEND PROTECTION CAPACITY N P N “I T D EPENDS ”
RAID Partial Writes All Single Parity RAID: RAID-3, -4, -5, and etc. R/WR/WINTROTECHRAIDFUNEND CLIENTS / HOSTS VIRTUAL / LOGICAL 100 R ANDOM 70R/30W IO S FRONT ENDBACK END 100 IO S 70R/30W = 70 R EAD + 30 W RITE IO S B ACKEND = (70R + 30 * (2W + 2R)) = 190 IO S 190 D ISK IO S S INGLE P ARTIAL W RITE : R EAD O LD D ATA R EAD O LD P ARITY C ALCULATE N EW P ARITY W RITE N EW D ATA W RITE N EW P ARITY CLIENTS / HOSTS VIRTUAL / LOGICAL FRONT ENDBACK END WRITE NEW DATA OLD DATA OLD PARITY NEW PARITY NEW DATA 2 R EADS 2 W RITES 2 READS 2 WRITES 1 READ RAID P ENALTY
RAID Implementation CLIENTS / HOSTS STORAGE CONTROLLER PHYSICAL STORAGE FRONT - END CONNECT BACK - END CONNECT RAID R/WR/WINTROTECHRAIDFUNEND
Erasure Coding Implementation CLIENTS / HOSTS STORAGE CONTROLLERS PHYSICAL STORAGE FRONT - END CONNECT BACK - END CONNECT SCALE OUT ERASURE CODING R/WR/WINTROTECHRAIDFUNEND
Erasure Coding N + M = FRONT ENDBACK END PHYSICAL STORAGE P22202 P1 R/WR/WINTROTECHRAIDFUNEND PROTECTION CAPACITY N M “ DIFFERENT ” N IS NUMBER OF DATA BLOCKS M IS NUMBER OF PROTECTION BLOCKS
Agenda R EADING, W RITING ; W HAT IS THE D IFFERENCE ? H OW DOES THIS TECH WORK ANYWAY ? W HAT IF YOU NEED MORE THAN ONE ? P ERFORMANCE ? S UMMARY I NTRODUCTION END RAID TECH R/W INTRO
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND IOPS
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND IOPS
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND IOPS
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND IOPS
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND IOPS 318
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND IOPS
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND MiB/s
What “Really” Happens With RAID-5? R/WR/WINTROTECHRAIDFUNEND MB/s
Flash In The Real World CLIENTS / HOSTS STORAGE CONTROLLER PHYSICAL STORAGE FRONT - END CONNECT BACK - END CONNECT MB/s R EQUIRES 10 X 6Gb/ S SAS 5685 MB/ S R EQUIRES 6 X PCI E 3.0 L ANES R EQUIRES 4 X 16Gb FC H OW MANY HOSTS ? R/WR/WINTROTECHRAIDFUNEND ??
Flash In The Real World CLIENTS / HOSTS STORAGE CONTROLLER PHYSICAL STORAGE FRONT - END CONNECT BACK - END CONNECT MB/s R EQUIRES 10 X 6G B / S SAS 5685 MB/ S R EQUIRES 6 X PCI E 3.0 L ANES R EQUIRES 4 X 16G B FC H OW MANY HOSTS ? R/WR/WINTROTECHRAIDFUNEND N EW T ECH
Flash In The Real World CLIENTS / HOSTS STORAGE CONTROLLER PHYSICAL STORAGE FRONT - END CONNECT BACK - END CONNECT MB/s R EQUIRES 10 X 6G B / S SAS 5685 MB/ S R EQUIRES 6 X PCI E 3.0 L ANES R EQUIRES 4 X 16G B FC H OW MANY HOSTS ? R/WR/WINTROTECHRAIDFUNEND BLOCK IS THE FOUNDATION OF S TORAGE P ERFORMANCE BLOCK IS THE FOUNDATION OF S TORAGE P ERFORMANCE
Agenda R EADING, W RITING ; W HAT IS THE D IFFERENCE ? H OW DOES THIS TECH WORK ANYWAY ? W HAT IF YOU NEED MORE THAN ONE ? P ERFORMANCE ? S UMMARY I NTRODUCTION FUN RAID TECH R/W INTRO
Storage Performance Benchmarking METRICS AND TERMINOLOGY FILE COMPONENTS WORKLOAD DEFINITIONS TODAY FUTURE WEBCASTSJULY 30, 2015 SOLUTION UNDER TEST BLOCK COMPONENTS OCT 20, 2015 R/WR/WINTROTECHRAIDFUNEND
After This Webcast A PDF and a PPT of the slides for this and all previous parts of this Webcast series will be posted to the SNIA Ethernet Storage Forum (ESF) website and available on-demand PPT and PDF: Presentation Recording: A full Q&A from this webcast, including answers to questions we couldn't get to today, will be posted to the SNIA-ESF blog Follow us Next Webcast – Second Half of 2016 “Storage Performance Benchmarking: Part 4”
Appendix – Additional Reading
Appendix – More Reading SNIA S3 TWG Guide to SSD Performance: SNIA S3 TWG SSD Performance Primer, 2013: APrimer2013.pdfhttp:// APrimer2013.pdf Benchmarking methods for randomly sampling from a file, and why random seeks can (usually) hurt performance: Excellent hard drive overview: SSD Performance results: SSD Performance results: Intel Performance Benchmarking for PCIe* and NVMe* Enterprise Solid-State Drives: ssds-white-paper.pdf ssds-white-paper.pdf SSD M.2 Interface: next-ssd/ next-ssd/ More complete SSD interface article, covering NVMe, U.2 and M.2: nvme/ nvme/ SSD vs HDD performance characteristics: 3.htmlhttp:// 3.html
RAID RAID Perf Calculator: RAID Reliability Calculator: RAID Failure Calculator: RAID Survival Rate Simulation: about-raid-survival-rates/ about-raid-survival-rates/