Lustre WAN @ 100GBit Testbed Michael Kluge michael.kluge@tu-dresden.de Robert Henschel, Stephen Simms {henschel,ssimms}@indiana.edu.

Slides:



Advertisements
Similar presentations
TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST
Advertisements

You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
Advanced Piloting Cruise Plot.
1 Vorlesung Informatik 2 Algorithmen und Datenstrukturen (Parallel Algorithms) Robin Pomplun.
© 2008 Pearson Addison Wesley. All rights reserved Chapter Seven Costs.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.
Author: Julia Richards and R. Scott Hawley
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
UNITED NATIONS Shipment Details Report – January 2006.
Towards Automating the Configuration of a Distributed Storage System Lauro B. Costa Matei Ripeanu {lauroc, NetSysLab University of British.
and 6.855J Spanning Tree Algorithms. 2 The Greedy Algorithm in Action
We need a common denominator to add these fractions.
1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Properties of Real Numbers CommutativeAssociativeDistributive Identity + × Inverse + ×
Multiplication Facts Review. 6 x 4 = 24 5 x 5 = 25.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 5 second questions
Year 6 mental test 10 second questions
Around the World AdditionSubtraction MultiplicationDivision AdditionSubtraction MultiplicationDivision.
1. 2 Objectives Become familiar with the purpose and features of Epsilen Learn to navigate the Epsilen environment Develop a professional ePortfolio on.
Re-examining Instruction Reuse in Pre-execution Approaches By Sonya R. Wolff Prof. Ronald D. Barnes June 5, 2011.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
Chapter 1: Introduction to Scaling Networks
ABC Technology Project
1 Sizing the Streaming Media Cluster Solution for a Given Workload Lucy Cherkasova and Wenting Tang HPLabs.
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
IP Multicast Information management 2 Groep T Leuven – Information department 2/14 Agenda •Why IP Multicast ? •Multicast fundamentals •Intradomain.
VOORBLAD.
Making Time-stepped Applications Tick in the Cloud Tao Zou, Guozhang Wang, Marcos Vaz Salles*, David Bindel, Alan Demers, Johannes Gehrke, Walker White.
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
“Start-to-End” Simulations Imaging of Single Molecules at the European XFEL Igor Zagorodnov S2E Meeting DESY 10. February 2014.
1 RA III - Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Buenos Aires, Argentina, 25 – 27 October 2006 Status of observing programmes in RA.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
© 2012 National Heart Foundation of Australia. Slide 2.
Adding Up In Chunks.
Universität Kaiserslautern Institut für Technologie und Arbeit / Institute of Technology and Work 1 Q16) Willingness to participate in a follow-up case.
Sets Sets © 2005 Richard A. Medeiros next Patterns.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Addition 1’s to 20.
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
Subtraction: Adding UP
Equal or Not. Equal or Not
Slippery Slope
Januar MDMDFSSMDMDFSSS
Week 1.
Analyzing Genes and Genomes
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Essential Cell Biology
Intracellular Compartments and Transport
PSSA Preparation.
Essential Cell Biology
Mani Srivastava UCLA - EE Department Room: 6731-H Boelter Hall Tel: WWW: Copyright 2003.
Energy Generation in Mitochondria and Chlorplasts
Murach’s OS/390 and z/OS JCLChapter 16, Slide 1 © 2002, Mike Murach & Associates, Inc.
CpSc 3220 Designing a Database
Presentation transcript:

Lustre WAN @ 100GBit Testbed Michael Kluge michael.kluge@tu-dresden.de Robert Henschel, Stephen Simms {henschel,ssimms}@indiana.edu

Overall Testbed Setup and Hardware Content Overall Testbed Setup and Hardware Sub-Project 2 – Parallel File Systems Peak Bandwidth Setup LNET – Tests Details of Small File I/O in the WAN Michael Kluge

Hardware Setup (1) Michael Kluge 100GbE 100GbE 100GBit/s Lambda 60km Dark Fiber 17*10GbE 17*10GbE 16*8 Gbit/s FC 16*8 Gbit/s FC 5*40 Gbit/s QDR IB 16*20 Gbit/s DDR IB Michael Kluge

Hardware Setup (2) – File System View 32 Nodes 1 Subnet 100GbE 16*8 Gbit/s 16*8 Gbit/s 5*40 Gbit/s QDR IB 16*20 Gbit/s DDR IB Michael Kluge

Hardware-Setup (2) – Bandbreiten unidirektional 12,5 GB/s 100GbE 12,8 GB/s 12,8 GB/s 16*8 Gbit/s 16*8 Gbit/s 5*40 Gbit/s QDR IB 16*20 Gbit/s DDR IB 32,0 GB/s 20,0 GB/s Michael Kluge

Hardware Setup (3) 12,5 GB/s 12,5 GB/s Michael Kluge 100GbE 16*8 Gbit/s 16*8 Gbit/s 5*40 Gbit/s QDR IB 16*20 Gbit/s DDR IB Michael Kluge

Hardware Setup (4) – DDN Gear 2 x S2A9900 in Dresden 1 x SFA10000 in Freiberg Michael Kluge

Sub-Project 2 – Wide Area File Systems HPC File Systems are expensive and require some human resources fast access to data is key for an efficient HPC system utilization technology evaluation as regional HPC center install and compare different parallel file systems GPFS Michael Kluge

Scenario to get Peak Bandwidth 100GbE 16*8 Gbit/s 16*8 Gbit/s 5*40 Gbit/s QDR IB 16*20 Gbit/s DDR IB Michael Kluge

Lustre OSS Server Tasks OSS NODE DDR/QDR IB LNET routing From Local Cluster 10 GE From / To Other Site FC-8 OSC OBDFILTER To DDN Storage Michael Kluge

two distinct Lustre networks, one for metadata, one for file content Lustre LNET Setup idea&picture by Eric Barton two distinct Lustre networks, one for metadata, one for file content Michael Kluge

IOR Setup for Maximum Bandwidth 24 clients on each site 24 processes per client stripe size 1, 1 MiB block size Direct I/O 21,9 GB/s 100GbE Writing to Freiberg 10,8 GB/S Writing to Dresden 11,1 GB/S 16*8 Gbit/s 16*8 Gbit/s 5*40 Gbit/s QDR IB 16*20 Gbit/s DDR IB Michael Kluge

IU has been running Lustre over the WAN LNET Self Test IU has been running Lustre over the WAN As production service since Spring 2008 Variable performance on production networks Interested in how LNET scales over distance Isolate the network performance Eliminates variable client and server performance Simulated latency in a clean environment Used NetEM kernel module to vary latency Not optimized for multiple streams Future work will use hardware for varying latency 100Gb link provided clean 400KM to test Michael Kluge

Single Client LNET performance LNET measurements from one node Concurrency (RPCs in flight) from 1 to 32 Michael Kluge

LNET Self Test at 100Gb With 12 writers and 12 readers with a ratio of 1:1 we were able to achieve 11.007 Gbyte (88.05%) Using 12 writers and 16 readers with a ratio of 12:16 we were able to achieve 12.049 Gbyte (96.39%) Michael Kluge

IOZONE Setup to evaluate small I/O 100GbE 200km,400km 16*8 Gbit/s 1 Client in Dresden vs. 1 Client in Freiberg (200km and 400km) small file IOZONE Benchmark, one S2A 9900 in Dresden how „interactive“ can a 100GbE link be at this distance Michael Kluge

Evaluation of Small File I/O (1) – Measurements 200km Michael Kluge

Evaluation of Small File I/O (2) – Measurements 400km measurement error Michael Kluge

Evaluation of Small File I/O (3) - Model observations up to 1 MB all graphs look the same each stripe takes a penalty that is close to the latency each additional MB on each stripe takes an additional penalty possible model parameters latency, #rpcs (through file size), penalty per stripe, penalty per MB, memory bandwidth, network bandwidth best model up to now has only two components: stripe penalty per stripe, where the penalty time is slightly above the latency penalty per MB, where penalty time is the inverse of the clients‘ network bandwidth what can be concluded from that: client contacts OSS servers for each stripe in a sequential fashion – this is really bad for WAN file systems allthough client cache is enabled, it returns from the I/O call after the RPC is on the wire (and not after the data is inside the kernel) – is this really necessary? Michael Kluge

Lustre can make use of the available bandwidth Conclusion stable test equipment Lustre can make use of the available bandwidth reused ZIH monitoring infrastructure short tests cycles through DDN port monitor program traces with IOR events, ALU router events, DDN, etc. FhGFS reached 22,58 GB/s bidirectional, 12,4 GB/s unidirectional Michael Kluge

Fragen? GPFS Michael Kluge

Performance Analyse (1) – DDN Monitor Michael Kluge

IOZONE Resultate für 8GB-Dateien und verschiedene Blockgrößen 1 Client in Dresden mittelgroße Dateien mit dem IOZONE Benchmark nur eine S2A 9900 in Dresden verwendet Michael Kluge

IOZONE Resultate für 16K-Dateien und verschiedene Blockgrößen (Freiberg) 1 Client in Freiberg Schreiben einer Datei mit variierender Dateigröße zwischen 64KiB und 512 MiB und verschiedenem Striping zwischen 1 und 24, feste Blockgröße von 64KB Michael Kluge

Performance Analyse (2) – Infrastruktur-Instrumentierung Michael Kluge

Performance Analyse (3) – Ein Port schneller Michael Kluge

Performance Analyse (3) – Media Error on SATA Disk Michael Kluge

Resultat IOR Unidirektional Reading from Dresden 11,8 GB/s 100GbE 16*8 Gbit/s 16*8 Gbit/s 5*40 Gbit/s QDR IB 16*20 Gbit/s DDR IB Michael Kluge

IOZONE Resultate für 16K-Dateien und verschiedene Blockgrößen 1 Client in Dresden vs. 1 Client in Freiberg kleine Dateien mit dem IOZONE Benchmark nur eine S2A 9900 in Dresden verwendet Michael Kluge

IOZONE Resultate für 8GB-Dateien und verschiedene Blockgrößen 1 Client in Dresden vs. 1 Client in Freiberg mittelgroße Dateien mit dem IOZONE Benchmark nur eine S2A 9900 in Dresden verwendet Michael Kluge