IBM ATS Deep Computing © 2007 IBM Corporation High Performance IO HPC Workshop – University of Kentucky May 9, 2007 – May 10, 2007 Andrew Komornicki, Ph.

Presentation transcript:

High Performance IO
HPC Workshop – University of Kentucky, May 9, 2007 – May 10, 2007
Andrew Komornicki, Ph.D., and Balaji Veeraraghavan, Ph.D.
IBM ATS Deep Computing © 2007 IBM Corporation

Agenda
- Introduction
- General IO performance
- Results of some small tests
- Modular IO libraries, Linux and AIX

I/O Optimization
- Analyze the IO pattern
- Determine optimization method
- Optimize in user space
- Minimize source code changes
- Possibly relink with libtkio.so

General I/O Performance
- C:
  - Do not use fopen(), fread(), or fwrite(). These are inefficient due to small (4 KB) IO blocks and extra memory copies.
  - Use instead: POSIX open(), read(), write()
  - Direct (raw) IO will eliminate an additional memory copy
- FORTRAN:
  - Use unformatted IO
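
As a rough sketch of the POSIX calls recommended on this slide, the C function below copies a file with open(), read(), and write(), using one large caller-chosen buffer rather than small stdio blocks. The name copy_file_posix and the choice of buffer size are illustrative assumptions and are not part of the workshop material.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst with POSIX I/O through one buffer
 * of bufsize bytes. Returns the number of bytes copied, or -1 on error. */
long long copy_file_posix(const char *src, const char *dst, size_t bufsize)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(in);
        close(out);
        return -1;
    }

    long long total = 0;
    int failed = 0;
    for (;;) {
        ssize_t n = read(in, buf, bufsize);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            failed = 1;
            break;
        }
        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                failed = 1;
                break;
            }
            done += w;
        }
        if (failed)
            break;
        total += n;
    }

    free(buf);
    close(in);
    if (close(out) < 0)
        failed = 1;
    return failed ? -1 : total;
}
```

A call such as copy_file_posix("in.dat", "out.dat", 1 << 20) would move data in 1 MB requests; a suitable size depends on the file system and should be measured rather than assumed.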

Asynchronous IO, an example
- Non-blocking IO
  - aio_read(), aio_write(), aio_return()
- Completion notification
  - Polling with aio_error()
  - Block until complete with aio_suspend()
- Cancellation of IO requests
  - aio_cancel()
- Large file enabled
  - Removes the 2 GB file size limitation
- POSIX conforming
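
The calls listed on this slide can be combined roughly as follows. This is a minimal sketch that issues a single asynchronous read and then waits for it; the function name async_read_start_of_file is a hypothetical helper, and on some systems the program must be linked with the realtime library (for example -lrt).

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from offset 0 of path with one
 * asynchronous request. Returns the bytes read, or -1 on error. */
ssize_t async_read_start_of_file(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal on completion */

    if (aio_read(&cb) != 0) {       /* queue the request; returns at once */
        close(fd);
        return -1;
    }

    /* The caller could do useful computation here while the read proceeds. */

    /* Poll with aio_error() and block in aio_suspend() until completion. */
    const struct aiocb *const pending[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(pending, 1, NULL);

    int err = aio_error(&cb);       /* 0 on success, else an errno value */
    ssize_t n = aio_return(&cb);    /* collect the result exactly once */
    close(fd);
    return err == 0 ? n : -1;
}
```

The benefit comes from the work done between aio_read() and aio_suspend(); a request that is waited on immediately, as here, behaves much like a blocking read.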

Results of Bonnie IO test
- Run on Blade system in San Mateo Lab
- System memory: 5 Gbytes
- File systems: ext2 and ext3
- All tests done in these stages:
    Writing with putc()...done
    Rewriting...done
    Writing intelligently...done
    Reading with getc()...done
    Reading intelligently...done

Results of Bonnie IO test, Block IO performance
Size (MB)   Write (Kbytes/sec)   Read (Kbytes/sec)
,524 2,233, ,237 1,658, ,599 50, ,656 50,677

Results of Bonnie IO test
Results for ext2 file system, time in seconds
Size (MB)   User   System   Elapsed

Results of Bonnie IO test
Results for ext3 file system, time in seconds
Size (MB)   User   System   Elapsed

Modular I/O (MIO)
Familiar and flexible runtime interface
- MIO modules
  - mio
  - trace
  - pf
- MIO available on both Linux and AIX

MIO user code interface
- open       MIO_open
- read       MIO_read
- write      MIO_write
- close      MIO_close
- lseek      MIO_lseek
- fcntl      MIO_fcntl
- ftruncate  MIO_ftruncate

MIO run time interface
- MIO_STATS="file name"
- MIO_FILES=" *.dat* [trace|pf ] *.inp [aix]"
- MIO_DEBUG="ALL"
- MIO_DEFAULTS="trace/mbytes, pf/cache=10m"

trace module
- summary of file activity
- binary events file
- low cpu overhead
- typical options
  - /stats
  - /mbytes /gbytes /tbytes
  - /events=mio.evt

pf module
- User selectable cache size
- User selectable page size
- User selectable prefetch depth
- Direct or system buffered IO
- Global or private cache
- Usage summary

pf module
- detects sequential I/O
- user memory buffering
- options
  - /global
  - /cache_size=10m
  - /page_size=1m
  - /prefetch=1
  - /stride=1
  - /direct
  - /stats
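
The slides do not say how pf detects sequential I/O. One common approach, sketched here purely as an illustration, is to check whether each request starts exactly where the previous one ended and to enable prefetching once that happens. The struct and function names are hypothetical and do not come from MIO.

```c
#include <stddef.h>

/* Hypothetical state for spotting sequential access on one file. */
struct seq_detector {
    long long next_expected;  /* offset right after the previous request */
    int run_length;           /* contiguous requests seen in a row */
};

void seq_init(struct seq_detector *d)
{
    d->next_expected = -1;    /* no request seen yet */
    d->run_length = 0;
}

/* Record one request. Returns how many pages to prefetch ahead:
 * prefetch_depth when this request starts exactly where the previous
 * one ended, otherwise 0. */
int seq_observe(struct seq_detector *d, long long offset, size_t length,
                int prefetch_depth)
{
    if (d->next_expected >= 0 && offset == d->next_expected)
        d->run_length++;
    else
        d->run_length = 0;    /* a seek breaks the sequential run */
    d->next_expected = offset + (long long)length;
    return d->run_length >= 1 ? prefetch_depth : 0;
}
```

In this sketch the prefetch_depth argument plays the role of the /prefetch option above; a real cache would also need to handle strided patterns, which the /stride option suggests pf supports.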

Relink with libtkio.a
- libtkio.a has shared object members
  - tkio.so 32 bit and 64 bit
- Entry points for
    open, open64, close, read, write, lseek, lseek64,
    fcntl, ffinfo, fstat, fstat64, fstatfs, fsync,
    ftruncate, ftruncate64, unlink, aio_...

Default tkio behavior
- Uses dlopen and dlsym for runtime linking

tkio entry   calls
open64       libc(shr.o) open64
close        libc(shr.o) close
read         libc(shr.o) read
write        libc(shr.o) write
lseek64      libc(shr.o) lseek64
fsync        libc(shr.o) fsync
…            …
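
The runtime linking mentioned on this slide can be illustrated with the standard dlopen() and dlsym() calls. The sketch below is not tkio's code: it only shows how a wrapper could find the C library's write() at run time. The names write_fn and resolve_write are assumptions, and some systems need -ldl at link time.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef ssize_t (*write_fn)(int, const void *, size_t);

/* Look up "write" at run time instead of binding to it at link time.
 * dlopen(NULL, ...) returns a handle for the running program and the
 * libraries already loaded with it, which includes the C library. */
write_fn resolve_write(void)
{
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return NULL;
    /* POSIX permits converting dlsym's void * result to a function pointer. */
    return (write_fn)dlsym(handle, "write");
}
```

A wrapper library would cache the returned pointer and call through it, which is what allows the target of each entry point to be swapped without relinking the application.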

tkio runtime interface
- setenv TKIO_ALTLIB so_name/print/abort
- export TKIO_ALTLIB=so_name/print/abort
- so_name is the name of a shared library
  - Either name.so or libname.a(name.so)
- tkio calls a function in so_name that returns a structure filled with I/O entry points to replace the default entry points
- /print option prints a message to stderr indicating success of load
- /abort issues exit(-1) if load is not successful
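
The idea of a structure filled with I/O entry points can be sketched with a table of function pointers. The layout of the real tkio structure is not shown in the slides, so every name below (io_entry_points, get_counting_ptrs, install_entry_points, and so on) is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical table of I/O entry points; the real tkio structure is not
 * shown in the slides. */
struct io_entry_points {
    ssize_t (*read_fn)(int, void *, size_t);
    ssize_t (*write_fn)(int, const void *, size_t);
    int (*close_fn)(int);
};

/* Defaults forward straight to the C library. */
static struct io_entry_points active = { read, write, close };

/* Wrapper an application would call; dispatches through the active table. */
ssize_t wrapped_write(int fd, const void *buf, size_t n)
{
    return active.write_fn(fd, buf, n);
}

/* Example replacement: counts bytes, then forwards to write(). */
static size_t bytes_seen;

static ssize_t counting_write(int fd, const void *buf, size_t n)
{
    bytes_seen += n;
    return write(fd, buf, n);
}

/* Stand-in for the function an alternate library would export: it returns
 * a structure filled with replacement entry points. */
struct io_entry_points get_counting_ptrs(void)
{
    struct io_entry_points p = { read, counting_write, close };
    return p;
}

/* Replace the default entry points with the supplied table. */
void install_entry_points(struct io_entry_points p)
{
    active = p;
}

size_t counted_bytes(void)
{
    return bytes_seen;
}
```

After install_entry_points(get_counting_ptrs()), each wrapped_write() call passes through counting_write() first. The application code itself does not change, which is the property the slide describes.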

tkio using MIO
- setenv TKIO_ALTLIB get_mio_ptrs_64.so

tkio entry   calls
open64       libmio(mio.o) MIO_open64
close        libmio(mio.o) MIO_close
read         libmio(mio.o) MIO_read
write        libmio(mio.o) MIO_write
lseek64      libmio(mio.o) MIO_lseek64
fsync        libmio(mio.o) MIO_fsync
…

Call path diagram (demonstration only)
Components: Application, Fortran I/O, stdio (fopen, fwrite, fread, fclose), libtkio, libc, libmio, kernel
Entry points: open64, write, read, lseek64, close
Forwarded to libc: ->open64, ->write, ->read, ->lseek64, ->close
Forwarded to libmio: ->MIO_open64, ->MIO_write, ->MIO_read, ->MIO_lseek64, ->MIO_close

Call path diagram with MIO modules
Components: libtkio, libc, libmio, kernel
Entry points: open64, write, read, lseek64, close
Forwarded to libc: ->open64, ->write, ->read, ->lseek64, ->close
Forwarded to libmio: ->MIO_open64, ->MIO_write, ->MIO_read, ->MIO_lseek64, ->MIO_close
MIO modules: trace, pf, aix

System buffered Data Movement
[Diagram labels: user space, kernel, 256kb, system buffers, MIO space]

pf cached Data Movement
[Diagram labels: user space, kernel, 256kb, 5 x 2mb, system buffers, MIO space]

O_DIRECT Data Movement
[Diagram labels: user space, kernel, O_DIRECT, 256kb, 5 x 2mb, system buffers, MIO space]

Asynchronous Data Movement
[Diagram labels: user space, kernel, O_DIRECT, 256kb, 5 x 2mb, system buffers, MIO space]

MSC.NASTRAN trace output from program pf
Trace close : program pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/ )= mbytes/s
current size=0 max_size=16277 mode =0777 sector size=4096 oflags =0x302=RDWR CREAT TRUNC
open write read seek fcntl trunc close size
Annotations: Min/Max request size in bytes; Mbytes requested and Mbytes delivered; Number of occurrences

MSC.NASTRAN trace output
Trace close : pf aix : /bmwfs/cdh108.T20536_13.SCR300 : (276645/ )= mbytes/s
current size=0 max_size=16276 mode =0777 sector size=4096 oflags =0x =RDWR CREAT TRUNC DIRECT
open write awrite suspend mbytes/s read aread suspend mbytes/s seek fcntl trunc close size pages

MSC.NASTRAN pf output
pf close for /bmwfs/cdh108.T20536_13.SCR300
global cache 0: 150 pages of bytes
29739/29749 pages not preread for write
/ prefetches : prefetch= write behinds writes reads page writes
37772/33124 mbytes transferred
program --> > pf --> > aix
program < <-- pf < <-- aix

DataView file activity plot
[Plot axes: time (seconds), file position (bytes)]

DataView file activity plot
[Plot axes: time (seconds), file position (bytes)]

Asynchronous I/O plotting
[Plot axes: time (seconds), file position (bytes); annotations: suspend time, hidden time, queuing time]

cache page activity
[Plot axes: time (seconds), file position (bytes)]

MSC.Nastran performance gains
- 16 cpu, 32GB NH2 node
- 2.2M dof, 767GB I/O, 8 copies, 2GB memory per copy
- 8 SSA, 16 loops, 4 disk/loop
- 114MB/sec, 198MB/sec

MIO Summary
- Demonstrated performance gains
- Simple to implement
- Flexible run time interface
- Delivered as a shared object library
- Contact: