Operated by Los Alamos National Security, LLC for NNSA
UNCLASSIFIED
Data Release
James Nunez
High Performance Computing Division
Los Alamos National Lab
August 10, 2011
Measurement and Understanding
- ScalaTrace
- Darshan – Petascale I/O Characterization Tool
Links to Existing Available LANL Data
- Machine failure/usage/event/location and disk failure data sets
- Traces: MPI-IO based synthetic & MPI-File Tree Walk
- File Systems Statistics Survey (fsstats) code
- File Systems Statistics Survey (fsstats) results from Los Alamos, Pacific Northwest, and Oak Ridge National Labs and other production file systems
- LANL workstation data at the USENIX Computer Failure Data Repository
Planned Data Release – Archive Data
- Archive and file system listing information for HPSS, GPFS-archive, and NFS space
- Entries in the data file look like:
drwx /1/1/3/2
drwx /1/1/3/2/5
-rw-rw /1/1/165/1611/24/3212/2120
- The format of each entry is:
MODE USER_ID GROUP_ID FILE_SIZE MODIFICATION_TIME CREATION_TIME BLOCKSIZE PATH
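The sketch below shows one way to read listing entries in the eight-field order given above. It rests on several assumptions that the slide does not state:
- Fields are whitespace-separated.
- The numeric fields are integers, with sizes in bytes and times in Unix epoch seconds.
- PATH is the last field.

The names `ListingEntry` and `parse_entry` are hypothetical. The example entries on the slide appear to have lost their numeric fields, so a line such as `drwx /1/1/3/2` is rejected. In the usage line, the numeric values are made up for illustration; only the path comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListingEntry:
    mode: str                # e.g. "drwx------"; a leading "d" marks a directory
    user_id: int
    group_id: int
    file_size: int           # ASSUMED to be bytes
    modification_time: int   # ASSUMED to be Unix epoch seconds
    creation_time: int       # ASSUMED to be Unix epoch seconds
    blocksize: int
    path: str                # path with numeric components, e.g. /1/1/3/2

    @property
    def is_dir(self) -> bool:
        # Same convention as "ls -l": directories start with "d".
        return self.mode.startswith("d")

    @property
    def depth(self) -> int:
        # Count non-empty path components, e.g. /1/1/3/2 has depth 4.
        return sum(1 for part in self.path.split("/") if part)


def parse_entry(line: str) -> ListingEntry:
    """Parse one line in the order
    MODE USER_ID GROUP_ID FILE_SIZE MODIFICATION_TIME CREATION_TIME BLOCKSIZE PATH.

    ASSUMPTION: fields are whitespace-separated and PATH is last;
    maxsplit=7 keeps any spaces inside PATH intact.
    """
    fields = line.strip().split(maxsplit=7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    mode, uid, gid, size, mtime, ctime, bsize, path = fields
    return ListingEntry(mode, int(uid), int(gid), int(size),
                        int(mtime), int(ctime), int(bsize), path)


if __name__ == "__main__":
    # Hypothetical numeric values; only the path is taken from the slide.
    entry = parse_entry(
        "-rw-rw---- 1001 200 4096 1300000000 1299000000 32768 "
        "/1/1/165/1611/24/3212/2120")
    print(entry.is_dir, entry.depth, entry.file_size)  # False 7 4096
```

Splitting with `maxsplit=7` means only the first seven separators are used. If the real release quotes or escapes paths differently, this would need adjusting.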
Planned Data Release – Supercomputer
Previous:
- Nine years of computer operational failure data: over 23,000 records for several thousand machines
- Several million usage records (job size, processors/machines used, duration, time, etc.)
- Machine layout information (building, room, rack location in room, node location in rack, hot/cold rows, etc.)
Refresh:
- Failure data and machine layout from 2006 to present
- Includes old machines and some new ones
Machine Information
Machine Layout Information
Anonymous information:
- Machine “name”
- Location: building, room
- RackPosition
- Position in rack
- Direction row facing
- Hot/cold row
Sample layout data (the first row lists the column headings and value ranges):
N#,1 to 26,28 to 35,"1 to 37, top to bottom",
1,23,28,1,rear to N/Hot
2,23,28,2,rear to N/Hot
3,23,28,3,rear to N/Hot
4,23,28,4,rear to N/Hot
5,23,28,5,rear to N/Hot
6,23,28,6,rear to N/Hot
7,23,28,7,rear to N/Hot
8,23,28,8,rear to N/Hot
…
159,23,34,19,rear to N/Hot
160,23,34,20,rear to N/Hot
161,23,34,21,rear to N/Hot
162,23,34,22,rear to N/Hot
163,23,34,23,rear to N/Hot
164,23,34,24,rear to N/Hot
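A minimal sketch of loading the layout rows above with Python's `csv` module and grouping nodes that share a rack position. It assumes the following column meanings, which the slide does not spell out:
- The first column is the anonymized node number (N#).
- The second and third columns (ranges 1 to 26 and 28 to 35) are the two coordinates of the rack position.
- The fourth is the node's slot in the rack, counted top to bottom.
- The last is the direction/hot-cold label.

The field names `rack_row`, `rack_col`, and `slot` and the helpers `parse_layout` and `nodes_per_rack` are hypothetical. The sample rows are copied from the slide.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLocation:
    node: int      # anonymized node number ("N#" column)
    rack_row: int  # ASSUMED meaning of column 2 (range 1 to 26)
    rack_col: int  # ASSUMED meaning of column 3 (range 28 to 35)
    slot: int      # position in rack, 1 to 37, top to bottom
    facing: str    # e.g. "rear to N/Hot"


def parse_layout(text: str) -> list[NodeLocation]:
    """Parse layout CSV text, skipping the heading/range row and any
    line (blank or an elision marker) that is not a numeric data row."""
    nodes = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5 or not row[0].strip().isdigit():
            continue  # "N#,..." range row, blank line, or "…"
        node, rack_row, rack_col, slot = (int(x) for x in row[:4])
        nodes.append(NodeLocation(node, rack_row, rack_col, slot, row[4].strip()))
    return nodes


def nodes_per_rack(nodes: list[NodeLocation]) -> dict[tuple[int, int], list[int]]:
    """Group node numbers by their (ASSUMED) rack coordinates."""
    racks: dict[tuple[int, int], list[int]] = defaultdict(list)
    for n in nodes:
        racks[(n.rack_row, n.rack_col)].append(n.node)
    return dict(racks)


# Rows copied from the slide; csv handles the quoted "1 to 37, top to bottom".
SAMPLE = '''N#,1 to 26,28 to 35,"1 to 37, top to bottom",
1,23,28,1,rear to N/Hot
2,23,28,2,rear to N/Hot
159,23,34,19,rear to N/Hot
160,23,34,20,rear to N/Hot
'''

if __name__ == "__main__":
    print(nodes_per_rack(parse_layout(SAMPLE)))
```

Using `csv.reader` rather than a plain `split(",")` matters here. The heading row contains a quoted field with an embedded comma.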