Pianola: A script-based I/O benchmark. Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551.

Presentation transcript:

Pianola: A script-based I/O benchmark
John May
PSDW08, 17 November 2008
Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES

Slide 2: I/O benchmarking: What's going on here?
- Is my computer's I/O system "fast"?
- Is the I/O system keeping up with my application?
- Is the app using the I/O system effectively?
- What tools do I need to answer these questions?
- And what exactly do I mean by "I/O system" anyway? For this talk, an I/O system is everything involved in storing data, from the filesystem to the storage hardware.

Slide 3: Existing tools can measure general or application-specific performance
- IOzone automatically measures I/O system performance for different operations and parameters
  - Relatively little ability to customize I/O requests
- Many application-oriented benchmarks: SWarp, MADbench2, …
- Interface-specific benchmarks: IOR, //TRACE

Slide 4: Measuring the performance that matters
- System benchmarks only measure general response, not application-specific response
- Third-party application-based benchmarks may not generate the stimulus you care about
- In-house applications may not be practical benchmarks
  - Difficult for nonexperts to build and run
  - Nonpublic source cannot be distributed to vendors and collaborators
- Need benchmarks that…
  - Can be generated and used easily
  - Model application-specific characteristics

Slide 5: Script-based benchmarks emulate real apps
- Capture trace data from the application and generate the same sequence of operations in a replay benchmark
- We began with //TRACE from CMU (Ganger's group)
  - Records I/O events and intervening "compute" times
  - Focused on parallel I/O, but much of the infrastructure is useful for our sequential I/O work
[Diagram: the application plus a capture library produce a replay script (open, compute, read, compute, …); a replay tool runs that script as the script-based benchmark.]
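To make the capture-and-replay pipeline concrete, here is a minimal sketch of a replay loop. The text event format (open/compute/read/write/close lines) and the file names are assumptions made for illustration; this is not Pianola's actual script grammar or replay tool.

/* replay_sketch.c - a minimal, hypothetical replay loop.  The event
 * format used here is an illustration only, not Pianola's script grammar. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum { BUFSZ = 1 << 20 };               /* one reusable 1 MiB I/O buffer */

int main(int argc, char **argv)
{
    FILE *script = fopen(argc > 1 ? argv[1] : "trace.txt", "r");
    if (!script) { perror("fopen"); return 1; }

    char op[16], path[4096];
    long usec, nbytes;
    int fd = -1;
    char *buf = malloc(BUFSZ);

    while (fscanf(script, "%15s", op) == 1) {
        if (strcmp(op, "open") == 0 && fscanf(script, "%4095s", path) == 1) {
            fd = open(path, O_RDWR | O_CREAT, 0644);
        } else if (strcmp(op, "compute") == 0 && fscanf(script, "%ld", &usec) == 1) {
            usleep(usec);               /* reproduce the inter-event delay */
        } else if (strcmp(op, "read") == 0 && fscanf(script, "%ld", &nbytes) == 1) {
            read(fd, buf, nbytes < BUFSZ ? nbytes : BUFSZ);
        } else if (strcmp(op, "write") == 0 && fscanf(script, "%ld", &nbytes) == 1) {
            write(fd, buf, nbytes < BUFSZ ? nbytes : BUFSZ);  /* size matters, content does not */
        } else if (strcmp(op, "close") == 0) {
            close(fd);
        }
    }
    free(buf);
    fclose(script);
    return 0;
}

A capture tool then only needs to emit events in some such format along with the measured compute gaps.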

Slide 6: Challenges for script-based benchmarking: Recording I/O calls at the right level
[Example: a sequence of high-level calls (fprintf( … ); /* work */ fprintf( … ); /* more work */ fprintf( … ); …) reaches the I/O system as a single write( … ), captured as the event sequence: open, compute, write, …]
- Instrumenting at high level
  + Easy with LD_PRELOAD
  - Typically generates more events, so logs are bigger
  - Need to replicate formatting
  - Timing includes computation
- Instrumenting at low level
  + Fewer types of calls to capture
  + Instrumentation is at the I/O system interface
  - Cannot use LD_PRELOAD to intercept all calls
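For the high-level case, a hedged sketch of the LD_PRELOAD mechanism is shown below: it interposes fprintf, logs the intervening "compute" gap and the time spent in the call, then forwards to the real vfprintf. The wrapper and its log format are invented for illustration (this is not the //TRACE capture library), and, as noted above, a preloaded wrapper misses calls that libc makes internally.

/* capture_preload.c - illustrative LD_PRELOAD interposer for high-level
 * capture.  A sketch only: the log format is invented, and this is not
 * the actual //TRACE/Pianola capture library.
 * Build and run:
 *   gcc -shared -fPIC -o capture.so capture_preload.c -ldl
 *   LD_PRELOAD=./capture.so ./app
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

static double last_event;               /* end time of the previous logged call */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Interpose fprintf: log the "compute" gap since the last logged call,
 * forward to the real vfprintf, then log the time spent in the call. */
int fprintf(FILE *stream, const char *fmt, ...)
{
    static int (*real_vfprintf)(FILE *, const char *, va_list);
    if (!real_vfprintf)
        real_vfprintf = (int (*)(FILE *, const char *, va_list))
                        dlsym(RTLD_NEXT, "vfprintf");

    double start = now();
    /* dprintf writes straight to fd 2, so it does not re-enter this hook */
    dprintf(2, "compute %.6f\n", last_event ? start - last_event : 0.0);

    va_list ap;
    va_start(ap, fmt);
    int ret = real_vfprintf(stream, fmt, ap);
    va_end(ap);

    last_event = now();
    dprintf(2, "fprintf %d %.6f\n", fileno(stream), last_event - start);
    return ret;
}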

Slide 7: First attempt at capturing system calls: the Linux strace utility
- Records any selected set of system calls
- Easy to use: just add to the command line
- Produces easily parsed output
  $ strace -r -T -s 0 -e trace=file,desc ls
  execve("/bin/ls", [,...], [/* 29 vars */]) =
  open("/etc/ld.so.cache", O_RDONLY) =
  fstat64(3, {st_mode=S_IFREG|0644, st_size=64677,...}) =
  close(3) =
  open("/lib/librt.so.1", O_RDONLY) =
  read(3, ""..., 512) =
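Turning such output into a replay script is mostly string processing. Below is a rough, hypothetical converter (not Pianola's) that maps "strace -r -T" lines onto the event format used in the replay sketch above; the -r column, the gap since the previous call, is taken as an approximation of the "compute" time, and the parsing is deliberately simplified.

/* strace2script.c - rough sketch of converting "strace -r -T" output into
 * the hypothetical replay events used earlier; not Pianola's converter. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char line[4096];
    while (fgets(line, sizeof line, stdin)) {
        double rel;                       /* -r: seconds since the previous call */
        char call[32];
        long fd;

        /* e.g. "     0.000123 write(3, ""..., 4096) = 4096 <0.000087>" */
        if (sscanf(line, " %lf %31[a-z0-9_](%ld", &rel, call, &fd) < 3)
            continue;

        printf("compute %ld\n", (long)(rel * 1e6));   /* microseconds */

        char *eq = strrchr(line, '=');                /* return value, if present */
        long ret = eq ? strtol(eq + 1, NULL, 10) : 0;

        if (strcmp(call, "read") == 0 || strcmp(call, "write") == 0)
            printf("%s %ld\n", call, ret > 0 ? ret : 0);  /* replay by size */
        else if (strcmp(call, "close") == 0)
            printf("close %ld\n", fd);
    }
    return 0;
}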

Slide 8: Strace results look reasonably accurate, but overall runtime is exaggerated
[Table: Read (sec.), Write (sec.), and Elapsed (sec.) times for Uninstrumented, Instrumented, and Application Replay runs; numeric values not preserved in the transcript.]

Slide 9: For accurate recording, gather I/O calls using binary instrumentation
- Can intercept and instrument specific system-level calls
- Overhead of instrumentation is paid at program startup
- Existing Jockey library works well for x86_32, but is not ported to other platforms
- Replay can be portable, though
[Diagram: patching the write system-call stub.
 Uninstrumented: mov $0x4, %eax; mov 4(%esp), %ebx; int $0x80; ret.
 Instrumented: the int $0x80 is replaced by a jmp to a trampoline (padded with nop); the trampoline saves the current state, calls my write function, restores state, and jmps back to the original code.]
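The patching idea in the diagram can be sketched in a few lines of C: overwrite the first bytes of a target function with a relative jmp to a hook. This is a deliberately simplified, hypothetical illustration; it patches a local function rather than the libc write stub, and it omits the register save/restore and the relocation of the clobbered instructions that a real binary rewriter such as Jockey performs.

/* patch_sketch.c - minimal illustration of binary patching: overwrite the
 * start of a function with "jmp rel32" to a hook.  Simplified: no
 * trampoline, no relocation of the clobbered instructions, x86 only.
 *   gcc -O0 -o patch_sketch patch_sketch.c
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

__attribute__((noinline)) static void original(void)
{
    puts("original code");
}

__attribute__((noinline)) static void hook(void)
{
    puts("hook: log the I/O event here; a real tool would then run the "
         "relocated original instructions");
}

int main(void)
{
    uint8_t *target = (uint8_t *)original;

    /* make the pages containing the patch site writable */
    long page = sysconf(_SC_PAGESIZE);
    uint8_t *base = (uint8_t *)((uintptr_t)target & ~(uintptr_t)(page - 1));
    if (mprotect(base, 2 * page, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        perror("mprotect");
        return 1;
    }

    /* e9 <rel32>: jump to hook, relative to the end of the 5-byte jmp */
    int32_t rel = (int32_t)((uint8_t *)hook - (target + 5));
    target[0] = 0xe9;
    memcpy(target + 1, &rel, sizeof rel);

    original();   /* control now flows to hook() instead */
    return 0;
}

After the patch, a call to original() lands in hook(); a real instrumentation tool also resumes the displaced instructions so the original behavior is preserved.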

Slide 10: Issues for accurate replay
- Replay engine must be able to read and parse events quickly
- Reading the script must not interfere significantly with the I/O activities being replicated
- Script must be portable across platforms
[Diagram: replayed I/O events (open, compute, write, …) annotated with the goals: accurate replay, minimize the I/O impact of reading the script, correctly reproduce inter-event delays.]

Slide 11: Accurate replay: preparsing, compression, and buffering
- Text-formatted output script is portable across platforms
- Instrumentation output is parsed into binary format and compressed (~30:1)
  - Conversion is done on the target platform
- Replay engine reads and buffers script data during "compute" phases between I/O events
[Diagram: script events (open, compute, write, …) passing through preparsing, compression, and buffering.]
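A sketch of the preparse-and-buffer idea follows, under assumed record layouts and names (this is not Pianola's on-disk format, and it ignores compression and multiple file descriptors): events are fixed-size binary records, and the engine refills its event buffer during the emulated "compute" gaps so that script reads do not land between the replayed I/O calls.

/* buffered_replay.c - sketch of reading a preparsed binary script in
 * buffered chunks, refilled during "compute" phases.  Record layout and
 * names are illustrative only. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

enum { EV_READ, EV_WRITE, EV_COMPUTE, EV_CLOSE };

struct event {                  /* one fixed-size binary record */
    uint32_t type;
    uint32_t pad;
    uint64_t bytes_or_usec;     /* I/O size in bytes, or compute time in usec */
};

enum { BUF_EVENTS = 4096, IOBUF = 1 << 20 };
static struct event evbuf[BUF_EVENTS];
static size_t nbuf, next_ev;
static char iobuf[IOBUF];

static int refill(FILE *script)
{
    nbuf = fread(evbuf, sizeof evbuf[0], BUF_EVENTS, script);
    next_ev = 0;
    return nbuf > 0;
}

static void replay(FILE *script, int fd)
{
    while (next_ev < nbuf || refill(script)) {
        struct event *ev = &evbuf[next_ev++];
        switch (ev->type) {
        case EV_COMPUTE:
            /* read ahead in the script during the compute phase, not
             * right before the next replayed I/O call */
            if (next_ev == nbuf)
                refill(script);
            usleep(ev->bytes_or_usec);
            break;
        case EV_READ:
            read(fd, iobuf, ev->bytes_or_usec < IOBUF ? ev->bytes_or_usec : IOBUF);
            break;
        case EV_WRITE:
            write(fd, iobuf, ev->bytes_or_usec < IOBUF ? ev->bytes_or_usec : IOBUF);
            break;
        case EV_CLOSE:
            close(fd);
            break;
        }
    }
}

int main(int argc, char **argv)
{
    FILE *script = fopen(argc > 1 ? argv[1] : "trace.bin", "rb");
    int fd = open("replay.out", O_RDWR | O_CREAT, 0644);
    if (!script || fd < 0)
        return 1;
    replay(script, fd);
    return 0;
}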

Slide 12: Replay timing and profile match the original application well
[Table: Read (sec.), Write (sec.), and Elapsed (sec.) times for Uninstrumented, Instrumented, and Application Replay runs; numeric values not preserved in the transcript.]

Slide 13: Things that didn't help
- Compressing the text script as it is generated
  - Only 2:1 compression
  - The time taken by the I/O events themselves is not what matters most during the instrumentation phase
- Replicating the memory footprint
  - Memory used by the application is taken from the same pool as the I/O buffer cache
  - A smaller application (like the replay engine) should go faster because more buffer space is available
  - Replicated the memory footprint by tracking brk() and mmap() calls, but it made no difference!
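For reference, the footprint-replication idea can be sketched as follows: reserve and touch anonymous memory matching the high-water mark recorded from the traced application's brk()/mmap() calls. The function and the example size are hypothetical, and, as the slide notes, this made no measurable difference in practice.

/* footprint_sketch.c - hypothetical sketch of replicating the traced
 * application's memory footprint during replay.  The recorded size would
 * come from logged brk()/mmap() calls; here it is just a parameter. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void *reserve_footprint(size_t recorded_bytes)
{
    void *p = mmap(NULL, recorded_bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* touch every page so the memory really competes with the kernel's
     * I/O buffer cache, as the original application's memory did */
    memset(p, 1, recorded_bytes);
    return p;
}

int main(void)
{
    /* e.g. replicate a 512 MiB footprint before starting the replay */
    if (!reserve_footprint((size_t)512 << 20))
        fprintf(stderr, "could not replicate footprint\n");
    return 0;
}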

Slide 14: Conclusions on script-based I/O benchmarking
- Gathering accurate I/O traces is harder than it seems
  - Currently, no solution is both portable and efficient
- Replay is easier, but efficiency still matters
- Many possibilities for future work; which matter most?
  - File name transformation
  - Parallel trace and replay
  - More portable instrumentation
  - How to monitor mmap'd I/O?