Virtual Application Profiler (VAPP) Problem – Increasing hardware complexity – Programmers need to understand interactions between architecture and their.

Slides:



Advertisements
Similar presentations
Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
Advertisements

Software & Services Group PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs Harish Patil, Cristiano Pereira,
Microprocessors General Features To be Examined For Each Chip Jan 24 th, 2002.
1 Architectural Complexity: Opening the Black Box Methods for Exposing Internal Functionality of Complex Single and Multiple Processor Systems EECC-756.
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
Continuously Recording Program Execution for Deterministic Replay Debugging.
Fast Track to ColdFusion 9. Getting Started with ColdFusion Understanding Dynamic Web Pages ColdFusion Benchmark Introducing the ColdFusion Language Introducing.
1 System Design: Addressing Design Goals We are starting from an initial design, along with a set of design goals. What is the next step?
Progress Report 11/1/01 Matt Bridges. Overview Data collection and analysis tool for web site traffic Lets website administrators know who is on their.
Chapter 2: Impact of Machine Architectures What is the Relationship Between Programs, Programming Languages, and Computers.
1-1 Embedded Software Development Tools and Processes Hardware & Software Hardware – Host development system Software – Compilers, simulators etc. Target.
1 CS6320 – Why Servlets? L. Grewe 2 What is a Servlet? Servlets are Java programs that can be run dynamically from a Web Server Servlets are Java programs.
Module 8: Monitoring SQL Server for Performance. Overview Why to Monitor SQL Server Performance Monitoring and Tuning Tools for Monitoring SQL Server.
Replay Debugging for Distributed Systems Dennis Geels, Gautam Altekar, Ion Stoica, Scott Shenker.
Scott Pinkerton Sample GUI/Application Portfolio 1.
Module 15: Monitoring. Overview Formulate requirements and identify resources to monitor in a database environment Types of monitoring that can be carried.
Chapter 7 Requirement Modeling : Flow, Behaviour, Patterns And WebApps.
Selecting and Implementing An Embedded Database System Presented by Jeff Webb March 2005 Article written by Michael Olson IEEE Software, 2000.
Prospector : A Toolchain To Help Parallel Programming Minjang Kim, Hyesoon Kim, HPArch Lab, and Chi-Keung Luk Intel This work will be also supported by.
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
LiNK: An Operating System Architecture for Network Processors Steve Muir, Jonathan Smith Princeton University, University of Pennsylvania
Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,
ANDROID Presented By Mastan Vali.SK. © artesis 2008 | 2 1. Introduction 2. Platform 3. Software development 4. Advantages Main topics.
Parallelizing Security Checks on Commodity Hardware E.B. Nightingale, D. Peek, P.M. Chen and J. Flinn U Michigan.
Accelerating Precise Race Detection Using Commercially-Available Hardware Transactional Memory Support Serdar Tasiran Koc University, Istanbul, Turkey.
Ideas to Improve SharePoint Usage 4. What are these 4 Ideas? 1. 7 Steps to check SharePoint Health 2. Avoid common Deployment Mistakes 3. Analyze SharePoint.
1 Wenguang WangRichard B. Bunt Department of Computer Science University of Saskatchewan November 14, 2000 Simulating DB2 Buffer Pool Management.
Contents 1.Introduction, architecture 2.Live demonstration 3.Extensibility.
High Performance Embedded Computing © 2007 Elsevier Chapter 1, part 2: Embedded Computing High Performance Embedded Computing Wayne Wolf.
Querying Large Databases Rukmini Kaushik. Purpose Research for efficient algorithms and software architectures of query engines.
Platinu m Sponsor s Silver Sponsors Gold Sponsor s.
Dynamic Resource Monitoring and Allocation in a virtualized environment.
Frontiers in Massive Data Analysis Chapter 3.  Difficult to include data from multiple sources  Each organization develops a unique way of representing.
Replay Compilation: Improving Debuggability of a Just-in Time Complier Presenter: Jun Tao.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
CS 360 Lecture 11.  In most computer systems:  The cost of people (development) is much greater than the cost of hardware  Yet, performance is important.
1. 2 Preface In the time since the 1986 edition of this book, the world of compiler design has changed significantly 3.
© 2006, National Research Council Canada © 2006, IBM Corporation Solving performance issues in OTS-based systems Erik Putrycz Software Engineering Group.
PPT 4: Reproducing the Problem CEN Software Testing.
Application Software System Software.
31 Oktober 2000 SEESCOASEESCOA STWW - Programma Work Package 5 – Debugging Task Generic Debug Interface K. De Bosschere e.a.
4/26/2017 Use Cloud-Based Load Testing Service to Find Scale and Performance Bottlenecks Randy Pagels Sr. Developer Technology Specialist © 2012 Microsoft.
Execution Replay and Debugging. Contents Introduction Parallel program: set of co-operating processes Co-operation using –shared variables –message passing.
Sunpyo Hong, Hyesoon Kim
LIOProf: Exposing Lustre File System Behavior for I/O Middleware
Software System Performance CS 560. Performance of computer systems In most computer systems:  The cost of people (development) is much greater than.
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
Fermilab Scientific Computing Division Fermi National Accelerator Laboratory, Batavia, Illinois, USA. Off-the-Shelf Hardware and Software DAQ Performance.
Compilers: History and Context COMP Outline Compilers and languages Compilers and architectures – parallelism – memory hierarchies Other uses.
10/2/20161 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam King,
SOFTWARE TESTING TRAINING TOOLS SUPPORT FOR SOFTWARE TESTING Chapter 6 immaculateres 1.
Chapter Goals Describe the application development process and the role of methodologies, models, and tools Compare and contrast programming language generations.
Programmable Logic Devices
Debugging Memory Issues
Game Architecture Rabin is a good overview of everything to do with Games A lot of these slides come from the 1st edition CS 4455.
Real-time Software Design
Introduction to Operating Systems
QNX Technology Overview
CSC Classes Required for TCC CS Degree
ColdFusion Performance Troubleshooting and Tuning
Yikes! Why is my SystemVerilog Testbench So Slooooow?
Tools.
Oracle Memory Internals
Tools.
EE 4xx: Computer Architecture and Performance Programming
Addressing Design Goals
Performance And Scalability In Oracle9i And SQL Server 2000
CS5123 Software Validation and Quality Assurance
Presentation transcript:

Virtual Application Profiler (VAPP) Problem – Increasing hardware complexity – Programmers need to understand interactions between architecture and their software Goals – Quickly simulate application execution over many architecture configurations – Analyze and debug programs designed for parallel architecture – Allow developers to view and query intermediate execution traces Approach – Software-based log-and-replay system

Design Pin tools log memory accesses, function calls, image loads, locking, memory allocation, etc. SQLite database store execution trace Collection of replay and analysis programs take database as input – Replay execution on simulated hardware – Analyze memory access patterns – Lockset verification

SQLite Database Developer can easily examine all trace data SQLite Database Browser Many types of custom analysis can be formulated using SQL E.g. which threads access a particular memory buffer

Replay and Analysis Variety of analysis tools have been implemented on top of the database Cache simulator – Prototype of a complete memory hierarchy simulator Lockset algorithm – Detect which locks protect which memory buffers in a parallel program – Identify inconsistent locking patterns and unprotected accesses

Simulator Performance Goal is to amortize cost of running instrumented code over many simulations Increase # loop iterations (N) in simple benchmark application In practice, prototype version is bogged down by I/O costs of database For large applications, running pure Pin versions is better (for now)

Logging Performance Writing logging information dominates execution time Log size grows exponentially as N increases Allow application to dynamically enable/disable logging

Parallel Program Analysis Memory accesses in parallel programs tracked at buffer level Support pthreads and Open MP Discover unprotected accesses to shared memory Example: == ANALYSE ======================================= Thread 0 -> [ , , , ]> Thread 1 -> [ , , ]> shared buffer t1 [] t2 [] shared buffer t1 [] t2 [] shared buffer t1 [] t2 [] == ANALYSE ======================================= Thread 0 -> [ , , , ]> Thread 1 -> [ , , ]> shared buffer t1 [666]> t2 [666]> shared buffer t1 [] t2 [] shared buffer t1 [] t2 [] result NOT protected result protected by Internal Open MP lock result protected by Internal Open MP lock

Future Work Implement trace compression – Will greatly reduce log size – Database I/O dominates execution time so performance will significantly improve Allow analysis of program behavior over multiple executions Track lock contention and feed information back to system – Recommend particular lock types – Pin memory that is used exclusively by one thread Use logged information to support deterministic replay