Slide 1 LIRA: Linux Inter-process Race Analyzer Tipp Moseley Intel Corporation University of Colorado.

Presentation transcript:

Slide 2 Introduction 1.Motivation for Software Quality 2.Related work 3.Why Lira? 4.Race & Deadlock Detection Algorithms 5.Design 6.Pitfalls 7.Results

Slide 3 Motivation for Software Quality Various tools, vendors, targets –Many gaps in availability (X-Scale, IPF) Suite of tools adds value to underlying hardware Quality + manageability + security => buy more silicon Platform enabling –IBM’s Power, Motorola both have quality tools –These tools are fundamental for supporting platform

Slide 4 Related Work Memory tools –Purify –Valgrind –Etnus –Bitraker –3rd Race/deadlock tools –Eraser –Ladybug –Intel Thread Checker –Multirace –Lira

Slide 5 What is Lira? –Linux Inter-process Race Analyzer –Dynamically detect race conditions for Memory read/write Generic resource (file, socket, etc.) –Dynamically detect potential deadlock for Any combination of inter-process synchronization primitives –User must insert callbacks in locking code

Slide 6 Why Lira? Many enterprise systems depend on shared memory and concurrency No tool exists for debugging across processes Java w/ native threads XF86, Gnome, KDE Baan Samba Sybase Oracle MySQL PostgreSQL Apache SAP

Slide 7 Race Detection Eraser memory states Shamelessly borrowed from Savage, et al. SOSP 1997

Slide 8 Race Detection Eraser Lockset Algorithm Let locks_held(t) be the set of locks held in any mode by thread t. Let write_locks_held(t) be the set of locks held in write mode by thread t. For each v, initialize C(v) to the set of all locks. On each read of v by thread t, set C(v) := C(v) ∩ locks_held(t); if C(v) = { }, then issue a warning. On each write of v by thread t, set C(v) := C(v) ∩ write_locks_held(t); if C(v) = { }, then issue a warning. Also shamelessly borrowed from Savage, et al. SOSP 1997
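The lockset refinement on this slide can be sketched in a few lines of C++. This is a minimal illustration, not Lira's or Eraser's actual code: the names `LocksetChecker`, `onRead`, `onWrite`, and `locksetExample` are hypothetical, locks and variables are identified by strings for readability, and the memory-state machine from the previous slide is left out. "C(v) starts as the set of all locks" is represented lazily: a variable with no entry yet is treated as having every lock as a candidate.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using LockSet = std::set<std::string>;

// Returns the locks present in both sets.
LockSet intersect(const LockSet& a, const LockSet& b) {
    LockSet out;
    for (const std::string& lock : a) {
        if (b.count(lock) != 0) out.insert(lock);
    }
    return out;
}

// Hypothetical checker that tracks the candidate set C(v) per variable.
class LocksetChecker {
public:
    // Each call returns true if C(v) is still non-empty after refinement,
    // false if a warning should be issued.
    bool onRead(const std::string& v, const LockSet& locksHeld) {
        return refine(v, locksHeld);
    }
    bool onWrite(const std::string& v, const LockSet& writeLocksHeld) {
        return refine(v, writeLocksHeld);
    }

private:
    bool refine(const std::string& v, const LockSet& held) {
        std::map<std::string, LockSet>::iterator it = candidates_.find(v);
        if (it == candidates_.end()) {
            // First access: (all locks) ∩ held == held.
            it = candidates_.insert(std::make_pair(v, held)).first;
        } else {
            it->second = intersect(it->second, held);
        }
        return !it->second.empty();
    }

    std::map<std::string, LockSet> candidates_;
};

// Usage example: two accesses protected by different locks empty C(v).
void locksetExample() {
    LocksetChecker checker;
    LockSet onlyLock0, onlyLock1;
    onlyLock0.insert("lock0");
    onlyLock1.insert("lock1");
    assert(checker.onWrite("shmem", onlyLock0));   // C = {lock0}
    assert(!checker.onWrite("shmem", onlyLock1));  // C = {}  -> warning
}
```

Under these assumptions the caller passes locks_held(t) for reads and write_locks_held(t) for writes, so the read/write distinction stays with whatever component tracks lock modes.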

Slide 9 Deadlock Detection Only checks full ordering of lock hierarchy –Does not recognize that a->b->c and a->c->b are both OK (though bad practice) Data structures: –For each lock, maintain before and after set Each time a lock l is acquired: before(l) = before(l) ∪ locks-held If before(l) ∩ after(l) != {} then ERROR for l2 in parents(l) do after(l2) = after(l2) ∪ {l} If contains(before(l2), l) then ERROR
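The before/after bookkeeping on this slide can be sketched as follows. This is an illustrative reading, not Lira's implementation: `LockOrderChecker`, `onAcquire`, and `lockOrderExample` are hypothetical names, locks are plain integers, and parents(l) is assumed to mean the locks already held when l is acquired.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical checker for a single global lock ordering.
class LockOrderChecker {
public:
    // Records that lock `l` is acquired while the locks in `held` are held.
    // Returns false if the acquisition contradicts an order seen earlier.
    bool onAcquire(int l, const std::set<int>& held) {
        bool ok = true;

        // before(l) = before(l) ∪ locks-held
        std::set<int>& before = before_[l];
        before.insert(held.begin(), held.end());

        // If before(l) ∩ after(l) != {} then ERROR
        const std::set<int>& after = after_[l];
        for (std::set<int>::const_iterator it = before.begin(); it != before.end(); ++it) {
            if (after.count(*it) != 0) ok = false;
        }

        // for l2 in parents(l): after(l2) = after(l2) ∪ {l};
        // if l is already in before(l2) then ERROR
        for (std::set<int>::const_iterator it = held.begin(); it != held.end(); ++it) {
            after_[*it].insert(l);
            if (before_[*it].count(l) != 0) ok = false;
        }
        return ok;
    }

private:
    std::map<int, std::set<int> > before_;  // locks seen held before acquiring the key
    std::map<int, std::set<int> > after_;   // locks acquired while the key was held
};

// Usage example: lock 0 then lock 1, later lock 1 then lock 0.
void lockOrderExample() {
    LockOrderChecker checker;
    std::set<int> none, hold0, hold1;
    hold0.insert(0);
    hold1.insert(1);
    assert(checker.onAcquire(0, none));
    assert(checker.onAcquire(1, hold0));   // order 0 -> 1 recorded
    assert(checker.onAcquire(1, none));
    assert(!checker.onAcquire(0, hold1));  // order 1 -> 0 contradicts it
}
```

Because only a full ordering is tracked, the sketch also reports a->c->b after a->b->c, matching the limitation stated on the slide.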

Slide 10 Design - Issues Must follow exec() and communicate via shared memory as well Different address spaces –Shared memory binds to different addresses –Files to different file descriptors No common synchronization API or model –OS Semaphores –flock(), fcntl() –lock; xchgb
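To illustrate the last item, here is a minimal sketch of the kind of hand-rolled hardware lock the slide refers to. It is an assumption about what such code typically looks like, not code from the talk: `ByteSpinLock` is a hypothetical name, and for inter-process use the object would have to be placed in shared memory. On x86, compilers typically implement the atomic byte exchange with an xchg instruction.

```cpp
#include <atomic>
#include <cassert>

// Test-and-set lock built on an atomic byte exchange.
class ByteSpinLock {
public:
    // Returns true if the lock was free and is now owned by the caller.
    bool tryLock() {
        return flag_.exchange(1, std::memory_order_acquire) == 0;
    }

    // Spins until the lock is acquired.
    void lock() {
        while (!tryLock()) {
        }
    }

    // Releasing the lock is just a store of zero.
    void unlock() {
        flag_.store(0, std::memory_order_release);
    }

private:
    std::atomic<unsigned char> flag_{0};
};

// Usage example, single-threaded for clarity.
void spinLockExample() {
    ByteSpinLock lock;
    assert(lock.tryLock());    // free -> acquired
    assert(!lock.tryLock());   // already held
    lock.unlock();
    assert(lock.tryLock());    // free again
}
```

To a binary instrumentation tool the acquire is an exchange and the release is an ordinary store, so nothing in the instruction stream marks them as lock operations.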

Slide 11 Design – Overview

Slide 12 Design – Front End Initialize Pin –Instrument memory refs, system calls, and user locking callbacks e.g. LIRA_LockEx_HW(&my_lock) –Patch execve() system call with pin -t lira -- so children get Pin’d, also Unique feature to Lira –Other tools require user to modify scripts by hand –$ pin -t lira -- make test

Slide 13 Design – Front End –Filter irrelevant information –Maintain information about shared memory, file descriptors Client needs + offset because effective address differs across address spaces Client needs entire file path instead of fd –Send shared memory refs, lock ops, other callbacks to log buffer
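The address normalization described here can be sketched as a per-process table of attached segments. The transcript does not say what is paired with the offset, so the segment identifier below is an assumption, as are the names `Mapping`, `SharedRef`, `ProcessMappings`, and `translateExample`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One attached shared-memory segment as seen by a single process.
struct Mapping {
    int segmentId;        // assumed system-wide identifier for the segment
    std::uintptr_t base;  // address at which this process attached it
    std::size_t size;     // segment length in bytes
};

// Address-space-independent name for a shared memory location.
struct SharedRef {
    int segmentId;
    std::size_t offset;
};

// Hypothetical per-process table used by the front end.
class ProcessMappings {
public:
    void attach(int segmentId, std::uintptr_t base, std::size_t size) {
        Mapping m = {segmentId, base, size};
        mappings_.push_back(m);
    }

    // Translates an effective address into (segment, offset).
    // Returns false for addresses outside every shared segment,
    // which is where private memory references get filtered out.
    bool translate(std::uintptr_t addr, SharedRef& out) const {
        for (std::size_t i = 0; i < mappings_.size(); ++i) {
            const Mapping& m = mappings_[i];
            if (addr >= m.base && addr - m.base < m.size) {
                out.segmentId = m.segmentId;
                out.offset = static_cast<std::size_t>(addr - m.base);
                return true;
            }
        }
        return false;
    }

private:
    std::vector<Mapping> mappings_;
};

// Usage example: the same segment attached at two different bases.
void translateExample() {
    ProcessMappings parent, child;
    parent.attach(7, 0x40000000u, 4096);
    child.attach(7, 0x50000000u, 4096);

    SharedRef a = {0, 0};
    SharedRef b = {0, 0};
    assert(parent.translate(0x40000010u, a));
    assert(child.translate(0x50000010u, b));
    assert(a.segmentId == b.segmentId && a.offset == b.offset);
}
```

With this normalization the back end can key its state tables on the (segment, offset) pair, so accesses from different processes to the same location land on the same entry.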

Slide 14 Design – Back End LiraClient: –Parse ASCII data from ShmLogReader –Maintain state tables for each shared memory address, file descriptor –Drive RaceAnalyzer –Report race conditions RaceAnalyzer: –Generic implementation of Eraser algorithm –Check lock ordering LockModel: –Generic representation of various locking primitives –LockModelSemaphore, LockModelHardware, etc

Slide 15 Design – IPC How do we communicate data from multiple processes to the client process? ShmLogWriter -> ShmLogReader –Maintain a synchronized log file, protected by shared memory lock –If file becomes too big, begin new file –Online client deletes files when done processing (data may take up gigabytes of space in minutes) –Offline client processes files after execution completes

Slide 16 Pitfalls Maintaining state of sem/shm/fds from syscalls was painful –Solution: cache information from /proc Offline processing can lead to enormous logs –Solution: Online processing and delete processed info

Slide 17 Pitfalls Inferring meaning of lock operations was faulty at best –Solution: Offer user callbacks to capture intended meaning of synchronization operations. Unacceptably slow –Solution: -O6, cache frequently used data, do some work at instrumentation time, inline frequent calls

Slide 18 Sample Program
// INITIALIZATION
int *shmem = getShmem(sizeof(int));
sharedlock_t lock0 = getSharedLock();
sharedlock_t lock1 = getSharedLock();
lock_init(&lock0);
lock_init(&lock1);
fork(); // make 2 processes

Slide 19 Sample Program
int i = 0;
while( i < ) {
    lock(&lock0);
    lock(&lock1);
    (*shmem)++; // ERROR: UNINITIALIZED READ!
    unlock(&lock1);
    unlock(&lock0);
}
// ERROR: wrong lock hierarchy – potential deadlock!
lock(&lock1);
lock(&lock0);
unlock(&lock0);
unlock(&lock1);
(*shmem)++; // ERROR: no locks held!
exit(0);

Slide 20 Results Uninitialized LOAD –WARNING: possible uninitialized LOAD for offset=0 opsize=4 at pc=0x tid=0 pid=32576 srcfile=tests/locktest0.C srcline=42 No locks held for stdout –ERROR: no locks held for FWR to file=/dev/pts/3 at pc=0x420d18bc tid=0 pid=32584 srcfile=tests/locktest0.C srcline=40

Slide 21 Results Inconsistent locks held –ERROR: inconsistent locks held for LOAD to offset=0 opsize=4 at pc=0x804953b tid=0 pid=32576 srcfile=tests/locktest0.C srcline=86 Bad lock hierarchy –ERROR: inconsistent lock order at pc=0x8048ebc tid=0 pid=32576 srcfile=tests/../LiraCallbacks.h srcline=64

Slide 22 Future Work Lira: –Further optimization Still at ~500x slowdown (improved from >100000x) Work with Pin team to only instrument shared memory segments –Code that does not touch shared memory not instrumented –Find some bugs! LIRA can find potential errors, user must verify –Lots of work to figure out if a LIRA report really is an error in a large program (e.g. PostgreSQL, Oracle) –Potential analysis integration with Intel Thread Checker

Slide 23 Questions?

Slide 24 References http://www-2.cs.cmu.edu/afs/cs/academic/class/ f03/public/doc/atom-user.pdf http:// pers/1997/eraser-sosp97.ps.gz