Microsoft Research Faculty Summit 2008. Yuanyuan(YY) Zhou Associate Professor University of Illinois, Urbana-Champaign.

Slides:



Advertisements
Similar presentations
Concurrent Predicates: A Debugging Technique for Every Parallel Programmer PACT 13 Justin Gottschlich Gilles Pokam Cristiano Pereira Youfeng Wu Intel Corporation.
Advertisements

Test process essentials Riitta Viitamäki,
Concurrency Issues Motivation, Problems, Directions Dennis Kafura - CS Operating Systems1.
Understanding and Detecting Real-World Performance Bugs
The complexity of predicting atomicity violations Azadeh Farzan Univ of Toronto P. Madhusudan Univ of Illinois at Urbana Champaign.
1 Model checking. 2 And now... the system How do we model a reactive system with an automaton ? It is convenient to model systems with Transition systems.
An Case for an Interleaving Constrained Shared-Memory Multi-Processor Jie Yu and Satish Narayanasamy University of Michigan.
Concurrency Control II
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Guoliang Jin, Linhai Song, Wei Zhang, Shan Lu, and Ben Liblit University of Wisconsin–Madison Automated Atomicity- Violation Fixing.
Iterative Context Bounding for Systematic Testing of Multithreaded Programs Madan Musuvathi Shaz Qadeer Microsoft Research.
/ PSWLAB Concurrent Bug Patterns and How to Test Them by Eitan Farchi, Yarden Nir, Shmuel Ur published in the proceedings of IPDPS’03 (PADTAD2003)
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
An Case for an Interleaving Constrained Shared-Memory Multi- Processor CS6260 Biao xiong, Srikanth Bala.
S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, B. Calder UCSD and Microsoft PLDI 2007.
PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Yuanyuan ZhouUIUC-CS Architectural Support for Software Bug Detection Yuanyuan (YY) Zhou and Josep Torrellas University of Illinois at Urbana-Champaign.
1 ACID Properties of Transactions Chapter Transactions Many enterprises use databases to store information about their state –e.g., Balances of.
Language Support for Lightweight transactions Tim Harris & Keir Fraser Presented by Narayanan Sundaram 04/28/2008.
Concurrency and Software Transactional Memories Satnam Singh, Microsoft Faculty Summit 2005.
1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.
Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou Appeared.
Exceptions and Mistakes CSE788 John Eisenlohr. Big Question How can we improve the quality of concurrent software?
CS527: (Advanced) Topics in Software Engineering Overview of Software Quality Assurance Tao Xie ©D. Marinov, T. Xie.
An Empirical Study of Reported Bugs in Server Software with Implications for Automated Bug Diagnosis Swarup Kumar Sahoo, John Criswell, Vikram Adve Department.
Programming Paradigms for Concurrency Part 2: Transactional Memories Vasu Singh
Concurrency, Mutual Exclusion and Synchronization.
1 Principles of Computer Science I Prof. Nadeem Abdul Hamid CSC 120 – Fall 2005 Lecture Unit 10 - Testing.
Dynamic Verification of Cache Coherence Protocols Jason F. Cantin Mikko H. Lipasti James E. Smith.
Pallavi Joshi* Mayur Naik † Koushik Sen* David Gay ‡ *UC Berkeley † Intel Labs Berkeley ‡ Google Inc.
What Change History Tells Us about Thread Synchronization RUI GU, GUOLIANG JIN, LINHAI SONG, LINJIE ZHU, SHAN LU UNIVERSITY OF WISCONSIN – MADISON, USA.
Presenter : Cheng-Ta Wu David Lin1, Ted Hong1, Farzan Fallah1, Nagib Hakim3, Subhasish Mitra1, 2 1 Department of EE and 2 Department of CS Stanford University,
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
The Relational Model1 Transaction Processing Units of Work.
What Type of Defects are Really Discovered in Code Reviews. Mika V
Xusheng Xiao North Carolina State University CSC 720 Project Presentation 1.
“Isolating Failure Causes through Test Case Generation “ Jeremias Rößler Gordon Fraser Andreas Zeller Alessandro Orso Presented by John-Paul Ore.
Deadlock Analysis with Fewer False Positives Thread T1: sync(G){ sync(L1){ sync(L2){} } }; T3 = new T3(); j3.start(); J3.join(); sync(L2){ sync(L1){} }
Ali Kheradmand, Baris Kasikci, George Candea Lockout: Efficient Testing for Deadlock Bugs 1.
Concurrency Properties. Correctness In sequential programs, rerunning a program with the same input will always give the same result, so it makes sense.
CS527 Topics in Software Engineering (Software Testing and Analysis) Darko Marinov August 30, 2011.
CS533 – Spring Jeanie M. Schwenk Experiences and Processes and Monitors with Mesa What is Mesa? “Mesa is a strongly typed, block structured programming.
Threads and Singleton. Threads  The JVM allows multiple “threads of execution”  Essentially separate programs running concurrently in one memory space.
Detecting Atomicity Violations via Access Interleaving Invariants
Grigore Rosu Founder, President and CEO Professor of Computer Science, University of Illinois
Fortran Compilers David Padua University of Illinois at Urbana-Champaign.
Specifying Multithreaded Java semantics for Program Verification Abhik Roychoudhury National University of Singapore (Joint work with Tulika Mitra)
Soyeon Park, Shan Lu, Yuanyuan Zhou UIUC Reading Group by Theo.
CS 5150 Software Engineering Lecture 22 Reliability 3.
Embedded System Lab. 최 진 화최 진 화 Kilmo Choi 최길모 A Study of Linux File System Evolution L. Lu, A. C. Arpaci-Dusseau, R. H. ArpaciDusseau,
Testing Concurrent Programs Sri Teja Basava Arpit Sud CSCI 5535: Fundamentals of Programming Languages University of Colorado at Boulder Spring 2010.
Embedded System Design and Development Introduction to Embedded System.
1 Software Testing. 2 What is Software Testing ? Testing is a verification and validation activity that is performed by executing program code.
Tanakorn Leesatapornwongsa, Jeffrey F. Lukman, Shan Lu, Haryadi S. Gunawi.
Lecture 20: Consistency Models, TM
Learning from Mistakes: A Comprehensive Study on Real-World Concurrency Bug Characteristics Ben Shelton.
Healing Data Races On-The-Fly
Regression Testing with its types
CSC 591/791 Reliable Software Systems
Lazy Diagnosis of In-Production Concurrency Bugs
Specifying Multithreaded Java semantics for Program Verification
A Comprehensive Study on Real World Concurrency Bugs in Node.js
Lecture 22: Consistency Models, TM
AppMetrics® Benefits “Maximize the availability of your applications built on the Microsoft platform”
CSE 486/586 Distributed Systems Concurrency Control --- 2
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Testing and Debugging Concurrent Code
Programming with Shared Memory Specifying parallelism
Automatically Diagnosing and Repairing Error Handling Bugs in C
Presentation transcript:

Microsoft Research Faculty Summit 2008

Yuanyuan(YY) Zhou Associate Professor University of Illinois, Urbana-Champaign

Concurrency bug detection What types of bugs are un-solved? How bugs are diagnosed in practice? Concurrent program testing What are the manifestation conditions for concurrency bugs? Concurrent program language design How many mistakes can be avoided What other support do we need? All can benefit from a closer look at real world concurrency bugs need to know need to know

105 real-world concurrency bugs from 4 large open source programs Study from 4 dimensions Bug patterns Manifestation condition Diagnosing strategy Fixing methods Details in our ASPLOS’08 paper

5 8 years10 years7 years6 yearsBug DB history LOC (M line) C++ Mainly CC++/CLanguage GUIClientServer Software Type OpenOfficeMozillaApacheMySQL

Limitations No scientific computing applications No JAVA programs Never-enough bug samples 2 6 OpenOffice Total 1649Deadlock Non-deadlock MozillaApacheMySQL

Classified based on root causes Categories Atomicity violation The desired atomicity of certain code region is violated Order violation The desired order between two (sets of) accesses is flipped Others X X Thread 1Thread 2 Thread 1Thread 2 Pattern

We should focus on atomicity violation and order violation Bug detection tools for order violation bugs are desired *There are 3-bug overlap between Atomicity and Order Implications

Note that order violations can be fixed by adding locks to ensure atomicity with the previous operation to ensure order. But the root cause is the incorrect assumption about execution order.

OK Woops!

Bug manifestation condition A specific execution order among a smallest set of memory accesses The bug is guaranteed to manifest, as long as the condition is satisfied How many threads are involved? How many variables are involved? How many accesses are involved? 11 Manifestation

Single variables are more common  The widely-used simplification is reasonable Multi-variable concurrency bugs are non-negligible  Techniques to detect multi-variable concurrency bugs are needed (66%) 25 (34%) # of Bugs Implications Findings

13 js_FlushPropertyCache ( … ) { memset ( cache  table, 0, SIZE); … cache  empty = TRUE; } js_PropertyCacheFill ( … ) { cache  table[indx] = obj; … cache  empty = FALSE; } Thread 1Thread 2 Control the order among accesses to any one variable can not guarantee the bug manifestation

Concurrent program testing can focus on small groups of accesses The testing target shrinks from exponential to polynomial 14 Non-deadlock bugs Deadlock bugs 7 (9%) 1 (3%) # of Bugs Double lock

101 out of 105 (96%) bugs involve at most two threads Most bugs can be reliably disclosed if we check all possible interleaving between each pair of threads Few bugs cannot Example: Intensive resource competition among many threads causes unexpected delay

Adding/changing locks20 (27%) Condition check19 (26%) Data-structure change19 (26%) Code switch10 (13%) Other 6 ( 8%) No silver bullet for fixing concurrency bugs. Lock usage information is not enough to fix bugs. Fix Implications 20 (27%)

Give up resource acquisition19 (61%) Change resource acquisition order 7 (23%) Split the resource to smaller ones 1 ( 3%) Others 4 (13%) 17 We need to pay attention to the correctness of ``fixed’’ deadlock bugs Might introduce non-deadlock bugs Might introduce non-deadlock bugs Fix Implications

Impact of concurrency bugs ~ 70% leads to program crash or hang Reproducing bugs are critical to diagnosis Many examples Programmers lack diagnosis tools No automated diagnosis tools mentioned Most are diagnosed via code review Reproduce bugs are extremely hard and directly determines the diagnosing time 60% 1st-time patches still contain concurrency bugs (old or new) Usefulness and concerns of transactional memory Refer to our paper for more details and examples

Bug detection needs to look at order-violation bugs and multi-variable concurrency bugs Testing can target at more realistic interleaving coverage goals Fixing concurrency bugs is not trivial and not easy to get right Support from automated tools is needed Diagnosing tools are needed, especially bug reproducing tools Current bug detection tools are not widely used 19

AVIO: atomicity violation detection [ASPLOS’06] Automatically extracting interleaving invariants to detect violations Multi-variable concurrency bug detection [SOSP’07] Mining variable correlations and then feed them to bug detectors Concurrency interleaving testing coverage criteria [FSE’07]