CGCExplorer: A Semi-Automated Search Procedure for Provably Correct Concurrent Collectors. Martin Vechev, Eran Yahav, David Bacon. University of Cambridge, IBM.

CGCExplorer: A Semi-Automated Search Procedure for Provably Correct Concurrent Collectors
Martin Vechev, Eran Yahav, David Bacon (University of Cambridge; IBM T.J. Watson Research Center)
Noam Rinetzky (Tel Aviv University)

Synthesizing Concurrent Algorithms
Designing practical and efficient concurrent algorithms is hard:
- trading off simplicity for performance
- fine-grained coordination
Result: sub-optimal, buggy algorithms.
We need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications.
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." -- D. Knuth

Synthesizing Concurrent Collectors
Concurrent garbage collectors:
- Widely used
- Must be correct, but also fast and scalable
- Many algorithms, not many formal proofs
A challenge problem for verification and synthesis:
- Concurrency
- Heap with no a priori bound
Focus on a specific family of collection algorithms:
- A generalization of Dijkstra's algorithm
- Concurrent, tracing, non-moving
- Single mutator, single collector (non-parallel)

Contributions
- Unifying framework: collection algorithms as a common skeleton with parametric functions
[Figure: skeleton with the parametric functions Trace Step, Mutator Step, and Expose, split between Mutator and Collector]


Contributions
- Specified various sets of blocks in 10 cycles
- Explored 1,600,000 collection algorithms
- Found 6 correct algorithms
- Hundreds of variations

Overview
Generation:
- High-level design: find algorithm outline; find building blocks
- Low-level search: explore algorithm space
Verification:
- High-level design: find a sufficient local invariant; find a sufficient abstraction
- Low-level search: verify local invariant

Algorithm Space - Counting Algorithms
- Track the collector's progress (wavefront)
- Count pointer installations from behind the wavefront
  - Increment on install, decrement on delete
  - Up to a predetermined counting threshold
- Expose objects with count > 0 when finished tracing
[Figure labels: root, scanned field, object header, 1, Collector, wavefront]

Counting Algorithms: High Level View
Mutator Step (Mutator):
- update source field to target obj
- check wavefront
- if source field behind wavefront:
  - update new target object count
  - update old target object count
Trace Step (Collector):
- read field value
- update wavefront (collector progress)
- mark target object
Expose (Collector):
- select objects with count > 0
- produce new roots

Coarse-Grained to Fine-Grained Synchronization

Mutator Step (source, field, new), atomic:
{
  M1: old = source.field
  M2: w = source.field.WF
  M3: w -> new.MC++
  M4: w -> log = log U {new}
  M5: w -> old.MC--
  M6: source.fld = new
}

Trace Step (source, field), atomic:
{
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Set Expose (log), atomic:
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) -> mark o
  E4: (mc > 0) -> V = V U {o}
  return V
}

What now? Can we remove the atomics? The result is incorrect: it may lose objects!
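As a rough illustration, the coarse-grained steps above can be sketched in Python and run single-threaded, so that each step is trivially atomic. The `Obj` class, the dict-based `wf` flags, the `None` checks, and the function names are assumptions made for this sketch; they are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so objects can go in a set
class Obj:
    fields: dict = field(default_factory=dict)  # field name -> Obj or None
    wf: dict = field(default_factory=dict)      # field name -> True once scanned (WF)
    mc: int = 0                                 # mutator count (MC)
    marked: bool = False


def mutator_step(source, fld, new, log):
    """Blocks M1..M6, executed as one atomic unit."""
    old = source.fields.get(fld)        # M1
    w = source.wf.get(fld, False)       # M2
    if w and new is not None:
        new.mc += 1                     # M3
        log.add(new)                    # M4
    if w and old is not None:
        old.mc -= 1                     # M5
    source.fields[fld] = new            # M6


def trace_step(source, fld):
    """Blocks C1..C3, executed as one atomic unit."""
    dst = source.fields.get(fld)        # C1
    source.wf[fld] = True               # C2
    if dst is not None:
        dst.marked = True               # C3
    return dst


def expose(log):
    """Blocks E1..E4: drain the log and mark objects whose count is positive."""
    exposed = set()
    while log:
        o = log.pop()                   # E1
        mc = o.mc                       # E2
        if mc > 0:
            o.marked = True             # E3
            exposed.add(o)              # E4
    return exposed
```

In this sketch, an object installed behind the wavefront is never seen by `trace_step`, but it lands in the log with a positive count, so `expose` marks it.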


Coarse-Grained to Fine-Grained Synchronization

Trace Step (source, field):
{
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Mutator Step (source, field, new):
{
  M1: old = source.field
  M2: w = source.field.WF
  M5: w -> old.MC--
  M3: w -> new.MC++
  M4: w -> log = log U {new}
  M6: source.fld = new
}

Set Expose (log):
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) -> mark o
  E4: (mc > 0) -> V = V U {o}
  return V
}

What now? Can we remove the atomics?
"When in doubt, use brute force." --Ken Thompson

System Input – Building Blocks

Mutator building blocks:
  M1: old = source.field
  M2: w = source.field.WF
  M3: w -> new.MC++
  M4: w -> log = log U {new}
  M5: w -> old.MC--
  M6: source.fld = new

Tracing step building blocks:
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true

Expose building blocks:
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) -> mark o
  E4: (mc > 0) -> V = V U {o}

Input constraints:
- Mutator blocks: [M3, M4]
- Tracing blocks: [C1, C3]
- Expose blocks: [E1, E2, E3, E4]
- Dataflow, e.g. M2 < M3
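One way to picture the search over block orders is to enumerate every permutation of the mutator blocks and keep only those that satisfy the input constraints. The sketch below does this by brute force. Only `M2 < M3` is stated on the slide; the other dataflow pairs are inferred here from which blocks read `old` and `w`, and reading `[M3, M4]` as "adjacent, in this order" is likewise an assumption of this sketch.

```python
from itertools import permutations

BLOCKS = ("M1", "M2", "M3", "M4", "M5", "M6")

# (a, b) means block a must run before block b.
# Assumed dataflow: M5 reads `old` (set by M1); M3, M4, M5 read `w` (set by M2).
BEFORE = [("M1", "M5"), ("M2", "M3"), ("M2", "M4"), ("M2", "M5")]

# (a, b) means b must directly follow a (assumed reading of "[M3, M4]").
ADJACENT = [("M3", "M4")]


def satisfies_constraints(order):
    """Check one candidate order against the ordering and adjacency constraints."""
    pos = {block: i for i, block in enumerate(order)}
    ordered = all(pos[a] < pos[b] for a, b in BEFORE)
    adjacent = all(pos[b] == pos[a] + 1 for a, b in ADJACENT)
    return ordered and adjacent


def candidate_orders():
    """Return every permutation of BLOCKS that passes the constraints."""
    return [order for order in permutations(BLOCKS) if satisfies_constraints(order)]
```

Each surviving order would still have to be checked for safety; the constraints only prune orders that cannot be meaningful programs.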

System Output – (Verified) Algorithms

Mutator Step (source, field, new)
{
  M1: old = source.field
  M6: source.fld = new
  M2: w = source.field.WF
  M3: w -> new.MC++
  M4: w -> log = log U {new}
  M5: w -> old.MC--
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) -> mark o
  E4: (mc > 0) -> V = V U {o}
}

Trace Step (source, field)
{
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
}

Explored 306 variations in around 2 minutes.
This is the least atomic (verified) algorithm with the given blocks.

But What Now?
How do we get further improvement?
- Need more insights
- Need new building blocks
  - Example: start and end of collector reading a field
[Figure labels: Coordination Meta-data, Atomicity, Ordering]

Continuing the Search…
We derived a non-atomic algorithm (at the granularity of blocks):
- Non-atomic write barrier, collector step, and expose
- System explored over 1,600,000 algorithms (took ~34 hours)
All experiments took ~41 machine hours and ~3 human hours.

CGC: Challenge for Automatic Verification
Unbounded heap and unbounded sequence of mutations.
Checking a global invariant is hard:
- State space too big even for partial checking
- 3 nodes can quickly consume several GB in the SPIN model checker
Solution:
- Manually boil down to a local invariant
- Automatically prove the local invariant
  - Use abstraction: an unbounded number of concrete nodes is conservatively represented by a small, bounded number of abstract nodes

What Do We Prove?
Want to prove collector safety:
- Retaining all live objects
Local invariant, for every object:
- If an object is referenced from a scanned field at time of expose, it is either marked, or its count > 0
Show this for any arbitrary object, under any arbitrary sequence of mutations.
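To make the local invariant concrete, the following sketch replays one hand-built interleaving in which the mutator blocks run in the order M1..M6 without any atomic section, and then evaluates the invariant. The interleaving, the `Obj` class, and the helper names are constructed for this illustration only; they are not taken from the slides, and the log update (M4) is left out because the guard is false on this path.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    fields: dict = field(default_factory=dict)  # field name -> Obj or None
    wf: dict = field(default_factory=dict)      # field name -> True once scanned
    mc: int = 0                                 # mutator count
    marked: bool = False


def invariant_holds(obj, heap):
    """Referenced from a scanned field implies marked or count > 0."""
    from_scanned = any(
        src.wf.get(name, False) and target is obj
        for src in heap
        for name, target in src.fields.items()
    )
    return (not from_scanned) or obj.marked or obj.mc > 0


def lost_object_interleaving():
    root, a, b = Obj(), Obj(), Obj()
    root.fields = {"f": a, "g": b}
    heap = [root, a, b]

    # Mutator starts "root.f = b" and is preempted after reading the wavefront.
    old = root.fields["f"]            # M1
    w = root.wf.get("f", False)       # M2: False, root.f is not scanned yet

    # Collector runs a whole trace step on root.f.
    dst = root.fields["f"]            # C1
    root.wf["f"] = True               # C2
    dst.marked = True                 # C3

    # Mutator resumes with the stale value of w, so no counts are updated.
    if w:
        b.mc += 1                     # M3 (skipped)
        old.mc -= 1                   # M5 (skipped)
    root.fields["f"] = b              # M6

    # Mutator clears root.g before it is scanned (wavefront flag is False,
    # so again no counts change), then the collector scans the empty field.
    root.fields["g"] = None
    root.wf["g"] = True

    # At expose time b is reachable through root.f but unmarked with count 0.
    return invariant_holds(b, heap), root.fields["f"] is b
```

Under these assumptions the function reports that the invariant fails for `b` while `b` is still reachable from the root, which is the kind of lost object the safety proof has to rule out.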

Abstraction Intuition
- Select a tracked representative object
- Track the reference count only for the selected object
[Figure labels: hiddn, 2, root, scanned field, object header, wavefront]

Abstraction Intuition
- Only up to a fixed number of pointers matter: up to the counting threshold
- Track these precisely
- Forget the rest
[Figure labels: hiddn, 2, root, scanned field, object header, wavefront]

Recap
Generation:
- High-level design: find algorithm outline; find building blocks
- Low-level search: explore algorithm space
Verification:
- High-level design: find a sufficient local invariant; find a sufficient abstraction
- Low-level search: verify local invariant
Also shown on the slide: find proof outline; find proof building blocks.

What's next?
Concurrent collector synthesis:
- Get real algorithms
- Mapping to real machine instructions
Yet another level of search.
Synthesis of other concurrent algorithms:
- In the pipeline: concurrent set algorithms
Local abstractions for concurrent programs.

Invited Questions
1) Are your algorithms practical?
2) What are the limitations of this approach? Would it work for my problem?
3) How do you prove that your algorithms terminate?
4) Can you show another algorithm?
5) How do you reduce the number of calls to the model checker?
6) You didn't mention any related work
7) Can you give more details on experimental results?

ANSWERS FOLLOW

Where Do Building Blocks Come From?
- Read/write of heap location
- Collector coordination meta-data
  - e.g., collector progress, state flags

Progress Coordination Metadata
[Figure: object layouts with a header (count, marked) and fields fld_1, fld_2, annotated with per-field progress flags start_1, start_2, start_3 and end_1, end_2, end_3; variants range from 6 bits and 5 bits of metadata down to 1 bit and 0 bits]

Refined Input – Finer Building Blocks

Mutator building blocks:
  M1: old = source.field
  M2s: ws = source.field.WFs
  M2e: we = source.field.WFe
  M3s: ws -> new.MC++
  M4s: ws -> log = log U {new}
  M5e: we -> old.MC--
  M6: source.fld = new

Collector building blocks:
  C1: dst = source.field
  C3: mark dst
  C2s: source.field.WFs = true
  C2e: source.field.WFe = true

Expose building blocks:
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) -> mark o
  E4: (mc > 0) -> V = V U {o}

Input constraints:
- Mutator: [M3s, M4s]
- Tracing: [C1, C3], C2s < [C1, C3] < C2e
- Expose: [E1, E2, E3, E4]
- Dataflow: e.g. M2s < M3s

System Output

Mutator Step (source, field, new)
{
  M1: old = source.field
  M2e: we = source.field.WFe
  M6: source.fld = new
  M2s: ws = source.field.WFs
  M3s: ws -> new.MC++
  M4s: ws -> log = log U {new}
  M5e: we -> old.MC--
}

Trace Step (source, field)
{
  C2s: source.field.WFs = true
  C1: dst = source.field
  C3: mark dst
  C2e: source.field.WFe = true
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) -> mark o
  E4: (mc > 0) -> V = V U {o}
}

Constraints = Insights, e.g. M2e < M6 < M2s and C2s < [C1, C3] < C2e.

(Some) Related Work
- Superoptimizer: a look at the smallest program, Massalin, ASPLOS'87
  - Finite state, limited length of instruction sequences
- Programming by Sketching, Solar-Lezama et al., PLDI'05
  - Finite state
- Sketching with Stencils, Solar-Lezama et al., PLDI'07
- Automatic discovery of mutual exclusion algorithms, Bar David and Taubenfeld, PODC'03
  - Finite state
- Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms, PLDI'06
- CheckFence: Sebastian Burckhardt, Rajeev Alur and Milo M. K. Martin, PLDI'07
- …


Algorithm Exploration
[Figure: for each of Trace Step, Mutator Step, and Expose, a space of variants ranging from more atomic to less atomic, across different orders]
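The "more atomic / less atomic" dimension can be pictured as the different ways of cutting one block order into contiguous atomic sections. The sketch below enumerates those cuts and compares two of them. The representation (tuples of tuples), the function names, and the `must_be_atomic` parameter are assumptions made for this illustration, not the tool's actual data structures.

```python
from itertools import product


def atomic_partitions(order, must_be_atomic=()):
    """Yield every split of `order` into contiguous atomic sections.

    `must_be_atomic` lists groups of blocks that have to share one section.
    """
    for cuts in product([False, True], repeat=len(order) - 1):
        sections, current = [], [order[0]]
        for block, cut_before in zip(order[1:], cuts):
            if cut_before:
                sections.append(tuple(current))
                current = [block]
            else:
                current.append(block)
        sections.append(tuple(current))
        if all(any(set(group) <= set(s) for s in sections) for group in must_be_atomic):
            yield tuple(sections)


def less_atomic_or_equal(p, q):
    """True if every atomic section of p lies inside a single section of q."""
    return all(any(set(s) <= set(t) for t in q) for s in p)
```

For a sequence of n blocks there are 2^(n-1) such splits before constraints are applied, so the search space grows quickly once this dimension is combined with the different block orders.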

Limitations
Need algorithm designer insights:
- Designer needs to understand the results of each phase
Abstraction is tailor-made:
- Designing an abstraction for the next collector?
Pushing the limits of current model checkers:
- Multiple mutators? Unbounded number of mutators?
- Better partial-order reduction may help

Are Your Algorithms Practical?
Are your algorithms correct?
Honest answer: not yet.
- So far we have focused on correctness more than on performance
- However, counting algorithms are of practical interest
"The moral is that for the design of multiprocessor installations we cannot rely on the traditional approach of the optimistic engineer, who, when the design looks reasonable, puts it together to see if it works." -- Edsger W. Dijkstra

Experimental Results
[Table columns: Run, Total, Checked, Correct, Time (min), Timed out; last row: TOTAL]
About 180 minutes of human time working with the system.
Machine: 3.8 GHz Xeon processor with 8 GB memory, running version 4 of Red Hat Linux.

Why Does it Work?
Ingredients:
- Relentless optimism
- Limited setting
Limited setting:
- Single collector, single mutator
- Counting threshold is known
- Algorithm skeleton is fixed
- Algorithm uses a barrier before moving to the sweep phase
- … (see paper)

Algorithm Space - Counting Algorithms
Concurrent:
- Single mutator, single collector (not parallel)
Tracing:
- Computes transitive reachability from roots
Non-moving:
- Collector does not relocate objects

How Do You Prove Termination? Manually

DEMONS START HERE IF NOT EARLIER

Synthesizing Concurrent Algorithms
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." -- D. Knuth
"it seems unavoidable that multiprocessor installations will be built… it seems equally unavoidable that many of them will be put together by aforementioned optimistic engineer. I shudder at the thought of all the new bugs: they will only delight the Devil. Am I too pessimistic? Nobody knows the trouble I have seen..." -- Edsger W. Dijkstra