Data Representation Synthesis (PLDI'2011, ESOP'12, PLDI'12) — Peter Hawkins, Stanford University; Alex Aiken, Stanford University; Kathleen Fisher, Tufts; Martin Rinard, MIT; Mooly Sagiv, TAU

Research Interests: verifying properties of low-level data structure manipulations; synthesizing low-level data structure manipulations (composing ADT operations); concurrent programming via smart libraries.

Motivation: sophisticated data structures exist; they raise the abstraction level, simplify programming (including concurrency), and are integrated into modern programming languages. But they are hard to use when composing several operations.

thttpd Web Server: the Conn → Map representation. [Figure: a hash table table[0..5] of entries chained by next pointers, each entry storing its index field (2, 0, 3, 1, …).]
Representation invariants:
1. ∀n: Map. ∀v: ℤ. table[v] = n ⟺ n->index = v
2. ∀n: Map. n->rc = |{n' : Conn. n'->file_data = n}|

thttpd, mmc.c:

    static void add_map(Map *m) {
        int i = hash(m);
        ...
        table[i] = m;      /* invariant 1 broken here */
        ...
        m->index = i;      /* invariant 1 restored */
        ...
        m->rc++;
    }

Representation invariants:
1. ∀n: Map. ∀v: ℤ. table[v] = n ⟺ index[n] = v
2. ∀n: Map. rc[n] = |{n' : Conn. file_data[n'] = n}|

TOMCAT Motivating Example (Tomcat 5.*):

    attr = new HashMap();
    ...
    Attribute removeAttribute(String name) {
        Attribute val = null;
        synchronized (attr) {
            boolean found = attr.containsKey(name);
            if (found) {
                val = attr.get(name);
                attr.remove(name);
            }
        }
        return val;
    }

Invariant: removeAttribute(name) returns the removed value, or null if the name does not exist.

TOMCAT Motivating Example (Tomcat 6.*):

    attr = new ConcurrentHashMap();
    ...
    Attribute removeAttribute(String name) {
        Attribute val = null;
        /* synchronized (attr) { */
        boolean found = attr.containsKey(name);
        if (found) {
            val = attr.get(name);
            attr.remove(name);
        }
        /* } */
        return val;
    }

Invariant: removeAttribute(name) returns the removed value, or null if the name does not exist. With the lock gone, another thread's attr.put("A", o) and attr.remove("A") can interleave between the three container calls; the composite's containsKey, get, and remove may then observe different states of the map, and the invariant can be violated even though each individual call is atomic.
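For this particular composite the container itself already offers an atomic equivalent: ConcurrentHashMap.remove(key) deletes the mapping and returns the previous value (or null) in one atomic step. A minimal sketch, not the actual Tomcat code:

    import java.util.concurrent.ConcurrentHashMap;

    class Attributes {
        private final ConcurrentHashMap<String, Object> attr = new ConcurrentHashMap<>();

        // remove(name) is a single atomic container operation: it removes the
        // mapping if present and returns the previous value, or null otherwise,
        // which is exactly the removeAttribute invariant.
        Object removeAttribute(String name) {
            return attr.remove(name);
        }
    }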

Multiple ADTs. Invariant: every element added to eden is either in eden or in longterm.

    public void put(K k, V v) {
        if (this.eden.size() >= size) {
            this.longterm.putAll(this.eden);
            this.eden.clear();
        }
        this.eden.put(k, v);
    }

Filesystem Example. [Figure: a filesystems list containing filesystem 1 and filesystem 2 (fields s_list, s_files); filesystem 1's s_files list holds files 14, 6, 2 and filesystem 2's holds files 7, 5 (fields f_list, f_fs_list); every file is also linked on either the file_in_use list or the file_unused list.]

Access Patterns: find all mounted filesystems; find the cached files on each filesystem; iterate over all used or unused cached files in least-recently-used order. [Figure: the same filesystem structure as above.]

Desired Properties of Data Representations: correctness; efficiency for the access patterns; simple to reason about; easy to evolve.

Specifying Combinations of Shared Data Structures: separation logic; first-order logic + reachability; monadic 2nd-order logic on trees; graph grammars; …

Filesystem Example: the relation ⟨fs, file, inuse⟩ with functional dependency {fs, file} → {inuse}. Example contents:

    fs | file | inuse
     1 |  14  |  F
     2 |   7  |  T
     2 |   5  |  F
     1 |   6  |  T
     1 |   2  |  F

[Figure: the same data in the linked-list structure shown earlier.]

Filesystem Example, Containers: View 1 — group the relation by filesystem: fs:1 → {file:14, file:6, file:2}; fs:2 → {file:7, file:5}. [Figure: the per-filesystem containers over the linked structure.]

Filesystem Example, Containers: View 2 — group the relation by inuse: inuse:T → {(fs:2, file:7), (fs:1, file:6)}; inuse:F → {(fs:1, file:14), (fs:2, file:5), (fs:1, file:2)}. [Figure: the in-use and unused containers over the linked structure.]

Filesystem Example, Containers: Both Views — the per-filesystem grouping and the inuse grouping coexist over the same tuples of the relation ⟨fs, file, inuse⟩ with {fs, file} → {inuse}. [Figure: both sets of containers over the shared linked structure.]

Decomposing Relations. A high-level shape descriptor is a rooted directed acyclic graph: the root represents the whole relation, the sub-graph rooted at each node represents a sub-relation, and edges are labeled with columns. Each edge maps a relation into a sub-relation; different types of edges correspond to different primitive data structures, and shared nodes correspond to shared sub-relations. For ⟨fs, file, inuse⟩ with {fs, file} → {inuse}, the descriptor has a left path keyed by fs then file and a right path keyed by inuse then (fs, file); the edges are implemented by primitive containers (here lists and an array), and the two paths share their leaf nodes.
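To make the idea concrete, here is a hand-written Java sketch of such a two-path decomposition for the ⟨fs, file, inuse⟩ relation. The class and field names are hypothetical; RelC would generate code of this flavor from the decomposition rather than having it written by hand:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical hand-written analogue of the filesystem decomposition.
    class FilesystemRelation {
        // Left path: fs -> file -> inuse.
        final Map<Integer, Map<Integer, Boolean>> byFilesystem = new HashMap<>();
        // Right path: inuse -> list of (fs, file) pairs (the in-use / unused lists).
        final Map<Boolean, List<int[]>> byInuse = new HashMap<>();

        // Inserting a tuple must update both paths so they stay consistent.
        void insert(int fs, int file, boolean inuse) {
            byFilesystem.computeIfAbsent(fs, k -> new HashMap<>()).put(file, inuse);
            byInuse.computeIfAbsent(inuse, k -> new ArrayList<>()).add(new int[]{fs, file});
        }
    }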

Filesystem Example (left path instance): fs:1 → {(file:14, F), (file:6, T), (file:2, F)}; fs:2 → {(file:7, T), (file:5, F)} — the relation from the table above grouped by filesystem.

Memory Decomposition (Left): [Figure: the left-path instance laid out in memory, each fs node pointing to its files and each file node to its inuse value.]

Filesystem Example (right path instance): inuse:T → {(fs:2, file:7), (fs:1, file:6)}; inuse:F → {(fs:1, file:14), (fs:2, file:5), (fs:1, file:2)} — the same relation grouped by inuse.

Memory Decomposition (Right): [Figure: the right-path instance laid out in memory, the inuse:T and inuse:F nodes pointing to their (fs, file) entries.]

Decomposition Instance: [Figure: the full decomposition instance — both paths over shared tuple nodes — shown next to the relation ⟨fs, file, inuse⟩ and the functional dependency {fs, file} → {inuse}.]

Shape Descriptor: [Figure: the decomposition graph (shape descriptor) shown next to the instance it abstracts.]

Memory State: [Figure: the decomposition instance realized with concrete containers — a list, an array, and a list — using the s_list, f_fs_list, and f_list link fields.]

Memory State: [Figure: the corresponding heap — the filesystems list, each filesystem's s_files list, and the file_in_use / file_unused lists.]

Adequacy: a decomposition is adequate if it can represent every possible relation matching the relational specification. The compiler enforces sufficient conditions for adequacy. Not every decomposition is a good representation of a relation.

Adequacy of Decompositions: all columns are represented, and nodes are consistent with the functional dependencies — columns bound to paths leading to a common node must functionally determine each other.

Respect Functional Dependencies: a node keyed by file that maps directly to inuse requires the FD {file} → {inuse}.

Respect Functional Dependencies: a node keyed by both file and fs that maps to inuse requires the FD {file, fs} → {inuse}.

Adequacy and Sharing: [Diagram: a left path keyed by fs then file and a right path keyed by inuse then (fs, file) sharing a node.] Columns bound on a path to an object x must functionally determine the columns bound on any other path to x; here {fs, file} → {inuse, fs, file} holds.

Adequacy and Sharing: [Diagram: a left path keyed by fs then file and a right path keyed by inuse then fs sharing a node.] Here the requirement fails — {fs, file} and {inuse, fs} do not functionally determine each other — so the decomposition is not adequate.

Decompositions and Aliasing A decomposition is an upper bound on the set of possible aliases Example: there are exactly two paths to any instance of node w

The RelC Compiler [PLDI'11]: the programmer writes a relational specification, the data structure designer supplies a decomposition, and RelC compiles the two into C++.

The RelC Compiler: relational specification ⟨fs, file, inuse⟩ with {fs, file} → {inuse}, queries such as foreach ⟨fs, file, inuse⟩ ∈ filesystems s.t. fs = 5 do …, and a graph decomposition (the fs → file and inuse → (fs, file) paths over lists and an array). RelC compiles these into C++.

Query Plans: for the query foreach ⟨fs, file, inuse⟩ ∈ filesystems if inuse = T do …, one plan scans the fs → file path and filters; its cost is proportional to the total number of files.

Query Plans: an alternative plan accesses the inuse edge directly and scans only the matching entries; its cost is proportional to the number of files in use.
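In code, the two plans differ only in which path they traverse. A rough sketch, reusing the hypothetical FilesystemRelation class from the earlier decomposition sketch:

    import java.util.List;
    import java.util.Map;

    class QueryPlans {
        // Plan 1: scan the fs -> file path and filter on inuse.
        // Cost is proportional to the total number of files.
        static void scanPlan(FilesystemRelation r) {
            for (Map<Integer, Boolean> files : r.byFilesystem.values())
                for (Map.Entry<Integer, Boolean> e : files.entrySet())
                    if (e.getValue()) { /* visit file e.getKey() */ }
        }

        // Plan 2: jump straight to the in-use list via the inuse edge.
        // Cost is proportional to the number of files in use.
        static void accessPlan(FilesystemRelation r) {
            for (int[] pair : r.byInuse.getOrDefault(true, List.of())) {
                /* visit filesystem pair[0], file pair[1] */
            }
        }
    }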

Removal and graph cuts: removing tuples (here, filesystem 1 and its files) means cutting every edge of the decomposition instance that reaches the removed data. [Figure: the instance with the edges to fs:1 and its files 14, 6, 2 marked as cut.]

Removal and graph cuts: [Figure: after the removal, the instance contains only filesystem 2 with files 7 and 5, still split across the in-use and unused lists.]

Abstraction Theorem: if the programmer obeys the relational specification, the decomposition is adequate, and the individual containers are correct, then the generated low-level code maintains the relational abstraction. [Commuting diagram: a remove on the abstract relation and the generated low-level remove on the memory state produce corresponding states.]

Concurrent Filesystem Example: lock-based concurrent code is difficult to write and difficult to reason about; it is hard to choose the correct granularity of locking and hard to use concurrent containers correctly.

Granularity of Locking. Coarse-grained locking: less complex, easier to reason about, easier to change, lower overhead. Fine-grained locking: more complex, more scalable.

Concurrent Containers: concurrent containers are hard to write, but excellent ones exist (e.g., Java's ConcurrentHashMap and ConcurrentSkipListMap). Composing concurrent containers is tricky: it is not usually safe to update several containers in parallel, and lock association is tricky. In practice, programmers often use concurrent containers incorrectly — 44% of the time [Shacham et al., 2011].
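The canonical mistake is check-then-act spread across two container calls. A small illustrative sketch (hypothetical code, not taken from the study):

    import java.util.concurrent.ConcurrentHashMap;

    class Composition {
        static final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

        // Not atomic: another thread can insert k between the check and the put,
        // and its value is then silently overwritten.
        static void racyPutIfAbsent(String k, int v) {
            if (!map.containsKey(k)) {
                map.put(k, v);
            }
        }

        // Atomic: a single container operation with the intended composite meaning.
        static void safePutIfAbsent(String k, int v) {
            map.putIfAbsent(k, v);
        }
    }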

Concurrent Data Representation Synthesis [PLDI'12]: the compiler takes a relational specification and a concurrent decomposition (written in RelScala) and produces concurrent data structures with atomic transactions.

Concurrent Decompositions: describe how to represent a relation using concurrent containers and locks. They specify the choice of containers, the number and placement of locks, and which locks protect which data. [Figure legend: edges implemented by containers such as ConcurrentHashMap and TreeMap.]

Concurrent Interface: an interface consisting of atomic operations. [Figure: the operation signatures.]

Goal: Serializability — every schedule of operations is equivalent to some serial schedule. [Figure: the interleaved operations of Thread 1 and Thread 2 over time, reordered into an equivalent serial schedule.]


Two-Phase Locking: attach a lock to each piece of data. Well-locked: to perform a read or write, a thread must hold the corresponding lock. Two-phase: all lock acquisitions must precede all lock releases. Theorem [Eswaran et al., 1976]: well-locked, two-phase transactions are serializable.
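A minimal sketch of a well-locked, two-phase transaction over two pieces of data, assuming one ReentrantLock per datum (names are hypothetical):

    import java.util.concurrent.locks.ReentrantLock;

    class Cell {
        final ReentrantLock lock = new ReentrantLock();
        long value;
    }

    class TwoPhase {
        // Growing phase: acquire every lock the transaction needs.
        // Shrinking phase: release only after all reads and writes are done.
        static void transfer(Cell from, Cell to, long amount) {
            from.lock.lock();
            to.lock.lock();
            try {
                from.value -= amount;   // every access happens while its lock is held
                to.value += amount;
            } finally {
                to.lock.unlock();       // all releases follow all acquisitions
                from.lock.unlock();
            }
        }
    }

Without an agreed acquisition order this can still deadlock, which is what the lock-ordering slides below address.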

Two-Phase Locking for decompositions: attach a lock to every edge of the decomposition instance — we're done! Except: Problem 1: we can't attach locks to container entries. Problem 2: too many locks. [Figure: decomposition and decomposition instance.]

Lock Placements: 1. Attach locks to nodes. [Figure: decomposition and decomposition instance.]

Coarse-Grained Locking: [Figure: decomposition and decomposition instance with a single lock protecting everything.]

Finer-Grained Locking: [Figure: decomposition and decomposition instance with locks on individual nodes.]

Lock Striping: [Figure: decomposition and decomposition instance with a fixed set of locks shared across many entries.]
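Lock striping keeps the number of locks fixed while still allowing concurrency: keys hash to one of a small array of locks, each guarding its own segment of data. A minimal sketch (hypothetical class, not the generated code):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantLock;

    class StripedMap<K, V> {
        private static final int STRIPES = 16;                 // the striping factor
        private final List<ReentrantLock> locks = new ArrayList<>();
        private final List<Map<K, V>> segments = new ArrayList<>();

        StripedMap() {
            for (int i = 0; i < STRIPES; i++) {
                locks.add(new ReentrantLock());
                segments.add(new HashMap<>());
            }
        }

        private int stripe(Object key) {
            return (key.hashCode() & 0x7fffffff) % STRIPES;     // key -> lock/segment index
        }

        V put(K key, V value) {
            int s = stripe(key);
            locks.get(s).lock();
            try { return segments.get(s).put(key, value); } finally { locks.get(s).unlock(); }
        }

        V get(Object key) {
            int s = stripe(key);
            locks.get(s).lock();
            try { return segments.get(s).get(key); } finally { locks.get(s).unlock(); }
        }
    }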

Lock Placements: Domination — locks must dominate the edges they protect. [Figure: decomposition and decomposition instance.]

Lock Placements: Path-Closure — all edges on a path between an edge and its lock must share the same lock.

Decompositions and Aliasing: a decomposition is an upper bound on the set of possible aliases (for example, there are exactly two paths to any instance of node w), so we can reason soundly about the relationship between locks and heap cells.

Lock Ordering: prevent deadlock via a topological order on locks.
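A minimal sketch of that idea: give every lock a position in a global order and always acquire in ascending order (names are hypothetical):

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.concurrent.locks.ReentrantLock;

    class OrderedLock {
        final int order;                        // position in the global (topological) order
        final ReentrantLock lock = new ReentrantLock();
        OrderedLock(int order) { this.order = order; }
    }

    class LockOrdering {
        // Acquiring in ascending order means no two threads can wait on each
        // other in a cycle, so deadlock is impossible.
        static void acquireAll(OrderedLock... needed) {
            Arrays.sort(needed, Comparator.comparingInt(l -> l.order));
            for (OrderedLock l : needed) l.lock.lock();
        }

        static void releaseAll(OrderedLock... held) {
            for (OrderedLock l : held) l.lock.unlock();
        }
    }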

Queries and Deadlock — example: find the files on a particular filesystem. The query plan is 1. acquire(t); 2. lookup(t→v); 3. acquire(v); 4. scan(v→w). Query plans must acquire the correct locks in the correct order.

Autotuner: given a fixed set of primitive types (list, circular list, doubly-linked list, array, map, …) and a workload, exhaustively enumerate all adequate decompositions up to a certain size; the compiler can then automatically pick the best representation for the workload.

Directed Graph Example (DFS). Columns: ⟨src, dst, weight⟩. Functional dependencies: {src, dst} → {weight}. Primitive data types: map, list, … [Figure: several candidate decompositions built from maps and lists, keyed in different orders such as src → dst → weight.]
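A hand-written sketch in the spirit of these candidates (hypothetical class; the synthesizer would generate and compare many such variants): a successor map src → dst → weight plus a predecessor map dst → src → weight.

    import java.util.HashMap;
    import java.util.Map;

    class Digraph {
        final Map<Integer, Map<Integer, Double>> successors = new HashMap<>();
        final Map<Integer, Map<Integer, Double>> predecessors = new HashMap<>();

        // {src, dst} -> {weight}: re-inserting an edge just updates its weight.
        void insertEdge(int src, int dst, double weight) {
            successors.computeIfAbsent(src, k -> new HashMap<>()).put(dst, weight);
            predecessors.computeIfAbsent(dst, k -> new HashMap<>()).put(src, weight);
        }

        Map<Integer, Double> findSuccessors(int src) {
            return successors.getOrDefault(src, Map.of());
        }

        Map<Integer, Double> findPredecessors(int dst) {
            return predecessors.getOrDefault(dst, Map.of());
        }
    }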

Concurrent Synthesis: find the optimal combination of decomposition shape, container data structures (ConcurrentHashMap, ConcurrentSkipListMap, CopyOnWriteArrayList, Array, HashMap, TreeMap, LinkedList), lock implementations (ReentrantReadWriteLock, ReentrantLock), lock striping factors, and lock placement.

Concurrent Graph Benchmark: start with an empty graph; each thread performs 5 × 10^5 random operations; the distribution of operations is a-b-c-d (a% find successors, b% find predecessors, c% insert edge, d% remove edge); plot throughput with a varying number of threads. Based on Herlihy's benchmark of concurrent maps.
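Roughly, each benchmark thread looks like the following sketch (hypothetical code; a real run would use one of the synthesized concurrent representations rather than the sequential Digraph sketched above, which is not thread-safe):

    import java.util.Random;

    class BenchmarkThread implements Runnable {
        static final int OPS = 500_000;          // 5 x 10^5 operations per thread
        final Digraph graph;                     // the shared structure under test
        final Random rnd = new Random();

        BenchmarkThread(Digraph graph) { this.graph = graph; }

        public void run() {
            for (int i = 0; i < OPS; i++) {
                int src = rnd.nextInt(1024), dst = rnd.nextInt(1024);
                int op = rnd.nextInt(100);       // the a-b-c-d distribution, e.g. 35-35-20-10
                if (op < 35)      graph.findSuccessors(src);
                else if (op < 70) graph.findPredecessors(dst);
                else if (op < 90) graph.insertEdge(src, dst, 1.0);
                else              { /* remove edge (src, dst) */ }
            }
        }
    }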

Results for the 35% find successor, 35% find predecessor, 20% insert edge, 10% remove edge workload. [Plot: throughput vs. number of threads; black = handwritten implementation, isomorphic to the blue synthesized one; the legend distinguishes decompositions using ConcurrentHashMap and HashMap.]

(Some) Related Projects. Relational synthesis: [Cohen & Campbell 1993], [Batory & Thomas 1996], [Smaragdakis & Batory 1997], [Batory et al. 2000], [Manevich, 2012], … Two-phase locking and predicate locking [Eswaran et al., 1976]; tree and DAG locking protocols [Attiya et al., 2010]; domination locking [Golan-Gueta et al., 2011]. Lock inference for atomic sections: [McCloskey et al., 2006], [Hicks, 2006], [Emmi, 2007].

Summary Decompositions describe how to represent a relation using concurrent containers Lock placements capture different locking strategies Synthesis explores the combined space of decompositions and lock placements to find the best possible concurrent data structure implementations

The Big Idea: compile a relational specification down to low-level concurrent code. Benefits: simpler code; correctness by construction; finding better combinations of data structures and locking.

Transactional Objects with Foresight — Guy Gueta, G. Ramalingam, Mooly Sagiv, Eran Yahav

Motivation Modern libraries provide atomic ADTs Hidden synchronization complexity Not composable – Sequences of operations are not necessarily atomic

Transactional Objects with Foresight: the client declares the intended use of an API via temporal specifications (foresight). The library utilizes the specification to add synchronization only between operations that would not otherwise serialize. Foresight can be inferred automatically by a sequential static analysis.

Example: Positive Counter, with operations Inc() and Dec(). Composite operations over the counter are not atomic even when Inc and Dec individually are. Atomicity can be enforced using future information (foresight), and that future information can be utilized for effective synchronization.
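A minimal sketch of the problem, assuming an atomic counter (class and method names are hypothetical):

    import java.util.concurrent.atomic.AtomicInteger;

    class PositiveCounter {
        private final AtomicInteger value = new AtomicInteger();

        void inc() { value.incrementAndGet(); }   // atomic
        void dec() { value.decrementAndGet(); }   // atomic
        int  get() { return value.get(); }        // atomic

        // Composite "decrement only if positive": not atomic even though every
        // call it makes is. With value = 1, two threads can both observe 1 and
        // both decrement, driving the counter negative.
        void decIfPositive() {
            if (get() > 0) {
                dec();
            }
        }
    }

With foresight, the client would declare up front which operations the composite may perform, letting the library serialize only the composites that actually conflict.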

Possible Executions (initial value = 0): [Figure: interleavings of the composite operations and the resulting counter values.]

Main Results: define the notion of ADTs that utilize foresight information and show that they are a strict generalization of existing locking mechanisms; build a Java Map ADT that utilizes foresight; a static analysis for inferring conservative foresight information; performance comparable to manual synchronization; usable for real composite operations.
