1
Data Representation Synthesis PLDI’2011, ESOP’12, PLDI’12 Peter Hawkins, Stanford University Alex Aiken, Stanford University Kathleen Fisher, Tufts Martin Rinard, MIT Mooly Sagiv, TAU http://theory.stanford.edu/~hawkinsp/
2
Research Interests: verifying properties of low-level data structure manipulations; synthesizing low-level data structure manipulations (composing ADT operations); concurrent programming via smart libraries.
3
Motivation: Sophisticated data structures exist. They raise the abstraction level, simplify programming (including concurrency), and are integrated into modern programming languages. But they are hard to use, especially when composing several operations.
4
thttpd Web Server: Conn / Map [diagram: a hash table of Map nodes; each node records the index of its table slot]
Representation Invariants:
1. ∀n ∈ Map. ∀v ∈ ℤ. table[v] = n ⟺ n->index = v
2. ∀n ∈ Map. n->rc = |{n' ∈ Conn : n'->file_data = n}|
5
thttpd: mmc.c

static void add_map(Map *m) {
    int i = hash(m);
    ...
    table[i] = m;    /* invariant 1 broken */
    ...
    m->index = i;    /* invariant 1 restored */
    ...
    m->rc++;
}

Representation Invariants:
1. ∀n ∈ Map. ∀v ∈ ℤ. table[v] = n ⟺ index[n] = v
2. ∀n ∈ Map. rc[n] = |{n' ∈ Conn : file_data[n'] = n}|
6
TOMCAT 5.* Motivating Example
Invariant: removeAttribute(name) returns the removed value, or null if the name does not exist.

attr = new HashMap();
...
Attribute removeAttribute(String name) {
    Attribute val = null;
    synchronized (attr) {
        found = attr.containsKey(name);
        if (found) {
            val = attr.get(name);
            attr.remove(name);
        }
    }
    return val;
}
7
TOMCAT 6.* Motivating Example
Invariant: removeAttribute(name) returns the removed value, or null if the name does not exist.

attr = new ConcurrentHashMap();
...
Attribute removeAttribute(String name) {
    Attribute val = null;
    /* synchronized (attr) { */
    found = attr.containsKey(name);
    if (found) {
        val = attr.get(name);
        attr.remove(name);
    }
    /* } */
    return val;
}

With the lock removed, operations from another thread (e.g., attr.put("A", o); attr.remove("A");) can interleave between containsKey, get, and remove, so the composed operation is no longer atomic and the invariant can be violated.
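A fix along these lines, sketched here for contrast (this exact code is not on the slide): ConcurrentHashMap.remove(key) is itself atomic and returns the previous value, or null if the key was absent, so the composed check-then-act can collapse into a single call.

import java.util.concurrent.ConcurrentHashMap;

class Attributes {
    // Attribute stands in for Tomcat's attribute value type (placeholder for this sketch).
    static class Attribute {}

    private final ConcurrentHashMap<String, Attribute> attr = new ConcurrentHashMap<>();

    // remove() is atomic and returns the removed value, or null if the name does not exist,
    // which is exactly the invariant removeAttribute must maintain.
    Attribute removeAttribute(String name) {
        return attr.remove(name);
    }
}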
8
Multiple ADTs
Invariant: every element added to eden is either in eden or in longterm.

public void put(K k, V v) {
    if (this.eden.size() >= size) {
        this.longterm.putAll(this.eden);
        this.eden.clear();
    }
    this.eden.put(k, v);
}
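Why the composed put is not atomic (an illustrative interleaving, assuming eden and longterm are individually thread-safe maps and there is no further synchronization):

// Thread 1: this.longterm.putAll(this.eden);  // copies eden's current contents
// Thread 2: this.eden.put(k2, v2);            // k2 now lives only in eden
// Thread 1: this.eden.clear();                // drops k2, which was never copied
// Result: k2 is in neither eden nor longterm, violating the invariant.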
9
Filesystem Example [diagram: a filesystems list links filesystem 1 and filesystem 2 through s_list; each filesystem's s_files list links its cached files through f_fs_list; all files are also linked through f_list into the file_in_use and file_unused lists]
10
Access Patterns
Find all mounted filesystems.
Find the cached files on each filesystem.
Iterate over all used or unused cached files in Least-Recently-Used order.
[same filesystem diagram as on the previous slide]
11
Desired Properties of Data Representations: correctness, efficiency for the access patterns, simple to reason about, easy to evolve.
12
Specifying Combinations of Shared Data Structures: separation logic, first-order logic + reachability, monadic 2nd-order logic on trees, graph grammars, …
13
Filesystem Example as a relation with columns {fs, file} and {inuse} (functional dependency {fs, file} → {inuse}):

fs | file | inuse
 1 |  14  |  F
 2 |   7  |  T
 2 |   5  |  F
 1 |   6  |  T
 1 |   2  |  F

[diagram: the same filesystem and file lists as before, now viewed as tuples of this relation]
14
Filesystem Example, Containers: View 1 [diagram: group the relation by filesystem, fs:1 → {file:14, file:6, file:2}, fs:2 → {file:7, file:5}]
15
Filesystem Example, Containers: View 2 [diagram: group the relation by the inuse flag, inuse:T → {(fs:2, file:7), (fs:1, file:6)}, inuse:F → {(fs:1, file:14), (fs:2, file:5), (fs:1, file:2)}]
16
Filesystem Example, Containers: Both Views [diagram: the two groupings side by side, sharing the same underlying tuples; the columns {fs, file} determine {inuse}]
17
Decomposing Relations
A high-level shape descriptor is a rooted directed acyclic graph:
– the root represents the whole relation
– the sub-graph rooted at each node represents a sub-relation
– edges are labeled with columns; each edge maps a relation into a sub-relation
– different types of edges correspond to different primitive data structures
– shared nodes correspond to shared sub-relations
[example decomposition: one path binds fs and then file, the other binds inuse and then (fs, file); the edges are implemented by a list, an array, and lists]
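To make the decomposition concrete, here is a rough Java sketch (illustrative only; neither RelC syntax nor generated code) of the filesystem decomposition: the left path groups by fs and then file, the right path groups by inuse, and both paths point at the same shared node objects.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FileNode {
    // Shared node: one object per (fs, file) tuple, reachable from both paths.
    final int fs, file;
    boolean inuse;
    FileNode(int fs, int file, boolean inuse) { this.fs = fs; this.file = file; this.inuse = inuse; }
}

class FsRelation {
    // Left path:  root --fs--> node --file--> shared FileNode
    final Map<Integer, Map<Integer, FileNode>> byFs = new HashMap<>();
    // Right path: root --inuse--> node --(fs, file)--> shared FileNode
    final Map<Boolean, List<FileNode>> byInuse = new HashMap<>();

    void insert(int fs, int file, boolean inuse) {
        FileNode n = new FileNode(fs, file, inuse);
        byFs.computeIfAbsent(fs, k -> new HashMap<>()).put(file, n);
        byInuse.computeIfAbsent(inuse, k -> new ArrayList<>()).add(n);
    }
}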
18
Filesystem Example [diagram: the left-hand path of the decomposition instance, splitting the relation by fs into per-filesystem sub-relations from file to inuse, fs:1 → {14:F, 6:T, 2:F}, fs:2 → {7:T, 5:F}]
19
Memory Decomposition (Left) [diagram: the memory layout of the left-hand path, each filesystem node pointing to its file nodes with their inuse flags]
20
Filesystem Example [diagram: the right-hand path of the decomposition instance, splitting the relation by inuse into sub-relations of (fs, file) pairs, inuse:T → {(2, 7), (1, 6)}, inuse:F → {(1, 14), (2, 5), (1, 2)}]
21
Memory Decomposition (Right) [diagram: the memory layout of the right-hand path, one list of (fs, file) nodes per inuse value]
22
Decomposition Instance [diagram: the full decomposition instance for the example relation, with both paths populated and sharing the same tuple nodes]
23
Shape Descriptor [diagram: the shape descriptor (the decomposition graph with column-labeled edges) shown together with the decomposition instance it describes]
24
Memory State [diagram: the decomposition edges implemented by concrete data structures; the fs edge by a list (s_list), the inuse edge by an array, and the file and (fs, file) edges by lists (f_fs_list and f_list)]
25
Memory State [diagram: the same memory state drawn with the original kernel-style structures, the filesystems list (s_list), the per-filesystem file lists (s_files / f_fs_list), and the file_in_use / file_unused lists (f_list)]
26
Adequacy: Not every decomposition is a good representation of a relation. A decomposition is adequate if it can represent every possible relation matching the relational specification. The compiler enforces sufficient conditions for adequacy.
27
Adequacy of Decompositions All columns are represented Nodes are consistent with functional dependencies – Columns bound to paths leading to a common node must functionally determine each other
28
Respect Functional Dependencies: {file} → {inuse} [decomposition diagram: an edge binding file leading to a node holding inuse]
29
Respect Functional Dependencies: {file, fs} → {inuse} [decomposition diagram: an edge binding (file, fs) leading to a node holding inuse]
30
Adequacy and Sharing: Columns bound on a path to an object x must functionally determine the columns bound on any other path to x. [diagram: the shared node is reached by a path binding {fs, file} and by a path binding {inuse, fs, file}; since {fs, file} → {inuse, fs, file} holds, this sharing is adequate]
31
Adequacy and Sharing: Columns bound on a path to an object x must functionally determine the columns bound on any other path to x. [diagram: here one path binds {fs, file} while the other binds only {inuse, fs}; {inuse, fs} does not determine {fs, file}, so this sharing violates the condition]
32
Decompositions and Aliasing A decomposition is an upper bound on the set of possible aliases Example: there are exactly two paths to any instance of node w
33
The RelC Compiler [PLDI'11]: the programmer writes the relational specification, the data structure designer supplies the decomposition, and RelC compiles them to C++.
34
The RelC Compiler
Relational specification: columns fs, file, inuse with functional dependency {fs, file} → {inuse}, and operations such as: foreach filesystems s.t. fs = 5 do …
Graph decomposition: one path binds fs (a list) and then file (a list); the other binds inuse (an array) and then (fs, file) (a list).
RelC compiles the specification and decomposition to C++.
35
Query Plans: foreach filesystems if inuse = T do … Plan 1 (scan): walk the per-filesystem file lists and filter on inuse. Cost proportional to the number of files.
36
Query Plans: foreach filesystems if inuse = T do … Plan 2 (access): follow the inuse edge directly and scan only the in-use list. Cost proportional to the number of files in use.
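The two plans, sketched in Java against the FsRelation sketch from above (illustrative, not RelC output): the first scans every file and filters on inuse, the second follows the inuse edge and touches only the files in use.

import java.util.Collections;
import java.util.Map;

class QueryPlans {
    // Plan 1 (scan): cost proportional to the total number of files.
    static int countInUseByScanning(FsRelation r) {
        int count = 0;
        for (Map<Integer, FileNode> files : r.byFs.values())
            for (FileNode n : files.values())
                if (n.inuse) count++;
        return count;
    }

    // Plan 2 (access): cost proportional to the number of files in use.
    static int countInUseDirectly(FsRelation r) {
        return r.byInuse.getOrDefault(true, Collections.emptyList()).size();
    }
}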
37
Removal and graph cuts: removing tuples (here, filesystem 1 and its cached files) requires unlinking the affected nodes from every container that reaches them, which corresponds to a cut of the decomposition graph. [diagram: the decomposition instance before removal, with fs:1 and files 14, 6, 2 highlighted]
38
Removal and graph cuts [diagram: the decomposition instance after removal, leaving only filesystem 2 with file 7 (in use) and file 5 (unused)]
39
Abstraction Theorem: If the programmer obeys the relational specification, the decomposition is adequate, and the individual containers are correct, then the generated low-level code maintains the relational abstraction. [commutative diagram: remove on the abstract relation corresponds to the generated low-level remove on the memory state]
40
Concurrent Filesystem Example: Lock-based concurrent code is difficult to write and difficult to reason about. It is hard to choose the correct granularity of locking, and hard to use concurrent containers correctly.
41
Granularity of Locking
Fine-grained locking: more complex, more scalable.
Coarse-grained locking: less complex, lower overhead, easier to reason about, easier to change.
42
Concurrent Containers: Concurrent containers are hard to write, but excellent ones exist (e.g., Java's ConcurrentHashMap and ConcurrentSkipListMap). Composing concurrent containers is tricky: it is not usually safe to update several containers in parallel, and lock association is tricky. In practice, programmers often use concurrent containers incorrectly (44% of the time) [Shacham et al., 2011].
43
Concurrent Data Representation Synthesis [PLDI'12]: from a relational specification and a concurrent decomposition, the RelScala compiler generates concurrent data structures and atomic transactions.
44
Concurrent Decompositions describe how to represent a relation using concurrent containers and locks. They specify: the choice of containers, the number and placement of locks, and which locks protect which data. [diagram: an example decomposition whose edges are implemented by a ConcurrentHashMap and a TreeMap]
45
Concurrent Interface: an interface consisting of atomic operations. [table of operations]
46
Goal: Serializability
Every schedule of operations is equivalent to some serial schedule. [timeline diagrams: interleaved operations of Thread 1 and Thread 2, reordered into an equivalent serial schedule]
48
Two-Phase Locking
Attach a lock to each piece of data.
Two-phase locking protocol:
– Well-locked: to perform a read or write, a thread must hold the corresponding lock.
– Two-phase: all lock acquisitions must precede all lock releases.
Theorem [Eswaran et al., 1976]: well-locked, two-phase transactions are serializable.
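A minimal Java sketch of a well-locked, two-phase transaction over hypothetical data (the locking protocol is the slide's; the bank-account example is only an illustration): all acquisitions happen before any release, and every access is covered by its lock.

import java.util.concurrent.locks.ReentrantLock;

class Account {
    final ReentrantLock lock = new ReentrantLock();
    int balance;
}

class TwoPhaseTransfer {
    static void transfer(Account from, Account to, int amount) {
        // Growing phase: acquire every lock the transaction will need.
        from.lock.lock();
        to.lock.lock();
        try {
            // All reads and writes happen while the corresponding locks are held (well-locked).
            from.balance -= amount;
            to.balance += amount;
        } finally {
            // Shrinking phase: release locks; nothing is acquired after this point.
            to.lock.unlock();
            from.lock.unlock();
        }
        // Note: without a lock ordering (see the later slide), two transfers in opposite
        // directions could deadlock.
    }
}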
49
Two-Phase Locking [Decomposition / Decomposition Instance diagrams]: attach a lock to every edge, and we're done! Problem 1: locks cannot be attached to container entries. Problem 2: too many locks.
50
Lock Placements: 1. Attach locks to nodes. [Decomposition / Decomposition Instance diagrams]
51
Coarse-Grained Locking [Decomposition / Decomposition Instance diagrams: a single lock protects the entire decomposition]
52
Finer-Grained Locking [Decomposition / Decomposition Instance diagrams: separate locks protect different parts of the decomposition]
53
Lock Striping [Decomposition / Decomposition Instance diagrams: a fixed pool of locks, each protecting a stripe of a container's entries]
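A hedged sketch of lock striping in Java (illustrative; not the synthesized code): a small fixed array of locks, each guarding its own segment of the data, so the lock count stays constant while unrelated keys rarely contend.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class StripedMap<K, V> {
    private static final int STRIPES = 16;
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    @SuppressWarnings("unchecked")
    private final Map<K, V>[] segments = new Map[STRIPES];

    StripedMap() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();
        }
    }

    // Each key hashes to one stripe; that stripe's lock guards only its segment.
    private int stripe(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    V put(K key, V value) {
        int s = stripe(key);
        locks[s].lock();
        try { return segments[s].put(key, value); } finally { locks[s].unlock(); }
    }

    V get(Object key) {
        int s = stripe(key);
        locks[s].lock();
        try { return segments[s].get(key); } finally { locks[s].unlock(); }
    }
}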
54
Lock Placements: Domination. Locks must dominate the edges they protect. [Decomposition / Decomposition Instance diagrams]
55
Lock Placements: Path-Closure. All edges on a path between an edge and its lock must share the same lock.
56
Decompositions and Aliasing: A decomposition is an upper bound on the set of possible aliases (example: there are exactly two paths to any instance of node w), so we can reason soundly about the relationship between locks and heap cells.
57
Lock Ordering: prevent deadlock via a topological order on locks.
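A small sketch of the idea in Java (the rank field and helper are hypothetical, not the papers' API): give every lock a fixed rank and always acquire locks in ascending rank, which realizes a topological order and rules out cycles of waiting threads.

import java.util.concurrent.locks.ReentrantLock;

class OrderedLock extends ReentrantLock {
    final long rank;              // fixed, globally consistent position in the lock order
    OrderedLock(long rank) { this.rank = rank; }
}

class LockOrdering {
    // Every thread acquires the lower-ranked lock first, so no deadlock cycle can form.
    static void acquireInOrder(OrderedLock a, OrderedLock b) {
        OrderedLock first = a.rank <= b.rank ? a : b;
        OrderedLock second = a.rank <= b.rank ? b : a;
        first.lock();
        second.lock();
    }
}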
58
58 Queries and Deadlock 2. lookup(tv) 1. acquire(t) 3. acquire(v) 4. scan(vw) Query plans must acquire the correct locks in the correct order Example: find files on a particular filesystem
59
Autotuner: given a fixed set of primitive types (list, circular list, doubly-linked list, array, map, …) and a workload, exhaustively enumerate all adequate decompositions up to a certain size; the compiler can then automatically pick the best representation for the workload.
60
Directed Graph Example (DFS)
Columns: src, dst, weight
Functional dependency: {src, dst} → {weight}
Primitive data types: map, list, …
[diagrams: several enumerated candidate decompositions built from maps and lists, e.g., src → dst → weight]
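One candidate decomposition, sketched in Java (an illustration of the shape the autotuner would enumerate, not its output): successors as a map from src to a map from dst to weight, plus a predecessor list per dst.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DirectedGraph {
    // Path 1: src --map--> dst --map--> weight (respects {src, dst} -> {weight})
    private final Map<Integer, Map<Integer, Double>> successors = new HashMap<>();
    // Path 2: dst --map--> list of src (supports "find predecessors")
    private final Map<Integer, List<Integer>> predecessors = new HashMap<>();

    void insertEdge(int src, int dst, double weight) {
        successors.computeIfAbsent(src, k -> new HashMap<>()).put(dst, weight);
        predecessors.computeIfAbsent(dst, k -> new ArrayList<>()).add(src);
    }

    Set<Integer> findSuccessors(int src) {
        return successors.getOrDefault(src, Collections.emptyMap()).keySet();
    }

    List<Integer> findPredecessors(int dst) {
        return predecessors.getOrDefault(dst, Collections.emptyList());
    }
}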
61
Concurrent Synthesis: find the optimal combination of
– decomposition shape
– container data structures: ConcurrentHashMap, ConcurrentSkipListMap, CopyOnWriteArrayList, Array, HashMap, TreeMap, LinkedList
– lock implementations: ReentrantReadWriteLock, ReentrantLock
– lock striping factors
– lock placement
62
Concurrent Graph Benchmark: start with an empty graph; each thread performs 5 × 10^5 random operations; the distribution of operations is a-b-c-d (a% find successors, b% find predecessors, c% insert edge, d% remove edge); plot throughput with a varying number of threads. Based on Herlihy's benchmark of concurrent maps.
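A hedged sketch of such a worker thread (the operation count and the 35-35-20-10 mix come from the slides; the ConcurrentGraph interface and everything else here are illustrative assumptions):

import java.util.concurrent.ThreadLocalRandom;

interface ConcurrentGraph {
    void insertEdge(int src, int dst, double weight);
    void removeEdge(int src, int dst);
    void findSuccessors(int src);
    void findPredecessors(int dst);
}

class GraphBenchmark {
    // One worker: 5 * 10^5 random operations drawn from a 35-35-20-10 mix.
    static Runnable worker(ConcurrentGraph g, int nodes) {
        return () -> {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            for (int i = 0; i < 500_000; i++) {
                int p = rnd.nextInt(100);
                int u = rnd.nextInt(nodes), v = rnd.nextInt(nodes);
                if (p < 35)      g.findSuccessors(u);
                else if (p < 70) g.findPredecessors(u);
                else if (p < 90) g.insertEdge(u, v, rnd.nextDouble());
                else             g.removeEdge(u, v);
            }
        };
    }
}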
63
Results: 35-35-20-10 (35% find successor, 35% find predecessor, 20% insert edge, 10% remove edge). [throughput plot; key: black = handwritten implementation, isomorphic to the blue variant; other entries mark decompositions using ConcurrentHashMap and HashMap]
64
(Some) Related Projects
Relational synthesis: [Cohen & Campbell 1993], [Batory & Thomas 1996], [Smaragdakis & Batory 1997], [Batory et al. 2000], [Manevich 2012], …
Two-phase locking and predicate locking [Eswaran et al., 1976]; tree and DAG locking protocols [Attiya et al., 2010]; domination locking [Golan-Gueta et al., 2011]
Lock inference for atomic sections: [McCloskey et al., 2006], [Hicks, 2006], [Emmi, 2007]
65
Summary: Decompositions describe how to represent a relation using concurrent containers. Lock placements capture different locking strategies. Synthesis explores the combined space of decompositions and lock placements to find the best possible concurrent data structure implementations.
66
The Big Idea: compile a relational specification into low-level concurrent code. Benefits: simpler code, correctness by construction, and finding better combinations of data structures and locking.
67
Transactional Objects with Foresight. Guy Gueta, G. Ramalingam, Mooly Sagiv, Eran Yahav
68
Motivation: Modern libraries provide atomic ADTs that hide synchronization complexity, but they are not composable: sequences of operations are not necessarily atomic.
69
Transactional Objects with Foresight: The client declares the intended use of an API via temporal specifications (foresight). The library utilizes the specification, synchronizing between operations that do not serialize. Foresight can be inferred automatically by a sequential static analysis.
70
Example: Positive Counter [code sketch: Operation A is annotated @mayUse(Inc) and calls Inc(); Operation B is annotated @mayUse(Dec) and calls Dec()]. Composite operations are not atomic even when Inc/Dec are atomic. Enforce atomicity using future information; future information can be utilized for effective synchronization.
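The @mayUse annotations are the paper's mechanism; the underlying problem can be shown with plain java.util.concurrent types. A minimal sketch (the composite operation below is hypothetical) of why composing atomic Inc/Dec is not itself atomic:

import java.util.concurrent.atomic.AtomicInteger;

class PositiveCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    void inc() { value.incrementAndGet(); }   // atomic in isolation

    // Composite "decrement only if positive": each step is atomic, but the
    // check-then-act pair is not. With value == 1, two threads can both pass
    // the check and drive the counter negative.
    void decIfPositive() {
        if (value.get() > 0) {
            value.decrementAndGet();
        }
    }
}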
71
Possible Executions (initial value = 0) [table of possible executions]
72
Main Results: Define the notion of ADTs that utilize foresight information and show that they are a strict generalization of existing locking mechanisms. Create a Java Map ADT that utilizes foresight. A static analysis infers conservative foresight information. Performance is comparable to manual synchronization, and the approach can be used for real composite operations.
73
The RelC Compiler
Relational specification: columns fs, file, inuse with functional dependency {fs, file} → {inuse}, and operations such as: foreach filesystems s.t. fs = 5 do …
Graph decomposition: fs (list), file (list), inuse (array), (fs, file) (list).
RelC compiles the specification and decomposition to C++.