Serialization Sets: A Dynamic Dependence-Based Parallel Execution Model
Matthew D. Allen, Srinath Sridharan, Gurindar S. Sohi
University of Wisconsin-Madison

Motivation
Multicore processors ubiquitous
  – Performance via parallel execution
Multithreaded programming is problematic
  – Dependences encoded statically
  – Difficult to reason about locks, synchronization
  – Many errors not found in sequential programs
  – Execution is nondeterministic
Need better parallel execution models!
February 16, 2009, PPoPP 2009

Serialization Sets Overview
Sequential program with annotations
  – Identify potentially independent methods
  – Associate a serializer with these methods
Serializer groups dependent method invocations into serialization sets
  – Runtime executes each set in order to honor dependences
Serializer attempts to map independent method invocations into different sets
  – Runtime opportunistically parallelizes execution

Serialization Sets Overview
Sequential program with no locks and no explicit synchronization
Deterministic, race-free execution
Comparable performance to multithreading
  – Sometimes better!

Outline
Overview
Serialization Sets Execution Model
Prometheus: C++ Library for SS
Experimental Evaluation
Related Work & Conclusions

Running Example
    trans_t* trans;
    while ((trans = get_trans ()) != NULL) {
        account_t* account = trans->account;
        if (trans->type == DEPOSIT)
            account->deposit (trans->amount);
        if (trans->type == WITHDRAW)
            account->withdraw (trans->amount);
    }
Several static unknowns!
  – # of transactions?
  – Points to?
  – Loop-carried dependence?

Multithreading Strategy
    trans_t* trans;
    while ((trans = get_trans ()) != NULL) {
        account_t* account = trans[i]->account;
        if (trans->type == DEPOSIT)
            account->deposit (trans->amount);
        if (trans->type == WITHDRAW)
            account->withdraw (trans->amount);
    }
1) Read all transactions into an array
2) Divide chunks of array among multiple threads
Oblivious to what accounts each thread may access!
  → Methods must lock account to ensure mutual exclusion

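The two steps above can be sketched with standard C++ threads. This is only an illustration of the conventional strategy, not code from the talk: the layouts of `account_t` and `trans_t`, the helper names `process_chunk` and `process_all`, and the per-account `std::mutex` are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

enum trans_type { DEPOSIT, WITHDRAW };

// Hypothetical account type: every method takes the account's own lock,
// because any thread may end up touching any account.
struct account_t {
    long balance = 0;
    std::mutex lock;
    void deposit(long amount) {
        std::lock_guard<std::mutex> guard(lock);
        balance += amount;
    }
    void withdraw(long amount) {
        std::lock_guard<std::mutex> guard(lock);
        balance -= amount;
    }
};

// Hypothetical transaction record.
struct trans_t {
    account_t* account;
    trans_type type;
    long amount;
};

// Step 2: one thread processes one contiguous chunk [begin, end) of the array.
void process_chunk(const std::vector<trans_t>& all,
                   std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        const trans_t& t = all[i];
        if (t.type == DEPOSIT)  t.account->deposit(t.amount);
        if (t.type == WITHDRAW) t.account->withdraw(t.amount);
    }
}

// Step 1 is assumed done: `all` already holds every transaction.
void process_all(const std::vector<trans_t>& all, std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t chunk = (all.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, all.size());
        const std::size_t end = std::min(begin + chunk, all.size());
        workers.emplace_back(process_chunk, std::cref(all), begin, end);
    }
    for (std::thread& w : workers) w.join();
}
```

In this sketch the final balances come out the same on every run only because these deposits and withdrawals commute; the order in which operations reach an account still varies from run to run.
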
Serialization Sets
Potentially independent methods
  – Modify only data owned by object
      – Fields / Data members
      – Pointers to non-shared data
  – Consistent with OO practices (modularity, encapsulation, information hiding)
Modifying methods for independence
  – Store return value in object, retrieve with accessor
  – Copy pointer data

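A minimal sketch of the "store return value in object, retrieve with accessor" rewrite. The class below is hypothetical: the talk's `withdraw` is not shown returning anything, so the `bool` result and the accessor name `last_withdraw_ok` are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical account class, for illustration only.
class account_t {
public:
    explicit account_t(long opening_balance) : balance_(opening_balance) {}

    // Before the rewrite (assumed): bool withdraw(long amount) handed the
    // outcome straight back to the caller, who had to wait for it.
    // After the rewrite: the method returns nothing and records the outcome
    // in the object, so it modifies only data the object owns.
    void withdraw(long amount) {
        assert(amount >= 0);
        last_withdraw_ok_ = (amount <= balance_);
        if (last_withdraw_ok_) balance_ -= amount;
    }

    // Accessors let the caller read the stored outcome later.
    bool last_withdraw_ok() const { return last_withdraw_ok_; }
    long balance() const { return balance_; }

private:
    long balance_;
    bool last_withdraw_ok_ = true;
};
```

The caller no longer needs the result at the call site, which is what lets the invocation be handed off and executed later.
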
Serialization Sets
Divide program into isolation epochs
  – Data partitioned into domains
Privately writable: data that may be read or written by a single serialization set
  – Object or set of objects
  – Serializer dynamically identifies serialization set for each method invocation
Shared read-only: data that may be read (but not written) by any method

Example with Serialization Sets
    writable pw_account_t;            // declare privately-writable account
    begin_isolation ();               // begin isolation epoch
    trans_t* trans;
    while ((trans = get_trans ()) != NULL) {
        pw_account_t* account = trans->account;
        // delegate indicates potentially independent operations
        if (trans->type == DEPOSIT)
            delegate(account, deposit, trans->amount);
        if (trans->type == WITHDRAW)
            delegate(account, withdraw, trans->amount);
    }
    end_isolation ();                 // end isolation epoch
Serializer type: uses account number to compute serialization set
At execution, delegate:
1) Executes serializer
2) Identifies serialization set
3) Inserts invocation in serialization set

Program context → delegate → Delegate context
Serializer: computes SS with account number
    ss_t ss = account->get_number();
Program context (invocations passed to delegate):
    deposit  acct=100 $2000
    withdraw acct=300 $350
    withdraw acct=200 $1000
    withdraw acct=100 $50
    deposit  acct=300 $5000
    withdraw acct=100 $20
    withdraw acct=200 $1000
    deposit  acct=100 $300
Delegate context (serialization sets): SS #100, SS #200, SS #300

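The grouping step can be sketched in a few lines of standard C++. This is an assumed, simplified model rather than the Prometheus implementation: `delegate_context_t`, `delegate_call`, and `run_all` are hypothetical names, and the sets are run sequentially here rather than on delegate threads.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

typedef int ss_t;  // serialization set identifier

// Hypothetical account type with the serializer input used in the talk.
struct account_t {
    int number;
    long balance;
    int get_number() const { return number; }
    void deposit(long amount) { balance += amount; }
    void withdraw(long amount) { balance -= amount; }
};

// Delegate context: one ordered list of pending invocations per set.
typedef std::map<ss_t, std::vector<std::function<void()> > > delegate_context_t;

// Hypothetical stand-in for delegate(account, method, amount).
void delegate_call(delegate_context_t& ctx, account_t* account,
                   void (account_t::*method)(long), long amount) {
    assert(account != nullptr);
    // 1) Execute the serializer and 2) identify the serialization set.
    ss_t ss = account->get_number();
    // 3) Insert the invocation at the tail of that set (program order kept).
    ctx[ss].push_back([=] { (account->*method)(amount); });
}

// Run every set; invocations inside one set run in insertion order.
void run_all(delegate_context_t& ctx) {
    for (auto& entry : ctx)
        for (auto& call : entry.second) call();
    ctx.clear();
}
```

Fed the eight invocations listed above, the sketch produces three sets holding four, two, and two invocations. Because no two sets touch the same account, the sets could run in any order, or concurrently, without changing the final balances.
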
Program thread (program context) → delegate → Delegate threads (delegate context)
Delegate threads (Delegate 0, Delegate 1) execute the serialization sets:
    SS #100: deposit $2000, withdraw $50, withdraw $20, deposit $300
    SS #200: withdraw $1000, withdraw $1000
    SS #300: withdraw $350, deposit $5000
Race-free, deterministic execution without synchronization!

Parallel Execution w/o Sharing
1. Vary data in privately-writable/read-only domains in alternating epochs
   – Outputs of one epoch become inputs of the next
2. Associative, commutative methods
   – Operate on local copy of state
   – Reduction to summarize result
3. Containers manipulated by program context
   – Delegate operations on underlying data

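Item 2 can be illustrated with a small reducible sum. The class name `reducible_sum`, its interface, and the explicit `delegate_id` argument are assumptions made for this sketch. The reduction is also called explicitly here, whereas the talk describes Prometheus performing it automatically on first use after the epoch ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reducible: one private copy of the state per delegate.
class reducible_sum {
public:
    explicit reducible_sum(std::size_t num_delegates)
        : local_(num_delegates, 0) {}

    // Inside an isolation epoch: each delegate updates only its own copy,
    // so no two delegates ever write the same location.
    void add(std::size_t delegate_id, long value) {
        assert(delegate_id < local_.size());
        local_[delegate_id] += value;
    }

    // After the epoch: combine the local copies. Addition is associative
    // and commutative, so the order of combination does not matter.
    long reduce() const {
        long total = 0;
        for (long part : local_) total += part;
        return total;
    }

private:
    std::vector<long> local_;
};
```

The same pattern would apply to any operation that is associative and commutative, such as a maximum or a histogram merge.
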
Prometheus: C++ Library for SS
Template library
  – Compile-time instantiation of SS data structures
  – Metaprogramming for static type checking
Runtime orchestrates parallel execution
Portable
  – x86, x86_64, SPARC V9
  – Linux, Solaris

Prometheus Serializers
Serializers
  – Subclass serializer base class and override method
  – Or use built-in serializer supplied by library
Reducibles
  – Subclass reducible base class and override virtual reduce method
  – Reduction automatically performed on first use after isolation epoch ends

Prometheus Runtime
Program Thread → Delegate Thread 0, Delegate Thread 1, Delegate Thread 2
Delegate assignment: SS % NUM_THREADS
Communication queues: Fast-Forward [PPoPP 2008] + Polymorphic interface

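The assignment rule SS % NUM_THREADS can be sketched as follows. This is a simplified model under stated assumptions, not the Prometheus runtime: `delegate_pool` and its methods are hypothetical, plain vectors stand in for the communication queues, and invocations are buffered until the epoch ends instead of being streamed to delegate threads that are already running.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

typedef unsigned ss_t;
typedef std::function<void()> invocation_t;

// Hypothetical pool with one invocation queue per delegate thread.
class delegate_pool {
public:
    explicit delegate_pool(std::size_t num_threads) : queues_(num_threads) {
        assert(num_threads > 0);
    }

    // Program thread: route the invocation by SS % NUM_THREADS. All
    // invocations of one set land in the same queue, in program order.
    void delegate_call(ss_t ss, invocation_t call) {
        queues_[ss % queues_.size()].push_back(std::move(call));
    }

    // End of the epoch: drain each queue on its own thread, then wait.
    void end_isolation() {
        std::vector<std::thread> delegates;
        for (std::vector<invocation_t>& queue : queues_)
            delegates.emplace_back([&queue] {
                for (invocation_t& call : queue) call();
            });
        for (std::thread& d : delegates) d.join();
        for (std::vector<invocation_t>& queue : queues_) queue.clear();
    }

private:
    std::vector<std::vector<invocation_t> > queues_;
};
```

Since a serialization set is never split across threads, invocations on the same set keep their program order without any lock, provided each invocation touches only data belonging to its own set.
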
Debugging Support
Tag all data accessed by serialization set
  – Objects
  – Smart pointers
Any data accessed by multiple serialization sets indicates programmer error
Problem: can't detect some kinds of missing annotations
  – Future work: static checking of annotations

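One way to picture the tagging check is a wrapper that remembers which serialization set touched the data first. The `tagged` template, the `NO_OWNER` sentinel, and the `violated` flag are hypothetical names for this sketch; the talk does not show how Prometheus implements its tags.

```cpp
#include <cassert>

typedef int ss_t;
const ss_t NO_OWNER = -1;  // hypothetical sentinel: not yet accessed

// Hypothetical debug wrapper around a piece of privately-writable data.
template <typename T>
class tagged {
public:
    explicit tagged(const T& value)
        : value_(value), owner_(NO_OWNER), violated_(false) {}

    // Every access names the serialization set performing it.
    T& access(ss_t current_ss) {
        assert(current_ss != NO_OWNER);
        if (owner_ == NO_OWNER)
            owner_ = current_ss;   // first access: tag the data
        else if (owner_ != current_ss)
            violated_ = true;      // a second set touched it: programmer error
        return value_;
    }

    bool violated() const { return violated_; }

private:
    T value_;
    ss_t owner_;
    bool violated_;
};
```

A check of this kind only sees accesses that go through the wrapper, which matches the limitation noted above: some missing annotations cannot be detected this way.
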
Debugging Support
Deterministic model means we can simulate SS execution in sequential program
  – Prometheus support for compiling debug version
  – Do all debugging on sequential program!
Correct sequential → correct parallel (caveat: for a given input)

Evaluation Methodology
Benchmarks
  – Lonestar, NU-MineBench, PARSEC, Phoenix
Conventional Parallelization
  – pthreads, OpenMP
Prometheus versions
  – Port program to sequential C++ program
  – Idiomatic C++: OO, inheritance, STL
  – Parallelize with serialization sets

Results Summary
4-socket AMD Barcelona (4-way multicore) = 16 total cores

Related Work
Actors / Active Objects – Hewitt [JAI 1977]
MultiLisp – Halstead [ACM TOPLAS 1985]
Inspector-Executor – Wu et al. [ICPP 1991]
Jade – Rinard and Lam [ACM TOPLAS 1998]
Cilk – Frigo et al. [PLDI 1998]
OpenMP

Conclusions
Sequential program with annotations
  – No explicit synchronization, no locks
Programmers focus on keeping computation private to object state
  – Consistent with OO programming practices
Dependence-based model
  – Deterministic, race-free parallel execution
Performance close to, and sometimes better than, multithreading

Questions