CVM (Coherent Virtual Machine)
CVM
CVM is a user-level library that lets a program use shared-memory semantics over message-passing hardware.
Page-based DSM (distributed shared memory)
Written in C++
Built on top of UDP or MPI
CVM
CVM was created by Pete Keleher in
CVM was created specifically as a platform for protocol experimentation.
These slides are based on the material in the CVM manual, which can be found on the website (
CVM Routines
Initialization / Termination
Initialization
–cvm_startup(int, char**): called after the program has processed its own arguments.
Termination
–cvm_finish(): called by the master process; it waits until all processes have completed.
–cvm_exit(char*, …): a quick exit on error.
Example
Most programs have the following form:
int main(int argc, char *argv[])
{
    …
    cvm_startup(argc, argv);
    …
    cvm_finish();
}
Process Creation
cvm_create_procs(func_ptr worker)
–Creates the execution entries on all slave machines.
–The function must have the form void (*worker)()
–Some predefined macros and variables are available: cvm_num_procs, cvm_proc_id, PID, TID
Shared memory allocation
cvm_alloc(int sz)
–In general, all shared data in CVM programs must be dynamically allocated.
–All calls to cvm_alloc() must be completed before cvm_create_procs() is called.
–Usage is the same as malloc():
int *buf = (int*)cvm_alloc( sizeof(int) * N );
Synchronization
cvm_lock(int id), cvm_unlock(int id)
–Acquire and release the global lock specified by id.
–The number of locks is limited; the limit can be modified in cvm.h.
cvm_barrier(int id)
–Performs a global barrier.
–The id parameter is currently ignored.
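Access shared data
Processes should lock the same ‘id’ when they access the same shared data.
–As with any shared memory, mutual exclusion must be ensured.
–Under Lazy Release Consistency, updates made between lock() and unlock() on one process become visible to another process when that process calls lock() on the same id.
–Without this lock, a process's view of the memory cannot be refreshed.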
Cont.
Use a barrier to exchange all updates among machines.
–Before the barrier, each part of the array is written separately: P[0:9]=1, P[10:19]=2, P[20:29]=3, P[30:39]=4
–After Barrier(), all shared data are synchronized.
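The slice assignment in this picture can be sketched in plain C. This is only a single-machine model of the index arithmetic, not CVM code: fill_block, fill_all, NPROCS and BLOCK are hypothetical names, and the loop over pid stands in for four processes that would each write their slice before the barrier.

```c
#include <assert.h>

#define NPROCS 4   /* assumed number of processes, one per slice in the picture */
#define BLOCK  10  /* assumed number of elements per process */

/* Write the value (pid + 1) into the slice owned by process pid:
   p[pid*BLOCK] .. p[pid*BLOCK + BLOCK - 1]. */
void fill_block(int *p, int pid)
{
    int i;
    assert(pid >= 0 && pid < NPROCS);
    for (i = pid * BLOCK; i < (pid + 1) * BLOCK; i++)
        p[i] = pid + 1;
}

/* Model of the state after the barrier: every slice has been written,
   so a single array now holds all four updates. */
void fill_all(int *p)
{
    int pid;
    for (pid = 0; pid < NPROCS; pid++)
        fill_block(p, pid);
}
```

In a real CVM program each process would call only its own equivalent of fill_block and then cvm_barrier(0); the barrier is what makes the other three slices visible.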
Synchronization
Wait & signal
–cvm_signal_pause(), cvm_signal(int pid)
–A signal can be buffered (only one).
–The order of the signal and the pause does not matter.
Cases:
–signal() (buffered), signal_pause(), signal() (buffered), signal_pause(): works fine!
–signal(), signal_pause(), then a second signal_pause() with no buffered signal left: blocks at the second pause.
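The buffering rule can be modeled with a one-slot flag. The sketch below is a single-threaded model, not CVM's implementation: signal_slot, model_signal and model_pause are hypothetical names, and the assumption that a second signal arriving while one is already buffered is simply absorbed is read from the "(only one)" remark above.

```c
#include <assert.h>

/* Hypothetical one-slot signal buffer for a single receiving process. */
typedef struct {
    int pending;   /* 1 if a signal is buffered, 0 otherwise */
} signal_slot;

/* Model of a signal arriving: the slot holds at most one signal, so a
   second arrival while one is pending leaves the state unchanged. */
void model_signal(signal_slot *s)
{
    assert(s != 0);
    s->pending = 1;
}

/* Model of a pause: returns 1 if a buffered signal was consumed (the real
   call would return at once), or 0 if nothing is buffered (the real call
   would block until a signal arrives). */
int model_pause(signal_slot *s)
{
    assert(s != 0);
    if (s->pending) {
        s->pending = 0;
        return 1;
    }
    return 0;
}
```

Alternating model_signal and model_pause always returns 1, which corresponds to the "works fine" case; two model_signal calls followed by two model_pause calls return 1 and then 0, which corresponds to blocking at the second pause.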
CVM arguments
CVM arguments are given on the command line.
–$ ./cvmprog
-d : turn on the debugging output
-n : specify the number of processes
-P : specify the page size
-t : use per-node multithreading
–hides communication latency.
-X : specify the protocol
Consistency protocol
Default is lazy multi-writer (0)
–Allows multiple writers to access the same page simultaneously without communication, using diffs.
Lazy single-writer (1)
–Only a single writer can access a page at a time (suffers from false sharing).
Sequentially consistent single-writer (2)
–Every write invokes an invalidation (lots of communication).
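The "using diffs" idea behind the multi-writer protocol can be illustrated with a generic twin-and-diff sketch. It assumes the usual scheme (copy the page before the first write, later record only the words that changed) and is not taken from CVM's source: PAGE_WORDS, diff_entry, make_twin, make_diff and apply_diff are hypothetical names, and the page is shrunk to eight words for readability.

```c
#include <assert.h>
#include <string.h>

#define PAGE_WORDS 8   /* assumed tiny page size, for illustration only */

/* One changed word: its position in the page and its new value. */
typedef struct {
    int offset;
    int value;
} diff_entry;

/* Save a copy (twin) of the page before the first local write. */
void make_twin(int *twin, const int *page)
{
    memcpy(twin, page, PAGE_WORDS * sizeof(int));
}

/* Compare the modified page with its twin and record every changed word.
   Returns the number of entries written to out (at most PAGE_WORDS). */
int make_diff(const int *twin, const int *page, diff_entry *out)
{
    int i, n = 0;
    for (i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            out[n].offset = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Merge a diff into another copy of the page. */
void apply_diff(int *page, const diff_entry *d, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        assert(d[i].offset >= 0 && d[i].offset < PAGE_WORDS);
        page[d[i].offset] = d[i].value;
    }
}
```

If two writers change different words of the same page, applying both diffs to a third copy yields a page containing both updates, which is why false sharing does not force the writers to communicate while they work.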
Home-based RC
Home-based multi-writer (3)
Sometimes LRC still needs to send lots of diffs.
–Figure: a Lock() / unlock() sequence in which two sets of diffs have to be sent.
Cont.
Every page has its own home (home node), which takes care of it.
–All diffs are sent to the home.
–Figure: around Lock() / unlock(), diffs go to the home node, and the home node supplies diffs or the whole page.
Example code
#include "cvm.h"
#include <stdio.h>
#define DATA_SZ 1000

int *data, *psum, *gidx;

void worker()
{
    int lidx;
    psum[cvm_proc_id] = 0;
    do {
        cvm_lock(0);
        lidx = (*gidx)++;             // fetch the next index and advance the shared counter
        cvm_unlock(0);
        if (lidx >= DATA_SZ) break;   // valid indices are 0 .. DATA_SZ-1
        psum[cvm_proc_id] += data[lidx];
    } while (1);
    cvm_barrier(0);  // the psum values need to be synchronized
}

int main(int argc, char *argv[])
{
    int sum, i;
    cvm_startup(argc, argv);
    // allocation of shared data
    gidx = (int*)cvm_alloc(sizeof(int));
    data = (int*)cvm_alloc(sizeof(int) * DATA_SZ);
    psum = (int*)cvm_alloc(sizeof(int) * cvm_num_procs);
    // data initialization
    *gidx = 0;
    for (i = 0; i < DATA_SZ; i++)
        data[i] = i + 1;
    cvm_create_procs(worker);
    worker();
    for (sum = 0, i = 0; i < cvm_num_procs; i++)
        sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);
    cvm_finish();
}
Without contention
#include "cvm.h"
#include <stdio.h>
#define DATA_SZ 1000

int *psum, *data;

void worker()
{
    int i;
    psum[PID] = 0;  // PID is the same as cvm_proc_id
    for (i = PID; i < DATA_SZ; i += cvm_num_procs)
        psum[PID] += data[i];
    cvm_barrier(0);  // still needed for psum
}

int main(int argc, char *argv[])
{
    int sum, i;
    cvm_startup(argc, argv);
    // allocation of shared data
    psum = (int*)cvm_alloc(sizeof(int) * cvm_num_procs);
    data = (int*)cvm_alloc(sizeof(int) * DATA_SZ);
    // data initialization
    for (i = 0; i < DATA_SZ; i++)
        data[i] = i + 1;
    cvm_create_procs(worker);
    worker();
    for (sum = 0, i = 0; i < cvm_num_procs; i++)
        sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);
    cvm_finish();
}
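The cyclic index distribution used by this worker can be checked without CVM. The sketch below simulates the processes one after another on a single machine: cyclic_partial_sum and cyclic_total are hypothetical helper names, and the pid and nprocs parameters stand in for PID and cvm_num_procs.

```c
#include <assert.h>

/* Sum the elements that process pid would handle under a cyclic
   distribution: indices pid, pid + nprocs, pid + 2*nprocs, ... */
int cyclic_partial_sum(const int *data, int n, int pid, int nprocs)
{
    int i, s = 0;
    assert(nprocs > 0 && pid >= 0 && pid < nprocs);
    for (i = pid; i < n; i += nprocs)
        s += data[i];
    return s;
}

/* Add up the partial sums of all nprocs simulated processes. */
int cyclic_total(const int *data, int n, int nprocs)
{
    int pid, total = 0;
    for (pid = 0; pid < nprocs; pid++)
        total += cyclic_partial_sum(data, n, pid, nprocs);
    return total;
}
```

Because every index belongs to exactly one process, no lock is needed while summing, and with data[i] = i + 1 for 1000 elements the total is 500500 for any number of processes.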
cvm_reduce
cvm_reduce(void *global, void *local, int rtype, int dtype, int num)
–Similar to MPI_Reduce
–Four operations are provided: min, max, sum, product
–E.g. cvm_reduce(sum, psum, REDUCE_sum, REDUCE_int, 1);
–Requires #include "reduce.h"
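What the four operations compute can be shown with a local model that folds one contribution per process into a single value. This is a sketch of the arithmetic only: reduce_op, OP_MIN, OP_MAX, OP_SUM, OP_PRODUCT, combine and reduce_int are hypothetical names, not the constants or functions declared in reduce.h, and no communication takes place.

```c
#include <assert.h>

/* Hypothetical operation codes for this sketch; they are not the
   REDUCE_* constants from reduce.h. */
typedef enum { OP_MIN, OP_MAX, OP_SUM, OP_PRODUCT } reduce_op;

/* Combine two values according to op. */
static int combine(reduce_op op, int a, int b)
{
    switch (op) {
    case OP_MIN:     return a < b ? a : b;
    case OP_MAX:     return a > b ? a : b;
    case OP_SUM:     return a + b;
    case OP_PRODUCT: return a * b;
    }
    return a;   /* not reached for valid op values */
}

/* Fold the per-process contributions local[0..nprocs-1] into one value,
   the result a reduction would deliver to the global variable. */
int reduce_int(reduce_op op, const int *local, int nprocs)
{
    int i, acc;
    assert(nprocs > 0);
    acc = local[0];
    for (i = 1; i < nprocs; i++)
        acc = combine(op, acc, local[i]);
    return acc;
}
```

In the summation examples above, a sum reduction over the per-process partial sums would replace the final loop over psum in main().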