CVM (Coherent Virtual Machine). CVM is a user-level library that enables programs to exploit shared-memory semantics over message-passing hardware.


CVM (Coherent Virtual Machine)

CVM CVM is a user-level library that enables programs to exploit shared-memory semantics over message-passing hardware. It is a page-based DSM, written in C++, built on top of UDP or MPI.

CVM CVM was created by Pete Keleher, specifically as a platform for protocol experimentation. These slides are based on the material in the CVM manual, which can be found on the CVM website.

CVM Routines

Initialization / Termination Initialization –cvm_startup(int, char**) Called after the program processes its own arguments. Termination –cvm_finish() Called by the master process; it waits until all processes have completed. –cvm_exit(char*, …) A quick exit on error.

Example Most programs are of the following form:

int main(int argc, char *argv[])
{
    …
    cvm_startup(argc, argv);
    …
    cvm_finish();
}

Process Creation cvm_create_procs(func_ptr worker) –Creates the execution entries on all slave machines. –The function should be of the form void (*worker)() –Some pre-defined macros and variables can be used: cvm_num_procs, cvm_proc_id, PID, TID

Shared memory allocation cvm_alloc(int sz) –Generally, all shared data in a CVM program must be dynamically allocated. –All calls to cvm_alloc() must be completed before cvm_create_procs() –The usage is the same as malloc(): int *buf = (int*)cvm_alloc( sizeof(int) * N )

Synchronization cvm_lock(int id), cvm_unlock(int id) –Acquire and release the global lock specified by id. –The maximum number of locks is fixed at compile time and can be modified in cvm.h cvm_barrier(int id) –Performs a global barrier. –The id parameter is currently ignored.

Access shared data Processes must lock the same id when they access the same shared data. –As with any shared memory, mutual exclusion must be ensured around the lock()…unlock() critical section. –Under Lazy Release Consistency, acquiring the lock is also what brings memory up to date: without the lock, a process's view of the shared data is not refreshed.
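On a single machine, the same acquire/update/release pattern can be sketched with POSIX threads: a mutex plays the role of cvm_lock/cvm_unlock around the shared index, and joining the workers plays the role of the barrier. The names and sizes here are illustrative, not part of CVM.

```c
#include <pthread.h>

#define DATA_SZ 1000

static pthread_mutex_t gidx_lock = PTHREAD_MUTEX_INITIALIZER;
static int gidx;                 /* shared index, like *gidx in CVM */
static int data[DATA_SZ];
static long psum[64];            /* per-worker partial sums */

struct arg { int id; };

static void *worker(void *p)
{
    int id = ((struct arg *)p)->id;
    for (;;) {
        pthread_mutex_lock(&gidx_lock);      /* plays the role of cvm_lock(0)   */
        int lidx = gidx++;
        pthread_mutex_unlock(&gidx_lock);    /* plays the role of cvm_unlock(0) */
        if (lidx >= DATA_SZ)
            break;
        psum[id] += data[lidx];
    }
    return 0;
}

/* Spawn nthreads workers, wait for all of them (the "barrier"),
 * then combine the partial sums. */
long parallel_sum(int nthreads)
{
    pthread_t t[64];
    struct arg a[64];
    gidx = 0;
    for (int i = 0; i < DATA_SZ; i++) data[i] = i + 1;
    for (int i = 0; i < nthreads; i++) psum[i] = 0;
    for (int i = 0; i < nthreads; i++) {
        a[i].id = i;
        pthread_create(&t[i], 0, worker, &a[i]);
    }
    long sum = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(t[i], 0);               /* plays the role of cvm_barrier(0) */
        sum += psum[i];
    }
    return sum;
}
```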

Cont. Use a barrier to exchange all information among the machines: after cvm_barrier() returns, all shared data are synchronized. For example, if four processes write p[0:9]=1, p[10:19]=2, p[20:29]=3, and p[30:39]=4 respectively, every process sees all of these updates after the barrier.

Synchronization Wait & signal –cvm_signal_pause(), cvm_signal(int pid) –One signal can be buffered, so for a single signal the order doesn't matter: signal() before signal_pause() works fine, because the signal is held until the pause consumes it. –But only one signal is buffered: two signal() calls followed by two signal_pause() calls block at the second pause, because the second signal was lost.
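The one-slot buffering behaviour can be sketched with a mutex and condition variable. This is a hypothetical single-machine model, not CVM's implementation; the names are made up for illustration.

```c
#include <pthread.h>

static pthread_mutex_t sig_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sig_cond = PTHREAD_COND_INITIALIZER;
static int sig_pending;      /* the single buffer slot */
static int sig_dropped;      /* counts signals lost because the slot was full */

/* Like cvm_signal(): deposit a signal in the one-slot buffer.
 * If the slot is already full, the signal is lost. */
void signal_once(void)
{
    pthread_mutex_lock(&sig_lock);
    if (sig_pending)
        sig_dropped++;       /* slot full: this signal is dropped */
    else {
        sig_pending = 1;
        pthread_cond_signal(&sig_cond);
    }
    pthread_mutex_unlock(&sig_lock);
}

/* Like cvm_signal_pause(): consume a buffered signal immediately,
 * or block until one arrives. */
void signal_pause(void)
{
    pthread_mutex_lock(&sig_lock);
    while (!sig_pending)
        pthread_cond_wait(&sig_cond, &sig_lock);
    sig_pending = 0;         /* consume the buffered signal */
    pthread_mutex_unlock(&sig_lock);
}
```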

CVM arguments The command line: $ ./cvmprog [options]
-d : turn on the debugging output
-n : specify the number of processes
-P : specify the page size
-t : use per-node multithreading, to hide communication latency
-X : specify the consistency protocol

Consistency protocol The default is lazy multi-writer (0) –Allows multiple writers to access the same page simultaneously without communication, using diffs. Lazy single-writer (1) –Only a single writer can access a page at a time (suffers from false sharing). Sequentially consistent single-writer (2) –Every write invokes an invalidation (lots of communication).
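The diff mechanism of the multi-writer protocol can be sketched in plain C. This is an illustrative model only: a "diff" is taken here to be the list of words that differ from an unmodified "twin" copy of the page; the names and page size are assumptions, not CVM's internals.

```c
#include <stddef.h>

#define PAGE_WORDS 1024

/* A simplified diff entry: which word of the page changed, and its new value. */
struct diff_entry { int index; int value; };

/* Compare a modified page against its unmodified twin and record
 * only the words that differ.  Returns the number of entries. */
size_t make_diff(const int *twin, const int *page, struct diff_entry *out)
{
    size_t n = 0;
    for (int i = 0; i < PAGE_WORDS; i++)
        if (page[i] != twin[i]) {
            out[n].index = i;
            out[n].value = page[i];
            n++;
        }
    return n;
}

/* Apply a diff to another copy of the page.  Diffs from writers that
 * touched disjoint words can be applied in any order, which is what
 * lets multiple writers share one page without communicating. */
void apply_diff(int *page, const struct diff_entry *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        page[d[i].index] = d[i].value;
}
```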

Home-based RC Home-based multi-writer (3) –Motivation: sometimes LRC still needs to send many diffs; at a lock acquire, a process may have to collect a separate set of diffs from every previous writer of a page.

Cont. Every page has its own home (node), which takes care of it. –All diffs are sent to the home; other processes then fetch the diffs or the whole page from the home node instead of from every individual writer.
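A minimal sketch of the home-node idea (illustrative names and sizes, not CVM's actual code): writers forward their changed words to the page's home copy, and a reader fetches the whole up-to-date page from the home instead of collecting diffs from each writer.

```c
#include <string.h>

#define PAGE_WORDS 1024

/* The home node keeps the authoritative copy of the page. */
static int home_page[PAGE_WORDS];

/* A writer forwards one changed word to the home (in the real
 * protocol this happens in batches at release time). */
void home_apply(int index, int value)
{
    home_page[index] = value;
}

/* A reader's page fault is served with the whole page from the
 * home, so the reader never needs diffs from individual writers. */
void home_fetch(int *local_copy)
{
    memcpy(local_copy, home_page, sizeof home_page);
}
```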

Example code

#include "cvm.h"
#include <stdio.h>
#define DATA_SZ 1000

int *data, *psum, *gidx;

void worker()
{
    int lidx;
    psum[cvm_proc_id] = 0;
    do {
        cvm_lock(0);
        lidx = (*gidx)++;      /* fetch and advance the shared index */
        cvm_unlock(0);
        if (lidx >= DATA_SZ)   /* valid indices are 0..DATA_SZ-1 */
            break;
        psum[cvm_proc_id] += data[lidx];
    } while (1);
    cvm_barrier(0);            /* psum needs to be synchronized */
}

int main(int argc, char *argv[])
{
    int sum, i;
    cvm_startup(argc, argv);

    /* allocation of shared data */
    gidx = cvm_alloc(sizeof(int));
    data = cvm_alloc(sizeof(int) * DATA_SZ);
    psum = cvm_alloc(sizeof(int) * cvm_num_procs);

    /* data initialization */
    for (i = 0; i < DATA_SZ; i++)
        data[i] = i + 1;

    cvm_create_procs(worker);
    worker();

    for (sum = 0, i = 0; i < cvm_num_procs; i++)
        sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);

    cvm_finish();
}

Without contention

#include "cvm.h"
#include <stdio.h>
#define DATA_SZ 1000

int *psum, *data;

void worker()
{
    int i;
    psum[PID] = 0;             /* PID is the same as cvm_proc_id */
    for (i = PID; i < DATA_SZ; i += cvm_num_procs)
        psum[PID] += data[i];
    cvm_barrier(0);            /* still needed for psum */
}

int main(int argc, char *argv[])
{
    int sum, i;
    cvm_startup(argc, argv);

    /* allocation of shared data */
    psum = cvm_alloc(sizeof(int) * cvm_num_procs);
    data = cvm_alloc(sizeof(int) * DATA_SZ);

    /* data initialization */
    for (i = 0; i < DATA_SZ; i++)
        data[i] = i + 1;

    cvm_create_procs(worker);
    worker();

    for (sum = 0, i = 0; i < cvm_num_procs; i++)
        sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);

    cvm_finish();
}

cvm_reduce cvm_reduce(void *global, void *local, int rtype, int dtype, int num) –Similar to MPI_Reduce. –Four operations are provided: min, max, sum, product. –E.g. cvm_reduce(sum, psum, REDUCE_sum, REDUCE_int, 1); –Needs #include "reduce.h"
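The combining step of the four reduction operations can be sketched as follows. The enum names here are hypothetical (unlike CVM's actual REDUCE_sum/REDUCE_int constants), and this models only the arithmetic, not the communication that cvm_reduce performs.

```c
enum reduce_op { REDUCE_MIN, REDUCE_MAX, REDUCE_SUM, REDUCE_PRODUCT };

/* Combine n per-process integer contributions with the chosen
 * operation; every process would see this result in `global`. */
int reduce_int(enum reduce_op op, const int *vals, int n)
{
    int acc = vals[0];
    for (int i = 1; i < n; i++) {
        switch (op) {
        case REDUCE_MIN:     if (vals[i] < acc) acc = vals[i]; break;
        case REDUCE_MAX:     if (vals[i] > acc) acc = vals[i]; break;
        case REDUCE_SUM:     acc += vals[i]; break;
        case REDUCE_PRODUCT: acc *= vals[i]; break;
        }
    }
    return acc;
}
```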