CRL (C Region Library) Chao Huang, James Brodman, Hassan Jafri CS498LVK.

Introduction
CRL is an all-software distributed shared memory (DSM) system
–Provides shared address space
–Built on PVM
“Region”: an arbitrarily sized, contiguous area of memory
–Consistent cached copy at local nodes

Functions
Environment
–crl_init
–crl_num_nodes, crl_self_addr
Basic region operations
–rid_t rgn_create(unsigned size)
–void rgn_destroy(rid_t rgn_id)
–rid_t rgn_rid(void *rgn)
–unsigned rgn_size(void *rgn)
–void rgn_flush(void *rgn)

Functions
Region mapping
–void *rgn_map(rid_t rgn_id)
–void rgn_unmap(void *rgn)
Region read and write
–void rgn_start_read(void *rgn)
–void rgn_end_read(void *rgn)
–void rgn_start_write(void *rgn)
–void rgn_end_write(void *rgn)

Functions
Global synchronization
–void rgn_barrier(void)
–void rgn_bcast_send(int len, void *buf)
–void rgn_bcast_recv(int len, void *buf)
–double rgn_reduce_dadd(double arg)
–double rgn_reduce_dmin(double arg)
–double rgn_reduce_dmax(double arg)

Example

    /* Compute the dot product of
     * two n-element vectors, each
     * of which is represented by an
     * appropriately-sized region
     * x: region identifier for 1st vector
     * y: address at which 2nd vector is already mapped
     */
    double dotprod(rid_t x, double *y, int n)
    {
        int i;
        double *z;
        double rslt;

        /* map 1st vector and initiate read operation */
        z = (double *) rgn_map(x);
        rgn_start_read(z);

        /* initiate read operation on 2nd vector */
        rgn_start_read(y);

        /* compute dot product */
        rslt = 0;
        for (i = 0; i < n; i++)
            rslt += z[i] * y[i];

        /* terminate read operations and unmap 1st vector */
        rgn_end_read(y);
        rgn_end_read(z);
        rgn_unmap(z);

        return rslt;
    }
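The dot-product example only reads regions. A writer follows the same map / start / end / unmap pattern with the write calls. The sketch below is a minimal illustration, not CRL itself: the rgn_* bodies are hypothetical single-node stand-ins written for this example (they just wrap calloc/free), rid_t is assumed to be a pointer here although the real type is opaque, and fill and sum are made-up helper names.

```c
#include <stdlib.h>

/* Hypothetical single-node stand-ins for the CRL calls listed in the slides.
 * They only mimic the call sequence; real CRL moves data between nodes.
 * Assumption: rid_t is modeled as a plain pointer for this sketch. */
typedef void *rid_t;

rid_t rgn_create(unsigned size)  { return calloc(1, size); }
void  rgn_destroy(rid_t rgn_id)  { free(rgn_id); }
void *rgn_map(rid_t rgn_id)      { return rgn_id; }
void  rgn_unmap(void *rgn)       { (void) rgn; }
void  rgn_start_read(void *rgn)  { (void) rgn; }
void  rgn_end_read(void *rgn)    { (void) rgn; }
void  rgn_start_write(void *rgn) { (void) rgn; }
void  rgn_end_write(void *rgn)   { (void) rgn; }

/* Set every element of the n-element vector stored in region x to v. */
void fill(rid_t x, int n, double v)
{
    int i;
    double *z = (double *) rgn_map(x);

    rgn_start_write(z);          /* acquire write access to the region */
    for (i = 0; i < n; i++)
        z[i] = v;
    rgn_end_write(z);            /* release write access */
    rgn_unmap(z);
}

/* Add up the n elements of the vector stored in region x. */
double sum(rid_t x, int n)
{
    int i;
    double rslt = 0;
    double *z = (double *) rgn_map(x);

    rgn_start_read(z);           /* acquire read access to the region */
    for (i = 0; i < n; i++)
        rslt += z[i];
    rgn_end_read(z);             /* release read access */
    rgn_unmap(z);
    return rslt;
}
```

With these stand-ins, creating a region with rgn_create(4 * sizeof(double)), calling fill on it, and then calling sum exercises the full write-then-read sequence on one node.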

Discussions
All-software: latency of communication operations may be higher than in a hardware-based system
Region size can be chosen to correspond to user data structures (programmer’s responsibility)
Fixed-home, directory-based invalidate protocol
Ordered message delivery: a 32-bit version number tags each region
Unmapped region cache (URC): a region’s mapping can stay cached after the region is unmapped
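To make the protocol bullets concrete, here is a rough sketch of what a home node's bookkeeping for one region could look like under a directory-based invalidate scheme with a 32-bit version tag. It is not taken from the CRL sources: dir_entry, dir_read, dir_write and is_stale are hypothetical names, the sharer set is assumed to fit in a 32-bit mask (at most 32 nodes), and the stale-message check is one plausible use of the version number.

```c
#include <stdint.h>

/* Hypothetical home-node directory entry for one region (illustrative only). */
typedef struct {
    uint32_t sharers;   /* bit i set: node i holds a cached copy (i < 32) */
    uint32_t version;   /* 32-bit tag, bumped on every write grant */
} dir_entry;

/* Grant read access: record the node as a sharer and return the version
 * its copy corresponds to. */
uint32_t dir_read(dir_entry *d, int node)
{
    d->sharers |= (uint32_t) 1 << node;
    return d->version;
}

/* Grant write access: the writer becomes the only holder. Returns the mask
 * of other nodes whose copies must be invalidated first. */
uint32_t dir_write(dir_entry *d, int node)
{
    uint32_t self = (uint32_t) 1 << node;
    uint32_t to_invalidate = d->sharers & ~self;

    d->sharers = self;
    d->version++;
    return to_invalidate;
}

/* Returns nonzero if a message tagged with version msg is older than the
 * current version cur. The signed difference keeps the comparison correct
 * across 32-bit wraparound (assumes two's-complement conversion). */
int is_stale(uint32_t msg, uint32_t cur)
{
    return (int32_t) (msg - cur) < 0;
}
```

A node receiving a protocol message could compare the message's tag against the region's current version and drop it when is_stale reports that it was overtaken by a later write.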

URC (unmapped region cache)
Enables Lazy Release Consistency for CRL
rgn_start_op can be satisfied locally if the region is not invalidated before the next time it is mapped
Even if the data/region is invalidated, later accesses can be satisfied more quickly
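One way to picture the unmapped region cache is as a small table that remembers where a region's local copy lives after it is unmapped, so that the next map of the same region can skip the setup work. The sketch below assumes a direct-mapped table with 8 slots and an unsigned rid_t. Both are choices made for illustration, and urc_insert / urc_lookup are hypothetical names rather than CRL functions.

```c
#include <stddef.h>

#define URC_SLOTS 8          /* assumed table size, for illustration only */

typedef unsigned rid_t;      /* assumption: region ids are small integers */

typedef struct {
    rid_t rid;               /* region whose mapping is remembered */
    void *addr;              /* local address of the cached copy */
    int   valid;             /* nonzero if the slot holds a mapping */
} urc_slot;

static urc_slot urc[URC_SLOTS];

/* Called when a region is unmapped: remember its local mapping.
 * Direct-mapped, so a colliding region id evicts the previous entry. */
void urc_insert(rid_t rid, void *addr)
{
    urc_slot *s = &urc[rid % URC_SLOTS];

    s->rid = rid;
    s->addr = addr;
    s->valid = 1;
}

/* Called when a region is mapped: return the remembered address, or NULL
 * on a miss. A hit removes the entry because the region is mapped again. */
void *urc_lookup(rid_t rid)
{
    urc_slot *s = &urc[rid % URC_SLOTS];

    if (s->valid && s->rid == rid) {
        s->valid = 0;
        return s->addr;
    }
    return NULL;
}
```

A miss in this table would fall back to the normal mapping path. The direct-mapped layout keeps lookup constant-time at the cost of evictions when two region ids share a slot.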

Software
Prototype implementation available
Platforms
–Thinking Machines CM-5 (message-passing multicomputer)
–Alewife (distributed-memory multiprocessor); provides native shared memory support
–TCP/Unix implementation for SunOS
Expect a Linux port soon

Machine Characteristics
              CM-5        Alewife
Latency       34 us       14 us
Throughput    8 MB/sec    18 MB/sec
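As a rough feel for what these figures mean for moving a region, the sketch below applies a simple linear cost model: fixed per-message cost plus size divided by bandwidth. The model is an assumption added here, not something the slides state. It reads the microsecond figures as per-message latency and the MB/sec figures as throughput, and takes 1 MB/sec as one byte per microsecond. transfer_us is a hypothetical helper name.

```c
/* Estimated time in microseconds to move `bytes` bytes in one message,
 * under an assumed linear model: latency + size / bandwidth.
 * 1 MB/sec is taken as 10^6 bytes/sec, i.e. 1 byte per microsecond. */
double transfer_us(double latency_us, double mb_per_sec, unsigned bytes)
{
    return latency_us + (double) bytes / mb_per_sec;
}
```

Under this model a 1024-byte region costs 34 + 1024/8 = 162 us on the CM-5 and about 14 + 1024/18 ≈ 70.9 us on Alewife, which suggests why larger regions amortize the fixed per-message cost better.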

Basic Ops Latencies
Columns: CM-5 (us), Alewife (us), Alewife native (us)
Start read hit
End read hit 32.5
Start read miss, 0 inv
Start write miss, 1 inv
Start write miss, 6 inv

Applications
32-way completion time of apps with CRL on Alewife is comparable to that of Alewife native shared memory
–How? Up to 5 remote headers supported by LimitLESS (Alewife’s software-based cache-coherence subsystem)