An Implementation of User-level Distributed Shared Memory


Wei Zhang & Shu Liu

DSM: Shared Memory + Distributed Memory
2019/7/31 Final Report

Problems & Solutions
- Granularity: use a 4-Kbyte page as the unit of sharing
- Data location / mapping: a centralized server
- Communication: MPI (Message Passing Interface)

Cont. Memory coherence in parallelism:
a. each page has one dynamic owner
b. multiple readers (copies are made)
c. single writer (only the owner can write the page)
d. lock & barrier (synchronize page operations)

Design Overview

For a read

For a write

Implementation
Data structures:
- Page Table in each node
- Pageinfo in the server
Important system calls and mechanisms:
- mmap()
- mprotect()
- SIGSEGV signal: handles page faults
- pthread: receives page-fault requests and sends data
Initially, all shared memory is allocated in the server.
2019/7/31 Final Report

Cont.
MPI: creates the cluster and is responsible for communication.
#include "dsm.h": a simple yet powerful API
- dsm_startup(): initialization
- dsm_malloc(int size): allocate shared memory for the process
- dsm_barrier(): global synchronization
- dsm_clock(): count elapsed time
- dsm_lock(): page synchronization
- dsm_exit(): clean up and shut the DSM down

Cont.
1. Include the dsm header file
2. Start the DSM system
3. Allocate shared memory
4. Synchronize
5. Free the shared memory
6. Exit

Evaluation
Assumptions:
- server congestion is not the bottleneck
- the network is reliable
Benchmarks:
- Jacobi: partial differential equations (Ax = b)
- MM: parallel matrix multiply (C = AB)
- Scan: a multi-iteration scan program
Focus: multi-iteration write programs

Cont. Speedup

Cont. Page Fault

Conclusion & Future Work
- Achieved what we claimed
Improvements:
- blocking communication -> non-blocking communication
- other memory consistency models (MRMW: multiple readers / multiple writers)
- decrease network communication

Thank you!