Dynamic Parallel Message Processing with Transactional Memory in the Actor Model
Yaroslav Hayduk, Anita Sobe, Pascal Felber
University of Neuchâtel, Switzerland
DMTM, January 22, 2014

This project and the research leading to these results have received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n°

A bit of background: The Actor Model
Hewitt & Baker (IFIP Congress '77), "Laws for Communicating Parallel Processes"
Motivated by the prospect of highly parallel computing machines with many microprocessors, each with its own local memory

OOP and actors
Everything is an actor (vs. an object)
Asynchronous message passing
An actor has access to its local state only
Strong encapsulation
Inherently concurrent

OOP and actors: Communication
Objects: Object A accesses Object B directly (ObjectB.publicMethod(), ObjectB.publicField = 10).
Actors: ActorA sends a message to ActorB (asynchronous message passing); direct access such as ActorB.publicField = 10 is illegal because of strong encapsulation.
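A minimal Akka sketch of this contrast (class and message names are illustrative, not from the talk): ActorB's private state changes only by reacting to messages, and ActorA can only send messages asynchronously.

    import akka.actor.{Actor, ActorRef, ActorSystem, Props}

    // Messages are immutable case classes
    final case class SetValue(v: Int)
    case object GetValue

    class ActorB extends Actor {
      private var value: Int = 0            // local state: only ActorB touches it
      def receive = {
        case SetValue(v) => value = v       // state changes only via messages
        case GetValue    => sender() ! value
      }
    }

    class ActorA(b: ActorRef) extends Actor {
      def receive = {
        case "go" =>
          b ! SetValue(10)                  // asynchronous send; no shared memory
          // b.value = 10                   // impossible: an ActorRef exposes no fields
      }
    }

    object Demo extends App {
      val system = ActorSystem("demo")
      val actorB = system.actorOf(Props[ActorB], "b")
      val actorA = system.actorOf(Props(new ActorA(actorB)), "a")
      actorA ! "go"
    }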

Problem statement
Sequential processing of messages limits performance and throughput.
Concurrent message processing using STM (Hayduk et al., OPODIS 2013) is not optimal in cases of high contention.

Main contributions
We propose to:
adapt the number of threads to the workload;
extract read-only messages from the transactional context.

Concurrent message processing
Actor C (a "List" actor) holds local actor state; active threads from a thread pool process several messages concurrently (Contains 8 and Insert 8 in progress; Remove 9 and Contains 2 pending).
What are the problems in the case of high contention?
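A rough sketch of this idea with ScalaSTM (pool size, message types, and structure are illustrative, not the paper's implementation): each message runs in its own transaction over the actor's Ref-based state, so conflicting messages are detected and retried by the STM.

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}
    import scala.concurrent.stm._

    object ConcurrentProcessing {
      // Actor state kept in an STM reference so it can be accessed transactionally
      private val listData = Ref(List.empty[Int])

      // Worker threads that process messages of the same actor concurrently
      private val pool =
        ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

      sealed trait Msg
      final case class Insert(v: Int)   extends Msg
      final case class Remove(v: Int)   extends Msg
      final case class Contains(v: Int) extends Msg

      // Each message is processed inside a transaction; on conflict the STM
      // rolls back and retries, which becomes expensive under high contention.
      def process(msg: Msg): Future[Any] = Future {
        atomic { implicit txn =>
          msg match {
            case Insert(v)   => listData() = (v :: listData()).sorted
            case Remove(v)   => listData() = listData().filterNot(_ == v)
            case Contains(v) => listData().contains(v)
          }
        }
      }(pool)
    }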

Idea 1: Dynamically adjust the number of threads
Same setting as before, but only part of the thread pool stays active for message processing; the remaining threads become idle resources.

Idea 1: Dynamically adjust the number of threads
Use a simple heuristic: measure the transaction abort (rollback) rate; if it is high, decrease the thread count, otherwise increase it.
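A minimal sketch of such a controller, assuming the measured quantity is the ratio of aborted to committed transactions; the class name, threshold, and step size are illustrative, not the paper's.

    import java.util.concurrent.atomic.AtomicLong

    class ThreadCountController(minThreads: Int, maxThreads: Int) {
      private val commits = new AtomicLong(0)
      private val aborts  = new AtomicLong(0)
      @volatile var threadCount: Int = maxThreads

      def recordCommit(): Unit = commits.incrementAndGet()
      def recordAbort(): Unit  = aborts.incrementAndGet()

      // Called periodically: shrink the pool when contention (abort ratio) is
      // high, grow it again when transactions mostly commit.
      def adjust(abortThreshold: Double = 0.5): Unit = {
        val c = commits.getAndSet(0)
        val a = aborts.getAndSet(0)
        val ratio = if (c + a == 0) 0.0 else a.toDouble / (c + a)
        threadCount =
          if (ratio > abortThreshold) math.max(minThreads, threadCount - 1)
          else math.min(maxThreads, threadCount + 1)
      }
    }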

Idea 2: Use idle threads for relaxed operations
Can we exploit the idle resources?
Use them for specific read-only operations: operations with relaxed atomicity and isolation semantics.

Example: List Actor operations
The actor keeps its state in private listData = Ref(data).
Insert/Remove/Contains messages are processed transactionally; relaxed sum messages are read-only and can be handled outside the transactional context.
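A hedged sketch of such a list actor with Akka and ScalaSTM (only listData = Ref(...) and the message names come from the slide; the rest is illustrative):

    import akka.actor.Actor
    import scala.concurrent.stm._

    // Messages handled inside transactions
    final case class Insert(v: Int)
    final case class Remove(v: Int)
    final case class Contains(v: Int)
    // Read-only message that tolerates relaxed atomicity/isolation
    case object RelaxedSum

    class ListActor extends Actor {
      private val listData = Ref(List.empty[Int])

      def receive = {
        case Insert(v)   => atomic { implicit txn => listData() = (v :: listData()).sorted }
        case Remove(v)   => atomic { implicit txn => listData() = listData().filterNot(_ == v) }
        case Contains(v) => sender() ! atomic { implicit txn => listData().contains(v) }
        case RelaxedSum  =>
          // Approximated here with a single-operation read; the paper uses its
          // relaxedGet() mechanism described on the next slide.
          sender() ! listData.single().sum
      }
    }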

Modified Scala STM
The Ref object keeps its value in a volatile field (protected_data).
Ref object methods: get(), ..., plus our new relaxedGet().
Transaction scopes: atomic{} and atomic.unrecorded{}.
Since ScalaSTM is a write-back STM, we can safely access the value directly; singleRelaxedGet() is our single-operation variant.
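relaxedGet() and singleRelaxedGet() belong to the authors' modified ScalaSTM, not to the stock library. As a rough approximation with unmodified ScalaSTM 0.7, an unrecorded transaction performs a consistent read without leaving a footprint in any enclosing transaction:

    import scala.concurrent.stm._

    object RelaxedRead {
      val listData: Ref[List[Int]] = Ref(List(1, 2, 3))

      // Regular transactional read: recorded in the read set and validated at commit
      def transactionalSum(): Int =
        atomic { implicit txn => listData().sum }

      // Approximation of a relaxed read with stock ScalaSTM: the block runs as an
      // unrecorded transaction, so its read/write sets are discarded afterwards
      // and it is not merged into an enclosing transaction.
      def relaxedSum(): Int =
        atomic.unrecorded { implicit txn => listData().sum }
    }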

Idea 2: Use idle threads for relaxed-consistency tasks
The thread pool is split into TM threads and relaxed-consistency threads: TM threads process transactional messages on the local actor state (Contains 8 in progress; Remove 9 and Contains 2 pending), while a relaxed-consistency thread computes an inconsistent sum in parallel.

Experimental settings
Software: Scala 2.12, Akka 2.10, ScalaSTM 0.7
Hardware: 48 cores (AMD Opteron 6172 CPUs running at 2.1 GHz)

Evaluation: Application overview
1) Stateful distributed sorted integer linked-list: each actor owns a range of the list (e.g., 1..25); TM messages (Insert/Remove/Contains) go to the actor owning the range, and a global relaxed list sum is computed across all actors.
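A hypothetical sketch of this structure (ranges, routing, and all names are illustrative): each actor owns a disjoint range of the sorted list, and a router forwards every operation to its owner.

    import akka.actor.{Actor, ActorRef, Props}

    final case class Insert(v: Int)
    final case class Contains(v: Int)

    // Owns one contiguous range of the distributed sorted list
    class RangeActor(lo: Int, hi: Int) extends Actor {
      private var data: List[Int] = Nil
      def receive = {
        case Insert(v) if v >= lo && v <= hi => data = (v :: data).sorted
        case Contains(v)                     => sender() ! data.contains(v)
      }
    }

    // Forwards each operation to the actor owning the corresponding range;
    // forward preserves the original sender, so replies go back to the caller.
    class ListRouter(ranges: Seq[(Int, Int)]) extends Actor {
      private val owners: Seq[(Int, Int, ActorRef)] = ranges.map { case (lo, hi) =>
        (lo, hi, context.actorOf(Props(new RangeActor(lo, hi))))
      }
      private def ownerOf(v: Int): ActorRef =
        owners.collectFirst { case (lo, hi, ref) if v >= lo && v <= hi => ref }.get
      def receive = {
        case m @ Insert(v)   => ownerOf(v) forward m
        case m @ Contains(v) => ownerOf(v) forward m
      }
    }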

Evaluation: Application overview
2) Multiple-point geostatistics application (Hydra): when a match is found, assign the value Z(y) to the simulation grid.

2) The multiple-point geostatistics application
The actor keeps the simulation grid in private grid = Ref(data) and receives two kinds of messages: simulation messages and snapshot messages.
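A hedged sketch of such a grid actor (the grid representation, message fields, and snapshot mechanism are illustrative assumptions): simulation messages update grid cells transactionally, while snapshot messages take a cheap, possibly inconsistent copy.

    import akka.actor.Actor
    import scala.concurrent.stm._

    final case class Simulation(x: Int, y: Int, z: Double)
    case object Snapshot

    class GridActor(width: Int, height: Int) extends Actor {
      // Simulation grid held in an STM reference
      private val grid = Ref(Vector.fill(width * height)(0.0))

      def receive = {
        case Simulation(x, y, z) =>
          // Assign the value Z(y) to the simulation grid, transactionally
          atomic { implicit txn => grid() = grid().updated(y * width + x, z) }
        case Snapshot =>
          // Relaxed read: cheap, possibly inconsistent view for observers
          val copy = atomic.unrecorded { implicit txn => grid() }
          sender() ! copy
      }
    }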

Results – list benchmark
[Figures: list write-dominated workload with static thread allocation vs. dynamic thread allocation.]
One configuration shows more write-write conflicts, which are resolved by killing one transaction; the other shows more read-write conflicts, which are resolved by waiting.

Results – Hydra
[Figures: STM message throughput and total message throughput, for static and dynamic thread allocation.]
Static ratio: 3 read-only threads, 29 TM threads.

Results – Hydra
[Figure: simulation of the hydraulic subsurface; static and dynamic thread allocation; rollbacks.]
Benefits of varying the TM thread count.
Static ratio: 3 read-only threads, 29 TM threads.

Summary
By dynamically varying the number of threads, we increased the Actor message throughput.
We used idle resources for relaxed-consistency operations.
The combination of both approaches yields the best performance.

Questions?