Atomic-Delayed Execution: A Concurrent Programming Model for Incomplete Graph-based Computations
Pedro C. Diniz
Information Sciences Institute, Viterbi School of Engineering

Motivation
Big-Data and Graph Analytics
- Cyber-Security
- Large Network Systems
- Social Networks
- Combination of the above
Challenges
- Ton of bytes (not ton of flops)
- Massive Concurrency but Little data locality
- Low Computation to Communication ratio
- Frequent Synchronization
- Work tends to be Dynamic and Imbalanced
- Data may even become unavailable
Programming for this Application Domain is Non-Trivial

Example: Minimum Distance to Root Node
Simple Pointer-based Acyclic Graph Computation
- Compute for each node the Minimal Distance to a "root" Node
- Store Value of Distance in Node
- Save Selected Nodes in Set

Example: Minimum Distance to Root Node
Because the Graph is Potentially Very Big
- Cannot Do It Sequentially
- Limited in Time
- Need to Tolerate "incorrect" Answers
Exploit Concurrency
- Atomic Updates to Distance in Node
- Skip if Value is Already Lower than Argument
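The "atomic update, skip if already lower" rule above can be approximated in standard C++ with a compare-and-swap loop. This is only a sketch: the slides use an `atomic` block from the proposed model, whereas the helper name `update_min` and the use of `std::atomic<int>` are assumptions made here for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the slides): lower `depth` to `val` only if
// `val` is strictly smaller than the stored value.
// Returns true if this call performed the update, false if it was skipped.
bool update_min(std::atomic<int>& depth, int val) {
    assert(val >= 0);  // distances to the root are non-negative
    int current = depth.load();
    while (val < current) {
        // On failure, compare_exchange_weak reloads `current` with the value
        // another thread stored, and the loop re-checks the condition.
        if (depth.compare_exchange_weak(current, val)) {
            return true;
        }
    }
    return false;  // an equal or lower distance is already stored
}
```

A caller would continue into the children only when `update_min` returns true, which mirrors the "skip" rule on the slide.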

Example: Concurrent Traversal
- Create a Thread at Each Invocation
- Visit Nodes and Check Distance against Argument
- Update Distance Atomically and Proceed

Example: Concurrent Traversal (continued)
- Yes, we may do more work than a sequential traversal: a node reachable along several paths can be visited more than once before its distance settles.

Example: Code

class node {
  int depth;
  node *left, *right;
};

void node::traversal(int val) { time(T) } {
  atomic {
    if (depth > val) {
      depth = val;
    }
  }
  par {
    if (left != NULL) left->traversal(val+1);
    if (right != NULL) right->traversal(val+1);
  }
  exception {
    error.memory  : { continue; }
    timer.expired : { return; }
  }
}

Example: Code

class node {
  int depth;
  node *left, *right;
};

void node::traversal(int val) { time(T) } {
  atomic {
    if (depth > val) {
      depth = val;
    } else {
      return;
    }
  }
  par {
    if (left != NULL) left->traversal(val+1);
    if (right != NULL) right->traversal(val+1);
  }
  exception {
    error.memory  : { continue; }
    timer.expired : { return; }
  }
}
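The `atomic`, `par`, `exception` and `time(T)` constructs belong to the proposed programming model and are not standard C++. As a rough approximation of the traversal only (no timer and no memory-error handling), the sketch below uses `std::atomic` and `std::thread`. The names `Node` and `traversal` as a free function, and the choice to spawn one thread for the left child while reusing the current thread for the right child, are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <thread>

// Assumed node layout: same fields as the slide, with an atomic distance
// that starts at "infinity".
struct Node {
    std::atomic<int> depth{INT_MAX};
    Node* left = nullptr;
    Node* right = nullptr;
};

void traversal(Node* n, int val) {
    assert(n != nullptr);

    // Stand-in for the `atomic` block: lower depth to val, or return early
    // if an equal or lower distance is already stored.
    int current = n->depth.load();
    do {
        if (current <= val) return;
    } while (!n->depth.compare_exchange_weak(current, val));

    // Stand-in for the `par` block: left child in a new thread, right child
    // in the current thread, then wait for the spawned thread.
    std::thread left_worker;
    if (n->left != nullptr) left_worker = std::thread(traversal, n->left, val + 1);
    if (n->right != nullptr) traversal(n->right, val + 1);
    if (left_worker.joinable()) left_worker.join();
}
```

Spawning a thread per invocation is only reasonable for small graphs; a bounded worker pool would be the more practical choice at scale.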

Example: Delayed Execution

exception {
  timer.expired : {
    time (T) {
      par {
        if (left != NULL) left->traversal(val+1);
        if (right != NULL) right->traversal(val+1);
      }
    }
  }
}

When Time Expires:
- Return Control
- Continue for another Time Quantum
- Separate Thread Updates Objects Atomically
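One way to picture "return control, then continue for another time quantum" in plain C++ is to keep the unfinished invocations in an explicit work list and process it until a deadline passes. This is a simplified, single-threaded sketch and not the mechanism used in the talk: `run_quantum`, `Pending` and the plain `int depth` field are assumptions, and the separate updating thread mentioned on the slide is left out.

```cpp
#include <cassert>
#include <chrono>
#include <climits>
#include <deque>
#include <utility>

struct Node {
    int depth = INT_MAX;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Each entry is a delayed invocation: (node to visit, candidate distance).
using Pending = std::deque<std::pair<Node*, int>>;

// Processes pending invocations until the list is empty or the quantum ends.
// Returns true if the traversal completed, false if the timer expired and
// work remains in `pending` for a later quantum.
bool run_quantum(Pending& pending, std::chrono::milliseconds quantum) {
    const auto deadline = std::chrono::steady_clock::now() + quantum;
    while (!pending.empty()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // analogous to timer.expired: hand control back
        }
        Node* n = pending.front().first;
        const int val = pending.front().second;
        pending.pop_front();
        assert(n != nullptr);
        if (n->depth <= val) continue;  // already at least as close
        n->depth = val;
        if (n->left != nullptr) pending.emplace_back(n->left, val + 1);
        if (n->right != nullptr) pending.emplace_back(n->right, val + 1);
    }
    return true;
}
```

After a `false` return, the distances stored so far form the incomplete answer, and calling `run_quantum` again with the same list resumes where the previous quantum stopped.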

Concepts: Objects, Concurrency and Atomic
Objects and Methods
- Data Encapsulation
Separability (key):
- Decouple Updates to Object from Concurrent Invocations
- Uses only symbolically constant object data and arguments
Atomicity:
- Avoids Race but not indeterminism
- Facilitates Reasoning
- In Principle could have Many Atomic Sections
Concurrency

Experiments: Concurrency Environment
Using pthreads
- Master threads and N Workers
- Work stealing at a work-pool
- Exception flag is checked when attempting to steal work
Objects in C share a Pool of Mutex Locks
- Some possible false contention
Timed and Delayed Execution
- Sharing two global Timers (for simplicity)
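The shared pool of mutex locks, and the false contention it can cause, can be sketched as follows. The experiments used pthreads mutexes in C; this illustration uses `std::mutex` instead, and the pool size, the address-based mapping and the names `LockPool`, `lock_for` and `update_min` are all assumptions rather than details from the talk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// A fixed pool of locks shared by all objects. Two unrelated objects may map
// to the same lock, which is the "false contention" noted on the slide.
class LockPool {
public:
    static constexpr std::size_t kSize = 64;  // assumed pool size

    std::mutex& lock_for(const void* object) {
        assert(object != nullptr);
        const auto addr = reinterpret_cast<std::uintptr_t>(object);
        // Drop the low bits, which are often identical for aligned objects.
        return locks_[(addr >> 4) % kSize];
    }

private:
    std::array<std::mutex, kSize> locks_;
};

struct Node {
    int depth;
};

// Lowers n.depth to val under the pooled lock; returns false if skipped.
bool update_min(LockPool& pool, Node& n, int val) {
    std::lock_guard<std::mutex> guard(pool.lock_for(&n));
    if (n.depth <= val) return false;
    n.depth = val;
    return true;
}
```

The trade-off is memory against contention: a pool avoids storing one mutex per node in a very large graph, at the cost of occasionally serializing updates to unrelated nodes.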

Experiments: Graph Computation
Search Image Feature in Graph
- Nodes represent people and have 1 image
- Edges represent associations
Collect from a given "root" node
- Nodes at distance greater than 2
- Share the same features (computationally intensive)
Graphs Synthetically-Generated with RMAT algorithm
Experiments:
- Timed Executions
- Faults in Node Edges

Results: Completeness and “Correctness”

Tolerance to “Errors”

Summary
- Object-based programming model with timed and delayed executions.
- Geared towards computations on very large data sets where the data cannot be traversed in useful time or is simply unavailable due to uncorrected memory errors.
- Presented experimental results for a concurrent incomplete graph-based computation that delivers feasible results within strict time bounds and in the presence of memory errors.
- We foresee the need to let programmers specify time limits for a computation so that systems can make progress with limited and incomplete data.

Acknowledgements
- Partial support for this work was provided by the US Army Research Office (Award W911NF )
- Partial support for this work was provided by the US Department of Energy (DoE) Office of Science, Advanced Scientific Computing Research through the SciDAC-3 SUPER Research Institute (Contract Number DE-SC )

Pedro Diniz