March 2, 2016 UWB Shizuoka Univ. Workshop


Agent-Based Parallelization of Micro-Simulation and Spatial Data Analysis
Munehiro Fukuda, Computing & Software Systems, University of Washington Bothell

Background
Simulation: conventional models are macroscopic (mathematical flow models, partial differential equations); agent-based models enable micro-simulation of emergent collective group behavior.
Data Analysis: text data analysis handles text, JSON, and CSV formats, read line by line into memory (examples: Hadoop, Spark, Storm, Kafka); spatial data analysis handles binary array, raster, and grid data as well as graphs, which need to be structured in memory.
Verification & Calibration

Challenges
Agent-Based Models / Spatial Data Analysis
CPU scalability: the more accuracy we pursue, the more agents we need in a simulation. Examples: MatSim takes 20 minutes to simulate 200K cars driving through Bellevue on I-405; FluTE takes 2 hours to simulate an epidemic among 10M individuals.
Memory scalability: the more accuracy we pursue, the longer the time series or the wider the spatial data we need. Example: WRF generates a 250MB NetCDF file of regional climate data every six hours, which adds up to 120 files or 30GB of data per month, 360GB per year, and 18TB for a 50-year simulation.
Both call for parallelization.

Hypothesis and Research Goals
Hypothesis: users may choose agents if they are easy to program, even at a 10-20% performance penalty (due to agent-management overheads).
Agent-Based Simulation: achieve the highest-speed ABM simulation using GPUs, which assists real-time cyber-physical computation.
Spatial Data Analysis: allow spatial data analysis to be coded intuitively with agents, which improves programmability in data analysis.

Multi-Agent Spatial Simulation Library
[Figure: the application layer shows a 2D Places array (x- and y-axes) holding a bag of agents at coordinates (x,y); the MASS library layer maps it onto processes of rank 0-2, each with CPU cores 0-3, threads 0-3, and system memory, connected by sockets over a LAN across mnode0.uwb.edu, mnode1.uwb.edu, and mnode2.uwb.edu.]
MASS, our library for multi-agent spatial simulation, facilitates this flow-oriented network parallelization. The library has two classes: Places and Agents. Places is a multi-dimensional array of elements that are allocated over a cluster of multi-core computing nodes. Each element, called a place, is referred to by network-independent array indices and is capable of exchanging information with other places. Agents are a set of objects that reside on a place, migrate to other places, and interact with other agents through their current place. The actual communication and agent migration take place automatically through the library's underlying shared-memory or socket communication.
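The per-thread striping described above can be sketched in plain Java without the MASS runtime (a simplified, hypothetical stand-in: `StripedPlaces` is our name, and rows are assigned to threads cyclically rather than by MASS's actual partitioning):

```java
public class StripedPlaces {
    // A minimal stand-in for a 2D Places array: each thread owns a stripe of rows.
    static int[][] space;

    public static void initStripes(int xSize, int ySize, int nThreads) {
        space = new int[xSize][ySize];
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int tid = t;
            workers[t] = new Thread(() -> {
                // Each thread initializes only the rows it owns.
                for (int x = tid; x < space.length; x += nThreads)
                    for (int y = 0; y < space[x].length; y++)
                        space[x][y] = tid;          // record the owning thread
            });
            workers[t].start();
        }
        for (Thread w : workers)
            try { w.join(); } catch (InterruptedException ignored) { }
    }

    public static void main(String[] args) {
        initStripes(8, 4, 4);
        System.out.println(space[5][0]);  // row 5 is owned by thread 1 (5 mod 4)
    }
}
```

In the real library this decomposition also spans processes on different nodes; the sketch only shows the intra-process, multi-threaded level.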

MASS Specification

  public static void main( String[] args ) {
    MASS.init( args );
    Places space  = new Places( handle, "MySpace",  params, xSize, ySize );
    Agents agents = new Agents( handle, "MyAgents", params, space, population );
    space.callAll( MySpace.func1, params );
    space.exchangeAll( MySpace.func2, neighbors );
    agents.exchangeAll( MyAgents.func3 );
    agents.manageAll( );
    MASS.finish( );
  }

The MASS program starts with MASS.init() to launch remote processes, each spawning multiple threads. A Places multi-dimensional array is created over these processes and partitioned into small stripes, each allocated to a different thread. Similarly, agents are populated over these processes. Parallel computation is performed by callAll, which invokes a given function of each array element in parallel. Inter-element communication is performed by exchangeAll, which invokes a given function from each array element on its neighbors. Agents.manageAll() commits agent migration, duplication, and termination at once. Finally, MASS.finish() destroys all the places and agents and merges all computation into the main thread.
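The callAll pattern above can be mimicked in self-contained Java: each place dispatches on an integer function ID, and a runtime helper invokes that function on every element in parallel (a sketch under our own names; `MyPlace`, `FUNC1`, and `callMethod` are hypothetical, not the MASS API):

```java
import java.util.Arrays;

public class CallAllSketch {
    // A place element dispatches on an integer function ID, MASS-style;
    // FUNC1 doubles the value stored at this place.
    static class MyPlace {
        static final int FUNC1 = 0;
        int value;
        MyPlace(int v) { value = v; }
        Object callMethod(int funcId, Object arg) {
            switch (funcId) {
                case FUNC1: value *= 2; return value;
                default:    return null;
            }
        }
    }

    // callAll: invoke the given function of every element in parallel.
    static void callAll(MyPlace[] places, int funcId) {
        Arrays.stream(places).parallel()
              .forEach(p -> p.callMethod(funcId, null));
    }

    public static void main(String[] args) {
        MyPlace[] space = new MyPlace[4];
        for (int i = 0; i < 4; i++) space[i] = new MyPlace(i);
        callAll(space, MyPlace.FUNC1);
        for (MyPlace p : space) System.out.print(p.value + " ");  // 0 2 4 6
    }
}
```

The integer-ID dispatch mirrors how a library can name a user function across processes without passing method references over the network.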

MASS Status
Library: Java (implemented), C++ (implemented), CUDA (in design)
Test programs: 2D wave dissemination; Sugarscape (artificial society)
Applications: transport simulation (in Java), epidemiologic simulation (in C++), climate analysis (in Java), network motif search (in Java)

Multi-Agent-Based Transport Simulation
Parallelized by Zach Ma and presented at WSC 2015. Agents represent vehicles; they move over a traffic network every simulation tick.
http://matsim.org/uploads/showcase/MATVis-Time.pdf

Porting MatSim to MASS Java
Road network: converted to an adjacency matrix
Vehicle driving: converted to agent migration
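The two conversions above can be sketched in a few lines (hypothetical, heavily simplified; real MatSim links carry capacities and travel times): the road network becomes a boolean adjacency matrix, and a vehicle "drives" by migrating along an outgoing link once per tick.

```java
public class RoadSketch {
    // Advance a vehicle agent along the first outgoing link, once per tick;
    // a vehicle with no outgoing link stays where it is.
    static int drive(boolean[][] road, int start, int ticks) {
        int at = start;
        for (int t = 0; t < ticks; t++)
            for (int next = 0; next < road.length; next++)
                if (road[at][next]) { at = next; break; }
        return at;
    }

    public static void main(String[] args) {
        boolean[][] road = new boolean[4][4];          // 4 intersections
        road[0][1] = road[1][2] = road[2][3] = true;   // one-way chain 0 -> 1 -> 2 -> 3
        System.out.println(drive(road, 0, 3));         // prints 3
    }
}
```

In MASS terms, each matrix row would live on a place, and `drive` would become an agent's migrate call rather than an index update.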

Parallel MatSim Programmability and Performance

                                   LoC      Files
  Original MatSim                  5,144    46
  Code added/modified for MASS     585      8
  Percentage                       11.3%    17.4%

FluTE: Influenza Epidemic Simulation from Univ. of New Mexico
[Figure: contagion spreading among communities of infected people]
http://www.cs.unm.edu/~dlchao/flute/

MASS C++ Parallelized FluTE: Programmability and Performance
By Osmond Gunarso and Zac Brownell
[Charts: MPI performance vs. MASS performance]

UW Climate Analysis
Parallelized by Jason Woodring and submitted to ICPADS 2015

Agent-Based ToE Analysis (in Java)

UWCA Programmability and Performance
Handling structured scientific data
Dispatching agents only to the data portions to analyze
Supporting runtime analysis
The biggest overhead is reading files into the cluster system

Biological Network Motifs
Parallelized by Matt Kipps and presented at HPCC 2015
Biological network: protein-protein interaction. Network motifs: significantly and uniquely recurring sub-graphs. Motif search: sub-graph enumeration and statistical testing.
First, let me briefly explain what biological network motifs are. A biological network represents protein-protein interaction. Of importance is identifying significantly and uniquely recurring sub-graphs within a given biological network, which are called network motifs. While such a motif search consists of two steps, (1) sub-graph enumeration and (2) statistical testing, today's presentation focuses on sub-graph enumeration. Given a simple graph with 9 nodes, the Enumerate Subgraph (ESU) algorithm picks up each node and enumerates all combinations of this node and the others, such as 1 and 2, 1 and 3, and so on. Obviously, this enumeration grows exponentially with the number of nodes; for instance, the undirected non-isomorphic graphs with 10 nodes reach 11,716,571 patterns. Therefore, we need parallelization.
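The sub-graph enumeration step described above can be sketched with the standard ESU recursion (a sequential, simplified version under our own names; the parallel MASS implementation differs):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EsuSketch {
    static List<List<Integer>> adj;  // adjacency lists of an undirected graph
    static int k, count;

    // Enumerate every connected k-node sub-graph exactly once (ESU).
    static int enumerate(List<List<Integer>> graph, int size) {
        adj = graph; k = size; count = 0;
        for (int v = 0; v < adj.size(); v++) {
            Set<Integer> ext = new HashSet<>();
            for (int u : adj.get(v)) if (u > v) ext.add(u);
            extend(new HashSet<>(Set.of(v)), ext, v);
        }
        return count;
    }

    static void extend(Set<Integer> sub, Set<Integer> ext, int root) {
        if (sub.size() == k) { count++; return; }
        Set<Integer> rest = new HashSet<>(ext);
        while (!rest.isEmpty()) {
            int w = rest.iterator().next();
            rest.remove(w);
            // Grow the extension with exclusive neighbors of w larger than root.
            Set<Integer> extNew = new HashSet<>(rest);
            for (int u : adj.get(w))
                if (u > root && !sub.contains(u) && !isNeighborOf(sub, u))
                    extNew.add(u);
            Set<Integer> subNew = new HashSet<>(sub);
            subNew.add(w);
            extend(subNew, extNew, root);
        }
    }

    static boolean isNeighborOf(Set<Integer> sub, int u) {
        for (int v : sub) if (adj.get(v).contains(u)) return true;
        return false;
    }

    public static void main(String[] args) {
        // Path graph 0-1-2-3 has two connected sub-graphs of size 3.
        List<List<Integer>> path = List.of(
            List.of(1), List.of(0, 2), List.of(1, 3), List.of(2));
        System.out.println(enumerate(path, 3));  // prints 2
    }
}
```

The `u > root` and exclusive-neighborhood conditions are what guarantee each sub-graph is emitted exactly once, which is why the recursion, and hence agent population, grows exponentially with motif size.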

Parallelization of Network Programs
Vertex-oriented approaches: a paradigm change from the original sequential algorithms. Examples: MapReduce, Pregel, GraphLab.
Flow-oriented approaches: closer to the original sequential algorithms. Examples: Olden; MASS (our approach).
The conventional parallelization of network programs takes vertex-oriented approaches, where an algorithm or computation is chopped over the nodes and intermediate computation results are passed among them. However, from the programmer's intuitive viewpoint, they would like to trace links as if driving along these routes. This in turn means that computation should migrate over the network, as threads do in the Olden system or as agents do in our multi-agent spatial simulation library.

MASS Agent-Based Parallelization (in Java)
[Figure: a crawler agent traversing nodes 1-9 of a network mapped over ranks 0-2]
The MASS library maps a given network over cluster nodes as a 1-dimensional array where each array element, or node, maintains a set of links to its neighbors. Injected into a place element, for instance place 1, a crawler agent duplicates itself to all the neighbors of place 1. This cloning and migration is repeated until the agents find a new network motif or reach a network leaf.
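The clone-to-all-neighbors behavior above can be sketched with a fork/join task standing in for a crawler agent (a hypothetical, single-process simplification: each "agent" forks one child per neighbor until a hop limit, and we count how many agents survive to the limit):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class CrawlerSketch {
    // One task per crawler agent: clone to every neighbor, up to `depth` hops.
    static class Crawler extends RecursiveTask<Long> {
        final List<List<Integer>> adj;
        final int node, depth;
        Crawler(List<List<Integer>> adj, int node, int depth) {
            this.adj = adj; this.node = node; this.depth = depth;
        }
        @Override protected Long compute() {
            if (depth == 0) return 1L;            // this agent reached the limit
            long agents = 0;
            List<Crawler> clones = new ArrayList<>();
            for (int next : adj.get(node)) {      // duplicate to all neighbors
                Crawler c = new Crawler(adj, next, depth - 1);
                c.fork();
                clones.add(c);
            }
            for (Crawler c : clones) agents += c.join();
            return agents;
        }
    }

    public static void main(String[] args) {
        // Triangle graph: from node 0, two hops yield 2 x 2 = 4 surviving agents.
        List<List<Integer>> tri = List.of(
            List.of(1, 2), List.of(0, 2), List.of(0, 1));
        long n = new ForkJoinPool().invoke(new Crawler(tri, 0, 2));
        System.out.println(n);  // prints 4
    }
}
```

The exponential growth of the returned count with depth is exactly the "agent explosion" problem discussed on the next slide.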

MASS Agent Execution Performance
Agent explosion: a 2,365-node network with motif size 4 spawns 400,000 agents; a 400,000-node network with motif size 5 spawns 5.5 million agents.
Needs to address: memory allocation overheads and agent management overheads.
Next, we'll see the execution performance. While the MASS agent-based approach shows the effect of parallelization beyond two computing nodes, its parallelization has never beaten the sequential execution. The main reason is agent explosion: with a 400,000-node network with motif size 5, we need to spawn 5.5 million agents, which incurs considerable memory allocation and agent management overheads.

Performance Improvement Plans for Simulation
Development of a MASS GPU version:
Parallel agent spawn and termination: no use of heap space and no memory locks
Avoidance of agent collisions: no locks on memory space and no repetitive agent migration
UI to agents: micro agent tracking and intermittent space check-pointing, i.e., check-pointing an entire simulation space along a timeline and tracking a given agent's footprint in memory

Performance Improvement Plans for Spatial Data Analysis
Supporting asynchronous agent migration: will mitigate synchronous master-worker communication overheads.
Pooling idle agents: will reduce memory and agent management overheads.
Restricting explosion of the agent population: will reduce the heap space required.
To make the agent-based network parallelization more attractive from both programming and execution performance viewpoints, we are considering the following three improvement plans. No. 1: at present, agent migration is synchronously initiated by every single invocation of Agents.manageAll(), which increases the master-worker inter-process communication; by making agent migration asynchronous, we can make MASS up to twice as fast as with synchronous migration. No. 2: agent pooling will reduce memory and agent management overheads. No. 3: we plan to restrict agent explosion, in other words, to let a few agents traverse network links as in a depth-first search, which will reduce the heap space needed to instantiate agents.
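The "pooling idle agents" idea can be sketched as an object pool that recycles terminated agents instead of allocating fresh ones (a sketch; `AgentPool` and its methods are our hypothetical names, not a MASS API):

```java
import java.util.ArrayDeque;

public class AgentPool {
    // A recyclable agent: reset() re-initializes it for reuse.
    static class Agent {
        int place;
        Agent reset(int place) { this.place = place; return this; }
    }

    private final ArrayDeque<Agent> idle = new ArrayDeque<>();
    int allocations = 0;  // counts real `new` calls, for illustration

    // Reuse an idle agent if one exists; otherwise allocate a new one.
    Agent acquire(int place) {
        Agent a = idle.poll();
        if (a == null) { a = new Agent(); allocations++; }
        return a.reset(place);
    }

    // A terminated agent goes back to the pool instead of the garbage collector.
    void release(Agent a) { idle.push(a); }

    public static void main(String[] args) {
        AgentPool pool = new AgentPool();
        Agent a = pool.acquire(5);
        pool.release(a);
        Agent b = pool.acquire(9);             // recycled: same instance as `a`
        System.out.println(a == b);            // prints true
        System.out.println(pool.allocations);  // prints 1
    }
}
```

With millions of short-lived crawler agents, recycling like this trades a small bookkeeping cost for far fewer heap allocations and GC pauses.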

Our Current Group Members
Parallel File I/O for MASS Java
UWCA Performance Improvement
MASS Dev. Environment
MASS Java Testing & Doc.
UrbanSim Parallelization with MASS Java
Performance Improvement of MASS Java
Testing and Performance Evaluation of MASS C++ on AWS
MASS C++ Testing & Doc.
Agent Collision Avoidance in MASS C++

Conclusions
Programmability: we demonstrated intuitive parallelization using multi-agents.
Execution performance:
Simulation: the MASS library should support CUDA for general-purpose, agent-based, real-time GPU computing.
Spatial data analysis: we need performance tune-ups: (1) asynchronous migration, (2) an agent pool, and (3) prevention of agent explosion.
For more information, please visit: http://depts.washington.edu/dslab/MASS/
We'll release the MASS library to the public by March 2016 (hopefully).
Now, I'd like to conclude my presentation. Our contribution was to implement our agent-based approach to parallelizing graph algorithms and to compare it with MPI. While MASS demonstrated intuitive parallelization, it needs the further performance improvements listed here. We will soon release the MASS library to the public, and you can find the details on the website above. That's all for my presentation. Thank you very much for your kind attention.