SICSA student induction day, 2009, Slide 1: Multithreading RePast Models. Alex Voss 1, Jing-Ya You 2, Eric Yen 2, Simon Lin 2, Ji-Ping Lin 3, Andy Turner 4.


Slide 1: Multithreading RePast Models
Alex Voss (1), Jing-Ya You (2), Eric Yen (2), Simon Lin (2), Ji-Ping Lin (3), Andy Turner (4)
(1) School of Computer Science, University of St Andrews
(2) Academia Sinica Grid Computing, Academia Sinica, Taiwan
(3) Center for Survey Research, Academia Sinica, Taiwan
(4) School of Geography, University of Leeds
Workshop on Future Directions in Agent Based Modelling, Leeds, UK, June 2010

Slide 2: Overview
- Something about the model we want to build on migration
- Quite a bit about how we tweaked the model to make use of multiple CPUs/cores
- A bit about what we will do next and questions we want to explore
- My interest in this…

Slide 3: Migration in Taiwan
- Migration has been an important factor in Taiwanese social development and has been influenced by outside factors since the 1600s
- Aim is to test existing theories of migration constructively and to investigate recent developments such as increased outward migration to China
- Timely, as Taiwan is running another census
- Based on work conducted by Ji-Ping Lin of the Academia Sinica Center for Survey Research on migration, using the 1990 and 2000 Taiwan Population and Housing Census

Slide 4: SimTaiwan: Migration in Taiwan
- Based on Taiwan 2000 Population and Housing Census
- Dataset is individual-level but with restricted variables
- Held at Academia Sinica
- Need to identify additional datasets to complement census
- Issues with data protection
- Need to scale up to ca. 22 million individuals
- They are heterogeneous agents with quite a large number of attributes and history

Slide 5: SimTaiwan Tests
Four different model implementations:
1. Naïve single-threaded model
2. Improved single-threaded model
3. Initial multi-threaded model
4. Improved multi-threaded model
Test runs with each of these models to measure:
1. Wallclock and CPU time
2. Memory usage
3. Code hotspots
4. Worker thread activity (where applicable)

Slide 6: Test Code and Parameters
- Simplified model with only fertility and mortality, same for all measured models
- 250k male and 250k female random initial population, running for 365 ticks (= days)
- Measurements taken using JProfiler
  - CPU sampling (5 sec intervals)
  - Memory allocations recording
- JVM parameters: -Xmx8192M -Xss128M
- Hardware: Dell PowerEdge R610 with 2 x Xeon, 4 x 2 GHz (8 cores total) and 16 GB RAM

Slide 7: Naïve Serial Version
- More time spent in RePast scheduling code than in model code, because events are scheduled for each individual agent every step

Slide 8: Improved Serial Version
- Event scheduled on DemographicsContext, with code iterating through individual agents
- Wallclock time down from 5:32 to 2:23
- Opens up opportunities for parallelising code as well…
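The difference between per-agent events and a single event that iterates can be sketched without RePast. The following is a minimal illustration, assuming a toy priority-queue scheduler; Event, Agent and SchedulingSketch are hypothetical names and none of this is RePast's API. It only counts how many events the scheduler has to handle in each design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SchedulingSketch {
    /** Hypothetical stand-in for a scheduled action; not a RePast type. */
    static final class Event implements Comparable<Event> {
        final int tick;
        final Runnable action;
        Event(int tick, Runnable action) { this.tick = tick; this.action = action; }
        @Override public int compareTo(Event other) { return Integer.compare(tick, other.tick); }
    }

    /** Hypothetical agent whose only behaviour is to age by one per tick. */
    static final class Agent {
        int age;
        void step() { age++; }
    }

    /** Runs every queued event and returns how many the scheduler handled. */
    static long drain(PriorityQueue<Event> queue) {
        long handled = 0;
        Event e;
        while ((e = queue.poll()) != null) {
            e.action.run();
            handled++;
        }
        return handled;
    }

    /** Naive design: one event per agent per tick. */
    static long naive(List<Agent> agents, int ticks) {
        PriorityQueue<Event> queue = new PriorityQueue<>();
        for (int t = 0; t < ticks; t++) {
            for (Agent a : agents) {
                queue.add(new Event(t, a::step));
            }
        }
        return drain(queue);
    }

    /** Improved design: one event per tick that iterates over all agents. */
    static long improved(List<Agent> agents, int ticks) {
        PriorityQueue<Event> queue = new PriorityQueue<>();
        for (int t = 0; t < ticks; t++) {
            queue.add(new Event(t, () -> { for (Agent a : agents) a.step(); }));
        }
        return drain(queue);
    }

    static List<Agent> population(int n) {
        List<Agent> agents = new ArrayList<>();
        for (int i = 0; i < n; i++) agents.add(new Agent());
        return agents;
    }

    public static void main(String[] args) {
        System.out.println("naive events:    " + naive(population(1000), 10));
        System.out.println("improved events: " + improved(population(1000), 10));
    }
}
```

Both designs perform the same agent updates; the second simply moves the loop out of the scheduler and into model code, which is also what makes it easy to split across threads later.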

Slide 9: Initial Parallel Version
- Need to partition data to allow multiple worker threads to exploit multiple CPUs and cores
- PartitionedContext keeps agents in separate HashSets that can return independent Iterators for use by multiple threads
- ThreadPoolExecutor with configurable number of worker threads (here 8)
- Initial version brings only modest or no improvement; wallclock time in some runs > improved serial code
- Max. CPU utilisation ~ 200% (top)
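The partitioning idea can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the authors' PartitionedContext: the Partitions class, the round-robin assignment and the trivial Agent are all hypothetical, and a fixed thread pool from Executors stands in for the configured ThreadPoolExecutor. Each partition is a separate HashSet, so each worker iterates over its own set and no two tasks touch the same agent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionSketch {
    /** Hypothetical agent whose only behaviour is to age by one per tick. */
    static final class Agent {
        int age;
        void step() { age++; }
    }

    /** Hypothetical container holding agents in disjoint sets. */
    static final class Partitions {
        private final List<Set<Agent>> parts = new ArrayList<>();
        private int next = 0;

        Partitions(int count) {
            for (int i = 0; i < count; i++) parts.add(new HashSet<>());
        }

        /** Assigns agents round-robin; called from one thread only. */
        void add(Agent a) {
            parts.get(next).add(a);
            next = (next + 1) % parts.size();
        }

        List<Set<Agent>> parts() { return parts; }
    }

    /** Runs one tick: one task per partition, then waits for all of them. */
    static void stepAll(Partitions p, ExecutorService pool) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Set<Agent> part : p.parts()) {
            tasks.add(() -> { for (Agent a : part) a.step(); return null; });
        }
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a population, runs it for the given ticks, returns the summed ages. */
    static int run(int agents, int partitions, int threads, int ticks) {
        Partitions p = new Partitions(partitions);
        for (int i = 0; i < agents; i++) p.add(new Agent());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int t = 0; t < ticks; t++) stepAll(p, pool);
        } finally {
            pool.shutdown();
        }
        int total = 0;
        for (Set<Agent> part : p.parts()) {
            for (Agent a : part) total += a.age;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("sum of ages: " + run(1000, 8, 8, 365));
    }
}
```

Waiting on every Future at the end of a tick acts as a barrier, so serial code that runs between ticks sees all updates. Anything the tasks share, such as a common random number generator, would still need to be thread-safe.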

Slide 10: Initial Parallel Version (II)
- Worker threads blocking a lot on monitors placed around RePast constructs
- Main issue seems to be that use of RandomHelper is not thread-safe
- Simulation schedule relatively minor issue
- Some contention around simulation objects

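Slide 11: Improved Parallel Version
- Overloading some of RePast's code to make it thread-safe
- Reducing scope of monitor objects used and pulling code parts that are safe out of synchronized sections
- Introducing a thread-local variable containing a per-thread random number generator:

    // Imports assume the Colt classes of these names.
    import cern.jet.random.Uniform;
    import cern.jet.random.engine.MersenneTwister;
    import cern.jet.random.engine.RandomEngine;

    // One Uniform generator per worker thread, created lazily on first use.
    protected static ThreadLocal<Uniform> uniform = new ThreadLocal<Uniform>() {
        @Override
        protected Uniform initialValue() {
            // Seeded from the clock; threads initialised in the same
            // millisecond will receive the same seed.
            RandomEngine generator = new MersenneTwister((int) System.currentTimeMillis());
            return new Uniform(generator);
        }
    };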
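Seeding each thread's generator from the clock means that threads started in the same millisecond share a seed and so draw the same sequence. A possible variant, sketched here with JDK classes only, hands out distinct seeds from a counter. It assumes java.util.Random as a stand-in for the generator on the slide; the class name, the base seed of 42 and the helper methods are hypothetical.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadLocalRngSketch {
    // Hypothetical base seed; each new thread takes the next value.
    private static final AtomicLong NEXT_SEED = new AtomicLong(42L);

    // One generator per thread, created on that thread's first call to get().
    private static final ThreadLocal<Random> RNG =
            ThreadLocal.withInitial(() -> new Random(NEXT_SEED.getAndIncrement()));

    /** Returns the calling thread's own generator. */
    static Random current() {
        return RNG.get();
    }

    /** Draws a uniform value in [0, 1) without any locking between threads. */
    static double nextUniform() {
        return RNG.get().nextDouble();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> System.out.println(
                Thread.currentThread().getName() + " drew " + nextUniform());
        Thread a = new Thread(work, "worker-a");
        Thread b = new Thread(work, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

A fixed base seed also makes a run repeatable as long as threads request their generators in the same order, which is not guaranteed with a thread pool.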

Slide 12: Improved Parallel Version (II)
- Monitor contention is eased significantly
- Wallclock running time down to 1:03 and max. CPU utilisation up to ~ 600%
- Time spent in serial code for analysis and production of charts is now significant
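The wallclock figures quoted in the slides (5:32 naive serial, 2:23 improved serial, 1:03 improved parallel) can be turned into speed-up factors. The sketch below also applies the Karp-Flatt metric, which rearranges Amdahl's law to estimate the serial fraction from a measured speed-up. That estimate is an assumption-laden reading, not a result from the talk: it assumes all 8 worker threads were available throughout and lumps contention and other overheads into the "serial" share.

```java
public class SpeedupSketch {
    /** Converts a "m:ss" wallclock string into seconds. */
    static int seconds(String mmss) {
        String[] parts = mmss.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    /** Speed-up factor going from one wallclock time to another. */
    static double speedup(String before, String after) {
        return (double) seconds(before) / seconds(after);
    }

    /**
     * Karp-Flatt estimate of the serial fraction e, from Amdahl's law
     * S = 1 / (e + (1 - e) / p) solved for e.
     */
    static double serialFraction(double speedup, int workers) {
        return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers);
    }

    public static void main(String[] args) {
        double serialGain = speedup("5:32", "2:23");
        double parallelGain = speedup("2:23", "1:03");
        double overall = speedup("5:32", "1:03");
        System.out.println(String.format("naive -> improved serial:    %.2fx", serialGain));
        System.out.println(String.format("improved serial -> parallel: %.2fx", parallelGain));
        System.out.println(String.format("overall:                     %.2fx", overall));
        System.out.println(String.format("estimated serial fraction:   %.2f",
                serialFraction(parallelGain, 8)));
    }
}
```

On these figures the parallel version is a little over twice as fast as the improved serial one on 8 threads, which is consistent with the slide's remark that the remaining serial work now takes up a noticeable share of the run.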

Slide 13: What have we learned/developed?
- Advice on structuring RePast code
  - Parallelise using PartitionedContext
  - Iteration instead of scheduling events
- RePast does put some barriers in the way, but these should be possible to overcome
- Speed-up initially not as much as hoped for, but this was overcome by introducing thread-local random number generators
- Can we factor this work into development of RePast? Or present as tutorial?

Slide 14: Next Step: Debugging/Profiling on the Grid
- Tests to establish optimum number of partitions and threads vs. number of agents
- Verification of the code and sensitivity analysis
- Repeated runs to uncover rare events and to obtain comparable average figures
- Availability of high-memory machines will become an issue once we scale up to the full 22 million agents:
  - 48 GB server available at ASGC
  - Upgrade / purchase of server at St Andrews planned

Slide 15: Questions
- What will happen when we make the model more complex?
- What decisions about the model (will) affect the degree of parallelism and running times?
- How many CPU cores can we effectively utilise?
  - Need machine with more cores as well as more memory
  - This is now becoming affordable thanks to AMD
- Commodity computing is what we are interested in: fewer skills involved (?) and availability for social scientists

Slide 16: My Interests…
- Not about building the most sophisticated model or the highest-performance one, but…
- about making ABM framework(s) (RePast) usable for social scientists interested in population-level phenomena,
- addressing the practical issues of developing and using agent-based models in anger
- cf. challenges outlined by Peter McBurney today