Effective Parallelization Strategies for Scalable, High

Slides:



Advertisements
Similar presentations
NewsFlash!! Earth Simulator no longer #1. In slightly less earthshaking news… Homework #1 due date postponed to 10/11.
Advertisements

Rasterization and Ray Tracing in Real-Time Applications (Games) Andrew Graff.
Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.
CS 470/570:Introduction to Parallel and Distributed Computing.
Parallel Game Engine Design or How I Learned to Stop Worrying and Love Multithreading.
1 Presenter: Ming-Shiun Yang Sah, A., Balakrishnan, M., Panda, P.R. Design, Automation & Test in Europe Conference & Exhibition, DATE ‘09. A Generic.
What is Concurrent Programming? Maram Bani Younes.
CC02 – Parallel Programming Using OpenMP 1 of 25 PhUSE 2011 Aniruddha Deshmukh Cytel Inc.
A Re-usable Software Architecture for Small Satellite AOCS Systems Authors: Associate professor Jan Dimon Bendtsen
TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS Iván Pérez Enrique Vallejo José Luis Bosque University of Cantabria TraceRep IWSG'15.
Trace Generation to Simulate Large Scale Distributed Application Olivier Dalle, Emiio P. ManciniMar. 8th, 2012.
OpenMP – Introduction* *UHEM yaz çalıştayı notlarından derlenmiştir. (uhem.itu.edu.tr)
1 © 2012 The MathWorks, Inc. Parallel computing with MATLAB.
A performance evaluation approach openModeller: A Framework for species distribution Modelling.
CS 838: Pervasive Parallelism Introduction to OpenMP Copyright 2005 Mark D. Hill University of Wisconsin-Madison Slides are derived from online references.
1 Geospatial and Business Intelligence Jean-Sébastien Turcotte Executive VP San Francisco - April 2007 Streamlining web mapping applications.
1 Latest Generations of Multi Core Processors
10/02/2012CS4230 CS4230 Parallel Programming Lecture 11: Breaking Dependences and Task Parallel Algorithms Mary Hall October 2,
Multi-Hit Ray Traversal Christiaan Gribble Alexis Naveros Ethan Kerzner ACM SIGGRAPH Symposium on Interactive 3D Graphics & Games 15 March 2014.
1 Object Oriented Logic Programming as an Agent Building Infrastructure Oct 12, 2002 Copyright © 2002, Paul Tarau Paul Tarau University of North Texas.
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
Motivation: dynamic apps Rocket center applications: –exhibit irregular structure, dynamic behavior, and need adaptive control strategies. Geometries are.
3/12/2013Computer Engg, IIT(BHU)1 OpenMP-1. OpenMP is a portable, multiprocessing API for shared memory computers OpenMP is not a “language” Instead,
Path/Ray Tracing Examples. Path/Ray Tracing Rendering algorithms that trace photon rays Trace from eye – Where does this photon come from? Trace from.
Martin Kruliš by Martin Kruliš (v1.1)1.
Parallel OpenFOAM CFD Performance Studies Student: Adi Farshteindiker Advisors: Dr. Guy Tel-Zur,Prof. Shlomi Dolev The Department of Computer Science Faculty.
Chapter 4 – Thread Concepts
These slides are based on the book:
VisIt Project Overview
3D Rendering 2016, Fall.
Segments Introduction: slides 2–6, 8 10 minutes
Kai Li, Allen D. Malony, Sameer Shende, Robert Bell
Productive Performance Tools for Heterogeneous Parallel Computing
Chapter 4: Threads.
R. Rastogi, A. Srivastava , K. Sirasala , H. Chavhan , K. Khonde
.NET Remoting Priyanka Bharatula.
CS427 Multicore Architecture and Parallel Computing
Chapter 4 – Thread Concepts
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
Computer Engg, IIT(BHU)
Grid Computing.
INTRODUCTION TO OPTICAL COMMUNICATION TECHNOLOGY
Steven Ge, Xinmin Tian, and Yen-Kuang Chen
CHAPTER 3 Architectures for Distributed Systems
Grid Computing Colton Lewis.
Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang
3D Graphics Rendering PPT By Ricardo Veguilla.
Wave BEHAVIORS/interactions
Ray-Cast Rendering in VTK-m
© University of Wisconsin, CS559 Fall 2004
Concept of Power Control in Cellular Communication Channels
Pipeline parallelism and Multi–GPU Programming
CS 179 Lecture 14.
(c) 2002 University of Wisconsin
Parallel Spectral Renderer
Teaching and Learning with Technology Distance Education Chapter 12
Application Development A Tutorial Driven Course
Introduction to locality sensitive approach to distributed systems
What is Concurrent Programming?
Software Engineering with Reusable Components
Chapter 1 Introduction.
Threads and Concurrency
Allen D. Malony Computer & Information Science Department
SAMANVITHA RAMAYANAM 18TH FEBRUARY 2010 CPE 691
Distributed Systems through Web Services
Computer Networking A Top-Down Approach Featuring the Internet
02 | What DirectX Can Do and Creating the Main Game Loop
Graphics Processing Unit
Chapter 4: Threads.
Reporter: Wenkai Cui Institution: Tsinghua University Date:
Presentation transcript:

Effective Parallelization Strategies for Scalable, High Performance Radio Frequency Ray Tracing Christiaan Gribble & Jefferson Amstutz Applied Technology Operation SURVICE Engineering Company IEEE High Performance Extreme Computing 17 September 2015

Agenda Introduction RFRT : radio frequency ray tracing StingRay : architecture & core components 2 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Agenda Introduction Parallel RF simulation RFRT : radio frequency ray tracing StingRay : architecture & core components Parallel RF simulation OpenMP parallelization strategies Results & conclusions 3 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Agenda Introduction Parallel RF simulation Wrap-up RFRT : radio frequency ray tracing StingRay : architecture & core components Parallel RF simulation OpenMP parallelization strategies Results & conclusions Wrap-up 4 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Introduction

Radio frequency simulation Radio frequency (RF) propagation Transmission of radio waves Behavior affected by various phenomena 6 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Radio frequency simulation Radio frequency (RF) propagation Transmission of radio waves Behavior affected by various phenomena Possible methodology Empirical models, FEM, 2N-ray models, … Monte Carlo path tracing 7 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RFRT – radio frequency ray tracing Ray representation EM wave, same direction of travel Direction perpendicular to wavefront 8 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RFRT – radio frequency ray tracing Ray representation EM wave, same direction of travel Direction perpendicular to wavefront Ray behavior Graphics : reflection, refraction RF simulation : +diffraction, +interference 9 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Transmitter Interference Line of Sight Reflection Diffraction

RFRT – diffraction Wedge diffraction without diffraction scene object with diffraction scene object 11 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RFRT – diffraction Wedge diffraction Diffraction edge proxy geometry without diffraction scene object with diffraction scene object 12 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RFRT – interference Constructive v. destructive destructive [image source: Google images] 13 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RFRT – interference Constructive v. destructive Ray/sensor interaction [image source: Google images] 14 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RFRT – validation RFRT 15 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Under estimation Over estimation Transmitter

StingRay

StingRay – combined RF sim/viz Loosely-coupled components Simulation : Embree Visualization : OpenGL, OSPRay, … GUI : Qt5 18 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – combined RF sim/viz Loosely-coupled components Simulation : Embree Visualization : OpenGL, OSPRay, … GUI : Qt5 Open source project BSD 3-clause license Cross-platform support http://code.survice.com 19 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – simulation engine Ray-based RF wave propagation Dominant RF propagation phenomena Full simulation : Monte Carlo path tracing 20 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – simulation engine Ray-based RF wave propagation Dominant RF propagation phenomena Full simulation : Monte Carlo path tracing C++ library with straightforward API Asynchronous multithreaded ray logging Embree : intersection engine 21 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – visualization pipeline Loosely-coupled visualization plug-ins Producer/consumer API Frame-oriented I/O 22 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – visualization pipeline Loosely-coupled visualization plug-ins Producer/consumer API Frame-oriented I/O Interactive exploration of results Online queries of underlying data OpenGL, OSPRay, … 23 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – visualization capabilities Scene geometry Diffraction edges Ray glyphs 24 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – visualization capabilities Scene geometry Diffraction edges Ray glyphs Scalar volume data Traditional volume rendering Participating media 25 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – examples 26 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – examples 27 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Parallelization

Motivation Ray coherence Visibility rays : coherent Secondary rays : incoherent visibility secondary 29 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Motivation Ray coherence Vector processing Visibility rays : coherent Secondary rays : incoherent Vector processing Coherent rays Low-level operations visibility secondary 30 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RFRT is inherently incoherent Motivation Ray coherence Visibility rays : coherent Secondary rays : incoherent Vector processing Coherent rays Low-level operations visibility secondary RFRT is inherently incoherent 31 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – core loop while (running) { // Generate path originating from current transmitter Transceiver* tr = transceivers[txrx%ntxrx]; Path path(tr->samplePrimaryRay(), tr->getPower()); // Trace path to completion while (!path.traceNextRay(scene)) ; // Record state, advance to next transceiver … } // End while // Simulation has been stopped, finish in-flight paths // and terminate No path-level parallelism 32 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – core loop while (running) { // Generate path originating from current transmitter Transceiver* tr = transceivers[txrx%ntxrx]; Path path(tr->samplePrimaryRay(), tr->getPower()); // Trace path to completion while (!path.traceNextRay(scene)) ; // Record state, advance to next transceiver … } // End while // Simulation has been stopped, finish in-flight paths // and terminate No path-level parallelism Performance baseline 33 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – core loop while (running) { // Generate path originating from current transmitter Transceiver* tr = transceivers[txrx%ntxrx]; Path path(tr->samplePrimaryRay(), tr->getPower()); // Trace path to completion while (!path.traceNextRay(scene)) ; // Record state, advance to next transceiver … } // End while // Simulation has been stopped, finish in-flight paths // and terminate No path-level parallelism Performance baseline About 3.7 million rays/second 34 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v1 while (running) { #pragma omp parallel for num_threads(nthreads) for (auto i = 0; i < npaths; ++i) … } // End for } // End while // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Maintain pool of live paths Distribute paths across threads 35 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v1 while (running) { #pragma omp parallel for num_threads(nthreads) for (auto i = 0; i < npaths; ++i) … } // End for } // End while // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Maintain pool of live paths Distribute paths across threads OpenMP constructs omp parallel for 36 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v1 while (running) { #pragma omp parallel for num_threads(nthreads) for (auto i = 0; i < npaths; ++i) … } // End for } // End while // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Maintain pool of live paths Distribute paths across threads OpenMP constructs omp parallel for Simple… But is it effective? 37 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Results – OpenMP v1 38 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v1 while (running) { #pragma omp parallel for num_threads(nthreads) for (auto i = 0; i < npaths; ++i) … } // End for } // End while // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Maintain pool of live paths Distribute paths across threads OpenMP constructs omp parallel for Implicit barrier : unnecessary synchronization 39 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v2 #pragma omp parallel num_threads(nthreads) { while (running) #pragma omp for nowait for (auto i = 0; i < npaths; ++i) … } // End for } // End while } // End omp parallel // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Distribute core loop Cooperate to complete paths 40 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v2 #pragma omp parallel num_threads(nthreads) { while (running) #pragma omp for nowait for (auto i = 0; i < npaths; ++i) … } // End for } // End while } // End omp parallel // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Distribute core loop Cooperate to complete paths OpenMP constructs omp parallel omp for nowait 41 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v2 #pragma omp parallel num_threads(nthreads) { while (running) #pragma omp for nowait for (auto i = 0; i < npaths; ++i) … } // End for } // End while } // End omp parallel // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Distribute core loop Cooperate to complete paths OpenMP constructs omp parallel omp for nowait Relatively simple… But does it alleviate synchronization? 42 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Results – OpenMP v2 43 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v2 #pragma omp parallel num_threads(nthreads) { while (running) #pragma omp for nowait for (auto i = 0; i < npaths; ++i) … } // End for } // End while } // End omp parallel // Simulation has been stopped, finish in-flight paths // and terminate Path-level parallelism Distribute core loop Cooperate to complete paths OpenMP constructs omp parallel omp for nowait for nowait : threads proceed independently 44 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v3 #pragma omp parallel num_threads(nthreads) { #pragma omp single for (auto i = 0; i < npaths; ++i) #pragma omp task Path& path = livePaths[i]; while (running) { … } // End while // Simulation has been stopped, finish in-flight // paths and terminate … } // End omp task } // End for } // End omp single } // End omp parallel Path-level parallelism Maintain pool of live paths Distribute work across threads 45 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v3 #pragma omp parallel num_threads(nthreads) { #pragma omp single for (auto i = 0; i < npaths; ++i) #pragma omp task Path& path = livePaths[i]; while (running) { … } // End while // Simulation has been stopped, finish in-flight // paths and terminate … } // End omp task } // End for } // End omp single } // End omp parallel Path-level parallelism Maintain pool of live paths Distribute work across threads OpenMP constructs omp parallel omp single omp task 46 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v3 #pragma omp parallel num_threads(nthreads) { #pragma omp single for (auto i = 0; i < npaths; ++i) #pragma omp task Path& path = livePaths[i]; while (running) { … } // End while // Simulation has been stopped, finish in-flight // paths and terminate … } // End omp task } // End for } // End omp single } // End omp parallel Path-level parallelism Maintain pool of live paths Distribute work across threads OpenMP constructs omp parallel omp single omp task Complex… Is tasking actually necessary? 47 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Results – OpenMP v3 48 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

RF simulation – OpenMP v3 #pragma omp parallel num_threads(nthreads) { #pragma omp single for (auto i = 0; i < npaths; ++i) #pragma omp task Path& path = livePaths[i]; while (running) { … } // End while // Simulation has been stopped, finish in-flight // paths and terminate … } // End omp task } // End for } // End omp single } // End omp parallel Path-level parallelism Maintain pool of live paths Distribute work across threads OpenMP constructs omp parallel omp single omp task omp single : higher overhead 49 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Conclusions Parallelization strategies v1 : omp parallel for v2 : omp parallel + omp for nowait v3 : omp task 50 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Conclusions Parallelization strategies v1 : omp parallel for v2 : omp parallel + omp for nowait v3 : omp task Path-level parallelism : effective 51 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Conclusions Parallelization strategies v1 : omp parallel for v2 : omp parallel + omp for nowait v3 : omp task Path-level parallelism : effective Vectorization : open problem 52 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Wrap-up

Take-home messages RF simulation : critical to planning, analyzing, & optimizing communication networks 54 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Take-home messages RF simulation RF ray tracing : high performance alternative to traditional RF simulation methods 55 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Take-home messages RF simulation RF ray tracing StingRay : enables a combined simulation/visualization approach 56 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Take-home messages RF simulation RF ray tracing StingRay Parallelization with OpenMP : careful attention to path-level parallelism enables scalable, high performance RFRT 57 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Future work Vectorization + incoherent rays 58 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Future work Vectorization + incoherent rays Intel Xeon Phi & GPUs 59 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Future work Vectorization + incoherent rays Intel Xeon Phi & GPUs Bi-directional path tracing 60 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Acknowledgements Intel Corporation University of Utah US Army Research Laboratory NYU Polytechnic 61 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Contact information Address Applied Technology Operation SURVICE Engineering Company 4694 Millennium Drive, Suite 500 Belcamp, MD 21017 E-mail christiaan.gribble@survice.com Web http://www.survice.com/ 62 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Simulation engine

StingRay – specular reflection incident RF ray reflected RF ray scene object 66 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – wedge diffraction diffraction edge proxy geometry incident RF ray scene object diffracted RF ray 67 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – interference receiver sphere receiver sphere (destructive) receiver sphere receiver sphere receiver sphere (constructive) incoming RF ray at t0 incoming RF ray at t1 68 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – core components scene geometry edge geometry ray logger simulation controller 69 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – core components scene geometry edge geometry ray logger simulation controller Defines objects & materials in physical environment 70 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – core components scene geometry edge geometry ray logger simulation controller Defines objects & materials in physical environment Defines objects & materials in physical environment Captures & generates diffraction events 71 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – core components scene geometry edge geometry ray logger simulation controller Defines objects & materials in physical environment Captures & generates diffraction events Defines objects & materials in physical environment Captures & generates diffraction events Collects completed paths & RF propagation state 72 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

StingRay – core components scene geometry edge geometry ray logger simulation controller Defines objects & materials in physical environment Captures & generates diffraction events Collects completed paths & RF propagation state Defines objects & materials in physical environment Captures & generates diffraction events Collects completed paths & RF propagation state Governs execution & interfaces with client programs 73 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Visualization pipeline

StingRay – examples 75 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Parallelization

Results – Intel Xeon Phi 79 C. Gribble & J. Amstutz, Effective Parallelization … Radio Frequency Ray Tracing

Live demonstration