Restructuring the multi-resolution approximation for spatial data to reduce the memory footprint and to facilitate scalability
Vinay Ramakrishnaiah
Mentors: Dorit Hammerling, Raghuraj Prasanna Kumar, Rich Loft

Introduction
- High-resolution global measurements of large areas.
- Accurate representation and processing of spatial data.
- Predict trends in global climate.
- Traditional methods: computationally infeasible at this scale.
- Multi-resolution approximation (MRA).

Multi-resolution approximation (MRA)
- Spatial statistics: infer parameters and make spatial predictions.
- Computational inference with the traditional spatial statistical approach is difficult to parallelize. For n observations:
  - Computational complexity: O(n³)
  - Memory complexity: O(n²)
- MRA approximates the remainder independently at each level:
  - Exploits parallelism
  - Reduces the memory requirement
- Sequential MRA:
  - Computational complexity: O(n log² n)
  - Memory complexity: O(n log n)
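
To see what these bounds mean in practice, here is a quick back-of-the-envelope comparison in Python (illustrative only: big-O constants are ignored):

    import math

    def ops_traditional(n):
        # Classical spatial-statistics inference: O(n^3) operations.
        return n ** 3

    def ops_mra(n):
        # Sequential MRA: O(n log^2 n) operations.
        return n * math.log2(n) ** 2

    for n in (10_000, 100_000, 1_000_000):
        ratio = ops_traditional(n) / ops_mra(n)
        print(f"n = {n:>9,}: traditional needs ~{ratio:,.0f}x the operations of MRA")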

Multi-resolution approximation (MRA)
- The spatial domain is recursively partitioned.
- The spatial process is a linear combination of basis functions at multiple spatial resolutions.
- Similar to a multi-grid algorithm.
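
A minimal sketch of the recursive partitioning in Python (the parameters M and J match the slides; the bounding-box representation and the quadrant split are illustrative assumptions):

    # Recursively partition the spatial domain: each region at level m
    # splits into J children at level m+1, down to M resolution levels.
    def build_tree(region, level, M, J):
        node = {"region": region, "level": level, "children": []}
        if level < M:
            for sub in split(region):
                node["children"].append(build_tree(sub, level + 1, M, J))
        return node

    def split(region):
        # Quadrant split, corresponding to J = 4; a region is an
        # (xmin, ymin, xmax, ymax) bounding box.
        xmin, ymin, xmax, ymax = region
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        return [(xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
                (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax)]

    tree = build_tree((0.0, 0.0, 1.0, 1.0), level=0, M=4, J=4)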

Outline of the algorithm
1. Creation of the prior
2. Posterior inference

Implementations
- Existing implementations:
  - Original implementation: sequential MRA
  - Full-layer parallelism
- Alternatives developed in this work:
  - Hyper-segmentation
  - Shallow trees

Full-layer parallel approach
- Regions within a resolution layer are executed in parallel.
- Layers are executed sequentially.
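
The schedule can be sketched as follows (Python's multiprocessing stands in for Matlab's parallel pool; process_region is a hypothetical placeholder for the real per-region computation):

    from multiprocessing import Pool

    def process_region(region):
        # Hypothetical stand-in for the per-region MRA work.
        return sum(region)

    def full_layer_parallel(layers, workers=4):
        results = []
        with Pool(workers) as pool:
            for layer in layers:  # layers run one after another...
                # ...but all regions inside a layer run in parallel.
                results.append(pool.map(process_region, layer))
        return results

    if __name__ == "__main__":
        toy_layers = [[(1, 2)], [(1,), (2,), (3,), (4,)]]  # two resolution layers
        print(full_layer_parallel(toy_layers))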

Hyper-segmentation
- A first step towards reducing the memory footprint.
- Trades off parallelism for memory.
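
The slides do not spell out the traversal, so the following depth-first sketch is only an assumption about how parallelism can be traded for memory: one root-to-leaf branch is live at a time, so peak memory scales with tree depth rather than layer width (node layout as in the partitioning sketch above; prior_for and posterior_for are hypothetical placeholders):

    def prior_for(node):
        return node["level"]    # placeholder for the real prior step

    def posterior_for(path_state):
        return sum(path_state)  # placeholder for the real posterior step

    # Depth-first traversal: branches are processed one at a time, so only
    # the state along the current root-to-leaf path must be kept in memory.
    def depth_first(node, path_state=()):
        path_state = path_state + (prior_for(node),)
        if not node["children"]:
            return [posterior_for(path_state)]
        results = []
        for child in node["children"]:
            results.extend(depth_first(child, path_state))
        return results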

Shallow tree approach
- Partitioning into shallow trees starts at a certain resolution level.
- The resulting sub-trees (shallow trees) can be executed sequentially or in a distributed fashion.
- Regions within the shallow tree's resolution layers can be executed in parallel.
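
A sketch of this schedule, under the same assumed node layout as the partitioning sketch above (collect_at_level and process_subtree are illustrative, not the project's code):

    # Cut the tree at a chosen resolution level; every sub-tree rooted at
    # that level becomes an independent "shallow tree".
    def collect_at_level(node, cut_level):
        if node["level"] == cut_level:
            return [node]
        subtrees = []
        for child in node["children"]:
            subtrees.extend(collect_at_level(child, cut_level))
        return subtrees

    def process_subtree(subtree):
        # Placeholder: within a shallow tree, layers can again be run
        # with full-layer parallelism.
        return subtree["level"]

    def shallow_tree_schedule(root, cut_level):
        # Sub-trees may run sequentially (as here) or be distributed
        # across nodes, one shallow tree per worker.
        return [process_subtree(t) for t in collect_at_level(root, cut_level)]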

Experimental methodology
- Implementations written in Matlab.
- Geyser, hardware per node:
  - Four 10-core, 2.4 GHz Intel Xeon E7-4870 (Westmere EX) processors
  - 1 TB DDR3-1600 memory
- Single-node (40-core) implementations of full-layer parallel, hyper-segmentation, and shallow trees.
- Execution cost metric: PMET = peak memory × execution time.
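
Since PMET multiplies the two quantities being traded, halving memory only pays off if runtime grows by less than 2x. A toy calculation (the numbers are made up):

    def pmet(peak_mem_gb, exec_time_s):
        # Peak Memory x Execution Time: lower is better.
        return peak_mem_gb * exec_time_s

    print(pmet(peak_mem_gb=120, exec_time_s=100))  # e.g. full-layer parallel
    print(pmet(peak_mem_gb=40,  exec_time_s=250))  # e.g. a memory-frugal variant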

Performance
Performance plots (not reproduced in this transcript) compared the approaches for five configurations:
- M=9 layers, J=4 children/parent, r=32 knots/region
- M=9 layers, J=4 children/parent, r=64 knots/region
- M=10 layers, J=4 children/parent, r=32 knots/region
- M=11 layers, J=4 children/parent, r=25 knots/region
- M=11 layers, J=4 children/parent, r=40 knots/region

Execution cost: PMET (comparison plot not reproduced in this transcript)

Moving to distributed memory
- No Matlab distributed computing server is available on Yellowstone.
- Workaround: MatlabMPI from MIT Lincoln Laboratory.
  - Implements MPI semantics over a file I/O protocol.
  - Requires a directory visible to every machine.
- Python is used to run MPI and call the Matlab functions.
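
One way to realize the "Python runs MPI, Matlab does the math" arrangement is sketched below. The slide gives no details, so everything here is an assumption: mpi4py for the harness, a non-interactive matlab -batch call, and a hypothetical Matlab function mra_worker:

    from mpi4py import MPI
    import subprocess

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each MPI rank launches Matlab non-interactively on its own sub-tree.
    # mra_worker is a hypothetical Matlab function taking a sub-tree index.
    subprocess.run(["matlab", "-batch", f"mra_worker({rank})"], check=True)

    comm.Barrier()  # wait for every rank's Matlab job before continuing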

Conclusion
- Improvement over the existing implementations.
- Reduces the memory footprint by a factor of ~3.
- Increases the maximum size of data set that MRA can process.
- The implemented algorithms are, in theory, well suited to further scaling.

Future work
- Restructure the data types.
- Rewrite the code in a lower-level language to exploit more levels of parallelism.
- Potential for a GPU implementation.

Acknowledgements
- Thanks to my mentors: Dorit Hammerling, Raghuraj Prasanna Kumar, and Rich Loft.
- Thanks to Sophia Chen (high school intern) for the graphics used in this presentation.
- Thanks to Patrick Nichols, Shiquan Su, Brian Vanderwende, Davide Del Vento, and Richard Valent.
- Thanks to all the NCAR administrative staff.

Thank you!
Questions?