
2010 Workshop on High Performance and Distributed Geographic Information Systems (HPDGIS'10), 18th ACM SIGSPATIAL GIS, San Jose, CA, Nov 2–5, 2010

Towards Personal High-Performance Geospatial Computing (HPC-G): Perspectives and a Case Study
Jianting Zhang, Department of Computer Science, The City College of New York

Outline
 Introduction
 Geospatial Data, GIS, Spatial Databases and HPC
 Geospatial data: what's special?
 GIS: impacts of hardware architectures
 Spatial Databases: parallel DB or MapReduce?
 HPC: many options
 Personal HPC-G: A New Framework
 Why Personal HPC for geospatial data?
 GPGPU Computing: a brief introduction
 Pipelining CPU and GPU workloads for performance
 Parallel GIS prototype development strategies
 A Case Study: Geographically Weighted Regression
 Summary and Conclusions

[Diagram: research context spanning Geography (GIS applications, remote sensing, ecological informatics) and Computer Science (spatial databases, mobile computing, data mining)]

Introduction – Personal Stories
 Computationally intensive problems in geospatial data processing
 Distributed hydrological modeling/flood simulation
 Satellite image processing: clustering/classification (multi-/hyper-spectral)
 Identifying and tracking storms from time-series NEXRAD images
 Species distribution modeling (e.g. regression/GA-based)
 History of accesses to HPC resources
 1994: Simulating a 33-hour flood on a PC (33 MHz/4 MB) took 50+ hours
 2000: A Cray machine was available, but special arrangement was required to access it while taking a course (Parallel and Distributed Processing)
 HPC resources at SDSC were available to the SEEK project, but the project ended up using only SRB for data/metadata storage
 An Nvidia Quadro FX 3700 GPU card (that came with a Dell workstation) gave a 23X speedup after a serial CPU codebase (for SSDBM'10) was ported to the CUDA platform (ACM-GIS'10)

Two books that changed my research focus (as a database person)…
2nd edition  4th edition
http://courses.engr.illinois.edu/ece498/al/
As well as a few visionary database research papers:
 David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6), 1992
 Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood: DBMSs on a Modern Processor: Where Does Time Go? VLDB 1999
 J. Cieslewicz and K. A. Ross: Database Optimizations for Modern Hardware. Proceedings of the IEEE, 96(5), 2008

Introduction – PGIS in traditional HPC environments
"Despite all these initiatives the impact of parallel GIS research has remained slight:
 the anticipated performance plateau became a mountain still being scaled
 GIS companies found that, other than for concurrency in databases, their markets did not demand multi-processor performance
 while computing in general demands less of its users, HPC has demanded more – the barriers to use remain high and the range of options has increased"
"…the fundamental problem remains the fact that creating parallel GIS operations is non-trivial and there is a lack of parallel GIS algorithms, application libraries and toolkits."
A. Clematis, M. Mineter, and R. Marciano. High performance computing with geographical data. Parallel Computing, 29(10):1275–1279, 2003
If parallel GIS runs in a personal computing environment, to what degree would these conclusions change?

Introduction – PGIS in a Personal Computing Environment
 Every personal computer is now a parallel machine
 Chip Multiprocessors (CMP): dual-core, quad-core and six-core CPUs
Intel Xeon E5520 ($): cores/8 threads; 2.26 GHz, 80 W; 4*256 KB L2 cache, 8 MB L3 cache; max memory bandwidth 25.6 GB/s
 Massively parallel GPGPU computing: hundreds of GPU cores in a GPU card
Nvidia GTX480 ($): cores/(15*1024 threads); 700/1401 MHz, 250 W  1.35 TFlops; 15*32768 registers; 15*64 KB shared memory/L1 cache; 768 KB L2 cache; additional constant/texture memory; 1.5 GB GDDR5 at 1848 MHz clock rate, 384-bit memory interface width, GB/s memory bandwidth
If these parallel computing powers are fully utilized, to what degree can a personal workstation match a traditional cluster for geospatial data processing?

Geospatial data: what's special?
 The slowest processing unit determines the overall performance in parallel computing
 Real-world data are very often skewed
[Figures: wavelet-compressed raster data; clustered point data]

Geospatial data: what's special?
 Techniques to handle skewness:
 data decomposition/partitioning
 spatial indexing
 task scheduling
Simple equal-size partitioning may work well for local operations, but may not for focal, zonal and global operations, which require more sophisticated partitioning to achieve load balancing.
The complexity of task scheduling grows quickly with the number of tasks, and generic scheduling heuristics may not always produce good results.
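To make the load-balancing point concrete, here is a small hypothetical sketch (the point distribution, grid size and all names are illustrative assumptions, not from the slides): an equal-size grid decomposition of clustered point data leaves one cell holding almost all the points, and since the slowest processing unit determines overall performance, that largest cell dictates the parallel run time.

```python
import random

random.seed(42)

# Hypothetical clustered point set: 90% of the points fall in one
# hotspot corner, mimicking the skewed real-world data discussed above.
hotspot = [(random.uniform(0.0, 0.25), random.uniform(0.0, 0.25))
           for _ in range(900)]
background = [(random.random(), random.random()) for _ in range(100)]
points = hotspot + background

def grid_partition(pts, nx, ny):
    """Equal-size grid decomposition: assign each point to one cell."""
    buckets = {}
    for x, y in pts:
        cell = (min(int(x * nx), nx - 1), min(int(y * ny), ny - 1))
        buckets.setdefault(cell, []).append((x, y))
    return buckets

buckets = grid_partition(points, 4, 4)
sizes = sorted((len(b) for b in buckets.values()), reverse=True)
# The largest cell holds ~90% of the data; the task processing it
# becomes the straggler that determines the overall run time.
print("largest cell:", sizes[0], "of", len(points), "points")
```

A load-aware decomposition (e.g. recursively splitting overfull cells, as a quadtree-style spatial index does) or finer-grained task scheduling would be needed to even out the buckets.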

GIS: impacts of hardware architectures
 GIS have been evolving along with mainstream information technologies
 the major platform shift from Unix workstations to Windows PCs in the early 1990s
 the marriage with Web technologies to create Web-GIS in the late 1990s
 Will GIS naturally evolve from serial to parallel as computers evolve from uniprocessors to chip multiprocessors?
 What can the community do to speed up the evolution?

GIS: impacts of hardware architectures
 Three roles of GIS:
 data management
 information visualization
 modeling support
 GIS-based spatial modeling, such as agent-based modeling, is naturally suitable for HPC
 Computationally intensive
 Adopts a raster tessellation and mostly involves local operations and/or focal operations with small constant numbers of neighbors – parallelization-friendly or even "embarrassingly parallel"
 Runs in an offline mode and uses a traditional GIS for visualization
 How can we make full use of hardware to support data management and information visualization more efficiently and effectively?

HPC: many options
 The combination of architectural and organizational enhancements led to 16 years of sustained performance growth at an annual rate of 50% from 1986 to 2002; due to the combined power, memory and instruction-level parallelism problems, the growth rate dropped to about 20% per year from 2002 to 2006.
 In 2004, Intel cancelled its high-performance uniprocessor projects and joined IBM and Sun in declaring that the road to higher performance would be via multiple processors per chip (Chip Multiprocessors, CMP) rather than via faster uniprocessors.
 As a marketing strategy, Nvidia calls a personal computer equipped with one or more of its high-end GPGPU cards a personal supercomputer. Nvidia claimed that, compared to the latest quad-core CPU, Tesla 20-series GPU computing processors deliver equivalent performance at 1/20th of the power consumption and 1/10th of the cost.

HPC: many options
1. CPU multi-cores
2. GPU many-cores
3. CPU multi-nodes (traditional HPC)
4. CPU+GPU multi-nodes (2+3)
How about 1+2?  Personal HPC
 Affordable and dedicated personal computing environment
 No additional cost: use-it or waste-it
 Excellent visualization and user-interaction support
 Can be the "last mile" of a larger cyberinfrastructure
 Data structures/algorithms/software are critical to the success

Personal HPC-G: A New Framework
 Additional arguments for Personal HPC for geospatial data:
 While some geospatial data processing tasks are computationally intensive, many more are data intensive in nature. Distributing large data chunks incurs significant network and disk I/O overheads (50-100 MB/s); instead, make full use of the high interface bandwidths between CPU cores and memory (10-30 GB/s), between CPU memory and GPU memory (8 GB/s), and between GPU cores and GPU memory ( GB/s).
 The improved CPU+GPU performance will not only solve old problems faster but also allow many traditionally offline data processing tasks to run online in an interactive manner. The uninterrupted exploration process is likely to facilitate novel scientific discoveries more effectively.

Why Personal HPC for geospatial data?
High-level comparisons among Cluster Computing, Cloud Computing and Personal HPC:

                              Cluster Computing   Cloud Computing   Personal HPC
Initial cost                  High                Low               Low
Operational cost              High                Medium            Low
End user control              Low                 Low               High
Theoretical scalability       High                High              Medium
User code development         Medium              Low               High
Data management               Low                 Medium            Medium
Numeric modeling              High                Medium            High
Interaction & visualization   Low                 Low               High

Spatial Databases: parallel DB or MapReduce?
 Spatial databases: GIS without a GUI
 Learn lessons from relational databases on parallelization
 The debates between Parallel DB and MapReduce
 The emergence of hybrid approaches (e.g. HadoopDB)
 While parallel processing of geospatial data to achieve high performance has been a research topic for quite a while, neither approach has been extensively applied to practical large-scale geospatial data management
 Call for pilot studies experimenting with the two approaches to provide insights for future synthesis

GPGPU Computing
 Nvidia CUDA: Compute Unified Device Architecture
 AMD/ATI: Stream Computing

Parallel GIS prototype development strategies
We envision that Personal HPC-G provides an opportunity to evolve traditional GIS into parallel GIS gradually. Community research and development efforts are needed to speed up the evolution.
 First, we propose to learn from existing parallel geospatial data processing algorithms and adapt them to CMP CPU and GPU architectures.
 Second, we suggest studying existing GIS modules (e.g., ArcGIS geoprocessing tools) carefully, identifying the most frequently used ones and developing parallel code for them on multi-core CPUs and many-core GPUs.
 Third, while existing database research on CMP CPU and GPU architectures is still relatively limited, it can be the starting point for investigating how geospatial data management can be realized on the new architectures and their hybridization.
 Finally, reuse existing CMP- and GPU-based software codebases developed by the computer vision and computer graphics communities.

GWR Case Study
 A conceptual design for efficiently implementing GWR on the CUDA GPGPU computing architecture – preliminary in nature
 Being realized by a master's student at CCNY
 Good C/C++ programming skills
 New to GPGPU/CUDA programming
 Supported 5 hours per week through a tiny grant (an experiment on what $2000 can contribute to PGIS development)

GWR Case Study
 GWR extends the traditional regression framework by allowing local parameters to be estimated.
 Given a neighborhood definition (or bandwidth) for a data item, a traditional regression can be applied to the data items that fall into the neighborhood or region.
 The correlation coefficients for all the geo-referenced data items (raster cells or points) form a scalar field that can be visualized and interactively explored.
 By interactively changing some GWR parameters (e.g., bandwidth) and visually exploring the changes of the corresponding scalar fields, users can better understand the distributions of GWR statistics and the original dataset.

[Figure: dependent-variable and independent-variable rasters; an n*n moving window (n=3) is used to compute the correlation coefficient r at each cell.]
 GWR is computationally intensive
 Point data are usually clustered, which makes load balancing very difficult
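As an illustration of the moving-window computation described above, the following serial Python sketch (not the authors' CUDA implementation; the 3*3 window, the border handling and the toy rasters are assumptions) computes a correlation-coefficient raster from a dependent-variable grid and an independent-variable grid:

```python
import math

def window_corr(xs, ys):
    # Pearson correlation from the S1..S5 sufficient statistics
    # (the same quantities as on the partial-statistics slide).
    n = len(xs)
    s1 = n * sum(x * y for x, y in zip(xs, ys))
    s2 = sum(xs)
    s3 = sum(ys)
    s4 = n * sum(x * x for x in xs)
    s5 = n * sum(y * y for y in ys)
    denom = math.sqrt((s4 - s2 * s2) * (s5 - s3 * s3))
    return (s1 - s2 * s3) / denom if denom else 0.0

def moving_window_corr(dep, ind, n=3):
    # r at every cell whose n*n window fits entirely inside the grid;
    # border cells are left as None for simplicity.
    h, w, half = len(dep), len(dep[0]), n // 2
    out = [[None] * w for _ in range(h)]
    for i in range(half, h - half):
        for j in range(half, w - half):
            cells = [(a, b) for a in range(i - half, i + half + 1)
                            for b in range(j - half, j + half + 1)]
            out[i][j] = window_corr([ind[a][b] for a, b in cells],
                                    [dep[a][b] for a, b in cells])
    return out

# Toy rasters: the dependent variable is exactly 2x the independent one,
# so the correlation at the single interior cell is 1.
ind = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
dep = [[2 * v for v in row] for row in ind]
r = moving_window_corr(dep, ind)[1][1]
print(r)  # 1.0
```

On a GPU, one thread (or thread block) would be assigned per output cell; with clustered point data instead of a raster, windows contain wildly different item counts, which is exactly the load-balancing difficulty noted above.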

GWR Case Study: Overall Design

GWR Case Study: From partial to total statistics
Let S1 = nΣx_i y_i, S2 = Σx_i, S3 = Σy_i, S4 = nΣx_i^2 and S5 = nΣy_i^2; then f can be computed from n and S1 through S5.
Assuming the data items D_1, D_2, …, D_n are divided into m groups, and each group j has computed its partial statistics n_j, S_1j, S_2j, S_3j, S_4j and S_5j, then f can be computed as follows (j = 1, …, m):
n = Σn_j, S1 = nΣ(S_1j/n_j), S2 = ΣS_2j, S3 = ΣS_3j, S4 = nΣ(S_4j/n_j), S5 = nΣ(S_5j/n_j).
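The partial-to-total merging formulas above can be checked with a small serial sketch (illustrative only; the group size of 4 and the random test data are made up, and this stands in for what would be a per-thread-block reduction in CUDA). Each group computes (n_j, S_1j..S_5j); merging the group partials reproduces exactly the statistics computed directly over all data items:

```python
import math
import random

def partials(xs, ys):
    # Per-group partial statistics (n_j, S_1j..S_5j) as defined on the slide.
    n = len(xs)
    return (n,
            n * sum(x * y for x, y in zip(xs, ys)),
            sum(xs),
            sum(ys),
            n * sum(x * x for x in xs),
            n * sum(y * y for y in ys))

def merge(groups):
    # Combine group partials into the totals with the slide's formulas:
    # n = Σn_j, S1 = nΣ(S_1j/n_j), S2 = ΣS_2j, S3 = ΣS_3j,
    # S4 = nΣ(S_4j/n_j), S5 = nΣ(S_5j/n_j).
    n = sum(g[0] for g in groups)
    return (n,
            n * sum(g[1] / g[0] for g in groups),
            sum(g[2] for g in groups),
            sum(g[3] for g in groups),
            n * sum(g[4] / g[0] for g in groups),
            n * sum(g[5] / g[0] for g in groups))

def corr(n, s1, s2, s3, s4, s5):
    # Pearson correlation coefficient from n and S1..S5.
    return (s1 - s2 * s3) / math.sqrt((s4 - s2 * s2) * (s5 - s3 * s3))

random.seed(0)
xs = [random.random() for _ in range(12)]
ys = [x + 0.1 * random.random() for x in xs]

r_direct = corr(*partials(xs, ys))                 # one pass over all items
groups = [partials(xs[i:i + 4], ys[i:i + 4])       # three groups of 4 items
          for i in range(0, 12, 4)]
r_merged = corr(*merge(groups))                    # merged partials
# r_merged equals r_direct (up to floating-point rounding), so the
# per-group computation can run in parallel and be reduced afterwards.
```

This decomposition is what makes the computation GPU-friendly: each group's partials can be computed independently and then combined in a cheap reduction step.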

Summary and Conclusions
 We aimed at introducing a new HPC framework, Personal HPC-G, for processing geospatial data in a personal computing environment.
 We argued that the fast-increasing hardware capacities of modern personal computers, equipped with chip-multiprocessor CPUs and massively parallel GPU devices, have made Personal HPC-G an attractive alternative to traditional cluster computing and newly emerging cloud computing for geospatial data processing.
 We used a parallel design of GWR on an Nvidia CUDA-enabled GPU as an example to discuss how Personal HPC-G can be utilized to realize parallel GIS modules through synergistic software and hardware co-programming.

Q&A