High Performance Computing On Laptops With Multicores & GPUs Sushil K. Prasad Computer Science sprasad@gsu.edu
About me
- Research area: Parallel and distributed algorithms and systems over multicores, GPUs, clusters, sensors, handhelds, web services, …
- Lab: Distributed and Mobile Systems (DiMoS) at Ga. Tech campus, 5 PhD students, 2 M.S. students
- IEEE TCPP Chair (elected)
- 2 NSF grants; currently looking for PhD/MS/undergraduate students
- Distributed Algorithms High Performance Cloud Computing
Multicore & GPU Chips Inside a Laptop: 100s of processors
- Big machines and clusters are not the only platforms for high-end computing
- For the first time in history, almost anyone can own a parallel computer: your laptop has a dual-core CPU + a many-core GPU (240 cores in an NVIDIA GTX 280)
- Cost is $500. In 2000, we spent $300K to purchase a 24-CPU SGI computer.
- For $40K, we just bought a cluster with 10 compute nodes with 88 cores + four GPUs with 240 cores each: total > 1000 cores!
GPUs vs. Multicores: combined power exceeds 180 GFLOPS
Intel Core 2 Duo Multicore
- Difficult to parallelize
- Memory hierarchy is a barrier: 1 cycle core (1/3 ns), 3 cycles L1 cache, 14 cycles L2, and 250 cycles RAM
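To make the memory-hierarchy barrier concrete, here is a minimal Python sketch that converts the slide's cycle counts to nanoseconds and estimates an average access cost. The cycle counts and the 1/3 ns cycle time come from the slide; the hit rates and the simple weighted-average model (each figure treated as the total latency of an access served at that level) are assumptions for illustration only.

```python
# Cycle counts are taken from the slide; 1 cycle = 1/3 ns implies a 3 GHz clock.
CYCLE_NS = 1.0 / 3.0
LATENCY_CYCLES = {"core": 1, "L1": 3, "L2": 14, "RAM": 250}


def latency_ns(level):
    """Latency of one access served at the given level, in nanoseconds."""
    return LATENCY_CYCLES[level] * CYCLE_NS


def average_access_cycles(l1_hit_rate, l2_hit_rate):
    """Expected cycles per memory access.

    Assumption: each access is served by exactly one level and pays that
    level's latency. l2_hit_rate is the hit rate among L1 misses.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return (l1_hit_rate * LATENCY_CYCLES["L1"]
            + l1_miss * l2_hit_rate * LATENCY_CYCLES["L2"]
            + l1_miss * l2_miss * LATENCY_CYCLES["RAM"])


if __name__ == "__main__":
    for level in LATENCY_CYCLES:
        print(f"{level}: {latency_ns(level):.2f} ns")
    # Hypothetical hit rates: 90% in L1, 80% of the remainder in L2.
    print(f"average: {average_access_cycles(0.9, 0.8):.2f} cycles")
```

Even with these optimistic hypothetical hit rates, the rare RAM accesses contribute more than half of the average cost, which is one way to read "memory hierarchy is a barrier".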
GPU: Graphics Processing Unit
- NVIDIA GTX 280: 240 cores
- Extreme memory hierarchy: registers, local memory, shared memory per 8 cores, off-chip global memory
- Bottleneck: bus to CPU
- Good research needed: a hot area
Smith-Waterman Sequence Alignment, FASTA, and BLAST (NVIDIA 8800 GTX)
Manavski and Valle 2008: Smith-Waterman in CUDA running on single and double GPU vs. BLAST and SSEARCH.
- Substitution matrix used: BLOSUM50
- Gap-open penalty: 10
- Gap-extension penalty: 2
- Database used: SwissProt (Dec. 2006; 250,296 proteins and 91,694,534 amino acids)
* Smith-Waterman in CUDA running on an NVIDIA GeForce 8800 GTX
** Smith-Waterman in CUDA running on two NVIDIA GeForce 8800 GTX
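For readers unfamiliar with the algorithm being accelerated, the following is a sequential Python sketch of Smith-Waterman local alignment scoring with affine gaps. It is a CPU reference for the recurrence only, not the CUDA implementation of Manavski and Valle. The gap penalties (10 and 2) come from the slide; the match/mismatch scores are a hypothetical stand-in for BLOSUM50, and the convention that a gap of length k costs gap_open + (k - 1) * gap_extend is an assumption.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4, gap_open=10, gap_extend=2):
    """Best local alignment score of strings a and b (affine gap recurrence).

    match/mismatch are a hypothetical stand-in for a BLOSUM50 lookup.
    Assumed gap convention: a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending at (i, j) with a gap in a (horizontal move).
    # F: best score ending at (i, j) with a gap in b (vertical move).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACDEFGHIKL", "ACDEFWGHIKL"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent, and each database sequence can be scored independently of the others. Both properties are what make the problem a natural fit for GPUs.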
Parallel Data Structures: Priority Queues
- Large Scale Event Simulation
- Immune System Simulation
- VLSI Logic Simulation
- Branch and Bound
- Task Scheduling
Challenge: Fine Grained Systems
Students: Dinesh Agarwal, Nick Mancuso
[Figure: example priority queue with numbered nodes]
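As a small illustration of why priority queues sit at the core of event simulation, here is a sequential Python sketch built on the standard `heapq` module. The event names and the rule that an "arrive" schedules a "depart" two time units later are hypothetical; this is not the parallel data structure studied in the lab.

```python
import heapq


def run_simulation(initial_events, horizon):
    """Process (time, name) events in timestamp order up to the horizon.

    Returns the list of processed events. The scheduling rule inside the
    loop is a hypothetical example of an event creating a future event.
    """
    queue = list(initial_events)
    heapq.heapify(queue)  # the pending-event set is a min-heap keyed on time
    processed = []
    while queue:
        time, name = heapq.heappop(queue)  # earliest pending event
        if time > horizon:
            break
        processed.append((time, name))
        if name == "arrive":
            # Hypothetical rule: each arrival departs 2 time units later.
            heapq.heappush(queue, (time + 2, "depart"))
    return processed


if __name__ == "__main__":
    events = [(5, "arrive"), (1, "arrive"), (4, "tick")]
    for time, name in run_simulation(events, horizon=10):
        print(time, name)
```

In a fine-grained system the work per event is tiny, as in this sketch, so the pop and push operations dominate the running time. That is why the queue itself becomes the thing to parallelize.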
Parallel Priority Queues on Multicore
Legacy-Code to GPUs (Student: Chad Christopher)
Distributed Algorithms for Lifetime of Wireless Sensor Networks (Student: Akshaye Dhawan)
NP-Hard Distributed Problems in Networks (NSF Grant)
- Minimum Vertex/Target Cover
- Minimum Triangle Packing
- Optimum mobile sensor network target tracking
- Minimum channel assignment in mobile ad-hoc networks
Students: John Daigle, Thamer Sulaiman
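To give a feel for the first problem in the list, here is a minimal Python sketch of the classic maximal-matching heuristic for vertex cover, which returns a cover at most twice the minimum size. It is a centralized textbook technique shown only as an illustration; it is not the distributed algorithm developed under the grant, and the function name is hypothetical.

```python
def approx_vertex_cover(edges):
    """Return a vertex cover of size at most twice the minimum.

    Greedily builds a maximal matching: whenever an edge has neither
    endpoint in the cover yet, both endpoints are added.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def is_vertex_cover(edges, cover):
    """Check that every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


if __name__ == "__main__":
    path = [(1, 2), (2, 3), (3, 4)]
    cover = approx_vertex_cover(path)
    print(sorted(cover), is_vertex_cover(path, cover))
```

The edges that trigger an addition share no endpoints, and any cover must contain at least one endpoint of each of them, which is where the factor of two comes from.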
Middleware for Mobile Ad-hoc Applications
[Figure: architecture diagram. Labels: Mobile Support Station, Directory, Applications, Deviceware, Groupware, Listener, Process Requests, Bottom-up. Numbered steps: 1. Register, 2. Lookup, 3. p2p communication.]
18 February 2019, UM-Morris
BondFlow: Distributed Workflow over Web Services (Student: Janaka Balasooriya)
- Web service interface module
- Proxy object generator module
- Workflow configuration module
- Execution module
[Figure: BondFlow architecture. Labels: Mobile Web Services, Web Services Registry (UDDI), Lookup for Web services, SOAP, WS Locator, WSDL, WSDL Parser, Parsed WSDL, Web Service Interface Module, Proxy Object Generator Module, Workflow Configuration Module, Workflow Execution Module, Web Bond Runtime, SOAP/SyD, JVM]
A Posteriori Uncertainty P2P Search Based on Bayesian Decision and Value of Information (VOI) (Student: Rasanjalee)
- The meaning of uncertainty-based information: a priori uncertainty U1, a posteriori uncertainty U2; Information = U1 - U2, the reduction in uncertainty at each decision step
- Peer selection: sending or forwarding a query at each node along the query path is a series of decision-making steps based on incomplete data
- A decision step: query the node that will most reduce the uncertainty of the current belief
- Experimental Results
[Figure: Current Belief, Decision step 1, ..., Decision step n]
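The relation Information = U1 - U2 can be sketched in Python as follows. The slide does not say how uncertainty is measured; Shannon entropy in bits is an assumption here, as are the function names and the representation of a peer's response as a likelihood vector over hypotheses.

```python
import math


def entropy_bits(belief):
    """Shannon entropy of a probability vector, in bits (assumed measure)."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


def bayes_update(prior, likelihood):
    """Posterior belief after observing evidence with the given likelihoods."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    if total == 0:
        raise ValueError("evidence has zero probability under the prior")
    return [j / total for j in joint]


def information(prior, likelihood):
    """U1 - U2: a priori entropy minus a posteriori entropy."""
    return entropy_bits(prior) - entropy_bits(bayes_update(prior, likelihood))


def best_peer(prior, peer_likelihoods):
    """Pick the peer whose (hypothetical) response reduces uncertainty most."""
    return max(peer_likelihoods,
               key=lambda peer: information(prior, peer_likelihoods[peer]))


if __name__ == "__main__":
    prior = [0.25, 0.25, 0.25, 0.25]  # current belief over 4 hypotheses
    peers = {"p1": [1, 1, 1, 0], "p2": [1, 0, 0, 0]}
    for name, likelihood in peers.items():
        print(name, round(information(prior, likelihood), 3))
    print("query:", best_peer(prior, peers))
```

This sketch scores one known response per peer. A full value-of-information calculation would average the reduction over the responses each peer might give before the query is sent.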
Middleware on Distributed Smart Cameras (DSC)
Middleware on DSC networks:
- provides a high-level programming interface for applications
- simplifies the development of distributed applications on DSC networks
- provides networking functionality as part of the middleware
Student: Jayampathi Sampat
[Figure: CMUcam3]