Novelties in Teaching High Performance Computing
Jawwad A. Shamsi, Nouman Durrani, Nadeem Kafi
Systems Research Laboratories, FAST National University of Computer and Emerging Sciences, Karachi

High Performance Computing
- CPU intensive
- Data intensive

HPC Curriculum at BS level
- Single course: can cover only either CPU-intensive or data-intensive computing
- Multiple courses: can cover both CPU-intensive and data-intensive computing

HPC Curriculum at BS level
- Shared memory vs. distributed memory
- Significant to teach both aspects of HPC

HPC Curriculum at BS level
Pertinent to incorporate breadth of knowledge:
- Shared memory: OpenMP, GPGPU
- Distributed memory: MPI, Hadoop

Our Thesis for the HPC Curriculum
A single course offering breadth of knowledge:
- CPU-intensive and data-intensive computing
- Shared-memory and distributed-memory systems

Novel Contributions of This Paper
- Introduced a specific course
- Introduced students to multiple HPC platforms
- Imparted practical knowledge of CPU-intensive and data-intensive systems
- Imparted practical knowledge of shared-memory and distributed-memory systems

Pedagogical Goals
- G1: Understand basic concepts of high performance computing
- G2: Provide practical experience on multiple HPC platforms
- G3: Impart knowledge of parallel programming algorithms
- G4: Motivate students toward advanced topics and further learning

Assignments
- A1 (MPI): Write an MPI program to scatter a computational task to worker nodes in the cluster and gather the results back at the root node. Goals: learn techniques for process creation and task distribution; understand point-to-point and collective communication between processes using MPI.
- A2 (OpenMP): For a large computational problem, identify opportunities for parallelism and use OpenMP to solve it. Goals: stimulate students' thinking about applying parallelism; learn parallel computing on shared-memory architectures using OpenMP.
- A3 (Hadoop): Solve the Netflix movie-ratings problem using Hadoop and MapReduce. Goals: learn data-intensive computing; understand programming in Hadoop using MapReduce.
- A4 (GPU): Multiply large matrices using GPUs. Goals: understand GPU fundamentals; learn GPGPU programming for CPU-intensive tasks.
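
To make the assignments concrete, here is a minimal sketch of A1, assuming a toy workload (squaring a scattered block of integers) rather than the course's actual handout; the file name and block size are illustrative:

```c
/* A1 sketch: scatter work, compute locally, gather results.
   Build: mpicc a1.c -o a1    Run: mpirun -np 4 ./a1 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define PER_RANK 4                       /* items handled by each rank */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *data = NULL;
    if (rank == 0) {                     /* root prepares the full array */
        data = malloc(PER_RANK * size * sizeof(int));
        for (int i = 0; i < PER_RANK * size; i++) data[i] = i;
    }

    int local[PER_RANK];
    /* collective: root scatters one equal block to every rank */
    MPI_Scatter(data, PER_RANK, MPI_INT, local, PER_RANK, MPI_INT,
                0, MPI_COMM_WORLD);

    for (int i = 0; i < PER_RANK; i++)   /* the "computational task" */
        local[i] *= local[i];

    /* collective: root gathers the partial results back in rank order */
    MPI_Gather(local, PER_RANK, MPI_INT, data, PER_RANK, MPI_INT,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < PER_RANK * size; i++) printf("%d ", data[i]);
        printf("\n");
        free(data);
    }
    MPI_Finalize();
    return 0;
}
```

And a matching sketch of A2, with an illustrative reduction loop standing in for the "large computational problem":

```c
/* A2 sketch: a parallel reduction with OpenMP.
   Build: gcc -fopenmp a2.c -o a2 */
#include <omp.h>
#include <stdio.h>

#define N 10000000

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 0.5;   /* sample data */

    double sum = 0.0;
    /* reduction(+:sum) gives each thread a private partial sum and
       combines the partials safely when the loop ends */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * a[i];

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```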

Weekly Schedule (topics and goal alignment)
- Week 1: Introduction to important concepts of HPC (G1)
- Week 2: Task division, interaction, and solving techniques; HPC clusters; introduction to MPI (G1, G2, G4)
- Week 3: MPI communication; message passing; assignment A1 (G2, G4)
- Week 4: Dynamic process creation in MPI; file access in MPI; Quiz 1 (G2, G4)
- Week 5: Shared-memory clusters; OpenMP programming; assignment A2 on OpenMP (G2, G4)
- Week 6: First hourly exam; hybrid clusters using MPI (G2, G4)
- Weeks 7-8: Data-intensive computing; introduction to MapReduce (G3, G4)
- Weeks 9-10: Hadoop, an open-source platform for MapReduce (G3, G4)
- Weeks 11-12: Architecture of Hadoop; assignment A3 on Hadoop (G3, G4)
- Week 13: Hadoop and MapReduce applications (G3, G4)
- Week 14: GPU/GPGPU programming; GPU architecture concepts (G2, G4)
- Week 15: Grid, thread, and block concepts; device-to-host communication; GPU programming assignment A4 (G2, G4)
- Week 16: Project presentations (G2, G4)
- Week 17: Final examination (G1, G2, G3, G4)
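
As a taste of the Hadoop weeks and assignment A3: MapReduce jobs for Hadoop are usually written in Java, but to keep all sketches in one language the one below uses Hadoop Streaming, which runs any stdin/stdout executable as the mapper and reducer. The record format (userID,movieID,rating per line) is an assumption, not the actual Netflix dataset layout.

```c
/* A3 sketch, mapper.c -- emits movieID<TAB>rating per input record.
   Assumed (hypothetical) input format: userID,movieID,rating */
#include <stdio.h>

int main(void) {
    long user, movie;
    double rating;
    while (scanf("%ld,%ld,%lf", &user, &movie, &rating) == 3)
        printf("%ld\t%.1f\n", movie, rating);
    return 0;
}
```

```c
/* A3 sketch, reducer.c -- Streaming sorts mapper output by key, so all
   ratings for one movie arrive on adjacent lines; average each run. */
#include <stdio.h>

int main(void) {
    long movie, cur = 0, n = 0;
    double rating, total = 0.0;
    while (scanf("%ld\t%lf", &movie, &rating) == 2) {
        if (n > 0 && movie != cur) {          /* key changed: flush */
            printf("%ld\t%.2f\n", cur, total / n);
            n = 0;
            total = 0.0;
        }
        cur = movie;
        total += rating;
        n++;
    }
    if (n > 0) printf("%ld\t%.2f\n", cur, total / n);
    return 0;
}
```

Both binaries are handed to the hadoop-streaming jar via its -mapper and -reducer options; the framework splits the input and shuffles and sorts mapper output by key, which is what lets the reducer average adjacent records.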

Topics Covered
Systems:
1. Configuration of clusters
2. Use of the cloud
3. Network and distributed file system architecture
4. Parallel computing architecture
5. GPU architecture
Algorithms and Applications:
6. CPU-intensive computing
7. Data-intensive computing
8. Parallel algorithms
Programming on Distributed Memory:
9. MPI
10. Hadoop
Programming on Shared Memory:
11. OpenMP
12. GPU
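
For the GPU entries above (and assignment A4), a minimal CUDA C sketch of naive matrix multiplication; the matrix size, fill values, and block shape are illustrative choices. It also exercises the week-15 concepts of grids, blocks, threads, and device-host transfers:

```c
/* A4 sketch (CUDA C): naive dense matrix multiplication on the GPU.
   Build: nvcc a4.cu -o a4 */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 512                              /* square, row-major matrices */

__global__ void matmul(const float *A, const float *B, float *C) {
    /* each thread computes one output element from its grid position */
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; k++)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main(void) {
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = (float *)malloc(bytes), *hB = (float *)malloc(bytes),
          *hC = (float *)malloc(bytes);
    for (int i = 0; i < N * N; i++) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;                   /* device-side buffers */
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);   /* host -> device */
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                    /* 256 threads per block */
    dim3 grid((N + 15) / 16, (N + 15) / 16);
    matmul<<<grid, block>>>(dA, dB, dC);   /* one thread per element */

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);   /* device -> host */
    printf("C[0][0] = %.1f (expected %.1f)\n", hC[0], 2.0f * N);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

A tuned version would tile the matrices through shared memory, but the naive kernel is enough to illustrate the programming model.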

PDC Topics Covered (with Bloom levels)
- Architecture: Taxonomy (C), Multi-core (C), SMP (A), NUMA (C), ILP (C)
- Algorithms: Divide and conquer (A), Reduction (A), Recursion (A), Scan (C), Speedup (A), Task graph (A), Scatter (A), Gather (A), Multicast (A)
- Programming: Gustafson's law (C), Amdahl's law (C), Shared memory (A), Static and dynamic mapping (A), Load balancing (C), Synchronization (A), Critical regions (A), Compiler directives (A), Producer-consumer (A), Task/thread spawning (A), SPMD/SIMD (A), Hybrid (A), Distributed memory (A), Client/server (A), Data parallelism (A), Data locality (A), Work stealing (K)
- Advanced Topics: Cluster/grid computing (A), Cloud computing (A), Web search (C/A), Social networking (C), Distributed file system (A), GPU architecture (C/A)
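
For reference, the two scaling laws listed under Programming, in their standard forms (not course-specific): with serial fraction s and N processors, Amdahl's law bounds speedup by 1 / (s + (1 - s)/N), so for s = 0.1 and N = 32 the best achievable speedup is about 1 / (0.1 + 0.9/32) ≈ 7.8. Gustafson's law, which lets the problem size grow with the machine, gives scaled speedup N - s(N - 1) ≈ 28.9 for the same s and N.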

Student Evaluations

Student Marks in Assignments
(Average, median, standard deviation, minimum, and maximum per assignment; the numeric values were in a table not preserved in this transcript.)
- A1 (MPI)
- A2 (OpenMP)
- A3 (Hadoop)
- A4 (GPU)

Overall Assessments of Students

Students' Feedback about Multiple Platforms
- "Multiple HPC platforms enhanced learning": Strongly Agree 63.63%, Agree 31.81%, Neutral 4.54%, Disagree 0%, Strongly Disagree 0%
- "Programming helped in learning": Strongly Agree 77.27%, Agree 13.63%, Neutral 9.09%, Disagree 0%, Strongly Disagree 0%
- "Group discussion and interactive style helped in learning": Strongly Agree 63.63%, Agree 18.18%, Neutral 13.63%, Disagree 4.54%, Strongly Disagree 0%

Overall Marks Given by Students
Average: 9, Min: 7, Median: 9, Max: 10

Conclusion
- Programming provides an effective method for learning.
- Using multiple HPC platforms provided an effective way of learning.
- Both data-intensive and CPU-intensive computing need to be covered in a parallel computing course.
- Cutting-edge topics such as GPU (CUDA), Hadoop, and cloud computing are very popular among students.
- Interactive learning, peer discussion, and group discussions are effective in teaching.
- Students' feedback should be incorporated into elective courses.