Introduction to Parallel Computing
Most of the slides are borrowed from https://www.cs.purdue.edu/homes/ayg/book/Slides/.

Topic Overview: Motivating Parallelism; Scope of Parallel Computing Applications; Organization and Contents of the Course.

Demand for Computational Speed There is a continual demand for greater computational speed from a computer system than is currently possible. Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems. Computations must be completed within a “reasonable” time period.

Grand Challenge Problems A grand challenge problem is one that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable. Examples: modeling large DNA structures, global weather forecasting, and modeling the motion of astronomical bodies.

Global Weather Forecast Example
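This slide is usually accompanied by a back-of-the-envelope estimate; the figures below (cell count, operations per cell, time step, and machine speed) are illustrative assumptions, not numbers taken from the source.

```python
# Rough estimate of the computation in a global weather forecast.
# All numbers are assumptions for illustration: ~5e8 atmospheric cells,
# ~200 floating-point operations per cell per time step, a 7-day forecast
# with 1-minute time steps, and a 1 Gflops serial machine.

cells = 5e8                      # assumed number of atmospheric cells
ops_per_cell = 200               # assumed flops per cell per time step
steps = 7 * 24 * 60              # 7 days of 1-minute time steps

total_ops = cells * ops_per_cell * steps      # ~1e15 operations

serial_seconds = total_ops / 1e9              # on an assumed 1 Gflops machine
flops_for_5_minutes = total_ops / (5 * 60)    # speed needed to finish in 5 minutes

print(f"total operations            ~ {total_ops:.1e}")
print(f"serial time at 1 Gflops     ~ {serial_seconds / 86400:.1f} days")
print(f"speed needed for 5-min run  ~ {flops_for_5_minutes / 1e12:.1f} Tflops")
```

Under these assumptions the forecast would take more than ten days on the serial machine, which is the sense in which such problems demand parallel computing.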

Parallel Computing Using more than one computer, or a computer with more than one processor, to solve a problem. Motive: usually faster computation. The very simple idea is that n computers operating simultaneously can achieve the result n times faster; in practice it will not be n times faster, for various reasons.
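One standard way to make the "it will not be n times faster" point precise is Amdahl's law, which is not named on the slide; the sketch below assumes a program in which a fraction f of the work is inherently serial.

```python
# Minimal sketch of Amdahl's law: with a serial fraction f, the ideal speedup
# on n processors is 1 / (f + (1 - f) / n), which is always below n and is
# bounded above by 1 / f no matter how many processors are used.

def amdahl_speedup(f: float, n: int) -> float:
    """Ideal speedup on n processors when a fraction f of the work is serial."""
    return 1.0 / (f + (1.0 - f) / n)

# Example: even with only 5% serial work, 100 processors give ~17x, not 100x.
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.05, n), 1))
```

Communication, synchronization, and load imbalance reduce the achieved speedup further; the formula captures only the serial-fraction effect.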

Motivating Parallelism The role of parallelism in accelerating computing speeds has been recognized for several decades. Its role in providing multiplicity of datapaths and increased access to storage elements has been significant in commercial applications. The scalable performance and lower cost of parallel platforms are reflected in the wide variety of applications.

Motivating Parallelism Developing parallel hardware and software has traditionally been time- and effort-intensive. If one views this in the context of rapidly improving uniprocessor speeds, one is tempted to question the need for parallel computing. There are, however, unmistakable trends in hardware design that indicate uniprocessor (or implicitly parallel) architectures may not be able to sustain the rate of realizable performance increments in the future. This is the result of a number of fundamental physical and computational limitations. Moreover, the emergence of standardized parallel programming environments, libraries, and hardware has significantly reduced the time to (parallel) solution.

The Computational Power Argument Moore's law [1965] states: ``The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000.''

The Computational Power Argument Moore attributed this doubling rate to exponential behavior of die sizes, finer minimum dimensions, and ``circuit and device cleverness''. In 1975, he revised this law as follows: ``There is no room left to squeeze anything out by being clever. Going forward from here we have to depend on the two size factors - bigger dies and finer dimensions.'' He revised his rate of circuit complexity doubling to 18 months and projected from 1975 onwards at this reduced rate.

The Computational Power Argument If one is to buy into Moore's law, the question still remains - how does one translate transistors into useful OPS (operations per second)? The logical recourse is to rely on parallelism, both implicit and explicit. Most serial (or seemingly serial) processors rely extensively on implicit parallelism. We focus in this class, for the most part, on explicit parallelism.

The Memory/Disk Speed Argument While clock rates of high-end processors have increased at roughly 40% per year over the past decade, DRAM access times have only improved at roughly 10% per year over this interval. This mismatch in speeds causes significant performance bottlenecks. Parallel platforms provide increased bandwidth to the memory system, as well as higher aggregate caches. The principles of locality of data reference and bulk access, which guide parallel algorithm design, also apply to memory optimization. Some of the fastest-growing applications of parallel computing exploit not their raw computational speed but rather their ability to pump data to memory and disk faster.
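A quick calculation with the growth rates quoted above shows how quickly the processor-memory gap widens (the 1-, 5-, and 10-year horizons are chosen just for illustration).

```python
# Processors improving ~40% per year vs. DRAM access times improving ~10% per
# year (the rates quoted on the slide): the relative gap grows by
# (1.40 / 1.10) ** k after k years.

for years in (1, 5, 10):
    gap = (1.40 / 1.10) ** years
    print(f"after {years:2d} years the processor/memory gap grows ~{gap:.1f}x")
```

At these rates the gap grows by roughly an order of magnitude per decade, which is why aggregate memory bandwidth and aggregate cache capacity matter so much.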

The Data Communication Argument As the network evolves, the vision of the Internet as one large computing platform has emerged. This view is exploited by distributed computing applications such as SETI@home and Folding@Home. In many other applications (typically databases and data mining), the volume of data is such that it cannot be moved; any analyses of this data must be performed over the network using parallel techniques.

Scope of Parallel Computing Applications Parallelism is applied in very diverse domains, for different motivating reasons. These range from improved application performance to cost considerations.

Applications in Engineering and Design Design of airfoils (optimizing lift, drag, stability), internal combustion engines (optimizing charge distribution, burn), high-speed circuits (layouts for delays and capacitive and inductive effects), and structures (optimizing structural integrity, design parameters, cost, etc.). Design and simulation of micro- and nano-scale systems. Process optimization, operations research.

Scientific Applications Functional and structural characterization of genes and proteins. Advances in computational physics and chemistry have enabled the exploration of new materials, the understanding of chemical pathways, and more efficient processes. Applications in astrophysics have explored the evolution of galaxies, thermonuclear processes, and the analysis of extremely large datasets from telescopes. Weather modeling, mineral prospecting, flood prediction, etc., are other important applications. Bioinformatics and astrophysics also present some of the most challenging problems with respect to analyzing extremely large datasets.

Commercial Applications Some of the largest parallel computers power Wall Street! Data mining and analysis for optimizing business and marketing decisions. Large-scale servers (mail and web servers) are often implemented using parallel platforms. Applications such as information retrieval and search are typically powered by large clusters.

Applications in Computer Systems Network intrusion detection, cryptography, and multiparty computations are some of the core users of parallel computing techniques. Embedded systems increasingly rely on distributed control algorithms. A modern automobile contains tens of processors communicating to perform complex tasks for optimizing handling and performance. Conventional structured peer-to-peer networks impose overlay networks and utilize algorithms directly from parallel computing.

Organization and Contents of this Course Fundamentals: This part of the class covers basic parallel platforms, principles of algorithm design, group communication primitives, and analytical modeling techniques. Parallel Programming: This part of the class deals with programming using message passing libraries and threads. Parallel Algorithms: This part of the class covers basic algorithms for matrix computations, graphs, sorting, discrete optimization, and dynamic programming.
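As a small preview of the parallel programming part of the course (a minimal sketch, not code from the course materials), the example below sums a list in parallel across worker processes; the chunking scheme and worker count are arbitrary choices for illustration.

```python
# Data-parallel sum across a pool of worker processes: each worker sums its
# own slice, and the partial results are combined at the end.
from multiprocessing import Pool

def partial_sum(chunk):
    """Each worker sums one slice of the data."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # One strided slice of the data per worker.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total == sum(data))  # True: matches the serial result
```

The same decomposition idea carries over to message-passing libraries and to threads, which the programming part of the course covers in detail.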

CSI’s HPC Systems “SALK”: a Cray XE6m with a total of 1280 processor cores. Salk is reserved for large parallel jobs, particularly those requiring more than 64 cores. Emphasis is on applications in the environmental sciences and astrophysics. Salk is named in honor of Dr. Jonas Salk, the developer of the first polio vaccine and a City College alumnus. “ANDY” is an SGI cluster with 744 processor cores and 96 NVIDIA Fermi accelerators. Andy is for jobs using 64 cores or fewer, for jobs using the NVIDIA Fermi accelerators, and for Gaussian jobs. Andy is named in honor of Dr. Andrew S. Grove, a City College alumnus and one of the founders of the Intel Corporation.

CSI’s HPC Systems “BOB” is a Dell cluster with 232 processor cores. Bob is for jobs using 64 cores or fewer, and for parallel Matlab jobs. Bob is named in honor of Dr. Robert E. Kahn, an alumnus of the City College who, along with Vinton G. Cerf, invented the TCP/IP protocol. “KARLE” is a Dell shared-memory system with 24 processor cores. Karle is used for serial jobs, Matlab, SAS, parallel Mathematica, and certain ArcView jobs. Karle is named in honor of Dr. Jerome Karle, an alumnus of the City College of New York who was awarded the Nobel Prize in Chemistry in 1985.

CSI’s HPC Systems “Zeus” is a Dell cluster with 88 processor cores. It is used for Gaussian jobs. “Neptune” is the system that is used as a gateway to the above HPCC systems. It is not used for computations.
