Tuesday, September 04, 2006 I hear and I forget, I see and I remember, I do and I understand. -Chinese Proverb

Today §Course Overview. §Why Parallel Computing? §Evolution of Parallel Systems.

CS 524: High Performance Computing §Course URL §Folder on indus: \\indus\Common\cs524a06 §Website – check regularly for course announcements, office hours, slides, resources, policies … §Course Outline

§Several programming exercises will be given throughout the course. Assignments will include popular programming models for shared memory and message passing such as OpenMP and MPI. §The development environment will be C/C++ on UNIX.
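As a flavor of the two programming models mentioned above, here is a minimal sketch of a "hello world" in each, in C. This is illustrative only; the actual assignments, file names, and compile commands used in the course may differ.

/* OpenMP: shared-memory parallelism within one process.
   Typically compiled with something like:  cc -fopenmp hello_omp.c */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel                 /* fork a team of threads */
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

/* MPI: message passing between separate processes.
   Typically built and run with something like:
   mpicc hello_mpi.c -o hello_mpi && mpirun -np 4 ./hello_mpi */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id     */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes   */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}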

Pre-requisites  Computer Organization & Assembly Language (CS 223)  Data Structures & Algorithms (CS 213)  Senior level standing.  Operating Systems?

§Five minute rule.

Hunger For More Power!

§Endless quest for more and more computing power. §However much computing power there is, it is never enough.

Why this need for greater computational power? §Science, engineering, business, entertainment, etc. are all providing the impetus. §Scientists – observe, theorize, test through experimentation. §Engineers – design, test prototypes, build.

HPC offers a new way to do science: computation is used to approximate physical systems. Advantages include: §Playing with simulation parameters to study emergent trends §Possible replay of a particular simulation event §Studying systems where no exact theories exist

Why Turn to Simulation? When the problem is too... § Complex § Large § Expensive § Dangerous

Why this need for greater computational power? §Less expensive to carry out computer simulations. §Able to simulate phenomena that could not be studied by experimentation, e.g. the evolution of the universe.

Why this need for greater computational power?
§Problems such as:
 l Weather prediction
 l Aeronautics (airflow analysis, structural mechanics, engine efficiency, etc.)
 l Simulating the world economy
 l Pharmaceuticals (molecular modeling)
 l Understanding drug–receptor interactions in the brain
 l Automotive crash simulation
are all computationally intensive.
§The more knowledge we acquire, the more complex our questions become.

Why this need for greater computational power? §In 1995, the first full-length computer-animated motion picture, Toy Story, was produced on a parallel system composed of hundreds of Sun workstations. l Decreased cost l Decreased time (several months on several hundred processors)

Why this need for greater computational power? §Commercial computing has also come to rely on parallel architectures. §Computer system speed and capacity → scale of business. l OLTP (online transaction processing) benchmarks represent the relation between performance and scale of business. §They rate the performance of a system in terms of its throughput in transactions per minute.

Why this need for greater computational power? §Vendors supplying database hardware or software offer multiprocessor systems that provide performance substantially greater than uniprocessor products.

§One solution in the past: make the clock run faster. §The advance of VLSI technology allowed clock rates to increase and a larger number of components to fit on a chip. §However, there are limits… Electrical signals cannot propagate faster than the speed of light: about 30 cm/ns in vacuum and about 20 cm/ns in copper wire or optical fiber.

§Electrical signals cannot propagate faster than the speed of light: about 30 cm/ns in vacuum and about 20 cm/ns in copper wire or optical fiber. §10-GHz clock → signal path length of 2 cm in total. §100-GHz clock → 2 mm. §A 1-THz (1000 GHz) computer would have to be smaller than 100 microns if the signal has to travel from one end to the other and back within a single clock cycle.
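A quick back-of-the-envelope check of these figures, using the 20 cm/ns copper/fiber speed from the slide (a sketch, not from the original deck):

\[
d = vT,\qquad
d_{10\,\mathrm{GHz}} = 20\,\tfrac{\mathrm{cm}}{\mathrm{ns}} \times 0.1\,\mathrm{ns} = 2\,\mathrm{cm},\qquad
d_{100\,\mathrm{GHz}} = 20\,\tfrac{\mathrm{cm}}{\mathrm{ns}} \times 0.01\,\mathrm{ns} = 2\,\mathrm{mm},
\]
\[
\text{1 THz, round trip: } 2L \le vT \;\Rightarrow\; L \le \tfrac{1}{2}\,(20\,\tfrac{\mathrm{cm}}{\mathrm{ns}} \times 0.001\,\mathrm{ns}) = 100\,\mu\mathrm{m}.
\]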

Another fundamental problem: §Heat dissipation. §The faster a computer runs, the more heat it generates. §High-end Pentium systems: the CPU cooling system is bigger than the CPU itself.

Evolution of Parallel Architecture §New dimension added to design space: Number of processors. §Driven by demand for performance at acceptable cost.

Evolution of Parallel Architecture §Advances in hardware capability enable new application functionality, which places a greater demand on the architecture. §This cycle drives the ongoing design, engineering and manufacturing effort.

Evolution of Parallel Architecture §Microprocessor performance has been improving at a rate of about 50% per year. §A parallel machine of a hundred processors can be viewed as providing applications with the computing power that will be available in 10 years' time. §1000 processors → a 20-year horizon. §The advantages of using small, inexpensive, mass-produced processors as building blocks for computer systems are clear.
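The 10- and 20-year figures follow roughly from compounding the ~50%-per-year single-processor improvement (a rough estimate that ignores parallel overheads and assumes perfect speedup):

\[
1.5^{n} = 100 \;\Rightarrow\; n = \frac{\ln 100}{\ln 1.5} \approx 11\ \text{years},
\qquad
1.5^{n} = 1000 \;\Rightarrow\; n \approx 17\ \text{years}.
\]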

Technology trends §With technological advances, transistors, gates, etc. have been getting smaller and faster. l More can fit in the same area. §Processors are getting faster by making more effective use of an ever larger volume of computing resources. §Possibilities: l Place more of the computer system on the chip, including memory and I/O (a building block for parallel architectures: system-on-a-chip). l Or place multiple processors on the chip (parallel architecture in the single-chip regime).

Microprocessor Design Trends §Technology determines what is possible. §Architecture translates the potential of technology into performance. §Parallelism is fundamental to conventional computer architecture. l Current architectural trends are leading to multiprocessor designs.

Bit-level Parallelism §From 1970 to 1986, advances were in bit-level parallelism: 4-bit, 8-bit, 16-bit and so on. l Doubling the data-path width reduces the number of cycles required to perform an operation.
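To make the data-path point concrete, here is an illustrative C sketch (not tied to any particular machine): a 32-bit addition on a 16-bit data path takes two half-word additions plus carry propagation, while a 32-bit data path does it in a single operation.

#include <stdint.h>

/* Emulating a 32-bit add with 16-bit operations: two adds plus a carry.
   A real 16-bit CPU would use an add / add-with-carry instruction pair. */
uint32_t add32_on_16bit_path(uint32_t a, uint32_t b)
{
    uint32_t lo_sum = (uint32_t)(uint16_t)a + (uint16_t)b;        /* low halves  */
    uint16_t carry  = (uint16_t)(lo_sum >> 16);                   /* carry out   */
    uint16_t hi_sum = (uint16_t)((a >> 16) + (b >> 16) + carry);  /* high halves */
    return ((uint32_t)hi_sum << 16) | (uint16_t)lo_sum;
}

/* On a 32-bit data path the same operation is a single add. */
uint32_t add32_on_32bit_path(uint32_t a, uint32_t b)
{
    return a + b;
}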

Instruction-level Parallelism Mid 1980s to mid 1990s §Performing portions of several machine instructions concurrently. §Pipelining (also a kind of parallelism). §Fetching multiple instructions at a time and issuing them in parallel to distinct functional units (superscalar).
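For instance, in the hypothetical snippet below (variable names are arbitrary), the first four statements are mutually independent, so a superscalar processor can issue them to distinct functional units in the same cycle; the final three form a dependence chain and must execute serially.

/* Illustrative only. */
int ilp_example(int a, int b, int c, int d)
{
    /* Mutually independent: candidates for parallel issue. */
    int e = a + b;
    int f = c - d;
    int g = a * c;
    int h = b * d;

    /* Dependence chain: each result feeds the next, so these
       additions cannot be overlapped by the hardware.        */
    int x = e + f;
    int y = x + g;
    int z = y + h;
    return z;
}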

Instruction-level Parallelism However… §Instruction-level parallelism is worthwhile only if the processor can be supplied with instructions and data fast enough. §The gap between processor cycle time and memory cycle time has grown wider. §To satisfy increasing bandwidth requirements, larger and larger caches are placed on chip with the processor. §Limits: l cache misses l control transfers (branches)

§In the mid 1970s, the introduction of vector processors marked the beginning of modern supercomputing. l They perform operations on sequences of data elements rather than on individual scalar data. l They offered an advantage of at least one order of magnitude over conventional systems of that time.
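A typical operation on sequences rather than scalars is the classic SAXPY loop, sketched below in C. On a vector machine the whole loop maps onto a few vector loads, a vector multiply-add, and a vector store, instead of n separate scalar instructions.

/* y := a*x + y over whole arrays. */
void saxpy(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}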

§In the late 1980s a new generation of systems came on the market: microprocessor-based supercomputers that initially provided about 100 processors and later grew to roughly 1000. §These aggregations of processors are known as massively parallel processors (MPPs).

§Factors behind the emergence of MPPs:
 l Increase in performance of standard microprocessors
 l Cost advantage
 l Usage of “off-the-shelf” microprocessors instead of custom processors
 l Fostered by government programs for scalable parallel computing using distributed memory

§MPPs claimed to equal or surpass the performance of vector multiprocessors.
§Top500
 l Lists the sites that have the 500 most powerful installed computer systems.
 l LINPACK benchmark:
   - the most widely used metric of performance on numerical applications
   - a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems
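Concretely, the benchmark times the solution of a dense n-by-n linear system Ax = b and converts that time into a flop/s rate using the nominal operation count of Gaussian elimination. The formula below is the standard one, stated here as background rather than taken from the slide:

\[
\text{Rate} \approx \frac{\tfrac{2}{3}n^{3} + 2n^{2}}{t_{\text{solve}}}\ \text{flop/s}.
\]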

§Top500 (updated twice a year since June 1993) l In the first Top500 list there were already 156 MPP and SIMD systems present (around one-third).

Some memory related issues §Time to access memory has not kept pace with CPU clock speeds. §SRAM l Each bit is stored in a latch made up of transistors. l Faster than DRAM, but less dense and requires greater power. §DRAM l Each bit of memory is stored as a charge on a capacitor. l A 1-GHz CPU will execute 60 instructions before a typical 60-ns DRAM can return a single byte.
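The 60-instruction figure is simply the DRAM latency expressed in processor cycles, assuming roughly one instruction per cycle:

\[
t_{\text{cycle}} = \frac{1}{1\,\mathrm{GHz}} = 1\,\mathrm{ns},
\qquad
\frac{60\,\mathrm{ns}}{1\,\mathrm{ns/cycle}} = 60\ \text{cycles} \approx 60\ \text{instructions}.
\]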

Some memory related issues §Hierarchy l Cache memories §Temporal locality §Cache lines (64, 128, 256 bytes)
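A small C sketch of why cache lines and locality matter (array size and stride are arbitrary choices for illustration): on most machines the row-order loop touches each cache line once and reuses it for several consecutive elements, while the column-order loop touches a new line on almost every access.

#include <stddef.h>

#define N 1024
static double a[N][N];          /* rows are contiguous in C */

/* Good spatial locality: consecutive iterations reuse the cache line
   just fetched before moving on to the next one.                     */
double sum_row_major(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Poor locality: a stride of N*sizeof(double) bytes means nearly every
   access pulls in a fresh cache line and may evict useful data.       */
double sum_col_major(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}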

Parallel Architectures: Memory Parallelism §One way to increase performance is to replicate computers. §Major choice is between shared memory and distributed memory

Memory Parallelism §In the mid 1980s, when the 32-bit microprocessor was first introduced, computers containing multiple microprocessors sharing a common memory became prevalent. §In most of these designs all processors plug into a common bus. §However, only a small number of processors can be supported by a bus.

UMA bus based SMP architecture §If the bus is busy when a CPU wants to read or write memory, the CPU waits for the bus to become idle. §Contention for the bus is manageable only for a small number of processors. §Otherwise the system is limited by the bandwidth of the bus, and most of the CPUs are idle most of the time.

UMA bus based SMP architecture §One way to alleviate this problem is to add a cache to each CPU. §If most reads can be satisfied from the cache, there is less bus traffic and the system can support more CPUs. §Even so, a single bus limits a UMA multiprocessor to a fairly small number of CPUs.

SMP §SMP (symmetric multiprocessor) l A shared-memory multiprocessor where the cost of accessing a memory location is the same for all processors.