Midterm 3 Revision and Parallel Computers. Prof. Sin-Min Lee, Department of Computer Science.


Solution to Quiz 6, Problems 1 and 4. Drawings by Alice Cotti. Thanks!

Solution to Problem 1 (table columns: X, Q1, Q0, T1, T0, Q1+, Q0+; filled-in values are in the slide image)
Step 1: Create a table with a column for each of the input values and for the output values Q1+ and Q0+.
Step 2: Start by entering all the possible values for the inputs X, Q1, and Q0.

Solution to Problem 1 (table: X, Q1, Q0, T1, T0, Q1+, Q0+)
Step 3: From the circuit diagram we can tell that T1 = Q1·X + Q0. Fill in the T1 column according to this Boolean expression.

Solution to Problem 1 (table: X, Q1, Q0, T1, T0, Q1+, Q0+)
Step 4: From the diagram, T0 is always 1. Set it to 1 in every row.

Solution to Problem 1 (table: X, Q1, Q0, T1, T0, Q1+, Q0+)
Step 5: Q1+ is the next state of the T1 (toggle) flip-flop. The Q1 column holds the current state of the flip-flop and the T1 column is its input, so Q1+ = Q1 XOR T1. Using this, fill in the values for the Q1+ column; draw a T flip-flop truth table if needed.

Solution to Problem 1 (table: X, Q1, Q0, T1, T0, Q1+, Q0+)
Step 6: Q0+ is the next state of the T0 (toggle) flip-flop. The Q0 column holds the current state of the flip-flop and the T0 column is its input, so Q0+ = Q0 XOR T0. Using this, fill in the values for the Q0+ column; draw a T flip-flop truth table if needed.
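The six steps above can be sketched in code. This is a minimal, hypothetical tabulation assuming, as read from the circuit diagram, that T1 = Q1·X + Q0 and T0 = 1; a T flip-flop toggles when its input is 1, so Q+ = Q XOR T.

```python
# Hypothetical tabulation of the Problem 1 next-state table, assuming
# (as read from the circuit diagram) T1 = Q1*X + Q0 and T0 = 1.
# A T flip-flop toggles when T = 1, so Q+ = Q XOR T.

def next_state(x, q1, q0):
    t1 = (q1 & x) | q0         # T1 = Q1*X + Q0 (assumed from the diagram)
    t0 = 1                     # T0 is tied to 1 (Step 4)
    return (q1 ^ t1, q0 ^ t0)  # (Q1+, Q0+)

# Print the full table: X Q1 Q0 -> Q1+ Q0+
for x in (0, 1):
    for q1 in (0, 1):
        for q0 in (0, 1):
            q1n, q0n = next_state(x, q1, q0)
            print(x, q1, q0, "->", q1n, q0n)
```

Each printed row corresponds to one row of the table built in Steps 1 through 6.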

Solution to Problem 4 (timing diagram with signals clk, J, K, Clear, and output Q0)

Solution to Problem 4 (table columns: Clear, J, K, Q0)
Reading the waveform left to right:
- Because Clear is 0, Q0 is also 0 (J and K don't matter).
- Then Clear is 1, so look at J and K. In this case they are both 0, so the flip-flop holds and Q0 stays 0.
- Finally Clear is 0 again, so Q0 is 0 (J and K don't matter).
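The behavior just described can be replayed as a small simulation. `jk_step` is a hypothetical helper, assuming a JK flip-flop with an active-low asynchronous Clear, which matches the reading above.

```python
# Sketch of a JK flip-flop with an active-low asynchronous Clear,
# matching the Problem 4 reading: Clear = 0 forces Q0 to 0 regardless
# of J and K; with Clear = 1 and J = K = 0 the flip-flop holds.

def jk_step(q, j, k, clear):
    if clear == 0:            # asynchronous clear dominates J and K
        return 0
    if (j, k) == (0, 0):      # hold current state
        return q
    if (j, k) == (0, 1):      # reset
        return 0
    if (j, k) == (1, 0):      # set
        return 1
    return 1 - q              # J = K = 1: toggle

# Replay the three intervals described above:
q = jk_step(1, 0, 0, clear=0)   # Clear low  -> Q0 = 0
q = jk_step(q, 0, 0, clear=1)   # hold       -> Q0 stays 0
q = jk_step(q, 0, 0, clear=0)   # Clear low  -> Q0 = 0
```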

Uniprocessor Systems
Improve performance by allowing multiple, simultaneous memory accesses:
- Requires multiple address, data, and control buses (one set for each simultaneous memory access)
- The memory chip has to be able to handle multiple transfers simultaneously

Uniprocessor Systems
Multiport Memory:
- Has two sets of address, data, and control pins to allow simultaneous data transfers to occur
- CPU and DMA controller can transfer data concurrently
- A system with more than one CPU could handle simultaneous requests from two different processors

Uniprocessor Systems
Multiport Memory (cont.):
Can:
- Handle two requests to read data from the same location at the same time
Cannot:
- Process two simultaneous requests to write data to the same memory location
- Process requests to read from and write to the same memory location simultaneously
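These rules can be stated compactly in code. `can_serve_together` is a hypothetical predicate, not part of any real memory controller, sketching when a dual-port memory can serve two requests in the same cycle.

```python
# Sketch of the dual-port access rules above: two reads of the same
# location may proceed together; any pair touching the same location
# that includes a write must be serialized.

def can_serve_together(access_a, access_b):
    op_a, addr_a = access_a      # each access is (op, address)
    op_b, addr_b = access_b
    if addr_a != addr_b:
        return True              # different locations never conflict
    return op_a == "read" and op_b == "read"
```

For example, `can_serve_together(("read", 0x10), ("read", 0x10))` holds, while any pair with a write to 0x10 does not.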

Multiprocessors (block diagram: two CPUs, memory, an I/O port, and a device controller connected by a shared bus)

Multiprocessors
- Systems designed to have 2 to 8 CPUs
- The CPUs all share the other parts of the computer: memory, disk, system bus, etc.
- CPUs communicate via memory and the system bus

Multiprocessors
- Each CPU shares memory, disks, etc.
- Cheaper than clusters
- Not as good performance as clusters
- Often used for small servers and high-end workstations

Multiprocessors
- The OS automatically shares work among available CPUs
- On a workstation, one CPU can be running an engineering design program while another CPU does complex graphics formatting

Applications of Parallel Computers
- Traditionally: government labs, numerically intensive applications
- Research institutions
- Recent growth in industrial applications: 236 of the top 500
- Financial analysis, drug design and analysis, oil exploration, aerospace and automotive

Flynn's Classification (1966). Michael Flynn, Professor at Stanford University.

Multiprocessor Systems: Flynn's Classification
Single instruction, multiple data (SIMD):
(block diagram: control unit, main memory, processors with local memory, communications network)
- Executes a single instruction on multiple data values simultaneously using many processors
- Since only one instruction is processed at any given time, it is not necessary for each processor to fetch and decode the instruction
- This task is handled by a single control unit that sends the control signals to each processor
- Example: array processor
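The control-unit idea can be illustrated with a toy sketch (the names here are illustrative, not from the slides): one broadcast operation stands in for the single instruction, and each list element stands in for one processing element's data.

```python
# Toy SIMD sketch: a single control unit broadcasts one operation,
# and each "processing element" applies it to its own data element
# in lockstep; no PE fetches or decodes an instruction itself.

def simd_broadcast(op, data, operand):
    return [op(x, operand) for x in data]

# One "add 10" instruction applied across four data values at once:
result = simd_broadcast(lambda x, c: x + c, [1, 2, 3, 4], 10)
```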

Why Multiprocessors?
1. Microprocessors are the fastest CPUs: collecting several is much easier than redesigning one
2. Complexity of current microprocessors: do we have enough ideas to sustain 1.5X/yr? Can we deliver such complexity on schedule?
3. Slow (but steady) improvement in parallel software (scientific apps, databases, OS)
4. Emergence of embedded and server markets driving microprocessors in addition to desktops: embedded functional parallelism, producer/consumer model; server figure of merit is tasks per hour vs. latency

Parallel Processing Intro
- Long-term goal of the field: scale the number of processors to the size of the budget and the desired performance
- Machines today: Sun Enterprise (8/00): MHz UltraSPARC® II CPUs, 64 GB SDRAM memory, GB disk, tape; $4,720,800 total; 64 CPUs 15%, 64 GB DRAM 11%, disks 55%, cabinet 16% ($10,800 per processor, or ~0.2% per processor); minimal E10K: 1 CPU, 1 GB DRAM, 0 disks, tape, ~$286,700; $10,800 (4%) per CPU, plus $39,600 board/4 CPUs (~8%/CPU)
- Machines today: Dell Workstation 220 (2/01): 866 MHz Intel Pentium® III (in minitower), GB RDRAM memory, 1 10GB disk, 12X CD, 17" monitor, nVIDIA GeForce 2 GTS 32MB DDR graphics card, 1 yr service; $1,600; for an extra processor, add $350 (~20%)

Major MIMD Styles
1. Centralized shared memory ("Uniform Memory Access" time or "Shared Memory Processor")
2. Decentralized memory (memory module with CPU): more memory bandwidth and lower memory latency; drawbacks: longer communication latency and a more complex software model

Multiprocessor Systems Flynn’s Classification

Four Categories of Flynn's Classification:
- SISD: single instruction, single data
- SIMD: single instruction, multiple data
- MISD: multiple instruction, single data **
- MIMD: multiple instruction, multiple data
** The MISD classification is not practical to implement. In fact, no significant MISD computers have ever been built. It is included only for completeness.

MIMD computers usually have a different program running on every processor. This makes for a very complex programming environment: which processor is doing which task at what time?

Memory latency: the time between issuing a memory fetch and receiving the response. Simply put, if execution proceeds before the memory request completes, unexpected results will occur: the values being used are not the ones requested.

A similar problem can occur with instruction executions themselves. Synchronization: the need to enforce the ordering of instruction executions according to their data dependencies. If instruction a depends on instruction b's result, instruction b must execute before instruction a.
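One way to picture enforcing such an ordering is with an explicit synchronization primitive. This is a minimal sketch using Python's `threading.Event`, with hypothetical "instructions" a and b standing in for dependent operations:

```python
# Sketch: enforcing an ordering constraint between two parallel tasks.
# "Instruction b" produces a value that "instruction a" depends on, so
# a must wait until b has run, even though the two threads start in
# either order.

import threading

result = {}
b_done = threading.Event()

def instruction_b():
    result["value"] = 42      # produce the value a depends on
    b_done.set()              # signal that b has completed

def instruction_a():
    b_done.wait()             # block until b has run
    result["doubled"] = result["value"] * 2

ta = threading.Thread(target=instruction_a)
tb = threading.Thread(target=instruction_b)
ta.start(); tb.start()
ta.join(); tb.join()
```

Without the event, instruction a could read `result["value"]` before b has written it, which is exactly the unexpected-results hazard described above.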

Despite potential problems, MIMD can prove larger than life. MIMD successes: IBM Deep Blue, the computer that beat a professional chess player. Some may not consider this a fair example, because Deep Blue was built to beat Kasparov alone: it "knew" his play style, so it could counter his projected moves. Still, Deep Blue's win marked a major victory for computing.

IBM's latest: a supercomputer that models nuclear explosions. IBM Poughkeepsie built the world's fastest supercomputer for the U.S. Department of Energy. Its job was to model nuclear explosions.

MIMD is the most complex, fastest, and most flexible parallel paradigm. It has beaten a world-class chess player at his own game. It models things that few people understand. It is parallel processing at its finest.

Multiprocessor Systems
System Topologies:
- The topology of a multiprocessor system refers to the pattern of connections between its processors
- Quantified by standard metrics:
  - Diameter: the maximum distance between two processors in the computer system
  - Bandwidth: the capacity of a communications link multiplied by the number of such links in the system (best case)
  - Bisection bandwidth: the total bandwidth of the links connecting the two halves of the processors, split so that the number of links between the two halves is minimized (worst case)
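The diameter metric can be computed directly from a topology's connection pattern. This is a small sketch, with ring and hypercube adjacency lists built here purely for illustration:

```python
# Sketch: compute the diameter (longest shortest-path distance between
# any two processors) by breadth-first search over an adjacency list.

from collections import deque

def diameter(adj):
    def farthest(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(farthest(n) for n in adj)

# Ring of 6 processors: diameter is floor(6/2) = 3
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# 3-dimensional hypercube (8 processors): diameter is log2(8) = 3
cube = {i: [i ^ (1 << b) for b in range(3)] for i in range(8)}
```

The same function works for any of the six topologies on the next slide once its adjacency list is written down.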

Multiprocessor Systems
System Topologies
Six categories of system topologies: shared bus, ring, tree, mesh, hypercube, completely connected.

Multiprocessor Systems
System Topologies
Shared bus:
- The simplest topology
- Processors communicate with each other exclusively via this bus
- Can handle only one data transmission at a time
- Can be easily expanded by connecting additional processors to the shared bus, along with the necessary bus arbitration circuitry
(diagram: processors P with local memory M on a shared bus with global memory)

Multiprocessor Systems
System Topologies
Ring:
- Uses direct dedicated connections between processors
- Allows all communication links to be active simultaneously
- A piece of data may have to travel through several processors to reach its final destination
- All processors must have two communication links
(diagram: six processors connected in a ring)

Multiprocessor Systems
System Topologies
Tree topology:
- Uses direct connections between processors
- Each processor has three connections
- Its primary advantage is its relatively low diameter
- Example: DADO Computer
(diagram: seven processors in a binary tree)