Computing Environment

The computing environment is rapidly evolving, so you need to know not only the numerical methods, but also:
- How and when to apply them
- Which computers to use
- What type of code to write
- What kind of CPU time and memory requirements your jobs will have
- What tools (e.g., visualization software) to use to analyze the data

Definitions – Clock Cycles

A computer chip operates at discrete intervals called clock cycles, often measured in nanoseconds (ns) or megahertz (MHz):
- 500 MHz (Pentium III) -> 2 ns
- 100 MHz (Cray J90) -> 10 ns
- May take 4 clocks to do one multiplication
- May take 30 clocks to start a procedure
- May take 2 clocks to access memory
- MHz is not the only measure of performance (see the sketch below)
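
The clock period is simply the reciprocal of the clock rate. A minimal sketch of that arithmetic in free-form Fortran (not from the original slides; the 500 MHz figure is the Pentium III example above):

program clock_period
  implicit none
  real :: freq_mhz, period_ns
  freq_mhz = 500.0               ! e.g., a 500 MHz Pentium III
  period_ns = 1000.0 / freq_mhz  ! period in ns = 1000 / (rate in MHz)
  print *, 'Clock period (ns):', period_ns  ! prints 2.0
end program clock_period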

Definitions – FLOPS

Floating-point Operations Per Second:
- Mflops = million FLOPS
- A good measure of code performance - typically one add is one flop, and one multiplication is also one flop
- Cray J90 peak = 200 Mflops; most codes achieve only about 1/3 of peak
- Cray T90 peak = 3.2 Gflops
- Earth Simulator (NEC SX-5) = 8 Gflops
- Fastest workstation processor (DEC Alpha) ~ 1 Gflops
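
A common way to estimate the Mflops of a code is to count the floating-point operations in a loop and divide by the measured run time. A minimal sketch, not from the slides (the loop body and flop count are illustrative assumptions):

program mflops_estimate
  implicit none
  integer, parameter :: n = 1000000
  real(8), allocatable :: x(:), y(:)
  real(8) :: t0, t1
  integer :: i
  allocate(x(n), y(n))
  x = 1.0d0
  y = 2.0d0
  call cpu_time(t0)
  do i = 1, n
    y(i) = y(i) + 3.0d0*x(i)   ! one multiply + one add = 2 flops
  end do
  call cpu_time(t1)
  print *, 'Approx. Mflops:', 2.0d0*n / (t1 - t0) / 1.0d6
  print *, y(1)                ! reference y so the loop is not optimized away
end program mflops_estimate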

MIPS

Million Instructions Per Second – also a measure of computer speed, used mostly in the old days.

Bandwidth

The speed at which data flow across a network or wire:
- 56K modem = 56 kbits/sec
- T1 link = 1.544 Mbits/sec
- T3 link = 45 Mbits/sec
- FDDI = 100 Mbits/sec
- Fibre Channel = 800 Mbits/sec
- 100BaseT (Fast) Ethernet = 100 Mbits/sec
- Brain system = 3 Gbits/sec
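
Bandwidth determines how long a given transfer takes: time = data size / bandwidth. A minimal sketch of that arithmetic (the 100-megabyte file size is an illustrative assumption, not from the slides):

program transfer_time
  implicit none
  real :: size_mbits, bw_mbits
  size_mbits = 100.0 * 8.0   ! a 100-megabyte file is 800 megabits
  bw_mbits   = 100.0         ! Fast Ethernet, 100 Mbits/sec
  print *, 'Transfer time (s):', size_mbits / bw_mbits  ! prints 8.0
end program transfer_time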

Hardware Evolution

- Mainframe computers
- Supercomputers
- Workstations
- Microcomputers / personal computers
- Desktop supercomputers
- Workstation super clusters
- Handheld, palmtop, calculators, etc.

Types of Processors

- Scalar (serial): one operation per clock cycle
- Vector: multiple (tens to hundreds of) operations per clock cycle; typically achieved at the loop level, where the instructions are the same or similar for each loop index (see the sketch below)
- Superscalar: several instructions per clock cycle
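
Vector hardware relies on loops whose iterations are independent and identical. A minimal sketch (illustrative, not from the slides) contrasting a vectorizable loop with one that carries a dependence between iterations:

program vector_loops
  implicit none
  integer, parameter :: n = 1000
  real(8) :: a(n), b(n), c(n)
  integer :: i
  a = 1.0d0
  b = 2.0d0
  ! Vectorizable: the same independent operation for every index
  do i = 1, n
    c(i) = a(i) + b(i)
  end do
  ! Not vectorizable as written: each iteration needs the previous result
  do i = 2, n
    c(i) = c(i-1) + a(i)
  end do
  print *, c(n)
end program vector_loops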

Types of Computer Systems

- Single-processor scalar (e.g., ENIAC, IBM 704, IBM PC)
- Single-processor vector (CDC 7600, Cray-1)
- Multi-processor vector (e.g., Cray XMP, Cray C90, Cray J90, NEC SX-5)
- Single-processor super-scalar (IBM RS/6000 such as Bluesky)
- Multi-processor scalar (e.g., multi-processor Pentium PC)
- Multi-processor super-scalar (e.g., DEC Alpha based Cray T3E, RS/6000 based IBM SP-2, SGI Origin 2000)
- Clusters of the above (e.g., Linux clusters; Earth Simulator - a cluster of multiple vector-processor nodes)

Memory Architectures

Shared Memory Systems
- Memory can be accessed and addressed uniformly by all processors
- Fast/expensive CPUs, memory, and networks
- Easy to use
- Difficult to scale to many (> 32) processors

Distributed Memory Systems
- Each processor has its own memory; others can access that memory only via network communications (see the MPI sketch below)
- Often built from off-the-shelf components, therefore low cost
- Hard to use; explicit user specification of communications is often needed
- Single CPU slow; not suitable for inherently serial codes
- High scalability - the largest current system has nearly 10K processors
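
On distributed memory systems, the explicit communication mentioned above is usually written with a message-passing library such as MPI. A minimal sketch (assuming an MPI installation; compile with mpif90 and run with mpirun -np 2) in which process 0 sends one number to process 1:

program mpi_sketch
  use mpi
  implicit none
  integer :: rank, ierr
  integer :: status(MPI_STATUS_SIZE)
  real(8) :: val

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  if (rank == 0) then
    val = 3.14d0
    ! Explicit send: the data must be moved over the network
    call MPI_Send(val, 1, MPI_DOUBLE_PRECISION, 1, 0, MPI_COMM_WORLD, ierr)
  else if (rank == 1) then
    call MPI_Recv(val, 1, MPI_DOUBLE_PRECISION, 0, 0, MPI_COMM_WORLD, status, ierr)
    print *, 'Rank 1 received:', val
  end if

  call MPI_Finalize(ierr)
end program mpi_sketch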

Memory Architectures

Multi-level memory (cache and main memory) architectures:
- Cache - fast and expensive memory
- Typical L1 cache size in current-day microprocessors ~ 32 KB
- L2 size ~ 256 KB to 8 MB
- Main memory: a few MB to many GB
- Try to reuse the content of the cache as much as possible before it is replaced by new data or instructions (see the blocking sketch below)
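
A standard way to reuse cache contents is to process an array in blocks (tiles) small enough to stay cache-resident. A minimal sketch using a blocked matrix transpose (illustrative, not from the slides; the 64 x 64 block size is an assumption about what fits in cache):

program blocked_transpose
  implicit none
  integer, parameter :: n = 1024, nb = 64   ! assumed block size: one tile fits in cache
  real(8), allocatable :: a(:,:), b(:,:)
  integer :: i, j, ii, jj
  allocate(a(n,n), b(n,n))
  a = 1.0d0
  do jj = 1, n, nb
    do ii = 1, n, nb
      ! Work on one nb x nb tile at a time so its data are
      ! reused from cache before being evicted
      do j = jj, min(jj+nb-1, n)
        do i = ii, min(ii+nb-1, n)
          b(j,i) = a(i,j)
        end do
      end do
    end do
  end do
  print *, b(1,1)
end program blocked_transpose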

Issues with Parallel Computing

Load balance / synchronization
- Try to give an equal amount of workload to each processor
- Try to give processors that finish first more work to do (load rebalancing)
- The goal is to keep all processors as busy as possible

Communication / locality
- Inter-processor communications are typically the biggest overhead on MPP platforms, because the network is slow relative to CPU speed
- Try to keep data access local
- E.g., a 2nd-order finite difference requires data at 3 points; a 4th-order finite difference requires data at 5 points (see the sketch below)
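
To make the locality point concrete, here is a minimal sketch (illustrative, not from the slides) of both stencils for the second derivative. The wider 4th-order stencil needs data from more neighboring points, which on a distributed grid means more communication at subdomain boundaries:

program fd_stencils
  implicit none
  integer, parameter :: n = 100
  real(8) :: f(n), d2a(n), d2b(n), dx
  integer :: i
  dx = 1.0d0 / (n - 1)
  do i = 1, n
    f(i) = sin((i-1)*dx)
  end do
  ! 2nd-order: 3 local points (i-1, i, i+1)
  do i = 2, n-1
    d2a(i) = (f(i+1) - 2.0d0*f(i) + f(i-1)) / dx**2
  end do
  ! 4th-order: 5 local points (i-2 .. i+2)
  do i = 3, n-2
    d2b(i) = (-f(i+2) + 16.0d0*f(i+1) - 30.0d0*f(i) &
              + 16.0d0*f(i-1) - f(i-2)) / (12.0d0*dx**2)
  end do
  print *, d2a(n/2), d2b(n/2)
end program fd_stencils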

A Few Simple Rules for Writing Efficient Code

- Use multiplies instead of divides whenever possible (see the sketch after this slide)
- Make the innermost loop the longest:

Slower loop:
      Do 100 i = 1, 1000
        Do 10 j = 1, 10
          a(i,j) = ...
   10   continue
  100 continue

Faster loop:
      Do 100 j = 1, 10
        Do 10 i = 1, 1000
          a(i,j) = ...
   10   continue
  100 continue

- For a short loop like Do i=1,3, write out the associated expressions explicitly, since the loop startup cost may be very high
- Avoid complicated logic (IFs) inside DO loops
- Avoid subroutine and function calls inside DO loops
- Vectorizable code typically also runs faster on RISC-based super-scalar processors
- Keep it simple
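
As an illustration of the first rule, a sketch (not from the slides) replacing a division inside a loop with a multiplication by a precomputed reciprocal:

program mul_vs_div
  implicit none
  integer, parameter :: n = 1000
  real(8) :: a(n), b(n), d, rinv
  integer :: i
  b = 2.0d0
  d = 4.0d0
  ! Slower: a divide in every iteration
  do i = 1, n
    a(i) = b(i) / d
  end do
  ! Faster: divide once, multiply inside the loop
  rinv = 1.0d0 / d
  do i = 1, n
    a(i) = b(i) * rinv
  end do
  print *, a(1)
end program mul_vs_div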

Transition in Computing Architectures

This chart depicts major NCAR SCD computers from the 1960s onward, along with the sustained gigaflops (billions of floating-point calculations per second) attained by the SCD machines from 1986 through the end of FY99. Arrows at right denote the machines that will be operating at the start of FY00. The division is aiming to bring its collective computing power to 100 Gflops by the end of FY00, 200 Gflops in FY01, and 1 teraflop by FY03.