18.337 Introduction: Super-computing Tuesday

News you can use: Hardware
– Multicore chips (2008: mostly 2 cores and 4 cores, but doubling) (cores = processors)
– Servers (often 2 or 4 multicores sharing memory; see the sketch below)
– Clusters (often several to tens, and many more, servers not sharing memory)
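To make the "sharing memory" vs. "not sharing memory" distinction concrete, here is a minimal shared-memory sketch in C with POSIX threads; the array size and names are illustrative, not from the slides. Both threads read the same array directly, with no messages. On a cluster, the two halves would live in separate address spaces and would have to be exchanged explicitly (compare the MPI sketch under "Programming" below).

/* Shared-memory sketch: two threads sum halves of one array they both see.
 * Illustrative only; build with: cc sum.c -pthread */
#include <stdio.h>
#include <pthread.h>

#define N 1000000                 /* size chosen arbitrarily for the example */
static double data[N];
static double partial[2];

static void *sum_half(void *arg) {
    int id = *(int *)arg;
    int lo = id * (N / 2), hi = lo + N / 2;
    double s = 0.0;
    for (int i = lo; i < hi; i++)
        s += data[i];             /* both threads read the same shared array */
    partial[id] = s;
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;

    pthread_t t[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, sum_half, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("sum = %g\n", partial[0] + partial[1]);
    return 0;
}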

Performance
– Single-processor speeds are, for now, no longer growing.
– Moore's law still allows for more real estate per core (transistor counts double roughly every two years): http://www.intel.com/technology/mooreslaw/index.htm
– People want performance, but it is hard to get; slowdowns are often seen before speedups.
– Flops (floating-point operations per second): Gigaflops (10^9), Teraflops (10^12), Petaflops (10^15).
– Compare matmul with matadd. What's the difference? (See the sketch below.)
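A rough sketch of the matmul-vs-matadd comparison in C; the matrix size and timing code are assumptions for illustration, not from the slides. matadd does about n^2 flops on O(n^2) data and is limited by memory traffic, while matmul does about 2n^3 flops on the same O(n^2) data, so it can reuse operands from cache and come much closer to peak flops.

/* matmul vs. matadd: same data size, very different arithmetic intensity. */
#include <stdio.h>
#include <time.h>

#define N 512   /* illustrative size, not from the slides */

static double A[N][N], B[N][N], C[N][N];

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 2.0; }

    /* matadd: 1 flop per element -> n^2 flops on ~3n^2 data (memory bound) */
    double t0 = seconds();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = A[i][j] + B[i][j];
    double t_add = seconds() - t0;

    /* matmul: 2n flops per element -> 2n^3 flops on ~3n^2 data
       (compute bound; operands can be reused from cache) */
    t0 = seconds();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    double t_mul = seconds() - t0;

    printf("matadd: %8.1f Mflop/s\n", (double)N * N / t_add / 1e6);
    printf("matmul: %8.1f Mflop/s\n", 2.0 * N * N * N / t_mul / 1e6);
    printf("(check %g)\n", C[N - 1][N - 1]);   /* keep the work observable */
    return 0;
}

On most machines the two printed rates differ substantially; that gap in achievable flops, not the flop counts alone, is one answer to the question above.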

Some historical machines

Earth Simulator: was #1, now #30.

Some interesting hardware
– Nvidia
– Cell Processor
– SiCortex: "Teraflops from Milliwatts"
  http://www.sicortex.com/products/sc648
  http://www.gizmag.com/mit-cycling-human-powered-computation/8503/

Programming
– MPI: the Message Passing Interface. A low-level, "lowest common denominator" language that the world has stuck with for nearly 20 years (see the sketch below).
– Can deliver performance, but can be a hindrance as well.
– Some say there are those who will pay for a 2x speedup, just make it easy; the reality is that many want at least 10x, and more, for a qualitative difference in results.
– People forget that serial performance can depend on many bottlenecks, including time to memory.
– Performance (and large problems) are the reason for parallel computing, but it is difficult to get the "ease of use" vs. "performance" trade-off right.
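As a taste of the "low level" style, here is a minimal MPI sketch in C (illustrative only; build and run commands vary by installation): the programmer explicitly spells out which rank sends which data to which rank.

/* Minimal MPI sketch: rank 0 sends a number to rank 1, which prints it.
 * Typical build/run: mpicc ping.c -o ping && mpirun -np 2 ./ping */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        double x = 3.14;
        /* the programmer says who sends what to whom, explicitly */
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %g from rank 0\n", x);
    }

    MPI_Finalize();
    return 0;
}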

Places to Look
– Best current news: http://www.hpcwire.com/
– Huge conference: http://sc08.supercomputing.org/
– MIT home-grown software, now Interactive Supercomputing (Star-P for MATLAB®, Python, and R): http://www.interactivesupercomputing.com

Architecture diagrams from Sam Williams @ Berkeley
– Bottom-up performance engineering: understanding the hardware's implications on performance, up through the software.
– Top-down: measuring software and tweaking, sometimes aware and sometimes unaware of the hardware.

http://www.cs.berkeley.edu/~samw/research/talks/sc07.pdf

Want to delve into hard numerical algorithms
– Examples: FFTs and sparse linear algebra (see the sketch below).
– At the MIT level, the potentially "not quite right" question is "How do you parallelize these operations?" Better: what issues arise, and why is getting performance hard?
– Why is n-by-n matmul easy, almost a cliché?
– Comfort level in this class to delve in?
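For the sparse-linear-algebra case, here is a minimal compressed sparse row (CSR) matrix-vector product sketch in C; the struct layout and names are the standard convention, not code from the course. It shows one reason performance is hard: each stored nonzero contributes only two flops but costs an indirect, irregular load of x, so the kernel is bound by memory behavior rather than arithmetic, unlike dense n-by-n matmul.

/* Sparse matrix-vector product y = A*x in CSR form. */
#include <stdio.h>

typedef struct {
    int n;          /* number of rows */
    int *rowptr;    /* rowptr[i]..rowptr[i+1]-1 index the nonzeros of row i */
    int *colind;    /* column index of each stored nonzero */
    double *val;    /* value of each stored nonzero */
} csr_matrix;

void spmv(const csr_matrix *A, const double *x, double *y) {
    for (int i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            sum += A->val[k] * x[A->colind[k]];   /* irregular access into x */
        y[i] = sum;
    }
}

int main(void) {
    /* tiny 3x3 example: A = [2 0 1; 0 3 0; 4 0 5], x = ones */
    int rowptr[] = {0, 2, 3, 5};
    int colind[] = {0, 2, 1, 0, 2};
    double val[] = {2, 1, 3, 4, 5};
    csr_matrix A = {3, rowptr, colind, val};
    double x[] = {1, 1, 1}, y[3];
    spmv(&A, x, y);
    printf("y = %g %g %g\n", y[0], y[1], y[2]);   /* expect 3 3 9 */
    return 0;
}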

Another New Term for the Day: SIMD
SIMD (Single Instruction, Multiple Data) refers to parallel hardware that can execute the same instruction on multiple data elements. One may also refer to a SIMD operation (sometimes, but not always, historically synonymous with a data-parallel operation) when the software appears to run in "lock-step", with every processor executing the same instructions.
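A small illustration of SIMD in hardware terms, using x86 SSE intrinsics in C (an architecture-specific assumption, not from the slides): a single _mm_add_ps instruction performs four float additions at once, i.e. the same instruction applied to multiple data elements.

/* SIMD sketch with SSE: one instruction adds four floats at a time. */
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float c[8];

    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);   /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);    /* one instruction, 4 additions */
        _mm_storeu_ps(&c[i], vc);
    }

    for (int i = 0; i < 8; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}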