Introduction to Parallel Programming on parasol.cs.tamu.edu -- PARASOL's 16-processor V2200. Jack Perdue, System Analyst II. January 29, 2001. CPSC 626.


HW Overview
- 16 PA-8200 CPUs (PA = HP's Precision Architecture): 64-bit, 200 MHz RISC. Each CPU costs $25,000 (that's a lot of Athlons).
- 2 MB data cache and 2 MB instruction cache per CPU (off the processor but on the processor card [L2]).
- 8x8 Hyperplane crossbar (Exemplar Routing Attachment Controller) provides 960 MB/s each way per channel, for a total of 15.3 GB/s bidirectional bandwidth (8 channels x 960 MB/s x 2 directions = 15.36 GB/s).
- 4 GB RAM in parasol.cs.tamu.edu.

HW Overview (cont'd)
- EPAC (Exemplar Processor Agent Controller): connects two CPUs to the EPIC and ERAC.
- EPIC (Exemplar PCI Interface Controller).
- ERAC (Exemplar Routing Attachment Controller): four combined create the 8x8 Hyperplane crossbar.
- EMAC (Exemplar Memory Agent Controller): each can access 4 banks of memory.

SW Overview
- The OS is HP-UX 11.0, Hewlett-Packard's Unix.
- The OS on parasol.cs (and all V-Class systems) runs in 64-bit mode, providing 40 bits of addressable physical memory (HP-UX also runs in 32-bit mode).
- The CPU is capable of full 64-bit addressing, but only 40 bits are used on the V-Class to reduce cost; that still provides a terabyte of address space.
- 32-bit programming is sufficient for most tasks (e.g., this class) and is the default.
- 64-bit (40-bit physical) addressing can be used for applications that need more than 2 GB of RAM (compile with +DA2.0W).

Compiling/Linking with POSIX Threads (pthreads)
- "-D_POSIX_C_SOURCE=199506L" is needed at compile time.
- "-lpthread" is needed during the final link.
- ppcall.o (built from ppcall.c) is needed for the "PARDO API" provided in the example.
- It is easiest to copy and modify the provided Makefile; "make sum.pthread" creates sum.pthread.
- Appendix A of HP's "Parallel Programming Guide for HP-UX Systems" is the basis for the sample program [it could be improved].

Creating MPI-based programs
- There are a number of ways to compile; it is best to use the provided mpicc/mpiCC scripts (in /opt/omni/bin), which invoke HP's ANSI C/C++ compiler.
- E.g., "mpicc -o sum.mpi sum.mpi.c" creates the MPI-based executable sum.mpi.
- The present MPI example blows the doors off the pthread example [blame ppcall.c].

More Information
- Links to more documentation; the Developer's Resource Library is very helpful.
- Some notes for those familiar with CS department systems.
- A complete software/hardware listing.