HPC FORUM SEPTEMBER 8-10, 2009 Steve Rowan srowan at conveycomputer.com

Convey Hybrid-Core Computing
- An x86 processor is combined with a coprocessor that implements highly parallel instructions
- Diagram labels: Applications; Convey Compilers; x86-64 Instructions (Intel® Processor); Coprocessor Instructions (Coprocessor); Cache-coherent shared virtual memory
- Application-Specific Personalities: Oil & Gas, Financial, Custom, CAE, Sciences
Copyright 09/10/09

Using Personalities
- Program using ANSI standard C/C++ and Fortran
- User specifies personality at compile time
- OS demand loads personalities at runtime
- Diagram labels: C/C++ and Fortran sources; Convey Software Development Suite; Hybrid-Core Executable (x86-64 and Coprocessor Instructions); Convey HC-1 (Intel x86, Coprocessor); Personalities
  - description file specifies available instructions
  - personality loaded at runtime by OS

Language/Library Support for Massive Parallelism
- Apply massive amounts of logic for a single thread of execution
  - Do that via specialization
  - Have hardware adapt to the application rather than the application adapting to the hardware
  - Parallelizing at the instruction level, not the core level
- C/C++ and Fortran programming
  - No special languages
  - Code can run on x86 servers without coprocessors
- Diagram labels: Crossbar; Dispatch
  - Multiple function pipes for data parallelism
  - Multiple units in each pipe for instruction level parallelism
  - Instructions can be very complex

Development Tools
- Program in ANSI standard C/C++ and Fortran
- Unified compiler generates x86 & coprocessor instructions
- Seamless debugging environment for Intel & coprocessor code
- Executable can run on x86_64 nodes or on Convey Hybrid-Core nodes
- Diagram labels: C/C++; Fortran95; Common Optimizer; Intel® 64 Optimizer & Code Generator; Convey Vectorizer & Code Generator; Procedural Personality Interface; Linker; other objects; executable (Intel® 64 code, Coprocessor code)

Multi Mode Compilation 09/10/09
- Original code:
    for (j=0; j<N; j++) a[j] = b[j]+scalar*c[j];
- Generated code:
    if(CP available) { coprocessor instructions } else { x86 instructions }
- Convey systems are inherently heterogeneous
- Can select from a set of architectures
- Required architectures are dynamically loaded at runtime
- Higher level parallelism supported via MPI or threads
- Diagram labels: Personality Definition Files; Convey Multi Mode Compiler; Convey backend; x86-64 backend
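The "generated code" shape on this slide can be sketched in plain C. This is only an illustration of the dispatch pattern, not Convey's actual generated code: the flag `cp_available`, the function names, and the `float` element type are all assumptions made for the sketch, and the "coprocessor" path is ordinary C standing in for coprocessor instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the runtime's "CP available" check. */
static bool cp_available = false;

/* Stand-in for the coprocessor code path. On real hardware the compiler
   would emit coprocessor instructions; this sketch just uses plain C. */
static void triad_cp(float *a, const float *b, const float *c,
                     float scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Host (x86-64) code path: the original loop from the slide. */
static void triad_x86(float *a, const float *b, const float *c,
                      float scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* One entry point, two code paths chosen at run time.
   Returns 1 if the coprocessor path ran, 0 if the x86 path ran. */
int triad_dispatch(float *a, const float *b, const float *c,
                   float scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    if (cp_available) {
        triad_cp(a, b, c, scalar, n);
        return 1;
    }
    triad_x86(a, b, c, scalar, n);
    return 0;
}
```

Because both paths compute the same result, the caller does not need to know which one ran; that is what lets a single executable run on plain x86_64 nodes as well as on hybrid-core nodes.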

Convey Runtime
- Executable: Intel® 64 code, Coprocessor code, cny_runtime.o
- Convey Shared Libraries: shared library launched by OS
- If dlopen of shared library fails, Intel 64 code executed
- Personalities (Custom, FAP, DP, SP) are demand loaded by OS at runtime
- gdb debugging on HW & simulator
- SPAT performance simulator
- Diagram labels: coprocessor hardware; Convey Simulator; x86-64 hardware
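The fallback rule on this slide (if `dlopen` of the shared library fails, the Intel 64 code is executed) follows a common POSIX pattern, sketched below. This is a minimal sketch and not the contents of `cny_runtime.o`: the symbol name `triad_kernel`, the function names, and the `float` element type are hypothetical. `dlopen`, `dlsym`, and `dlclose` are the standard POSIX calls; older glibc versions need `-ldl` at link time.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Signature assumed for the accelerated routine (hypothetical). */
typedef void (*triad_fn)(float *, const float *, const float *, float, size_t);

/* Host fallback: the plain loop, always available in the executable. */
static void triad_host(float *a, const float *b, const float *c,
                       float scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Try to load an accelerated routine from lib_path; fall back to the
   host loop if the library or the symbol is missing.
   Returns 1 if the library routine ran, 0 if the host fallback ran. */
int run_triad(const char *lib_path, float *a, const float *b,
              const float *c, float scalar, size_t n)
{
    assert(lib_path != NULL);
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle != NULL) {
        triad_fn fn = NULL;
        /* POSIX idiom for converting dlsym's void * to a function pointer.
           "triad_kernel" is a hypothetical symbol name. */
        *(void **)(&fn) = dlsym(handle, "triad_kernel");
        if (fn != NULL) {
            fn(a, b, c, scalar, n);
            dlclose(handle);
            return 1;
        }
        dlclose(handle);
    }
    triad_host(a, b, c, scalar, n);
    return 0;
}
```

Calling `run_triad` with a path that does not exist exercises the fallback branch: the load fails, the host loop runs, and the result is the same as it would have been on the accelerated path.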

Debugging Hybrid-Core Applications
(gdb) run
Starting program: /home/guest/Desktop/DEMOS/compiler_demo/vec_auto.exe
Breakpoint 1, main (argc=1, argv=0x7fffa9111ee8) at vec_main.c:19
19 for (i=0; i<n; i++) {
(gdb) disass
Dump of assembler code for function main:
0x : push %rbp
0x : mov %rsp,%rbp
0x c : add $0xffffffffffffffb0,%rsp
0x : add $0xfffffffffffffff8,%rsp
0x : fnstcw (%rsp)
0x : andw $0xfcff,(%rsp)
0x d : orw $0x300,(%rsp)
(gdb) cont
Continuing.
Breakpoint 4, 0x a0 in __cny_region_triad0 ()
(gdb) disass 0x8000a0 0x8000c0
Dump of assembler code from 0x8000a0 to 0x8000c0:
0x a0 : mov %a11,%VL
0x a8 : ld.dw $0x0(%a10),%v0r
0x ac : or %a11,$0,%a13
0x b0 : ld.dw $0x0(%a9),%v1r
0x b4 : add.sq %a12,%a13,%a12
0x b8 : fma.fs %v0r,%s1,%v1r,%v0r
0x c0 : st.dw %v0r,$0x0(%a8)
End of assembler dump.
(gdb)
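The coprocessor disassembly reads like a vector version of the triad loop: set a vector length, load two vectors, do a multiply-add with a scalar, store the result. A rough C model of that pattern is sketched below. It is an interpretation of the dump, not Convey documentation: the chunk size `SKETCH_MAX_VL` is arbitrary, the mapping of registers to the arrays `b` and `c` is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary chunk size for this sketch; not a real hardware vector length. */
#define SKETCH_MAX_VL 8

/* Models one pass of the vector code for vl elements:
   two vector loads, one multiply-add with a scalar, one vector store. */
static void vector_triad_chunk(float *a, const float *b, const float *c,
                               float scalar, size_t vl)
{
    float v0[SKETCH_MAX_VL];
    float v1[SKETCH_MAX_VL];

    assert(vl <= SKETCH_MAX_VL);
    for (size_t i = 0; i < vl; i++) v0[i] = c[i];                   /* vector load  */
    for (size_t i = 0; i < vl; i++) v1[i] = b[i];                   /* vector load  */
    for (size_t i = 0; i < vl; i++) v0[i] = v0[i] * scalar + v1[i]; /* multiply-add */
    for (size_t i = 0; i < vl; i++) a[i] = v0[i];                   /* vector store */
}

/* Strip-mines the loop: full chunks of SKETCH_MAX_VL elements, then a
   shorter final chunk. Returns the number of chunks processed. */
size_t triad_strip_mined(float *a, const float *b, const float *c,
                         float scalar, size_t n)
{
    size_t chunks = 0;
    for (size_t j = 0; j < n; j += SKETCH_MAX_VL) {
        size_t vl = (n - j < SKETCH_MAX_VL) ? (n - j) : SKETCH_MAX_VL;
        vector_triad_chunk(a + j, b + j, c + j, scalar, vl);
        chunks++;
    }
    return chunks;
}
```

With `n = 10` and a chunk size of 8, the loop runs one full chunk and one chunk of 2 elements, which is the role the vector-length register plays in the dump under this reading.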

Third Party Libraries
- Third party libraries run unmodified on the x86
- Key kernels have been optimized by Convey
- Third party libraries can call Convey optimized routines
  - BLAS
  - LAPACK
  - etc.