Performance Evaluation on SGI Altix 4700 Guangdeng Liu and Danny Guo.

Slides:



Advertisements
Similar presentations
Lesson 6. The Computer Operation Computer Operating Systems GUI vs. Command line The Microsoft Windows Family File Systems – How Computers Manage Data.
Advertisements

Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.
AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
Helper Threads via Virtual Multithreading on an experimental Itanium 2 processor platform. Perry H Wang et. Al.
Memory Chapter 3. Slide 2 of 14Chapter 1 Objectives  Explain the types of memory  Explain the types of RAM  Explain the working of the RAM  List the.
General information Course web page: html Office hours:- Prof. Eyal.
Tuesday, September 12, 2006 Nothing is impossible for people who don't have to do it themselves. - Weiler.
SUMS Storage Requirement 250 TB fixed disk cache 130 TB annual increment for permanently on- line data 100 TB work area (not controlled by SUMS) 2 PB near-line.
CSCE101 – 4.2, 4.3 October 17, Power Supply Surge Protector –protects from power spikes which ruin hardware. Voltage Regulator – protects from insufficient.
Parallel/Concurrent Programming on the SGI Altix Conley Read January 25, 2007 UC Riverside, Department of Computer Science.
1-1 Embedded Software Development Tools and Processes Hardware & Software Hardware – Host development system Software – Compilers, simulators etc. Target.
GCSE Computing - The CPU
Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı
Hands-On Microsoft Windows Server 2008 Chapter 1 Introduction to Windows Server 2008.
C.S. Choy95 COMPUTER ORGANIZATION Logic Design Skill to design digital components JAVA Language Skill to program a computer Computer Organization Skill.
Tim Madden ODG/XSD.  Graphics Processing Unit  Graphics card on your PC.  “Hardware accelerated graphics”  Video game industry is main driver.  More.
1 Hardware Support for Collective Memory Transfers in Stencil Computations George Michelogiannakis, John Shalf Computer Architecture Laboratory Lawrence.
University of Amsterdam Computer Systems – a guided tour Arnoud Visser 1 Computer Systems A guided Tour.
Intro to Computer Systems Summer 2014 COMP 2130 Introduction to Computer Systems Computing Science Thompson Rivers University.
Redefining the Desktop Stu Baker AUL for Library Technology
COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.
3. April 2006Bernd Panzer-Steindel, CERN/IT1 HEPIX 2006 CPU technology session some ‘random walk’
PAPI – Performance API Shirley Moore 8 th VI-HPS Tuning Workshop 5-9 September 2011.
Performance Analysis using PAPI and Hardware Performance Counters on the IBM Power3 Philip Mucci Shirley Moore
History of Microprocessor MPIntroductionData BusAddress Bus
ECE 103 Engineering Programming Chapter 5 Programming Languages Herbert G. Mayer, PSU CS Status 6/19/2015 Initial content copied verbatim from ECE 103.
Using parallel tools on the SDSC IBM DataStar DataStar Overview HPM Perf IPM VAMPIR TotalView.
Performance of mathematical software Agner Fog Technical University of Denmark
Module 1: Concepts of information Technology.  Central processing unit (CPU)  Hard disk  Common input and output devices  Types of memory Main Parts.
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
Computer Organization & Assembly Language © by DR. M. Amer.
1 Text Reference: Warford. 2 Computer Architecture: The design of those aspects of a computer which are visible to the programmer. Architecture Organization.
1 COMS 161 Introduction to Computing Title: Computing Basics Date: September 15, 2004 Lecture Number: 10.
CPE 631 Project Presentation Hussein Alzoubi and Rami Alnamneh Reconfiguration of architectural parameters to maximize performance and using software techniques.
1 SciDAC High-End Computer System Performance: Science and Engineering Jack Dongarra Innovative Computing Laboratory University of Tennesseehttp://
THE BRIEF HISTORY OF 8085 MICROPROCESSOR & THEIR APPLICATIONS
Software Performance Monitoring Daniele Francesco Kruse July 2010.
CPU/BIOS/BUS CES Industries, Inc. Lesson 8.  Brain of the computer  It is a “Logical Child, that is brain dead”  It can only run programs, and follow.
A Software Performance Monitoring Tool Daniele Francesco Kruse March 2010.
1 Chapter Seven. 2 Users want large and fast memories! SRAM access times are ns at cost of $100 to $250 per Mbyte. DRAM access times are ns.
PAPI on Blue Gene L Using network performance counters to layout tasks for improved performance.
Time Management.  Time management is concerned with OS facilities and services which measure real time.  These services include:  Keeping track of.
PROCESSOR Ambika | shravani | namrata | saurabh | soumen.
Performance profiling of Experiments’ Geant4 Simulations Geant4 Technical Forum Ryszard Jurga.
Hello world !!! ASCII representation of hello.c.
*Pentium is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries Performance Monitoring.
OPERATING SYSTEMS DO YOU REQUIRE AN OPERATING SYSTEM IN YOUR SYSTEM?
Hardware Architecture
Fermilab Scientific Computing Division Fermi National Accelerator Laboratory, Batavia, Illinois, USA. Off-the-Shelf Hardware and Software DAQ Performance.
Computer is a general-purpose device that can be programmed to carry out a set of arithmetic or logical operations automatically. Since a sequence of.
CPU Central Processing Unit
Operating System Overview
ECE 103 Engineering Programming Chapter 5 Programming Languages
Chapter 1: A Tour of Computer Systems
For Massively Parallel Computation The Chaotic State of the Art
Chapter 2: System Structures
THE CPU i Bytes 1.1.
Constructing a system with multiple computers or processors
Architecture Background
Performance Tuning Team Chia-heng Tu June 30, 2009
Programmable Logic Controllers (PLCs) An Overview.
Understanding Performance Counter Data - 1
MICROPROCESSOR MEMORY ORGANIZATION
Constructing a system with multiple computers or processors
Constructing a system with multiple computers or processors
Chapter 4: MEMORY.
Multithreading Why & How.
System calls….. C-program->POSIX call
Introduction COE 301 Computer Organization Prof. Aiman El-Maleh
What Are Performance Counters?
Presentation transcript:

Performance Evaluation on SGI Altix 4700 Guangdeng Liu and Danny Guo

SGI Platform Intro Model Name: SGI Altix 4700 Bandwidth System (1600MHz 24M L3, DC Itanium2 9050) CPU: Intel DC Itanium2 Processor 9050 (533 MHz FSB) CPU(s) enabled:32 cores, 16 chips, 2 cores/chip (Hyper-Threading Technology disabled) Memory:128 GB (8*1GB PC DIMMS per 1-chip module) Disk Subsystem:1 x 147 GB SCSI (Seagate Cheetah 10k rpm) 2 FPGAs on RASC blade

RASC Blade 2 FPGAs on board together with other 64 processors on SGI Altix 4700 –40MBytes of SRAM –200,448 logic cells –128bit/sec read and write

How to Login Client for remote login: –Putty –Etc Type hostname, your account –sgi-2.cs.ucr.edu –your user name/password

SPLASH2 Download – Compilation errors – It gives a fix to most of the applications, admittedly, there are still some erroneous output that does not match the correct_out provided. – If you still don’t want to patch the fix, simply download an updated version from the link above.

PAPI The Performance API (PAPI) project specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors, such as cache miss, retired instruction and cycles etc. PAPI provides three interfaces to the underlying counter hardware: 1.The low level interface manages hardware events in user defined groups called EventSets. 2.The high level interface simply provides the ability to start, stop and read the counters for a specified list of events. 3.Graphical tools to visualize information..

High Level Interface Meant for application programmers wanting coarse-grained measurements Not thread safe Calls the lower level API Allows only PAPI preset events Easier to use and less setup (additional code) than low-level

Example long long values[NUM_EVENTS]; unsigned int Events[NUM_EVENTS]={PAPI_TOT_INS,PAPI_TOT_C YC}; /* Start the counters */ PAPI_start_counters((int*)Events,NUM_EVENTS); /* What we are monitoring? */ do_work(); /* Stop the counters and store the results in values */ retval = PAPI_stop_counters(values,NUM_EVENTS);

Low Level Interface Increased efficiency and functionality over the high level PAPI interface About 40 functions Obtain information about the executable and the hardware Thread-safe Fully programmable Callbacks on counter overflow

Example #include "papi.h “ #define NUM_EVENTS 2 int Events[NUM_EVENTS]={PAPI_FP_INS,PAPI_TOT_CYC}, EventSet; long_long values[NUM_EVENTS]; /* Initialize the Library */ retval = PAPI_library_init(PAPI_VER_CURRENT); /* Allocate space for the new eventset and do setup */ retval = PAPI_create_eventset(&EventSet); /* Add Flops and total cycles to the eventset */ retval = PAPI_add_events(&EventSet,Events,NUM_EVENTS); /* Start the counters */ retval = PAPI_start(EventSet); do_work(); /* What we want to monitor: SPLASH*/ /*Stop counters and store results in values */ retval = PAPI_stop(EventSet,values);

More examples.. From the PAPI source distribution: tests/native.c…