 Copyright, HiCLAS1 George Delic, Ph.D. HiPERiSM Consulting, LLC And Arney Srackangast, AS1MET Services

Presentation transcript:

 Copyright, HiCLAS1 George Delic, Ph.D. HiPERiSM Consulting, LLC And Arney Srackangast, AS1MET Services & (919) HiCLAS1 HiPERiSM Consulting, LLC Linking with AS1MET Services

Topics
- Introduction
- Choice of hardware & OS
- Benchmark timings
- Hardware performance events
- Why is AERMOD-HPC faster?
- Conclusions
- Next steps and community responses

Introduction
- HiCLAS1 Mission
- Why AERMOD?
- AERMOD-HPC development process
- QA process
- AERMOD-HPCS v1.8 release

HiCLAS1 Mission
HiCLAS1 is dedicated to bringing High Performance Computing (HPC) capability to Environmental Modeling. The HiCLAS1 mission is to develop (or enhance) software and improve performance on current and future computers for legacy Air Quality Models (AQM).

Why AERMOD?
- Large/dedicated user community
- Long model runs
- Low efficiency
- Regulatory model
- Linux and Windows platforms

AERMOD-HPC development process
- U.S. EPA source as baseline
- Progressive source modification
- Branching structure reduction
- Vector instruction enhancement
- Extensive testing/benchmarking of four case studies
- Parallel potential realized
- Code structure modifications for efficiency only: no changes in the science

QA process
- A & B team source validation
- Line-by-line code inspection
- Tests with multiple compilers
- Tests on multiple platforms
- Comparison against U.S. EPA version:
  - Line-by-line source inspection
  - Numerical differences inspected
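
The slides do not say how numerical differences against the U.S. EPA version were inspected. As a minimal sketch of one common way to do such a check, the function below reports the largest relative difference between two paired lists of output values. The function name, the `floor` guard, and the sample values are all hypothetical and are not taken from the AERMOD-HPC QA procedure.

```python
def max_relative_difference(baseline, candidate, floor=1e-12):
    """Return the largest relative difference between paired output values.

    `floor` (an assumed guard value) avoids division by zero when both
    values in a pair are zero or extremely small.
    """
    if len(baseline) != len(candidate):
        raise ValueError("outputs have different lengths")
    worst = 0.0
    for ref, new in zip(baseline, candidate):
        scale = max(abs(ref), abs(new), floor)
        worst = max(worst, abs(ref - new) / scale)
    return worst


# Hypothetical values standing in for results from two builds of a model.
reference_run = [12.5, 0.031, 4.2e-6]
optimized_run = [12.5, 0.031, 4.2e-6]
print(max_relative_difference(reference_run, optimized_run))
```

A relative measure is used here because model outputs can span many orders of magnitude, so an absolute threshold would be dominated by the largest values.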

AERMOD-HPCS v1.8 release
- Windows 2K and XP in three steps:
  - Run installer package
  - Request a license
  - Run license extractor application
- Linux
  - Available but not yet shipping
- Download pages at

Choice of hardware & OS
- 32-bit Linux
- 64-bit Linux
- 32-bit MS Windows
- Pentium 4 Xeon (or AMD)

[Figure: Benchmark timings vs. EPA executable]

[Figure: Benchmark timings vs. EPA source]

Hardware performance events
- Operations and instructions
- Memory footprint
- Branching instructions
- TLB cache usage
- L1 cache usage

[Figure: Mflops]

[Figure: Vector Mips]

[Figure: Memory footprint: memory instructions per flop]

[Figure: Branching instructions]

[Figure: TLB cache misses: data (DM) vs. instruction (IM)]

[Figure: L1 cache misses: data (DM) vs. instruction (IM)]

Why is AERMOD-HPC faster?
- Higher Mflops rates
- Lower number of memory instructions per floating point instruction
- Lower mispredicted branch instruction rates
- Lower instruction TLB miss rates
- Lower L1 instruction cache miss rates
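
The metrics listed above are ratios of raw hardware event counts. The sketch below shows one way such ratios could be derived, assuming a dictionary of counter totals and a wall-clock time. The counter names and the sample numbers are hypothetical, and normalizing the miss counts by total instructions is an assumption, since the slides do not state how the rates were defined.

```python
def derived_metrics(counts, seconds):
    """Compute derived performance ratios from raw event counts.

    `counts` uses hypothetical key names; map them to whatever events
    the profiling tool in use actually reports.
    """
    flops = counts["fp_ops"]
    instructions = counts["instructions"]
    return {
        # Millions of floating point operations per second.
        "mflops": flops / seconds / 1e6,
        # Memory (load/store) instructions per floating point operation.
        "mem_instr_per_flop": counts["mem_instr"] / flops,
        # Fraction of branch instructions that were mispredicted.
        "branch_mispredict_rate": counts["branches_mispredicted"] / counts["branches"],
        # Miss counts normalized per instruction (an assumed convention).
        "itlb_misses_per_instr": counts["itlb_misses"] / instructions,
        "l1i_misses_per_instr": counts["l1i_misses"] / instructions,
    }


# Hypothetical counter totals for a single run.
sample = {
    "fp_ops": 2e9,
    "mem_instr": 3e9,
    "branches": 1e9,
    "branches_mispredicted": 5e7,
    "itlb_misses": 1e5,
    "l1i_misses": 2e6,
    "instructions": 1e10,
}
for name, value in derived_metrics(sample, seconds=10.0).items():
    print(f"{name}: {value:.6g}")
```

Comparing these ratios between two builds of the same code, rather than the raw counts, keeps the comparison meaningful even when the runs execute different numbers of instructions.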

Conclusions
- A much faster AERMOD is available as AERMOD-HPCS
- Current serial performance is 1.9 to 3.4 times faster than the EPA distribution
- Simple code transformations gave improved efficiency
- Much more left to do

Next steps at HiCLAS1
- Next release v1.9 features:
  - Streamlined memory model
  - More serial code speed-up
- Parallel version in progress
  - Target is the quad-core CPU
- 10x speed-up is feasible:
  - ~ 3x from serial improvements
  - ~ 3x from parallelization
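
The 10x estimate follows from multiplying the two independent gains: roughly 3 x 3 = 9. The sketch below works through that arithmetic and, as an added assumption not made in the slides, uses Amdahl's law to estimate what fraction of the run time would have to be parallel for four cores to deliver a 3x gain. All function names are hypothetical.

```python
def combined_speedup(serial_gain, parallel_gain):
    """Independent speed-ups multiply."""
    return serial_gain * parallel_gain


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speed-up when only part of the run time is parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def required_parallel_fraction(target_speedup, cores):
    """Invert Amdahl's law for the parallel fraction p.

    From 1/S = (1 - p) + p/n it follows that p = (1 - 1/S) / (1 - 1/n).
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / cores)


print(combined_speedup(3.0, 3.0))
p = required_parallel_fraction(3.0, cores=4)
print(round(p, 4), round(amdahl_speedup(p, cores=4), 4))
```

Under this model, about 89% of the (already optimized) serial run time would need to execute in parallel to reach 3x on a quad-core CPU. This ignores communication and synchronization overhead, so it is a lower bound on the required fraction.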

Community responses
- “Let me be one of the first air dispersion modelers to congratulate you on this achievement. I most sincerely hope that you succeed on this important speed improvement on AERMOD.” –CEO of a major environmental software company
- “Modifying air quality models to make use of parallel processing is a much needed improvement to the air quality community, and I commend the staff at High Performance Algorism Consulting that have made this possible.” –Group leader of a State Department of Environmental Quality
- A major hardware & software vendor has offered services and support to HiCLAS1 for the AERMOD-HPC initiative