PRIME/GreenLight project Progress Report

PRIME/GreenLight project Progress Report. Roberto Pereira, Miguel Erazo. Florida International University, December 2009.

Outline: Motivation and Objectives; PRIME overview; Installation; Methodology; Future work.

Motivation and Objectives

Motivation: “The information technology industry consumes as much energy and has roughly the same carbon ‘footprint’ as the airline industry.” “Every dollar spent on power for IT equipment requires that another dollar be spent on cooling.”

Objectives: Provide the scientific community with useful guidelines regarding the energy consumption of distributed simulations/emulations of network models. Develop a large-scale Grid application performance evaluation platform based on PRIME.

PRIME overview

The PRIME network simulator: A simulator/emulator of computer networks based on the SSF specification. It is able to simulate from tens of thousands to millions of nodes. Emulation is supported via OpenVPN. Distributed simulation/emulation is supported through MPI.

The PRIME network simulator (figure with three parts): network model, emulation infrastructure, distributed simulation.

A specific deployment: The network model comprises topology, traffic, and applications. Define alignments, partition the network, and map it to physical machines.

Installation

Platform: PRIME was installed on Lincoln, Abe, and QueenBee in TeraGrid. Simple network models were run using the PBS scheduler. A number of useful tools were used and tested, e.g. Perfsuite.

Perfsuite: A collection of tools, utilities, and libraries for software performance analysis. It uses the Performance Application Programming Interface (PAPI). Installed on Abe and QueenBee.

Utilities: psrun is used to gather hardware performance information. psprocess is used to post-process the results of a performance analysis experiment.

Methodology

The approach: Measure the time that an application (here, PRIME) uses each computing resource, and then derive the energy consumption from the power signature of each of these resources, as extracted from its specifications.
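
Read literally, the approach is a sum over resources of time in use multiplied by rated power. A minimal sketch, assuming a constant power draw per resource; the function name, resource names, times, and wattages are hypothetical placeholders, not measurements or values taken from any specification:

```python
def energy_joules(busy_seconds, power_watts):
    """Estimate energy as the sum of (time in use) * (rated power) per resource."""
    missing = set(busy_seconds) - set(power_watts)
    if missing:
        raise KeyError(f"no power rating for: {sorted(missing)}")
    return sum(busy_seconds[name] * power_watts[name] for name in busy_seconds)


# Hypothetical inputs: seconds each resource was in use, and its rated power in watts.
busy_seconds = {"cpu": 120.0, "memory": 30.0, "disk": 5.0}
power_watts = {"cpu": 80.0, "memory": 10.0, "disk": 8.0}

total = energy_joules(busy_seconds, power_watts)
print(f"estimated energy: {total:.1f} J")
```

A resource with several power states would need one entry per state, each with its own time and power.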

CPU: We use Perfsuite to measure CPU time. We consider two states for the CPU:

Memory: Basic block diagram of a CPU (figure).

Memory: When there is a cache miss, two things happen: 1) The data requested by the CPU is fetched. 2) There is also a pre-fetch.

Memory: If data/instructions are not found in the caches, the main memory is accessed. The PAPI event PAPI_PRF_DM (data prefetch cache misses) is not available in the infrastructure provided by Abe in TeraGrid. We therefore compute the memory time taking into account only the accesses due to L2 cache misses.
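
A minimal sketch of this estimate, assuming one main-memory access per L2 cache miss and a fixed time per access. The miss count would come from a hardware counter (a PAPI preset such as PAPI_L2_TCM is one candidate, which is an assumption here); the function name and all numbers are hypothetical:

```python
def memory_time_seconds(l2_misses, access_time_ns):
    """Approximate main-memory time: one access per L2 miss, fixed time per access."""
    if l2_misses < 0 or access_time_ns < 0:
        raise ValueError("inputs must be non-negative")
    return l2_misses * access_time_ns * 1e-9


# Hypothetical values: 2 million L2 misses at 60 ns per access.
l2_misses = 2_000_000
access_time_ns = 60.0

mem_time = memory_time_seconds(l2_misses, access_time_ns)
print(f"estimated memory time: {mem_time:.3f} s")
```

Because prefetch misses are not counted, this figure is likely a lower bound on the real memory time.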

Memory: We will be using synchronous DDR2 DRAM at 667 MHz with internal array cells of 8 bits.

Memory: DDR2 is the second generation of DDR; the improvement is in bus width.

Memory: Array cells of 8 bits. Double data rate: transmits twice per cycle. Second generation: bus width of 4. Data per access = (#bits) * (bus width) * (clock multiplier) = 8 * 4 * 2 = 64 bits in our case.
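
The arithmetic on this slide can be checked directly. A small sketch using the slide's own values (8-bit cells, bus width of 4, two transfers per cycle); the function and parameter names are illustrative only:

```python
def bits_per_access(cell_bits, bus_width, clock_multiplier):
    """Data per access = (#bits) * (bus width) * (clock multiplier)."""
    return cell_bits * bus_width * clock_multiplier


# Values stated on the slide.
bits = bits_per_access(cell_bits=8, bus_width=4, clock_multiplier=2)
print(bits, "bits =", bits // 8, "bytes per access")
```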

Memory: Steps of a memory access (numbered 1 to 5 in the figure):
1) The correct row is activated.
2) Delay between row activation and column activation (tRCD).
3) The correct column is activated.
4) The data is retrieved from the array (CL).
5) The data is sent to the memory controller (tDPD).

Memory: The manufacturer's bandwidth figure assumes the best case, so we need a more accurate approximation. We use the Total Access Time, which consists of the Address Transport Time, the Data Access Time, and the Data Transport Time. The memory is synchronous, so the Address Transport Time equals one clock cycle.
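
A minimal sketch of the Total Access Time, assuming it is the plain sum of the three components named above, with the Address Transport Time set to one clock period. The exact formula used in the slides did not survive in this transcript, so the decomposition, the function name, and the numbers below are assumptions for illustration:

```python
def total_access_time_ns(clock_mhz, data_access_ns, data_transport_ns):
    """Sum of address transport (one clock period), data access and data transport."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    clock_period_ns = 1000.0 / clock_mhz      # 1 / f, in nanoseconds
    address_transport_ns = clock_period_ns    # synchronous memory: one clock cycle
    return address_transport_ns + data_access_ns + data_transport_ns


# Hypothetical values, not taken from any module datasheet.
t_total = total_access_time_ns(clock_mhz=500.0, data_access_ns=30.0, data_transport_ns=6.0)
print(f"total access time: {t_total:.1f} ns")
```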

Memory: tRCD is the row-to-column access delay. CL is the column access time (CAS latency), in clock cycles. tAC is the minimum access time. tDPD is the data propagation delay. BMM is the number of subsequent accesses in burst mode.

Disk: For the hard disk drive we will use the Internal Sustained Transfer Rate (ISTR). The ISTR depends on the track where the files are located. The transfer is slower if the files are fragmented.

Disk: Outer tracks have more sectors per track. We will approximate an average position. The ISTR is optimal for files in adjacent tracks and sectors.

Disk: We will use the pidstat command from SYSSTAT. The reported activity includes page faults, cache misses, and direct accesses. With the total number of bytes read/written and the Internal Sustained Transfer Rate, we can calculate the total time.
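
The disk-time calculation reduces to dividing the bytes transferred by the ISTR. A minimal sketch, assuming a single average ISTR for the whole run; the function name, byte counts, and transfer rate are hypothetical:

```python
def disk_time_seconds(bytes_transferred, istr_bytes_per_second):
    """Total disk time = bytes read/written divided by the sustained transfer rate."""
    if istr_bytes_per_second <= 0:
        raise ValueError("ISTR must be positive")
    return bytes_transferred / istr_bytes_per_second


# Hypothetical totals: 300 MB read, 100 MB written, ISTR of 50 MB/s.
bytes_read = 300 * 10**6
bytes_written = 100 * 10**6
istr = 50 * 10**6

disk_time = disk_time_seconds(bytes_read + bytes_written, istr)
print(f"estimated disk time: {disk_time:.1f} s")
```

This ignores seek time and rotational latency, so fragmented files would take longer than the estimate.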

Future work

Future activities (cont.): Find a suitable methodology for approximating the energy consumption of the network. Pick a network model to be used for the experiments. Run the experiments on TeraGrid.

Future activities (cont.): Process the results. Compose the paper.

Timeline
