A New Methodology for Studying Realistic Processors in Computer Science Degrees. Crispín Gómez, María E. Gómez and Julio Sahuquillo. DISCA.

Presentation transcript:

1 A New Methodology for Studying Realistic Processors in Computer Science Degrees
Crispín Gómez, María E. Gómez and Julio Sahuquillo
DISCA, Technical University of Valencia
DSI, University of Castilla-La Mancha

2 Outline: Motivation, Simulator, Proposed Methodology, Case Study, Conclusions

3 Motivation
Processor architecture has evolved astonishingly quickly
Teaching should cover everything from the basics to the most realistic, up-to-date concepts
[Figure labels: In-Order Execution, Superscalar, Out-Of-Order Execution, Multicore, Manycore, POWER]

4 Motivation
Current designs are highly complex:
– Complex out-of-order cores
– Multi-level memory hierarchy
– On-chip interconnection network

5 Outline: Motivation, Simulator, Proposed Methodology, Case Study, Conclusions

6 Simulator
Multi2Sim: multicore and multithreaded
x86 binary compatibility
Application-only
Free simulator: open-source project
Widely used in research
– Academia
– Industry

7 Simulator – Cores
CPU: 6-stage pipelined processors with out-of-order execution
– The execution stage may be customized to be multicycle
Speculative execution
Three multithreading paradigms are supported:
– Coarse grain, fine grain, simultaneous multithreading
All microarchitectural parameters are customizable
– Type of branch predictor
– Issue width
– Etc.
GPUs

8 Simulator – Memory Hierarchy
Complete memory hierarchy
Coherence protocol: MOESI
Flexible hierarchy: number of memory levels and memory structures in each level
Each memory structure is fully customizable
– Number of sets
– Number of ways
– Block size
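As an illustration of how the three parameters above interact, the following minimal sketch computes the capacity of a cache and splits an address into tag, set index and block offset. The `CacheGeometry` class and the example values are hypothetical and chosen only for illustration; they are not Multi2Sim's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical container for the three parameters named on the slide."""
    sets: int
    ways: int
    block_size: int  # in bytes

    def capacity(self) -> int:
        # Total data capacity in bytes: sets x ways x block size.
        return self.sets * self.ways * self.block_size

    def split(self, address: int) -> tuple[int, int, int]:
        # Return (tag, set index, block offset) for a byte address.
        offset = address % self.block_size
        index = (address // self.block_size) % self.sets
        tag = address // (self.block_size * self.sets)
        return tag, index, offset


# Example values (assumed): a 1024-set, 8-way cache with 64-byte blocks.
l2 = CacheGeometry(sets=1024, ways=8, block_size=64)
print(l2.capacity() // 1024, "KiB")
print(l2.split(0x12345))
```

Doubling the block size while keeping sets and ways fixed doubles the capacity, so a fair block-size comparison usually adjusts the number of sets as well.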

9 Simulator – Interconnection Network
Any topology can be implemented
Routing based on forwarding tables (any routing algorithm can be used)
Each network element is fully customizable
– Buffer size at switches
– Link bandwidth
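To make the idea of forwarding-table routing concrete, here is a minimal sketch that fills one table per node and then routes a message hop by hop. It assumes bidirectional links and uses shortest-path (breadth-first) routing as one possible algorithm; the function names and the ring topology are hypothetical and do not reflect the simulator's own interface.

```python
from collections import deque


def build_forwarding_tables(links):
    """Return {node: {destination: next_hop}} using shortest paths (BFS).

    `links` is an adjacency map {node: [neighbours]} with bidirectional links.
    """
    tables = {node: {} for node in links}
    for dest in links:
        # Search outward from the destination: the node we were reached
        # from is the next hop towards `dest`.
        seen = {dest}
        queue = deque([dest])
        while queue:
            current = queue.popleft()
            for neighbour in links[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    tables[neighbour][dest] = current
                    queue.append(neighbour)
    return tables


def route(tables, source, dest):
    """Follow the tables hop by hop and return the path taken."""
    path = [source]
    while path[-1] != dest:
        path.append(tables[path[-1]][dest])
    return path


# A 4-node bidirectional ring: 0 - 1 - 2 - 3 - 0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tables = build_forwarding_tables(ring)
print(route(tables, 0, 2))
```

Because the switches only consult their tables, a different routing algorithm can be tried by filling the tables differently, without changing `route`.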

10 Outline: Motivation, Simulator, Proposed Methodology, Case Study, Conclusions

11 Proposed Methodology
Aims to motivate students to engage with processor architecture
Realistic examples
Increasing difficulty levels
Shared use in several courses
Develops basic skills for final projects, MS theses or Ph.D. theses
Based on progressive interaction with Multi2Sim
4 learning phases of increasing difficulty, reflecting the simulator’s complexity

12 Proposed Methodology
1st phase: modification of simulation parameters (in lab sessions)
– Configure the system components
– Launch simulations
– Analyze the effects of the parameters on system performance

13 Proposed Methodology
2nd phase: modification of small pieces of code
– Very small, bounded fragments of source code
– Completely guided by the instructors
– Modification of a provided baseline
– Examples: branch predictor, prefetch mechanisms, …
– Final assignment of the course

14 Proposed Methodology
3rd phase: implementation of complete functionalities
– Consolidated simulator skills
– Development of functionalities from scratch
– Examples: memory controller, stream-buffer-based prefetcher, …
– Final project or MS thesis
– Some of this work has been published in top-level conferences
4th phase: complete autonomy
– The students are in a privileged position to start a Ph.D.

15 Outline: Motivation, Simulator, Proposed Methodology, Case Study, Conclusions

16 Case study
The methodology has been implemented at the UPV in two courses:
– Advanced Processor Architectures (Computer Science Degree and Master's Degree)
– Networks on-chip (Master's Degree)
We have defined several learning stages with the simulator:
– Baseline system modeling
– Execution of standard benchmark suites
– Implementation of prefetching mechanisms

17 Case study Baseline system modeling

18 Case study: Baseline system modeling
Detailed explanation of the configuration for
– Memory
– Cores
– Interconnection network
Sample configuration files are used

19 Case study: Benchmark execution
Parallel workloads (SPLASH-2)
Multiprogrammed mixes (SPEC)
Performance study (IPC, execution time, network latency) varying the L2 block size
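The kind of study described above, sweeping one parameter and observing a metric, can be sketched on a toy model. The code below is not Multi2Sim and does not reproduce the course results: it is a hypothetical set-associative LRU cache model, driven by a synthetic sequential trace, that reports the miss rate for several block sizes. In the course the metrics (IPC, execution time, network latency) come from the simulator itself.

```python
from collections import OrderedDict


def miss_rate(trace, sets, ways, block_size):
    """Miss rate of a set-associative LRU cache on a list of byte addresses."""
    cache = [OrderedDict() for _ in range(sets)]  # one LRU-ordered dict per set
    misses = 0
    for address in trace:
        block = address // block_size
        lines = cache[block % sets]
        tag = block // sets
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses / len(trace)


# Synthetic trace (assumed): 4096 sequential 8-byte accesses.
trace = [i * 8 for i in range(4096)]
for block_size in (32, 64, 128):
    rate = miss_rate(trace, sets=64, ways=4, block_size=block_size)
    print(block_size, rate)
```

On a purely sequential trace larger blocks always help, since every miss brings in more useful data; real benchmarks show a trade-off, because larger blocks also waste bandwidth and capacity on data that is never used.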

20 Case study: Implementation of prefetching mechanisms
A simple baseline prefetching mechanism is provided
– OBL (One Block Look-ahead) on an L2 miss
Modifications to this mechanism
– N-block sequential
– N-block with regular stride
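The three mechanisms differ only in which block addresses they request after a miss, which a minimal sketch can show. The function names are hypothetical, and the stride is assumed to be the difference between the last two miss addresses; a real implementation inside the simulator would typically track and confirm strides more carefully.

```python
def obl(miss_block):
    """One Block Look-ahead: on a miss to block b, prefetch block b + 1."""
    return [miss_block + 1]


def n_block_sequential(miss_block, n):
    """Prefetch the next n consecutive blocks after the missing one."""
    return [miss_block + i for i in range(1, n + 1)]


def n_block_stride(miss_block, previous_miss_block, n):
    """Prefetch n blocks spaced by the stride between the last two misses."""
    stride = miss_block - previous_miss_block
    if stride == 0:
        return []  # no stride detected, nothing to prefetch
    return [miss_block + i * stride for i in range(1, n + 1)]


print(obl(10))
print(n_block_sequential(10, 3))
print(n_block_stride(20, 16, 3))
```

OBL is the special case of the sequential scheme with n = 1, and the sequential scheme is the stride scheme with a stride of one block, which is why the baseline is a convenient starting point for the two modifications.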

21 Case study: Results
This year, 2 final projects have been carried out, on memory controllers and on prefetching
Results from these projects are expected to be submitted to top-level international conferences
These projects are expected to evolve into MS theses
This expectation is based on the previous year's experience, in which results from the projects were accepted at the PACT and IPDPS conferences

22 Outline: Motivation, Simulator, Proposed Methodology, Case Study, Conclusions

23 Conclusions
We have reduced the gap between the theoretical content of Computer Architecture courses and real processors
– By using a CMP simulator that is well established in the international research community
The methodology is based on an increasing degree of difficulty
– The first steps are closely guided by the instructors
– Students are encouraged to move on to more complex implementations
Methodology + simulator = a good platform for future work, as the range of design choices is very wide

24 Thank you for your attention