Provided by: Ali Teymouri. Based on the article “Jaguar: A Next-Generation Low-Power x86-64 Core”. Course: Custom Implementation of DSP Systems.

Presentation transcript:

1 Provided by: Ali Teymouri. Based on the article “Jaguar: A Next-Generation Low-Power x86-64 Core”. Course: Custom Implementation of DSP Systems. University of Tehran, School of Electrical and Computer Engineering.

Outline 2 Introduction Motivation Comparing the two cores Architecture Improvements Conclusion

3 List of AMD microprocessors [5] 1 AMD-originated architectures 1.1 Am2900 series (1975) (29K) (1987–95) 2 non x86 architecture processors 2.1 2nd source (1974) 2.2 2nd source (1982) 3 x86 architecture processors 3.1 2nd source (1979–91) 3.2 Am X86 series (1991–95) 3.3 K5 architecture (1995) 3.4 K6 architecture (1997–2001) 3.5 K7 architecture (1999–2005) 3.6 K8 core architecture 3.7 K10 core architecture 3.8 Bulldozer module architecture 3.9 Bobcat core architecture

Bobcat 4 Core power gating and a microarchitecture optimized for low power. Designed for mobile devices and tablets to address specific customer demands. 4.5 to 18 W power range. Figure: Bobcat low-power core [2].

Jaguar core 5 Figures: Bobcat low-power core [4]; Jaguar core [4].

Jaguar CU 6 First AMD 28nm quad-core x86-64 compute unit (CU). A building-block unit to deploy into a wide variety of SoCs for different applications. Spans a wide array of applications from sub-5W to 25W. Figure: Jaguar CU [4].

Motivation Jaguar 7 Build an SoC to fit a range of markets [1]: tablets and hybrids, value notebooks, ultrathin notebooks, value desktops.

Comparing the two cores 8 Figure: comparison of the Bobcat and Jaguar cores [1].

Architecture 9 Improved IPC, frequency, and power relative to Bobcat (BT). Estimated typical IPC improvement over “Bobcat”: >15%*. Redesigned load-store unit. 4x32B instruction cache loop buffer to save power. Improved instruction cache prefetcher for IPC. Added L2 prefetcher. Added hardware integer divider. Improved C6 and CC6 entry/exit latencies. Clock gating of >92% of flops in typical applications.
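To make the IPC figure concrete: single-thread performance is commonly modeled as IPC times clock frequency, so an IPC gain and a frequency gain multiply. The sketch below is only an illustration under that first-order model. The function name and the 10% frequency gain are hypothetical and are not taken from the article; only the 15% IPC figure comes from the slide.

```python
def relative_performance(ipc_gain: float, freq_gain: float = 0.0) -> float:
    """Speedup over a baseline core, modeling performance as IPC x frequency.

    ipc_gain and freq_gain are fractional improvements (0.15 means +15%).
    """
    return (1.0 + ipc_gain) * (1.0 + freq_gain)


if __name__ == "__main__":
    # 15% IPC gain from the slide, same clock as the baseline.
    print(f"IPC gain only:        {relative_performance(0.15):.3f}x")
    # Hypothetical: the same IPC gain combined with an assumed 10% higher clock.
    print(f"IPC gain + 10% clock: {relative_performance(0.15, 0.10):.3f}x")
```

The model ignores memory-bound behavior, where a higher clock does not translate fully into speedup.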

Architecture 10 The Jaguar (JG) core is optimized at two main frequency targets, one at low voltage and one at high voltage, giving the core a dynamic range suited to several markets. Three-Vt solution: HVT/RVT/LVT. Longer lengths for each Vt. Bobcat (BT) had a 10-layer metal stack; JG uses an 11-layer metal stack [1].

High Speed Flop 11 Custom-built flip-flops [4] maximize performance over traditional master-slave flops. Larger flops consume more dynamic power, so to minimize the power and area impact they are inserted only in critical paths [1]. Custom flops account for < 8%.

CU Level Clock Distribution 12 Clock delay is matched to all endpoints to minimize latency. Extensive clock gating: each unit’s clock is independently gated to reduce dynamic power [1].
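Why clock gating reduces dynamic power can be sketched with the standard switching-power relation P = C · Vdd² · f applied to the clock pins of the flops: a gated flop sees no clock edges, so its clock-pin capacitance stops being charged and discharged. The code below is a rough sketch under that assumption. The flop count, capacitance, voltage, and frequency are made-up illustrative values, not figures from the article; only the 92% gated fraction comes from the Architecture slide.

```python
def clock_pin_power(n_flops: int, cap_per_flop_f: float, vdd_v: float,
                    freq_hz: float, gated_fraction: float) -> float:
    """Dynamic power (W) spent toggling flop clock pins.

    The clock makes one full charge/discharge cycle per period, so each
    ungated flop costs C * Vdd^2 * f. Gated flops are assumed to cost nothing.
    """
    if not 0.0 <= gated_fraction <= 1.0:
        raise ValueError("gated_fraction must be between 0 and 1")
    active_flops = n_flops * (1.0 - gated_fraction)
    return active_flops * cap_per_flop_f * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical design point: 1M flops, 1 fF per clock pin, 1.0 V, 1.5 GHz.
    ungated = clock_pin_power(1_000_000, 1e-15, 1.0, 1.5e9, 0.0)
    gated = clock_pin_power(1_000_000, 1e-15, 1.0, 1.5e9, 0.92)
    print(f"ungated: {ungated:.3f} W, with 92% gated: {gated:.3f} W")
```

The sketch leaves out the clock tree wiring and the gating cells themselves, which still consume some power.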

Power Gating 13 Integrated power gating. Headers have 4 independent enables. Longer lengths for each Vt. Diagram: highlighted headers within the JG core. Area overhead is ~3% [1].
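The benefit of power gating can be illustrated with a simple time-weighted leakage estimate: while the headers are off, a core leaks only a small residual fraction of its normal leakage. This is a minimal sketch of that arithmetic. The function name and every number in it (leakage power, idle residency, residual leakage) are hypothetical assumptions, not values from the article.

```python
def average_leakage(leak_on_w: float, idle_fraction: float,
                    residual_fraction: float) -> float:
    """Time-averaged leakage power (W) of a core that is power gated when idle.

    leak_on_w:         leakage while the core is powered
    idle_fraction:     share of time the core spends power gated (0..1)
    residual_fraction: leakage remaining through the off headers (0..1)
    """
    for name, value in (("idle_fraction", idle_fraction),
                        ("residual_fraction", residual_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    powered = leak_on_w * (1.0 - idle_fraction)
    gated = leak_on_w * idle_fraction * residual_fraction
    return powered + gated


if __name__ == "__main__":
    # Hypothetical: 0.5 W leakage, gated 70% of the time, 5% residual leakage.
    print(f"{average_leakage(0.5, 0.70, 0.05):.4f} W average leakage")
```

A fuller model would also charge the energy spent entering and leaving the gated state, which is why the entry/exit latencies matter.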

Conclusion 14 “Jaguar” is the first AMD 28nm bulk CPU. Quad core with shared L2. Supports a wide range of applications. Low power, with a focus on high density and smaller chip area. Improved IPC, frequency, and power relative to Bobcat (BT). A worthy successor to the “Bobcat” x86-64 core.

References 15
[1] T. Singh, J. Bell, S. Southard, “Jaguar: A Next-Generation Low-Power x86-64 Core,” in 2013 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 17–21, 2013, section 3.
[2] D. Foley, P. Bansal, D. Cherepacha, R. Wasmuth, A. Gunasekar, S. Gutta, A. Naini, “A Low-Power Integrated x86-64 and Graphics Processor for Mobile Computing Devices,” IEEE Journal of Solid-State Circuits, vol. 47, no. 1, January.
[3]
[4] www.semiaccurate.com
[5]