Transmeta’s Crusoe Architecture Umran A. Khan Microprocessors.

Presentation transcript:

Transmeta’s Crusoe Architecture Umran A. Khan Microprocessors

Generations of Crusoe Processors
- Original architecture: TM3120, TM5400
- Later versions: TM5600 to TM5800
  - The architecture is more or less the same, but improved
  - Faster clock rate (up to 800 MHz now)
  - Smaller core (0.13-micron process)
  - Has special instructions for the OS it is emulating
  - Lower power consumption
  - Wider range of applications (from internet appliances to high-density servers)
- We will look at the TM5400 here

Instruction Set
- Uses a VLIW (Very Long Instruction Word) instruction format and engine
  - An instruction word is a packet up to 128 bits long
- Each word (also called a molecule) holds up to four individual operations called atoms
  - Atoms are packed into molecules of either 64 or 128 bits
  - The atoms in a molecule execute in parallel (up to 4 operations per clock)
  - These operations must be independent of one another
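The independence rule for atoms can be illustrated with a minimal sketch. The `Atom` class, the `independent` check and the greedy `pack` function are hypothetical names invented for this illustration, not Transmeta's scheduler. The sketch is deliberately conservative (any shared register between two atoms counts as a conflict) and ignores which execution unit each atom needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One operation inside a molecule (hypothetical model)."""
    op: str            # mnemonic, e.g. "add"
    dest: str          # register written
    srcs: tuple = ()   # registers read


def independent(a: Atom, b: Atom) -> bool:
    """True if neither atom writes a register the other reads or writes."""
    return (a.dest != b.dest
            and a.dest not in b.srcs
            and b.dest not in a.srcs)


def pack(atoms, width=4):
    """Greedily pack atoms, in program order, into molecules of up to `width` atoms."""
    molecules, current = [], []
    for atom in atoms:
        if len(current) < width and all(independent(atom, other) for other in current):
            current.append(atom)
        else:
            # Conflict or full molecule: close it and start a new one.
            molecules.append(current)
            current = [atom]
    if current:
        molecules.append(current)
    return molecules


program = [
    Atom("add", "r1", ("r2", "r3")),
    Atom("sub", "r4", ("r1",)),      # reads r1, so it cannot share a molecule with the add
    Atom("ld", "r5", ("r6",)),       # independent of the sub
]
molecules = pack(program)
print([[a.op for a in m] for m in molecules])  # [['add'], ['sub', 'ld']]
```

A real scheduler would also reorder atoms and respect the available execution units; this sketch only shows why dependent operations end up in different molecules.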

Four Kinds of Execution Units
- FPU (Floating Point Unit)
  - Has a 10-stage floating-point pipeline
  - Uses the conventional x86 80-bit register format
  - 32 FP registers
- 2 integer ALUs (Arithmetic-Logic Units)
  - Have a 7-stage integer pipeline
  - Have […]-bit registers dedicated to them
- LSU (Load/Store Unit)
- Branch Unit

Sample Instruction
128-bit instruction:

  Atom | FADD | ADD            | LD               | BRCC
  Unit | FPU  | Integer ALU #0 | LSU (Load/Store) | BU (Branch)

Figure copied from reference #1

Introduction to Code Morphing
- Code Morphing software is a translation layer that dynamically recompiles an x86 program into the native VLIW instruction format
  - Stored in the BIOS ROM and runs from main memory
  - An entire group of instructions is translated at once and then put into the translation cache
  - Basically, an emulation mechanism
- The approach is not tied to x86: DEC's FX!32 for the Alpha is a comparable translator, and the TM3120 is aimed at Linux devices, but the TM5400 is known for its x86 compatibility
  - Great potential!

Crusoe Translation Layers (outermost to innermost)
- x86 applications
- Operating system
- x86 BIOS
- Code Morphing layer
- CPU core

Traditional x86 Architecture
- IA-32 instructions are translated by the CPU into simpler, more uniform RISC-like operations
  - Each instruction is translated individually
  - The translation is complicated
- It has dedicated hardware for
  - x86 instruction translation
  - Branch prediction
  - Register renaming
  - Instruction reordering

Transmeta’s Simplified Core
- A lot of the processor functionality is implemented in software
  - The hardware is made up of the execution units, the instruction decode unit and, of course, the cache
  - The work of the rest of the dedicated hardware (in the previous slide) is done in software
- Advantages
  - The CPU takes less die space
  - Less power demanding
  - Less expensive to produce and upgrade

Hardware vs. Software
- Implementing the hardware in software comes at a cost
  - Software is slower than hardware
- But how much slower?
  - It is not so easy to say
  - The software is reordering instructions, renaming registers, predicting branches on the fly, etc.
  - Using the same hardware that is used for addition, instruction execution, etc. adds complications
- Do the benefits outweigh the costs?
  - According to Transmeta, they do!

Execution, Decoding and Scheduling
- In x86,
  - Instructions are translated individually
  - An instruction's binary is fetched and decoded into n operations
  - These operations are reordered and fed to the execution units (i.e. FPU, ALU, etc.) in parallel
  - The original sequence is then reconstructed: the results of out-of-order execution have to be put back in program order, and the code is retranslated every time it runs (complicated and costly)

Execution, Decoding and Scheduling (Continued)
- In Crusoe,
  - A group of instructions is translated at once
  - Instructions are translated once and placed into the translation cache
  - If the same code is run again, the processor can grab it from the translation cache
  - Instructions can be reordered by the scheduler by looking at the generated code
  - Thus, the number of instructions executed can be minimized
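The translate-once behaviour described on this slide amounts to memoizing translations by block address. Below is a minimal sketch of that idea; `TranslationCache`, `fetch` and `toy_translate` are hypothetical names chosen for illustration, and the real Code Morphing software manages its cache in ways the slides do not describe.

```python
class TranslationCache:
    """Toy translation cache keyed by the address of an x86 code block."""

    def __init__(self, translate):
        self._translate = translate   # hypothetical translator: address -> native code
        self._cache = {}
        self.misses = 0               # how many times we actually had to translate

    def fetch(self, address):
        """Return the native translation for `address`, translating only on a miss."""
        if address not in self._cache:
            self.misses += 1
            self._cache[address] = self._translate(address)
        return self._cache[address]


def toy_translate(address):
    # Stand-in for the expensive x86 -> VLIW translation step.
    return f"native code for x86 block at {address:#x}"


cache = TranslationCache(toy_translate)
for addr in (0x1000, 0x2000, 0x1000, 0x1000):
    cache.fetch(addr)

print(cache.misses)  # 2: each distinct block was translated exactly once
```

The repeated visits to block `0x1000` cost only a dictionary lookup, which is the effect the slide attributes to the translation cache.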

Caching and Optimization
- The translation cache is used more efficiently
  - A translation can be optimized further each time it is executed
  - However, it will probably require more than one pass to be truly optimized
- Optimization is done in steps
- Sections of code usually don't get optimized if they occur only once
- Code is recompiled quickly to keep the processor and the program running
- Uses common optimizations done by an ordinary compiler
  - The optimizer is basically a simple compiler

Optimization Strategies
- The Code Morphing software has many ways to gather feedback about a running program
  - “Instrumented translation”
    - Special code is used to collect information about the block that is going to be executed
    - This info is later used for optimization and translation
  - Branch prediction, path speculation and the reordering of loads and stores are done by the Code Morphing layer with some (alias) hardware support and some condition code
- Filtering
  - Determines how much effort must be spent on translating and optimizing a piece of code
  - Execution modes: interpretation, translation with or without optimization
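One way to picture the filtering step is as a per-block execution counter that picks an execution mode. The sketch below assumes that policy; the `Filter` class and both thresholds are hypothetical values chosen for illustration, since the slides do not say how the real filter decides.

```python
from collections import Counter

INTERPRET = "interpret"
TRANSLATE = "translate"
OPTIMIZE = "translate+optimize"


class Filter:
    """Chooses an execution mode from how often a block has been executed."""

    def __init__(self, translate_after=3, optimize_after=10):
        # Hypothetical thresholds, not taken from Transmeta documentation.
        self.translate_after = translate_after
        self.optimize_after = optimize_after
        self.counts = Counter()

    def mode_for(self, block_address):
        """Record one more execution of the block and return the mode to use."""
        self.counts[block_address] += 1
        n = self.counts[block_address]
        if n > self.optimize_after:
            return OPTIMIZE     # hot code: worth the extra optimization effort
        if n > self.translate_after:
            return TRANSLATE    # warm code: translate without heavy optimization
        return INTERPRET        # cold code: cheapest to interpret


f = Filter(translate_after=2, optimize_after=4)
modes = [f.mode_for(0x400) for _ in range(6)]
print(modes)
```

With these thresholds the block is interpreted twice, run as a plain translation twice, and optimized from the fifth execution on, so rarely executed code never pays for optimization.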

Translation Example

x86 input
    addl %eax, (%esp)
    addl %ebx, (%esp)
    movl %esi, (%ebp)
    subl %ecx, 5

FRONTEND
    ld %r30, [%esp]
    add.c %eax, %eax, %r30
    ld %r31, [%esp]
    add.c %ebx, %ebx, %r31
    ld %esi, [%ebp]
    sub.c %ecx, %ecx, 5

OPTIMIZER
    ld %r30, [%esp]
    add %eax, %eax, %r30
    add %ebx, %ebx, %r30
    ld %esi, [%ebp]
    sub.c %ecx, %ecx, 5

SCHEDULER (one molecule per line)
    ld %r30, [%esp]; sub.c %ecx, %ecx, 5
    ld %esi, [%ebp]; add %eax, %eax, %r30; add %ebx, %ebx, %r30

KEY
    ld    - load
    movl  - load
    addl  - load and add
    add.c - add with condition codes set
    subl  - subtract
    sub.c - sub with condition codes set

Example from reference #2
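The step from the FRONTEND listing to the OPTIMIZER listing can be reproduced with two small passes. This is only a sketch under simplifying assumptions: the function names and the tuple encoding are invented here, temporaries (`%r..`) are assumed to be written only once, the block contains no stores, and the condition codes are assumed to be needed after the block ends.

```python
def eliminate_redundant_loads(ops):
    """Drop a load into a temporary when the same address was already loaded."""
    loaded = {}   # address register -> temporary that already holds [address]
    rename = {}   # dropped temporary -> temporary to use instead
    out = []
    for op, dest, *srcs in ops:
        srcs = [rename.get(s, s) for s in srcs]
        if op == "ld" and dest.startswith("%r") and srcs[0] in loaded:
            rename[dest] = loaded[srcs[0]]   # reuse the earlier load
            continue
        # Writing a register invalidates loads addressed through it or held in it.
        loaded = {a: t for a, t in loaded.items() if dest not in (a, t)}
        if op == "ld" and dest.startswith("%r"):
            loaded[srcs[0]] = dest
        out.append((op, dest, *srcs))
    return out


def drop_dead_condition_codes(ops):
    """Replace 'x.c' by 'x' when the flags are overwritten before anyone reads them."""
    out = []
    flags_needed = True                      # assumption: flags are live at block exit
    for op, *rest in reversed(ops):
        if op.endswith(".c"):
            if not flags_needed:
                op = op[:-2]                 # nobody reads these flags
            flags_needed = False             # earlier flag writes are now dead
        elif op.startswith("br"):
            flags_needed = True              # a conditional branch reads the flags
        out.append((op, *rest))
    return out[::-1]


frontend = [
    ("ld", "%r30", "%esp"),
    ("add.c", "%eax", "%eax", "%r30"),
    ("ld", "%r31", "%esp"),
    ("add.c", "%ebx", "%ebx", "%r31"),
    ("ld", "%esi", "%ebp"),
    ("sub.c", "%ecx", "%ecx", 5),
]
optimized = drop_dead_condition_codes(eliminate_redundant_loads(frontend))
for op in optimized:
    print(op)
```

Run on the frontend output, the two passes remove the second load from `[%esp]` and keep condition codes only on the final `sub.c`, which matches the OPTIMIZER listing in the example.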

Power Management
- Typical power-saving approaches
  - Switching off the processor
    - Having duty cycles
    - Causes glitches
  - Changing the clock rate by suspending to and restarting from RAM
- Crusoe power-saving approaches
  - LongRun power management (next slide)
  - Integrates the north bridge of the chipset and the RAM controllers onto the CPU
    - Can also integrate video and sound cards
    - Saves power in the overall system

LongRun Power Management
- A feature of the Code Morphing software layer, driven by detecting the CPU load
  - Can adjust the clock frequency on the fly
  - Can dynamically change the CPU voltage
- It can reduce power consumption by about 30% by lowering both the CPU clock rate and the voltage by 10%
  - Power scales with frequency × voltage²: 100% × (1 − (0.9 × 0.9²)) = 27.1%, i.e. roughly 30%
  - Fewer heat problems
  - No need for extra fans that take up more power and space
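The arithmetic behind the "about 30%" figure can be checked with a few lines, assuming the usual approximation that dynamic power is proportional to frequency × voltage². The function name `relative_power` is just an illustrative choice.

```python
def relative_power(freq_scale, volt_scale):
    """Dynamic power relative to the baseline, assuming P is proportional to f * V**2."""
    return freq_scale * volt_scale ** 2


# Lower both the clock rate and the voltage by 10%.
saving = 1 - relative_power(0.9, 0.9)
print(f"{saving:.1%}")  # 27.1%

# Lowering only the clock rate by 10% saves just 10%.
freq_only_saving = 1 - relative_power(0.9, 1.0)
print(f"{freq_only_saving:.1%}")  # 10.0%
```

The comparison shows why scaling the voltage together with the frequency matters: most of the saving comes from the squared voltage term.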

Conclusion
- Advantages
  - Low-power technology
    - Low cost
    - Longer battery life
    - Great for the mobile user, embedded systems and even high-density servers
  - Smaller and lighter computers
  - Code Morphing technology
    - Can, in principle, emulate any target architecture
  - Compatibility
    - Uses special optimization techniques for target operating systems
    - Easier software debugging (see reference #1)
    - Cheaper and simpler upgrades

Conclusion (Continued)
- Disadvantages
  - An emulation cannot be faster than the real thing
    - Code translation requires extra cycles
    - Code Morphing software runs in main memory and takes up memory bandwidth
    - Heavy coding
  - Inherits some of the same problems as other VLIW processors
    - Needs clever compilers to extract parallelism
    - Too much fixup code (for speculation, predictions, rollbacks, etc.)
  - The technology seems to be geared toward mobile users
    - For desktops (power users) and servers, performance outweighs power consumption
    - Higher performance comes at the cost of higher power consumption

Final Thoughts
- Transmeta reported net revenue of only $4.1 million for the first quarter of 2002
  - No significant share of the mobile market
- Even though Transmeta has a clever technology, the clock speeds of AMD and Intel have overshadowed its impact, just as happened to Multiflow (their clock speeds are about 1.0 GHz faster than the Crusoe's)
- AMD and Intel have also developed their own power-efficient mobile processors (mobile Athlon XP with AMD PowerNow!™ technology and mobile Pentium 4 with Intel® SpeedStep® technology)

Stay Tuned for the next Exciting Episode VS. AMD, I am your father! Not any more!!!

References
1. …als/article/1237.4/
2. …er_aklaiber_19jan00.pdf
3. …soe-1.html
4. …/transmeta/transmeta.pdf