

INTRODUCTION
- Crusoe is a microprocessor with 128-bit VLIW instruction words, built for mobile computing devices where low power consumption is required.
- A VLIW-based processor core and x86 Code Morphing software together provide an x86-compatible mobile platform solution.
- Processor core operates at MHz.

Crusoe Processor Family
- TM5400: MHz, 256 KB L2 cache
- TM5500: MHz, 256 KB L2 cache
- TM5600: MHz, 512 KB L2 cache
- TM5800: MHz, 512 KB L2 cache

Multiple Issue Microprocessors
- Several functional units (integer ALUs, floating point unit, load/store, ...)
- Multiple instructions issued per cycle
- Requires higher memory bandwidth and more registers
- Two main flavors: superscalar and VLIW

Intel's Superscalar Approach
- Superscalar: issue a variable number of instructions per cycle.
- Pentium Pro, Pentium II, and Pentium III are all superscalar, with a single pipeline.
- The processor core executes RISC-like micro-operations behind an x86 front end.

VLIW Approach
- VLIW stands for Very Long Instruction Word.
- Multiple functional units (FUs), each explicitly programmed by every instruction.
- In Crusoe, a Very Long Instruction Word is called a molecule.
- Each molecule contains up to 4 atoms: at most one instruction for each FU.
- A molecule is either 128 bits or 64 bits wide.
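To make the molecule and atom terminology concrete, here is a minimal Python sketch of packing atoms into molecules. The `Atom` class, the `pack_molecules` and `width_bits` helpers, the short unit names, and the greedy in-order packing rule are all illustrative assumptions, not Transmeta's actual encoding or scheduler. The sketch also ignores data dependencies between atoms.

```python
from dataclasses import dataclass

# Functional units named on the slides; a molecule has one slot per unit.
UNITS = ("FPU", "ALU", "LSU", "BRANCH")


@dataclass(frozen=True)
class Atom:
    """One operation destined for a single functional unit (hypothetical model)."""
    op: str
    unit: str


def pack_molecules(atoms):
    """Greedily pack atoms, in program order, into molecules.

    A molecule is modelled as a dict mapping unit -> Atom. A new molecule is
    started whenever the next atom needs a unit that is already occupied.
    Data dependencies are ignored: this is a toy, not a real scheduler.
    """
    molecules = []
    current = {}
    for atom in atoms:
        if atom.unit not in UNITS:
            raise ValueError(f"unknown functional unit: {atom.unit}")
        if atom.unit in current:
            molecules.append(current)
            current = {}
        current[atom.unit] = atom
    if current:
        molecules.append(current)
    return molecules


def width_bits(molecule):
    """Assumed rule: up to two atoms fit in 64 bits, otherwise 128 bits."""
    return 64 if len(molecule) <= 2 else 128
```

Under these assumptions, the sequence FADD, ADD, LD, BRCC, ADD, LD packs into two molecules: a full four-atom molecule of 128 bits, followed by a two-atom molecule of 64 bits.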

Transmeta's Crusoe Core
[Figure: a 128-bit molecule holding four atoms (FADD, ADD, LD, BRCC), issued to the Floating Point Unit, Integer ALU #0, Load/Store Unit, and Branch Unit respectively.]

Code Morphing: Crusoe's key
- x86 instructions are converted to the Crusoe instruction set through a software layer.
- During instruction translation, optimizations and scheduling tricks can be performed.
- The Crusoe processor architecture is decoupled from application software.

Code Morphing basics
- Code Morphing software resides in ROM.
- Translations are performed dynamically and are cached.
- Successively more aggressive optimizations are performed each time a block is executed.
[Figure: layered diagram showing the VLIW processor core, Code Morphing software, x86 OS/BIOS, and x86 applications.]

Code Translation
- The superscalar approach translates one instruction at a time.
- Code Morphing examines whole blocks of code at a time, creating one translation per block.
- Translations are saved in a translation cache.
- Successive executions of a translation invoke only the optimizer, not the translator.
- The cost of translation is amortized over successive executions.
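The caching and amortization described on this slide can be sketched as follows. This is a toy model under stated assumptions: the `TranslationCache` class, its `run` method, and the `amortized_cost` helper are hypothetical names, and the cycle counts passed to `amortized_cost` are made-up inputs, not Crusoe measurements.

```python
class TranslationCache:
    """Toy model of translate-once, reuse-many-times (all names hypothetical)."""

    def __init__(self, translate):
        self._translate = translate   # function: x86 block -> native code
        self._cache = {}              # block address -> translated code
        self.translations = 0         # how many times the translator ran
        self.executions = {}          # block address -> execution count

    def run(self, address, x86_block):
        """Return the translation for a block, translating only on a miss."""
        if address not in self._cache:
            self._cache[address] = self._translate(x86_block)
            self.translations += 1
        self.executions[address] = self.executions.get(address, 0) + 1
        return self._cache[address]


def amortized_cost(translate_cycles, native_cycles, runs):
    """Average cycles per run when translation is paid only once."""
    return (translate_cycles + native_cycles * runs) / runs
```

With an assumed translation cost of 1000 cycles and 10 cycles per native run, `amortized_cost(1000, 10, 1)` is 1010.0 cycles, while `amortized_cost(1000, 10, 100)` drops to 20.0 cycles per run. The per-block execution counts kept in `executions` are the kind of information an optimizer could use to decide which blocks deserve more aggressive work.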

Hardware Support for Code Morphing
- Explicit setting of condition codes.
- All registers holding x86 state are shadowed.
- A commit operation copies the active state to the shadow registers.
- A "translated bit" in the page table detects self-modifying code.
- Alias hardware allows load instructions to be reordered ahead of store instructions.

Exception Handling
- x86 exceptions are precise, which is problematic when instructions execute out of order.
- On an exception, processor state is rolled back to the most recent commit.
- Execution then proceeds in in-order mode until the fault location is found.
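The commit and rollback behaviour can be illustrated with a small software model. This is only a sketch: the `ShadowedRegisters` class and `run_block` function are hypothetical, plain Python dictionaries stand in for hardware registers, and `ArithmeticError` stands in for an x86 exception.

```python
class ShadowedRegisters:
    """Toy model of working registers backed by shadow copies (hypothetical)."""

    def __init__(self, names):
        self.working = {name: 0 for name in names}
        self.shadow = dict(self.working)   # last committed x86 state

    def commit(self):
        """Copy the working state into the shadow registers."""
        self.shadow = dict(self.working)

    def rollback(self):
        """Discard uncommitted work by restoring the last committed state."""
        self.working = dict(self.shadow)


def run_block(regs, operations):
    """Apply operations speculatively; commit on success, roll back on a fault.

    Each operation is a callable that mutates `regs.working` and may raise
    ArithmeticError to stand in for an x86 exception. Returns True on commit.
    """
    try:
        for op in operations:
            op(regs.working)
    except ArithmeticError:
        regs.rollback()
        return False
    regs.commit()
    return True
```

When `run_block` returns False, the working registers again hold the last committed state. In the scheme the slide describes, the block would then be re-executed in order to locate the faulting instruction; that step is left out of the sketch.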

LongRun: Dynamic Power Management
- Typical approach 1: Switch off the processor quickly to save power (can cause glitches).
- Typical approach 2: Change the clock rate by suspending the processor and restarting it.
- Crusoe 1: Adjust the clock rate dynamically, without suspension.
- Crusoe 2: Adjust the voltage level.
- Result: Cubic power reduction (power scales with clock rate times voltage squared), up to 30%.
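The "cubic" claim follows from the standard CMOS dynamic-power approximation, P = C * V^2 * f. The sketch below works through that arithmetic. The function names and the sample scale factors are assumptions chosen for illustration, and the proportional voltage scaling is an idealization, not a statement about LongRun's actual operating points.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Classic CMOS dynamic-power approximation: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def relative_power(freq_scale, voltage_scale=None):
    """Power relative to full speed after scaling frequency and voltage.

    If voltage_scale is omitted, voltage is assumed to scale in proportion
    to frequency, which gives the cubic relationship.
    """
    if voltage_scale is None:
        voltage_scale = freq_scale
    return voltage_scale ** 2 * freq_scale
```

Halving the clock at a fixed voltage gives `relative_power(0.5, 1.0)`, which is 0.5, so power only halves. Halving the clock and the voltage together gives `relative_power(0.5)`, which is 0.125, an eightfold reduction. This is why adjusting voltage along with clock rate saves far more than changing the clock alone.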

Performance of Crusoe Processor
- The heatsink on the TM5400 Crusoe processor is quite small.
- Execution time
  - Comparable to direct hardware implementations by Intel or AMD.
  - A TM5400 at 667 MHz performs about the same as a Pentium III running at 500 MHz.
- Low cost
  - Much simpler hardware: the Crusoe TM5400 has about 7 million transistors (the P4 has 41 million).
  - Easier to design, more scalable, easier to reach high clock rates, more room for caches, better yield, etc.
- Low power

Crusoe vs. PIII, heat generation
- PIII: 105.5 °C
- Crusoe: 48.2 °C
- Both processors were playing a DVD.

Drawbacks
- Code optimization doesn't start until a block of code has been executed more than a few times.
- Code translation requires clock cycles that could otherwise be used for application computation.

Where Transmeta could go next
- The current emphasis is on mobile computing.
- Different applications of Code Morphing could allow a different emphasis or target.
- Optimization techniques could be tailored to different target architectures.
- Workstation/server chips were hinted at in the documentation.

Conclusions
- Transmeta has built an x86-compatible Crusoe processor based on VLIW technology.
- Code Morphing offers a new approach to implementing an instruction set architecture.
- Crusoe offers performance comparable to a high-performance Intel processor while consuming a fraction of the power.