
The TM3270 Media-Processor

Introduction
- Design objective: exploit the high level of parallelism available.
- GPPs with multimedia extensions (e.g., Intel's MMX and AltiVec in PowerPC)
  - Highly programmable
  - Most effective when operating on data stored consecutively
  - Higher power consumption; may not be suitable for energy-sensitive applications
  - Smaller register size and distinct register files for SIMD operations
- Dedicated hardware
  - Limited format support

Design Features: TM3270 media processor
- Multi-purpose programmable solution
- Backward source-code compatible
- Unified register file of 128 x 32-bit registers
- 32-bit address range and datapath
- VLIW architecture with 5 issue slots
- 64 Kbyte instruction cache, 8-way set associative
- 128 Kbyte data cache, 4-way set associative
- Variable-length instruction encoding
- Operations are guarded
- Non-aligned memory access
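To make "operations are guarded" concrete, here is a minimal Python sketch of guarded execution. It assumes that a guarded operation commits its result only when the least-significant bit of a guard register is 1. The register numbers and the helper names `guarded_move` and `branch_free_max` are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # 32-bit datapath


def guarded_move(regs, guard, dst, src):
    """Copy regs[src] to regs[dst] only if the guard register's LSB is 1.

    Assumption: a false guard turns the operation into a no-op.
    """
    if regs[guard] & 1:
        regs[dst] = regs[src] & MASK32


def branch_free_max(a, b):
    """Unsigned max(a, b) using two guarded moves instead of a jump."""
    regs = [0] * 128                      # unified 128-entry register file
    regs[2], regs[3] = a & MASK32, b & MASK32
    regs[4] = 1 if regs[2] > regs[3] else 0   # guard: a > b
    regs[5] = 1 - regs[4]                     # complementary guard
    guarded_move(regs, 4, 6, 2)           # r6 = a if a > b
    guarded_move(regs, 5, 6, 3)           # r6 = b otherwise
    return regs[6]
```

Both moves can sit in the same VLIW instruction because exactly one of them takes effect, which is how guarding replaces short branches.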

ISA Enhancements
- Two-slot operations
- Collapsed load operations
- CABAC operations

Two-slot operations
- Executed in functional units in neighbouring issue slots
- SUPER_DUALIMIX
  - Pairwise 2-tap filter on 16-bit values; the results are stored in 2 destination registers
- SUPER_LD32R
  - Retrieves 2 consecutive 32-bit values from memory and stores them in 2 destination registers
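The two operations above can be modelled roughly in Python. This is a sketch under stated assumptions, not the exact instruction semantics: the big-endian byte order, the packing of two signed 16-bit values per 32-bit register, and the operand order of `dual_two_tap` are all assumptions, and both function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def super_ld32r(memory, address):
    """Model of SUPER_LD32R: two consecutive 32-bit values for two destinations.

    Assumption: big-endian byte order; the address need not be aligned.
    """
    first, second = struct.unpack_from(">II", memory, address)
    return first, second


def _to_s16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def _unpack_pair(word):
    """Split a 32-bit word into (high, low) signed 16-bit halves."""
    return _to_s16(word >> 16), _to_s16(word)


def dual_two_tap(data_a, coef_a, data_b, coef_b):
    """Two independent 2-tap filters, one result per destination register.

    Assumed layout: each source packs two 16-bit values; each result is
    hi*hi + lo*lo, wrapped to 32 bits.
    """
    a_hi, a_lo = _unpack_pair(data_a)
    ca_hi, ca_lo = _unpack_pair(coef_a)
    b_hi, b_lo = _unpack_pair(data_b)
    cb_hi, cb_lo = _unpack_pair(coef_b)
    return ((a_hi * ca_hi + a_lo * ca_lo) & MASK32,
            (b_hi * cb_hi + b_lo * cb_lo) & MASK32)
```

The point of the sketch is the shape of the interface: up to four sources and two destinations, which is more than a single issue slot provides and is why the operation spans two neighbouring slots.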

Collapsed load operations
- Used for motion estimation
- LD_FRAC8
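A collapsed load can be pictured as a load fused with an interpolation step, which suits fractional-pixel motion estimation. The Python sketch below is illustrative only: the 5-byte window, the 4 output bytes, the 1/16 fractional resolution, and the rounding are assumptions, and the slide does not give the exact LD_FRAC8 definition.

```python
def ld_frac8(memory, address, frac, frac_bits=4):
    """Load 5 neighbouring bytes and return 4 linearly interpolated bytes.

    frac is the fractional pixel position in units of 1 / 2**frac_bits
    (assumed resolution). frac == 0 returns the first 4 bytes unchanged.
    """
    one = 1 << frac_bits
    if not 0 <= frac < one:
        raise ValueError("frac out of range")
    window = memory[address:address + 5]
    if len(window) < 5:
        raise ValueError("load runs past the end of memory")
    result = []
    for left, right in zip(window, window[1:]):
        # 2-tap filter with rounding to nearest
        blended = left * (one - frac) + right * frac + one // 2
        result.append(blended >> frac_bits)
    return result
```

Without such an operation, the same result takes a load followed by several unpack, multiply, and pack operations.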

Context-Based Adaptive Binary Arithmetic Coding (CABAC)
- H.264 compression feature
- Lossless compression of syntax elements in the video stream, based on the probabilities of the syntax elements in the given context
- High compression ratio
- Computationally intensive
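The idea of coding "based on the probabilities of syntax elements of the given context" can be sketched with a per-context adaptive probability model. This is not the H.264 CABAC state machine: the exponential-decay update, the rate of 1/16, and the clamping bounds are stand-in assumptions, and the cost returned is the ideal arithmetic-coding length of -log2(p) bits per symbol.

```python
import math


class ContextModel:
    """Adaptive estimate of P(bit == 1) for one context (simplified stand-in)."""

    def __init__(self, p_one=0.5, rate=1.0 / 16.0):
        self.p_one = p_one
        self.rate = rate

    def cost_bits(self, bit):
        """Ideal code length of this bit under the current estimate."""
        p = self.p_one if bit else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bit):
        """Move the estimate toward the observed bit, keeping it off 0 and 1."""
        target = 1.0 if bit else 0.0
        self.p_one += self.rate * (target - self.p_one)
        self.p_one = min(max(self.p_one, 1.0 / 256.0), 255.0 / 256.0)


def ideal_code_length(symbols):
    """Total ideal length in bits for a sequence of (context_id, bit) pairs."""
    models = {}
    total = 0.0
    for context_id, bit in symbols:
        model = models.setdefault(context_id, ContextModel())
        total += model.cost_bits(bit)
        model.update(bit)
    return total
```

A skewed context becomes cheap quickly: 64 zero bits in one context cost well under 64 bits here. The serial dependency between each cost and the preceding update is what makes the real algorithm computationally intensive.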

Prefetching
- Prefetching to hide memory latency
- Prefetching based on memory regions
- Memory regions defined by start address, end address and stride
- Memory regions are under software control
- 4 memory regions supported
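A region-based prefetcher of this kind might behave as in the Python sketch below. Several details are assumptions rather than facts from the slides: the end address is treated as exclusive, a load inside a region triggers a prefetch of the line holding address + stride, the line size is 128 bytes, and the first matching region wins. The class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefetchRegion:
    start: int   # first address covered by the region
    end: int     # one past the last address (assumed exclusive)
    stride: int  # distance from the load address to the prefetch target


class RegionPrefetcher:
    MAX_REGIONS = 4  # the slides state that 4 regions are supported

    def __init__(self, regions: List[PrefetchRegion], line_size: int = 128):
        if len(regions) > self.MAX_REGIONS:
            raise ValueError("at most 4 memory regions are supported")
        self.regions = list(regions)
        self.line_size = line_size  # assumed cache line size in bytes

    def on_load(self, address: int) -> Optional[int]:
        """Return the line-aligned address to prefetch, or None."""
        for region in self.regions:
            if region.start <= address < region.end:
                target = address + region.stride
                return target - target % self.line_size
        return None
```

For image data, software would typically set the stride to the image row pitch, so that walking down a column of blocks prefetches the next row before it is needed.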

Pipeline
- Sequential instruction cache design
- Two-slot execution unit
- Unified register file
- Load-store unit connects to 2 issue slots
- 5 delay slots for jump

Load-store unit
- Two extra cycles for fractional load
- Two copies of tags
- Loads issued only from slot 5

Realization
- Fully synthesizable, low-power process design in 90 nm
- High threshold voltage
- Frequency
  - 450 MHz at 1.2 V
  - 350 MHz at 1.08 V
- Area: 8 mm², almost 50% for SRAMs
- Power: 0.7 to 1 mW/MHz (1.2 V)
- Clock gating: 70 clock domains
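As a quick worked check of these figures, the Python sketch below multiplies the quoted power density by the 1.2 V clock frequency. It assumes the mW/MHz figure scales linearly with frequency at a fixed voltage; no estimate is made for 1.08 V because the slides give no power figure for that voltage. The function name is hypothetical.

```python
def power_mw(mw_per_mhz, frequency_mhz):
    """Power in mW, assuming it scales linearly with clock frequency."""
    return mw_per_mhz * frequency_mhz


# Figures taken from the slide, all at 1.2 V
low_mw = power_mw(0.7, 450)    # lower bound of the quoted range
high_mw = power_mw(1.0, 450)   # upper bound of the quoted range
sram_area_mm2 = 8 * 0.5        # "almost 50%" of 8 mm², so slightly under this
```

At 450 MHz this gives roughly 315 to 450 mW, with a little under 4 mm² of the die taken by SRAM.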

Relative Performance

Sources
- The TM3270 Media-Processor (thesis carried out at Philips Semiconductors)