THE PHILIPS NEXPERIA DIGITAL VIDEO PLATFORM
The Digital Video Revolution Transition from Analog to Digital Video Navigate, store, retrieve and share digital programs as well as access to new interactive services and connectivity possibility Home entertainment systems will be implemented with a number of Digital Video appliances Digital Televisions (DTVs) DVD Players Digital Video Recorders Set-top Boxes
The Philips Nexperia Platform Approach Philips Semiconductors decided to serve the application domains of digital video and mobile with a platform approach : Nexperia Nexperia : ‘next experience’ Nexperia platform properties Flexibility Innovation Future-proof Using architecture framework and IP blocks
Nexperia-DVP Platform Conceps Nexperia-DVP is a Reference Architecture A set of documents the describes how the products of the Digital Video Product Family will be partitioned into subsystems and how functionality will be split over theses subsystems. Nexperia-DVP Soc Reference Architecture Nexperia-DVP Software Reference Architecture Nexperia-DVP System Reference Architecture
Nexperia-DVP SoC Reference Architecture. Scalable VLIW Media Processor: 100 to 300+ MHz 32-bit or 64-bit Nexperia ™ System Buses bit General-purpose Scalable RISC Processor 50 to 300+ MHz 32-bit or 64-bit Library of Device IP Blocks Image coprocessors DSPs UART 1394 USB …and more TM-xxxxD$ I$ TriMedia CPU DEVICE IP BLOCK DVP SYSTEM SILICON PI BUS SDRAM MMI DVP MEMORY BUS DEVICE IP BLOCK PRxxxx D$ I$ MIPS CPU DEVICE IP BLOCK PI BUS TriMedia ™ MIPS ™
Three Levels of Abstraction Level 1 : DVP Software-Hardware Rules Deals with the software view of the hardware Unified Memory Architecture Rules for Data Movement Endianess Ordering & Coherency Interrupts Data formats, including Pixel Formats Trimedia-MIPS Communication Protection Boot
Three Levels of Abstraction Level 2 : Device Transaction Level (DTL) Deals with point-to-point transfers between Device IPs and the Connection Network, specifying a Device IP partition and architecture that is compatible with Level 1 Two main types of ports : Typical of Device IP Device Control & Status Ports DMA ports
Three Levels of Abstraction Level 3 : Connection Network Deals with the more traditional Bus Hierarchy & Bus Level In second generation Nexperia-DVP The Device Control & Status (DCS) Network - Primarily a low latency communication path for the CPUs and other initiators to access the control & status register in the Device IPs The Memory Connection Network - It will automate the generation of structures - current generation is implemented with two key elements : Memory Transaction Level (MTL) ports and protocol & Pipelined Memory Access Network
Chiplet-based Design Chiplet : partitioning divides the logic hierarchy into manageable sized blocks A chiplet is a group of modules which are placed together because either they are synchronous to each other or they are not timing critical.
Chiplet-based Design (Cont.) The partitioning of the top-level netlist among chiplets followed these guidelines There should be as few synchronous signal crossings between chiplets as possible The clock module is placed into a separate chiplet because of its complexity Cross-chiplet scan chains are not allowed
Nexperia-DVP Software Architecture Scalable from low-end to high-end Consistent API (on MIPS or TriMedia) Single Streaming Architecture for MIPS and TriMedia Aligned to Nexperia-DVP hardware architecture and IP blocks Operating system independent software layers Re-use of software components on any instance of the platform
Nexperia-DVP System Reference Architecture Performance Characteristics The system must work under normal conditions The system must have a given (short) maximum input/output delay The system must behave gracefully under exceptional condition The system must be as cost effective as possible Critical Resources Memory Bandwidth CPU Cycles (RAM) Memory Size
Effect of bandwidth on transaction latency The memory transaction latency experienced by a CPU is influenced by four different factors Minimum transaction cost Transactions of other blocks that take precedence Pending transactions of blocks that do not take precedence Other transactions of the CPU itself
Effect of transaction latency on CPU performance Typical average behavior of two types of code on a Nexperia-DVP system DSP codeControl Code Characteristics Long expressions Highly repetitive loops Lots of switch/if statements Many function calls Working Set Size Typically smaller than CPUs I-cache size Typically much larger than CPUs I-cache size Typical instruction repetition rate before cache line invalidate 40 or moreBetween 1 and 3 Typical CPU stall cycles due to cache misses on moderately loaded system 20% and below on TM80% and above on TM 60% and above on MIPS Predominant Cache Misses D-CacheI-Cache Typical bandwidth consumed for each effective CPU Mcycle 400KB/s on TM6.4MB/s on TM 1.5MB/s on MIPS
Memory Arbitration To regulate the distribution of available bandwidth over the requesting blocks : Utilize an advanced DDR arbitration scheme Fair distribution of bandwidth over requesting blocks Shortest possible transaction latency for CPUs Nexperia-DVP memory arbiter deploys a sophisticated algorithm for distribution of bandwidth over requesting blocks Guaranteed bandwidth Priority based distribution of remainder
Scheduling Techniques Scheduling of Hardware Accelerators time A1 A2 “Perform A2 after A1 is finished” “Run slice of A2 when A1 finished a slice” “Start A2 after a slice of A1, and guarantee that A2 will not overtake A1”
Scheduling Techniques (Cont.) Properties of Scheduling Techniques SequencingSlicingStaggering Software effortLowest Higher, depends on slice size Low Buffer memoryLargestSmallestMiddle Input/output delayLongestShorter Requires of processors Nothing Must support slicing Must be sequential and localized