Download presentation
Presentation is loading. Please wait.
1
Design Support for Embedded Processors and Applications Prof. Kurt Keutzer EECS University of California Berkeley, CA keutzer@eecs.berkeley.edu
2
2 Embedded system needs meet technology constraints u Embedded system design needs: s Fast-time to market s Predictability s Reliability s Robustness s Efficiency s Economy u Application-specific integrated circuits failing to deliver this s Design risk s Design/tool cost s Unmanageable complexity
3
3 Demise of ASIC: Total IC Designs ASIC ASSP Handel Jones, IBS 9/23/2002
4
4 Customer needs meet technology constraints u Customer needs: s Fast-time to market s Predictability s Reliability s Robustness s Efficiency s Economy u Looking for ``platforms’’ – devices that will amortize system design costs over multiple generations u ``"Based on our analysis, having a software approach is the only way to scale to the next generation," Corgan (, Intel PMM said). "If you have to approach each fourfold increase in speed — from OC-48 [2.5 Gbits/second] to OC-192 [10 Gbits/s], say — with a new architecture, it's not cost-effective."
5
5 Solution: ASIC => ASSP => ASIP ASIP: Programmable Platforms u Develop platforms that allow for amortization of design costs over multiple generations u Make platforms programmable so that they have maximum flexibility with minimum overhead SDRAM Controller engine PCI Interface SRAM Controller Strong Arm Core I$ engine Mini D$ IX Bus Interface Hash Engine Scratch Pad SRAM
6
6 TM-xxxx D$ I$ TriMedia CPU DEVICE I/P BLOCK...... DVP System Silicon VLIW Media Processor: 100 to 300+ MHz 32-bit or 64-bit Nexperia System Busses PI bus Memory bus 32-128 bit PI BUS SDRAM MMI DVP MEMORY BUS DEVICE I/P BLOCK PRxxxx D$ I$ MIPS CPU DEVICE I/P BLOCK...... PI BUS General Purpose RISC Processor 50 to 300+ MHz 32-bit or 64-bit Library of Device Blocks Image coprocessors DSPs UART 1394 USB … and more TriMedia TM MIPS TM Example: Philips Nexperia TM Flexible architecture for digital video applications
7
7 Configurable/Reconfigurable Processors FPGA Processor Specialized Micro-Architectures Specialized Instruction-Set Architectures Domain-Specialization Chameleon Systems Morphics Frontier Design Tensilica ARC Improv Systems PMC Sierra XilinxAlteraAtmelTriscend Actel Adaptive Silicon Proceler eASIC
8
8 0 2 4 6 8 10 12 14 16 18 20 0123456789 Issue width per PE Number of PEs 32 48 64 Cognigine Cisco EZchip Xelerated IBM Lexra Motorola Intel BRECIS Broadcom Applied Micro Clearwater ClearSpeed Vitesse Agere PMC-Sierra Alchemy Conexant 64 instrs/cycle 16 instrs/cycle 8 instrs/cycle 10 Galaxy of Network Processors
9
9 ASIP/Programmable Platform Characteristics 5 Axes of the Architectural Design Space u Approaches to Parallel Processing s Processing Element (PE) level s Instruction-level s Bit-level u Elements of Special Purpose Hardware u Structure of Memory Architectures u Types of On-Chip Communication Mechanisms u Use of Peripherals Era of a single RISC PE (+ 1 DSP) over! This is not a ``run POSIX on an ARM’’ problem! Explosion of application specific solutions =>ASIPs RISC + SFU RISC Co- proc CAM Ether net MAC SRAM
10
10 Three Key Problem Areas Emerge u Development of programmable platforms/ASIP: s Characterizing target applications s Design space exploration u Deployment of programmable platforms/ASIP: s Development of programming model s Provision of software environment u Mapping applications onto programmable platforms/ASIP: s Application modeling s Application mapping
11
11 Addressing the problem areas Modern Embedded Systems Compilers Architectures and Languages MESCAL research mission: s To bring a disciplined methodology, and a supporting tool set, to the development, deployment, and programming of application-specific programmable platforms aka application specific instruction processors. Invited paper: ``From ASIC to ASIP: The Next Design Discontinuity’’, K. Keutzer, S. Malik, R. Newton, Proceedings of ICCD, pp. 84-91, 2002. www.gigascale.org/mescal Press coverage Sept 2002: Programmable Platforms will Rule: http://www.eetimes.com/story/OEG20020911S0063 High on MESCAL http://www.eetimes.com/story/OEG20020911S0065 SDRAM Controller engine PCI Interface SRAM Controller Strong Arm Core I$ engine Mini D$ IX Bus Interface Hash Engine Scratch Pad SRAM
12
12 Three Key Problem Areas u Development of programmable platforms: s Characterizing target applications s Design space exploration u Deployment of programmable platforms: s Development of programming model s Provision of software environment u Mapping applications onto programmable platforms s Application modeling s Application mapping
13
13 Complementary Issues Heterogeneous applications Heterogeneous applications u Programming Environment u Domain-specific models t Domain specific libraries t Environmental models u ``Software architecture’’/MOC u Primitive computation and communication mechanisms u Mapping to implementation Heterogeneous programmable platforms/ASIP u Programming model u Domain-specific presentation t Device Specific Libraries t Environmental support u System architecture/micro-architecture/MOC u Primitive computation and communication mechanisms Port 0 IP Forwarding Engine Port 0
14
14 Our Approach u Bottom-up view - create abstractions of existing devices s opacity - hide micro-architectural details from programmer s visibility - sufficient detail of the architecture to allow the programmer to improve the efficiency of the program u Top down – experiment with existing modeling/programming environments s Learn from their abstractions of the devices s Try to maximize performance within these environments
15
15 Our Constraint/Angle/Prejudice u In real-time embedded systems correct logical functionality can never be divorced from system performance u In commercial (especially consumer-oriented) embedded systems system price is an utmost concern u Quantitative s (Quantitatively) examine trade-offs among: t Quality-of-results (e.g. speed, but also power, device cost) t Programmer productivity (how long does all this take?)
16
16 Application: IPv4 Forwarding Benchmark Port 1 Port 2 Port 15...... Port 1 Port 2 Port 15...... Ingress Ports FIFOsFunctionalityFIFOs Egress Ports Port 0 IP Forwarding Engine Port 0 IP Forwarding Engine IP Forwarding Engine
17
17 Example Programming target – IXP1200 u Intel IXP1200 s Multiple processors s specialized execution units s hardware context swap u ``Intel Corp., … chose a "fully programmable" architecture with plenty of space for users to add their own software — but one that turned out to be difficult to program. ‘’ u http://www.eetimes.com/story/OEG200 20830S0061 SDRAM Controller engine PCI Interface SRAM Controller Strong Arm Core I$ engine Mini D$ IX Bus Interface Hash Engine Scratch Pad SRAM u How do we program these architectures? What’s the right programming model?
18
18 Base-line reference implementations from Intel Assembler u Reference application uEngine C Features u Reference application modified u Basic C language constructs like loops, condition statements and basic data types (char, int, float) u IXP library defines additional data types, macros and functions (useful for common networking applications) u Memory management is user defined. Hence explicit declaration of memory allocation (and no support for pointers).
19
19 Commercial NPU programming environment Teja Technologies u Teja is founded by Akash Deshpande – Student of Prof. Pravin Varaiya u Based on his thesis “Control of Hybrid Systems” (1994) Teja Language Features u User interacts mostly with the graphical interface (which exports pre- defined application primitives) u Extending the Teja primitives is done via a FSM-based model (however, this still requires coding in assembly via the graphical interface) u Memory management for pre-defined primitives is done by Teja. User can alter this process (but is tedious and error prone)
20
20 Teja Features
21
21 Our own NPU programming environment: NPClick u Based on Click s Popular environment for describing/implementing network applications s Developed by Eddie Kohler, MIT=> ICSI u NPClick s Implemented subset of element library in IXP uC s Element communication via function calls t maintained semantics (packet push/pull) s packet storage fixed: t header in SRAM t payload in DRAM u Designer needs to specify: s thread boundaries s thread/uEngine assignment s memory allocation of queues (SRAM, DRAM, Scratch) u Opportunities for optimization (future work) s redundant memory loads/stores based on element/thread mapping s schemes for multiplexing hardware resources among multiple element instantiations (e.g. muxing TFIFO among 8 to Device’s)
22
22 Programming Models for IXP1200
23
23 Productivity Estimates u ``First time’’ learning curve issues makes it difficult to compare the productivity of these approaches u Based on our experience, we estimate the following design times for implementing an IPv4 router Time to functional correctness Additional time for performance tuning ASM 8 weeks uC 4 weeks 6 weeks Teja 2 weeks 3-4 weeks NPClick 2 days 2 weeks u The advantages with Teja and NPClick come from the ability to perform design-space exploration at a higher level
24
24 Conclusions: Programming Embedded Systems u Neither ASICs or general-purpose processors will fill the needs of most embedded system applications u System design teams will increasingly choose ASIPs/programmable platforms u Programming these devices is a new challenge: s Parallelism t Process t Operator t Bit/gate level s Special-purpose execution units u Need to develop matches between application development environments and programming models of ASIPs/programmable platforms u Match must consider: s Efficiency s Productivity s Robustness s Reliabilty
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.