Embedded Systems: Enabling technologies and roadmap, promising applications, research opportunities
Frank Vahid and Walid Najjar
Dept. of Computer Science and Engineering
University of California, Riverside
NOES meeting, January 2004
Frank Vahid, UC Riverside
Key Technology Trend
  Moore’s Law: 2x every 18 months
  ~10,000 transistors in 1980
  Nearly 1 billion transistors today
  [Figure: IC capacity, in logic transistors per chip (in millions), on a log scale from 0.001 to 10,000, for 1981 through 2009. Source: ITRS’99]
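The doubling rule on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and reading "today" as 2004 (the year of the talk) is an assumption.

```python
def projected_transistors(start_count, start_year, year, doubling_months=18):
    """Project a transistor count, assuming one doubling every `doubling_months`."""
    months = (year - start_year) * 12
    return start_count * 2 ** (months / doubling_months)

# ~10,000 transistors in 1980 (from the slide); 2004 is an assumed "today".
estimate_2004 = projected_transistors(10_000, 1980, 2004)
print(f"{estimate_2004:,.0f}")  # 24 years = 16 doublings -> 655,360,000
```

Sixteen doublings from 10,000 give roughly 0.65 billion, the same order of magnitude as the slide's "nearly 1 billion".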
Key Technology Trend
  Graphical illustration of Moore’s Law, 1981 to 2002
  Leading-edge chip in 1981: 10,000 transistors
  Leading-edge chip in 2002: 150,000,000 transistors
  Two trends
    Smaller and cheaper
    More functions
Smaller and cheaper chips
  32-bit microprocessor, plus networking, in tiny packages
    E.g., “Smart dust”
  Mass-produced microcontrollers
    E.g., PICs for <$1 today, likely pennies in the future
  [Photo: “Smart dust”. Getting awfully small... Photo courtesy of Joe Kahn]
Computers Everywhere -- Today
  Embedded system: a computing system embedded within an electronic product whose primary function is not computing
  98% of processors are embedded [Tu02]
    40-50 in every home
    >50 in some cars
  Annual sales
    Embedded processors alone >$3 billion (Dataquest’00)
    Other ICs >$20 billion (Gartner/Dataquest ’01)
  A “short list” of embedded systems
    Anti-lock brakes, Auto-focus cameras, Automatic teller machines, Automatic toll systems, Automatic transmission, Avionic systems, Battery chargers, Camcorders, Cell phones, Cell-phone base stations, Cordless phones, Cruise control, Curbside check-in systems, Digital cameras, Disk drives, Electronic card readers, Electronic instruments, Electronic toys/games, Factory control, Fax machines, Fingerprint identifiers, Home security systems, Life-support systems, Medical testing systems, Modems, MPEG decoders, Network cards, Network switches/routers, On-board navigation, Pagers, Photocopiers, Point-of-sale systems, Portable video games, Printers, Satellite phones, Scanners, Smart ovens/dishwashers, Speech recognizers, Stereo systems, Teleconferencing systems, Televisions, Temperature controllers, Theft tracking systems, TV set-top boxes, VCRs, DVD players, Video game consoles, Video phones, Washers and dryers
Computers everywhere – today
Computers everywhere -- tomorrow
  In every hallway?
  In every electric appliance?
  In every food package?
  In every shirt?
  In every tooth?
Smaller and cheaper – UCR’s “eBlocks”
  Sensors, logic, communication, and output blocks
  No programming or electronics experience needed
  2-3 year battery life
  Solution: every block has a computer; communication via packets
  Future: autoconfiguration?
  Countless possible uses
  NSF-funded: Vahid, Najjar, Hsieh
  [Figure: “Garage Door Open at Night” (wireless solution). At garage door: Magnetic Contact Switch; outside: Light Sensor; 2-Input Logic and wireless TX. Inside house: wireless RX and LED]
Defining Basic eBlocks – Partial Catalog
  (Columns: Diagram, Description, Interface. Each block's diagram shows a yes/no signal.)
  Magnetic Contact Switch
    Description: Determines when contact between two sensors is made.
    Interface: yes = contact between sensors; no = no contact between sensors
  Light Sensor
    Description: Sensor detects presence of light.
    Interface: yes = light detected; no = no light detected
  Button
    Description: Indicates whether button is pressed or not.
    Interface: yes = button pressed; no = button not pressed
  LED
    Description: Device blinks a light when input is a yes. Device emits no light when input is no.
    Interface: yes = blink LED; no = turn LED off
  Splitter
    Description: Device receives a signal and replicates that signal on each output.
    Interface: yes = output yes signal; no = output no signal
  Toggle
    Description: An input of yes toggles (inverts) the current value output by the device.
    Interface: yes = toggle previous output value; no = do nothing
  2-Input Logic Block
    Description: Configurable logic block programmed by the user via DIP switch. For each possible combination of inputs a and b, there is a corresponding switch that can be set so the resulting output is a yes or no for that particular combination.
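The catalog entries above can be read as small state machines over yes/no signals. The following is a minimal sketch of three of them, with yes/no modeled as True/False. The class and function names, the Toggle's initial output, and the Splitter's number of outputs are assumptions, not details from the slides.

```python
class Toggle:
    """Toggle block: a yes input inverts the stored output; a no input does nothing."""

    def __init__(self, initial=False):
        # The initial output value is an assumption; the catalog does not state it.
        self.output = initial

    def receive(self, signal):
        if signal:
            self.output = not self.output
        return self.output


def splitter(signal, fanout=2):
    """Splitter block: replicate the input signal on each output (fanout is assumed)."""
    return [signal] * fanout


def led(signal):
    """LED block: blink on yes, off on no."""
    return "blink" if signal else "off"


toggle = Toggle()
button_presses = [True, False, True, True]
states = [toggle.receive(p) for p in button_presses]
led_states = [led(s) for s in states]
```

Here a Button feeding a Toggle feeding an LED turns a momentary press into an on/off light switch, which is the kind of composition the catalog is meant to support.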
Defining Basic eBlocks – How to Implement Logic?
  Logic to detect motion at night
    Motion sensor output A = yes, light sensor output B = no
    Motion at night = A AND (NOT B) = A * B’
  Equations are too hard for a regular person, plus how would they enter one?
    In general, regular people are very weak with logic
  Can’t create a unique eBlock for every unique 2-input logic function
  Create one 2-input logic block
    User must configure that block
  Solution: print truth table on block; user sets a switch for each possible input combination
    Not ideal, but sufficient for now
  [Figure: Logic block with input A from motion sensor and input B from light sensor; printed truth table with columns A, B, Output and a no/yes switch per row]
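The truth-table approach amounts to storing one output bit per input combination. The sketch below models the switch settings as a lookup table and configures it for "motion at night" (A AND NOT B). The class name and the dictionary representation are illustrative assumptions, not the eBlocks firmware.

```python
class TwoInputLogicBlock:
    """2-input logic block: one user-set switch per (a, b) input combination."""

    def __init__(self, switches):
        # switches maps each (a, b) pair to the output chosen by the user.
        self.switches = dict(switches)

    def output(self, a, b):
        return self.switches[(a, b)]


# Switch settings for motion at night: A = motion detected, B = light detected.
motion_at_night = TwoInputLogicBlock({
    (False, False): False,
    (False, True): False,
    (True, False): True,   # motion and no light -> yes
    (True, True): False,
})

alarm = motion_at_night.output(a=True, b=False)
```

Four switches cover all 16 possible 2-input functions, which is why a single configurable block can replace a unique block per function.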
Programmability
  100s of sensors in one network
  Write 100s of C programs?
  Or, write one description of functionality and GENERATE 100s of programs
  Which language can express the system design?
  Simulation? Impractical at this scale
  Verification is the only option
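To make the "one description, many programs" idea concrete, here is a toy sketch that expands a single network description into one C source string per node. Everything in it is hypothetical: the tree topology, the template, and the `run_sensor_node` routine are placeholders, and the slides do not propose a particular language or generator.

```python
from string import Template

# Hypothetical per-node C program; run_sensor_node is an assumed library routine.
NODE_TEMPLATE = Template("""\
/* generated for node $node_id */
#define NODE_ID $node_id
#define PARENT_ID $parent_id
int main(void) { return run_sensor_node(NODE_ID, PARENT_ID); }
""")


def generate_programs(num_nodes):
    """Expand one description (a binary reporting tree) into per-node C sources."""
    programs = {}
    for node_id in range(num_nodes):
        # Assumed topology: node 0 is the root; others report to (id - 1) // 2.
        parent_id = 0 if node_id == 0 else (node_id - 1) // 2
        programs[node_id] = NODE_TEMPLATE.substitute(
            node_id=node_id, parent_id=parent_id
        )
    return programs


programs = generate_programs(300)
```

The point of the sketch is only that the per-node differences live in the description, so changing the network means editing one place rather than hundreds of files.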
Key Technology Trend
  Graphical illustration of Moore’s Law, 1981 to 2002
  Leading-edge chip in 1981: 10,000 transistors
  Leading-edge chip in 2002: 150,000,000 transistors
  Two trends
    Smaller and cheaper
    More functions
More functions -- present
  Entire systems on one chip
  Philips Viper: high-quality audio and video
    1,920 x 1,080 interlaced pixels
    Processes multiple input streams
    Renders, composes and outputs video and audio output streams
    Tremendous data-processing demands
  [Figure: Viper block diagram. Blocks: SDRAM ctrl, MIPS CPU, TriMedia CPU, Int ctrl, MPEG2 dcd, 2D rend, JTAG, image proc., PCI, Clocks, video proc., I2C, IC debug, other proc., 1394, CPU debug, Digital I/O, CRC, DMA, USB, Audio I/O, UART, Other I/O]
More functions – future?
  Hundreds of processors on a single chip?
More functions – PROBLEM!
  Design productivity gap
  Designer productivity growing at a slower rate than IC capacity
    1981: 100 designer months, ~$1M
    2002: 30,000 designer months, ~$300M
  [Figure: IC capacity (Moore’s Law), in logic transistors per chip (in millions), and designer productivity, in K transistors per staff-month, plotted on log scales for 1981 through 2009; the widening space between the two curves is labeled "Gap". Source: ITRS’99]
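The size of the gap can be estimated from numbers already in the talk. The sketch below pairs the designer-month figures on this slide with the leading-edge transistor counts quoted earlier (10,000 in 1981; 150,000,000 in 2002). That pairing is an assumption; the slides do not say the effort figures refer to exactly those chips.

```python
chip_1981 = {"transistors": 10_000, "designer_months": 100, "cost_usd": 1_000_000}
chip_2002 = {"transistors": 150_000_000, "designer_months": 30_000, "cost_usd": 300_000_000}


def transistors_per_designer_month(chip):
    """Crude productivity measure: transistors designed per designer month."""
    return chip["transistors"] / chip["designer_months"]


def cost_per_designer_month(chip):
    return chip["cost_usd"] / chip["designer_months"]


capacity_growth = chip_2002["transistors"] / chip_1981["transistors"]
effort_growth = chip_2002["designer_months"] / chip_1981["designer_months"]
productivity_growth = (
    transistors_per_designer_month(chip_2002)
    / transistors_per_designer_month(chip_1981)
)

print(capacity_growth, effort_growth, productivity_growth)
```

Under these assumptions capacity grew about 15,000x while productivity grew about 50x, so design effort had to grow about 300x at a roughly constant ~$10,000 per designer month.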
Trend Towards Pre-Fabricated Platforms
  Example -- ASSP: application-specific standard product
    Domain-specific pre-fabricated IC
    e.g., digital camera IC
  ASIC: application-specific IC
  ASSP revenue > ASIC
  ASSP design starts > ASIC
    Design start: a unique IC design
    Ignores quantity of same IC
  ASIC design starts decreasing
    Due to strong benefits of using pre-fabricated devices
  Source: Gartner/Dataquest, September ’01
Single-Chip Microprocessor/FPGA Platforms
  Altera’s Excalibur EPXA 10 (2002)
    ARM (922T) hard core
    200 Dhrystone MIPS at 200 MHz
    ~200k to ~2 million logic gates
  Source: www.altera.com
Single-Chip Microprocessor/FPGA Platforms
  Xilinx Virtex II Pro (2002)
    PowerPC based
      420 Dhrystone MIPS at 300 MHz
      1 to 4 PowerPCs
    4 to 16 gigabit transceivers
    12 to 216 multipliers
    Millions of logic gates
    200k to 4M bits RAM
    204 to 852 I/O
    $100-$500 (>25,000 units)
  [Figure: chip layout showing PowerPCs, config. logic, and up to 16 serial transceivers, 622 Mbps to 3.125 Gbps. Courtesy of Xilinx]
New generation of languages and compilers needed
  Languages to specify functionality of hugely complex systems
    Built largely from existing functions
  Compilers to map functionality to existing hardware resources
    Multiple processors, plus FPGA
    How to convert functionality to a custom circuit on the FPGA?
    While considering performance, power, and size criteria
Advantage of compiling to hw: Parallelism
  Simple CPU or DSP: 1 MAC, 1 tap/cycle
  Advanced DSP: 2 MACs + 1 ALU, 2 taps/cycle
  FPGA fabric (RC fabric, custom circuit): K taps/cycle
  Parallelism limited by chip area or memory bandwidth
  With enough bandwidth: 100s of iterations!
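The taps-per-cycle comparison can be sketched with an idealized cycle count for one output of a FIR filter, where each tap is one multiply-accumulate (MAC). The model below ignores pipelining, memory bandwidth, and clock-rate differences, and the 64-tap filter and K = 16 are arbitrary illustrative values.

```python
import math


def fir_output(coeffs, samples):
    """One FIR output: a sum of multiply-accumulate operations, one per tap."""
    return sum(c * x for c, x in zip(coeffs, samples))


def cycles_per_output(num_taps, taps_per_cycle):
    """Idealized cycles to compute one output when taps_per_cycle MACs run in parallel."""
    return math.ceil(num_taps / taps_per_cycle)


NUM_TAPS = 64  # assumed filter length
simple_cpu = cycles_per_output(NUM_TAPS, 1)     # 1 MAC
advanced_dsp = cycles_per_output(NUM_TAPS, 2)   # 2 MACs
fpga_fabric = cycles_per_output(NUM_TAPS, 16)   # K = 16 parallel MACs (assumed)
```

In this model the cycle count falls in proportion to K, until chip area or memory bandwidth caps how many MACs can be kept busy.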
UCR: Some Performance Results
  PC: 800 MHz Pentium III; FPGA: Xilinx Virtex E 2000; all FPGA programs compiled from SA-C with extensive compiler optimizations

  Code          FPGA clock (MHz)  FPGA time (msec)  PC time (msec)  Speed-up  Comments
  Wavelet       35.1              2.0               77.0            35        C++ code on PC
  Canny         32.2              6.0               850.0           142       C code on PC
                                                    135.0           22.5      MMX code on PC
  Prewitt       42.1              1.9               158.0           83
  Erode/Dilate  46.5              3.1               67.0            21.6
  AddS          51.7              0.67              5.95            8.88      MMX IPL op.
  SAR ATR       41.1              80                65,000          800       3 FPGAs and 400 concurrent iterations

  Ref: W. Najjar et al., IEEE Computer, August 2003
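The speed-up column is the ratio of PC execution time to FPGA execution time. The sketch below recomputes it from the printed times. Reusing the 6.0 msec FPGA time for the Canny MMX row is an assumption inferred from the listed 22.5 speed-up, and the row labels are illustrative.

```python
# (label, FPGA time in msec, PC time in msec), as printed in the table.
timings = [
    ("Wavelet", 2.0, 77.0),
    ("Canny (C)", 6.0, 850.0),
    ("Canny (MMX)", 6.0, 135.0),  # FPGA time assumed to match the Canny row
    ("Prewitt", 1.9, 158.0),
    ("Erode/Dilate", 3.1, 67.0),
    ("AddS", 0.67, 5.95),
    ("SAR ATR", 80.0, 65_000.0),
]

speedups = {label: pc_ms / fpga_ms for label, fpga_ms, pc_ms in timings}

for label, ratio in speedups.items():
    print(f"{label:13s} {ratio:8.2f}x")
```

Most recomputed ratios match the listed speed-ups after rounding. The Wavelet and SAR ATR ratios from the printed times (38.5 and 812.5) differ somewhat from the listed 35 and 800, so the printed times or speed-ups are likely rounded.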
Compilers for future platforms
  UCR research
    Synthesis of C to hw circuits
    Automatic partitioning of program among sw and hw
      From C/C++/Java source, or even binaries
      10x-100x speedups by mapping sw kernels to hw
    Warp processors – same, but dynamic and transparent
    Vahid, Najjar, Hsieh, Tan
    Funding: NSF, SRC (Semiconductor Research Corp), Los Alamos, Northrop Grumman, others…
  How can we support platforms with hundreds of microprocessors and huge FPGAs?
  Also, how can we utilize processor “soft cores” that can be mapped to the FPGA too?
Directions
  Smaller and cheaper trend
    Languages to specify desired system behavior
    Synthesis tools to generate hw and sw
    Power optimization
  More functions trend
    Languages/compilers for multi-billion-transistor platforms having hundreds of processors and huge FPGAs