© imec 2005 L. Van der Perre - Nov
A user’s dream: seamless wireless access. Wireless multimedia anywhere, anytime.
[Figure labels: fixed wireless access, public hot-spot, office WLAN, cellular, mobile ‘High Rate High Mobility’, DVB-H, DAB, tomorrow’s new standard?]
Today’s state-of-the-art solution: multiple radios
Design for wireless standards: a moving target! Tomorrow’s new standard?
The old dream to FLAI: Flexible Air Interface
‘Software Defined Radio’ (SDR) can enable us to FLAI at high rates, thanks to, and because of, scientific progress in:
- Communication schemes
- Technology
- Design paradigms
SDR: a wireless dream come true?
© imec 2005
SDRs for wireless terminals: Flexibility for low power, or not
November 3rd, 2005
Presented by Liesbet Van der Perre, with main contributions by B. Bougard and many others
IMEC/wireless
SDRs for wireless terminals
- Why we wish for SDRs
- A must for cost: the technological scene
- A must for functionality: the wireless access schemes scene
- Why we should (not) design SDRs for: a holistic approach for low air interface
- HW architecture design & SW design
- Managing flexibility for low power: the QoE approach
The quest for SDRs is driven by volume: a competitive need due to increasing NRE cost.
[Figure: $/chip versus volume for technology nodes x and x+1; annotation: ‘Not you!’]
Why not make one more standard-cell ASIC? [Meyr]
[Slide by Kees van Berkel, Philips Research Laboratories Eindhoven]
All cost factors direct us towards SDRs
Cost pies are difficult to draw … an artist’s impression. [Pie chart labels: area, form factor, volume, time-to-market, various NRE costs]
- Reduced time to market
- Increased volume through multi-functional solutions
- Reduced area through hardware reuse and reconfigurable components (among others RF-MEMS)
- Small form factor through deep-submicron integration
Darwin’s philosophy for wireless communication: not the strongest will survive, but the fittest.
Wireless standards: a large variety to fit different cases, and still growing
[Figure © Siemens: user data rate versus degree of mobility (stationary, walking, driving). Standards shown: GSM, GPRS, EDGE, DECT, BlueTooth, UMTS, HSxPA, CDMA EV-DO, EV-DV, FlashOFDM (802.20), WLAN (IEEE b), WLAN (802.11a/g/n), IEEE a,d, IEEE e, 3G Evolution & Beyond 3G, X?]
Do we need ‘beyond 3G’? (© Siemens)
Systems beyond 3G are driven by economical, technological and social trends:
- Mobility, individuality, user in control
- Data traffic, multimedia services, high-speed information access
- Cost reduction, bandwidth, machine-to-machine communication
- Mobile multimedia applications as commodity
- High bandwidth and capacity
- Global coverage, global roaming
Need for low power
Extrapolate targets from available energy:
- Power of state-of-the-art components (2003)
- Relative complexity of different applications (Tx/Rx, video, 3D)
- Estimate evolution towards 2006 (battery improvement, technology scaling)
Energy budget [source: M4 program]:
    Available energy (2006)             13.4 Wh   (+5%/year)
    Peripherals (display, sensors, …)   -4.0 Wh
    Losses                              -1.4 Wh
    Applications                        -5.5 Wh   (+30%/year)
        Multimedia (video, 3D)           2.5 Wh
        Other                            3.0 Wh
    Modem (incl. FE)                     2.5 Wh   (9000 J)
There is an energy gap!
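The budget arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the projection over future years assumes that peripherals and losses stay constant and that the yearly growth rates compound, neither of which is stated on the slide, and the function name is purely illustrative.

```python
# Energy budget figures as given on the slide (watt-hours, 2006 estimate).
AVAILABLE_WH = 13.4      # battery energy, quoted as growing about 5% per year
PERIPHERALS_WH = 4.0     # display, sensors, ...
LOSSES_WH = 1.4
APPLICATIONS_WH = 5.5    # multimedia 2.5 Wh + other 3.0 Wh, quoted as +30% per year
JOULES_PER_WH = 3600.0


def modem_budget_wh(years_ahead=0):
    """Energy left for the modem after `years_ahead` years.

    ASSUMPTION (not from the slide): peripherals and losses stay constant,
    and the quoted yearly growth rates compound.
    """
    available = AVAILABLE_WH * 1.05 ** years_ahead
    applications = APPLICATIONS_WH * 1.30 ** years_ahead
    return available - PERIPHERALS_WH - LOSSES_WH - applications


if __name__ == "__main__":
    for year in range(4):
        wh = modem_budget_wh(year)
        print(f"year +{year}: modem budget {wh:6.2f} Wh ({wh * JOULES_PER_WH:8.0f} J)")
```

Under these assumptions the modem budget starts at 2.5 Wh (9000 J) and shrinks quickly, which is one way to read the "energy gap" remark.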
Position your target: dynamism can be turned into opportunism
[Figure labels: MIMO communication (WLAN), SIMO communication, scan, SIMO/MISO WLAN, cellular communication]
Dynamic QoS/energy management is key
Position your target: Where does the power go?
                    Typical wireless LAN    Typical cellular scenario
    Transmit               5%                      0.5%
    Receive                5%                      0.5%
    Idle/Listen           90%                     99%
Approach: flexibility where needed; divide and conquer
Where does the power go in Tx/Rx? Flexibility can save power!
Breakdown of transmitter power consumption:
- PA: 530 mW
- DSP + MAC: 261 mW
- D/A converters: 40 mW
- BB FE: 36 mW
- LO generation: 47 mW
- RF FE: 288 mW
Breakdown of receiver power consumption:
- FEC: 176 mW
- DSP + MAC: 176 mW
- A/D converters: 200 mW
- BB FE: 72 mW
- LO generation: 47 mW
- RF FE: 60 mW
Numbers given for state-of-the-art WLAN a for 54 Mbit/s at 13 dBm average transmit power.
[Pie-chart shares shown: 25%, 30%, 45%, 25%]
Power saving: 120 mW = 10%, 330 mW = 27% ??
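As a quick sanity check of the breakdown, the sketch below sums the per-block figures quoted on the slide and expresses each block as a share of the total. Relating the quoted savings (120 mW, 330 mW) to the transmitter total is an interpretation on my part; it happens to reproduce the 10% and 27% figures.

```python
# Per-block power in mW, copied from the slide.
TX_MW = {"PA": 530, "DSP + MAC": 261, "D/A converters": 40,
         "BB FE": 36, "LO generation": 47, "RF FE": 288}
RX_MW = {"FEC": 176, "DSP + MAC": 176, "A/D converters": 200,
         "BB FE": 72, "LO generation": 47, "RF FE": 60}


def shares_percent(breakdown):
    """Return each block's share of the total power, in percent."""
    total = sum(breakdown.values())
    return {name: 100.0 * mw / total for name, mw in breakdown.items()}


tx_total = sum(TX_MW.values())
rx_total = sum(RX_MW.values())

if __name__ == "__main__":
    print(f"Tx total: {tx_total} mW, Rx total: {rx_total} mW")
    for name, pct in shares_percent(TX_MW).items():
        print(f"  Tx {name:15s} {pct:5.1f}%")
    # ASSUMPTION: the quoted savings are relative to the transmitter total.
    for saving_mw in (120, 330):
        print(f"  saving {saving_mw} mW = {100.0 * saving_mw / tx_total:.0f}% of Tx total")
```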
From desired functionality to SDR reality
[Diagram labels: broadband wireless access schemes; QoE-enhanced SDR solutions & design flow; BB implementation (optimized code, flexible/low power platform); PHY and MAC enabling QoE; relax/control front-end]
Broadband access schemes enabling low power: divide and conquer!
Different high-performance standards fit diverse communication scenarios; strong OFDM/MIMO technology base
Scenario coverage (cell types: urban micro cell, urban macro cell, suburban macro cell; one X per cell type covered):
- Static (0 km/h): XXX
- Pedestrian (10 km/h): XXX
- Vehicular low mobility (60 km/h): XXX
- Vehicular medium mobility (120 km/h): XX
- Vehicular high mobility (250 km/h): X
[Figure labels: n solutions, e solutions, HRHM]
High Rate High Mobility (HRHM) solutions:
- Asymmetric CDMA-based solutions
- MC-CDMA in downlink, SC-CDMA in uplink: lower terminal cost
- In line with proposals for 4G cellular, e.g. 3GPP, NTT DoCoMo
Divide et impera: hybrid modulation for low complexity and low PAPR at the mobile terminal (MT)
- The IFFT at the transmitting MT is moved to the receiving base station (BS)
- SC PAPR << MC PAPR
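The claim that single-carrier (SC) transmission has a much lower peak-to-average power ratio (PAPR) than multi-carrier (MC) transmission can be illustrated numerically. The sketch below is a simplification under stated assumptions: QPSK symbols, 64 carriers, symbol-rate sampling and no pulse shaping (which would raise the SC PAPR somewhat in practice). All names are illustrative.

```python
import cmath
import math
import random


def papr_db(samples):
    """Peak-to-average power ratio of a block of complex samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


def idft(freq):
    """Naive inverse DFT; plays the role of the IFFT in an MC transmitter."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


rng = random.Random(0)  # fixed seed so the example is repeatable
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
        for _ in range(64)]

sc_papr = papr_db(qpsk)        # SC: the symbols are sent directly
mc_papr = papr_db(idft(qpsk))  # MC: the symbols modulate 64 parallel carriers

if __name__ == "__main__":
    print(f"SC PAPR: {sc_papr:.2f} dB, MC PAPR: {mc_papr:.2f} dB")
```

In this idealized setting the SC block has constant envelope (0 dB PAPR), while the MC block shows peaks several dB above its average power.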
Increasing interest in block-based transmissions, also for cellular
Keywords: OFDM vs. single-carrier, OFDMA, MC-CDMA, …
- IST projects & initiatives: Matrice, Winner, WWRF WG4
- Qualcomm (Flarion), DoCoMo
- IEEE802.16e (OFDMA), IEEE (high mobility solutions), 3GPP (OFDM extension)
We need to communicate over a frequency-selective channel
[Plot: P (dB) versus f (GHz)]
Divide et impera over frequency-selective channels: ‘orthogonal frequency division multiplexing’
[Plot: P (dB) versus f (GHz)]
Parallel carriers see flat fading: a simple one-tap equalizer per carrier can suffice
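The one-tap-per-carrier idea can be sketched in a few lines. The example below is a noise-free toy model, not the system from the talk: the channel taps, block size and cyclic prefix length are made-up values, and a naive DFT stands in for the FFT. It relies on the cyclic prefix being at least as long as the channel memory, so that the channel acts as a circular convolution on each block.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT) of a list of complex numbers."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def ofdm_one_tap(symbols, channel, cp_len):
    """Send one OFDM block over a multipath channel and equalize per carrier.

    Requires 1 <= len(channel) - 1 <= cp_len <= len(symbols).
    """
    n = len(symbols)
    time = dft(symbols, inverse=True)          # IFFT at the transmitter
    tx = time[-cp_len:] + time                 # prepend the cyclic prefix
    # Linear convolution with the channel impulse response (noise-free).
    rx = [sum(h * tx[i - l] for l, h in enumerate(channel) if i - l >= 0)
          for i in range(len(tx))]
    y = dft(rx[cp_len:])                       # drop the prefix, FFT at the receiver
    h_freq = dft(list(channel) + [0j] * (n - len(channel)))
    return [y[k] / h_freq[k] for k in range(n)]  # one complex division per carrier


rng = random.Random(1)
sent = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(16)]
# Hypothetical 3-tap channel, chosen so that no carrier falls in a spectral null.
received = ofdm_one_tap(sent, channel=(1.0, 0.5, 0.25j), cp_len=4)
max_error = max(abs(a - b) for a, b in zip(sent, received))

if __name__ == "__main__":
    print(f"largest symbol error after one-tap equalization: {max_error:.2e}")
```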
High potential in combining MC/SC-FD and CDMA
MC/SC (modulation technique):
- low-complexity equalization of the frequency-selective channel
- loss in power in the cyclic prefix
DS-CDMA (accessing scheme):
- networking abilities (soft hand-over, …)
- high potential capacity
- highly sensitive to interference (between symbols and users) generated by multi-path propagation
Combination: handle the limiting interference in the frequency domain and benefit from the interesting properties of CDMA
Broadband mobile access: the potential power gain is huge!
UMTS (2 Mbps stationary, 100 km/h):
- scanning: 25 mW, 50 h
- tracking: 14 mW, 90 h
- talk: 575 mW, 8 h
IMEC’s High Rate High Mobility (20 Mbps DL, 4 Mbps 250 km/h):
- tracking: 2.5 mW, 500 h
- TX/RX: 61 mW, 73 h
BB implementation: optimizing HW and SW for low power
[Diagram labels: broadband wireless access schemes; QoE-enhanced SDR solutions & design flow; BB implementation (optimized code, flexible/low power platform); PHY and MAC enabling QoE; relax/control front-end]
SDR platform-based system design
[Flow diagram labels: broadband wireless access schemes; user terminal requirements; platform requirements and targets; platform exploration & SDR mapping; final code mapping; ≤90 nm MPSoC design]
The platform should unite SDR functionality and terminal constraints
- Broadband wireless access schemes: PHY & MAC
- User terminal: battery time & cost/form factor
- Platform requirements and targets:
  - DSP complexity: more than MOPS
  - timing constraints
  - power budget standby/TxRx: don’t go back!
  - area target: every gate counts!
Lower bit-error rate, wider coverage, … asks for SW flexibility
[Slide by Kees van Berkel, Philips Research Laboratories Eindhoven]
Opportunistic platform partitioning
Typical WLAN/cellular scenario: > 80% idle/listening time
- ‘Always on’ for listening/scanning (power detection, PN correlation, …): very low power (5 mW)
- Inner modem: C-programmable CGA, parallelization algorithms
[Diagram labels: MAC & QoE implementation manager, L2]
Coarse Grain Array: a well-fitted technology for baseband processing
IMEC ADRES IP:
- a tightly integrated combination of a VLIW DSP and a Coarse Grain Array (CGA)
- a compiler to map applications described in C directly onto these architectures
Requirements and how the CGA answers them:
- Baseband processing is dataflow dominated: the CGA provides a dense interconnection network matching the data flow
- Baseband processing is computing intensive (4 op/memory access): the CGA achieves very high IPC
- Wireless requires low power: the CGA provides low power through a low instruction fetch frequency
- SW developers want programming ease
SDR-FLAI CGA exploration: instantiating a domain-specific matrix
Goal: explore the design space provided by the ADRES framework
- Topology: size of the array (# of FUs, # and size of RFs, …), interconnection
- Functionality: level of heterogeneity
- Intrinsics: SIMD operations, software-controlled scratchpad, …
FFT/IFFT is a key kernel: it accounts for 50% to nearly 100% of the computational load in the various WLAN TX and RX data processing modes.
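For readers less familiar with the kernel in question, here is a minimal recursive radix-2 FFT. It is a textbook decimation-in-time formulation for illustration only, not the code that was mapped on the platform, which the slides do not show.

```python
import cmath
import math


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # FFT of the even-indexed samples
    odd = fft(x[1::2])    # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled            # butterfly, upper output
        out[k + n // 2] = even[k] - twiddled   # butterfly, lower output
    return out


# A 64-point block, the size used for the FFT benchmark in this talk.
# The input is a single complex tone on bin 5, so only bin 5 should be non-zero.
tone = [cmath.exp(2j * math.pi * 5 * t / 64) for t in range(64)]
spectrum = fft(tone)

if __name__ == "__main__":
    peak = max(range(64), key=lambda k: abs(spectrum[k]))
    print(f"peak at bin {peak}, magnitude {abs(spectrum[peak]):.1f}")
```

A 64-point transform of this kind needs (64/2) x log2(64) = 192 butterflies, each with one complex multiplication, which is why it dominates the inner-modem load.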
To meet high-performance requirements: exploit different flavors of parallelism
Exploration of the ADRES architecture:
- 8x8 array: 2-way SIMD (32-bit word width)
- 4x4 array: 4-way SIMD (64-bit word width)
Position your target: get the relevant design cost
Open issues:
- cycle count is not a fair benchmark for comparing performance among processors
- relevant conclusions can only be drawn from an energy-performance-area trade-off
Approach:
- generate and verify RTL implementation models for all tested configurations
- carry out gate-level synthesis to obtain numbers for area and maximum clock speed
- back-annotate the designs with realistic switching activity to get power figures
© imec 2005 SDRs for wireless terminals: Flexibility for low energy, or not. November 3rd, 2005. Presented by Liesbet Van der Perre, with main contributions by B. Bougard and many others. IMEC/wireless
Explore the performance-area-energy trade-off: we started from a VLIW view
Explore the performance-area-energy trade-off of the VLIW architecture for the DSP engine
- Cycle-true model of a baseline pipelined VLIW architecture
- Synthesized to evaluate area and power
- Architecture optimized:
  - FU exploration (# FUs, word width, # L/S-enabled FUs)
  - Instruction set exploration (specialized instructions)
  - Data-level parallelism exploitation (SIMD support)
- Optimized FU, instruction set and data-level parallelism exploration -> results transposable to the CGA view
VLIW architecture
- The ADRES VLIW is considered
- A scalable cycle-true model is developed, based on the LISA language
LISATek models the VLIW and generates HDL:
- Fast feedback on architectural choices
- Pipelined behavior profiling
A Synopsys-based flow for synthesis and power estimation has been followed
[Flow diagram labels: RTL design (verified); application (FFT); RTL simulation (Mentor ModelSim™); rtl2saif, forward SAIF, backward SAIF, read_saif; analyze/elaborate; compile; physical optimization and placement (Physical Compiler™, 90 nm technology); power calculation (PrimePower™). Outputs: clock speed, area, average power]
The reachable clock speed strongly depends on the number of FUs
The instruction-set extensions cause an average frequency drop of 70 MHz.
Conditions: TSMC 90 nm HP/HVT, WCCOM: T = 125 °C, VDD = 0.9 V
Doubling the number of FUs doubles the chip size
Area numbers are chip size after preliminary placement.
Conditions: TSMC 90 nm HP/HVT, WCCOM: T = 125 °C, VDD = 0.9 V
The average power is related to instruction-set complexity and utilization
[Chart: FFT benchmark, gross IPC; * = intrinsic-optimized version. Values shown: 0.98, 0.97, 1.91, 1.85, 3.61, 3.39, 5.22, 4.47]
Conditions: TSMC 90 nm HP/HVT, NCCOM: T = 25 °C, VDD = 1.0 V
Effective instruction throughput in MIPS
Instruction-set extension: lower throughput, but more powerful instructions
Main lesson learned: additional exploitation of DLP pays off
[Chart: FFT benchmark (64 points) for the configurations 1 FU, 2 FU, 4 FU, 8 FU and 1 FU*, 2 FU*, 4 FU*, 8 FU*, with the Pareto-optimal points for 802.11a/g real time marked]
Note: performance improvement in sight
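Selecting Pareto-optimal configurations from such an exploration can be sketched as follows. The configuration names and numbers below are hypothetical placeholders, not the measured results from the chart; the only external fact used is the 4 µs OFDM symbol period of 802.11a/g, taken here as the real-time deadline for one 64-point FFT.

```python
# (name, FFT execution time in microseconds, energy per FFT in nanojoules)
# HYPOTHETICAL values for illustration only, not the figures from the slide.
CANDIDATES = [
    ("cfg_a", 6.0, 5.0),
    ("cfg_b", 3.5, 7.0),
    ("cfg_c", 3.8, 8.0),
    ("cfg_d", 2.0, 12.0),
    ("cfg_e", 2.0, 15.0),
]

SYMBOL_PERIOD_US = 4.0  # 802.11a/g OFDM symbol duration


def pareto_front(points):
    """Keep the points that no other point beats on both time and energy."""
    front = []
    for name, time_us, energy_nj in points:
        dominated = any(
            t <= time_us and e <= energy_nj and (t < time_us or e < energy_nj)
            for _, t, e in points
        )
        if not dominated:
            front.append(name)
    return front


def real_time_front(points, deadline_us=SYMBOL_PERIOD_US):
    """Pareto front restricted to configurations that meet the deadline."""
    feasible = [p for p in points if p[1] <= deadline_us]
    return pareto_front(feasible)


if __name__ == "__main__":
    print("Pareto front:", pareto_front(CANDIDATES))
    print("Real-time Pareto front:", real_time_front(CANDIDATES))
```

The same filter extends to a third axis such as area by adding one more comparison per point.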
Top-down mapping flow: from application to low power implementation
Flow steps:
- Subsystem development (matlab)
- Gradual refinement in C (simulink)
- Platform-independent optimization (mainly DTSE on C descriptions)
- Platform-dependent optimization and mapping (semi-automatic)
Architecture influence on the mapping plane:
- IO mechanisms, hardware multiplexing & # instances
- Data types/bit width
- Primitive operations, instruction set support (partial)
- Memory hierarchy / dimensions (address width) / sequential-to-concurrent transformation
- Memory ports, validation of architecture parameters
To bring another great idea from happy matlab producers to optimized C code … and beyond
[Flow diagram labels: Matlab; algebraic transformation; data flow analysis; Single Assignment C; platform-independent optimization; extract/maximize DLP; explicit parallelism; TLP exploration (Sprint); inter-thread communication; platform-dependent thread-level optimization; optimized thread C code; thread scheduling template; processor core optimization; platform template and interconnect optimization; Processor Design Environment (LISATek, Target and Beyond); TLM]
Mapping of FLAI functionality on ADRES: intermediate evaluation looks promising
- The CGA offers cheap and scalable instruction- and loop-level parallelism
- To cover the most demanding wireless modes, combining CGA strengths with SIMD is promising
- Memory access becomes the dominant contribution after optimization, so ‘Data Transfer and Storage Exploration’ (DTSE) is vital to secure further performance/power gains
In non-kernel code, data transfer energy is significantly larger than arithmetic energy
[Charts: hardware, software]
Example: DAB receiver core. Non-kernel code accounts for 50% in our application!
Fast implementation with tools
DTSE methodology and tools are used to tackle the memory access bottleneck
[Diagram labels: initial system specification; design alternatives; accurate cost figures to guide decisions; system-level feedback]
Data Transfer and Storage Exploration at work: DAB receiver
[Pie chart of energy consumption: memory 66.6%, address & control, data path, code ROM (energy consumption measured on )]
- A systematic DTSE flow significantly reduces the (memory-related) energy with high-level optimizations
- Extensive automation via the ATOMIUM tool suite
DTSE enables breakthroughs: example, the Turbo Codec design (SIPS 2001, powered by ATOMIUM)
- High performance: 8.25 dB coding gain
- High throughput: 75.6 Mb/s
- Low latency: 5.356 s
- Low power: 1.45 nJ/bit/iter
[Chart: power (mW) versus data rate (Mbit/s), compared with Bickerstaff (measured), Bekooij (estimated), Thul (estimated)]
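The energy figure per bit and per iteration translates into a decoder power once a throughput and an iteration count are fixed. The sketch below does that unit conversion; the iteration counts are hypothetical, since the slide does not say how many iterations the quoted throughput corresponds to.

```python
def decoder_power_mw(energy_nj_per_bit_iter, throughput_mbps, iterations):
    """Average decoder power in mW.

    nJ/bit times Mbit/s gives 1e-9 J * 1e6 1/s = 1e-3 W, i.e. milliwatts,
    so no further scaling factor is needed.
    """
    return energy_nj_per_bit_iter * throughput_mbps * iterations


if __name__ == "__main__":
    # 1.45 nJ/bit/iter and 75.6 Mb/s are from the slide; iteration counts are assumed.
    for iterations in (1, 4, 8):
        power = decoder_power_mw(1.45, 75.6, iterations)
        print(f"{iterations} iteration(s): {power:.1f} mW")
```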
Flexibility for low power: enable and exploit
- Enable performance/power scaling through algorithmic ‘knobs’ and architectural ‘knobs’
- Exploit scalability to provide the desired minimum power
[Diagram labels: QoE manager; run-time controls; multi-format multimedia; flexible air interface modem; SDR front-end]
Configure the complete stack to provide just the required QoE with minimum total energy per transaction
[Diagram: ‘King user’ requirements; constraints and configuration across the OSI stack: application; end-to-end transport; network/routing; queuing/scheduling; MAC + DLC; PHY (analog + digital)]
Can we improve our design approach?
Traditional: 1 worst-case operating point
New approaches:
- Cross-layer (XL) design
- Scenario-based design
- Design-time/run-time trade-off: abstracted, discretized & pruned at design-time
[Figure annotations: too expensive; not relevant; 2 operating points]
Key idea: adaptive resource management
- Cross-layer design
  - Optimize the system across classical layers, exchange information across layers, and reduce overhead and inefficiencies
  - Determine the relevant ‘knobs’ of the system
- Scenario-based design: design for different goals
  - Low average energy and/or low peak power consumption
  - Fairness across multiple users
  - Best quality for a given power consumption
  - Maximum capacity for minimum required performance
  - …
- Trade-off design/calibration/run-time
  - Optimize at design-time for all static/predictable effects to minimize implementation cost
System knobs in the radio have crucial impact on power consumption [SIPS2003]
[Block diagram: transmitter (Tx): ARQ, FEC encoding, digital inner Tx, analog FE, PA; receiver (Rx): analog Rx, digital inner Rx, FEC decoding, ARQ. Knobs: Tx power, back-off, processing gain, code rate, packet size, constellation order. Other labels: DL, PHY, link, transmit, receive]
So what can we achieve? For the same total energy budget:
- 18 Mbps, 30 min ON: 4 GB transferred
- 9 Mbps, 90 min ON: 6 GB transferred
[Figure annotations: /2, /3]
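The two operating points on this slide can be cross-checked with simple arithmetic. The sketch below recomputes the transferred volume from rate and on-time, and derives what the "same total energy budget" statement implies for average power and energy per bit. Reading the "/2" and "/3" annotations as rate and power reductions is my interpretation, not something the slide spells out.

```python
def gigabytes_transferred(rate_mbps, minutes_on):
    """Data volume in GB (1e9 bytes) moved at `rate_mbps` for `minutes_on`."""
    bits = rate_mbps * 1e6 * minutes_on * 60
    return bits / 8 / 1e9


fast_gb = gigabytes_transferred(18, 30)   # high-rate point
slow_gb = gigabytes_transferred(9, 90)    # scaled-down point

# Same energy spent over 90 min instead of 30 min:
# the average power at the slow point is one third of that at the fast point.
power_ratio = 30 / 90
# Same energy for more data: energy per bit drops by the ratio of the volumes.
energy_per_bit_ratio = fast_gb / slow_gb

if __name__ == "__main__":
    print(f"18 Mbps for 30 min: {fast_gb:.2f} GB")
    print(f" 9 Mbps for 90 min: {slow_gb:.2f} GB")
    print(f"average power ratio (slow/fast): {power_ratio:.2f}")
    print(f"energy per bit ratio (slow/fast): {energy_per_bit_ratio:.2f}")
```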
Optimal cross-layer policy adapts to the circumstances
- For a bad channel: scaling saves more energy
- For a good channel: sleeping saves more energy
- The optimal policy adapts and saves maximally, exploiting multi-user diversity
[Figure annotations: /2.5, /5]
Splitting the work between design-time and run-time
[Diagram labels: design-time, run-time]
We can partition the global optimization problem into steps!
- Design-time: PHY layer rate vs. energy trade-off curves
- Run-time: radio link control strategy; PHY average rate vs. energy/bit trade-off
- Proportional controller: DL average delay vs. average energy/bit trade-off, under a DL delay constraint
- TCP throughput vs. energy/bit trade-off; average PHY rate = goodput constraint
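One possible reading of this design-time/run-time split is sketched below: a table of operating points is prepared at design-time, and at run-time a proportional controller nudges the target rate to track a delay constraint before the cheapest sufficient point is picked. The operating points, gain and function names are all hypothetical; the slides do not give the actual controller.

```python
# Design-time result: (PHY rate in Mbps, relative energy per bit), sorted by rate.
# HYPOTHETICAL values standing in for a pruned rate vs. energy trade-off curve.
OPERATING_POINTS = [(6, 1.0), (12, 1.4), (24, 2.2), (54, 4.0)]


def proportional_step(rate_target, measured_delay, delay_target, gain):
    """Raise the rate target when delay is too high, lower it when there is slack."""
    return max(0.0, rate_target + gain * (measured_delay - delay_target))


def select_point(required_rate):
    """Pick the lowest-energy operating point that still delivers the rate."""
    for rate, energy_per_bit in OPERATING_POINTS:
        if rate >= required_rate:
            return rate, energy_per_bit
    return OPERATING_POINTS[-1]  # saturate at the fastest point


if __name__ == "__main__":
    target = 10.0
    # Made-up delay measurements (ms) against a 20 ms constraint.
    for delay_ms in (30.0, 26.0, 18.0, 12.0):
        target = proportional_step(target, delay_ms, delay_target=20.0, gain=0.5)
        print(f"delay {delay_ms:4.1f} ms -> target {target:5.1f} Mbps, "
              f"operating point {select_point(target)}")
```

Keeping the run-time part this small is the point of the split: the expensive exploration happens once, offline.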
Functionality: advanced schemes co-designed with front-end for QoE
[Diagram labels: broadband wireless access schemes; QoE-enhanced SDR solutions & design; BB implementation (optimized code, flexible/low power platform); PHY and MAC enabling QoE; relax/control front-end]
Wireless system designers sent to the hell of physics
- Heavenly dreams (telecom algorithmists, ambient-intelligent scenarists): 200 Watt, 600 Gops, 64 bit, …; minutes on a workstation
- Hard reality: <1 Watt, 600 Mops, 12 bit, …; 1 week on a single battery charge; 22 nm silicon, leaky devices, variability, 800 J/g battery, ultra low cost
[Diagram labels: To heaven?; platform, RTOS, compilers]
Design and programming of SDRs: another great idea turned crappy?
Designing SDR systems: don’t break the flow, enable it
- Design for low power systems:
  - Cross-layer/QoE approach
  - Optimize application mapping
  - Optimize power based on activity of gates
  - Design leakage-aware (voltage islands, high-Vt libraries)
- Low cost: go for the deep-submicron challenge
- Low programming effort: C makes a nice entry
- Provide a full validation flow (virtual platform models, emulator in the loop, early layout experiments, …)
- Take advantage of advanced CAD: a lot has been automated!
[Diagram labels: system, architecture, circuit]
The SDR dream to FLAI can come true:
- build on new technologies & communication schemes
- exploit flexibility for low power
- rely on an integrated advanced design flow
SDR: a wireless dream come true?
Thanks to the scientists who are making it come true
Amongst many others:
- The IMEC crew:
  - FLAI team: Andre B., Bruno B., Hans C., Eduardo D., Stefaan D., Veerle D., Miguel G., Francois H., Lieven H., Sven J., Eduardo L., Anthony N., David N., Frederik N., Jimmy N., Thomas S., Roeland V., Wim V., Jan-Willem W.
  - Sofie P., Bruno B., Gregory L. & QoE team
  - David N. & ADRES team
  - Francky C., Boris C., Hugo D., Peter V.
- H. Meyr, K. Van Berkel, and their teams