Design and VLSI implementation of a digital audio-specific DSP core for MP3/AAC
Kyoung Ho Bang, Nam Hun Jeong, Joon Seok Kim, Young Cheol Park and Dae Hee Youn
IEEE Transactions on Consumer Electronics, page(s):
Presenter: 陳世偉 (Shi-Wei Chen)
Instructor: Prof. 黃英哲

92/03/24 Seminar – Shi-Wei Chen

Outline
Introduction
System architecture
Instruction set
System efficiency
Conclusion

Introduction
Standardized audio compression methods
  – MP3 (MPEG-1 Layer III), AAC (Advanced Audio Coding)
The consumer market demands
  – High compression ratio
  – Transparent quality
Hardware performance is also rising.
  – DSP, ASIC, Microprocessor/Microcontroller

Comparison with CPUs
DSP
  Advantages
    1. Signal processing tasks
    2. High performance
    3. Advanced control techniques
    4. Additional functions
  Disadvantages
    1. Limited peripherals
    2. High power consumption
    3. High hardware cost
Microprocessor / Microcontroller
  Advantages
    1. On-chip peripherals
    2. Supervisory functions
    3. Familiar architecture
    4. Low power consumption
  Disadvantages
    1. Low performance
    2. Computation delay
    3. Numerical problems
    4. Limited hardware resources
ASIC
  Advantages
    1. Targeted at a particular application
    2. High performance
    3. Built by connecting existing circuits
  Disadvantages
    1. Hard to modify for the actual target algorithm; inflexible
    2. Time-consuming and error-prone
    3. Unsuitable for reuse
DSP + ASIC + accelerator = digital audio-specific DSP core

Focus
Aspect: High-quality audio coding
  Method
    1. MPEG-1/2 Layer II / III
    2. MPEG-2 AAC
    3. AC-3
Aspect: Low power consumption
  Method
    1. Minimum hardware resources
       - single ALU
       - 3-stage pipelining
       - Harvard architecture
    2. Disable unused hardware resources
       - using latches
Aspect: Easy programming
  Method
    1. Operation scheduling
    2. Hardware resource allocation

Audio-specific DSP features
Data processing unit
  – 20-bit data, 48-bit ALU, accelerator
  – Multiplier: 20-bit × 20-bit
    » signed × signed, signed × unsigned, unsigned × unsigned
  – Convergent rounder, limiter
More than 18-bit PCM output
One-cycle MAC for F/T transform processing
Modulo-2048 addressing for managing the 2048-size buffer of AAC
512-point FFT for AC-3 and AAC
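
The convergent rounder and limiter listed above can be illustrated with a small software model. This is only a sketch under assumptions: the function names `convergent_round` and `limit` are hypothetical, the number of dropped bits is a free parameter, and the 20-bit default comes from the data width on the slide rather than from a documented datapath.

```python
def convergent_round(value: int, shift: int) -> int:
    """Drop `shift` low-order bits, rounding to nearest with ties to even.

    Hypothetical model of a convergent (round-half-to-even) rounder.
    """
    if shift <= 0:
        return value
    half = 1 << (shift - 1)
    mask = (1 << shift) - 1
    quotient = value >> shift      # arithmetic shift: floors, also for negatives
    remainder = value & mask       # always in the range [0, 2**shift)
    # Round up when above the midpoint, or exactly at the midpoint
    # with an odd quotient (so the result becomes even).
    if remainder > half or (remainder == half and (quotient & 1)):
        quotient += 1
    return quotient


def limit(value: int, bits: int = 20) -> int:
    """Saturate `value` to the signed two's-complement range of `bits` bits.

    The 20-bit default is an assumption taken from the slide's data width.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))
```

Rounding ties to even avoids the small DC bias that always rounding halves upward would accumulate over long filter or transform loops, and the limiter clips instead of wrapping when a result overflows the output word.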

System architecture
3-stage pipeline architecture
  – Instruction fetch stage
  – Instruction decode stage
  – Execution stage
  – One instruction per clock cycle
    » Except branch instructions (2 clock cycles)
Harvard architecture
  – Program memory
  – Data memory
Load-Store architecture
MAC = MU + ALU
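
As a rough illustration of "MAC = MU + ALU", the sketch below models a multiply-accumulate as a multiplier step followed by an ALU add. The helper names (`to_signed`, `mac`, `dot`) are hypothetical, only the signed × signed multiplier mode is modeled, and the 48-bit accumulator with wraparound is an assumption based on the 48-bit ALU width mentioned earlier; the real core may saturate or behave differently.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc: int, x: int, y: int, data_bits: int = 20, acc_bits: int = 48) -> int:
    """One multiply-accumulate step: acc + x * y.

    MU step: signed 20-bit x signed 20-bit product.
    ALU step: add the product to the accumulator, wrapping at `acc_bits`
    (the wrap behaviour is an assumption of this sketch).
    """
    product = to_signed(x, data_bits) * to_signed(y, data_bits)  # MU
    return to_signed(acc + product, acc_bits)                    # ALU


def dot(xs, ys) -> int:
    """Dot product as a chain of MAC steps, one step per input pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

With a one-cycle MAC, a loop like `dot` would cost roughly one clock cycle per tap on such a core, which is why the operation matters for the transform and filterbank stages.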

DSP architecture
[Block diagram]
Blocks: Instruction Fetch Unit, Instruction Decoder, Execution Unit, Data Processing Unit, Data Addressing Generator, Program Memory, X Data Memory, Y Data Memory
Signals and buses: Instruction, Control signal, Condition Code, D1 Bus, D2 Bus, PMD, PMA, XAB, XDB, YAB, YDB
PMD : Program Memory Data Bus
PMA : Program Memory Address Bus
XAB : X Data Memory Address Bus
XDB : X Data Memory Data Bus
YAB : Y Data Memory Address Bus
YDB : Y Data Memory Data Bus

Instruction fetch
Instruction address generator
  – PC
  – Branch instruction (immediate address)
  – Loop instruction
Instruction decoder
  – Execution control
  – Operation control of all core units and additional off-core functional units

Data address generator
One index register file and two identical address calculation units.
It generates two independent data addresses on each cycle.
Example: ro[des_line/18][des_line%18] = xr[src_line/18][src_line%18]
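
The example assignment needs two addresses in the same step, one for the source element and one for the destination. The sketch below shows that index arithmetic in software, together with a modulo (circular) address update of the kind used for a 2048-entry buffer. The names `split_index`, `copy_element`, and `modulo_next` are hypothetical, and the 32 × 18 array shape in the usage note is an assumption (32 subbands × 18 samples, as in MP3 granules), not something stated on the slide.

```python
WIDTH = 18  # row length used in the slide's example


def split_index(line: int, width: int = WIDTH):
    """Map a flat index to (row, column), i.e. (line / width, line % width)."""
    return divmod(line, width)


def copy_element(ro, xr, des_line: int, src_line: int) -> None:
    """ro[des_line/18][des_line%18] = xr[src_line/18][src_line%18].

    The two index computations are independent, which is what two
    address calculation units could evaluate in parallel.
    """
    dr, dc = split_index(des_line)
    sr, sc = split_index(src_line)
    ro[dr][dc] = xr[sr][sc]


def modulo_next(addr: int, step: int, base: int, size: int) -> int:
    """Advance `addr` by `step` inside the circular buffer [base, base + size)."""
    return base + (addr - base + step) % size
```

For instance, with `xr` and `ro` as 32 × 18 nested lists, `copy_element(ro, xr, 37, 100)` copies `xr[5][10]` into `ro[2][1]`, and `modulo_next(2047, 1, 0, 2048)` wraps back to address 0.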

Data processor unit
[Block diagram; DC: Data Converter]
Functional blocks: MU, ALU, Shifter, DC
Registers: X0, X1, Y0, Y1, A, B, P reg, S reg
AR Files; MRX, MRY, AMRX, AMRY, OMR, SR
Buses and inputs: XDB, YDB, D1 bus, D2 bus, A1 bus, A2 bus, Imm. Data, From S bus

Instruction set
Special instructions: UNPACK, HUFFMAN

System efficiency
Design tool
  – VHDL
Compile and simulate tool
  – SYNOPSYS tool
0.35 μm, 3.3 V CMOS technology
  – 40 MHz
Module                 Gate Count
Predictor
AAC Huffman Decoder
MP3 Huffman Decoder
DSP Core

Quality test of MP
NL : Noise level
MER : Maximum error ratio
Np : The number of processing bits
No : The number of output PCM bits
1.09
ISO/IEC : NL < -101 dB, MER < 1

Clock cycle for MP3 decoder
Total sum: 348,
Cycle budget: (40 MHz / 48 kHz sample rate) × 1152 = 960,000 cycles/frame
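
The per-frame cycle budget follows from the clock rate, the sample rate, and the 1152 samples in an MPEG-1 Layer III frame. The short sketch below reproduces that arithmetic; the function names are hypothetical, and integer arithmetic is used so the result is exact.

```python
def cycles_per_frame(clock_hz: int, sample_rate_hz: int, samples_per_frame: int) -> int:
    """Clock cycles available while one frame of audio plays back in real time."""
    return clock_hz * samples_per_frame // sample_rate_hz


def frame_duration_ms(sample_rate_hz: int, samples_per_frame: int) -> float:
    """Playback time of one frame in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz


# Figures from the slide: 40 MHz clock, 48 kHz sample rate, 1152 samples/frame.
budget = cycles_per_frame(40_000_000, 48_000, 1152)
duration = frame_duration_ms(48_000, 1152)
```

Here `budget` evaluates to 960,000 cycles and `duration` to 24 ms, so a decoder whose total cycle count per frame stays below the budget runs in real time at 40 MHz.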

Memory requirement
      H/W Resource      Size (words)
MP3   Program memory    2.2k
      Data ROM          1.4k
      Data RAM          5.7k
AAC   Program memory    4.1k
      Data ROM          5.5k
      Data RAM          7.1k

Evaluation board

Conclusion
The system consists of a 20-bit fixed-point DSP core for the software implementation and a hardware accelerator.
The decoding system can decode MP3 using only MIPS with high efficiency.
The digital audio-specific DSP core is suitable for embedded systems.