Published by Sylvia Bridges. Modified over 6 years ago.
1
An Ultra Low-Power Hardware Accelerator for Automatic Speech Recognition
Reza Yazdani, Albert Segura, José-María Arnau, Antonio González
2
Automatic Speech Recognition (ASR)
3
ASR Requirements
Voice-based user interfaces for mobile devices
  Large vocabulary
  Speaker-independent
  High accuracy
  Real-time performance
  Energy efficiency
4
ASR Solutions
General-purpose platforms
5
Outline
Motivation
Automatic Speech Recognition
Accelerated ASR System
Memory Subsystem Optimizations
  Prefetcher
  Bandwidth Reduction
Experimental Results
Conclusions
6
Automatic Speech Recognition
State-of-the-art ASR system
Hybrid model: DNN + HMM
[Figure: ASR pipeline. Diagram labels: Sound Signal, Feature Extraction, Likelihood Computation, Graph Search, Speech (words), GPU]
7
Graph Search
[Figure: graph search flow. Diagram labels: Dictionary, Acoustic model, Language model, Training, Graph Generator, Weighted Finite-State Transducer, Viterbi Search]
8
Viterbi Search
A simple example of a WFST for detecting two words: "three" and "two"
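To make the two-word example concrete, here is a minimal sketch of how such a WFST could be represented in software. The slide's figure is not recoverable, so the state numbering, the phoneme sequences ("th r iy" and "t uw"), and all weights below are hypothetical illustrations, not the values from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int       # source state
    dst: int       # destination state
    phone: str     # input label (phoneme)
    word: str      # output label ("" means no word is emitted)
    weight: float  # transition probability (hypothetical value)


# Hypothetical toy WFST; state 0 is the start state, state 3 is final.
ARCS = [
    Arc(0, 1, "th", "three", 0.5),
    Arc(1, 2, "r", "", 1.0),
    Arc(2, 3, "iy", "", 1.0),
    Arc(0, 4, "t", "two", 0.5),
    Arc(4, 3, "uw", "", 1.0),
]
FINAL_STATES = {3}


def transduce(phones, arcs=ARCS, start=0):
    """Follow the arcs matching a phoneme sequence.

    Returns (words, probability) if the path ends in a final state,
    otherwise None.
    """
    state, words, prob = start, [], 1.0
    for p in phones:
        match = next((a for a in arcs if a.src == state and a.phone == p), None)
        if match is None:
            return None
        if match.word:
            words.append(match.word)
        prob *= match.weight
        state = match.dst
    return (words, prob) if state in FINAL_STATES else None
```

Each arc consumes one phoneme and optionally emits a word, which is the property the Viterbi search relies on when it walks the graph frame by frame.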
9
Viterbi Search
[Figure: Viterbi search over Frame 0 to Frame 3 on the example WFST, annotated with path likelihoods. Low-likelihood paths are marked "Pruning!"; the output word shown is THREE.]
11
Accelerated ASR System
12
Accelerator’s Architecture
Average active states on each frame evaluation: less than 1%!
Solution: hash table
[Figure: accelerator block diagram. Diagram labels: Viterbi Accelerator, Main Memory, WFST, Dynamic Search Graph, Acoustic Scores, Frame i, Frame i+1, words w1 to w4, hash table columns State ID and Token Info, State Index, Token, frame t, phonemes th, uw, r, iy with example likelihood values]
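Because only a small fraction of states is active in any frame, tokens can be kept in a hash table keyed by state ID instead of an array covering every WFST state. The sketch below is a software illustration only: the capacity, the linear-probing collision policy, and the token fields are assumptions, not details taken from the slide.

```python
class TokenTable:
    """Fixed-capacity hash table mapping state ID -> best token for one frame."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        # Each occupied slot holds (state_id, likelihood, backpointer).
        self.slots = [None] * capacity
        self.size = 0

    def _probe(self, state_id):
        """Return the slot index holding state_id, or the first empty slot.

        Returns None if the table is full and the key is absent.
        """
        idx = state_id % self.capacity
        for _ in range(self.capacity):
            slot = self.slots[idx]
            if slot is None or slot[0] == state_id:
                return idx
            idx = (idx + 1) % self.capacity
        return None

    def insert(self, state_id, likelihood, backpointer=None):
        """Insert a token, keeping only the most likely one per state."""
        idx = self._probe(state_id)
        if idx is None:
            raise MemoryError("token table full")
        slot = self.slots[idx]
        if slot is None:
            self.slots[idx] = (state_id, likelihood, backpointer)
            self.size += 1
        elif likelihood > slot[1]:
            self.slots[idx] = (state_id, likelihood, backpointer)

    def get(self, state_id):
        """Return (likelihood, backpointer) for a state, or None if absent."""
        idx = self._probe(state_id)
        if idx is None or self.slots[idx] is None:
            return None
        return self.slots[idx][1:]
```

A decoder built this way would keep two such tables, one for frame i and one for frame i+1, and swap them after each frame; since a table is rebuilt from scratch every frame, no deletion logic is needed.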
14
Potential Improvement
Perfect caches and hash tables
Speedups with respect to the baseline architecture
94.6% improvement
Large memory footprint (34 million arcs)
15
Hardware Prefetching
Dynamic access of a small, sparsely distributed subset of arcs
  On average: 25K out of 34M arcs
Conventional prefetchers are inefficient
  Graph search exhibits an unpredictable access pattern
  Pruning unlikely paths causes more unpredictability
Our proposed scheme is based on decoupled access-execute
  All memory addresses are deterministic after pruning
  Issue memory requests well in advance
  High accuracy: addresses are computed rather than predicted
  Timeliness: reorder buffer to avoid early evictions
94% speedup with a negligible area overhead of 0.05%
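The decoupled access-execute idea can be modeled in a few lines of software. This is a rough sketch under assumptions, not the hardware design: it assumes each surviving state stores the index of its first arc and its arc count, that arcs have a fixed size (`arc_bytes` is hypothetical), that each address is requested once per frame, and that memory latency varies per request so responses can return out of order.

```python
from collections import deque


class ReorderBuffer:
    """Holds prefetched data until it can be consumed in issue order."""

    def __init__(self):
        self.order = deque()  # addresses in the order they were issued
        self.ready = {}       # address -> data that has already arrived

    def issue(self, addr):
        self.order.append(addr)

    def fill(self, addr, data):
        self.ready[addr] = data

    def pop_ready(self):
        """Return (addr, data) for the oldest request if it has arrived."""
        if self.order and self.order[0] in self.ready:
            addr = self.order.popleft()
            return addr, self.ready.pop(addr)
        return None


def arc_addresses(surviving_states, first_arc, num_arcs, arc_bytes=16, base=0):
    """Access stage: compute (not predict) every arc address to be read."""
    addrs = []
    for s in surviving_states:
        for k in range(num_arcs[s]):
            addrs.append(base + (first_arc[s] + k) * arc_bytes)
    return addrs


def run_frame(surviving_states, first_arc, num_arcs, memory, latency):
    """Issue all requests up front, then consume responses in issue order."""
    rob = ReorderBuffer()
    in_flight = []  # (arrival_time, addr)
    for t, addr in enumerate(arc_addresses(surviving_states, first_arc, num_arcs)):
        rob.issue(addr)
        in_flight.append((t + latency(addr), addr))
    consumed = []
    for _, addr in sorted(in_flight):  # responses may arrive out of order
        rob.fill(addr, memory[addr])
        item = rob.pop_ready()
        while item is not None:  # execute stage drains everything now in order
            consumed.append(item)
            item = rob.pop_ready()
    return consumed
```

Because the addresses are derived from the tokens that survived pruning, every request is for data the execute stage will actually use, which is the accuracy argument made on the slide.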
17
Bandwidth Reduction
97% of dynamically expanded states have fewer than 16 arcs
A novel technique for directly computing arc addresses
  Changing the memory layout of the WFST dataset
  Avoids the memory access for fetching the state's data
20% memory bandwidth saving at a negligible cost of a 0.02% area increase
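The slide does not spell out the new memory layout, so the following is only one possible scheme that has the stated property. It assumes states are renumbered in order of arc count, so that all states with the same small arc count occupy a contiguous ID range and their arcs are stored back to back. The first-arc index then follows from the state ID and a small per-group table, with no per-state fetch. All names here are hypothetical.

```python
MAX_DIRECT = 16  # states with fewer arcs than this use direct computation


def build_layout(out_degree):
    """Renumber states by arc count and build the lookup structures.

    out_degree[old_id] is the number of arcs leaving that state.
    Returns (new_id, groups, state_table):
      new_id[old_id]      -> renumbered state ID
      groups              -> list of (first_state_id, first_arc_index, arcs_per_state)
      state_table[new_id] -> (first_arc_index, n_arcs) for states with many arcs
    """
    order = sorted(range(len(out_degree)), key=lambda s: out_degree[s])
    new_id = {old: new for new, old in enumerate(order)}
    groups, state_table, arc_cursor = [], {}, 0
    for new, old in enumerate(order):
        n = out_degree[old]
        if n < MAX_DIRECT:
            # Start a new group whenever the arc count changes.
            if not groups or groups[-1][2] != n:
                groups.append((new, arc_cursor, n))
        else:
            state_table[new] = (arc_cursor, n)
        arc_cursor += n
    return new_id, groups, state_table


def locate_arcs(state_id, groups, state_table):
    """Return (first_arc_index, n_arcs, needed_state_fetch)."""
    if state_id in state_table:
        # Large states still need a lookup of their state record.
        return state_table[state_id] + (True,)
    # Small states: a handful of comparisons against the group boundaries.
    for first_state, first_arc_idx, n in reversed(groups):
        if state_id >= first_state:
            return first_arc_idx + (state_id - first_state) * n, n, False
    raise KeyError(state_id)
```

Under this assumed layout there are at most 16 groups, so the lookup for the common case reduces to a few comparisons and one multiply-add, and only the minority of states with 16 or more arcs still pays for a state fetch.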
19
Evaluation Methodology
Viterbi accelerator's timing estimation
  A cycle-accurate simulator
    Execution and activity factors
  RTL Verilog model for logic components
    Design frequency
  Modeling memory parts with CACTI
    Cache and memory latency
Power model
  Memory and caches: CACTI
  Logic: Synopsys Design Compiler
  Technology node: 28 nm
20
Experimental Results
[Figure: results charts. Recoverable annotations: 111.47x speedup, 16.7x speedup, 1185x reduction]
22
Conclusion
Viterbi search is the main bottleneck in ASR systems
General-purpose solutions
  Not real-time for large speech models
  High energy consumption
Design of an accelerator tailored for the Viterbi search
  More energy-efficient (by orders of magnitude)
Memory subsystem is the main challenge of ASR
  Arc prefetcher
  Memory bandwidth reduction
1.7x faster than an NVIDIA GTX 980, with 287x less energy