Exploring Hyperdimensional Associative Memory


Exploring Hyperdimensional Associative Memory
Mohsen Imani, Abbas Rahimi, Deqian Kong, Tajana Rosing, and Jan M. Rabaey
CSE Department, UC San Diego; EECS Department, UC Berkeley

Outline
Background in HD Computing
Application Example: Language Recognition
Exploring HAM Designs and Optimizations: Digital HAM (D-HAM), Resistive HAM (R-HAM), Analog HAM (A-HAM)
Experimental Results
Summary

Brain-inspired Hyperdimensional Computing
Hyperdimensional (HD) computing [P. Kanerva, Cognitive Computation'09] emulates cognition by computing with high-dimensional vectors rather than with numbers. Information is distributed across a high-dimensional space, and the representation supports a full algebra.
Superb properties: a general and scalable model of computing; a well-defined set of arithmetic operations; fast, one-shot learning (no back-propagation needed); memory-centric with embarrassingly parallel operations; extremely robust against most failure mechanisms and noise.
HD computing overcomes low SNR and large variability in both data and platform to perform robust decision making and classification, improving both performance and energy efficiency.

What Are Hypervectors?
Distributed, pattern-based data representation and arithmetic, in contrast to computing with numbers.
Hypervectors are: high-dimensional (e.g., 10,000 dimensions); (pseudo)random with i.i.d. components; holographically distributed (i.e., not microcoded).
Hypervectors can: use various codings (dense or sparse, bipolar or binary); be combined using arithmetic operations: multiplication, addition, and permutation (MAP); be compared for similarity using distance metrics, e.g., Hamming distance. A minimal software sketch of these operations follows.
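A minimal NumPy sketch of the MAP operations on bipolar hypervectors. This is my illustration, not code from the paper; the dimension, seed, and helper names are arbitrary choices.

import numpy as np

D = 10_000
rng = np.random.default_rng(0)

def random_hv():
    """Draw a pseudo-random bipolar hypervector with i.i.d. components."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Multiplication (binding): element-wise product; the result is dissimilar to both inputs."""
    return a * b

def bundle(*hvs):
    """Addition (bundling): element-wise majority (ties broken toward +1); similar to all inputs."""
    s = np.sum(hvs, axis=0)
    return np.where(s >= 0, 1, -1)

def permute(a, shift=1):
    """Permutation: cyclic shift, used to encode sequence and order."""
    return np.roll(a, shift)

def similarity(a, b):
    """Normalized dot product: ~0 for unrelated hypervectors, ~1 for identical ones."""
    return float(a @ b) / D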

Mapping to Hypervectors
Each symbol is represented by a 10,000-D hypervector chosen at random:
A = [−1 +1 −1 −1 −1 +1 −1 −1 ...]
B = [+1 −1 +1 +1 +1 −1 +1 −1 ...]
C = [−1 −1 −1 +1 +1 −1 +1 −1 ...]
...
Z = [−1 −1 +1 −1 +1 +1 +1 −1 ...]
Every letter hypervector is dissimilar to the others, e.g., ⟨A, B⟩ ≈ 0, and this assignment is fixed throughout the computation.
Item Memory (iM): maps an 8-bit letter symbol (e.g., "a") to its 10,000-bit hypervector (e.g., A).
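A small sketch of an item memory and of the pseudo-orthogonality of random hypervectors; the dictionary interface and seed are assumptions for illustration, not details from the slides.

import numpy as np

D = 10_000
rng = np.random.default_rng(1)
# Item memory: one fixed random hypervector per symbol (letters plus space).
item_memory = {ch: rng.choice([-1, 1], size=D) for ch in "abcdefghijklmnopqrstuvwxyz "}

A, B = item_memory["a"], item_memory["b"]
print(A @ B / D)  # near 0: random hypervectors are pseudo-orthogonal (|dot|/D on the order of 1/sqrt(D))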

Example Problem: Language Recognition
Identify the language from a stream of letters (n-grams). Train with a megabyte of text from each of 21 EU languages; test with 1,000 sentences from each (from an independent source).
Training flow: train text (Dutch example: "ik wil hier en daar een greep uit de geschiedenis doen en ga over...") → Item Memory (letter hypervectors, 10,000-D) → Encoding with MAP operations → language hypervector stored in the Associative Memory (21×10,000-D learned language hypervectors).
Testing flow: test sentence (e.g., "daher stimme ich gegen anderungsantrag welcher") → Item Memory → Encoding with MAP operations → text hypervector → Associative Memory search → identified language.
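A hedged sketch of the n-gram encoder and associative-memory lookup described above, using a permute-and-bind trigram scheme in the spirit of the referenced HD language-recognition work; the function names, n = 3, and the omission of the training loop (which would bundle many encoded texts per language) are illustrative choices, not code from the paper.

import numpy as np

D, N = 10_000, 3
rng = np.random.default_rng(2)
item_memory = {ch: rng.choice([-1, 1], size=D) for ch in "abcdefghijklmnopqrstuvwxyz "}

def encode_text(text):
    """MAP-encode a letter stream into a single text hypervector."""
    letters = [c for c in text.lower() if c in item_memory]
    acc = np.zeros(D)
    for i in range(len(letters) - N + 1):
        # Bind one n-gram: permute each letter hypervector by its position, then multiply.
        gram = np.ones(D, dtype=int)
        for j in range(N):
            gram *= np.roll(item_memory[letters[i + j]], N - 1 - j)
        acc += gram  # bundle (add) all n-grams of the text
    return np.where(acc >= 0, 1, -1)

def nearest_language(text_hv, language_hvs):
    """Associative-memory lookup: return the language whose hypervector is most similar."""
    return max(language_hvs, key=lambda lang: int(text_hv @ language_hvs[lang]))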

Its Algebra is General: The Architecture Can Be Reused
The same pipeline (item memory → MAP encoder → associative memory) serves different applications, with the associative-memory search as the common denominator. Language identification maps 8-bit letter symbols to 10K-bit hypervectors and searches over 21 classes; EMG hand-gesture recognition maps four sensor channels (S1-S4) through a 5-bit item memory to 10K-bit hypervectors and searches over 5 classes.

Applications                          | n-grams      | HD    | Baseline
Language identification [ISLPED'16]   | n = 3        | 96.7% | 97.9%
Text categorization [DATE'16]         | n = 5        | 94.2% | 86.4%
EMG gesture recognition [ICRC'16]     | n ∈ [3,5]    | 97.8% | 89.7%
EEG brain-machine interface [BICT'17] | n ∈ [16,29]  | 74.5% | 69.5%

Contributions
Goal: facilitate energy-efficient, fast, and scalable comparison of large patterns stored in memory.
Digital CMOS-based HAM (D-HAM): modular scaling.
Resistive HAM (R-HAM): timing discharge for approximate distance computing.
Analog HAM (A-HAM): a faster and denser alternative.
Approximation techniques: sampling, voltage overscaling, bit-width optimization, current-based search and comparators.
For a moderate accuracy of 94%, R-HAM improves the energy-delay product by 9.6× and A-HAM by 1347× compared to D-HAM.

HAM for Language Recognition
Structure of the hyperdimensional associative memory (HAM): a D = 10,000-bit query hypervector is compared against the learned class hypervectors.

D-HAM: Digital HAM
XOR array (D×C): bit-level comparison of the query hypervector with all learned hypervectors.
Counters (C): count the number of mismatches in every row.
Comparators: find the row with the minimum Hamming distance.
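A software analogue of the D-HAM search on binary hypervectors (my sketch; the function name and the 0/1 encoding are assumptions): XOR for bit-level comparison, a per-row popcount in place of the counters, and an argmin in place of the comparator tree.

import numpy as np

def dham_search(query, classes):
    """query: (D,) array of 0/1 bits; classes: (C, D) array of 0/1 learned hypervectors."""
    mismatches = np.bitwise_xor(classes, query)  # XOR array: D x C bit comparisons
    distances = mismatches.sum(axis=1)           # C counters: mismatches per row
    return int(np.argmin(distances))             # comparators: row with minimum Hamming distance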

D-HAM Optimization: Sampling
Language classification accuracy: the maximum accuracy is 96.4%; moderate accuracy is defined as up to 4% below the maximum.
Sampling searches over a smaller dimensionality (d < 10,000), exploiting HD robustness:
d = 9,000 bits: 7% energy saving at maximum accuracy.
d = 7,000 bits: 22% energy saving at moderate accuracy.
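A sketch of the sampling idea: restrict the search to d randomly chosen dimensions. The d values quoted on the slide (9,000 and 7,000) come from the source; the code itself and its seed are illustrative.

import numpy as np

def sampled_search(query, classes, d, seed=3):
    """Search over d randomly chosen dimensions instead of all 10,000."""
    idx = np.random.default_rng(seed).choice(query.shape[0], size=d, replace=False)
    distances = np.bitwise_xor(classes[:, idx], query[idx]).sum(axis=1)
    return int(np.argmin(distances))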

R-HAM: Resistive HAM
2T-2R CAM array: no XOR gates needed.
The CAM array is partitioned into M blocks of D/M bits each (at most 4 bits per block); each CAM block generates a non-binary distance code, and parallel counters accumulate the distances from the partial CAM blocks.

R-HAM Optimizations: Switching Activity
Explore block sizes from 1 to 4 bits. A non-binary code represents the accumulative distance of each block: Q[i] = 1 when the block distance is i bits (i = 1..4), as in the behavioral sketch below. This lowers switching activity in the CAM, counters, and comparators.
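A behavioral sketch (my interpretation, not the hardware itself) of the block organization: the D bits are split into blocks of up to 4 bits, each block reports only its local mismatch count, and the counters accumulate those partial distances, which sums to the full Hamming distance.

import numpy as np

def rham_distance(query, stored, block=4):
    """Accumulate per-block mismatch counts; the sum equals the full Hamming distance."""
    total = 0
    for start in range(0, query.shape[0], block):
        mism = np.bitwise_xor(query[start:start + block], stored[start:start + block])
        # In hardware each block emits a one-hot code Q[1..4] for its local distance;
        # here we simply add the block's mismatch count (0..4).
        total += int(mism.sum())
    return total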

R-HAM Optimizations: Voltage Overscaling
Voltage overscaling is applied per block rather than blind sampling:

Target accuracy | Energy saving: sampling | Energy saving: voltage overscaling
Maximum         | 9% (250 blocks off)     | 18% (1,000 blocks at 780 mV)
Moderate        | 22% (750 blocks off)    | 50% (2,500 blocks at 780 mV)
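A hypothetical error-injection experiment for studying this trade-off in software: overscaled blocks occasionally flip bits, and the classification is re-run on the noisy memory to check how much accuracy is lost. The bit-error rate and interface below are placeholders; only the block counts and the 780 mV operating point above come from the slide.

import numpy as np

def overscale_blocks(classes, n_blocks, block=4, bit_error_rate=1e-3, seed=4):
    """Randomly flip bits inside the first n_blocks blocks of every stored row."""
    rng = np.random.default_rng(seed)
    noisy = classes.copy()
    cols = n_blocks * block
    flips = rng.random(noisy[:, :cols].shape) < bit_error_rate
    noisy[:, :cols] ^= flips.astype(noisy.dtype)
    return noisy  # re-run the search on `noisy` and compare accuracy against the clean model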

A-HAM: Analog HAM
Current-based analog search in the CAM: the row with the lowest discharging current is the closest-distance row.
Loser-Take-All (LTA) cell: finds the lower of two input currents (it passes I1 if I2 > I1, and I2 if I1 > I2).
A faster and denser alternative to D-HAM and R-HAM.
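A behavioral model of the LTA-based search (a sketch, not a circuit model): every row is represented by a stand-in current value, lower current meaning a closer row, and a tree of two-input Loser-Take-All cells reduces the rows to the minimum.

def lta(i1, i2):
    """Two-input Loser-Take-All cell: pass the smaller of the two currents."""
    return i1 if i2 > i1 else i2

def lta_tree(row_currents):
    """Pairwise LTA reduction until only the minimum (closest row) current remains."""
    currents = list(row_currents)
    while len(currents) > 1:
        nxt = []
        for i in range(0, len(currents), 2):
            nxt.append(lta(currents[i], currents[i + 1]) if i + 1 < len(currents) else currents[i])
        currents = nxt
    return currents[0]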

Multistage A-HAM
The discharging current saturates for large D, so small distances cannot be detected. Splitting the search operation into multiple shorter stages improves the minimum detectable Hamming distance by 3.1×.

Experimental Results
Technology: TSMC 45 nm LP process with high-VTH cells; energy-delay product (EDP) normalized to D-HAM.
EDP improvement over D-HAM: at maximum accuracy, R-HAM 7.3× and A-HAM 746×; at moderate accuracy, R-HAM 9.6× and A-HAM 1347×.
Area compared to D-HAM: R-HAM is 1.4× and A-HAM is 3× lower.
Increasing dimensionality by 20×: D-HAM ~8.3× energy and 2.2× delay increase; R-HAM ~8.2× energy and 2.0× delay; A-HAM ~1.9× energy and 1.7× delay.
Increasing the number of classes by 15×: D-HAM ~12.6× energy and 3.5× delay increase; R-HAM ~11.4× energy and 3.4× delay; A-HAM ~15.9× energy and 4.4× delay.
A-HAM is sensitive to >5% voltage variability.

Summary
HD computing manipulates and compares large patterns stored in memory.
We explored digital, resistive, and analog HAM designs using sampling, voltage overscaling, bit-width optimization, and current-based search and comparators.
For a moderate accuracy of 94%, A-HAM improves EDP by 1347× and has 3× lower area than D-HAM.
Multistage A-HAM scales with higher dimensionality and linearly with the number of classes, but it is sensitive to >5% voltage variability.

Acknowledgment This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.