Exploring Hyperdimensional Associative Memory


Exploring Hyperdimensional Associative Memory
Mohsen Imani, Abbas Rahimi, Deqian Kong, Tajana Rosing, and Jan M. Rabaey
CSE Department, UC San Diego; EECS Department, UC Berkeley

Outline
Background in HD Computing
Application Example: Language Recognition
Exploring HAM Designs and Optimizations: Digital HAM (D-HAM), Resistive HAM (R-HAM), Analog HAM (A-HAM)
Experimental Results
Summary

Brain-inspired Hyperdimensional Computing
Hyperdimensional (HD) computing [P. Kanerva, Cognitive Computation'09] emulates cognition by computing with high-dimensional vectors rather than with numbers. Information is distributed across a high-dimensional space, and the representation supports a full algebra.
Superb properties: a general and scalable model of computing; a well-defined set of arithmetic operations; fast, one-shot learning (no back-propagation needed); memory-centric with embarrassingly parallel operations; extremely robust against most failure mechanisms and noise.
HD computing overcomes low SNR and large variability in both data and platform to perform robust decision making and classification, improving both performance and energy efficiency.

What Are Hypervectors?
Distributed, pattern-based data representation and arithmetic, in contrast to computing with numbers.
Hypervectors are: high-dimensional (e.g., 10,000 dimensions); (pseudo)random with i.i.d. components; holographically distributed (i.e., not microcoded).
Hypervectors can: use various codings (dense or sparse, bipolar or binary); be combined using arithmetic operations: multiplication, addition, and permutation (MAP); be compared for similarity using distance metrics, e.g., Hamming distance. A minimal software sketch of these operations follows.
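A minimal NumPy sketch of the MAP operations on bipolar hypervectors. This is my illustration, not code from the paper; the dimension, seed, and helper names are arbitrary choices.

import numpy as np

D = 10_000
rng = np.random.default_rng(0)

def random_hv():
    """Draw a pseudo-random bipolar hypervector with i.i.d. components."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Multiplication (binding): element-wise product; the result is dissimilar to both inputs."""
    return a * b

def bundle(*hvs):
    """Addition (bundling): element-wise majority (ties broken toward +1); similar to all inputs."""
    s = np.sum(hvs, axis=0)
    return np.where(s >= 0, 1, -1)

def permute(a, shift=1):
    """Permutation: cyclic shift, used to encode sequence and order."""
    return np.roll(a, shift)

def similarity(a, b):
    """Normalized dot product: ~0 for unrelated hypervectors, ~1 for identical ones."""
    return float(a @ b) / D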

Mapping to Hypervectors
Each symbol is represented by a 10,000-D hypervector chosen at random:
A = [−1 +1 −1 −1 −1 +1 −1 −1 ...]
B = [+1 −1 +1 +1 +1 −1 +1 −1 ...]
C = [−1 −1 −1 +1 +1 −1 +1 −1 ...]
...
Z = [−1 −1 +1 −1 +1 +1 +1 −1 ...]
Every letter hypervector is dissimilar to the others, e.g., ⟨A, B⟩ ≈ 0, and this assignment is fixed throughout the computation.
Item Memory (iM): maps an 8-bit letter symbol (e.g., "a") to its 10,000-bit hypervector (e.g., A).
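A small sketch of an item memory and of the pseudo-orthogonality of random hypervectors; the dictionary interface and seed are assumptions for illustration, not details from the slides.

import numpy as np

D = 10_000
rng = np.random.default_rng(1)
# Item memory: one fixed random hypervector per symbol (letters plus space).
item_memory = {ch: rng.choice([-1, 1], size=D) for ch in "abcdefghijklmnopqrstuvwxyz "}

A, B = item_memory["a"], item_memory["b"]
print(A @ B / D)  # near 0: random hypervectors are pseudo-orthogonal (|dot|/D on the order of 1/sqrt(D))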

Example Problem: Language Recognition
Identify the language from a stream of letters (n-grams). Train with a megabyte of text from each of 21 EU languages; test with 1,000 sentences from each (from an independent source).
Training flow: train text (Dutch example: "ik wil hier en daar een greep uit de geschiedenis doen en ga over...") → Item Memory (letter hypervectors, 10,000-D) → Encoding with MAP operations → language hypervector stored in the Associative Memory (21×10,000-D learned language hypervectors).
Testing flow: test sentence (e.g., "daher stimme ich gegen anderungsantrag welcher") → Item Memory → Encoding with MAP operations → text hypervector → Associative Memory search → identified language.
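A hedged sketch of the n-gram encoder and associative-memory lookup described above, using a permute-and-bind trigram scheme in the spirit of the referenced HD language-recognition work; the function names, n = 3, and the omission of the training loop (which would bundle many encoded texts per language) are illustrative choices, not code from the paper.

import numpy as np

D, N = 10_000, 3
rng = np.random.default_rng(2)
item_memory = {ch: rng.choice([-1, 1], size=D) for ch in "abcdefghijklmnopqrstuvwxyz "}

def encode_text(text):
    """MAP-encode a letter stream into a single text hypervector."""
    letters = [c for c in text.lower() if c in item_memory]
    acc = np.zeros(D)
    for i in range(len(letters) - N + 1):
        # Bind one n-gram: permute each letter hypervector by its position, then multiply.
        gram = np.ones(D, dtype=int)
        for j in range(N):
            gram *= np.roll(item_memory[letters[i + j]], N - 1 - j)
        acc += gram  # bundle (add) all n-grams of the text
    return np.where(acc >= 0, 1, -1)

def nearest_language(text_hv, language_hvs):
    """Associative-memory lookup: return the language whose hypervector is most similar."""
    return max(language_hvs, key=lambda lang: int(text_hv @ language_hvs[lang]))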

Its Algebra is General: The Architecture Can Be Reused
The same pipeline (item memory → MAP encoder → associative memory) serves different applications, with the associative-memory search as the common denominator. Language identification maps 8-bit letter symbols to 10K-bit hypervectors and searches over 21 classes; EMG hand-gesture recognition maps four sensor channels (S1-S4) through a 5-bit item memory to 10K-bit hypervectors and searches over 5 classes.

Applications                          | n-grams      | HD    | Baseline
Language identification [ISLPED'16]   | n = 3        | 96.7% | 97.9%
Text categorization [DATE'16]         | n = 5        | 94.2% | 86.4%
EMG gesture recognition [ICRC'16]     | n ∈ [3,5]    | 97.8% | 89.7%
EEG brain-machine interface [BICT'17] | n ∈ [16,29]  | 74.5% | 69.5%

Contributions
Goal: facilitate energy-efficient, fast, and scalable comparison of large patterns stored in memory.
Digital CMOS-based HAM (D-HAM): modular scaling.
Resistive HAM (R-HAM): timing discharge for approximate distance computing.
Analog HAM (A-HAM): a faster and denser alternative.
Approximation techniques: sampling, voltage overscaling, bit-width optimization, current-based search and comparators.
For a moderate accuracy of 94%, R-HAM improves the energy-delay product by 9.6× and A-HAM by 1347× compared to D-HAM.

HAM for Language Recognition
Structure of the hyperdimensional associative memory (HAM): a D = 10,000-bit query hypervector is compared against the learned class hypervectors.

D-HAM: Digital HAM
XOR array (D×C): bit-level comparison of the query hypervector with all learned hypervectors.
Counters (C): count the number of mismatches in every row.
Comparators: find the row with the minimum Hamming distance.
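A software analogue of the D-HAM search on binary hypervectors (my sketch; the function name and the 0/1 encoding are assumptions): XOR for bit-level comparison, a per-row popcount in place of the counters, and an argmin in place of the comparator tree.

import numpy as np

def dham_search(query, classes):
    """query: (D,) array of 0/1 bits; classes: (C, D) array of 0/1 learned hypervectors."""
    mismatches = np.bitwise_xor(classes, query)  # XOR array: D x C bit comparisons
    distances = mismatches.sum(axis=1)           # C counters: mismatches per row
    return int(np.argmin(distances))             # comparators: row with minimum Hamming distance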

D-HAM Optimization: Sampling
Language classification accuracy: the maximum accuracy is 96.4%; moderate accuracy is defined as up to 4% below the maximum.
Sampling searches over a smaller dimensionality (d < 10,000), exploiting HD robustness:
d = 9,000 bits: 7% energy saving at maximum accuracy.
d = 7,000 bits: 22% energy saving at moderate accuracy.
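A sketch of the sampling idea: restrict the search to d randomly chosen dimensions. The d values quoted on the slide (9,000 and 7,000) come from the source; the code itself and its seed are illustrative.

import numpy as np

def sampled_search(query, classes, d, seed=3):
    """Search over d randomly chosen dimensions instead of all 10,000."""
    idx = np.random.default_rng(seed).choice(query.shape[0], size=d, replace=False)
    distances = np.bitwise_xor(classes[:, idx], query[idx]).sum(axis=1)
    return int(np.argmin(distances))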

R-HAM: Resistive HAM
2T-2R CAM array: no XOR gates needed.
The CAM array is partitioned into M blocks of D/M bits each (at most 4 bits per block); each CAM block generates a non-binary distance code, and parallel counters accumulate the distances from the partial CAM blocks.

R-HAM Optimizations: Switching Activity
Explore block sizes from 1 to 4 bits. A non-binary code represents the accumulative distance of each block: Q[i] = 1 when the block distance is i bits (i = 1..4), as in the behavioral sketch below. This lowers switching activity in the CAM, counters, and comparators.
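A behavioral sketch (my interpretation, not the hardware itself) of the block organization: the D bits are split into blocks of up to 4 bits, each block reports only its local mismatch count, and the counters accumulate those partial distances, which sums to the full Hamming distance.

import numpy as np

def rham_distance(query, stored, block=4):
    """Accumulate per-block mismatch counts; the sum equals the full Hamming distance."""
    total = 0
    for start in range(0, query.shape[0], block):
        mism = np.bitwise_xor(query[start:start + block], stored[start:start + block])
        # In hardware each block emits a one-hot code Q[1..4] for its local distance;
        # here we simply add the block's mismatch count (0..4).
        total += int(mism.sum())
    return total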

R-HAM Optimizations: Voltage Overscaling
Voltage overscaling is applied per block rather than blind sampling:

Target accuracy | Energy saving: sampling | Energy saving: voltage overscaling
Maximum         | 9% (250 blocks off)     | 18% (1,000 blocks at 780 mV)
Moderate        | 22% (750 blocks off)    | 50% (2,500 blocks at 780 mV)
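A hypothetical error-injection experiment for studying this trade-off in software: overscaled blocks occasionally flip bits, and the classification is re-run on the noisy memory to check how much accuracy is lost. The bit-error rate and interface below are placeholders; only the block counts and the 780 mV operating point above come from the slide.

import numpy as np

def overscale_blocks(classes, n_blocks, block=4, bit_error_rate=1e-3, seed=4):
    """Randomly flip bits inside the first n_blocks blocks of every stored row."""
    rng = np.random.default_rng(seed)
    noisy = classes.copy()
    cols = n_blocks * block
    flips = rng.random(noisy[:, :cols].shape) < bit_error_rate
    noisy[:, :cols] ^= flips.astype(noisy.dtype)
    return noisy  # re-run the search on `noisy` and compare accuracy against the clean model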

A-HAM: Analog HAM
Current-based analog search in the CAM: the row with the lowest discharging current is the closest-distance row.
Loser-Take-All (LTA) cell: finds the lower of two input currents (it passes I1 if I2 > I1, and I2 if I1 > I2).
A faster and denser alternative to D-HAM and R-HAM.
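A behavioral model of the LTA-based search (a sketch, not a circuit model): every row is represented by a stand-in current value, lower current meaning a closer row, and a tree of two-input Loser-Take-All cells reduces the rows to the minimum.

def lta(i1, i2):
    """Two-input Loser-Take-All cell: pass the smaller of the two currents."""
    return i1 if i2 > i1 else i2

def lta_tree(row_currents):
    """Pairwise LTA reduction until only the minimum (closest row) current remains."""
    currents = list(row_currents)
    while len(currents) > 1:
        nxt = []
        for i in range(0, len(currents), 2):
            nxt.append(lta(currents[i], currents[i + 1]) if i + 1 < len(currents) else currents[i])
        currents = nxt
    return currents[0]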

Multistage A-HAM
The discharging current saturates for large D, so small distances cannot be detected. Splitting the search operation into multiple shorter stages improves the minimum detectable Hamming distance by 3.1×.

Experimental Results
Technology: TSMC 45 nm LP process with high-VTH cells; energy-delay product (EDP) normalized to D-HAM.
EDP improvement over D-HAM: at maximum accuracy, R-HAM 7.3× and A-HAM 746×; at moderate accuracy, R-HAM 9.6× and A-HAM 1347×.
Area compared to D-HAM: R-HAM is 1.4× and A-HAM is 3× lower.
Increasing dimensionality by 20×: D-HAM ~8.3× energy and 2.2× delay increase; R-HAM ~8.2× energy and 2.0× delay; A-HAM ~1.9× energy and 1.7× delay.
Increasing the number of classes by 15×: D-HAM ~12.6× energy and 3.5× delay increase; R-HAM ~11.4× energy and 3.4× delay; A-HAM ~15.9× energy and 4.4× delay.
A-HAM is sensitive to >5% voltage variability.

Summary
HD computing manipulates and compares large patterns stored in memory.
We explored digital, resistive, and analog HAM designs using sampling, voltage overscaling, bit-width optimization, and current-based search and comparators.
For a moderate accuracy of 94%, A-HAM improves EDP by 1347× and has 3× lower area than D-HAM.
Multistage A-HAM scales with higher dimensionality and linearly with the number of classes, but it is sensitive to >5% voltage variability.

Acknowledgment This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.