Published by Marian Park. Modified over 6 years ago.
1
A 100 µW, 16-Channel, Spike-Sorting ASIC with On-the-Fly Clustering
Progress Update, Summer 2010
Vaibhav Karkare
2
Spike Sorting
Spike sorting: the process of classifying action potentials according to their source neurons.
Detection (D) and Alignment (A)
Feature Extraction (FE)
Clustering (C)
3
Spike-Sorting DSP Chip
Technology: 1P8M 90-nm CMOS
Core VDD: 0.55 V
Gate count: 650 k
Clock domains: 0.4 MHz, 1.6 MHz
Power: 2 µW/channel
Data reduction: 91.25 %
No. of channels: 16, 32, 48, 64
SNR: −2.2 dB, Median
PD: 86 %, 87 %
PFA: 1 %, 5 %
Class. accuracy: 92 %, 77 %
Figure: 64-Channel Spike-Sorting DSP
4
Previous Work
Reference: JNE '07, JSSC '05, ISSCC '08, ISCAS '09, ASSCC '09
No. of channels: 96, 32, 1, 128, 64
Power (µW/channel): 104, 75, 100, 14.6, 2.03
Area (mm²/channel): -, 0.11, 1.58, 0.01, 0.06
Power density (µW/mm²): 680, 60, 1460, 30
Process (nm): FPGA, 500, 350, 90
Core voltage (V): 3, 3.3, 1.08, 0.55
Detection: ✓
Alignment: ×
Feature extraction
None of the previous DSPs support online clustering.
5
Importance of Online Clustering
Several applications require on-the-fly spike sorting.
Spike sorting is not complete until clustering is implemented.
Latencies of offline clustering cannot be accepted for real-time, multi-channel recordings. Example: brain-computer interface.
Clustering provides a 240-times reduction in data rate compared to raw data transmission.
This will reduce transmit power by ~240x.
Transmit power is dominant for a multi-channel system that transmits "wide band" neural data.
Spike transmission: 48 samples/spike × 8 bits/sample = 384 bits/spike.
With clustering, only a 4-bit cluster ID (enough to support 16 neurons) needs to be transmitted: 4 bits/spike.
384/4 = 96x reduction with respect to spike transmission.
Detection vs. raw data: raw data rate = 24,000 × 8 = 192,000 bps. With spike ID transmission, 100 × 4 = 400 bps, a 480x reduction.
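The data-rate arithmetic on this slide can be checked with a short Python sketch. The constants are the slide's own figures; the reading of "100" as 100 spikes per second is an assumption inferred from "100*4 = 400 bps", and the function name is illustrative only.

```python
# Figures from the slide. SPIKES_PER_SECOND is an assumed reading of the
# "100" in "100*4 = 400 bps"; the slide does not label it.
SAMPLE_RATE_HZ = 24_000
BITS_PER_SAMPLE = 8
SAMPLES_PER_SPIKE = 48
CLUSTER_ID_BITS = 4          # 2**4 = 16 distinguishable neurons
SPIKES_PER_SECOND = 100


def data_rates():
    """Return the raw, per-spike, and cluster-ID data rates and their ratios."""
    raw_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
    bits_per_spike = SAMPLES_PER_SPIKE * BITS_PER_SAMPLE
    id_bps = SPIKES_PER_SECOND * CLUSTER_ID_BITS
    return {
        "raw_bps": raw_bps,
        "bits_per_spike": bits_per_spike,
        "id_bps": id_bps,
        "reduction_vs_spike_waveforms": bits_per_spike // CLUSTER_ID_BITS,
        "reduction_vs_raw": raw_bps // id_bps,
    }


if __name__ == "__main__":
    for name, value in data_rates().items():
        print(f"{name}: {value}")
```

Under these assumptions the sketch reproduces the 96x and 480x ratios worked out on the slide.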
6
Challenges in Online Clustering
Conventional clustering algorithms are developed for offline clustering.
Examples: k-means, fuzzy c-means, superparamagnetic clustering, valley seeking
Data storage of a few TB is required
Infeasible for on-chip implementation
Online sorting algorithm developed at Caltech
Available as part of the Osort software package
Collaborators use this software
Only algorithm amenable to hardware implementation
7
Online Clustering Algorithm
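[Figure: online clustering flow]
1st data point: create cluster #1.
2nd data point: compute the distance d to the centroid of cluster #1. If d < Threshold, assign the point to cluster #1; if d > Threshold, create cluster #2.
3rd data point: compute the minimum distance dmin to the existing centroids. If dmin < Threshold, assign the point to the nearest cluster; if dmin > Threshold, create cluster #3.
Nth data point: proceed as above using dmin. If the minimum distance dmin between clusters is below Threshold, merge them.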
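The assign / create / merge flow in this figure can be sketched in Python as follows. This is a minimal illustration, not the Osort code or the chip's implementation. The squared Euclidean distance, the running-mean centroid update, the count-weighted merge, the handling of d equal to the threshold, and all names are assumptions.

```python
import numpy as np


def online_cluster(spikes, threshold):
    """Cluster spikes one at a time; returns (labels, centroids by cluster id)."""
    clusters = {}   # cluster id -> [centroid, number of spikes]
    labels = []     # cluster id assigned to each spike so far
    next_id = 0
    for spike in spikes:
        x = np.asarray(spike, dtype=float)
        # Distance from the incoming spike to every existing centroid.
        dists = {cid: float(np.sum((x - c) ** 2)) for cid, (c, _) in clusters.items()}
        best = min(dists, key=dists.get) if dists else None
        if best is None or dists[best] >= threshold:
            # No cluster is close enough: create a new one
            # (d == threshold is treated as "create", an assumption).
            clusters[next_id] = [x.copy(), 1]
            labels.append(next_id)
            next_id += 1
            continue
        # Assign to the nearest cluster and update its running mean.
        centroid, n = clusters[best]
        clusters[best] = [centroid + (x - centroid) / (n + 1), n + 1]
        labels.append(best)
        # Merge any cluster whose centroid is now within the threshold.
        for other in list(clusters):
            if other == best:
                continue
            c_b, n_b = clusters[best]
            c_o, n_o = clusters[other]
            if float(np.sum((c_b - c_o) ** 2)) < threshold:
                clusters[best] = [(n_b * c_b + n_o * c_o) / (n_b + n_o), n_b + n_o]
                del clusters[other]
                labels = [best if lab == other else lab for lab in labels]
    return labels, {cid: c for cid, (c, _) in clusters.items()}
```

For example, `online_cluster([[0, 0], [0.1, 0], [5, 5], [5.1, 5], [0, 0.1]], threshold=1.0)` groups the first, second, and fifth points into one cluster and the third and fourth into another.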
8
Direct-Mapped Implementation
Large memory requirement for a low-power, multi-channel DSP implementation
We need 14 kb/channel for storage of cluster means
A 224 kb SRAM for 16 channels consumes 1.12 mA of leakage current
Each distance calculation entails 95 addition operations and 48 squaring operations
Up to 1936 distance computations may be needed for an incoming spike
Need to revisit the algorithm to identify simplifications for an implantable ASIC solution
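The memory and operation counts above follow from simple arithmetic, sketched here in Python. The 48-sample spike length comes from the data-rate slide. Counting the 48 subtractions together with the 47 accumulations as "additions" is an assumption about how the slide tallies operations, and the function names are illustrative.

```python
SAMPLES_PER_SPIKE = 48   # from the data-rate slide
KB_PER_CHANNEL = 14
CHANNELS = 16


def l2_distance_ops(n_samples):
    """Operation count for one squared-L2 distance between two n-sample vectors."""
    subtractions = n_samples        # one difference per sample
    accumulations = n_samples - 1   # summing n squared terms
    return {"additions": subtractions + accumulations, "squarings": n_samples}


def total_memory_kb(kb_per_channel, channels):
    """Cluster-mean storage when every channel keeps its own copy."""
    return kb_per_channel * channels


if __name__ == "__main__":
    print(l2_distance_ops(SAMPLES_PER_SPIKE))
    print(total_memory_kb(KB_PER_CHANNEL, CHANNELS), "kb")
```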
9
Template Matching for Clustering
Template-matching based classification
Osort implemented sequentially
Template matching for multi-channel, real time
Advantages
14 kb (training) kb*N of memory
44*N for direct-mapped design
Max. 6 dist. computations / spike for temp. matching
Max dist. computations / spike for temp. matching
Scalable design
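Template matching, as described on this slide, replaces open-ended clustering with a comparison against a small, fixed set of stored templates. A minimal Python sketch follows, assuming a squared Euclidean distance and a nearest-template rule; the function name and the return format are hypothetical, and the slide does not say how the chip resolves ties or rejects outliers.

```python
import numpy as np


def match_template(spike, templates):
    """Return (index of nearest template, distance to it).

    `templates` holds one template per row; the number of distance
    computations per spike equals the number of rows.
    """
    spike = np.asarray(spike, dtype=float)
    templates = np.asarray(templates, dtype=float)
    dists = np.sum((templates - spike) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])


if __name__ == "__main__":
    templates = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    print(match_template([0.9, 1.0, 1.2], templates))
```

Because the template count is fixed after training, the per-spike work is bounded, which is the property the slide highlights.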
10
Computational Simplifications
Use L1-norm instead of L2-norm
Approximate cluster mean calculation
Approximate merged-mean calculation
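The first two simplifications can be illustrated in Python. The L1 distance needs only subtractions, absolute values, and additions, with no squaring. The slide does not say how the cluster mean is approximated, so the shift-based update below (a power-of-two weight in place of a true division) is purely an assumed example of a hardware-friendly approximation, not the method used on the chip.

```python
def l1_distance(a, b):
    """Sum of absolute differences: no multipliers required."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_squared_distance(a, b):
    """Squared Euclidean distance, shown for comparison."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_mean_update(mean, spike, shift=3):
    """Assumed approximation: mean += (spike - mean) / 2**shift.

    Works on integer samples; the division is an arithmetic right shift,
    which rounds toward negative infinity.
    """
    return [m + ((s - m) >> shift) for m, s in zip(mean, spike)]


if __name__ == "__main__":
    print(l1_distance([1, 2, 3], [2, 0, 3]))
    print(l2_squared_distance([1, 2, 3], [2, 0, 3]))
    print(approx_mean_update([0, 100], [16, 84]))
```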
11
Error Tolerance in Clustering
Condition on error in cluster mean computation
Valid for any source of error
Evaluation of simplifications based on 600+ data sets of simulated neural data
Accuracy by simplification (Median, Mean):
None: 0.72, 0.71
L1 Norm: 0.87, 0.77
Cluster Mean: 0.88
Cluster Merge: 0.85, 0.76
Template matching
12
Osort Chip Architecture
Fully synchronous design
"Training Required" indicator
Parallel training and template matching
External / internal threshold for clustering
13
Architecture Analysis
Assumptions for regular energy-delay (E-D) analysis are not valid
Fixed operating frequency
Register dominance
Separate logic and flip-flop memory modules
Use HVT for flip-flops, SVT for logic
Reduced supply voltage for memory
Level conversion between memory and logic modules
14
Flip-flop based memories
DFF-based memory as opposed to SRAM
Operation at reduced voltages
Up to 5-times lower leakage
Delay-line based clock
Data is not shifted each cycle
Clock is valid only for one register in the entire memory
15
Serial Processing of Parallel Data
Implement serial processing at a faster clock
Reduces logic leakage
Would not be possible for a direct-mapped, multi-channel implementation
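Serial processing time-multiplexes one datapath across all channels, so the clock must run faster than the per-channel sample rate. The Python sketch below assumes one clock cycle per sample and a round-robin channel order; neither is stated on the slide. With the 24,000 samples/s figure from the data-rate slide and 16 channels, this assumption gives 384 kHz, which matches the clock rate listed in the chip summary.

```python
def required_clock_hz(channels, sample_rate_hz, cycles_per_sample=1):
    """Minimum clock for one shared datapath serving every channel in turn."""
    return channels * sample_rate_hz * cycles_per_sample


def serial_schedule(channels, n_cycles):
    """Channel served on each clock cycle under round-robin multiplexing."""
    return [cycle % channels for cycle in range(n_cycles)]


if __name__ == "__main__":
    print(required_clock_hz(16, 24_000))
    print(serial_schedule(4, 6))
```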
16
16-Channel Spike-Sorting DSP with On-the-Fly Clustering
Chip Summary
Technology: 65-nm
Core VDD: 0.5 V / 0.3 V
Clock rate: 384 kHz
CA: 82 %
Power: 100 µW
Data reduction: 240x
No. of channels: 16
Area: 2.45 mm²
Power density: 40.8 µW/mm²
17
Conclusions
Demonstrated the first spike-sorting DSP with multi-channel, on-the-fly clustering
The DSP consumes 100 µW of power and occupies 2.45 mm² in a 65-nm 1P8M CMOS process
A 240-times reduction in output data rate is obtained compared to raw data transmission
Template-matching based clustering is implemented, with simplified online sorting for template identification
A fully synchronous, serialized architecture is used to reduce the dominant static power consumption
Acknowledgments: Sarah Gibson, Chia-Hsiang Yang, and Victoria Wang
18
Questions / Comments?