Non-volatile Memory Based Convolutional Neural Networks for Transfer Learning
Zhen Dong1,2, Haitong Li1, and H.-S. Philip Wong1
1Stanford SystemX Alliance, Stanford University; 2Institute of Microelectronics, Peking University

Introduction

Deep learning has succeeded in many fields: AlphaGo won its match against Ke Jie in May 2017, progress in object detection is pushing autonomous driving forward, and AI for medicine and healthcare is flourishing.

Training large-scale networks is energy- and time-consuming:
- Deep learning is driven by huge amounts of data, such as the ~160 GB ImageNet dataset.
- Training a network like ResNet-50 requires eight powerful Tesla P100 GPUs, and the training period can last for days.

Utilizing a non-volatile memory array can speed up DNNs:
- Integrating memory and computing breaks through the von Neumann bottleneck and is naturally parallel.
- RRAM (resistive RAM, a kind of non-volatile memory) offers high speed, small area, a simple structure, and CMOS compatibility.
- RRAM provides multilevel resistance states and the potential for 3D integration.

Compact Model of RRAM

Physical model of RRAM: the model is based on conductive-filament theory and is used for all network simulations in our work. Schematic of a 3D RRAM array: all vertical layers share the same selective layer, which consists of transistors.

RRAM Characteristics

Typical I-V curve and multilevel resistances: in the measured I-V curve, the black branch corresponds to the set process and the red branch to the reset process. The high-resistance state (HRS) is set abruptly to the low-resistance state (LRS), while the LRS is reset to the HRS gradually.

Utilizing an RRAM Array in a Small CNN Architecture

The crossbar array performs the vector-matrix multiplication:
- Two rows of RRAM represent one kernel, since weight values can be positive or negative.
- Switching the order of pooling and activation makes the circuit implementation easier.
(A minimal simulation sketch of this crossbar mapping follows the large-scale results below.)

Influence of RRAM conductance variation on recognition accuracy: as the variation increases, the average accuracy begins to fall and the fluctuation of the recognition accuracy grows. Keeping the variation below 50% is therefore necessary.

Utilizing RRAM in Large-Scale Neural Networks

(a) Architecture of VGG-16: we used VGG-16 as a representative of traditional CNN architectures and also tested Google Inception V3 as a state-of-the-art structure.
(b) Influence of variation: because large-scale CNN architectures have more parameters and layers, conductance variation degrades accuracy more severely than in small networks.
(c) Influence of quantization: in model compression, weight sharing or the method described in BWN can achieve high accuracy. Moreover, in transfer learning tasks any pre-trained model can be simulated with multilevel RRAM without affecting the final transfer performance.
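To make the crossbar mapping above concrete, the following minimal sketch maps a signed weight matrix onto differential pairs of RRAM rows, quantizes each conductance to a small set of levels, and measures how Gaussian conductance variation perturbs the vector-matrix product. The specific level values (loosely following the 1:3:7 ratio reported above), the helper names, and the toy layer sizes are illustrative assumptions, not the authors' simulator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative conductance levels (arbitrary units), loosely following the
# roughly 1:3:7 ratio reported for the measured multilevel states, plus "off".
LEVELS = np.array([0.0, 1.0, 3.0, 7.0])


def quantize(g):
    """Snap each target conductance to the nearest programmable level."""
    return LEVELS[np.abs(g[..., None] - LEVELS).argmin(axis=-1)]


def weights_to_conductances(w):
    """Split signed weights into a differential (G_plus, G_minus) row pair."""
    scale = LEVELS[-1] / np.abs(w).max()          # use the full conductance range
    g_plus = quantize(np.clip(w, 0.0, None) * scale)
    g_minus = quantize(np.clip(-w, 0.0, None) * scale)
    return g_plus, g_minus, scale


def crossbar_matvec(x, g_plus, g_minus, scale, rel_sigma=0.0):
    """Crossbar read: each kernel's output is the current of its positive row
    minus that of its paired negative row, with multiplicative Gaussian
    conductance variation of relative width rel_sigma."""
    def perturb(g):
        return np.clip(g * (1.0 + rel_sigma * rng.standard_normal(g.shape)), 0.0, None)
    return (x @ perturb(g_plus).T - x @ perturb(g_minus).T) / scale


# Toy layer: 16 output neurons, 64 inputs, swept over increasing device variation.
w = rng.standard_normal((16, 64))
x = rng.standard_normal(64)
g_plus, g_minus, scale = weights_to_conductances(w)
ideal = x @ w.T
for rel_sigma in (0.0, 0.1, 0.3, 0.5):
    y = crossbar_matvec(x, g_plus, g_minus, scale, rel_sigma)
    err = np.linalg.norm(y - ideal) / np.linalg.norm(ideal)
    print(f"conductance variation {rel_sigma:>4.0%} -> relative output error {err:.3f}")
```

Sweeping rel_sigma in this way mirrors the accuracy-versus-variation study above: as the relative variation grows, the crossbar output drifts further from the ideal product, which is what eventually degrades recognition accuracy.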
Measured Analog Characteristics of RRAM

Different resistance states can be obtained by applying AC pulses: the ratio among these states is roughly 1:3:7. The variation of the high-conductance state is about 10%, while that of the low-conductance state is about 40%.

A gradual reset can be achieved by applying a sequence of small pulses: up to hundreds of resistance states can be reached, and the variation of each state is rather small. However, the operation requires hundreds of pulses, which is complicated and time-consuming; for this reason analog RRAM cannot be used to represent all of the weights in a network.

Transfer Learning Results

Transfer learning implementation, Scheme I: an analog RRAM array serves as the last layer, with all other layers simulated by multilevel RRAM and kept fixed. Given the analog characteristics of RRAM, there is a trade-off between the range and the precision of the weights. If the precision is insufficient, many values in ΔW cannot be written to the RRAM array, which slows down training. If the weight range is too small, the learning capability of the network becomes insufficient and the accuracy stops improving after a few hundred iterations, because most weights have reached their extremes; this is illustrated by the contrast between the two training-accuracy figures.

Transfer learning implementation, Scheme II: multilevel RRAM is used in the last layer. Additional memory is needed to store the full-precision weight values, and the RRAM crossbar array is refreshed by quantizing the weight array before every iteration. Operating multilevel RRAM is simpler than operating analog RRAM; even binary RRAM can be used in this scheme, offering high speed, low variation, and high endurance.

Modeling & Calculating Energy and Delay

3D RRAM tool for analyzing energy and delay: panel (a) shows results on the training data, while panel (b) shows prediction performance. We use parameters of the 3D RRAM array as high-dimensional inputs to train this tool, which combines random forest and SVM algorithms and gives decent estimates without simulating the physical details of every RRAM cell in the huge array (a minimal sketch of this ensemble idea is given after the references). The energy used for weight update and inference in one specific network is about 105 mJ, and the delay of the transfer learning system is dominated by write operations; since a single RRAM write typically takes 10-100 ns, the total delay is around 70 ms.

Acknowledgement

This work was carried out under the guidance of Haitong Li and Professor H.-S. Philip Wong, and was supported in part by the Stanford SystemX Alliance and the UGVR Program.

References

[1] H.-S. P. Wong et al., "Metal-oxide RRAM," Proceedings of the IEEE, vol. 100, no. 6, pp. 1951-1970, 2012.
[2] H. Li et al., "A SPICE model of resistive random access memory for large-scale memory array simulation," IEEE Electron Device Letters, vol. 35, no. 2, pp. 211-213, 2014.
[3] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," arXiv preprint arXiv:1510.00149, 2015.
[4] S. Ambrogio, S. Balatti, V. Milo, et al., "Neuromorphic learning and recognition with one-transistor-one-resistor synapses and bistable metal oxide RRAM," IEEE Transactions on Electron Devices, vol. 63, no. 4, pp. 1508-1515, 2016.
[5] M. Courbariaux, Y. Bengio, and J.-P. David, "BinaryConnect: Training deep neural networks with binary weights during propagations," Advances in Neural Information Processing Systems, 2015.
[6] H. Li et al., "Four-layer 3D vertical RRAM integrated with FinFET as a versatile computing unit for brain-inspired cognitive information processing," IEEE Symposium on VLSI Technology, 2016.
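The energy/delay estimator referenced in the Modeling & Calculating Energy and Delay section combines random forest and SVM regressors trained on array parameters. The sketch below shows one plausible way to set this up with scikit-learn; the synthetic placeholder features, the synthetic energy target, and the simple prediction averaging used to combine the two models are assumptions made here for illustration, since the poster does not specify these details.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Placeholder "3D RRAM array parameters" (e.g., array size, pulse width, LRS/HRS
# resistance, number of stacked layers) and a synthetic energy target in mJ.
n_samples, n_features = 500, 6
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
energy_mj = (50.0 + 120.0 * X[:, 0] * X[:, 1]
             + 30.0 * X[:, 2] ** 2
             + 5.0 * rng.standard_normal(n_samples))

X_train, X_test, y_train, y_test = train_test_split(X, energy_mj, random_state=0)

# Fit the two regressors independently on the same training data.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
svm = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.5)).fit(X_train, y_train)

# Combine the estimators by averaging their predictions (one simple choice).
pred = 0.5 * (forest.predict(X_test) + svm.predict(X_test))
mae = np.mean(np.abs(pred - y_test))
print(f"mean absolute error on held-out samples: {mae:.1f} mJ")
```

Replacing the synthetic arrays with measured (array-parameter, energy) or (array-parameter, delay) pairs would reproduce the spirit of the tool: a fast statistical surrogate that avoids simulating every RRAM cell in the 3D array.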