Deep Learning with Intel DAAL on Knights Landing Processor


Deep Learning with Intel DAAL on Knights Landing Processor David Ojika dave.n.ojika@cern.ch March 22, 2017

Outline Introduction and Motivation Intel Knights Landing Processor Intel Data Analytics and Acceleration Library (DAAL) Experiment and current progress GitHub: https://github.com/davenso/DKN

Object Identification Object identification is the task of training a computer to recognize patterns in data. The object could be a car, a person, a flower, a Higgs boson, a muon, etc. A learning algorithm (e.g. a decision tree) is used for training, generating a “prediction model” that makes “guesses” on new data. http://www.cs.tau.ac.il/~wolf/OR/

Machine Learning Involves two steps: training and inference (detailed on the following slides).

Deep Learning Deep learning means using neural networks (a class of machine learning models) with multiple hidden layers. Neural networks are modelled on the dynamics of the neurons in our brain. Hidden layers represent neural computations in a series of processing stages. Learning performance generally improves with the depth of the network (at greater processing cost). Neuron: a biologically-inspired computation unit. Neural Network → Deep Neural Network (DNN)
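To make the neuron abstraction concrete, here is a minimal illustrative C sketch (not from the talk) of a single neuron: a weighted sum of inputs plus a bias, passed through a sigmoid activation. The input and weight values are made up.

/* Minimal illustrative sketch: one "neuron" = weighted sum + sigmoid. */
#include <stdio.h>
#include <math.h>

/* sigmoid activation squashes the weighted sum into (0, 1) */
static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

int main(void) {
    double x[3] = {0.5, -1.2, 0.3};   /* inputs (made-up values) */
    double w[3] = {0.4,  0.1, -0.7};  /* learned weights         */
    double b    = 0.05;               /* learned bias            */

    double z = b;
    for (int i = 0; i < 3; i++)
        z += w[i] * x[i];             /* weighted sum of inputs  */

    printf("neuron output = %f\n", sigmoid(z));
    return 0;
}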

DNN Applications Examples range from handwriting recognition (MNIST) to image classification (ImageNet): a 7-layer network (2012, error rate 15.3%); GoogLeNet (22 layers, ILSVRC 2014 winner, error rate 6.67%); ResNet (34 to more than 1,000 layers, ILSVRC 2015 winner, error rate 3.57%).

DNN Training Example network: 10 neurons, 2 hidden layers. Each of the (non-output) layers is trained to be an auto-encoder: an auto-encoder is trained, with a standard weight-adjustment algorithm, to reproduce its input. Forward propagation and back-propagation are used for training.
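As an illustration of the forward-propagation / back-propagation idea on this slide, here is a small self-contained C sketch (an illustrative toy, not the DAAL code used in the talk) of one gradient-descent step of a tiny linear auto-encoder trained to reproduce its input:

/* Illustrative only: one gradient-descent step of a tiny linear
 * auto-encoder (N inputs -> K hidden units -> N outputs) trained
 * to reproduce its input, with squared reconstruction error. */
#include <stdio.h>

#define N 4  /* input/output width */
#define K 2  /* hidden width       */

int main(void) {
    double x[N]     = {0.5, -0.2, 0.1, 0.8};                 /* one sample */
    double We[K][N] = {{0.10, 0.20, -0.10, 0.05},
                       {-0.20, 0.10, 0.30, 0.00}};           /* encoder    */
    double Wd[N][K] = {{0.10, -0.10}, {0.20, 0.00},
                       {0.05, 0.30},  {-0.20, 0.10}};        /* decoder    */
    double lr = 0.01, h[K] = {0}, y[N] = {0}, e[N], gh[K];

    /* forward propagation: h = We*x, reconstruction y = Wd*h */
    for (int j = 0; j < K; j++)
        for (int i = 0; i < N; i++) h[j] += We[j][i] * x[i];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < K; j++) y[i] += Wd[i][j] * h[j];

    /* back-propagation of the reconstruction error */
    for (int i = 0; i < N; i++) e[i] = y[i] - x[i];          /* output error */
    for (int j = 0; j < K; j++) {                            /* error at h   */
        gh[j] = 0;
        for (int i = 0; i < N; i++) gh[j] += e[i] * Wd[i][j];
    }
    /* gradient-descent weight updates */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < K; j++) Wd[i][j] -= lr * e[i] * h[j];
    for (int j = 0; j < K; j++)
        for (int i = 0; i < N; i++) We[j][i] -= lr * gh[j] * x[i];

    for (int i = 0; i < N; i++)
        printf("x[%d]=% .3f  reconstructed y[%d]=% .3f\n", i, x[i], i, y[i]);
    return 0;
}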

DNN Inferencing A trained network with: 10 neurons, 2 hidden layers. Inference applies the learned weights and biases to new inputs to produce predictions. (Figure: numeric values of the learned weights, 10 x 5 and 5 x 2, and the learned biases, 1 x 5 and 1 x 2.)

Challenges with DNN Training Large datasets: gigabytes to terabytes of data. Large number of hyper-parameters: # layers, # neurons, batch size, iterations, learning rate, loss function, weight decay, etc. Hyper-parameter optimization techniques: random search, grid search, Bayesian, gradient-based (a random-search sketch follows below). Emerging hardware for deep learning: GPU, KNL, FPGA. Software exists, but requires manual fine-tuning.
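The sketch below illustrates random search over a few hyper-parameters in plain C; train_and_score() is a hypothetical placeholder that would, in a real run, train the network with the sampled hyper-parameters and return validation accuracy.

/* Illustrative random search over a few DNN hyper-parameters.
 * train_and_score() is a hypothetical stand-in for a real training run. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

/* hypothetical placeholder: returns a fake "validation accuracy" */
static double train_and_score(int layers, int neurons, double lr) {
    return 0.6 + 0.01 * layers + 0.00005 * neurons - 0.5 * fabs(lr - 0.001);
}

int main(void) {
    srand((unsigned)time(NULL));
    double best = 0.0;
    for (int trial = 0; trial < 20; trial++) {
        int layers  = 1 + rand() % 4;                              /* 1..4 hidden layers */
        int neurons = 64 << (rand() % 5);                          /* 64..1024 per layer */
        double lr   = pow(10.0, -4.0 + 3.0 * rand() / RAND_MAX);   /* 1e-4 .. 1e-1       */
        double s    = train_and_score(layers, neurons, lr);
        if (s > best) {
            best = s;
            printf("new best %.4f: %d layers, %d neurons, lr=%g\n",
                   best, layers, neurons, lr);
        }
    }
    return 0;
}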

(Physics) Object Identification: A Cross-layer Perspective Compose hardware, algorithm, and software components. Derive an efficient FPGA implementation to perform inference. (Figure: the object-recognition goal spanning the algorithm (DNN model), software, and hardware (FPGA) layers.)

Intel Knights Landing (KNL) Next-generation Xeon Phi (after the Knights Corner (KNC) co-processor). Self-booting, unlike KNC: binary-compatible with Intel Architecture (IA) and boots a standard OS. Performance-enhancing features: AVX-512 vector units, high-bandwidth MCDRAM, cluster modes, integrated networking (in some models).

KNL Overview Chip: 36 tiles interconnected by a 2D mesh. Tile: 2 cores + 2 VPUs/core + 1 MB L2. Memory: MCDRAM: 16 GB on-package, high bandwidth; DDR4: 6 channels @ 2400 MT/s, up to 384 GB. Core: based on the 2-wide out-of-order Silvermont™ microarchitecture, but with many changes for HPC: 4 threads/core, deeper OoO, better RAS, higher bandwidth, larger TLBs. 2 VPUs: 2x AVX-512 units, 32 SP / 16 DP per unit; x87, SSE, AVX1, AVX2 and EMU. L2: 1 MB, 16-way; 1 line read and ½ line write per cycle; coherent across all tiles.

KNL AVX-512 512-bit FP/integer vectors: 32 vector registers and 8 mask registers; gather/scatter support. One 512-bit register holds 16 floats or 8 doubles. http://geekswithblogs.net/akraus1/archive/2014/04/04/155858.aspx
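A minimal sketch of what AVX-512 looks like at the intrinsics level (illustrative only; compile with -mavx512f, or the Intel compiler's KNL target): a fused multiply-add over 16 single-precision floats per instruction.

/* Illustrative AVX-512 sketch: y = a*x + y over 16 floats at a time. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float x[16], y[16];
    for (int i = 0; i < 16; i++) { x[i] = (float)i; y[i] = 1.0f; }

    __m512 a  = _mm512_set1_ps(2.0f);      /* broadcast the scalar a     */
    __m512 vx = _mm512_loadu_ps(x);        /* load 16 floats             */
    __m512 vy = _mm512_loadu_ps(y);
    vy = _mm512_fmadd_ps(a, vx, vy);       /* fused multiply-add: a*x+y  */
    _mm512_storeu_ps(y, vy);               /* store 16 results           */

    printf("y[15] = %.1f\n", y[15]);       /* 2*15 + 1 = 31.0            */
    return 0;
}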

KNL MCDRAM Three memory modes (switching requires a reboot):
Flat mode: MCDRAM exposed as regular memory, managed by software. “Fast malloc” functions come from the high-bandwidth memory library memkind (https://github.com/memkind), built on top of the existing libnuma API; a “FASTMEM” compiler annotation is available for Intel Fortran. Example: float *fv; fv = (float *)hbw_malloc(sizeof(float) * 100); versus ordinary DDR allocation: float *fv; fv = (float *)malloc(sizeof(float) * 100); or bind a whole program to MCDRAM with numactl -m 1 ./myProgram
Cache mode: MCDRAM acts as a cache for DDR; no code change required.
Hybrid mode: MCDRAM part cache, part memory; 25% or 75% configured as cache.
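A more complete, self-contained version of the flat-mode snippet above, as a sketch assuming the memkind library is installed (link with -lmemkind); hbw_check_available() and hbw_free() come from memkind's hbwmalloc interface.

/* Flat-mode allocation sketch using memkind's hbwmalloc interface. */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>   /* hbw_malloc / hbw_free from memkind */

int main(void) {
    if (hbw_check_available() != 0) {   /* MCDRAM not visible in this mode */
        fprintf(stderr, "no high-bandwidth memory available\n");
        return 1;
    }
    float *fv = (float *)hbw_malloc(sizeof(float) * 100);  /* allocate in MCDRAM */
    if (fv == NULL) return 1;

    for (int i = 0; i < 100; i++) fv[i] = (float)i;
    printf("fv[99] = %f\n", fv[99]);

    hbw_free(fv);
    return 0;
}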

KNL Cluster Modes Tiles can be organized in 3 different cluster modes. All-to-all: addresses uniformly distributed across the chip. Quadrant: the chip divided into four quadrants. Sub-NUMA Clustering (SNC): each quadrant exposed as a separate NUMA domain.

Higgs Classification Data: 11 million events (Monte Carlo simulations, roughly 8 GB); 21 low-level features from the particle detector plus 7 high-level (hand-crafted) features; “1”: signal, “0”: background; a binary classification problem. Training set: 10.5 million events; validation set: 500 thousand. Hand-crafted: high-level features derived by physicists to help discriminate between the two classes. Model development (DAAL): started with a 3-layer topology (28-1024-2); hyper-parameters: began with random search and minimal optimization effort. Our “simulation” environment: DNN training on KNL with MCDRAM mode = Flat and cluster mode = Quadrant.
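For a rough sense of model size, a small illustrative calculation (my own arithmetic, not from the slides, and assuming the 28-1024-2 topology is fully connected) of the number of trainable parameters:

/* Illustrative parameter count (weights + biases) for a fully
 * connected 28-1024-2 topology; not taken from the slides. */
#include <stdio.h>

int main(void) {
    int sizes[] = {28, 1024, 2};
    long total = 0;
    for (int l = 0; l + 1 < 3; l++)
        total += (long)sizes[l] * sizes[l + 1] + sizes[l + 1];   /* W + b */
    printf("trainable parameters: %ld\n", total);  /* 28*1024+1024 + 1024*2+2 = 31746 */
    return 0;
}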

DNN Topology

Preliminary Results Code: git clone https://github.com/davenso/DKN (use and improve!)

Discussion and Conclusions Performance can be greatly enhanced with a deeper network topology and better hyper-parameters. Deep neural networks are capable of learning the underlying features and should therefore generalize well to other objects, e.g. Higgs, muon, etc.

Current Developments Exploration of more complex models and hyper-parameter optimization techniques (beyond random search) Integration of “real” muon data and performance benchmarking Tuning of KNL hardware to improve runtime performance of DNN training Implementation of a distributed DNN algorithm, utilizing multiple KNL nodes for training Exploration of alternative algorithms (likely as a ‘hybrid model’), e.g. decision forests

Thank You, UF Team Sergei Gleyzer Brendan Regnery Darin Acosta Ann Gordon-Ross Pranav Goswami Andrew Carnes Erik Deumens Jon Akers UF RC

Image credits http://www.nltk.org/book/ch06.html CERN http://www.cs.tau.ac.il/~wolf/OR/