Deep Learning in HEP


Deep Learning in HEP
Large number of applications:
- Classification: based on features or raw data
- Regression: e.g. calorimetric energy or Matrix Element
- Anomaly detection: detector monitoring or general searches for new physics
- Generative models: faster simulation
Promise of better and faster algorithms that are easier to develop. Perhaps this is how the HL-LHC copes with the stalling of Moore's law.
See Amir Farbin's HSF talk: "Deep Learning Implications for HEP Software & Computing"

Why Titan?
- Training is extremely computationally intensive; GPUs offer an O(50)x speed-up over CPUs.
- One typically performs many training "experiments" to find optimal hyper-parameters (i.e. the network configuration).
- HEP datasets are large: one can easily imagine O(100 TB) training samples.
- Access to any GPU at all is an obstacle for HEP scientists interested in DL.
- Industry (e.g. Google, Microsoft, ...) builds large, tightly networked, distributed systems with many GPUs per node for DL training.
- Titan can be this resource for HEP and other academic fields.
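The "many experiments" workflow above amounts to a scan over a grid of configurations, each one an independent training run. A minimal sketch in Python; the widths, depths, loss names, and the `run_experiment` placeholder are illustrative assumptions, not the exact grid from this study:

```python
# Enumerate hyper-parameter configurations for a scan; each entry would
# correspond to one training "experiment" on its own GPU or node.
from itertools import product

widths = [16, 64, 256, 1024]   # neurons per hidden layer
depths = [0, 2, 5, 10]         # number of hidden layers
losses = ["mse", "mae"]        # candidate loss functions

def run_experiment(width, depth, loss):
    """Hypothetical placeholder: would build, train, and evaluate
    a network with this configuration and return its score."""
    return {"width": width, "depth": depth, "loss": loss}

configs = [run_experiment(w, d, l) for w, d, l in product(widths, depths, losses)]
print(len(configs))  # 4 * 4 * 2 = 32 toy configurations
```

Because each configuration is independent, this kind of scan parallelizes trivially across GPUs or nodes, which is exactly what a resource like Titan offers.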

A First Attempt
- Task: reproduce 4-vector addition with a DNN. Two 4-vector inputs; regression for the mass.
- A very simple mock-up of a real study: storing CPU-intensive Matrix Element computations in DNNs.
- Self-contained example: generates data, fits it, and assesses performance.
- DNN built in Python with Keras using the Theano back-end. (TensorFlow would be preferred for distributed training.)
- Performed a hyper-parameter scan: 280 different configurations varying network width (16 to 1024), depth (0 to 10), and loss function. Found that a single 128-neuron-wide hidden layer performed best.
- Biggest problem: the 2-hour limit. Training for realistic tasks can take days. I'm already set up for training in 2-hour blocks for next time.
[Figures: Generated Flat Mass Distribution; DNN Output; Residual]
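The self-contained setup above can be sketched as follows: generate two four-momenta per event and compute the invariant mass of their sum as the regression target. The momentum distributions and the massless-particle assumption are illustrative (the actual study generated a flat mass distribution); the resulting arrays would then be fed to a Keras model, e.g. the single 128-neuron hidden layer the scan found best:

```python
# Toy data generation for the 4-vector addition task: inputs are two
# four-momenta (E, px, py, pz); the target is the invariant mass of their sum.
import numpy as np

rng = np.random.default_rng(0)

def make_sample(n):
    # random 3-momenta; massless particles, so E = |p|
    p1 = rng.normal(0.0, 50.0, size=(n, 3))
    p2 = rng.normal(0.0, 50.0, size=(n, 3))
    e1 = np.linalg.norm(p1, axis=1, keepdims=True)
    e2 = np.linalg.norm(p2, axis=1, keepdims=True)
    v1 = np.hstack([e1, p1])        # (E, px, py, pz)
    v2 = np.hstack([e2, p2])
    x = np.hstack([v1, v2])         # 8 input features per event
    s = v1 + v2                     # summed four-vector
    m2 = s[:, 0] ** 2 - np.sum(s[:, 1:] ** 2, axis=1)
    y = np.sqrt(np.clip(m2, 0.0, None))   # invariant-mass target
    return x, y

x, y = make_sample(10000)
```

The `np.clip` guards against tiny negative values of m^2 from floating-point round-off; physically the invariant mass of two massless particles is always non-negative.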

And so…
- After a smaller hyper-parameter scan on my 4-GPU system, and after turning the problem into classification of mass bins, I got good results!
- Next step is a 20-input fit to a 100 GB sample of LO and NLO top events. Would ideally use data parallelism (see next slide), since the training sample is large. This would require distributed TensorFlow on Titan.
- I have several ATLAS colleagues who use my machine and would greatly benefit from using Titan. And if OK with PanDA, I also have neutrino-physics colleagues in DUNE, LArIAT, MicroBooNE, NEXT, and NOvA who would be interested.
- Hopefully we'll have a tutorial by the end of summer.
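Turning the regression into classification of mass bins, as above, just means discretizing the continuous target and training against a softmax output. A minimal sketch; the bin edges and example masses are illustrative, not the binning used in the study:

```python
# Discretize continuous mass targets into bins and one-hot encode them,
# converting the regression problem into a classification problem.
import numpy as np

masses = np.array([12.0, 47.0, 88.0, 130.0, 173.0])  # example targets
edges = np.linspace(0.0, 200.0, 11)                  # 10 uniform mass bins

# np.digitize returns 1-based bin indices; shift and clip to 0..9
labels = np.clip(np.digitize(masses, edges) - 1, 0, len(edges) - 2)
one_hot = np.eye(len(edges) - 1)[labels]             # targets for a softmax output
print(labels.tolist())  # [0, 2, 4, 6, 8]
```

Binning trades mass resolution for a better-conditioned training objective, which is one reason the classification variant can converge more easily than the raw regression.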

Parallelism
- Hyper-parameter scan: simultaneously train multiple models, e.g. one model per GPU or node.
- Data parallelism: each GPU or node computes the gradient on a subset of the data. Syncing the gradients is bottlenecked by the bus or network.
- Model parallelism: a large model is spread over many GPUs or nodes. Less network traffic, but only efficient for large models.
- Tensor-operation parallelism: GPUs, FPGAs, and ASICs (e.g. Google's Tensor Processing Unit).
[Diagrams: hyper-parameter scan with models HP1-HP4, one per GPU/node; data parallelism over data shards D1-D4; model parallelism over model parts A-D]
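The data-parallel scheme above can be sketched in a few lines: each worker computes the gradient on its own data shard, the shard gradients are averaged (the sync step that stresses the bus or network), and every worker applies the identical update. The linear model, MSE loss, and learning rate below are illustrative assumptions:

```python
# One data-parallel gradient step for a toy linear model with MSE loss.
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(4)                             # shared model weights
X = rng.normal(size=(32, 4))                # one global batch
y = X @ np.array([1.0, -2.0, 0.5, 3.0])     # synthetic targets

def shard_gradient(Xs, ys, w):
    # MSE gradient computed locally on one worker's shard
    residual = Xs @ w - ys
    return 2.0 * Xs.T @ residual / len(ys)

# split the batch across 4 workers, as in the D1..D4 diagram
shards = zip(np.array_split(X, 4), np.array_split(y, 4))
grads = [shard_gradient(Xs, ys, w) for Xs, ys in shards]

g = np.mean(grads, axis=0)   # gradient sync: average across workers
w = w - 0.05 * g             # identical update applied on every worker
```

With equal shard sizes the averaged gradient equals the full-batch gradient exactly, so data parallelism leaves the optimization unchanged while distributing the compute; only the averaging step requires communication.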