Deep Learning in HEP


Deep Learning in HEP
Large number of applications:
- Classification: based on features or raw data
- Regression: e.g. calorimetric energy or Matrix Element
- Anomaly detection: detector monitoring or general searches for new physics
- Generative models: faster simulation
Promise of better and faster algorithms that are easier to develop. Perhaps this is how the HL-LHC copes with the stalling of Moore's law.
See Amir Farbin's HSF talk: "Deep Learning Implications for HEP Software & Computing"

Why Titan?
- Training is extremely computationally intensive; GPUs offer an O(50)x speed-up over CPUs.
- One typically performs many training "experiments" to find optimal hyper-parameters (i.e. the network configuration).
- HEP datasets are large: one can easily imagine O(100 TB) training samples.
- Access to any GPU at all is an obstacle for HEP scientists interested in DL.
- Industry (e.g. Google, Microsoft, ...) builds large, tightly networked, distributed systems with many GPUs per node for DL training.
- Titan can be this resource for HEP and other academic fields.
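The "many experiments" workflow above amounts to a scan over a grid of configurations, each one an independent training run. A minimal sketch in Python; the widths, depths, loss names, and the `run_experiment` placeholder are illustrative assumptions, not the exact grid from this study:

```python
# Enumerate hyper-parameter configurations for a scan; each entry would
# correspond to one training "experiment" on its own GPU or node.
from itertools import product

widths = [16, 64, 256, 1024]   # neurons per hidden layer
depths = [0, 2, 5, 10]         # number of hidden layers
losses = ["mse", "mae"]        # candidate loss functions

def run_experiment(width, depth, loss):
    """Hypothetical placeholder: would build, train, and evaluate
    a network with this configuration and return its score."""
    return {"width": width, "depth": depth, "loss": loss}

configs = [run_experiment(w, d, l) for w, d, l in product(widths, depths, losses)]
print(len(configs))  # 4 * 4 * 2 = 32 toy configurations
```

Because each configuration is independent, this kind of scan parallelizes trivially across GPUs or nodes, which is exactly what a resource like Titan offers.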

A First Attempt
- Task: reproduce 4-vector addition with a DNN. Two 4-vector inputs; regression for the mass.
- A very simple mock-up of a real study: storing CPU-intensive Matrix Element computations in DNNs.
- Self-contained example: generates data, fits it, and assesses performance.
- DNN built in Python with Keras using the Theano back-end. (TensorFlow would be preferred for distributed training.)
- Performed a hyper-parameter scan: 280 different configurations varying network width (16 to 1024), depth (0 to 10), and loss function. Found that a single 128-neuron-wide hidden layer performed best.
- Biggest problem: the 2-hour limit. Training for realistic tasks can take days. I'm already set up for training in 2-hour blocks for next time.
[Figures: Generated Flat Mass Distribution; DNN Output; Residual]
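The self-contained setup above can be sketched as follows: generate two four-momenta per event and compute the invariant mass of their sum as the regression target. The momentum distributions and the massless-particle assumption are illustrative (the actual study generated a flat mass distribution); the resulting arrays would then be fed to a Keras model, e.g. the single 128-neuron hidden layer the scan found best:

```python
# Toy data generation for the 4-vector addition task: inputs are two
# four-momenta (E, px, py, pz); the target is the invariant mass of their sum.
import numpy as np

rng = np.random.default_rng(0)

def make_sample(n):
    # random 3-momenta; massless particles, so E = |p|
    p1 = rng.normal(0.0, 50.0, size=(n, 3))
    p2 = rng.normal(0.0, 50.0, size=(n, 3))
    e1 = np.linalg.norm(p1, axis=1, keepdims=True)
    e2 = np.linalg.norm(p2, axis=1, keepdims=True)
    v1 = np.hstack([e1, p1])        # (E, px, py, pz)
    v2 = np.hstack([e2, p2])
    x = np.hstack([v1, v2])         # 8 input features per event
    s = v1 + v2                     # summed four-vector
    m2 = s[:, 0] ** 2 - np.sum(s[:, 1:] ** 2, axis=1)
    y = np.sqrt(np.clip(m2, 0.0, None))   # invariant-mass target
    return x, y

x, y = make_sample(10000)
```

The `np.clip` guards against tiny negative values of m^2 from floating-point round-off; physically the invariant mass of two massless particles is always non-negative.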

And so…
- After a smaller hyper-parameter scan on my 4-GPU system, and after turning the problem into classification of mass bins, I got good results!
- Next step is a 20-input fit to a 100 GB sample of LO and NLO top events. Would ideally use data parallelism (see next slide), since the training sample is large. This would require distributed TensorFlow on Titan.
- I have several ATLAS colleagues who use my machine and would greatly benefit from using Titan. And if OK with PanDA, I also have neutrino-physics colleagues in DUNE, LArIAT, MicroBooNE, NEXT, and NOvA who would be interested.
- Hopefully we'll have a tutorial by the end of summer.
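Turning the regression into classification of mass bins, as above, just means discretizing the continuous target and training against a softmax output. A minimal sketch; the bin edges and example masses are illustrative, not the binning used in the study:

```python
# Discretize continuous mass targets into bins and one-hot encode them,
# converting the regression problem into a classification problem.
import numpy as np

masses = np.array([12.0, 47.0, 88.0, 130.0, 173.0])  # example targets
edges = np.linspace(0.0, 200.0, 11)                  # 10 uniform mass bins

# np.digitize returns 1-based bin indices; shift and clip to 0..9
labels = np.clip(np.digitize(masses, edges) - 1, 0, len(edges) - 2)
one_hot = np.eye(len(edges) - 1)[labels]             # targets for a softmax output
print(labels.tolist())  # [0, 2, 4, 6, 8]
```

Binning trades mass resolution for a better-conditioned training objective, which is one reason the classification variant can converge more easily than the raw regression.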

Parallelism
- Hyper-parameter scan: simultaneously train multiple models, e.g. one model per GPU or node.
- Data parallelism: each GPU or node computes the gradient on a subset of the data. Syncing the gradients is bottlenecked by the bus or network.
- Model parallelism: a large model is spread over many GPUs or nodes. Less network traffic, but only efficient for large models.
- Tensor-operation parallelism: GPUs, FPGAs, and ASICs (e.g. Google's Tensor Processing Unit).
[Diagrams: hyper-parameter scan with models HP1-HP4, one per GPU/node; data parallelism over data shards D1-D4; model parallelism over model parts A-D]
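The data-parallel scheme above can be sketched in a few lines: each worker computes the gradient on its own data shard, the shard gradients are averaged (the sync step that stresses the bus or network), and every worker applies the identical update. The linear model, MSE loss, and learning rate below are illustrative assumptions:

```python
# One data-parallel gradient step for a toy linear model with MSE loss.
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(4)                             # shared model weights
X = rng.normal(size=(32, 4))                # one global batch
y = X @ np.array([1.0, -2.0, 0.5, 3.0])     # synthetic targets

def shard_gradient(Xs, ys, w):
    # MSE gradient computed locally on one worker's shard
    residual = Xs @ w - ys
    return 2.0 * Xs.T @ residual / len(ys)

# split the batch across 4 workers, as in the D1..D4 diagram
shards = zip(np.array_split(X, 4), np.array_split(y, 4))
grads = [shard_gradient(Xs, ys, w) for Xs, ys in shards]

g = np.mean(grads, axis=0)   # gradient sync: average across workers
w = w - 0.05 * g             # identical update applied on every worker
```

With equal shard sizes the averaged gradient equals the full-batch gradient exactly, so data parallelism leaves the optimization unchanged while distributing the compute; only the averaging step requires communication.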