
1 Deep Learning in HEP
Large number of applications:
- Classification: based on features or raw data.
- Regression: e.g. calorimetric energy or Matrix Element.
- Anomaly detection: detector monitoring or general searches for new physics.
- Generative models: faster simulation.
Promise of better and faster algorithms that are easier to develop. Perhaps how the HL-LHC copes with the stalling of Moore's law.
See Amir Farbin's HSF talk: "Deep Learning Implications for HEP Software & Computing".

2 Why Titan?
- Training is extremely computationally intensive. GPUs offer O(50)x speed-up w.r.t. CPUs.
- Typically one performs lots of training "experiments" to find optimal hyper-parameters (i.e. the network configuration).
- HEP datasets are large: one can easily imagine O(100 TB) training samples.
- Access to any GPU is an obstacle for HEP scientists interested in DL.
- Industry (e.g. Google, Microsoft, ...) builds large, tightly networked, distributed systems with many GPUs per node for DL training.
- Titan can be this resource for HEP and other academic fields.

3 A First Attempt
- Task: reproduce 4-vector addition with a DNN. Two 4-vector inputs; regression for the mass.
- A very simple mock-up of a real study: storing CPU-intensive Matrix Element computations in DNNs.
- Self-contained example: generates data, then fits it, and assesses performance (a minimal sketch follows below).
- DNN built in Python with Keras with the Theano back-end. (TensorFlow would be preferred for distributed training.)
- Performed a hyper-parameter scan: 280 different configurations varying network width (16 to 1024), depth (0 to 10), and loss function. Found that 1 hidden 128-neuron-wide layer performed best.
- Biggest problem: the 2-hour limit. Training for realistic tasks can take days. I'm already set up for training in 2-hour blocks for next time.
[Plots: generated flat mass distribution, DNN output, residual]
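A minimal sketch of such a self-contained study, assuming present-day tf.keras rather than the Keras/Theano setup used at the time. The event generator, sample sizes, and momentum scale are illustrative assumptions (in particular, it does not reproduce the flat mass distribution shown on the slide), but the network uses the scan's preferred configuration of a single 128-neuron hidden layer:

```python
# Sketch: regress the invariant mass of the sum of two 4-vectors with a DNN.
# Uses tf.keras; the original study used Keras with the Theano back-end.
import numpy as np
import tensorflow as tf

def generate_sample(n_events, p_scale=100.0):
    """Generate pairs of massless 4-vectors and the invariant mass of their sum."""
    # Random 3-momenta; energies chosen so each input 4-vector is massless (E = |p|).
    p3 = np.random.uniform(-p_scale, p_scale, size=(n_events, 2, 3))
    E = np.linalg.norm(p3, axis=-1, keepdims=True)
    fourvecs = np.concatenate([E, p3], axis=-1)        # (n_events, 2, 4) as (E, px, py, pz)
    total = fourvecs.sum(axis=1)                       # 4-vector addition
    mass2 = total[:, 0]**2 - np.sum(total[:, 1:]**2, axis=1)
    mass = np.sqrt(np.clip(mass2, 0.0, None))
    X = fourvecs.reshape(n_events, 8)                  # two 4-vectors -> 8 input features
    return X.astype("float32"), mass.astype("float32")

X_train, y_train = generate_sample(100_000)
X_val, y_val = generate_sample(10_000)

# The configuration the hyper-parameter scan favored: one 128-neuron hidden layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1)                           # regression output: the mass
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=20, batch_size=1024)
```

Evaluating model.predict on a held-out sample and histogramming prediction minus truth would give residual plots of the kind shown on the slide.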

4 And so...
- After a smaller hyper-parameter scan on my 4-GPU system, and after turning the problem into classification of mass bins, I got good results! (A sketch of the mass-bin setup is given below.)
- Next step is a 20-input fit to a 100 GB sample of LO and NLO top events. Would ideally use data parallelism (see next slide), since the training sample is large. Would require distributed TensorFlow on Titan.
- I have several ATLAS colleagues who use my machine who would greatly benefit from using Titan. And if OK with PANDA, I also have neutrino physics colleagues in DUNE, LArIAT, MicroBooNE, NEXT, and NOvA who would be interested.
- Hopefully I will have a tutorial by the end of summer.
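A minimal sketch of recasting the regression as classification over mass bins, building on the generator from the previous sketch; the bin range, the number of bins, and the network size are illustrative assumptions, not the configuration actually used:

```python
# Sketch: classify events into mass bins instead of regressing the mass directly.
import numpy as np
import tensorflow as tf

N_BINS = 50

def to_mass_bins(mass, m_min=0.0, m_max=200.0, n_bins=N_BINS):
    """Digitize continuous masses into one-hot encoded bins."""
    edges = np.linspace(m_min, m_max, n_bins + 1)
    idx = np.clip(np.digitize(mass, edges) - 1, 0, n_bins - 1)
    return tf.keras.utils.to_categorical(idx, num_classes=n_bins)

# X_train, y_train come from generate_sample() in the previous sketch.
Y_train = to_mass_bins(y_train)

clf = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(N_BINS, activation="softmax")   # one probability per mass bin
])
clf.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
clf.fit(X_train, Y_train, epochs=20, batch_size=1024, validation_split=0.1)
```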

5 Parallelism
- Hyper-parameter scan: simultaneously train multiple models, e.g. 1 model (HP1, HP2, HP3, HP4) per GPU or node.
- Tensor operation parallelism: GPUs, FPGAs, and ASICs (Google's Tensor Processing Unit).
- Data parallelism: each GPU or node computes the gradient on a subset of the data (D1-D4). Sync'ing gradients is bottlenecked by the bus or network. (A sketch follows below.)
- Model parallelism: a large model is spread over many GPUs or nodes (model parts A-D). Less network traffic, but only efficient for large models.
[Diagrams illustrating each parallelism scheme]
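A minimal sketch of the data-parallel pattern, using today's tf.distribute.MirroredStrategy as a stand-in for the distributed-TensorFlow setup envisioned for Titan; the model mirrors the earlier sketches, and the placeholder training data and batch sizes are assumptions:

```python
# Sketch of data parallelism: each GPU gets a shard of every batch, computes
# gradients locally, and the gradients are synced (all-reduced) each step.
# MirroredStrategy covers multiple GPUs on one node; across nodes one would use
# MultiWorkerMirroredStrategy with an appropriate TF_CONFIG.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()        # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Each replica sees global_batch / num_replicas examples per step.
global_batch = 1024 * strategy.num_replicas_in_sync

# Stand-in training data (in practice, the 8-feature 4-vector sample from the earlier sketch).
X_train = np.random.uniform(-100.0, 100.0, size=(100_000, 8)).astype("float32")
y_train = np.random.uniform(0.0, 200.0, size=(100_000,)).astype("float32")

with strategy.scope():
    # Model variables are mirrored on every replica; gradients are all-reduced.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(8,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer="adam", loss="mse")

dataset = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
           .shuffle(100_000)
           .batch(global_batch))
model.fit(dataset, epochs=20)
```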

