LQCD Workflow: Gauge Generation, Prop Calcs, Analysis Robert Edwards Jefferson Lab HackFest 2014.

The “Good Ole’ Days”

Beginnings of Industrial Science

The Future of Industrial Science??
[Figure labels: graduate students, software manager, old Fortran code]

I said Joke slide, not Goat slide

LQCD Workflow
Diagram labels:
–Few big jobs, few big files
–Many small jobs, many big files
–I/O movement
–Generate the configurations: leadership level, 60K cores, 10’s TF-yr
–Propagators (t=0 to t=T): 4 Kepler GPUs +
–Contract: 8 cores, CPUs
–Correlators: 100K – 1M copies
–Analyze: 100K copies

LQCD Workflow
Diagram labels:
–Generate the configurations: leadership level, 60K cores, 10’s TF-yr
–Propagators (t=0 to t=T): 4 Kepler GPUs +. Now AMG!
–Contract: 8 cores, CPUs
–Correlators: 100K – 1M copies
–Analyze: 100K copies
–Cost labels: ~25%, > 5%, ~75%; production cost, new analysis cost
–Leadership level, throughput mode

LOTS of propagators
Isovector meson (t=0 to t)
–Quark propagation between 0 & t
Isoscalar meson (t=0 to t)
–Quark propagation between 0 & t
–Quark propagation 0 to 0 & t to t
Expensive!

Reuse those propagators
Variational method:
–Propagators
–Operators
And lots of permutations of contractions…
–“single particle”
–“single to two-particle”

Distillation - mesons
Smearing in correlator
Correlator
Factorizes: operators and perambulators

Mesons
Rewrite correlator keeping track of smearing labels
Notation – keep track of labels as indices
Consider complexity
[Diagram label: t0]
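The slide's formulas did not survive extraction. As a rough illustration of the factorization into operators and perambulators, here is a minimal NumPy sketch. It assumes the usual distillation form, in which a meson two-point function is a trace over products of operator matrices and perambulators, all living in the rank-N distillation space. Spin indices are dropped, the matrices are random stand-ins, and the names `phi_snk`, `phi_src`, `tau_fwd`, `tau_bwd` are hypothetical.

```python
import numpy as np

def meson_correlator(phi_snk, tau_fwd, phi_src, tau_bwd):
    """Tr[phi_snk . tau_fwd . phi_src . tau_bwd], every factor N x N.

    Three matrix multiplies, so the cost scales like O(N^3).
    """
    return np.trace(phi_snk @ tau_fwd @ phi_src @ tau_bwd)

def random_complex(rng, n):
    # Random stand-in for an operator matrix or a perambulator.
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

rng = np.random.default_rng(0)
N = 8  # distillation-space rank (toy value)
phi_snk, tau_fwd, phi_src, tau_bwd = (random_complex(rng, N) for _ in range(4))

c = meson_correlator(phi_snk, tau_fwd, phi_src, tau_bwd)

# Same contraction with the smearing labels written out as explicit indices.
c_ref = np.einsum("ab,bc,cd,da->", phi_snk, tau_fwd, phi_src, tau_bwd)
print(np.isclose(c, c_ref))
```

The `einsum` form is the "keep track of labels as indices" notation; the chain of matrix products is the same contraction expressed as mat-muls.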

More complicated mesons
Two-mesons (hadrons) require projection into irreducible representation
Can have many Wick contractions and mom. projections
–Worst case: rest -> p=100 -> 6x, p=110 -> 12x, p=111 -> 8x
“Graph”
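One reading of the 6x, 12x and 8x multiplicities is that they count the equivalent lattice momentum directions of each type. The sketch below checks that count by applying all axis permutations and sign flips to a momentum vector; the function name `momentum_orbit` is hypothetical.

```python
from itertools import permutations, product

def momentum_orbit(p):
    """Distinct images of p under axis permutations and sign flips."""
    orbit = set()
    for perm in permutations(p):
        for signs in product((1, -1), repeat=3):
            orbit.add(tuple(s * c for s, c in zip(signs, perm)))
    return orbit

for p in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(p, len(momentum_orbit(p)))
# (0, 0, 0) 1
# (1, 0, 0) 6
# (1, 1, 0) 12
# (1, 1, 1) 8
```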

Correlation functions get to be expensive…
Consider I=1, A1++, 20^3 x 128 lattice
Cost of a correlator driven by number of irrep graphs
–1752 unique irrep graphs (using SU(2) isospin symmetry)
–Time ~ sec. (N=128), 8-core Xeon, ATLAS for mat-muls (“zgemms”)
Corresponding case, but all at rest
–14 unique irrep graphs
–Time ~ 85 sec.
Over all irreps & time-sources, 79MB in graph DBs, keys
Have seen up to ~2000 graphs
[Figure label: rest-frame]

Optimize order of operations
Traverse graphs along a t-slice
–10,000’s of graphs
Also 3-particles and more…
Common sub-expression elimination
For fixed t-slice - 100’s vertices
I=1/2 K*π arXiv:
[Diagram: Graph 1, Graph 2, Graph 3 across timeslices t=0, t=1, t=2, t=3, ..., t=T = 48, distributed over Device 1 to Device 4]
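A toy sketch of common sub-expression elimination across graphs. It assumes each graph is an ordered chain of hadron-node matrices whose product is traced, and that partial products shared between graphs can be cached. The graph layout, node names and the `evaluate_graphs` helper are hypothetical, not taken from the actual code.

```python
import numpy as np

def evaluate_graphs(graphs, nodes):
    """Trace each chain of node matrices, reusing shared partial products.

    graphs: list of tuples of node keys; nodes: dict key -> N x N matrix.
    Returns (dict graph -> trace, number of mat-muls actually performed).
    """
    cache = {}
    matmuls = 0

    def chain(keys):
        nonlocal matmuls
        if len(keys) == 1:
            return nodes[keys[0]]
        if keys not in cache:
            # Build on the cached prefix instead of recomputing it.
            cache[keys] = chain(keys[:-1]) @ nodes[keys[-1]]
            matmuls += 1
        return cache[keys]

    results = {g: np.trace(chain(g)) for g in graphs}
    return results, matmuls

rng = np.random.default_rng(1)
N = 6
nodes = {k: rng.standard_normal((N, N)) for k in "ABCD"}
graphs = [("A", "B", "C"), ("A", "B", "D"), ("A", "B", "C", "D")]

results, matmuls = evaluate_graphs(graphs, nodes)
naive = sum(len(g) - 1 for g in graphs)
print(matmuls, "mat-muls with caching,", naive, "without")
# 4 mat-muls with caching, 7 without
```

With thousands of graphs sharing the same vertices on a t-slice, the saving from reusing sub-products would be correspondingly larger than in this three-graph toy.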

Workflow is choreographed
Topology
–Vertices can have different ordinalities
Order of evaluation is important
–Baryons ~ O(N^4), Mesons ~ O(N^3)
–Avoid creating larger ranked objects
Lots of mat-muls
Obvious application for accelerators
[Diagram: Graph 1, Graph 2, Graph 3 across timeslices t=0, t=1, t=2, t=3, ..., t=T = 48, distributed over Device 1 to Device 4; caption: More graduate students]
[Diagram labels, baryon (B) and meson (M) nodes: Bad (“Yuk”): M M B B M M. Good: B B M M B’ B M B’’]
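The effect of evaluation order can be illustrated with a toy cost model. The sketch assumes a hypothetical index pattern: a rank-3 baryon-like node `B` connected through three matrices `M1`, `M2`, `M3` to a second rank-3 node `C`. Each pairwise contraction is charged N to the power of the number of distinct indices involved. This is a counting exercise under those assumptions, not a model of the production code.

```python
def contraction_cost(operands, order, n):
    """Cost of contracting operands pairwise, left to right.

    operands: dict name -> set of index labels (each label occurs twice overall).
    Returns (total multiply count, largest intermediate rank).
    """
    current = set(operands[order[0]])
    total, max_rank = 0, len(current)
    for step, name in enumerate(order[1:], start=1):
        involved = current | operands[name]
        total += n ** len(involved)
        # An index survives only if a later operand still needs it.
        remaining = set().union(*(operands[m] for m in order[step + 1:]))
        current = involved & remaining
        max_rank = max(max_rank, len(current))
    return total, max_rank

operands = {
    "B": {"i", "j", "k"},
    "M1": {"i", "l"},
    "M2": {"j", "m"},
    "M3": {"k", "n"},
    "C": {"l", "m", "n"},
}

good = contraction_cost(operands, ["B", "M1", "M2", "M3", "C"], n=10)
bad = contraction_cost(operands, ["M1", "M2", "M3", "B", "C"], n=10)
print("good:", good)  # (31000, 3)
print("bad: ", bad)   # (2011000, 6)
```

Attaching the matrices to the rank-3 node one at a time keeps every intermediate at rank 3 and the cost at O(N^4). Multiplying the matrices together first builds a rank-6 intermediate and costs O(N^6).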

Smells like Industrial Science…
Three main components
–Gauge generation – leadership level (strong scaling)
–Propagator calculations (solver)
–Contraction calculations (distillation)
Choice of rank N ~ 100’s sources
–Larger N helps to resolve higher excited states
Our present largest calculation:
–4 spins, 256 timeslices, 384 vectors, 1 quark
–395,264 individual solves per configuration
–Huge # for LQCD
–Fast solves enable new class of algorithms: AMG for Phi-s or GPU-s
–New ways to reduce contraction costs: Wick contractions (determinant) via L-U factorization
–All components obvious application for accelerators

Some left over bits

Hadron nodes
Internal unit of work is a “hadron node”
In perambulator language
Hadron nodes – matrices in distillation space
[Diagram label: t0]

Code flow
Redstar gen_graph (redstar)
–Input: correlator xml
–Output: unique graphs & hadron nodes (xml “keys”)
Hadron_node (colorvec)
–Input: hadron node xml, perambulators, elementals
–Output: sdb - hadron node xml is key, value is matrix or tensor in distillation space (hadron node sdbs)
Harom (harom – 3d code)
–Input: hadron node xml, 3d solution vectors (stochastic)
–Output: sdb - hadron node xml is key, value is matrix or tensor in distillation space (hadron node sdbs)
Redstar npt (redstar)
–Input: correlator xml, hadron node sdbs
–Output: sdb – correlator xml is key, value is array of complex-s
Data inputs: Perams, Elementals, Solutions, Eigs
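The key/value hand-off between stages can be mimicked in a few lines. This is a toy sketch only. Plain Python dicts stand in for the sdb files, strings stand in for the xml keys, and random matrices stand in for the hadron nodes. The function names mirror the stage names, but their signatures are hypothetical and are not the redstar, colorvec or harom interfaces. Each correlator is also reduced to a single complex number rather than an array.

```python
import numpy as np

def gen_graph(correlators):
    """Stage 1: collect the unique hadron-node keys needed by all correlators."""
    return sorted({key for node_keys in correlators.values() for key in node_keys})

def build_hadron_nodes(keys, n, rng):
    """Stage 2 stand-in: one N x N complex matrix per hadron-node key."""
    return {
        k: rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        for k in keys
    }

def npt(correlators, node_db):
    """Stage 3: contract the stored nodes; correlator key -> complex value."""
    out = {}
    for name, node_keys in correlators.items():
        prod = node_db[node_keys[0]]
        for k in node_keys[1:]:
            prod = prod @ node_db[k]
        out[name] = np.trace(prod)
    return out

# Hypothetical correlator "keys", each listing the hadron nodes it needs.
correlators = {
    "corr_1": ("node_a", "node_b"),
    "corr_2": ("node_a", "node_c"),
}

node_keys = gen_graph(correlators)  # shared node_a is listed once
node_db = build_hadron_nodes(node_keys, n=4, rng=np.random.default_rng(2))
corr_db = npt(correlators, node_db)
print(node_keys)
print(sorted(corr_db))
```

Because the first stage deduplicates node keys, a hadron node shared by many correlators is built and stored once, then looked up by key in the final stage.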

Stochastic estimation
Could add noise: V space rank N, space rank d
Only want minimal number of noise insertions
Arbitrary – choose noise on antiquarks
All BLAS mat-muls

Stochastic estimation
Could add noise: V space rank N, space rank d
Only want minimal number of noise insertions
Arbitrary – choose noise on antiquarks
Lots of ops of lattice objects – prefactors in scaling can be large