
Nuons && Threads -> Suggestions. SFT meeting, December 15, 2014. René Brun.


1 Nuons && Threads -> Suggestions
SFT meeting, December 15, 2014
René Brun

2 From Algol to Nuons
1973: Thesis in Nuclear Physics (SC33/CERN, Diogene/Saturne/Saclay)
1973-1975: ISR/R232, p-p elastic scattering with C.Rubbia (Reconstruction)
1975-1980: SPS/NA4, deep inelastic muon scattering with C.Rubbia (Simul + Recons)
1978-1979: simulation of UA1 with C.Rubbia
1980-1989: simulation of OPAL with R.Heuer
1988-1993: simulation of GEM & SDC for the defunct SSC
1991-1994: simulation of ATLAS and CMS (letters of Intent) F.Gianotti, D.Froidevaux, V.Karimaki
1995-2010: busy with ROOT
2009-2010: interested in theoretical predictions for TOTEM (p-p elastic) and results
2009-2011: foundations for the Nuons model
2011…: computing particle masses to better than 1/1000
2012…: testing p-p elastic with TOTEM/UA4/D0/ISR
2012…: testing p-p interactions at the LHC (900 GeV, 2.76 TeV, 7 TeV)
2013…: testing the nuons model with Jets at the LHC
2014…: predictions for 13 TeV + paper draft

3 Nuons
[Figure panels: proton, neutron]

4 Nuons and C++
I am implementing my « physics model » to:
– Model elementary particles using « nuons »
– Compute particle masses with high accuracy
– Test the model at many energies for p-p elastic scattering
– Test the model at LHC energies: particle production and Jets
Programs: findall.C, totem.C, collide.C

5 Example of event motivating my project
[Figure: standard proton model]
Predicted cross section wrong by a factor of more than 1000 for t > 2 GeV^2

6 Collisions
[Figure panels: PP elastic, PP inelastic]

7 Some programming details
The 3 C++ programs findall, totem and collide (about 12000 LOC in total) all run in batch and multi-threaded mode on several OpenLab machines with 2x6 cores Westmere, 2x12 cores Ivy Bridge, or 2x14 cores E5-2697v3 now upgraded to 2x18 cores.
My programs run from a few minutes to one day.
– nohup root.exe -b -q "collide.C+(7000)" >x1.log &
– e.g. process id 12756
While the programs are running, I can inspect the results (histograms and/or Trees), say once per minute, from my laptop, then stop and launch again with a new set of parameters.
– root > .x colshow.C(-12756)
– This CINT script takes the file collide_12756.root from OpenLab/AFS and stores it on my laptop, where the histograms are visualized.

8 More programming details
Findall is a bit « lattice QCD like ». 99.99% of the time is spent in TMinuit to compute the stable positions of a set of N nuons generated at random in a cube of size 1 fermi.
Totem and Collide are quite similar to Pythia or Herwig. They simulate proton-proton collisions, generating output particles and Jets.
The scripts run on my laptop and show plenty of graphs comparing with the results of the LHC experiments.
[Diagram: Batch on OpenLab machines -> Histograms, Tree (afs) -> scp -> Interactive script on my laptop]

9 More programming details (2)
Findall saves results in a Tree (one particle per entry). It takes about 0.1 s to compute a pion, 10 minutes for a proton and 20 minutes for an Omega.
Totem generates histograms only (about 20, 1D and 2D).
Collide generates about 100 histograms (1D and 2D) and a Tree with a size ranging from a few Mbytes/minute to several Gbytes/minute, depending on the desired granularity of the collision information.
About one billion collisions are generated in one day. Most histograms are filled millions of times per second.

10 Experience -> Suggestions
All these applications are multi-threaded, a HUGE gain in REAL time for what I am doing.
There are many applications in HEP that look very similar:
– All detector simulations
– All event generators
– Most physics analyses
To make the most efficient use of the hardware, I had to make simple changes in ROOT, or implement solutions of my own that should be implemented in a more general way in ROOT.

11 Main Topics
– Random numbers and distributions: trivial
– Histograms
– Trees
– I/O in general
– Thread scalability considerations
Current ROOT is a blocker for performant multi-threaded applications.

12 Random Numbers
No changes are required in the TRandomXX classes. I am using only the nice and efficient TRandom3 (Mersenne Twister).
I create one TRandom3 object per thread, initialized with TRandom3(pid + 1000*thnumb).
I had to modify or circumvent all places referencing gRandom, in full backward compatibility and in totally trivial ways:
– TF1::GetRandom() -> TF1::GetRandom(double r=-1)
– Similar changes should be applied to TH1::GetRandom and FillRandom
– TGenPhaseSpace: add SetRandom function and member fRandom
– Similar changes should also be applied to:
  TF2, TF3::GetRandom, TUnuran, TKDTree
  TMVA: Dataset, RuleEnsemble
  TGeoBBox, TGeoCompositeShape, TGeoChecker
  TRobustEstimator, TAttParticle, TVirtualMC, RooStudyPackage
  TApplicationRemote, TProof
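The per-thread generator idea can be sketched without ROOT. In this minimal example std::mt19937 stands in for TRandom3 (both are Mersenne Twister generators, but their seeding and output are not claimed to match), and the helper names threadSeed and drawPerThread are hypothetical. Only the seed rule pid + 1000*thnumb comes from the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Seed rule from the slide: pid + 1000*thnumb, so each thread gets its own stream.
inline std::uint32_t threadSeed(int pid, int thnumb) {
    assert(pid >= 0 && thnumb >= 0);
    return static_cast<std::uint32_t>(pid) + 1000u * static_cast<std::uint32_t>(thnumb);
}

// Each worker owns a private generator, so no global generator and no lock is needed.
// Returns the sum of ndraws uniform numbers drawn by each thread.
inline std::vector<double> drawPerThread(int pid, int nthreads, int ndraws) {
    std::vector<double> sums(nthreads, 0.0);
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&sums, pid, t, ndraws] {
            std::mt19937 rng(threadSeed(pid, t));  // stand-in for TRandom3
            std::uniform_real_distribution<double> uniform(0.0, 1.0);
            double s = 0.0;
            for (int i = 0; i < ndraws; ++i) s += uniform(rng);
            sums[t] = s;  // each thread writes only its own slot
        });
    }
    for (auto &w : workers) w.join();
    return sums;
}
```

Calling drawPerThread(12756, 4, 1000) twice gives identical results, because every thread's stream depends only on its seed and not on thread scheduling.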

13 Histograms & Threads
Currently one has to set TH1::AddDirectory(0) to bypass gDirectory. However, this forces users to do the histogram book-keeping themselves, which makes the histogram merging phase a bit complex (see next slides with a solution).
Histograms may be created in the main thread and filled under a thread lock at each fill. This is fine only if the number of fills is negligible.
The only realistic solution is to make a copy of all histograms per thread. However, in several applications this can represent a substantial increase in memory:
– In my case, I have at most 100 histograms (total 400 Kbytes per thread)
– Alice monitoring has 14000 histograms, total size 1.5 Gbytes in memory!
– Most analysis applications have a few hundred, up to a few thousand histograms
Some tiny work is required to take advantage of the architecture already in place to:
– Do lazy instantiations of the bin structures
– Make better use of the TH1::SetBuffer mechanism, in particular in TH1::Merge, and make vectorization possible.
I could not survive without my I/O check-pointing (around once per minute) for histograms and Trees. It lets me inspect the current status of my jobs at any time, interrupt them, and change my parameters when I see that the results are not the ones expected. It also makes running multi-threaded applications much safer.
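The "one copy of each histogram per thread, merged afterwards" approach can be illustrated with a toy histogram. This is a sketch under stated assumptions: Hist is a hypothetical stand-in for TH1D (fixed binning, no under/overflow bins), and fillAndMerge is an illustrative helper, not ROOT API.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Toy fixed-binning 1D histogram; a hypothetical stand-in for TH1D.
struct Hist {
    double lo, hi;
    std::vector<double> bins;
    Hist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0.0) {}
    void Fill(double x) {
        if (x < lo || x >= hi) return;  // under/overflow ignored in this sketch
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the edge
        bins[i] += 1.0;
    }
    void Add(const Hist &other) {  // bin-by-bin merge of an identically binned histogram
        assert(other.bins.size() == bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
    double Entries() const {
        double s = 0.0;
        for (double b : bins) s += b;
        return s;
    }
};

// One private copy per thread, filled without any lock, merged once at the end.
inline Hist fillAndMerge(int nthreads, int fillsPerThread) {
    std::vector<Hist> perThread(nthreads, Hist(66, 0.0, 66.0));
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&perThread, t, fillsPerThread] {
            for (int i = 0; i < fillsPerThread; ++i) perThread[t].Fill(i % 66);
        });
    }
    for (auto &w : workers) w.join();
    Hist total(66, 0.0, 66.0);
    for (const Hist &h : perThread) total.Add(h);
    return total;
}
```

The cost is the one named above: memory grows linearly with the number of threads, which is negligible for about 100 small histograms but not for thousands of large ones.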

14 Histograms: poor man
What I have been doing for a long time; efficiency < 8/12.
Main thread: TH1 *hrun, *hwatch
Threads 1 … 12, each:
– Create 97 histograms
– Loop on events
– Every N events, save thread histograms to file
Then merge all thread files every NN events or at end of job.

15 Histograms (2): much better
My current version.
Main thread: TH1 *hrun, *hwatch
Threads 1 … 12, each:
– Create 97 histograms
– Loop on events
– Every N events, merge histograms from all threads and save to file

16 Histograms Management (1) (my current solution)
Main thread:
    TH1::AddDirectory(0);
    TList htr[nthreads];
    TH1D *hrun = new TH1D(…);
In thread thnumb:
    TThread::Lock();
    TList &hlist = htr[thnumb];
    TH1D *hncol = new TH1D("hncol","number of collisions",66,0,66);
    hlist.Add(hncol);
    TH1D *hpoiss = new TH1D("hpoiss","Jets particle multiplicity",50,0,50);
    hlist.Add(hpoiss);
    …
    hncol->Fill(…);
    …
In any thread or at end of main thread:
    TFile *fhist = TFile::Open(TString::Format("collide_%d.root",processID),"recreate");
    hrun->SetBinContent(26,mainwatch->GetRealTime());
    hrun->Write();
    TList hlistall;
    int nh = htr[0].GetSize();
    for (int ih=0;ih<nh;ih++) {
        TH1 *hcur = (TH1*)htr[0].At(ih)->Clone();
        hlistall.Clear();
        for (int t=1;t<ncpus;t++) {
            hlistall.Add(htr[t].At(ih));
        }
        hcur->Merge(&hlistall);
        hcur->Write();
        delete hcur;
    }
    fhist->SaveSelf();
    delete fhist;

17 Histograms Management (2) (what I would like to see in ROOT)
Main thread:
    TH1::InitializeThreads(nthreads);
    TH1D *hrun = new TH1D(…);
In thread thnumb:
    TH1::SetThreadDirectory(thnumb);
    TH1D *hncol = new TH1D("hncol","number of collisions",66,0,66);
    TH1D *hpoiss = new TH1D("hpoiss","Jets particle multiplicity",50,0,50);
    …
    hncol->Fill(…);
    …
In any thread or at end of main thread:
    TFile *fhist = TFile::Open(TString::Format("collide_%d.root",processID),"recreate");
    hrun->SetBinContent(26,mainwatch->GetRealTime());
    hrun->Write();
    TH1::MergeThreads()->Write();
    fhist->SaveSelf();
    delete fhist;

18 Histograms (3): muuuch better
What I would like to see: a non-blocking asynchronous I/O thread.
Main thread: TH1 *hrun, *hwatch
Threads 1 … 12, each:
– Create 97 histograms
– Loop on events
– Every N events, merge histograms from all threads and save to file

19 Trees & Threads
Solution 1: one TTree per thread -> one file per thread, then possibly merge files at end of job.
– Currently this requires locking and/or fixing the non-thread-safe parts of TTree I/O
– Not very user friendly, as it requires more book-keeping
Solution 2: use the TTree buffer merge facility.
– This is much more efficient, but requires more memory
– This solution is not yet fully operational for threads
Solution 3: create only one TTree in the main thread (or any thread).
– For each fill: Lock, swap branch addresses, Fill, UnLock
– This solution is nice for memory, but adds more sequentiality
– This is my current solution, while waiting for a better one, e.g. Solution 4
Solution 4: same as Solution 3, but with
– An optimized branch address booking and swapping
– Delegation of the pure I/O part to a separate asynchronous thread doing the zipping and disk writes
Solution 5: same as Solution 4, with in addition
– The possibility to call branch::Fill per thread (this will be essential for GeantV)
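The Solution 3 sequence (Lock, swap branch addresses, Fill, UnLock) can be sketched with a toy container. ToyTree, EventVars and fillShared are hypothetical stand-ins introduced for illustration; std::mutex plays the role of TThread::Lock()/UnLock(), and the per-event "simulation" is reduced to two integer assignments.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Per-thread event variables (the slide's i1, nch, ... reduced to two fields).
struct EventVars {
    int i1 = 0;
    int nch = 0;
};

// Toy "tree" with a single set of branch addresses; a hypothetical stand-in
// for one TTree shared by all threads.
class ToyTree {
public:
    void SetAddresses(const EventVars *vars) { fVars = vars; }
    void Fill() {
        assert(fVars != nullptr);
        fEntries.push_back(*fVars);  // copy the values the addresses point to
    }
    const std::vector<EventVars> &Entries() const { return fEntries; }

private:
    const EventVars *fVars = nullptr;
    std::vector<EventVars> fEntries;
};

// Every thread fills the same tree; returns the number of entries stored.
inline std::size_t fillShared(ToyTree &tree, int nthreads, int eventsPerThread) {
    std::mutex treeLock;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&tree, &treeLock, t, eventsPerThread] {
            EventVars vars;  // thread-private event data
            for (int ev = 0; ev < eventsPerThread; ++ev) {
                vars.i1 = t;    // the event "simulation" runs outside the lock
                vars.nch = ev;
                std::lock_guard<std::mutex> guard(treeLock);  // Lock
                tree.SetAddresses(&vars);                     // swap branch addresses
                tree.Fill();                                  // Fill
            }                                                 // UnLock at end of scope
        });
    }
    for (auto &w : workers) w.join();
    return tree.Entries().size();
}
```

Only one copy of the tree exists, which keeps memory low, but every fill is serialized through the lock; that is the added sequentiality mentioned for Solution 3.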

20 Trees & Threads (my current solution)
Main thread, in initialisation (thread thnumb):
    TTree *T = 0;
    if (!T && fillTree) {
        TFile::Open(TString::Format("/data/brun/collide_%d_events.root",processID),"recreate");
        T = new TTree("T","selected collide events");
        T->Branch("i1",&i1,"i1/I");
        T->Branch("i2",&i2,"i2/I");
        T->Branch("nch",&nch,"nch/I");
        T->Branch("nchCMS",&nchCMS,"nchCMS/I");
        T->Branch("njets",&njets,"njets/I");
        T->Branch("njetsCMS",&njetsCMS,"njetsCMS/I");
        T->Branch("phi1",&phi1,"phi1/D");
        …
        T->Branch("ptype",ptype,"ptype[nchCMS]/I");
        T->Branch("pjet",pjet,"pjet[nchCMS]/I");
        T->Branch("ppx",ppx,"ppx[nchCMS]/D");
        T->Branch("ppy",ppy,"ppy[nchCMS]/D");
        T->Branch("ppz",ppz,"ppz[nchCMS]/D");
        T->Branch("ppt",ppt,"ppt[nchCMS]/D");
        T->Branch("peta",peta,"peta[nchCMS]/D");
        T->AutoSave("SaveSelf");
    }
Filling Tree in thread thnumb:
    if (fillTree && bigjet) {
        TThread::Lock();
        T->SetBranchAddress("i1",&i1);
        T->SetBranchAddress("i2",&i2);
        T->SetBranchAddress("nch",&nch);
        T->SetBranchAddress("nchCMS",&nchCMS);
        T->SetBranchAddress("njets",&njets);
        T->SetBranchAddress("njetsCMS",&njetsCMS);
        T->SetBranchAddress("phi1",&phi1);
        …
        T->SetBranchAddress("ptype",ptype);
        T->SetBranchAddress("pjet",pjet);
        T->SetBranchAddress("ppx",ppx);
        T->SetBranchAddress("ppy",ppy);
        T->SetBranchAddress("ppz",ppz);
        T->SetBranchAddress("ppt",ppt);
        T->SetBranchAddress("peta",peta);
        T->Fill();
        // every N events autosave
        if (event%1000==0) T->AutoSave("SaveSelf");
        TThread::UnLock();
    }

21 Trees & Threads (what would be faster and simpler)
Main thread, in initialisation (thread thnumb):
    TTree *T = 0;
    if (!T && fillTree) {
        TFile::Open(TString::Format("/data/brun/collide_%d_events.root",processID),"recreate");
        T = new TTree("T","selected collide events");
        T->Branch("i1",&i1,"i1/I");
        T->Branch("i2",&i2,"i2/I");
        T->Branch("nch",&nch,"nch/I");
        T->Branch("nchCMS",&nchCMS,"nchCMS/I");
        T->Branch("njets",&njets,"njets/I");
        T->Branch("njetsCMS",&njetsCMS,"njetsCMS/I");
        T->Branch("phi1",&phi1,"phi1/D");
        …
        T->Branch("ptype",ptype,"ptype[nchCMS]/I");
        T->Branch("pjet",pjet,"pjet[nchCMS]/I");
        T->Branch("ppx",ppx,"ppx[nchCMS]/D");
        T->Branch("ppy",ppy,"ppy[nchCMS]/D");
        T->Branch("ppz",ppz,"ppz[nchCMS]/D");
        T->Branch("ppt",ppt,"ppt[nchCMS]/D");
        T->Branch("peta",peta,"peta[nchCMS]/D");
        T->AutoSave("SaveSelf");
        T->SaveThreadBranches(thnumb);
    }
Filling Tree in thread thnumb:
    if (fillTree && bigjet) {
        TThread::Lock();
        T->SetThreadBranches(thnumb);
        T->Fill();
        // every N events autosave
        if (event%1000==0) T->AutoSave("SaveSelf");
        TThread::UnLock();
    }

22 Trees & Threads (3) (what would be much faster and even simpler)
Main thread, in initialisation (thread thnumb):
    TTree *T = 0;
    if (!T && fillTree) {
        TFile::Open(TString::Format("/data/brun/collide_%d_events.root",processID),"recreate");
        T = new TTree("T","selected collide events");
        T->Branch("i1",&i1,"i1/I");
        T->Branch("i2",&i2,"i2/I");
        T->Branch("nch",&nch,"nch/I");
        T->Branch("nchCMS",&nchCMS,"nchCMS/I");
        T->Branch("njets",&njets,"njets/I");
        T->Branch("njetsCMS",&njetsCMS,"njetsCMS/I");
        T->Branch("phi1",&phi1,"phi1/D");
        …
        T->Branch("ptype",ptype,"ptype[nchCMS]/I");
        T->Branch("pjet",pjet,"pjet[nchCMS]/I");
        T->Branch("ppx",ppx,"ppx[nchCMS]/D");
        T->Branch("ppy",ppy,"ppy[nchCMS]/D");
        T->Branch("ppz",ppz,"ppz[nchCMS]/D");
        T->Branch("ppt",ppt,"ppt[nchCMS]/D");
        T->Branch("peta",peta,"peta[nchCMS]/D");
        T->AutoSave("SaveSelf");
        T->SaveThreadBranches(thnumb);
    }
Filling Tree in thread thnumb:
    if (fillTree && bigjet) {
        TThread::Lock();
        T->SetThreadBranchesFill(thnumb, kAutoSave %( n%1000==0));
        TThread::UnLock();
    }
Here SetThreadBranchesFill quickly copies the branch data to a circular buffer, immediately returns control to the calling thread, and passes the data asynchronously to another thread that fills the TreeCache and does the disk I/O.
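The circular buffer with an asynchronous writer thread can be sketched in standard C++. This is a sketch under stated assumptions, not the proposed ROOT implementation: AsyncSink and produce are hypothetical names, a record is a single int rather than a set of branch buffers, and appending to a vector stands in for the zipping and disk writes. Unlike the ideal described above, Push here blocks when the ring is full.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Bounded ring buffer drained by one background writer thread.
class AsyncSink {
public:
    explicit AsyncSink(std::size_t capacity)
        : fRing(capacity), fWriter([this] { Drain(); }) { assert(capacity > 0); }
    ~AsyncSink() { Close(); }

    // Copy one record into the ring and return; waits only if the ring is full.
    void Push(int record) {
        std::unique_lock<std::mutex> lk(fMutex);
        fNotFull.wait(lk, [this] { return fCount < fRing.size(); });
        fRing[(fHead + fCount) % fRing.size()] = record;
        ++fCount;
        fNotEmpty.notify_one();
    }

    // Flush the remaining records and stop the writer (call from one thread only).
    void Close() {
        {
            std::lock_guard<std::mutex> lk(fMutex);
            if (fDone) return;
            fDone = true;
        }
        fNotEmpty.notify_one();
        fWriter.join();
    }

    // Records "written" so far; safe to read only after Close().
    const std::vector<int> &Written() const { return fWritten; }

private:
    void Drain() {
        std::unique_lock<std::mutex> lk(fMutex);
        for (;;) {
            fNotEmpty.wait(lk, [this] { return fCount > 0 || fDone; });
            if (fCount == 0) return;  // closed and fully drained
            int record = fRing[fHead];
            fHead = (fHead + 1) % fRing.size();
            --fCount;
            fNotFull.notify_one();
            lk.unlock();
            fWritten.push_back(record);  // slow step (zip + disk write) outside the lock
            lk.lock();
        }
    }

    std::vector<int> fRing;
    std::size_t fHead = 0, fCount = 0;
    bool fDone = false;
    std::mutex fMutex;
    std::condition_variable fNotFull, fNotEmpty;
    std::vector<int> fWritten;
    std::thread fWriter;  // declared last so it starts after the other members exist
};

// Several producer threads push distinct records; returns how many were written.
inline std::size_t produce(AsyncSink &sink, int nthreads, int perThread) {
    std::vector<std::thread> producers;
    for (int t = 0; t < nthreads; ++t) {
        producers.emplace_back([&sink, t, perThread] {
            for (int i = 0; i < perThread; ++i) sink.Push(t * perThread + i);
        });
    }
    for (auto &p : producers) p.join();
    sink.Close();
    return sink.Written().size();
}
```

The producers hold the lock only long enough to copy a record into the ring, so the time spent in the slow write step no longer serializes the simulation threads.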

