Instructions to run scripts for 1D histograms and ML


1 Instructions to run scripts for 1D histograms and ML
Presented by Kaifu Lam, May 3, 2017

2 Background
Previously presented on Mar 1 (single b-tagging)…
Goal: re-plot and understand the variables in "Jet Flavor Classification in High-Energy Physics with Deep Neural Networks" (Sep 2016).
The dataset is created by simulation, modeling light and heavy jets from pp collisions in the ATLAS detector.
The dataset contains expert (high) level variables and mid / low level variables:
- Expert: variables (16)
- Mid / low: variables X 15 tracks (422)
Variable explanations here

3 Expert variables and ROC

4 Python on Tev clusters
Todd’s Python installation has everything you need:
- Python 2.7
- Python packages required: numpy, h5py, theano, keras, matplotlib
- Todd’s python installation: ~olsont/TEV/python/bin/python
Tips:
1. Go to home ($ cd ~)
2. Open the .cshrc file ($ nano .cshrc)
3. In this file, add the line: alias pythonnew ~olsont/TEV/python/bin/python
4. Press Control + X, then Y, to save and exit (on a MacBook)
5. Restart your tev01 session
6. Now you can run Todd’s python by typing pythonnew
Feel free to change the alias name.
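To confirm that the interpreter behind the alias really provides the packages listed above, a small check along these lines may help. This is only a sketch: the helper name find_missing_packages is hypothetical, and the package list is copied from the slide.

```python
import importlib

# Package list taken from the slide above.
REQUIRED_PACKAGES = ["numpy", "h5py", "theano", "keras", "matplotlib"]


def find_missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling find_missing_packages(REQUIRED_PACKAGES) from a pythonnew session should return an empty list if the installation is complete.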

5 Text Editor Suggestion
If you use a Mac…
- Sam Meehan recommends using TextWrangler on your MacBook (latest version rebranded as BBEdit)
- Download it to your personal MacBook
- Use “Open from FTP/SFTP Server” and “Save to FTP/SFTP Server” to open and save files on the Tev cluster while editing locally on your MacBook
TextWrangler has saved me so much time!!

6 Script repositories and raw data locations
Scripts:
- Single b-tagging (1D variables and ML)
- H->bb tagging (1D calorimeter variables)
Raw data:
- Single b-tagging: Tev01: /phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/dataset.json.gz
- H->bb tagging: Tev01: /phys/groups/tev/scratch4/users/kaifulam/dguest/hbb/v1/
  Files: signal.txt, background.txt

7 Single b-tagging: Data transformation
Step 1: Transform raw data with Julian’s script (Script here)
Transformed data (DON’T OPEN THIS FOLDER): /phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/julian/raw_data/saved_batches_test/
- 30 million .npy files in this folder
Example files of 10 jets are located here: /phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/julian/raw_data/saved_batches/
Data structures: 3 files for each jet (using first jet as example):
- High variables [1x16]: clean_diget_high_0.npy
- Mid variables [1x422]: clean_dijet_mid_0.npy
- Flavor [1]: clean_dijet_y_0.npy
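As an illustration of the per-jet file layout described above, the three arrays for one jet could be read with numpy roughly as follows. This is a sketch, not Julian's script: the function name load_jet is hypothetical, and it assumes all three files share the prefix clean_dijet (the slide spells the high-level file clean_diget_high_0.npy, so the prefix may need adjusting).

```python
import os

import numpy as np


def load_jet(folder, index, prefix="clean_dijet"):
    """Load the high-level, mid-level and flavor arrays for one jet.

    Assumes files named <prefix>_high_<i>.npy, <prefix>_mid_<i>.npy and
    <prefix>_y_<i>.npy; adjust `prefix` if the real names differ.
    """
    arrays = {}
    for kind in ("high", "mid", "y"):
        path = os.path.join(folder, "%s_%s_%d.npy" % (prefix, kind, index))
        arrays[kind] = np.load(path)
    return arrays
```

Pointing load_jet at the 10-jet example folder with index 0 should give arrays whose shapes match the ones listed above, for example (1, 16) for the high-level variables.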

8 Single b-tagging: Data transformation
Step 2: Transfer .npy files to one .h5 file (Script here)
.h5 file location: /phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/julian/gjj_Variables.hdf5
Transferred only high variables and flavor variable to .h5:
- High_input [10m x 1 x 16]
- Y_input [10m x 2] (logical array: col1 signal, col2 bg)
Shortens data reading time from 26 hours to 30 seconds
Future to-do: transfer mid level variables to .h5
[Figure: High_input dimensions: [0] jets, [1] tracks, [2] variables]
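The array shapes above can be illustrated with a short numpy sketch that stacks per-jet arrays into High_input and builds the two-column logical Y_input. The function name stack_jets and the signal_label value are assumptions for illustration; the slides do not say how the flavor value is encoded.

```python
import numpy as np


def stack_jets(high_list, flavor_list, signal_label=1):
    """Stack per-jet arrays into High_input and Y_input.

    high_list:   sequence of [1 x 16] arrays, one per jet
    flavor_list: sequence of flavor labels (scalars or [1] arrays)
    signal_label is a hypothetical encoding of "signal".
    """
    high_input = np.stack(high_list, axis=0)  # [n_jets x 1 x 16]
    flavors = np.asarray(flavor_list).reshape(-1)
    is_signal = flavors == signal_label
    # Column 0 flags signal, column 1 flags background.
    y_input = np.column_stack([is_signal, ~is_signal])  # [n_jets x 2]
    return high_input, y_input
```

Holding everything in memory like this is only practical for the high-level variables; it would not scale to the full mid-level set.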

9 Single b-tagging: Data transformation
Completed (don’t have to re-do these):
- Step 1 for all variables
- Step 2 for high variables and flavor variable
To be completed:
- Step 2 for mid variables
  - Not enough RAM in tev01
  - Cannot add all mid variables in one take
  - Have to add mid variables into .h5 by batches
[Figure: workflow diagram with boxes Raw Data, Julian, Export H5 format, 1D histograms, ROC, ML training, ROC]
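One possible way to add the mid variables by batches is a resizable HDF5 dataset that grows along the jet axis, so that only one batch sits in RAM at a time. The sketch below uses h5py for this; the function name append_batches and the dataset name Mid_input are hypothetical, and this is not the existing transfer script.

```python
import h5py
import numpy as np


def append_batches(h5_path, batches, name="Mid_input", n_features=422):
    """Append [batch x 1 x n_features] arrays to one resizable dataset.

    `name` is a hypothetical dataset name. Only one batch is held in
    memory at a time. Returns the final dataset shape.
    """
    with h5py.File(h5_path, "a") as f:
        if name in f:
            dset = f[name]
        else:
            dset = f.create_dataset(
                name,
                shape=(0, 1, n_features),
                maxshape=(None, 1, n_features),  # unlimited along the jet axis
                dtype="float32",
                chunks=True,
            )
        for batch in batches:
            batch = np.asarray(batch, dtype="float32")
            start = dset.shape[0]
            dset.resize(start + batch.shape[0], axis=0)
            dset[start:] = batch
        return dset.shape
```

Here batches can be a generator that loads and stacks a few thousand per-jet .npy files at a time, and the function can be called again later to continue appending to the same file.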

10 Single b-tagging: Scripts
B-tagging variables
- Expert level 1D histogram
  Input: gjj_variables.hdf5
  Output: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv
- Plot 1D histogram (solo)
  Input: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv
  Output: One .png for each variable
- Plot 1D ROC curves for each variable (One Figure)
  Output: One .png for all ROC curves
Machine Learning
- Expert level ML (GRU Neural network; multivariate classifier)
  Output: tpr.csv, fpr.csv
- Plot ROC curve for ML
  Input: tpr.csv, fpr.csv
  Output: One .png for ROC
*collector.csv dimensions: [0] variables, [1] counts per bin or bin edges
tpr / fpr.csv dimensions: [0] size = bin count
For plotting, download (scp) the input files to your local machine and run the plot scripts there.
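The histogram and 1D ROC steps listed above can be sketched for a single variable as follows. This is an illustrative version rather than the repository scripts: the function names are hypothetical, and the ROC scan assumes signal tends toward higher values of the variable, which will not hold for every variable.

```python
import numpy as np


def histogram_pair(signal, background, n_bins=50):
    """Histogram signal and background values on shared bin edges."""
    lo = min(np.min(signal), np.min(background))
    hi = max(np.max(signal), np.max(background))
    edges = np.linspace(lo, hi, n_bins + 1)
    sig_counts, _ = np.histogram(signal, bins=edges)
    bg_counts, _ = np.histogram(background, bins=edges)
    return sig_counts, bg_counts, edges


def roc_from_histograms(sig_counts, bg_counts):
    """Scan a cut over the bins, keeping everything at or above the cut.

    Returns (tpr, fpr), one entry per bin.
    """
    sig_counts = np.asarray(sig_counts, dtype=float)
    bg_counts = np.asarray(bg_counts, dtype=float)
    # Reverse cumulative sum: entry i counts jets in bins i, i+1, ...
    tpr = np.cumsum(sig_counts[::-1])[::-1] / sig_counts.sum()
    fpr = np.cumsum(bg_counts[::-1])[::-1] / bg_counts.sum()
    return tpr, fpr
```

Using the same bin edges for signal and background is what makes the per-bin counts directly comparable; each returned array has one entry per bin, matching the tpr / fpr size noted above.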

11 Single b-tagging: ML Configurations
[Figure: network diagram with labels Output Layer, Current, Future, GRU layer, Input layer. High_input dimensions: [0] jets, [1] tracks, [2] variables]

12 H->bb Tagging: Data parsing
Data structure: My Readme, Dan Guest’s Readme
Completed (don’t have to re-do these):
- Parsing raw ‘cluster’ (calorimeter) variables: cluster_pt, cluster_eta, cluster_dphi_jet, cluster_energy
- Plotted variables of max energy cluster and sum of 3 max energy clusters
To be completed:
- Parsing all other variables
- Storing all variables in .h5
- ML for all variables
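The two plotted quantities mentioned above (the max energy cluster and the sum of the 3 max energy clusters) could be computed per jet roughly as below. This is one possible reading of the slide: the function name cluster_summaries is hypothetical, and it assumes each jet's cluster variables are available as aligned 1D arrays with at least one cluster.

```python
import numpy as np


def cluster_summaries(cluster_energy, cluster_values, n_top=3):
    """Summarize one jet's clusters ranked by energy.

    cluster_energy: 1D array of cluster energies for the jet
    cluster_values: 1D array of another cluster variable (e.g. cluster_pt),
                    aligned with cluster_energy
    Returns (value at the max-energy cluster, sum of the values over the
    n_top highest-energy clusters).
    """
    energy = np.asarray(cluster_energy, dtype=float)
    values = np.asarray(cluster_values, dtype=float)
    order = np.argsort(energy)[::-1]  # indices from highest to lowest energy
    top = order[:n_top]               # jets with fewer clusters use what exists
    return values[order[0]], values[top].sum()
```

Passing cluster_energy as both arguments gives the leading cluster energy and the summed energy of the top three clusters.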

13 H->bb tagging: Scripts (In Work)
B-tagging variables
- 1D histogram
  Input: signal.txt, background.txt
  Output: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv
- Plot 1D histogram (solo)
  Input: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv
  Output: One .png for each variable
*collector.csv dimensions: [0] variables, [1] counts per bin or bin edges
tpr / fpr.csv dimensions: [0] size = bin count

14 Next Steps
Single b-tagging:
- Store mid variables in .h5 format
- Get GRU network config from Julian
- Run ML using mid variables
- Run ML using expert + mid variables
H->bb tagging:
- Parse variables (except cluster variables)
- Run ML using cluster variables
- Run ML using track variables
- Run ML using a combination of all variables

15 Special Thanks!
Dr. Sam Meehan
Prof. Shih-Chieh Hsu

