Instructions to run scripts for 1D histograms and ML


Instructions to run scripts for 1D histograms and ML
Presented by Kaifu Lam, May 3, 2017

Background
Previously presented on Mar 1 (single b-tagging).
Goal: re-plot and understand the variables used in "Jet Flavor Classification in High-Energy Physics with Deep Neural Networks" (Sep 2016).
The dataset is produced by simulation. It models light- and heavy-flavor jets from pp collisions in the ATLAS detector.
It contains expert (high-level) variables and mid/low-level variables:
Expert: 2 + 14 variables (16 in total)
Mid/low: 2 + 28 variables × 15 tracks (2 + 28 × 15 = 422 in total)
Variable explanations here
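The sketch below shows how the 422 mid-level values can be split into a jet-level part and a per-track part. It assumes a hypothetical ordering, because the slides do not give one: the 2 jet-level values come first, then the 28 track variables for track 1, then track 2, and so on. The function name is also hypothetical.

```python
import numpy as np

N_JET_VARS = 2      # jet-level values at the start of the vector (assumed ordering)
N_TRACKS = 15       # tracks per jet
N_TRACK_VARS = 28   # variables per track

def split_mid_vector(mid):
    """Split one flattened mid-level vector (length 422) into
    (jet_vars, track_matrix), where track_matrix has shape (15, 28).

    The layout (jet variables first, then track-major) is an assumption.
    """
    mid = np.asarray(mid).ravel()
    expected = N_JET_VARS + N_TRACKS * N_TRACK_VARS  # 2 + 15 * 28 = 422
    if mid.size != expected:
        raise ValueError("expected %d values, got %d" % (expected, mid.size))
    jet_vars = mid[:N_JET_VARS]
    tracks = mid[N_JET_VARS:].reshape(N_TRACKS, N_TRACK_VARS)
    return jet_vars, tracks

# Example with a dummy [1 x 422] row, the same shape as one jet's "mid" file
jet_vars, tracks = split_mid_vector(np.arange(422).reshape(1, 422))
```

If the real ordering turns out to be variable-major, only the `reshape` (and possibly a transpose) needs to change.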

Expert variables and ROC

Python on Tev clusters
Todd's Python installation has everything you need:
Python 2.7
Required Python packages: numpy, h5py, theano, keras, matplotlib
Todd's Python executable: ~olsont/TEV/python/bin/python
Tips for setting up an alias:
Go to your home directory ($ cd ~)
Open your .cshrc file ($ nano .cshrc)
Add this line: alias pythonnew ~olsont/TEV/python/bin/python
Press Control+X, then Y, then Enter to save and exit nano
Log out of tev01 and log back in (or run $ source ~/.cshrc) so the alias takes effect
Now you can run Todd's Python by typing pythonnew
Feel free to change the alias name
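A quick way to check that the interpreter you are running (for example through the pythonnew alias) can see every required package is to try importing each one. This is a minimal sketch with a hypothetical function name. It relies only on the standard-library importlib module, which exists in both Python 2.7 and Python 3.

```python
import importlib

REQUIRED = ["numpy", "h5py", "theano", "keras", "matplotlib"]

def missing_packages(names):
    """Return the subset of `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # a broken install (not only an absent one) also counts as missing
            missing.append(name)
    return missing
```

Usage: run `print(missing_packages(REQUIRED))` under pythonnew. An empty list means everything imports.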

Text Editor Suggestion
If you use a Mac: Sam Meehan recommends TextWrangler.
The latest version has been rebranded as BBEdit.
Download it to your personal MacBook.
Use "Open from FTP/SFTP Server" and "Save to FTP/SFTP Server" to open and save files on the Tev cluster while editing locally on your MacBook.
TextWrangler has saved me so much time!

Script repositories and raw data locations
Scripts:
Single b-tagging (1D variables and ML)
H->bb tagging (1D calorimeter variables)
Raw data:
Single b-tagging, on tev01: /phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/dataset.json.gz
H->bb tagging, on tev01: /phys/groups/tev/scratch4/users/kaifulam/dguest/hbb/v1/ (files: signal.txt, background.txt)
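Because dataset.json.gz is large, it can help to look at its first few characters without decompressing the whole file to disk. Below is a minimal sketch using the standard-library gzip module; the function name is hypothetical. Text mode "rt" requires Python 3. Under the cluster's Python 2.7, open the file with "rb" instead and decode the bytes yourself.

```python
import gzip

def peek_gzip(path, n_chars=500):
    """Return the first n_chars characters of a gzip-compressed text file.

    Only the beginning of the compressed stream is decompressed, so this
    stays cheap even for very large files.
    """
    with gzip.open(path, "rt") as f:
        return f.read(n_chars)

# Example (path from the slide above):
# print(peek_gzip("/phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/dataset.json.gz"))
```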

Single b-tagging: Data transformation
Step 1: Transform the raw data with Julian's script (script here).
Transformed data (DON'T OPEN THIS FOLDER: it holds 30 million .npy files, 3 files for each of ~10 million jets):
/phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/julian/raw_data/saved_batches_test/
Example files for 10 jets are located here:
/phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/julian/raw_data/saved_batches/
Data structure: 3 files per jet (using the first jet as an example):
High variables [1 x 16]: clean_dijet_high_0.npy
Mid variables [1 x 422]: clean_dijet_mid_0.npy
Flavor [1]: clean_dijet_y_0.npy
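This minimal numpy sketch loads the three files for one jet. It assumes the naming pattern above continues with the jet index (clean_dijet_high_<i>.npy and so on), which is a guess based on the example listing. The function name is hypothetical. Point it at the saved_batches/ folder (10 example jets), not at the 30-million-file folder.

```python
import os
import numpy as np

def load_jet(folder, i):
    """Load (high [1x16], mid [1x422], flavor [1]) arrays for jet i.

    Assumes files named clean_dijet_{high,mid,y}_<i>.npy in `folder`.
    """
    high = np.load(os.path.join(folder, "clean_dijet_high_%d.npy" % i))
    mid = np.load(os.path.join(folder, "clean_dijet_mid_%d.npy" % i))
    flavor = np.load(os.path.join(folder, "clean_dijet_y_%d.npy" % i))
    return high, mid, flavor

# Example:
# folder = "/phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/julian/raw_data/saved_batches/"
# high, mid, flavor = load_jet(folder, 0)
```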

Single b-tagging: Data transformation
Step 2: Transfer the .npy files into one .h5 file (script here).
.h5 file location: /phys/groups/tev/scratch4/users/kaifulam/dguest/gjj-pheno/v1/julian/gjj_Variables.hdf5
Only the high variables and the flavor variable have been transferred to .h5:
High_input [10M x 1 x 16] (10 million jets)
Y_input [10M x 2] (logical array: column 1 = signal, column 2 = background)
This shortens the data reading time from 26 hours to 30 seconds.
Future to-do: transfer the mid-level variables to .h5.
High_input dimensions (diagram): [0] jets, [1] tracks, [2] variables
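The shapes above can be illustrated with a small numpy sketch. Per-jet high-variable rows are stacked into an [N x 1 x 16] array (jets, tracks, variables), and per-jet flavor labels become the two-column logical array. The sketch assumes, hypothetically, that the flavor label is 1 for signal and 0 for background, since the slides do not state the encoding. The function name is also hypothetical.

```python
import numpy as np

def build_inputs(high_rows, flavors):
    """Stack per-jet arrays into (High_input, Y_input).

    high_rows: sequence of [1 x 16] arrays, one per jet
    flavors:   sequence of labels, assumed 1 = signal, 0 = background
    Returns High_input with shape (N, 1, 16) and a boolean Y_input with
    shape (N, 2): first column = signal, second column = background.
    """
    high_input = np.stack([np.asarray(r, dtype=np.float32).reshape(1, 16)
                           for r in high_rows])
    is_signal = np.asarray(flavors).ravel() == 1
    y_input = np.column_stack([is_signal, ~is_signal])
    return high_input, y_input
```

Storing Y_input as two complementary columns, rather than one label, matches a two-unit softmax output layer.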

Single b-tagging: Data transformation
Completed (you don't have to redo these):
Step 1 for all variables
Step 2 for the high variables and the flavor variable
To be completed:
Step 2 for the mid variables. tev01 does not have enough RAM to add all the mid variables in one pass, so they have to be added to the .h5 file in batches.
Workflow (diagram): Raw data -> Julian's script -> export to H5 format, which feeds 1D histograms -> ROC and ML training -> ROC
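For scale: 10 million jets × 422 values is about 4.2 billion numbers. That is roughly 17 GB as float32 or 34 GB as float64, so a single pass does not fit in memory. The minimal h5py sketch below writes batches into a resizable dataset, keeping only one batch in memory at a time. The dataset name "Mid_input", the chunk size, and the function name are hypothetical. The growth itself uses the h5py features `maxshape=(None, ...)` and `Dataset.resize`.

```python
import h5py
import numpy as np

N_MID = 422  # mid-level values per jet

def append_batches(h5_path, batches, name="Mid_input"):
    """Append 2D arrays of shape (jets_in_batch, 422) to a growable dataset.

    Returns the total number of jets stored after appending.
    """
    with h5py.File(h5_path, "a") as f:  # "a": create the file or open it for writing
        if name not in f:
            # maxshape=(None, 422) makes the jet axis unlimited
            f.create_dataset(name, shape=(0, N_MID), maxshape=(None, N_MID),
                             dtype="float32", chunks=(1024, N_MID))
        dset = f[name]
        for batch in batches:
            batch = np.asarray(batch, dtype=np.float32).reshape(-1, N_MID)
            start = dset.shape[0]
            dset.resize(start + batch.shape[0], axis=0)
            dset[start:] = batch
        return dset.shape[0]

# Example with a hypothetical loader yielding one batch at a time:
# append_batches("gjj_mid.hdf5", (load_mid_batch(k) for k in range(n_batches)))
```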

Single b-tagging: Scripts
B-tagging variables (expert level):
1D histogram. Input: gjj_variables.hdf5. Output: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv
Plot 1D histogram (solo). Input: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv. Output: one .png for each variable
Plot 1D ROC curves for each variable (one figure). Output: one .png with all ROC curves
Machine learning (expert level):
ML with a GRU neural network (multivariate classifier). Output: tpr.csv, fpr.csv
Plot ROC curve for ML. Input: tpr.csv, fpr.csv. Output: one .png for the ROC curve
File dimensions: *collector.csv: [0] variables, [1] counts per bin or bin edges; tpr/fpr.csv: [0] size = bin count
For plotting, download the input files (scp) and run the plot scripts on your local machine.
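One common way to turn the signal and background histograms of a single variable into a ROC curve is to scan a cut over the bins. For a cut at bin i, the true positive rate is the fraction of signal counts in bins i and above, and the false positive rate is the same fraction for background. The numpy sketch below follows that approach. It is not necessarily what the plotting script does. It assumes both histograms share the same bin edges (as bin_collector.csv suggests) and that signal tends toward higher values; reverse the inputs for the opposite case. The output has one entry per bin, matching "size = bin count". Function names are hypothetical.

```python
import numpy as np

def roc_from_hist(sig_counts, bg_counts):
    """Return (tpr, fpr) for cuts 'variable >= lower edge of bin i'.

    Both inputs are per-bin counts over identical bin edges.
    """
    sig = np.asarray(sig_counts, dtype=float)
    bg = np.asarray(bg_counts, dtype=float)
    # reverse cumulative sum: counts in bin i and all bins above it
    sig_above = np.cumsum(sig[::-1])[::-1]
    bg_above = np.cumsum(bg[::-1])[::-1]
    return sig_above / sig.sum(), bg_above / bg.sum()

def write_roc(tpr, fpr, tpr_path="tpr.csv", fpr_path="fpr.csv"):
    """Save tpr and fpr as 1D files, one value per row."""
    np.savetxt(tpr_path, tpr, delimiter=",")
    np.savetxt(fpr_path, fpr, delimiter=",")

# Example: a variable where signal sits at high values
tpr, fpr = roc_from_hist([0, 1, 3], [3, 1, 0])
# tpr -> [1.0, 1.0, 0.75], fpr -> [1.0, 0.25, 0.0]
```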

Single b-tagging: ML Configurations
Network diagram (Current and Future configurations): Input layer -> GRU layer -> Output layer
High_input dimensions: [0] jets, [1] tracks, [2] variables
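To show what the GRU layer computes along the tracks axis, here is a minimal numpy sketch of a GRU forward pass followed by a 2-unit softmax output layer (one unit per Y_input column). It is not the Keras model used here; that configuration is still to be obtained from Julian. The weights are random, the 8 hidden units are arbitrary, and the gate equations follow one common GRU formulation (reset gate applied before the recurrent product). All function names are hypothetical. It uses `np.dot` rather than `@` so it also runs under Python 2.7.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_vars, units, n_classes=2, seed=0):
    """Random weights: GRU blocks (update, reset, candidate) plus a dense output layer."""
    rng = np.random.RandomState(seed)
    return {
        "W": rng.normal(0.0, 0.1, size=(n_vars, 3 * units)),  # input weights
        "U": rng.normal(0.0, 0.1, size=(units, 3 * units)),   # recurrent weights
        "b": np.zeros(3 * units),
        "W_out": rng.normal(0.0, 0.1, size=(units, n_classes)),
        "b_out": np.zeros(n_classes),
    }

def gru_forward(x, p):
    """Run the GRU over axis 1 (tracks) of x, shape (jets, tracks, variables).

    Returns the last hidden state, shape (jets, units).
    """
    units = p["U"].shape[0]
    U_z, U_r, U_h = np.split(p["U"], 3, axis=1)
    h = np.zeros((x.shape[0], units))
    for t in range(x.shape[1]):
        x_z, x_r, x_h = np.split(np.dot(x[:, t, :], p["W"]) + p["b"], 3, axis=1)
        z = sigmoid(x_z + np.dot(h, U_z))            # update gate
        r = sigmoid(x_r + np.dot(h, U_r))            # reset gate
        h_cand = np.tanh(x_h + np.dot(r * h, U_h))   # candidate state
        h = z * h + (1.0 - z) * h_cand               # blend old and candidate state
    return h

def predict(x, p):
    """Class probabilities, shape (jets, n_classes); columns follow Y_input."""
    return softmax(np.dot(gru_forward(x, p), p["W_out"]) + p["b_out"])

# Current: high variables, a single "track" step per jet -> x shape (jets, 1, 16)
probs = predict(np.random.RandomState(1).normal(size=(4, 1, 16)), init_params(16, 8))
# Future (assumed): per-track mid variables -> x shape (jets, 15, 28), init_params(28, 8)
```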

H->bb Tagging: Data parsing
Data structure: see My Readme and Dan Guest's Readme
Completed (you don't have to redo these):
Parsing the raw 'cluster' (calorimeter) variables: cluster_pt, cluster_eta, cluster_dphi_jet, cluster_energy
Plotting these variables for the maximum-energy cluster and for the sum of the 3 highest-energy clusters
To be completed:
Parsing all other variables
Storing all variables in .h5
ML for all variables
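The two derived quantities above (the maximum-energy cluster, and the sum over the 3 highest-energy clusters) can be computed per jet with numpy. The slides do not say exactly which variable was summed. This sketch ranks clusters by cluster_energy and sums whichever per-cluster variable is passed in, such as cluster_energy or cluster_pt. Jets with fewer than 3 clusters sum the clusters they have. The function name is hypothetical.

```python
import numpy as np

def leading_and_top3(cluster_energy, cluster_values=None):
    """For one jet, return (value of the max-energy cluster,
    sum of the value over the 3 highest-energy clusters).

    cluster_energy: per-cluster energies, used to rank clusters (needs >= 1 cluster)
    cluster_values: per-cluster variable to report (defaults to the energies)
    """
    energy = np.asarray(cluster_energy, dtype=float)
    values = energy if cluster_values is None else np.asarray(cluster_values, dtype=float)
    order = np.argsort(energy)[::-1]  # cluster indices, highest energy first
    return values[order[0]], values[order[:3]].sum()

# One jet with 4 clusters: energies, then cluster_pt ranked by energy
lead_e, top3_e = leading_and_top3([1.0, 5.0, 3.0, 2.0])
lead_pt, top3_pt = leading_and_top3([1.0, 5.0, 3.0, 2.0], [10.0, 50.0, 30.0, 20.0])
```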

In Work: H->bb tagging: Scripts
B-tagging variables:
1D histogram. Input: signal.txt, background.txt. Output: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv
Plot 1D histogram (solo). Input: histo_sig_collector.csv, histo_bg_collector.csv, bin_collector.csv. Output: one .png for each variable
File dimensions: *collector.csv: [0] variables, [1] counts per bin or bin edges; tpr/fpr.csv: [0] size = bin count
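Below is a minimal numpy sketch of the 1D histogram step. It writes the three collector files in the layout above: one row per variable, holding either the counts per bin or the bin edges. It assumes, hypothetically, that signal.txt and background.txt are whitespace-delimited with one row per jet and one column per variable; check the Readmes for the real format. Signal and background share bin edges so their histograms are directly comparable, which the ROC step needs. The function name and the default of 50 bins are hypothetical choices.

```python
import os
import numpy as np

def make_collectors(sig_path, bg_path, n_bins=50, out_dir="."):
    """Histogram every variable (column) of the signal and background files.

    Writes histo_sig_collector.csv, histo_bg_collector.csv and bin_collector.csv:
    row k holds the counts (or the n_bins + 1 bin edges) for variable k.
    """
    sig = np.loadtxt(sig_path, ndmin=2)   # shape (n_jets, n_vars), assumed layout
    bg = np.loadtxt(bg_path, ndmin=2)
    sig_rows, bg_rows, edge_rows = [], [], []
    for k in range(sig.shape[1]):
        lo = min(sig[:, k].min(), bg[:, k].min())
        hi = max(sig[:, k].max(), bg[:, k].max())
        if hi == lo:
            hi = lo + 1.0                     # avoid zero-width bins for constant columns
        edges = np.linspace(lo, hi, n_bins + 1)  # same edges for signal and background
        sig_rows.append(np.histogram(sig[:, k], bins=edges)[0])
        bg_rows.append(np.histogram(bg[:, k], bins=edges)[0])
        edge_rows.append(edges)
    sig_rows, bg_rows, edge_rows = np.array(sig_rows), np.array(bg_rows), np.array(edge_rows)
    np.savetxt(os.path.join(out_dir, "histo_sig_collector.csv"), sig_rows, delimiter=",")
    np.savetxt(os.path.join(out_dir, "histo_bg_collector.csv"), bg_rows, delimiter=",")
    np.savetxt(os.path.join(out_dir, "bin_collector.csv"), edge_rows, delimiter=",")
    return sig_rows, bg_rows, edge_rows
```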

Next Steps
Single b-tagging:
Store mid variables in .h5 format
Get the GRU network configuration from Julian
Run ML using mid variables
Run ML using expert + mid variables
H->bb tagging:
Parse variables (other than the cluster variables)
Run ML using cluster variables
Run ML using track variables
Run ML using a combination of all variables

Special Thanks!
Dr. Sam Meehan
Prof. Shih-Chieh Hsu