1
Machine Learning for Data Certification at CMS
by Humza Khan
Mentors: Dr. Nural Akchurin, Federico De Guio
2
Overview
Compact Muon Solenoid (CMS)
Data Certification
Machine Learning Introduction
Preprocessing
Supervised Learning: Support Vector Machines, Boosted Decision Trees, Stochastic Gradient Descent
Unsupervised Learning: One-class SVM, Isolation Forest, Autoencoders
Further Steps
3
Compact Muon Solenoid (CMS)
Major experiment at the LHC, along with ATLAS
The LHC smashes particles together
CMS is like a giant camera for the collisions
Multiple detector layers for tracking different particles
4
Compact Muon Solenoid (CMS) continued
5
Data Certification
The LHC produces approximately 25 petabytes of data per year
Not all of it is interesting for physics: “good” vs. “bad” data
Preliminary filters sort out a lot of the data
This still leaves a large amount of unclassified data
It needs to be manually checked by detector experts
6
Machine Learning Introduction
Make computers learn to solve a problem without being explicitly programmed
Clever uses of statistics and computer science to analyze patterns in data
Classification vs. regression: both assign an output value to each data sample
Regression assigns values from a continuous set
Classification assigns values from a discrete set (classes)
Given 𝑛 data samples, each with 𝑞 features (values that describe the sample)
Each sample (usually) has a label
7
Machine Learning Introduction
Split the data into training and testing sets
Train the model on the training set
Feed the testing set to the model, then measure how accurate its predictions are
Supervised learning: samples have labels attached
Unsupervised learning: samples have no labels attached
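A minimal sketch of the split, train, and evaluate loop in plain Python, for illustration only. `train_test_split` and `accuracy` are hypothetical helpers written here (scikit-learn ships a real `sklearn.model_selection.train_test_split` for the same job), the 25% test fraction is an arbitrary choice, and the one-feature "model" that places a threshold between the two classes is a toy stand-in for a real classifier.

```python
import random


def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle indices, then hold out a fraction for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return (pick(train_idx, samples), pick(train_idx, labels),
            pick(test_idx, samples), pick(test_idx, labels))


def accuracy(predicted, true):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


# Toy data (assumption): one feature per sample, label 1 ("good") if it is >= 50.
samples = [[float(i)] for i in range(100)]
labels = [1 if s[0] >= 50 else 0 for s in samples]
X_train, y_train, X_test, y_test = train_test_split(samples, labels)

# Toy "training": put a threshold midway between the classes seen in training.
max_bad = max(x[0] for x, y in zip(X_train, y_train) if y == 0)
min_good = min(x[0] for x, y in zip(X_train, y_train) if y == 1)
threshold = (max_bad + min_good) / 2

# Evaluate only on samples the model never saw.
predictions = [1 if x[0] >= threshold else 0 for x in X_test]
print(len(X_train), len(X_test), accuracy(predictions, y_test))
```

Because the labels are used both to fit the threshold and to score it, this is supervised learning; an unsupervised method would only see `samples`.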
8
Preprocessing
Feature scaling
Feature selection
Dimensionality reduction
Given 43 features, representing Pt, Eta, Phi, MetPt, MetPhi, Vertices, and Cross Section
Summary statistics: Mean, RMS, Q1, Q2, Q3, Q4, Q5
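Feature scaling matters because features such as Pt and Eta live on very different numeric ranges. Below is a minimal sketch of one common choice, standardization (z-scores), assuming each sample is a plain list of feature values; the helper names and the toy numbers are hypothetical. The mean and standard deviation are computed on the training rows only and then reused for the test rows, so no information leaks from the test set.

```python
import statistics


def fit_standardizer(rows):
    """Compute per-feature mean and standard deviation from training rows."""
    cols = list(zip(*rows))  # transpose: one tuple per feature
    means = [statistics.fmean(c) for c in cols]
    # A constant feature has std 0; use 1.0 so it is only centered, not divided by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def standardize(rows, means, stds):
    """Shift each feature to zero mean and scale it to unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy rows: [Pt-like value, Eta-like value] per sample.
train = [[50.0, 0.1], [70.0, -0.3], [60.0, 0.2], [80.0, 0.0]]
test = [[65.0, 0.1]]

means, stds = fit_standardizer(train)        # fit on training data only
train_scaled = standardize(train, means, stds)
test_scaled = standardize(test, means, stds)  # reuse the training statistics
print(train_scaled)
print(test_scaled)
```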
9
Preprocessing
10
Support Vector Machines (SVM)
Data with 𝑞 features lies in 𝑞-dimensional space
Find a (𝑞−1)-dimensional hyperplane that divides the classes
Maximize the distance (margin) between the hyperplane and the nearest points of each class
Not all data is linearly separable
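The max-margin idea can be sketched for a linear SVM. This is a hedged illustration, not the study's implementation: it trains a soft-margin linear SVM with the Pegasos algorithm (stochastic sub-gradient descent on the regularized hinge loss), absorbs the bias by appending a constant 1 to each sample, and uses labels in {−1, +1}. The function names, the regularization strength `lam`, and the toy points are assumptions. Data that no hyperplane separates is usually handled with a soft margin (as here) or with a kernel that maps it into a higher-dimensional space.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the soft-margin SVM objective
        lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)).
    A constant 1 is appended to each sample, so the last weight acts as the
    bias (note it is regularized too in this simple variant).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization term
            if margin < 1.0:  # sample is inside the margin or misclassified
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    """Report which side of the hyperplane w.x + b = 0 the sample falls on."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```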
11
Support Vector Machines (SVM)
12
Stochastic Gradient Descent (SGD)
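Computes the gradient (the direction of steepest increase) at the current point and steps in the opposite direction
Good at finding minima quickly
Can get stuck in a local minimum instead of the global minimum
SGD updates using the gradient from one sample at a time instead of all samples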
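To make the "one sample at a time" point concrete, here is a minimal SGD sketch that fits a straight line y ≈ a·x + b by minimizing squared error. The model, the learning rate `lr`, the epoch count, and the noise-free toy data are assumptions chosen for illustration. This toy loss is convex, so there is no local minimum to get stuck in; that caveat applies to non-convex models such as neural networks.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.1, epochs=500, seed=0):
    """Fit y ~ a*x + b with stochastic gradient descent on squared error.

    Each update uses the gradient from a single sample, not the full dataset.
    """
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            err = (a * xs[i] + b) - ys[i]  # residual for this one sample
            # Gradient of 0.5 * err**2 is (err * x, err); step *against* it.
            a -= lr * err * xs[i]
            b -= lr * err
    return a, b


xs = [i / 19 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data on the line y = 2x + 1
a, b = sgd_linear_regression(xs, ys)
print(round(a, 3), round(b, 3))
```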
13
Stochastic Gradient Descent (SGD)
14
One-class SVM: Novelty detection
Train only on good data, then test on both good and bad data
Useful when the classes are extremely different in size
Only 5% of the data is background
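The novelty-detection workflow (fit on good data only, then score both good and bad samples) can be sketched without an SVM. A real One-class SVM needs a quadratic-programming solver (scikit-learn provides `sklearn.svm.OneClassSVM`). As a deliberately simpler stand-in, this sketch learns per-feature means and standard deviations from good samples and flags any sample with a feature more than `threshold` standard deviations away. The helper names, the threshold of 3, and the toy values are hypothetical.

```python
import statistics


def fit_good_only(good_rows):
    """Learn per-feature mean and spread from 'good' samples only (no bad labels needed)."""
    cols = list(zip(*good_rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant features
    return means, stds


def is_novel(row, means, stds, threshold=3.0):
    """Flag a sample if any feature is more than `threshold` std devs from the good-data mean."""
    return any(abs(v - m) / s > threshold for v, m, s in zip(row, means, stds))


# Hypothetical good training data clustered around (0, 0).
good_train = [[0.1, -0.2], [-0.1, 0.1], [0.0, 0.0], [0.2, 0.1], [-0.2, 0.0]]
means, stds = fit_good_only(good_train)

# Test on both kinds of sample: one resembling the training data, one that does not.
test_rows = [[0.05, 0.0], [5.0, 0.0]]
print([is_novel(r, means, stds) for r in test_rows])  # [False, True]
```

Only the rare class needs no training examples at all here, which is why this setup suits data where one class makes up a small fraction.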
15
One-class SVM
16
Autoencoders
Neural network: a computational model loosely inspired by biological neurons
Dimensionality reducer
Finds non-linear correlations within features
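The bottleneck idea can be shown with a toy autoencoder written from scratch: a tanh encoder squeezes 2 input features into 1 hidden value, and a linear decoder tries to rebuild the input, trained by full-batch gradient descent on the mean squared reconstruction error. Everything here (layer sizes, learning rate, epochs, and training data lying on a line) is an assumption. With a linear decoder this toy can only learn a line-shaped summary; a real autoencoder for the 43 features would stack more non-linear layers and would normally be built with a deep-learning library. Reconstruction error is one common way to use an autoencoder for anomaly detection: samples unlike the training data reconstruct poorly.

```python
import math
import random


class TinyAutoencoder:
    """Toy autoencoder: n_in inputs -> 1 tanh bottleneck unit -> n_in linear outputs."""

    def __init__(self, n_in=2, seed=0):
        rng = random.Random(seed)
        self.w_enc = [rng.uniform(-0.5, 0.5) for _ in range(n_in)]
        self.b_enc = 0.0
        self.w_dec = [rng.uniform(-0.5, 0.5) for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        """Compress a sample to a single hidden value."""
        return math.tanh(sum(w * v for w, v in zip(self.w_enc, x)) + self.b_enc)

    def decode(self, h):
        """Map the hidden value back to input space."""
        return [w * h + b for w, b in zip(self.w_dec, self.b_dec)]

    def error(self, x):
        """Squared reconstruction error for one sample (large = unlike training data)."""
        return sum((r - v) ** 2 for r, v in zip(self.decode(self.encode(x)), x))

    def loss(self, data):
        return sum(self.error(x) for x in data) / len(data)

    def train(self, data, lr=0.05, epochs=2000):
        """Full-batch gradient descent with hand-written backpropagation."""
        n, k = len(data), len(self.w_enc)
        for _ in range(epochs):
            g_we, g_be = [0.0] * k, 0.0
            g_wd, g_bd = [0.0] * k, [0.0] * k
            for x in data:
                h = self.encode(x)
                out = self.decode(h)
                d_out = [2.0 * (o - v) / n for o, v in zip(out, x)]  # dLoss/d(output)
                # Back through the decoder weights, then through tanh (derivative 1 - h^2).
                d_pre = sum(d * w for d, w in zip(d_out, self.w_dec)) * (1.0 - h * h)
                for j in range(k):
                    g_wd[j] += d_out[j] * h
                    g_bd[j] += d_out[j]
                    g_we[j] += d_pre * x[j]
                g_be += d_pre
            self.w_enc = [w - lr * g for w, g in zip(self.w_enc, g_we)]
            self.b_enc -= lr * g_be
            self.w_dec = [w - lr * g for w, g in zip(self.w_dec, g_wd)]
            self.b_dec = [b - lr * g for b, g in zip(self.b_dec, g_bd)]


# Hypothetical "good" training data: two correlated features, x2 = 0.5 * x1.
data = [[t / 10, 0.5 * t / 10] for t in range(-10, 11)]
ae = TinyAutoencoder()
before = ae.loss(data)
ae.train(data)
print(before, ae.loss(data))
print(ae.error([0.8, 0.4]), ae.error([1.0, -1.0]))  # on-pattern vs. off-pattern sample
```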
17
Further Steps
Inspect labels
Further explore deep learning
Image recognition
20
Thanks!
Jean Kirsch, Steven Goldfarb, and Thomas Schwarz
Lounsberry Foundation
Federico de Guio, Nural Akchurin, Giovanni Franconi, Filip Sîroky