Machine Learning for Data Certification at CMS

Machine Learning for Data Certification at CMS
by Humza Khan
Mentors: Dr. Nural Akchurin, Federico De Guio

Overview
- Compact Muon Solenoid (CMS)
- Data Certification
- Machine Learning Introduction
- Preprocessing
- Supervised Learning: Support Vector Machines, Boosted Decision Trees, Stochastic Gradient Descent
- Unsupervised Learning: One-class SVM, Isolation Forest, Autoencoders
- Further Steps

Compact Muon Solenoid (CMS)
- Major experiment, along with ATLAS
- LHC smashes particles together
- CMS is like a giant camera for collisions
- Multiple layers for tracking different particles

Data Certification
- LHC produces approximately 25 petabytes of data per year
- Not all of it is interesting for physics: "good" vs. "bad" data
- Preliminary filters sort out a lot of the data
- Still leaves a large amount of unclassified data
- This remainder needs to be manually checked by detector experts

Machine Learning Introduction
- Make computers learn a problem without being explicitly programmed
- Clever uses of statistics and computer science to analyze patterns in data
- Classification vs. Regression
  - Both assign numerical values to data samples
  - Regression assigns from a continuous set
  - Classification assigns from a discrete set
- Given 𝑛 data samples
  - Each has 𝑞 features (values that describe the object)
  - Each sample (usually) has a label

Machine Learning Introduction
- Split data into training and testing sets
- Train model with training set
- Feed testing set to model, then see how accurate predictions are
- Supervised learning has labels attached to samples
- Unsupervised learning does not have labels attached to samples
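The split/train/test workflow on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are two made-up Gaussian blobs rather than CMS data, and the "model" is a nearest-centroid classifier chosen for brevity, not one of the models used in the project. All function names are hypothetical.

```python
import random

def make_samples(n_per_class, rng):
    """Build a toy labelled data set: two Gaussian blobs in 2D (made-up data)."""
    samples = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            samples.append((features, label))
    return samples

def train_test_split(samples, test_fraction, rng):
    """Shuffle, then hold out a fraction of the samples for testing."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """'Training' here just means computing the mean feature vector of each class."""
    sums, counts = {}, {}
    for (x, y), label in train:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(centroids, features):
    """Assign the label of the closest class centroid."""
    x, y = features
    return min(centroids,
               key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)

rng = random.Random(0)
samples = make_samples(100, rng)
train, test = train_test_split(samples, 0.25, rng)
centroids = fit_centroids(train)  # the model only ever sees the training set
accuracy = sum(predict(centroids, f) == label for f, label in test) / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Because the labels are used during training, this is a supervised example; an unsupervised method would receive only the feature tuples.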

Preprocessing
- Feature scaling
- Feature selection
- Dimensionality reduction
- Given 43 features
  - Represent Pt, Eta, Phi, MetPt, MetPhi, Vertices, Cross Section
  - Mean, RMS, Q1, Q2, Q3, Q4, Q5
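As one concrete form of feature scaling, the sketch below standardizes each feature to zero mean and unit variance. The slides do not say which scaling was actually applied, so treat this as an assumed example; the sample values are invented and `standardize` is a hypothetical helper.

```python
from statistics import fmean, pstdev

def standardize(rows):
    """Rescale every feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

# Invented samples: two features on very different scales.
rows = [(20.0, 0.1), (35.0, -1.2), (50.0, 2.4), (65.0, 0.7)]
scaled = standardize(rows)
for row in scaled:
    print(row)
```

Scaling matters for the distance-based and gradient-based methods that follow, since a feature with a large numeric range would otherwise dominate the others.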

Support Vector Machines (SVM)
- Data with 𝑞 features lives in 𝑞-dimensional space
- Find a (𝑞−1)-dimensional hyperplane to divide the data
- Maximize the distance between the hyperplane and the nearest points (the margin)
- Not all data is linearly separable
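A minimal sketch of a linear SVM, assuming a soft-margin formulation trained by subgradient descent on the L2-regularized hinge loss. This is one common way to fit such a model, not necessarily the solver used in the project; the six 2D points, the learning rate `lr`, and the regularization strength `lam` are made-up values.

```python
def train_linear_svm(samples, labels, lr=0.01, lam=0.01, epochs=500):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified:
                # move the hyperplane away from it.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely outside the margin: only shrink w,
                # which corresponds to widening the margin.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data with labels +1 / -1.
samples = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
print("w =", w, "b =", b)
print([predict(w, b, x) for x in samples])
```

For data that is not linearly separable, the usual extension is a kernel that maps the features into a space where a separating hyperplane exists; that is outside the scope of this sketch.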

Stochastic Gradient Descent (SGD)
- Finds the direction of steepest descent (the negative gradient) at a point and moves that way
- Good at finding minima quickly
- Can get stuck at a local minimum instead of the global one
- SGD updates based on one sample at a time instead of the whole data set
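The one-sample-per-update idea can be sketched on a deliberately simple problem: fitting a straight line by least squares. The data, learning rate, and epoch count below are assumptions chosen so the example converges; the loss here is convex, so the local-minimum issue mentioned above does not arise in this toy case.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.1, epochs=300, seed=0):
    """Fit y ~ w*x + b by minimizing squared error, one sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to (w, b),
            # computed from this single sample only.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b

xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data on the line y = 2x + 1
w, b = sgd_linear_fit(xs, ys)
print(f"w={w:.4f} b={b:.4f}")
```

Ordinary (batch) gradient descent would instead average the gradient over all samples before each update, which is more expensive per step on large data sets.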

One-class SVM
- Novelty detection
- Train only on good data, test on both
- Useful when classes are extremely different in size
- Only have 5% background
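The train-on-good, test-on-both workflow can be illustrated with a much simpler stand-in than a one-class SVM. The sketch below is not a one-class SVM: it learns the centre of the good data and a distance threshold, then flags anything farther away as novel. The data, the 95% quantile, and all names are assumptions for illustration only.

```python
import random
from statistics import fmean

def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def fit_novelty_detector(good_samples, quantile=0.95):
    """Learn the centre of the good data and a radius covering `quantile` of it."""
    center = tuple(fmean(col) for col in zip(*good_samples))
    distances = sorted(distance(center, s) for s in good_samples)
    threshold = distances[int(quantile * (len(distances) - 1))]
    return center, threshold

def is_good(model, sample):
    """A sample counts as 'good' if it lies within the learned radius."""
    center, threshold = model
    return distance(center, sample) <= threshold

rng = random.Random(0)
good = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]  # majority class
bad = [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(10)]    # rare class

model = fit_novelty_detector(good)  # training uses good data only
flagged_bad = sum(not is_good(model, s) for s in bad)
accepted_good = sum(is_good(model, s) for s in good)
print(f"flagged {flagged_bad}/{len(bad)} bad, accepted {accepted_good}/{len(good)} good")
```

A one-class SVM plays the same role but learns a flexible boundary around the good data instead of a single sphere, which matters when the good data is not a compact blob.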

Autoencoders
- Neural network
  - Computational representation of human neurons
- Dimensionality reducer
  - Finds non-linear correlations within features
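In symbols, a single-hidden-layer autoencoder can be written as follows. This is a generic textbook form given as an assumption, since the slides do not specify the architecture; the symbols $W_e, b_e, W_d, b_d$, the activations $f, g$, and the code size $k$ are introduced here for illustration, while $n$ and $q$ are the sample and feature counts from the introduction.

```latex
\begin{align}
  z &= f(W_e x + b_e), & z &\in \mathbb{R}^{k},\ k < q
      && \text{(encoder: compress $q$ features to $k$)} \\
  \hat{x} &= g(W_d z + b_d), & \hat{x} &\in \mathbb{R}^{q}
      && \text{(decoder: reconstruct the input)} \\
  \mathcal{L} &= \frac{1}{n} \sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert^2
      &&&& \text{(reconstruction error to minimize)}
\end{align}
```

With a non-linear activation $f$, the $k$-dimensional code $z$ can capture non-linear relationships between features. Samples that reconstruct poorly (large $\lVert x - \hat{x} \rVert$) are natural candidates for anomalous data.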

Further Steps
- Inspect labels
- Further explore deep learning
- Image recognition

Sources
- http://cms.web.cern.ch/sites/cms.web.cern.ch/files/styles/large/public/field/image/0611042_01-A4-at-140001.jpg?itok=NaAYCj1Z
- http://cms.web.cern.ch/news/what-cms
- http://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html#sphx-glr-auto-examples-svm-plot-oneclass-py
- https://twiki.cern.ch/twiki/bin/viewauth/CMS/ML4DC
- https://upload.wikimedia.org/wikipedia/commons/f/f3/CART_tree_titanic_survivors.png
- http://nghiaho.com/wp-content/uploads/2012/12/autoencoder_network1.png

Thanks!
- Jean Kirsch, Steven Goldfarb, and Thomas Schwarz
- Lounsberry Foundation
- Federico de Guio
- Nural Akchurin
- Giovanni Franconi
- Filip Sîroky