Machine Learning for Data Certification at CMS

Presentation transcript:

Machine Learning for Data Certification at CMS
by Humza Khan
Mentors: Dr. Nural Akchurin, Federico De Guio

Overview
Compact Muon Solenoid (CMS)
Data Certification
Machine Learning Introduction
Preprocessing
Supervised Learning: Support Vector Machines, Boosted Decision Trees, Stochastic Gradient Descent
Unsupervised Learning: One-class SVM, Isolation Forest, Autoencoders
Further Steps

Compact Muon Solenoid (CMS)
Major experiment at the LHC, along with ATLAS
The LHC smashes particles together
CMS is like a giant camera for collisions
Multiple layers for tracking different particles

Compact Muon Solenoid (CMS) continued

Data Certification
The LHC produces approximately 25 petabytes of data per year
Not all of it is interesting for physics: “good” vs. “bad” data
Preliminary filters sort out a lot of the data
This still leaves a large amount of unclassified data
That data needs to be manually checked by detector experts

Machine Learning Introduction
Make computers learn to solve a problem without being explicitly programmed
Clever uses of statistics and computer science to analyze patterns in data
Classification vs. regression
  Both assign numerical values to data samples
  Regression assigns values from a continuous set
  Classification assigns values from a discrete set
Given 𝑛 data samples
  Each has 𝑞 features (values that describe the object)
  Each sample (usually) has a label
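The difference between the two tasks can be sketched with scikit-learn on synthetic data. Everything here is an illustrative assumption (the random data, the choice of linear models, the variable names); it is not the CMS dataset or the method used in the talk.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

n, q = 200, 3                    # n samples, q features per sample
X = rng.normal(size=(n, q))      # feature matrix, one row per sample

# Regression target: a continuous value for each sample.
y_continuous = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Classification target: a discrete label (0 or 1) for each sample.
y_discrete = (y_continuous > 0).astype(int)

regressor = LinearRegression().fit(X, y_continuous)
classifier = LogisticRegression().fit(X, y_discrete)

reg_pred = regressor.predict(X[:5])     # real-valued outputs
clf_pred = classifier.predict(X[:5])    # outputs drawn from {0, 1}
```

Both models take the same 𝑛 × 𝑞 feature matrix; only the kind of label they are trained on differs.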

Machine Learning Introduction (continued)
Split the data into training and testing sets
Train the model on the training set
Feed the testing set to the model, then see how accurate its predictions are
Supervised learning: samples have labels attached
Unsupervised learning: samples do not have labels attached
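The split, train, and test workflow above might look like this in scikit-learn. This is a minimal sketch on a synthetic labeled dataset; the 75/25 split, the logistic regression model, and all names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic labeled data standing in for real samples (assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out 25% of the samples for testing; stratify keeps class ratios equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train on the training set only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on samples the model has never seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```

Scoring on held-out samples gives an estimate of how the model behaves on new data, which accuracy on the training set cannot provide.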

Preprocessing
Feature scaling
Feature selection
Dimensionality reduction
Given 43 features
  Represent Pt, Eta, Phi, MetPt, MetPhi, Vertices, Cross Section
  Mean, RMS, Q1, Q2, Q3, Q4, Q5
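The three preprocessing steps can be chained in a scikit-learn pipeline. The sketch below uses random data with 43 columns to mirror the feature count on the slide. The specific choices (standard scaling, univariate selection of 20 features, PCA down to 5 components) are assumptions for illustration, not the settings used in the talk.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples, n_features = 500, 43

# Random features on very different scales, plus random binary labels.
X = rng.normal(size=(n_samples, n_features)) * rng.uniform(0.1, 100.0, size=n_features)
y = rng.integers(0, 2, size=n_samples)

# Feature scaling alone: every column gets zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Scaling, then feature selection, then dimensionality reduction.
preprocess = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=20),   # keep the 20 features most related to y
    PCA(n_components=5),            # project them onto 5 principal components
)
X_reduced = preprocess.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)
```

Scaling comes first because both distance-based models and PCA are sensitive to the units of each feature.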

Preprocessing

Support Vector Machines (SVM)
Data with 𝑛 features lives in 𝑛-dimensional space
Find an (𝑛−1)-dimensional hyperplane that divides the data into classes
Maximize the margin, the distance between the hyperplane and the nearest points
Not all data is linearly separable
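The last point is where kernels come in. As a hedged illustration, the sketch below trains a linear SVM and an RBF-kernel SVM on two concentric rings, a synthetic dataset that no straight line can separate. The dataset and parameters are assumptions chosen to make the contrast visible.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel looks for a separating straight line (hyperplane in 2-D).
linear_svm = SVC(kernel="linear").fit(X_train, y_train)

# An RBF kernel implicitly maps the data to a space where it is separable.
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

linear_acc = linear_svm.score(X_test, y_test)
rbf_acc = rbf_svm.score(X_test, y_test)
print(f"linear: {linear_acc:.2f}, rbf: {rbf_acc:.2f}")
```

The linear model performs close to chance on this data, while the kernel model separates the rings almost perfectly.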

Support Vector Machines (SVM)

Stochastic Gradient Descent (SGD)
Computes the gradient of the loss at the current point and steps in the opposite direction (steepest descent)
Good at finding minima quickly
Can get stuck at a local minimum instead of the global one
SGD updates on one sample at a time instead of on the whole dataset
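The update rule can be written out in a few lines of NumPy. This is a minimal sketch for a least-squares linear model on synthetic data; the learning rate, epoch count, and true weights are assumptions picked for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from known weights plus a little noise.
X = rng.normal(size=(500, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + rng.normal(scale=0.1, size=500)

w = np.zeros(2)          # start from an arbitrary point
learning_rate = 0.01

for epoch in range(20):
    # Visit the samples in a fresh random order each epoch.
    for i in rng.permutation(len(X)):
        error = X[i] @ w - y[i]
        gradient = 2.0 * error * X[i]    # gradient of (x . w - y)^2 for ONE sample
        w -= learning_rate * gradient    # step against the gradient

print("learned weights:", w)
```

Each update uses a single sample, so a step is cheap but noisy; batch gradient descent would average the gradient over all 500 samples before moving.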

Stochastic Gradient Descent (SGD)

One-class SVM
Novelty detection
Train only on good data, test on both good and bad data
Useful when the classes are extremely different in size
Only 5% of the data is background
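A sketch of this setup with scikit-learn's `OneClassSVM` follows. The data is synthetic: "good" samples are drawn around the origin and a 5% share of "bad" samples is placed far away. That layout, the `nu` value, and the names are all assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Training set: good data only.
good_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Test set: 95 good samples and 5 bad ones (5% of the test data).
good_test = rng.normal(loc=0.0, scale=1.0, size=(95, 2))
bad_test = rng.normal(loc=8.0, scale=1.0, size=(5, 2))
X_test = np.vstack([good_test, bad_test])

# nu is an upper bound on the fraction of training samples treated as outliers.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(good_train)

pred = detector.predict(X_test)   # +1 = looks like good data, -1 = novelty
print("flagged as bad:", int((pred == -1).sum()), "of", len(pred))
```

Because the model never sees bad data during training, it does not need a balanced set of examples from both classes.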

One-class SVM

Autoencoders
Neural network
  Computational model inspired by human neurons
Dimensionality reducer
  Finds non-linear correlations within features
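One way to sketch the idea without a deep learning framework is to train scikit-learn's `MLPRegressor` to reproduce its own input through a narrow middle layer. This is a stand-in chosen for brevity, not necessarily the tool used in the talk, and the synthetic data, layer sizes, and names are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Good samples: 6 features driven non-linearly by 2 hidden factors.
t = rng.uniform(-1.0, 1.0, size=(1000, 2))
good = np.column_stack([
    t[:, 0], t[:, 1], t[:, 0] ** 2, t[:, 1] ** 2,
    t[:, 0] * t[:, 1], np.sin(3 * t[:, 0]),
])
good += rng.normal(scale=0.01, size=good.shape)

# Bad samples: unstructured points that ignore those correlations.
bad = rng.normal(scale=5.0, size=(20, 6))

scaler = StandardScaler().fit(good)
good_scaled = scaler.transform(good)
bad_scaled = scaler.transform(bad)

# Input and target are the same; the 2-unit layer is the bottleneck.
autoencoder = MLPRegressor(
    hidden_layer_sizes=(16, 2, 16), activation="tanh",
    max_iter=500, random_state=0,
)
autoencoder.fit(good_scaled, good_scaled)

def reconstruction_error(model, X):
    """Mean squared difference between each sample and its reconstruction."""
    return np.mean((model.predict(X) - X) ** 2, axis=1)

good_err = reconstruction_error(autoencoder, good_scaled)
bad_err = reconstruction_error(autoencoder, bad_scaled)
print(f"mean error, good: {good_err.mean():.3f}, bad: {bad_err.mean():.3f}")
```

Samples that do not follow the correlations learned from good data reconstruct poorly, so the reconstruction error can serve as an anomaly score.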

Further Steps
Inspect labels
Further explore deep learning
Image recognition

Sources
http://cms.web.cern.ch/sites/cms.web.cern.ch/files/styles/large/public/field/image/0611042_01-A4-at-140001.jpg?itok=NaAYCj1Z
http://cms.web.cern.ch/news/what-cms
http://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html#sphx-glr-auto-examples-svm-plot-oneclass-py
https://twiki.cern.ch/twiki/bin/viewauth/CMS/ML4DC
https://upload.wikimedia.org/wikipedia/commons/f/f3/CART_tree_titanic_survivors.png
http://nghiaho.com/wp-content/uploads/2012/12/autoencoder_network1.png

Thanks!
Jean Kirsch, Steven Goldfarb, and Thomas Schwarz
Lounsberry Foundation
Federico de Guio
Nural Akchurin
Giovanni Franconi
Filip Sîroky