Supervised Normalized Cut for Detecting, Classifying and Identifying Special Nuclear Materials. Yan T. Yang, Barak Fishbain, Dorit S. Hochbaum, Eric B. Norman.


Supervised Normalized Cut for Detecting, Classifying and Identifying Special Nuclear Materials. Yan T. Yang, Barak Fishbain, Dorit S. Hochbaum, Eric B. Norman, Erik Swanberg.

Agenda: Background, Data Generation, Data Processing, Methodology, Results, Conclusion.

Background: Application The nuclear terrorism detection task faces two challenges: the physical limitations of detectors (high-purity germanium gamma detectors versus sodium iodide scintillators) and background noise. Optimization can greatly increase detection capabilities.

Our Problem A detector produces coarse detection data for a suspected Special Nuclear Material (SNM). Determine: 1. Is it radioactive? 2. What kind?

Our Problem Use training data to develop a black box: a machine learning classification problem. (Diagram: training data go into a black box, which then labels unknown data as one class or the other.)

Background: Technique Machine learning techniques: supervised learning (classification) and unsupervised learning (clustering). (Illustrations over several slides: a human labels examples as good or evil; a machine trained on labeled data plays the same role, learning a function F(x) from training data and applying it to unknown data.)

Agenda: Background, Data Generation, Data Processing, Methodology, Results, Conclusion.

Data Generation Sodium iodide (NaI) detector: low resolution, hence noisy data, but rugged and therefore more amenable to practical use. Types of detection: 1. Passive interrogation: direct acquisition. 2. Active interrogation: irradiation.

Materials present in the simulation: Non-radioactive: latite (an igneous rock) and background/blank. Radioactive: uranium (235U) and plutonium (239Pu).

Data Generation (E. Swanberg, E.B. Norman et al.) Irradiate the material (235U, 239Pu, latite, or background) for 30 seconds in Cave 1, then measure it in Cave 2 with a low-resolution gamma detector, acquiring a spectrum every 2.5 seconds, 10 times.

Agenda: Background, Data Generation, Data Processing, Methodology, Results, Conclusion.

Data Processing Data points: 239Pu: 93 samples; 235U: 140 samples; 10 spectra per sample. (Figure: gamma-ray spectra from the experiment, with live time and dead time indicated.)

Data Processing Data points: 10 spectra per sample, one for each 2.5-second interval (rescaled counts over energy channels and time). From these, four feature representations are formed: Column Stacking (CS); Normalized (N); Spectral Derivative/Difference (SD); and CS combined with SD (CSnSD).
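
A minimal sketch of how these representations might be computed, assuming each sample arrives as a 10 x k array (10 time slices, k energy channels); the function names and details are illustrative, not taken from the paper.

```python
import numpy as np

def column_stack(spectra):
    # CS: concatenate the 10 time-slice spectra into one long feature vector.
    return spectra.reshape(-1)

def normalize(spectra):
    # N: rescale each 2.5-second spectrum so its counts sum to 1.
    sums = spectra.sum(axis=1, keepdims=True)
    return (spectra / np.where(sums == 0, 1, sums)).reshape(-1)

def spectral_difference(spectra):
    # SD: differences between consecutive time slices, capturing decay dynamics.
    return np.diff(spectra, axis=0).reshape(-1)

def cs_and_sd(spectra):
    # CSnSD: column-stacked spectra concatenated with their differences.
    return np.concatenate([column_stack(spectra), spectral_difference(spectra)])
```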

Agenda: Background, Data Generation, Data Processing, Methodology, Results, Conclusion.

Methodology Supervised learning methods compared: Supervised Normalized Cut (SNC) and Support Vector Machine (SVM); PCA was considered but excluded. (Diagram: training data build the SVM or SNC black box, which then labels unknown data as one class or the other.)

Methodology (Diagram: the 235U and 239Pu spectra are split into training and testing sets; the classifier is trained on the former and then classifies the latter.)

Validation Methodology Procedure: 100 runs of 50% sub-sampling. In each run, the data are split into 50% training and 50% testing; SVM or SNC is trained on the training half and its accuracy on the testing half is calculated.

Validation Methodology The same procedure is repeated with other training-testing ratios (e.g., 60% sub-sampling), again over 100 runs.
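
A sketch of this repeated sub-sampling loop, assuming feature vectors X and labels y; the SVM here stands in for either classifier, and the kernel choice is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def subsampling_accuracy(X, y, train_fraction=0.5, runs=100, seed=0):
    # Repeated random sub-sampling: split, train, test, record accuracy.
    rng = np.random.RandomState(seed)
    accuracies = []
    for _ in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_fraction,
            random_state=rng.randint(2**31 - 1))
        clf = SVC(kernel="linear")  # or an SNC classifier
        clf.fit(X_tr, y_tr)
        accuracies.append(clf.score(X_te, y_te))
    return np.mean(accuracies), np.std(accuracies)
```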

Binary Classification (Illustration: data points of two classes, A and B, to be separated by a classifier.)

Multi-classification (Illustration: data points of four classes, A, B, C and D.)

Multi-classification With n classes/materials (n >= 3), the setting is more realistic. Decomposition into several simpler binary classifications: n different binary problems, where the i-th binary problem separates material i from ``other materials'' (labeled O). Each unknown point is therefore classified n times. The voting scheme: throw out all O votes; the leftover label identifies the class of the point. (A sketch of this scheme follows the illustrations below.)

Multi-classification (Illustrations over several slides: in each binary sub-problem, one class keeps its label and every other point is relabeled O; solving all n sub-problems and discarding the O votes recovers the full labeling.) Multi-classification can be broken down into many smaller binary problems.
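
A sketch of the decomposition and voting scheme described above, with an SVM as the binary black box (SNC would slot in the same way); string labels are assumed, and ties, where more than one non-O label survives, are broken arbitrarily here.

```python
import numpy as np
from sklearn.svm import SVC

def one_vs_rest_vote(X_train, y_train, X_test):
    # One binary problem per class: material i vs. "other materials" (O).
    classes = np.unique(y_train)
    votes = []
    for c in classes:
        clf = SVC(kernel="linear")
        clf.fit(X_train, np.where(y_train == c, c, "O"))
        votes.append(clf.predict(X_test))
    votes = np.array(votes)  # shape: (n_classes, n_test)
    # Voting scheme: throw out all O votes; the leftover label wins.
    predictions = []
    for point_votes in votes.T:
        survivors = point_votes[point_votes != "O"]
        predictions.append(survivors[0] if len(survivors) else "O")
    return predictions
```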

Methodologies Binary problems are considered for: Supervised Normalized Cut (SNC), covering graph notation, a variant of Normalized Cut (NC'), and the supervised procedure built on it; and the Support Vector Machine (SVM).

Graph Notation Graph representation G = (V, E), where w_ij is the similarity between data points i and j. Similarity measures: 1) Euclidean metric, 2) city block metric, 3) Mahalanobis, 4) Minkowski, 5) correlation. (Diagram: a cut through a small graph with edge weights w_12, w_13, w_14.)
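
A sketch of how the similarity weights w_ij might be assembled into a matrix, here as a Gaussian of the pairwise distance; the kernel form and scale are assumptions, while the metric argument covers the measures listed on the slide.

```python
import numpy as np
from scipy.spatial.distance import cdist

def similarity_matrix(X, metric="euclidean", scale=1.0):
    # w_ij: similarity between data points i and j. The metric can be
    # "euclidean", "cityblock", "mahalanobis", "minkowski" or "correlation".
    D = cdist(X, X, metric=metric)
    return np.exp(-(D ** 2) / (2 * scale ** 2))
```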

Normalized Cut' A variant of Normalized Cut (NC') [Hochbaum, 2010], with a bi-objective: make intra-cluster similarity large and inter-cluster similarity small. These are clustering methods (as opposed to classification).
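
Written out as a single ratio, the variant can be stated as below; this is a reconstruction from the bi-objective, since the slide does not show the formula.

```latex
\min_{\emptyset \subset S \subset V} \; \frac{C(S, \bar{S})}{C(S, S)},
\qquad \text{where } C(A, B) = \sum_{i \in A,\; j \in B} w_{ij}.
```

Minimizing this ratio makes the inter-cluster similarity C(S, S-bar) small relative to the intra-cluster similarity C(S, S).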

Normalized Cut' NC' can be solved efficiently: construct an s-t graph and run a minimum-cut algorithm; seed nodes are forced a priori to their side of the cut.

Supervised Normalized Cut The black box is Normalized Cut', with the training data points selected as seeds. (Diagram: training data serve as seeds; Normalized Cut' then assigns each testing point to uranium or plutonium.)
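
A simplified sketch of the seeded s-t cut at the core of this idea, using networkx; seeds are tied to the source or sink with infinite-capacity arcs so the cut cannot separate a training point from its known class. The actual NC' algorithm solves a parametric ratio problem on such a graph, which this sketch omits.

```python
import networkx as nx

def seeded_min_cut(W, seeds_pos, seeds_neg):
    # W: similarity matrix; seeds_pos/seeds_neg: indices of training points.
    n = len(W)
    G = nx.DiGraph()
    for i in range(n):
        for j in range(i + 1, n):
            if W[i][j] > 0:
                # Similarities become arc capacities in both directions.
                G.add_edge(i, j, capacity=W[i][j])
                G.add_edge(j, i, capacity=W[i][j])
    for i in seeds_pos:
        G.add_edge("s", i, capacity=float("inf"))  # forced to source side
    for i in seeds_neg:
        G.add_edge(i, "t", capacity=float("inf"))  # forced to sink side
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
    return source_side - {"s"}, sink_side - {"t"}
```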

Support Vector Machine (Diagram: of the candidate separating planes, the dotted ones are too sensitive; the solid one is furthest from both data sets.) Notation: w is the normal (slope) of the plane, unknown; b is the intercept of the plane, unknown; x is a data point; ⟨·,·⟩ denotes the inner product. Discriminant plane: ⟨w, x⟩ + b = 0. Discriminant function: f(x) = ⟨w, x⟩ + b. The plane is fit to the training data (by calibration) and then applied to new data.
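
For reference, the standard hard-margin formulation behind this picture (added here; the slide does not write it out): choosing the plane furthest from both data sets maximizes the margin 2/||w||, equivalently

```latex
\min_{w,\, b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad
y_{i}\left(\langle w, x_{i} \rangle + b\right) \ge 1, \qquad i = 1, \dots, m.
```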

Support Vector Machine Construct the discriminant function so that it separates the training data points (calibration data) into two distinct groups; then use the discriminant function to classify the testing data points (unknown data) into the correct group.

Support Vector Machine Advantages: robustness (avoids over-fitting) and sparseness (the discriminant depends only on the support vectors).
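
A minimal sketch of this SVM stage with scikit-learn, assuming column-stacked feature vectors; the toy data and the kernel choice are stand-ins, since the slides do not state the paper's exact parameters.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for column-stacked training and testing spectra.
X_train = np.random.rand(20, 1024)
y_train = np.array(["U"] * 10 + ["Pu"] * 10)
X_test = np.random.rand(5, 1024)

clf = SVC(kernel="linear")    # linear discriminant plane <w, x> + b = 0
clf.fit(X_train, y_train)     # construct the discriminant function
y_pred = clf.predict(X_test)  # sign of f(x) = <w, x> + b picks the class
```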

Agenda: Background, Data Generation, Data Processing, Methodology, Results, Conclusion.

Results – SVM SVM was evaluated over different training-testing ratios and different processing methods; with 100 runs, the differences are statistically significant. Best data processing: column stacking (CS).

Results – SVM (Figure: SVM accuracy results.)

Results – Accuracy, SNC vs. SVM More training data yields higher accuracy; SNC is better both in accuracy and robustness.

Results – Misclassification of SNC Confusion matrices: uranium is always predicted correctly; the only source of error is misclassification of a plutonium sample as uranium.

Results – SNC vs. SVM, Running Time SVM's running time is directly proportional to the number of training data points; SNC is more efficient.

Results – Multi-classification Non-radioactive: latite (igneous rock), background/blank. Radioactive: uranium (235U), plutonium (239Pu).

Results – Multi-classification Mean accuracy (standard deviation in parentheses) over 100 runs, by feature representation:

Categories             CS                SD                CSnSD             N
Pu, U, Latite          94.85% (3.40%)    98.34% (0.98%)    98.91% (0.78%)    84.61% (0.27%)
Pu, U, Blank          100.00% (0.00%)    99.88% (1.21%)   100.00% (0.00%)    87.37% (0.54%)
Pu, U, Latite, Blank   98.65% (1.25%)    98.32% (1.75%)    98.65% (1.12%)    86.22% (0.62%)

Observations: prediction is highly accurate and consistent; CSnSD is the best feature vector; blank has less effect on prediction than latite; when all materials are considered, prediction improves; SNC can perform very well on multi-classification.

Conclusion Supervised Normalized Cut (SNC) and the Support Vector Machine (SVM) were compared for nuclear material detection; SNC is superior in both accuracy and efficiency. Future research: 1. generalization to other problems; 2. other voting schemes for multi-classification.