Supervised Normalized Cut for Detecting, Classifying and Identifying Special Nuclear Materials. Yan T. Yang, Barak Fishbain, Dorit S. Hochbaum, Eric B. Norman, Erik Swanberg
Agenda Background Data Generation Data Processing Methodology Results Conclusion
Background: Application Nuclear Terrorism Detection Task: Physical limitations of detectors: High-purity germanium gamma detector, Sodium iodide scintillator Background noise Optimization can greatly increase detection capabilities
Our Problem Detector Determine: Special Nuclear Material (SNM) 1. Radioactive? 2. What kind? Coarse detection data
Our Problem Training data: To develop a black box Machine learning classification problem [Figure: training data builds the black box; unknown data goes in and one of two labels comes out]
Background: Technique Machine Learning Techniques: Supervised learning (Classification) Unsupervised learning (Clustering) [Figure: a human sorts examples into "Good" and "Evil"; a machine does the same with a black box F(x) built from training data and applied to unknown data]
Agenda Background Data Generation Data Processing Methodology Results Conclusion
Data Generation Sodium Iodide (NaI) detector Low resolution – noisy data Rugged – more amenable to practical use Types of detection: 1. Passive interrogation: direct acquisition 2. Active interrogation: irradiation
Simulated materials present: Non-radioactive: Latite (igneous rock), Background/blank Radioactive: Uranium: 235U, Plutonium: 239Pu
Data Generation (E. Swanberg, E.B. Norman et al.) Irradiate material (235U, 239Pu, latite or background) for 30 seconds [Figure: Cave 1, Cave 2] Low-resolution gamma detector: 10 acquisitions at 2.5-second intervals
Agenda Background Data Generation Data Processing Methodology Results Conclusion
Data Processing Data points: 239Pu: 93 samples 235U: 140 samples 10 spectra per sample [Figure: live time, dead time, gamma rays from experiment]
Data Processing Data points: 10 spectra per sample, one for every 2.5 seconds [Figure axes: rescaled count, energy channel, time]
Data Processing Data points: 10 spectra per sample: each for every 2.5 seconds Column Stacking (CS)
Data Processing Data points: 10 spectra per sample: each for every 2.5 seconds Normalized (N)
Data Processing Data points: 10 spectra per sample: each for every 2.5 seconds Spectral Derivative/Difference (SD)
Data Processing Data points: 10 spectra per sample: each for every 2.5 seconds CS and SD
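The four feature vectors named on these slides (CS, N, SD, CS and SD) can be sketched as below. The slides do not give formulas, so the definitions here are assumptions: CS concatenates the time-ordered spectra, N rescales each spectrum to unit total count, and SD takes the channel-wise difference between consecutive spectra. All function names are hypothetical.

```python
from typing import List

Spectrum = List[float]  # counts per energy channel for one 2.5 s interval


def column_stack(spectra: List[Spectrum]) -> List[float]:
    """CS: concatenate the time-ordered spectra into one long feature vector."""
    return [count for spectrum in spectra for count in spectrum]


def normalize(spectra: List[Spectrum]) -> List[Spectrum]:
    """N (assumed definition): scale each spectrum so its counts sum to 1."""
    out = []
    for spectrum in spectra:
        total = sum(spectrum)
        out.append([c / total if total else 0.0 for c in spectrum])
    return out


def spectral_difference(spectra: List[Spectrum]) -> List[Spectrum]:
    """SD (assumed definition): channel-wise difference of consecutive spectra."""
    return [
        [later - earlier for earlier, later in zip(a, b)]
        for a, b in zip(spectra, spectra[1:])
    ]


def cs_and_sd(spectra: List[Spectrum]) -> List[float]:
    """CS and SD: stacked spectra followed by stacked differences."""
    return column_stack(spectra) + column_stack(spectral_difference(spectra))


# Toy sample: 3 intervals x 4 channels (the real data has 10 intervals).
sample = [[8.0, 4.0, 2.0, 2.0], [6.0, 3.0, 2.0, 1.0], [4.0, 2.0, 1.0, 1.0]]
features = cs_and_sd(sample)
```

With 10 spectra per sample, CS yields 10 times the channel count and SD adds 9 times the channel count on top of that.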
Agenda Background Data Generation Data Processing Methodology Results Conclusion
Methodology Supervised Normalized Cut (SNC) Support Vector Machine (SVM) PCA (excluded) SVM or SNC Training Data Unknown Data Or Supervised Learning
Methodology 235 U spectra 239 Pu spectra Training Testing Classify
Validation Methodology Procedure: 50% Sub-sampling SVM or SNC Calculate Accuracy 50% Testing, 50% Training 100 runs
Validation Methodology Procedure: 60% Sub-sampling SVM or SNC Calculate Accuracy 60% Testing, 60% Training 100 runs
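The validation loop (random sub-sampling, classify, compute accuracy, repeat 100 times) might look like the sketch below. The classifier is passed in as a function; the `nearest_centroid` stand-in is only there to make the example runnable and is neither SVM nor SNC. The toy data, function names, and the fixed seed are assumptions.

```python
import random
import statistics


def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier (NOT SVM or SNC): assign the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in test_x]


def subsampling_validation(xs, ys, classify, train_fraction=0.5, runs=100, seed=0):
    """Return (mean, standard deviation) of accuracy over random splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = int(round(train_fraction * len(xs)))
    accuracies = []
    for _ in range(runs):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        predicted = classify([xs[i] for i in train], [ys[i] for i in train],
                             [xs[i] for i in test])
        correct = sum(p == ys[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy, well-separated two-class data (hypothetical, not the experimental spectra).
xs = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
ys = ["U"] * 10 + ["Pu"] * 10
mean_acc, sd_acc = subsampling_validation(xs, ys, nearest_centroid)
```

Changing `train_fraction` reproduces the different training/testing ratios; the mean and standard deviation over the runs are the two numbers reported per configuration.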
Binary Classification [Figure: data points, each labeled A or B]
Multi-classification [Figure: data points, each labeled A, B, C or D]
Multi-classification n classes/materials, n >= 3: more realistic Decomposition into several simpler binary classifications: n different binary problems; the i-th binary problem: material i vs. "other materials" (O) Each unknown point k is classified n times The voting scheme: throw out all O labels; the leftover label is identified as the class of k.
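The decomposition and voting scheme described on this slide can be sketched as follows. The function names are hypothetical, and returning `None` when no label or more than one label survives is an assumption, since the slide does not say how such cases are handled.

```python
def one_vs_others_labels(labels, target):
    """Relabel a training set for the binary problem 'target' vs. 'O'."""
    return [lab if lab == target else "O" for lab in labels]


def vote(binary_results):
    """Combine the n binary outcomes for one unknown point.

    Discards every 'O' and returns the single remaining label, or None when
    the leftover is empty or ambiguous (a case the slide does not address).
    """
    leftover = [lab for lab in binary_results if lab != "O"]
    return leftover[0] if len(leftover) == 1 else None


# One unknown point run through four binary problems (A, B, C, D vs. others).
relabeled = one_vs_others_labels(["A", "B", "C", "A"], "A")
winner = vote(["O", "B", "O", "O"])
```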
Multi-classification [Figure sequence: the four-class labeling (A, B, C, D) is decomposed into four binary labelings, one per class, in which every other point is labeled O; the binary results are then combined into the final A, B, C, D labels] Multi-classification can be broken down into many smaller binary problems.
Methodologies Consider binary problems for Supervised Normalized Cut (SNC) Graph notation A variant of Normalized Cut (NC’) Supervised Normalized Cut Support vector machine (SVM)
Graph Notation Graph representation G = (V, E) w_ij: similarity between data points i and j Similarity measures: 1) Euclidean metric, 2) City Block metric, 3) Mahalanobis, 4) Minkowski, 5) Correlation [Figure: a cut crossing edges w_12, w_13, w_14]
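A sketch of four of the listed measures and of building the weights w_ij from them. The slide lists metrics but not how a distance becomes a similarity, so the mapping `exp(-scale * d)` and the `scale` parameter are assumptions. Mahalanobis is omitted because it needs an inverse covariance matrix, and the correlation distance is undefined for constant vectors.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p=3):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def correlation_distance(x, y):
    """1 minus the Pearson correlation of the two vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - cov / norm


def similarity_matrix(points, distance=euclidean, scale=1.0):
    """w_ij = exp(-scale * d(i, j)); the exponential mapping is an assumption."""
    n = len(points)
    return [[0.0 if i == j else math.exp(-scale * distance(points[i], points[j]))
             for j in range(n)] for i in range(n)]


W = similarity_matrix([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
```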
Normalized Cut’ Variant Normalized Cut Bi-objective [Hochbaum, 2010]: Intra-cluster similarity: large Inter-cluster similarity: small A clustering method (as opposed to classification)
Normalized Cut’ Solved efficiently Construction of an s-t graph Run minimum-cut algorithm Seeds: forced a priori
Supervised Normalized Cut The black box: Normalized Cut’ Training data points are selected as seeds [Figure: training data enter Normalized Cut’ as seeds; each testing point comes out labeled Uranium or Plutonium]
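The idea of forcing training points as seeds and labeling the rest with a minimum cut can be sketched as below. This is an illustration under stated assumptions, not the authors' exact construction: it uses a linearized objective (cut weight minus `lam` times the total degree of the source side), source arcs of capacity `lam * d_i`, and a plain Edmonds-Karp max-flow. The parameter `lam`, the function names, and the toy weights are all hypothetical.

```python
from collections import deque


def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the set of nodes on the source side."""
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities
    eps = 1e-12
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > eps:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path is left: the nodes reached by this last
            # search form the source side of a minimum cut.
            return {v for v in range(n) if parent[v] != -1}
        flow, v = float("inf"), t
        while v != s:  # bottleneck along the path
            flow = min(flow, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # push the flow
            res[parent[v]][v] -= flow
            res[v][parent[v]] += flow
            v = parent[v]


def snc_classify(w, seeds_a, seeds_b, lam=0.1):
    """Label every node 'A' or 'B' given a symmetric similarity matrix w."""
    n = len(w)
    s, t = n, n + 1
    big = 1.0 + (1.0 + lam) * sum(map(sum, w))  # exceeds any finite cut
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        for j in range(n):
            if i != j:
                cap[i][j] = w[i][j]
        cap[s][i] = lam * sum(w[i])  # price of leaving i out of the source set
    for i in seeds_a:
        cap[s][i] = big  # training point forced onto the source side
    for i in seeds_b:
        cap[i][t] = big  # training point forced onto the sink side
    source_side = min_cut_source_side(cap, s, t)
    return ["A" if i in source_side else "B" for i in range(n)]


# Toy graph: nodes 0-2 and 3-5 are strongly similar within each group.
w = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.05)
      for j in range(6)] for i in range(6)]
labels = snc_classify(w, seeds_a=[0], seeds_b=[3])
```

Here nodes 0 and 3 play the role of training data with known labels, and the remaining four nodes are the testing data labeled by the cut.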
Support Vector Machine Dotted: too sensitive Solid: furthest from both data sets w: the slope of the plane; unknown. b: the intercept of the plane; unknown. x: a variable; $\langle \cdot, \cdot \rangle$ denotes the inner product Discriminant plane: $\langle w, x \rangle + b = 0$ Discriminant function: $f(x) = \langle w, x \rangle + b$ [Figure: training data (by calibration), new data, discriminant plane]
Support Vector Machine Construct discriminant function Separate training data points (calibration data) into two distinct groups Use discriminant function Classify testing data points (unknown data) into correct group
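The second step, using a discriminant function to classify unknown points, can be sketched as follows for the linear case. The values of `w` and `b` are hypothetical; in practice they come from fitting the training data, which this sketch does not do. The mapping of the positive side to "U" is also an assumption.

```python
def discriminant(w, b, x):
    """f(x) = <w, x> + b for a linear SVM with weights w and intercept b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x, positive="U", negative="Pu"):
    """Assign the group by the side of the plane <w, x> + b = 0."""
    return positive if discriminant(w, b, x) >= 0 else negative


# Hypothetical plane, not fitted to any real data.
w, b = [1.0, -1.0], 0.5
first = classify(w, b, [2.0, 1.0])
second = classify(w, b, [0.0, 2.0])
```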
Support Vector Machine Advantage: Robust: avoid over-fitting Sparseness
Agenda Background Data Generation Data Processing Methodology Results Conclusion
Results – SVM Different Training- Testing ratio Different processing method Statistically significant 100 Runs Best data processing: Column stacking (CS)
Results – Accuracy SNC vs. SVM More training data yields higher accuracy SNC is better in both accuracy and robustness.
Results – Misclassification of SNC Confusion matrices: Uranium is always predicted correctly; the only source of error is the misclassification of a Plutonium sample as Uranium.
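A confusion matrix of the kind referred to here can be tallied as in this sketch. The labels are toy values chosen to mirror the pattern described (every Uranium sample correct, one Plutonium sample predicted as Uranium); they are not the experimental results, and the function name is hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


# Toy labels (hypothetical): the single error is a Pu sample predicted as U.
truth = ["U", "U", "U", "Pu", "Pu", "Pu"]
pred = ["U", "U", "U", "Pu", "U", "Pu"]
matrix = confusion_matrix(truth, pred, ["U", "Pu"])
```

The off-diagonal entry in the Pu row counts Plutonium samples labeled as Uranium; the off-diagonal entry in the U row stays at zero.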
Results – SNC vs. SVM Running Time SVM: running time directly proportional to the number of training data points SNC: more efficient
Results – Multi-classification Non-radioactive: Latite (igneous rock) Background/blank Radioactive: Uranium: 235U Plutonium: 239Pu
Results – Multi-classification
Accuracy by feature vector, given as mean / standard deviation (CS: column stacking, SD: spectral difference, CSnSD: CS and SD, N: normalized):
Categories            | CS              | SD              | CSnSD                  | N
Pu, U, Latite         | 94.85% / 3.40%  | 98.34% / 0.98%  | 98.91% / 0.78%         | 84.61% / 0.27%
Pu, U, Blank          | 100.00% / 0.00% | 99.88% / 1.21%  | (mean missing) / 0.00% | 87.37% / 0.54%
Pu, U, Latite, Blank  | 98.65% / 1.25%  | 98.32% / 1.75%  | 98.65% / 1.12%         | 86.22% / 0.62%
Highly accurate and consistent prediction
CSnSD is the best feature vector
Blank has less effect on prediction than Latite
When all materials are considered, the prediction improves
SNC can perform very well
Conclusion Supervised normalized cut (SNC). Support vector machine (SVM) SNC is superior in accuracy and efficiency Nuclear material detection Future research: 1. Generalization to other problems 2. Other voting schemes for multi-classification