Experimental Analysis of Sequential Decision Making Algorithms for Port of Entry Inspection Procedures
Saket Anand, David Madigan, Richard Mammone, Saumitr Pathak, Fred Roberts
Rutgers University

Port of Entry Inspection Procedures
Goal: Find ways to intercept illicit nuclear materials and weapons destined for the U.S. via the maritime transportation system, subject to minimizing delays, manpower, and equipment utilization. Find inspection schemes that minimize total "cost," including the "cost" of false positives and false negatives.

Port of Entry Inspection Procedures
Formulation of the problem as a complex sequential decision making problem. Containers have attributes. Sample attributes: Does the ship's manifest set off an "alarm"? What is the neutron or gamma emission count? Is it above threshold? The decision maker's problem: Which attributes to inspect? Which inspections next, based on previous results? Approach: builds on ideas of Stroud and Saeger at Los Alamos National Laboratory.
Mobile VACIS: truck-mounted gamma-ray imaging system

Sequential Decision Making Problem
Simplest case: attributes are in state 0 or 1, and sensors measure presence/absence of attributes. Then a container is a binary string like 0110, and classification is a Boolean decision function F that assigns each binary string to a category (0 or 1), e.g. F(0110) = 0 or 1. Category 0 = "ok" and 1 = "suspicious." Example (truth table 00001111): F(000) = F(001) = F(010) = F(011) = 0, F(100) = F(101) = F(110) = F(111) = 1.
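A Boolean decision function over n binary attributes can be sketched as a truth-table lookup. This is illustrative Python of my own, not the authors' code; the helper name is an assumption:

```python
# A Boolean decision function F stored as a truth table string: one output
# bit per input string, in lexicographic order (000, 001, ..., 111).
def make_classifier(truth_table):
    def F(attrs):
        return int(truth_table[int(attrs, 2)])  # 0 = "ok", 1 = "suspicious"
    return F

# The slide's example: F(000) = ... = F(011) = 0 and F(100) = ... = F(111) = 1,
# i.e. truth table 00001111 (this particular F just reports the first attribute).
F = make_classifier("00001111")
print(F("010"), F("101"))  # 0 1
```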

Binary Decision Tree Approach
Nodes are sensors (A, B, C, etc.) or categories (0 or 1). Stroud and Saeger enumerate all "complete" and monotone Boolean functions and calculate the least expensive corresponding binary decision trees.

No. of attributes | Distinct BDTs (all Boolean fns) | Complete & monotone Boolean fns | Distinct BDTs (complete & monotone fns)
2 | 74 | 2 | 4
3 | 16,430 | 9 | 60
4 | 1,079,779,602 | 114 | 11,808
5 | ~5x10^18 | 6,894 | 263,515,920
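The function counts can be checked by brute force for small n. A minimal sketch of my own, assuming the standard definitions of monotone and complete given at the end of this deck; the brute-force counts (2 for n = 2, 9 for n = 3, 114 for n = 4) line up with the Stroud-Saeger enumeration:

```python
from itertools import product

def is_monotone(tt, n):
    # tt[x] is F's output on the input whose bits are the binary digits of x;
    # setting any single bit must never decrease F.
    return all(tt[x] <= tt[x | (1 << i)]
               for x in range(2 ** n) for i in range(n))

def is_complete(tt, n):
    # Complete: no attribute is irrelevant -- flipping each bit i changes
    # the output for at least one input.
    return all(any(tt[x] != tt[x ^ (1 << i)] for x in range(2 ** n))
               for i in range(n))

def count_complete_monotone(n):
    return sum(is_monotone(tt, n) and is_complete(tt, n)
               for tt in product((0, 1), repeat=2 ** n))

print(count_complete_monotone(2), count_complete_monotone(3))  # 2 9
```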

Sensitivity Analysis of Stroud-Saeger Experiments
Aim: experimental analysis of the robustness of the optimal binary decision tree (BDT) implementing the inspection scheme found by the Stroud-Saeger approach. How sensitive are the optimal binary decision trees to variations in the cost and sensor parameters?
Reference: Stroud, P. D. and Saeger, K. J., "Enumeration of Increasing Boolean Expressions and Alternative Digraph Implementations for Diagnostic Applications," Proceedings Volume IV, Computer, Communication and Control Technologies (2003), 328-333.

Cost Function Used for Evaluating the Decision Trees
CTot = CFalsePositive * PFalsePositive + CFalseNegative * PFalseNegative + Cutil
where CFalsePositive is the cost of a false positive (Type I error), CFalseNegative is the cost of a false negative (Type II error), PFalsePositive is the probability of a false positive occurring, PFalseNegative is the probability of a false negative occurring, and Cutil is the expected cost of utilization of the tree. Note: this cost model does not include other costs, such as fixed costs and costs due to delay.
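As a sketch, the cost model is a one-liner; the function name and example numbers below are mine, purely illustrative:

```python
def total_cost(c_fp, c_fn, p_fp, p_fn, c_util):
    """CTot = CFalsePositive*PFalsePositive + CFalseNegative*PFalseNegative + Cutil."""
    return c_fp * p_fp + c_fn * p_fn + c_util

# e.g. a $500 false positive at probability 0.05, a $50e9 false negative
# at probability 1e-7, and a $12.50 expected utilization cost:
print(total_cost(500, 50e9, 0.05, 1e-7, 12.5))  # about 5037.5
```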

Probability of Error for Individual Sensors
For the ith sensor, the Type 1 error (P(Yi=1|X=0)) and Type 2 error (P(Yi=0|X=1)) are modeled using Gaussian distributions. State of nature X=0 represents absence of a bomb; X=1 represents presence of a bomb. Yi represents the outcome (count) of sensor i, and Σi is the standard deviation (spread) of the distributions. PD = probability of detection, PF = probability of false positive.
[Figure: characteristics of a typical sensor i — the densities P(Yi|X=0) and P(Yi|X=1), separated by Ki, with threshold Ti]

Probability of Error for the Entire Tree
State of nature zero (X=0): absence of a bomb; state of nature one (X=1): presence of a bomb. Example tree: inspect A first; if A reads 0, classify the container as 0; if A reads 1, inspect B; if B reads 1, classify as 1; if B reads 0, inspect C, whose reading decides.
Probability of false positive for this tree:
P(Y=1|X=0) = P(YA=1|X=0)*P(YB=1|X=0) + P(YA=1|X=0)*P(YB=0|X=0)*P(YC=1|X=0)
Probability of false negative for this tree:
P(Y=0|X=1) = P(YA=0|X=1) + P(YA=1|X=1)*P(YB=0|X=1)*P(YC=0|X=1)
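Reading the tree off these formulas (A inspected first; A=1 leads to B; B=0 leads to C), the two error probabilities can be sketched directly. The helper names are mine; the inputs are the per-sensor conditional probabilities P(Yi=1|X=x):

```python
def tree_false_positive(pA, pB, pC):
    # pA, pB, pC = P(Yi=1 | X=0) for sensors A, B, C
    return pA * pB + pA * (1 - pB) * pC

def tree_false_negative(pA, pB, pC):
    # pA, pB, pC = P(Yi=1 | X=1); note P(Yi=0|X=1) = 1 - P(Yi=1|X=1)
    return (1 - pA) + pA * (1 - pB) * (1 - pC)

# Sanity checks: a sensor A that never fires on X=0 gives no false positives;
# sensors that always fire on X=1 give no false negatives.
print(tree_false_positive(0.0, 0.5, 0.5))  # 0.0
print(tree_false_negative(1.0, 1.0, 0.3))  # 0.0
```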

Stroud-Saeger Experiments
Stroud and Saeger ranked all trees formed from 3 or 4 sensors A, B, C and D according to increasing tree cost, using the cost function defined above. Values used in their experiments:
CA = .25; P(YA=1|X=1) = .9856; P(YA=1|X=0) = .0144; KA = 4.37; ΣA = 1
CB = 1; P(YB=1|X=1) = .7779; P(YB=1|X=0) = .2221; KB = 1.53; ΣB = 1
CC = 10; P(YC=1|X=1) = .9265; P(YC=1|X=0) = .0735; KC = 2.9; ΣC = 1
CD = 30; P(YD=1|X=1) = .9893; P(YD=1|X=0) = .0107; KD = 4.6; ΣD = 1
Here Ci is the unit cost of utilization of sensor i, Ki is the sensor discrimination power, and Σi is the relative spread factor for sensor i. Also fixed were CFalseNegative, CFalsePositive, and P(X=1).

Stroud-Saeger Experiments: Our Sensitivity Analysis
CFalseNegative was varied between 25 million and 500 billion dollars (low and high estimates of the direct and indirect costs incurred due to a false negative). CFalsePositive was varied between $180 and $720, the cost incurred due to a false positive (4 men x 3-6 hrs x $15-30/hr). P(X=1) was varied between 3x10^-9 and 1x10^-5.

Stroud-Saeger Experiments: Our Sensitivity Analysis
First set of computer experiments: n = 3 (use sensors A, C and D).
Experiment 1: Fix values of two of CFalseNegative, CFalsePositive, P(X=1) and vary the third through its interval of possible values.
Experiment 2: Fix a value of one of CFalseNegative, CFalsePositive, P(X=1) and vary the other two.
Do 10,000 experiments each time. Look for the variation in the highest-ranked tree.
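Experiment 1 can be sketched as a random-sampling loop. The three candidate "trees" below are hypothetical (PFalsePositive, PFalseNegative, Cutil) triples invented for illustration, as are the fixed P(X=1) and CFalsePositive values; they are not the actual 60 BDTs, but they show how the set of first-ranked trees is collected:

```python
import random

# tree id -> (P(Y=1|X=0), P(Y=0|X=1), expected utilization cost) -- made up
TREES = {37: (0.004, 2.0e-3, 0.9),
         49: (0.010, 1.2e-3, 1.4),
         55: (0.060, 0.4e-3, 2.1)}
P_BOMB = 1e-7   # fixed P(X=1), illustrative
C_FP = 450.0    # fixed CFalsePositive, illustrative

def best_tree(c_fn):
    # Rank by CTot = CFalsePositive*P(X=0)*PFP + CFalseNegative*P(X=1)*PFN + Cutil
    def cost(t):
        pfp, pfn, cutil = TREES[t]
        return C_FP * (1 - P_BOMB) * pfp + c_fn * P_BOMB * pfn + cutil
    return min(TREES, key=cost)

random.seed(0)
winners = {best_tree(random.uniform(25e6, 500e9)) for _ in range(10_000)}
print(sorted(winners))
```

Even with CFalseNegative swept over four orders of magnitude, only a handful of trees ever reach first rank, which is the qualitative behavior the slides report.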

Variation of CTot vs. CFalseNegative
P(X=1) and CFalsePositive were kept constant at randomly selected fixed values (CFalsePositive = $493) and CTot was computed for 10,000 randomly selected values of CFalseNegative in the specified range.

Variation of CTot vs. CFalsePositive
P(X=1) and CFalseNegative were kept constant at randomly selected fixed values (CFalseNegative = $426,807,420,776) and CTot was computed for 10,000 randomly selected values of CFalsePositive in the specified range.

Variation of CTot vs. P(X=1)
CFalsePositive and CFalseNegative were kept constant at randomly selected fixed values (CFalseNegative = $447,470,143,842; CFalsePositive = $352) and CTot was computed for 10,000 randomly selected values of P(X=1) in the specified range.

Variation of CTot wrt CFalseNegative and CFalsePositive
P(X=1) was kept constant at a randomly selected fixed value. CTot = CFalsePositive*P(X=0)*P(Y=1|X=0) + CFalseNegative*P(X=1)*P(Y=0|X=1) + Cutil.

Variation of CTot wrt CFalseNegative and P(X=1)
CFalsePositive was kept constant at a randomly selected fixed value ($669). CTot = CFalsePositive*P(X=0)*P(Y=1|X=0) + CFalseNegative*P(X=1)*P(Y=0|X=1) + Cutil.

Variation of CTot wrt CFalsePositive and P(X=1)
CFalseNegative was kept constant at a randomly selected fixed value ($82,737,009,757). CTot = CFalsePositive*P(X=0)*P(Y=1|X=0) + CFalseNegative*P(X=1)*P(Y=0|X=1) + Cutil.

Structure of Trees Which Attained First Rank with 3 Sensors (A, C and D)
Tree number 37, Boolean fn 00011111; tree number 49, Boolean fn 01010111; tree number 55, Boolean fn 01111111. [Tree diagrams omitted.] In the 10,000 experiments, only 3 out of the 60 binary decision trees ever attained first rank.

Stroud-Saeger Experiments: Our Sensitivity Analysis
Second set of computer experiments: n = 4 (use sensors A, B, C, D).
Experiment 1: Fix values of two of CFalseNegative, CFalsePositive, P(X=1) and vary the third through its interval of possible values.
Experiment 2: Fix a value of one of CFalseNegative, CFalsePositive, P(X=1) and vary the other two.
Do 10,000 experiments each time. Look for the variation in the highest-ranked tree.

Variation of CTot vs. CFalseNegative
P(X=1) and CFalsePositive were kept constant at randomly selected fixed values (CFalsePositive = $189) and CTot was computed for 10,000 randomly selected values of CFalseNegative in the specified range.

Variation of CTot vs. CFalsePositive
P(X=1) and CFalseNegative were kept constant at randomly selected fixed values (CFalseNegative = $240,407,400,315) and CTot was computed for 10,000 randomly selected values of CFalsePositive in the specified range.

Variation of CTot vs. P(X=1)
CFalsePositive and CFalseNegative were kept constant at randomly selected fixed values (CFalseNegative = $406,238,290,733; CFalsePositive = $298) and CTot was computed for 10,000 randomly selected values of P(X=1) in the specified range.

Variation of CTot wrt CFalseNegative and CFalsePositive
P(X=1) was kept constant at a randomly selected fixed value. CTot = CFalsePositive*P(X=0)*P(Y=1|X=0) + CFalseNegative*P(X=1)*P(Y=0|X=1) + Cutil.

Variation of CTot wrt CFalseNegative and P(X=1)
CFalsePositive was kept constant at a randomly selected fixed value ($454). CTot = CFalsePositive*P(X=0)*P(Y=1|X=0) + CFalseNegative*P(X=1)*P(Y=0|X=1) + Cutil.

Variation of CTot wrt CFalsePositive and P(X=1)
CFalseNegative was kept constant at a randomly selected fixed value ($47,484,728,943). CTot = CFalsePositive*P(X=0)*P(Y=1|X=0) + CFalseNegative*P(X=1)*P(Y=0|X=1) + Cutil.

Structure of Trees and Corresponding Boolean Expressions for n = 4
Tree number 11785, Boolean fn 0111111111111111; tree number 11605, Boolean fn 0101011111111111. [Tree diagrams omitted.] In the above experiments, at most 10 out of the 11,808 binary decision trees ever attained first rank.

Structure of Trees and Corresponding Boolean Expressions for n = 4 (continued)
Tree number 9133, Boolean fn 0001010111111111; tree number 8965, Boolean fn 0001010101111111. [Tree diagrams omitted.]

Receiver Operating Characteristic (ROC) Curve
The ROC curve is the plot of the probability of correct detection (PD) vs. the probability of false positive (PF). The sensor threshold is varied to select an operating point, trading off PD against PF. Each sensor has an ROC curve, and a combination of sensors into a decision tree has a composite ROC curve. The Equal Error Rate (EER) is the operating point on the ROC curve where PF = 1 - PD.
[Figure: densities P(Yi|X=0) and P(Yi|X=1) with threshold Ti and separation Ki, and the resulting ROC curve of PD vs. PF with an operating point and the EER marked]
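Under the Gaussian sensor model used in this deck (PF = 0.5 erfc[T/√2], PD = 0.5 erfc[(T-K)/(Σ√2)], as given on the next slide), the EER point can be located by scanning the threshold. A sketch of mine; with Σi = 1 the equal-error threshold is exactly Ki/2 by symmetry, which the scan recovers:

```python
from math import erfc, sqrt

def eer_threshold(K, sigma=1.0, lo=-4.0, hi=4.0, steps=8000):
    """Grid-search the threshold T where PF = P(Y=1|X=0) equals 1 - PD."""
    best_T, best_gap = lo, float("inf")
    for k in range(steps + 1):
        T = lo + (hi - lo) * k / steps
        pf = 0.5 * erfc(T / sqrt(2))
        pd = 0.5 * erfc((T - K) / (sigma * sqrt(2)))
        gap = abs(pf - (1 - pd))
        if gap < best_gap:
            best_T, best_gap = T, gap
    return best_T

print(eer_threshold(4.6))  # close to 2.3 (= K/2, sensor D)
```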

Sensitivity to Sensor Performance
The following experiments were done using sensors A, B, C and D as described below, varying the individual sensor thresholds Ti from -4.0 to +4.0 in steps of 0.4. These values were chosen because they gave an ROC curve for each individual sensor over the complete range of P(Yi=1|X=0) and P(Yi=1|X=1).
PF for the ith sensor: P(Yi=1|X=0) = 0.5 erfc[Ti/√2]
PD for the ith sensor: P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Σi√2)]
CA = .25; KA = 4.37; ΣA = 1
CB = 1; KB = 1.53; ΣB = 1
CC = 10; KC = 2.9; ΣC = 1
CD = 30; KD = 4.6; ΣD = 1
where Ci is the individual cost of utilization of sensor i, Ki is the discrimination power of the sensor, and Σi is the spread factor for the sensor.
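A quick consistency check (sketch code of mine): plugging Ti = Ki/2 into these two formulas reproduces, to within about 1e-4, the PD/PF pairs quoted on the earlier Stroud-Saeger parameter slide. In other words, the cost experiments appear to use each sensor at its equal-error operating point:

```python
from math import erfc, sqrt

def pf(T):                    # P(Yi=1|X=0) = 0.5*erfc(Ti/sqrt(2))
    return 0.5 * erfc(T / sqrt(2))

def pd(T, K, sigma=1.0):      # P(Yi=1|X=1) = 0.5*erfc((Ti-Ki)/(sigma*sqrt(2)))
    return 0.5 * erfc((T - K) / (sigma * sqrt(2)))

# (Ki, quoted PD, quoted PF) for sensors A-D from the earlier slide.
QUOTED = {"A": (4.37, 0.9856, 0.0144), "B": (1.53, 0.7779, 0.2221),
          "C": (2.90, 0.9265, 0.0735), "D": (4.60, 0.9893, 0.0107)}

for name, (K, qpd, qpf) in QUOTED.items():
    T = K / 2                 # equal-error threshold when sigma = 1
    print(name, round(pd(T, K), 4), round(pf(T), 4))
```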

Performance (ROC) of Binary Decision Tree Number 37 (3 sensors)
The lines represent the performance characteristics (ROC curves) of sensors A, C and D. The green dots represent the performance characteristics (P(Y=1|X=0), P(Y=1|X=1)) of the tree over all combinations of sensor thresholds (Ti). In these threshold-variation experiments, 15 out of the 60 trees attained first rank.

Performance (ROC) of Binary Decision Tree Number 37 (3 sensors)
Assuming the performance probabilities P(Y=1|X=1) and P(Y=1|X=0) are monotonically related (i.e., P(Y=1|X=1) can be treated as a monotonic function of P(Y=1|X=0)), we can form an ROC curve for the tree by taking, for each P(Y=1|X=0) value, the maximum corresponding P(Y=1|X=1) value. The blue dots represent such an ROC curve, the "best" ROC curve for tree 37.
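Extracting that "best" ROC curve from the cloud of operating points is an upper-envelope (Pareto frontier) computation. A minimal sketch of my own, with illustrative data:

```python
def best_roc(points):
    """Keep (PF, PD) points not dominated by any point with PF' <= PF and PD' >= PD."""
    frontier = []
    # Sort by ascending PF, and within equal PF by descending PD, so that a
    # single pass keeping strictly increasing PD yields the upper envelope.
    for pf, pd in sorted(points, key=lambda p: (p[0], -p[1])):
        if not frontier or pd > frontier[-1][1]:
            frontier.append((pf, pd))
    return frontier

pts = [(0.10, 0.60), (0.10, 0.80), (0.20, 0.70), (0.30, 0.90)]
print(best_roc(pts))  # (0.10, 0.60) and (0.20, 0.70) are dominated and dropped
```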

Performance (ROC) of Binary Decision Tree Number 445 (4 sensors)
The lines represent the performance characteristics (ROC curves) of sensors A, B, C and D. The green dots represent the performance characteristics (P(Y=1|X=0), P(Y=1|X=1)) of the tree over all combinations of sensor thresholds (Ti). Only 244 of the 11,808 trees attained first rank.

Performance (ROC) of Binary Decision Tree Number 445 (4 sensors)
Assuming the performance probabilities P(Y=1|X=1) and P(Y=1|X=0) are monotonically related (i.e., P(Y=1|X=1) can be treated as a monotonic function of P(Y=1|X=0)), we can form an ROC curve for the tree by taking, for each P(Y=1|X=0) value, the maximum corresponding P(Y=1|X=1) value. The blue dots represent such an ROC curve, the "best" ROC curve for tree 445.

Conclusions from Sensitivity Analysis
Considerable lack of sensitivity to modification in parameters for trees using 3 or 4 sensors: very few optimal trees, and very few Boolean functions arise among the optimal trees. Binary decision trees perform better than individual sensors.

Some Research Challenges
Explain why the conclusions are so insensitive to variation in parameter values. Explore the structure of the optimal trees and compare the different optimal trees. Develop less brute-force methods for finding optimal trees that might work if there are more than 4 attributes. Develop methods for approximating the optimal tree.
Pallet VACIS

Acknowledgement
Supported by the Office of Naval Research and the National Science Foundation. The authors thank Phil Stroud, Kevin Saeger, and Rick Picard for providing data, code and ideas.

Thank you http://dimacs.rutgers.edu/Workshops/PortofEntry/ Saket Anand – anands@caip.rutgers.edu Fred S. Roberts – froberts@dimacs.rutgers.edu

Monotone and Complete Boolean Functions
Monotone Boolean functions: given two strings x1x2…xn and y1y2…yn, suppose that xi ≤ yi for all i implies F(x1x2…xn) ≤ F(y1y2…yn); then we say that F is monotone. It follows that 11…1 has the highest probability of being in category 1.
Complete Boolean functions: a Boolean function F is complete if and only if every attribute is relevant, i.e., F cannot be calculated without knowing the values of all n attributes. Example: F(111) = F(110) = F(101) = F(100) = 1, F(000) = F(001) = F(010) = F(011) = 0. Here F(abc) = a is determined without knowing b (or c), so F is incomplete.
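The incompleteness in the example can be verified mechanically. A sketch of mine that lists which attributes of a truth table are relevant (flipping a relevant attribute can change F):

```python
def relevant_attributes(tt, n):
    """Indices i (0 = leftmost attribute) whose flip can change F.
    tt is a truth-table string indexed by the input read as a binary number."""
    rel = []
    for i in range(n):
        bit = 1 << (n - 1 - i)          # leftmost attribute = highest-order bit
        if any(tt[x] != tt[x ^ bit] for x in range(2 ** n)):
            rel.append(i)
    return rel

# The example F above has truth table 00001111 over inputs 000..111,
# i.e. F(abc) = a: only attribute 0 is relevant, so F is incomplete.
print(relevant_attributes("00001111", 3))  # [0]
```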