Published by Kayla York. Modified over 11 years ago.
1
V-detector: a real-valued negative selection algorithm
Zhou Ji
St. Jude Children's Research Hospital
2
What is negative selection?
Biological background: T cells, thymus
Major steps:
1. Generate candidates randomly
2. Eliminate those that recognize self samples
3
Main steps: generation, detection
4
What is a matching rule?
A matching rule defines when a sample and a detector are considered to match. The matching rule plays an important role in a negative selection algorithm, and it largely depends on the data representation.
5
In a real-valued representation, a detector can be visualized as a hypersphere. Candidate 1 is thrown away; candidate 2 is made a detector. Match or not match?
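A minimal sketch of the generate-and-eliminate step with constant-sized detectors may help here. It assumes Euclidean distance as the matching rule, data scaled to the unit hypercube, and a single threshold `radius`; the function names and the `max_tries` guard are illustrative and do not come from the slides.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(center, radius, sample):
    """Assumed matching rule: the sample lies inside the hypersphere."""
    return euclidean(center, sample) <= radius


def generate_constant_detectors(self_samples, radius, n_detectors, dim,
                                max_tries=100_000, rng=None):
    """Keep random candidates that match no self sample (fixed radius)."""
    rng = rng or random.Random(0)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = [rng.random() for _ in range(dim)]  # unit hypercube
        # A candidate that matches any self sample is thrown away.
        if any(matches(candidate, radius, s) for s in self_samples):
            continue
        detectors.append(candidate)
    return detectors
```

In this sketch, a candidate such as "candidate 1" is one that lies within `radius` of some self sample, while "candidate 2" lies farther than `radius` from every self sample.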
6
Main idea of V-detector
By allowing the detectors to have some variable properties, V-detector enhances the negative selection algorithm in several respects:
- It takes fewer large detectors to cover the non-self region, saving time and space.
- Small detectors cover holes better.
- Coverage is estimated while the detector set is generated.
- The shapes of detectors, or even the types of matching rules, can be extended to be variable too.
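The idea of variable-sized detectors with a built-in coverage estimate can be sketched as follows. This is an illustration under stated assumptions, not the author's exact procedure: each detector's radius is taken as the distance to the nearest self sample minus the self radius, and generation stops once a run of random points is already covered, with the run length set to 1 / (1 - target coverage). The names `target_coverage` and `max_tries` are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generate_v_detectors(self_samples, self_radius, dim,
                         target_coverage=0.99, max_tries=100_000, rng=None):
    """Return a list of (center, radius) detectors of variable size."""
    rng = rng or random.Random(0)
    # Assumed stopping rule: this many covered points since the last
    # accepted detector are taken as evidence of sufficient coverage.
    covered_limit = math.ceil(1.0 / (1.0 - target_coverage))
    detectors = []
    covered_run = 0
    for _ in range(max_tries):
        if covered_run >= covered_limit:
            break
        x = [rng.random() for _ in range(dim)]  # unit hypercube
        if any(distance(x, c) <= r for c, r in detectors):
            covered_run += 1  # already covered: counts toward the estimate
            continue
        # Grow the detector until it touches the nearest self hypersphere.
        radius = min(distance(x, s) for s in self_samples) - self_radius
        if radius > 0:  # candidates inside the self region are discarded
            detectors.append((x, radius))
            covered_run = 0
    return detectors
```

Large detectors appear far from the self samples and small ones close to them, which is how this kind of scheme can cover holes near the self boundary with relatively few detectors.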
7
Main concept of Negative Selection and V-detector
Figure panels: Constant-sized detectors; Variable-sized detectors
8
Outline of the algorithm (generation of variable-sized detector set)
9
Detector Set Generation Algorithm
Figure panels: Constant-sized detectors; Variable-sized detectors
10
Screenshots of the software
Figure panels: Message view; Visualization of data points and detectors
11
Experiments and Results
- Synthetic data: 2D. Training data are randomly chosen from the normal region.
- Fisher's Iris data: one of the three types is considered normal.
- Biomedical data: abnormal data are the medical measurements of disease-carrier patients.
- Air pollution data: abnormal data are made by artificially altering the normal air measurements.
- Ball bearings: time series measurements with preprocessing, 30D and 5D.
12
Synthetic data - Cross-shaped self space
Shape of self region and example detector coverage
Figure panels: (a) Actual self space; (b) self radius = 0.05; (c) self radius = 0.1
13
Synthetic data - Cross-shaped self space: results
Figure panels: Detection rate and false alarm rate; Number of detectors
14
Error rates
15
Synthetic data - Ring-shaped self space
Shape of self region and example detector coverage
Figure panels: (a) Actual self space; (b) self radius = 0.05; (c) self radius = 0.1
16
Synthetic data - Ring-shaped self space: results
Figure panels: Detection rate and false alarm rate; Number of detectors
17
Iris data
Comparison with other methods: performance

Normal class, training data   Algorithm            Detection rate (%)   False alarm rate (%)
Setosa 100%                   MILA                 95.16                0
                              NSA (single level)   100                  0
                              V-detector           99.98                0
Setosa 50%                    MILA                 94.02                8.42
                              NSA (single level)   100                  11.18
                              V-detector           99.97                1.32
Versicolor 100%               MILA                 84.37                0
                              NSA (single level)   95.67                0
                              V-detector           85.95                0
Versicolor 50%                MILA                 84.46                19.6
                              NSA (single level)   96                   22.2
                              V-detector           88.3                 8.42
Virginica 100%                MILA                 75.75                0
                              NSA (single level)   92.51                0
                              V-detector           81.87                0
Virginica 50%                 MILA                 88.96                24.98
                              NSA (single level)   97.18                33.26
                              V-detector           93.58                13.18
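The slides do not spell out how the two rates are computed. Assuming the usual definitions (detection rate: the share of abnormal test points flagged as abnormal; false alarm rate: the share of normal test points flagged as abnormal), a small sketch could look like this. The function name and the boolean encoding are illustrative.

```python
def detection_and_false_alarm(is_abnormal_true, is_abnormal_pred):
    """Return (detection rate, false alarm rate) as percentages."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(is_abnormal_true, is_abnormal_pred):
        if truth and pred:
            tp += 1  # abnormal point, detected
        elif truth:
            fn += 1  # abnormal point, missed
        elif pred:
            fp += 1  # normal point, falsely flagged
        else:
            tn += 1  # normal point, correctly passed
    detection_rate = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate
```

When 100% of the normal class is used for training and the same points serve as the normal test points, no normal point can be flagged by a detector built to avoid them, which is consistent with the zero false alarm rates in those rows.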
18
Iris data
Comparison with other methods: number of detectors

Normal class, training data   Mean     Max   Min   SD
Setosa 100%                   20       42    5     7.87
Setosa 50%                    16.44    33    5     5.63
Versicolor 100%               153.24   255   72    38.8
Versicolor 50%                110.08   184   60    22.61
Virginica 100%                218.36   443   78    66.11
Virginica 50%                 108.12   203   46    30.74
19
Iris data
Virginica as normal, 50% of points used to train
Figure panels: Detection rate and false alarm rate; Number of detectors
20
Biomedical data
Blood measurements for a group of 209 patients.
Each patient has four different types of measurement.
75 patients are carriers of a rare genetic disorder. The others are normal.
21
Biomedical data: results comparison

Training data   Algorithm   Detection rate    False alarm rate   Number of detectors
                            Mean     SD       Mean     SD        Mean      SD
100% training   MILA        59.07    3.85     0        0         1000 *    0
                NSA         69.36    2.67     0        0         1000      0
                r=0.1       30.61    3.04     0        0         21.52     7.29
                r=0.05      40.51    3.92     0        0         14.84     5.14
50% training    MILA        61.61    3.82     2.43     0.43      1000 *    0
                NSA         72.29    2.63     2.94     0.21      1000      0
                r=0.1       32.92    2.35     0.61     0.31      15.51     4.85
                r=0.05      42.89    3.83     1.07     0.49      12.28     4
25% training    MILA        80.47    2.80     14.93    2.08      1000 *    0
                NSA         86.96    2.72     19.50    2.05      1000      0
                r=0.1       43.68    4.25     1.24     0.5       12.24     3.97
                r=0.05      57.97    5.86     2.63     0.77      8.94      2.57
22
Biomedical data
Figure panels: Detection rate and false alarm rate; Number of detectors
23
Air pollution data
There are 60 original records in total. Each consists of 16 different measurements concerning air pollution. All the real data are considered normal.
More data are made artificially:
1. Decide the normal range of each of the 16 measurements.
2. Randomly choose a real record.
3. Change three randomly chosen measurements within a larger-than-normal range.
4. If some of the changed measurements are out of range, the record is considered abnormal; otherwise it is considered normal.
In total, 1000 records, including the original 60, are used as test data. The original 60 are used as training data.
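The four steps above can be sketched in code. Two details are assumptions made only for illustration, since the slide does not state them: the normal range of a measurement is taken as the minimum and maximum over the real records, and the larger range extends it on each side by a fraction `widen` of its span. The function name is also hypothetical.

```python
import random


def make_test_set(real_records, total=1000, n_changed=3, widen=0.5, rng=None):
    """Return a list of (record, is_abnormal) pairs, real records first."""
    rng = rng or random.Random(0)
    dim = len(real_records[0])
    # Step 1 (assumed): normal range = observed min and max per measurement.
    lo = [min(r[j] for r in real_records) for j in range(dim)]
    hi = [max(r[j] for r in real_records) for j in range(dim)]
    data = [(list(r), False) for r in real_records]  # real data are normal
    while len(data) < total:
        record = list(rng.choice(real_records))       # step 2
        abnormal = False
        for j in rng.sample(range(dim), n_changed):   # step 3
            span = hi[j] - lo[j]
            record[j] = rng.uniform(lo[j] - widen * span,
                                    hi[j] + widen * span)
            if not lo[j] <= record[j] <= hi[j]:       # step 4
                abnormal = True
        data.append((record, abnormal))
    return data
```

With 60 real records of 16 measurements each, `make_test_set(real_records)` yields 1000 labeled test records, of which the first 60 are the unmodified originals.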
24
Air pollution data
Figure panels: Detection rate and false alarm rate; Number of detectors
25
Ball bearing data
Raw data: time series of acceleration measurements
Preprocessing (from the time domain to the representation space for detection):
1. FFT (Fast Fourier Transform) with Hanning windowing: window size 30
2. Statistical moments: up to 5th order
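A rough, standard-library sketch of the two preprocessing options follows. Several details are assumptions not given in the slides: windows do not overlap, the FFT features are the magnitudes of all 30 coefficients (which would give 30 dimensions), the moments are the mean plus central moments of orders 2 to 5 (giving 5 dimensions), and the moments use the same window length. A naive DFT stands in for an FFT routine; it computes the same coefficients, just more slowly.

```python
import cmath
import math


def hann_window(n):
    """Symmetric Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dft_magnitudes(segment):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) form)."""
    n = len(segment)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment)))
            for k in range(n)]


def fft_features(series, window=30):
    """One feature vector of `window` magnitudes per non-overlapping window."""
    w = hann_window(window)
    features = []
    for start in range(0, len(series) - window + 1, window):
        segment = [series[start + i] * w[i] for i in range(window)]
        features.append(dft_magnitudes(segment))
    return features


def moment_features(series, window=30, max_order=5):
    """Mean and central moments of orders 2..max_order per window."""
    features = []
    for start in range(0, len(series) - window + 1, window):
        segment = series[start:start + window]
        mean = sum(segment) / window
        row = [mean]
        for order in range(2, max_order + 1):
            row.append(sum((x - mean) ** order for x in segment) / window)
        features.append(row)
    return features
```

Either function turns a raw acceleration series into a list of fixed-length real-valued vectors, which is the form that real-valued detectors operate on.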
26
Example of data (raw data of new bearings) --- first 1000 points
27
Example of data (FFT of new bearings) --- first 3 coefficients of the first 100 points
28
Example of data (statistical moments of new bearings) --- moments up to 3rd order of the first 100 points
29
Ball bearing structure and damage
Figure: Damaged cage
30
Ball bearing data: results

Preprocessed with FFT

Ball bearing conditions                 Total number of data points   Number of detected anomalies   Percentage detected
New bearing (normal)                    2739                          0                              0%
Outer race completely broken            2241                          2182                           97.37%
Broken cage with one loose element      2988                          577                            19.31%
Damaged cage, four loose elements       2988                          337                            11.28%
No evident damage; badly worn           2988                          209                            6.99%

Preprocessed with statistical moments

Ball bearing conditions                 Total number of data points   Number of detected anomalies   Percentage detected
New bearing (normal)                    2651                          0                              0%
Outer race completely broken            2169                          1674                           77.18%
Broken cage with one loose element      2892                          14                             0.48%
Damaged cage, four loose elements       2892                          0                              0%
No evident damage; badly worn           2892                          0                              0%
31
Ball bearing data: performance summary
32
New development of this work
A new algorithm to generate variable-sized detectors.
Purpose: reduce the possible false negatives at the boundary of the self region.
Why the issue exists: some self samples may be very close to the boundary.
Main idea: differentiate between internal self samples and boundary self samples.
Solution: combine the advantages of the algorithms described previously for generating variable-sized and constant-sized detectors.
33
How much one sample tells
34
Samples may be on boundary
35
In terms of detectors
36
Comparing three methods (self radius = 0.05)
Figure panels: Constant-sized detectors; V-detector; New algorithm
37
Comparing three methods (self radius = 0.1)
Figure panels: Constant-sized detectors; V-detector; New algorithm
38
Work ongoing
Estimate of coverage using formal statistics. A point estimate is the simplest method.
Two types of statistical inference:
1. Confidence interval
2. Hypothesis testing
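As a small worked illustration of the first two ideas, coverage can be treated as a proportion: sample random non-self points, count how many fall inside some detector, and divide. The sketch below assumes the textbook normal-approximation (Wald) interval with z = 1.96 for roughly 95% confidence; the slides do not say which interval the ongoing work uses, and the function names are illustrative.

```python
import math


def coverage_point_estimate(n_covered, n_sampled):
    """Point estimate of a proportion: covered points over sampled points."""
    return n_covered / n_sampled


def coverage_confidence_interval(n_covered, n_sampled, z=1.96):
    """Normal-approximation interval for the proportion, clipped to [0, 1]."""
    p = n_covered / n_sampled
    half_width = z * math.sqrt(p * (1 - p) / n_sampled)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

For example, if 950 of 1000 sampled points are covered, the point estimate is 0.95 and the interval is about 0.95 plus or minus 0.0135. The approximation becomes unreliable when the estimate is very close to 0 or 1 or when the sample is small.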
39
Point estimate of proportion
40
Summary
1. V-detector uses fewer detectors to obtain similar coverage.
2. Smaller detectors are more acceptable if the total number of detectors is largely controlled.
3. A coverage estimate is superior to a fixed number of detectors.
4. V-detector can deal better with high-dimensional data, including time series.
5. Self radius and estimated coverage are the two control parameters in V-detector.
6. Variable size, variable shape, variable matching rules, or other variable properties of detectors provide encouraging opportunities to enhance the negative selection mechanism.
41
9-17-2004