1 Dissertation Proposal (10/14/2005). Negative selection algorithms: from the thymus to V-detector. Zhou Ji, advised by Prof. Dasgupta.


2 Outline: background of the area; major contributions of the current work; description of the algorithm; demonstration of the software; experimental results; work to do next.

3 Background. AIS (Artificial Immune Systems) has only about 10 years of history. Its main models are negative selection (the development of T cells), immune network theory (how B cells and antibodies interact with each other), and clonal selection (how a pool of B cells, especially memory cells, is developed). New inspirations from immunology include danger theory, the germinal center, etc. Negative selection algorithms are the earliest and most widely used AIS.

4 Biological metaphor of negative selection. How T cells mature in the thymus: the cells are diversified; those that recognize self are eliminated; the rest are used to recognize nonself.

5 The idea of negative selection algorithms (NSA). The problem to deal with: anomaly detection (or one-class classification). Detector set: random generation maintains diversity; censoring eliminates the candidates that match self samples. The concept of feature space and detectors.

6 Outline of a typical NSA. Two stages: generation of the detector set, then anomaly detection (classification of incoming data items).
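The two stages can be sketched as follows. This is a minimal illustration, not the code behind the slides: it assumes a real-valued feature space scaled to the unit hypercube, Euclidean distance, and constant-sized detectors with one shared matching threshold. The function names and all parameter values are hypothetical.

```python
import math
import random

def generate_detectors(self_samples, n_detectors, threshold, dim, rng, max_tries=100_000):
    """Stage 1: random generation plus censoring of candidates that match self."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        # Censoring: discard any candidate that matches a self sample.
        if all(math.dist(candidate, s) > threshold for s in self_samples):
            detectors.append(candidate)
    return detectors

def is_anomaly(item, detectors, threshold):
    """Stage 2: an incoming item is flagged as nonself if any detector matches it."""
    return any(math.dist(item, d) <= threshold for d in detectors)

# Hypothetical self samples clustered around the centre of the unit square.
rng = random.Random(0)
self_samples = [[0.5 + rng.uniform(-0.1, 0.1), 0.5 + rng.uniform(-0.1, 0.1)]
                for _ in range(50)]
detectors = generate_detectors(self_samples, n_detectors=200, threshold=0.1, dim=2, rng=rng)
print(len(detectors), is_anomaly([0.5, 0.5], detectors, 0.1))
```

By construction no training sample is ever flagged, because every surviving detector lies farther than the threshold from all self samples.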

7 Family of NSA. Types of work on NSA: (1) applications, solving real-world problems with a typical version or one adapted to the specific application; (2) improving NSA with new detector schemes and generation methods, and analyzing existing methods (these works are specific to a data representation, mostly binary); (3) establishing a framework for binary representation that includes various matching rules, discussing the uniqueness and usefulness of NSA, and introducing new concepts. What defines a negative selection algorithm? Representation in negative space, one-class learning, and usage of a detector set.

8 Major issues in NSA. Number of detectors: affects the efficiency of generation and detection. Detector coverage: affects the accuracy of detection. Generation mechanisms: affect the efficiency of generation and the quality of the resulting detectors. Matching rules (generalization): how to interpret the training data, depending on the feature space and representation scheme. Issues that are not NSA specific: the difficulty of one-class classification and the curse of dimensionality.

9 V-detector: work done for the proposed dissertation to deal with the issues in NSA. V-detector is a new negative selection algorithm. It draws on a series of related works to develop a more efficient and more reliable algorithm. It has its own process to generate detectors and determine coverage.

10 V-detector's major features: variable-sized detectors; statistical confidence in detector coverage; boundary-aware algorithm; extensibility.

11 Variable-sized detectors in the V-detector method are maximized detectors. Traditional detectors: constant size. V-detector: maximized size. Unanswered question: what is the self space?

12 Why is the idea of variable-sized detectors novel? The rationale of constant size: a uniform matching threshold. Detectors of variable size exist in some negative selection algorithms, but as a different mechanism: allowing multiple or evolving sizes to optimize the coverage (limited by the concern of overlap), or variable size as part of the random properties of detectors/candidates. V-detector uses variable-sized detectors to maximize the coverage with a limited number of detectors: size is decided by the training data; a large nonself region is covered easily; small detectors cover holes; overlap is not an issue in V-detector.
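One way to read "size is decided by the training data" is sketched below. The slide does not give the formula. This sketch assumes Euclidean distance and takes the detector radius to be the distance from the candidate to its nearest self sample minus the self radius, so the detector grows until it touches the self region. The function name is hypothetical.

```python
import math

def make_v_detector(candidate, self_samples, self_radius):
    """Return (center, radius) for a maximized detector, or None if the candidate is in self.

    Assumed rule: radius = distance to the nearest self sample - self_radius.
    """
    nearest = min(math.dist(candidate, s) for s in self_samples)
    radius = nearest - self_radius
    if radius <= 0:
        return None  # the candidate falls inside the self region and is censored
    return (candidate, radius)

self_samples = [[0.5, 0.5]]
print(make_v_detector([0.1, 0.5], self_samples, self_radius=0.1))   # far from self: large detector
print(make_v_detector([0.5, 0.62], self_samples, self_radius=0.1))  # near self: small detector
print(make_v_detector([0.5, 0.55], self_samples, self_radius=0.1))  # inside self: None
```

Under this rule a candidate far from the training data becomes one large detector, while a candidate close to the self region becomes a small one that can fill a hole.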

13 Statistical estimate of detector coverage. Existing works estimate the necessary number of detectors, with no direct relationship between the estimate and the actual detector set obtained. Novelty of V-detector: it evaluates the coverage of the actual detector set. Statistical inference is used as an integrated component of the detector generation algorithm, not to estimate the coverage of a finished detector set.

14 Basic idea leading to the new estimation mechanism. Random points are taken as detector candidates. The probability that a random point falls in a covered region (covered by some existing detectors) reflects the portion that is covered, similar to the idea of Monte Carlo integration. Proportion of covered nonself space = probability that a sample point is a covered point (points in the self region are not counted). When more nonself space has been covered, it becomes less likely that a sample point is an uncovered one. In other words, we need to try more random points to find an uncovered one, one that can be used to make a detector.
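The Monte Carlo idea can be illustrated with a short sketch. It assumes the unit hypercube as the feature space, Euclidean distance, and detectors stored as (center, radius) pairs; the helper names are hypothetical. Points that land in the self region are skipped, as the slide states.

```python
import math
import random

def covered(point, detectors):
    """True if the point lies inside at least one (center, radius) detector."""
    return any(math.dist(point, c) <= r for c, r in detectors)

def in_self(point, self_samples, self_radius):
    """True if the point lies within self_radius of any self sample."""
    return any(math.dist(point, s) <= self_radius for s in self_samples)

def estimate_coverage(detectors, self_samples, self_radius, n_points, dim, rng):
    """Proportion of random nonself points that fall inside some detector."""
    hits = counted = 0
    while counted < n_points:  # assumes the self region does not fill the whole space
        p = [rng.random() for _ in range(dim)]
        if in_self(p, self_samples, self_radius):
            continue  # points in the self region are not counted
        counted += 1
        hits += covered(p, detectors)
    return hits / counted

rng = random.Random(0)
one_detector = [([0.5, 0.5], 0.25)]
# With no self samples this approximates the disc area, pi * 0.25**2.
print(estimate_coverage(one_detector, [], 0.0, 20_000, 2, rng))
```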

15 Statistics involved. Central limit theorem: the sample statistic follows a normal distribution. A sample statistic is used to estimate a population parameter; in our application, the proportion of covered random points estimates the actual proportion of covered area. Point estimate versus confidence interval. Estimate with confidence interval versus hypothesis testing. A proportion close to 100% invalidates the normality assumption of the central limit theorem. Purpose: terminating the detector generation. [Figure: proportion axis from 0 to 1.]

16 Hypothesis testing. Identify the null hypothesis and the alternative hypothesis. Type I error: falsely rejecting the null hypothesis. Type II error: falsely accepting the null hypothesis. The null hypothesis is the statement that we'd rather take as true if there is not strong enough evidence showing otherwise; in other words, we consider a type I error more costly. In terms of the coverage estimate, we consider falsely claiming adequate coverage more costly, so the null hypothesis is: the current coverage is below the target coverage. Choose the significance level: the maximum probability we are willing to accept of making a type I error. Collect a sample and compute its statistic, in this case the proportion. Calculate the z score from the proportion and compare it with the critical value z_alpha. If z is larger, we can reject the null hypothesis and claim adequate coverage with confidence.
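The test can be sketched with the standard one-sided z test for a proportion, z = (p_hat - p0) / sqrt(p0 (1 - p0) / n), where p0 is the target coverage. Whether the slides use exactly this form of the statistic is an assumption; the function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def coverage_is_adequate(n_covered, n_sampled, target, alpha):
    """One-sided z test. H0: coverage <= target. H1: coverage > target.

    Returns (reject_h0, z, z_alpha). The normal approximation is only
    reasonable when n_sampled * target and n_sampled * (1 - target) are
    not too small (a common rule of thumb asks for both to exceed 5).
    """
    p_hat = n_covered / n_sampled
    z = (p_hat - target) / math.sqrt(target * (1 - target) / n_sampled)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for significance level alpha
    return z > z_alpha, z, z_alpha

# 990 of 1000 random points covered, target coverage 95%, significance level 5%.
print(coverage_is_adequate(990, 1000, target=0.95, alpha=0.05))
# 955 of 1000 is above the target but not by enough to reject H0.
print(coverage_is_adequate(955, 1000, target=0.95, alpha=0.05))
```

Because the null hypothesis is "coverage is below target", the generation stops only when the evidence for adequate coverage is strong, which matches the slide's choice of which error is more costly.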

17 Boundary-aware algorithm versus point-wise interpretation: a new concept in negative selection algorithms. Previous works on NSA: the matching threshold is used as the mechanism to control the extent of generalization. However, each self sample is used individually, so the continuous area represented by a group of samples is not captured (point-wise interpretation). More specificity: relatively more aggressive in detecting anomalies. More generalization: the real boundary is extended. Desired interpretation: the area represented by the group of points.

18 Boundary-aware: using the training points as a collection. The boundary-aware algorithm is a clustering mechanism, though represented in negative space. The training data are used as a collection instead of individually. Positive selection cannot do the same thing.

19 V-detector is more than a real-valued negative selection algorithm. V-detector can be implemented for any data representation and distance measure, whereas negative selection algorithms were usually designed with a specific data representation and distance measure. The features we just introduced are not limited by the representation scheme or generation mechanism (as long as we have a distance measure and a threshold to decide matching).

20 V-detector algorithm with confidence in detector coverage

21 V-detector algorithm with confidence in detector coverage

22 V-detector algorithm with confidence in detector coverage

23 V-detector's contributions. Efficiency: fewer detectors, fast generation. Coverage confidence. Extensibility, simplicity.

24 Experiments. Synthetic data: a large pool of synthetic data (2-D real space) is used to understand V-detector's behavior; a more detailed analysis of the influence of various parameters is planned as future work. Real-world data: confirm that it works well enough to detect real-world anomalies; compare with methods dealing with similar problems. Demonstration: what actual training data and detectors look like; basic UI and visualization of the V-detector implementation.

25 Parameters to evaluate its performance: detection rate; false alarm rate; number of detectors.
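The slide does not define the two rates. The sketch below assumes the usual definitions: detection rate is the fraction of truly abnormal items that are flagged, and false alarm rate is the fraction of truly normal items that are flagged. The function name is hypothetical.

```python
def detection_and_false_alarm_rates(flagged, abnormal):
    """Compute (detection rate, false alarm rate) from parallel boolean lists.

    flagged[i]  -- True if the detector set flagged item i as nonself
    abnormal[i] -- True if item i is actually abnormal
    """
    tp = sum(1 for f, a in zip(flagged, abnormal) if f and a)
    fn = sum(1 for f, a in zip(flagged, abnormal) if not f and a)
    fp = sum(1 for f, a in zip(flagged, abnormal) if f and not a)
    tn = sum(1 for f, a in zip(flagged, abnormal) if not f and not a)
    detection_rate = tp / (tp + fn) if tp + fn else float("nan")
    false_alarm_rate = fp / (fp + tn) if fp + tn else float("nan")
    return detection_rate, false_alarm_rate

flagged = [True, True, False, False, True]
abnormal = [True, False, True, False, True]
print(detection_and_false_alarm_rates(flagged, abnormal))
```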

26 Control parameters and algorithm variations. Self radius (key parameter). Target coverage. Significance level (of hypothesis testing). Boundary-aware versus point-wise. Hypothesis testing versus naïve estimate. Reuse of random points versus minimum detector set (to be implemented).

27 Data's influence on performance. Specific shape: intuitively, corners will affect the results. Number of training points: major influence.

28 Synthetic data (intersection and pentagram): comparing naïve estimate and hypothesis testing. [Figure panels: intersection shape; pentagram.]

29 Synthetic data: results for different shapes of self region

30 Synthetic data (ring): comparing boundary-aware and point-wise. [Figure panels: detection rate; false alarm rate.]

31 Synthetic data (cross-shaped self): balance of errors

32 Real-world data: biomedical data; pollution data; ball bearing (preprocessed time series data); others: Iris data, gene data, India Telugu.

33 Results of biomedical data. [Table: detection rate, false alarm rate, and number of detectors (mean and SD of each), by training data (100% and two other training percentages) and by algorithm (MILA, NSA, and two "r =" settings). The numeric values are missing from the transcript.]

34 Results of air pollution data. [Figure panels: detection rate and false alarm rate; number of detectors.]

35 Ball bearing data. Raw data: time series of acceleration measurements. Preprocessing (from time domain to representation space for detection): 1. FFT (Fast Fourier Transform) with Hanning windowing (the window size is missing from the transcript). 2. Statistical moments: up to 5th order. [Figure: example of raw data (new bearings, first 1000 points).]
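The two preprocessing routes can be sketched in plain Python. Several details are assumptions, since the slide does not preserve them: the series is cut into non-overlapping windows, the window size and number of coefficients are hypothetical parameters, the FFT is replaced by a naive DFT for self-containment, and "moments up to 5th order" is read as the mean followed by the central moments of order 2 to 5.

```python
import cmath
import math

def split_windows(series, size):
    """Cut a time series into consecutive non-overlapping windows (assumed scheme)."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]

def hann_window(n):
    """Hann (Hanning) window coefficients of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitudes(frame, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of the Hann-windowed frame."""
    n = len(frame)
    x = [v * w for v, w in zip(frame, hann_window(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n_coeffs)]

def statistical_moments(frame, max_order=5):
    """Mean, then central moments of order 2..max_order (assumed definition)."""
    n = len(frame)
    mean = sum(frame) / n
    return [mean] + [sum((v - mean) ** k for v in frame) / n
                     for k in range(2, max_order + 1)]

# Hypothetical signal: a pure tone with 4 cycles per 32-sample window.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
for frame in split_windows(signal, 32):
    print(dft_magnitudes(frame, 8), statistical_moments(frame))
```

Each window becomes one point in the representation space, so the dimension of that space equals the number of coefficients or moments kept.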

36 Ball bearing experiments with two different preprocessing techniques

37 Results of Iris data. [Table: detection rate and false alarm rate of MILA, NSA (single level), and V-detector, with Setosa, Versicolor, or Virginica as normal, each at 100% and 50% training data. Most numeric values are missing from the transcript; the only surviving entries are 100 and 0 for NSA (single level) with Setosa 100%.]

38 Work to do next. Extension to different data representations. Searching for real-world applications. Comparison with other methods, e.g. SVM. Analysis of the influence of control parameters and algorithm variations.

39 Publications
Dasgupta, Ji, Gonzalez, Artificial immune system (AIS) research in the last five years, CEC 2003.
Ji, Dasgupta, Augmented negative selection algorithm with variable-coverage detectors, CEC 2004.
Ji, Dasgupta, Real-valued negative selection algorithm with variable-sized detectors, GECCO 2004.
Ji, Dasgupta, Estimating the detector coverage in a negative selection algorithm, GECCO 2005.
Ji, A boundary-aware negative selection algorithm, ASC 2005.
Ji, Dasgupta, Revisiting negative selection algorithms, submitted to the Evolutionary Computation Journal.
Ji, Dasgupta, An efficient negative selection algorithm of probably adequate coverage, submitted to SMC.

40 Questions and comments? Thank you!

41 What is a matching rule? It determines when a sample and a detector are considered matching. The matching rule plays an important role in a negative selection algorithm. It largely depends on the data representation.

42 In real-valued representation, a detector can be visualized as a hypersphere. Candidate 1: thrown away; candidate 2: made a detector. Match or not match?
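To show how the matching rule depends on the representation, here are two rules side by side. The hypersphere rule follows the slide (Euclidean distance is an assumption). The r-contiguous-bits rule is a commonly cited rule for binary strings and is not named in these slides; it is included only for contrast. The function names are hypothetical.

```python
import math

def hypersphere_match(sample, center, radius):
    """Real-valued rule: the sample matches if it lies inside the detector's hypersphere."""
    return math.dist(sample, center) <= radius

def r_contiguous_match(sample_bits, detector_bits, r):
    """Binary rule: match if the two strings agree in at least r contiguous positions."""
    run = 0
    for a, b in zip(sample_bits, detector_bits):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False

print(hypersphere_match([0.2, 0.2], [0.25, 0.2], radius=0.1))
print(r_contiguous_match("10110", "00111", r=3))
print(r_contiguous_match("10110", "00111", r=4))
```

In both cases the threshold (the radius or r) controls how far a detector generalizes beyond a single point.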

43 Experiments and results. Synthetic data: 2-D; training data are randomly chosen from the normal region. Fisher's Iris data: one of the three types is considered normal. Biomedical data: abnormal data are the medical measurements of disease-carrier patients. Air pollution data: abnormal data are made by artificially altering the normal air measurements. Ball bearings: time series measurements with preprocessing (30-D and 5-D).

44 Synthetic data: cross-shaped self space. Shape of self region and example detector coverage. [Figure panels: (a) actual self space; (b) self radius = 0.05; (c) self radius = 0.1.]

45 Synthetic data: cross-shaped self space, results. [Figure panels: detection rate and false alarm rate; number of detectors.]

46 Synthetic data: ring-shaped self space. Shape of self region and example detector coverage. [Figure panels: (a) actual self space; (b) self radius = 0.05; (c) self radius = 0.1.]

47 Synthetic data: ring-shaped self space, results. [Figure panels: detection rate and false alarm rate; number of detectors.]

48 Iris data. Comparison with other methods: number of detectors. [Table: mean, max, min, and SD of the number of detectors for Setosa, Versicolor, and Virginica, each at 100% and 50% training data. The numeric values are missing from the transcript.]

49 Iris data: Virginica as normal, 50% of the points used to train. [Figure panels: detection rate and false alarm rate; number of detectors.]

50 Biomedical data. Blood measurements for a group of 209 patients. Each patient has four different types of measurement. 75 patients are carriers of a rare genetic disorder; the others are normal.

51 Biomedical data. [Figure panels: detection rate and false alarm rate; number of detectors.]

52 Air pollution data. There are 60 original records in total, each with 16 different measurements concerning air pollution. All the real data are considered normal. More data are made artificially: 1. Decide the normal range of each of the 16 measurements. 2. Randomly choose a real record. 3. Change three randomly chosen measurements within a larger-than-normal range. 4. If some of the changed measurements are out of the normal range, the record is considered abnormal; otherwise it is considered normal. In total, 1000 records including the original 60 are used as test data. The original 60 are used as training data.
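The four steps can be sketched as follows. Two details are assumptions not given on the slide: the normal range of a measurement is taken as its minimum and maximum over the real records, and the "larger than normal" range extends that interval by a hypothetical fraction `expand` on each side. The stand-in records are random numbers, not the actual data.

```python
import random

def normal_ranges(records):
    """Step 1: per-measurement (min, max) over the real records (assumed definition)."""
    dims = len(records[0])
    return [(min(r[d] for r in records), max(r[d] for r in records)) for d in range(dims)]

def make_record(records, ranges, rng, n_changes=3, expand=0.5):
    """Steps 2 to 4: alter n_changes measurements of a random real record and label it."""
    record = list(rng.choice(records))               # step 2
    abnormal = False
    for d in rng.sample(range(len(record)), n_changes):
        lo, hi = ranges[d]
        margin = expand * (hi - lo)                  # hypothetical widening of the range
        record[d] = rng.uniform(lo - margin, hi + margin)  # step 3
        if not lo <= record[d] <= hi:                # step 4
            abnormal = True
    return record, abnormal

def build_test_set(records, total, seed=0):
    """Original records (all normal) plus artificial ones, up to `total` records."""
    rng = random.Random(seed)
    ranges = normal_ranges(records)
    data = [(list(r), False) for r in records]
    while len(data) < total:
        data.append(make_record(records, ranges, rng))
    return data

# Stand-in for the 60 real records with 16 measurements each.
rng = random.Random(1)
real = [[rng.uniform(0, 10) for _ in range(16)] for _ in range(60)]
test_data = build_test_set(real, total=1000)
print(len(test_data), sum(label for _, label in test_data))
```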

53 Example of data (FFT of new bearings): first 3 coefficients of the first 100 points

54 Example of data (statistical moments of new bearings): moments up to 3rd order of the first 100 points

55 Ball bearing structure and damage. [Figure: damaged cage.]

56 Ball bearing data: results. Columns of both tables: ball bearing condition; total number of data points; number of detected anomalies; percentage detected. Most values are missing from the transcript.
Preprocessed with FFT. New bearing (normal): 2739; 0; 0%. Outer race completely broken: missing. Broken cage with one loose element: missing. Damaged cage, four loose elements: missing. No evident damage; badly worn: missing.
Preprocessed with statistical moments. New bearing (normal): 2651; 0; 0%. Outer race completely broken: missing. Broken cage with one loose element: missing. Damaged cage, four loose elements: 2892; 0; 0%. No evident damage; badly worn: 2892; 0; 0%.

57 Ball bearing data: performance summary

58 How much one sample tells

59 Samples may be on boundary

60 In terms of detectors

61 Comparing three methods (self radius = 0.05). [Figure panels: constant-sized detectors; V-detector; new algorithm.]

62 Comparing three methods (self radius = 0.1). [Figure panels: constant-sized detectors; V-detector; new algorithm.]

63 Experiments on 2-D synthetic data. [Figure panels: training points (1000); test data (1000 points) and the real shape we try to learn.]

64 Detector sets generated. [Figure panels: trained with 1000 points; trained with 100 points.]
