Download presentation
Presentation is loading. Please wait.
Published byMaria Cox Modified over 9 years ago
1
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Artificial Immune Systems and Data Mining: Bridging the Gap with Scalability and Improved Learning Olfa Nasraoui, Fabio González Cesar Cardona, Dipankar Dasgupta The University of Memphis A Demo/Poster at the National Science Foundation Workshop on Next Generation Data Mining, Nov. 2002
2
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Inspired by Nature… living organisms exhibit extremely sophisticated learning and processing abilities that allow them to survive and proliferate living organisms exhibit extremely sophisticated learning and processing abilities that allow them to survive and proliferate nature has always served as inspiration for several scientific and technological developments, exp: Neural Networks, Evolutionary Computation nature has always served as inspiration for several scientific and technological developments, exp: Neural Networks, Evolutionary Computation immune system: parallel and distributed adaptive system w/ tremendous potential in many intelligent computing applications. immune system: parallel and distributed adaptive system w/ tremendous potential in many intelligent computing applications.
3
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD What is the Immune System? Protects our bodies from foreign pathogens (viruses/bacteria) Protects our bodies from foreign pathogens (viruses/bacteria) Innate Immune System (initial, limited, ex: skin, tears, …etc) Innate Immune System (initial, limited, ex: skin, tears, …etc) Acquired Immune System (Learns how to respond to NEW threats adaptively) Acquired Immune System (Learns how to respond to NEW threats adaptively) Primary immune response Primary immune response First response to invading pathogens First response to invading pathogens Secondary immune response Secondary immune response Encountering similar pathogen a second time Encountering similar pathogen a second time Remember past encounters Remember past encounters Faster and stronger response than primary response Faster and stronger response than primary response
4
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Points of Strength of The Immune System Recognition (Anomaly detection, Noise tolerance) Recognition (Anomaly detection, Noise tolerance) Robustness (Noise tolerance) Robustness (Noise tolerance) Feature extraction Feature extraction Diversity (can face an entire repertoire of foreign invaders) Diversity (can face an entire repertoire of foreign invaders) Reinforcement learning Reinforcement learning Memory (remembers past encounters: basis for vaccine) Memory (remembers past encounters: basis for vaccine) Distributed Detection (no single central system) Distributed Detection (no single central system) Multi-layered (defense mechanisms at multiple levels) Multi-layered (defense mechanisms at multiple levels) Adaptive (Self-regulated) Adaptive (Self-regulated)
5
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Major Players: B-Cells Through a process of recognition and stimulation, B-Cells will clone and mutate to produce a diverse set of antibodies adapted to different antigens B-Cells secrete antibodies w/ paratopes that can bind to specific antigens (epitopes) and destroy their host invading agent through a KILL, SUICIDE, or INGEST signal. B-Cells antibody paratopes also can bind to antibody idiotopes on other B-Cells, hence sending a STIMULATE or SUPPRESS signal hence the Network Memory
6
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Requirements for Clustering Data Streams (Barbara, 02) Compactness of representation Compactness of representation Network of B-cells: each cell can recognize several antigens Network of B-cells: each cell can recognize several antigens B-cells compressed into clusters/sub-networks B-cells compressed into clusters/sub-networks Fast incremental processing of new data points Fast incremental processing of new data points New antigen influences only activated sub-network New antigen influences only activated sub-network Activated cells updated incrementally Activated cells updated incrementally Proposed approach learns in 1 pass. Proposed approach learns in 1 pass. Clear and fast identification of “outliers” Clear and fast identification of “outliers” New antigen that does not activate any subnetwork is a potential outlier create new B-cell to recognize it New antigen that does not activate any subnetwork is a potential outlier create new B-cell to recognize it This new B-cell could grow into a subnetwork (if it is stimulated by a new trend) or die/move to disk (if outlier) This new B-cell could grow into a subnetwork (if it is stimulated by a new trend) or die/move to disk (if outlier)
7
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD General Architecture 1-Pass Adaptive Immune Learning Evolving Immune Network (compressed into subnetworks) Evolving data Immune network information system Stimulation (competition & memory) Age (old vs. new) Outliers (based on activation) ?
8
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Internal and External Immune Interactions: Before & After Internal Immune Interactions Lifeline of B-cell Internal Stimulation External Stimulation
9
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Continuous Immune Learning Initialize ImmuNet and MaxLimitPresent NEW antigen data Update subNet* ‘s ARBs’ stimulations Clone and Mutate ARBs Trap Initial Data Compress ImmuNet Compress ImmuNet into K subNet’s Compute soft activations in subNet* Update subNet* ‘s ARB Influence range /scale Identify nearest subNet* Kill lethal ARBs #ARBs > MaxLimit? Kill extra ARBs (based on age/stimulation strategy) OR increase acuteness of competition OR Move oldest patterns to aux. storage Memory Constraints No Start/Reset Secondary storage Outlier? ImmuNet Stat’s & Visualization Activates ImmuNet? Clone antigen Yes No Yes Domain Knowledge Constraints
10
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Model for Artificial Immune Cell Antigens represent data and the B-Cells represent clusters or patterns to be learned/extracted Antigens represent data and the B-Cells represent clusters or patterns to be learned/extracted ARB/B-cell object: ARB/B-cell object: Represents not just a single item, but a fuzzy set Represents not just a single item, but a fuzzy set Better Approximate Reasoning abilities Better Approximate Reasoning abilities Each ARB is allowed to have is own zone of influence with size/scale: i Each ARB is allowed to have is own zone of influence with size/scale: i ARBs dynamically adapt their influence zones/hence stimulation level in a strife for survival. ARBs dynamically adapt their influence zones/hence stimulation level in a strife for survival. Membership function dynamically adapts to data Membership function dynamically adapts to data Outliers are easily detected through weak activations Outliers are easily detected through weak activations No more dependence on hard threshold-cuts to establish network No more dependence on hard threshold-cuts to establish network Can include most probabilistic and possibilistic models of uncertainty Can include most probabilistic and possibilistic models of uncertainty Flexible for different attributes types (numerical, categorical, …etc) Flexible for different attributes types (numerical, categorical, …etc)
11
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD Immune Based Learning of Web profiles The Web server plays the role of the human body, and the incoming requests play the role of antigens that need to be detected The Web server plays the role of the human body, and the incoming requests play the role of antigens that need to be detected The input data is similar to web log data (a record of all files/URLs accessed by users on a Web site) The input data is similar to web log data (a record of all files/URLs accessed by users on a Web site) The data is pre-processed to produce session lists: The data is pre-processed to produce session lists: A session list S i for user #i is a list of URLs visited by same user A session list S i for user #i is a list of URLs visited by same user In discovery mode, a session is fed to the learning system as soon as it is available In discovery mode, a session is fed to the learning system as soon as it is available B-cell i : i th candidate profile: List of URLs Historic Evidence/Support: List of supporting cumulative conditional probabilities (URL k, prob(URL k )) with prob(URL k ) = prob(URL k | B-cell i ) Each profile has its own influence zone defined by i
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.