Artificial Immune based Approach to Association Rule Mining By: B. Hoda Helmi Supervisor: Adel T. Rahmani January 2008 A Thesis Submitted in Partial Fulfillment.

Presentation transcript:

Artificial Immune based Approach to Association Rule Mining By: B. Hoda Helmi Supervisor: Adel T. Rahmani January 2008 A Thesis Submitted in Partial Fulfillment of the Requirement for the Degree of Master of Science in Artificial Intelligence-Computer Engineering 1

Outline: The Immune System (Natural and Artificial); Association Rules; Web Usage Mining; Proposed Algorithm (AISWUM); Results and Conclusion. 2

Natural Immune System. Immune system: a system that protects the body from foreign substances and pathogenic organisms. Antibody: the immune system creates antibodies, which match the antigens and cause the pathogens to be destroyed. Antigen: substances capable of starting a specific immune response (viruses, bacteria, fungi). 3

A High Level Overview 4

Natural Immune System (diagram labels): Immunity; Innate; Danger Theory; Adaptive; Clonal Selection; Network Theory; Affinity Maturation; Hypermutation. 5

Innate versus Adaptive IS. Innate: immediately available for combat. Adaptive: antibody production specific to a determined infectious agent. 6

Adaptive Immunity (figure labels): epitope; low-affinity receptor; structurally similar, high affinity. 7

Clonal Selection & Affinity Maturation 8

Network Theory. Idiotypic network (Jerne, 1974): B cells stimulate each other, which creates an immunological memory. Figure labels: Ag; stimulation (positive response); suppression (negative response). 9

Danger Theory 10

Artificial Immune System. A framework for AIS (layers shown in the figure): Representation; Affinity; Algorithms. 11

Association Rules. Set of items: $I=\{I_1, I_2, \dots, I_m\}$. Transactions: $D=\{t_1, t_2, \dots, t_n\}$, $t_j \subseteq I$. Itemset: $\{I_{i1}, I_{i2}, \dots, I_{ik}\} \subseteq I$. Large (frequent) itemset: an itemset whose number of occurrences is above a threshold. Support of an itemset: the percentage of transactions that contain that itemset. 12

Given a set of items $I=\{I_1, I_2, \dots, I_m\}$ and a database of transactions $D=\{t_1, t_2, \dots, t_n\}$, where $t_i=\{I_{i1}, I_{i2}, \dots, I_{ik}\}$ and $I_{ij} \in I$, the association rule problem is to identify all association rules $X \Rightarrow Y$ with a minimum support and confidence. 13

Association Rule Mining Steps: (1) Find frequent itemsets (the challenging step in association rule mining). (2) Generate rules from the frequent itemsets. 14
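The two steps can be illustrated with a minimal brute-force sketch. It is not the thesis algorithm (which uses an artificial immune system); it only makes the definitions of support, frequent itemset and rule concrete. The confidence of X ⇒ Y is taken as support(X ∪ Y) / support(X), the usual definition, and the function names and toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: enumerate all itemsets whose support reaches `min_support`."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can exist.
            break
    return frequent


def rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into X => Y and keep confident rules."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent.
                confidence = s / frequent[lhs]
                if confidence >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), s, confidence))
    return out


# Hypothetical toy transactions (sets of item IDs).
transactions = [frozenset(t) for t in ({0, 1, 2}, {0, 1}, {1, 2}, {0, 1, 2, 3})]
freq = frequent_itemsets(transactions, min_support=0.5)
strong = rules(freq, min_confidence=0.9)
```

The exhaustive enumeration in step 1 grows exponentially with the number of items, which is why that step dominates the cost.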

Goal. In this project, our goal is to find all the frequent itemsets in Web usage data using an artificial immune system. 15

Web Usage Mining. Web usage mining, also known as Web log mining, applies mining techniques to discover interesting usage patterns from the secondary data derived from users' interactions while surfing the Web. 16

Web Usage Mining Applications: target potential customers for electronic commerce; enhance the quality and delivery of Internet information services to the end user; improve Web server system performance; identify potential prime advertisement locations; facilitate personalization and adaptive sites; improve site design; detect fraud and intrusion; predict users' actions (allows prefetching). 17

Motivations (of choosing this application) 18

WUM Definitions. Web logs: the set of all accesses to the URLs of a Web site, stored on the Web server. Session: a sequence of URLs accessed by a user in one visit to the Web site (itemset). Strong trend: crowded paths that are frequently traversed by users (frequent itemsets). 19

Web Log
O: || T:1997/09/12-22:43:00 || U:/ || R:
O: || T:1997/09/12-22:50:27 || U:/categories/software/ || R:
O: || T:1997/09/12-22:50:38 || U:/categories/software/Windows/ || R:
O: || T:1997/09/12-22:50:47 || U:/categories/software/Windows/V909V03.TXT || R: ows/
O: || T:1997/09/12-22:51:06 || U:/categories/software/Windows/ || R:
20

Session Construction
URL                                          ID
/                                            0
/categories/software/                        1
/categories/software/Windows/                2
/categories/software/Windows/V909V03.TXT     3
/categories                                  4
/manufacturers                               5
/samples.html/                               6
/gearlists/                                  7
/features/                                   8
/ecards/
Other columns in the figure: Duration, Frequency. 21

Representation. Antibody (strong trend): binary vector [URL1 (0/1), URL2 (0/1), …, URLm (0/1)]. Antigen (incoming session): binary vector of the same form, [URL1 (0/1), URL2 (0/1), …, URLm (0/1)]. 22
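As a minimal sketch of this representation, a session can be turned into a 0/1 vector with one position per URL ID. The function name `encode_session` and the choice to ignore unknown URLs are assumptions made for illustration; the URL-to-ID mapping reuses the first entries of the session construction slide.

```python
def encode_session(session_urls, url_ids):
    """Return a 0/1 list of length m; position i is 1 if the URL with ID i was visited."""
    vector = [0] * len(url_ids)
    for url in session_urls:
        if url in url_ids:  # assumption: URLs outside the vocabulary are ignored
            vector[url_ids[url]] = 1
    return vector


# URL-to-ID mapping (first five entries of the session construction table).
url_ids = {
    "/": 0,
    "/categories/software/": 1,
    "/categories/software/Windows/": 2,
    "/categories/software/Windows/V909V03.TXT": 3,
    "/categories": 4,
}

# The five requests shown on the Web log slide, treated here as one session.
session = [
    "/",
    "/categories/software/",
    "/categories/software/Windows/",
    "/categories/software/Windows/V909V03.TXT",
    "/categories/software/Windows/",
]
antigen = encode_session(session, url_ids)  # [1, 1, 1, 1, 0]
```

Repeated visits to the same URL collapse to a single 1, so the vector records which pages were visited, not how often or for how long.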

Scenario. (1) An antigen enters the body. (2) Determine whether the first signal is produced (two signals are needed for an antigen to trigger the AIS; the first signal is produced if the antigen is harmful to the body). (3) If the first signal is produced, present the antigen to the antibodies and compute the distance, weight and influence zone. (4) Determine the antibody with the maximum weight. (5) If the maximum weight exceeds the threshold, compute the stimulation level (SL) and influence zone (IZ) for that antibody; otherwise create a new antibody by duplication. (6) Clone and mutate. 23

Danger Signal. Danger theory (two-signal approach): if the antigen is harmful, trigger an IS response; otherwise discard the antigen. In the data mining context, harmful corresponds to interesting (valid). What is the danger signal in our system? We should find a measure to determine the validity of sessions. 24

Validity Measure 25

Validity Measure 26

Affinity Measure What affinity measure is used in our proposed algorithm? 27

Affinity Measure. The weight function decreases with distance from the antigen/data location. A scale parameter controls the decay rate of the weights along the spatial dimensions. 28
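The formula itself is not preserved in the transcript. The sketch below therefore assumes a Gaussian-style decay, which is one common way to obtain a weight that falls off with distance at a rate set by a scale parameter. The name `weight`, the parameter `sigma2` and the exact functional form are assumptions, not the thesis definition.

```python
import math


def weight(distance, sigma2):
    """Assumed weight: 1 at distance 0, decaying as the distance grows.

    `sigma2` plays the role of the scale parameter: larger values give
    a slower decay, smaller values a sharper one.
    """
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    return math.exp(-(distance * distance) / (2.0 * sigma2))


near = weight(0.2, sigma2=0.5)
far = weight(0.8, sigma2=0.5)
far_wide_scale = weight(0.8, sigma2=2.0)
```

With these inputs `near` exceeds `far`, and widening the scale raises the weight of the distant point, which is the behaviour the slide describes.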

Stimulation Level 29

Weighted Stimulation 30

Network Stimulation & Suppression 31

Cloning. Antibodies are cloned in proportion to their stimulation levels relative to the average network stimulation. To avoid premature proliferation of antibodies and to encourage a diverse repertoire, new antibodies do not clone before they are mature (that is, before their age exceeds a threshold). 32
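A minimal sketch of this cloning rule follows. The slide gives no exact formula, so the proportionality is assumed here to be the integer part of stimulation divided by the network average, with a hypothetical cap `max_clones`. The dictionary keys are illustrative names.

```python
def clone_counts(antibodies, age_threshold, max_clones=3):
    """Return the number of clones each antibody would produce.

    `antibodies` is a list of dicts with 'stimulation' and 'age' keys.
    Antibodies whose age does not exceed `age_threshold` are immature
    and do not clone.
    """
    if not antibodies:
        return []
    average = sum(ab["stimulation"] for ab in antibodies) / len(antibodies)
    counts = []
    for ab in antibodies:
        if ab["age"] <= age_threshold or average <= 0:
            counts.append(0)
        else:
            # Assumed rule: clones proportional to stimulation / average, capped.
            counts.append(min(max_clones, int(ab["stimulation"] / average)))
    return counts


# Hypothetical network: average stimulation is 6.0.
network = [
    {"stimulation": 1.0, "age": 9},
    {"stimulation": 2.0, "age": 9},
    {"stimulation": 9.0, "age": 9},
    {"stimulation": 12.0, "age": 9},
    {"stimulation": 6.0, "age": 1},  # immature: would otherwise get one clone
]
counts = clone_counts(network, age_threshold=5)  # [0, 0, 1, 2, 0]
```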

Hypermutation. Somatic hypermutation is a powerful natural exploration mechanism in the IS that allows it to learn how to respond to new antigens that have never been seen before. It is a very costly and inefficient operation, since its complexity is exponential in the number of features. We model this operation in the AIS by an instant antigen duplication whenever an antigen is encountered that fails to activate the entire immune network. 33
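The duplication rule can be sketched as follows, assuming that an antigen "activates" the network when the best-matching antibody's weight exceeds a threshold (as in the scenario slide). The function names and the Jaccard-style weight are hypothetical stand-ins, since the thesis formulas are not in the transcript.

```python
def overlap(a, b):
    """Hypothetical weight: Jaccard overlap of the visited URLs in two 0/1 vectors."""
    both = sum(x and y for x, y in zip(a, b))
    either = sum(x or y for x, y in zip(a, b))
    return both / either if either else 0.0


def present_antigen(antigen, antibodies, weight_fn, activation_threshold):
    """Present one antigen to the network.

    Returns (index, duplicated): the index of the activated antibody, or,
    if no antibody is activated, the index of a new antibody created as a
    copy of the antigen.
    """
    best_index, best_weight = None, 0.0
    for i, antibody in enumerate(antibodies):
        w = weight_fn(antibody, antigen)
        if w > best_weight:
            best_index, best_weight = i, w
    if best_index is not None and best_weight > activation_threshold:
        return best_index, False
    antibodies.append(list(antigen))  # instant antigen duplication
    return len(antibodies) - 1, True


antibodies = [[1, 1, 0, 0]]
first = present_antigen([1, 1, 1, 0], antibodies, overlap, 0.5)   # activates antibody 0
second = present_antigen([0, 0, 0, 1], antibodies, overlap, 0.5)  # no match: duplicated
```

Copying the antigen costs one list copy, in contrast with a search over mutated feature combinations.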

Directed Mutation. Antibodies that are added to the population via mutation are always superior individuals. In this mutation mechanism, whenever the system realizes that there are not enough good antibodies to confront the antigens, new antibodies are added to the population. It is a new form of danger theory. The directed mutation mechanism is as follows: 34

Directed Mutation (figure: Web log entering the system) 35

Decide to Mutate (after some time)

Mutation Occurs (after some time)

Directed Mutation. Directed mutation is not computationally complex. It does not cause antibodies to be destroyed before they are due to leave the population. It makes the system intelligent: the system can decide when to create new individuals. Directed mutation happens after every T antigens have entered the system. 45

Compression. Compression: cluster the antibody population into k clusters. External interactions: those occurring between an antigen (external agent) and an antibody in the immune network. Internal interactions: those occurring between one antibody and all other antibodies in the immune network. The most expensive computation and storage overhead stems from calculating and storing all the internal network interactions (quadratic complexity with respect to the network size). After compression: ◦ internal interactions: ◦ external interactions: k. Choosing an appropriate number of clusters. 46

Algorithm Visualization

Pseudocode 71

Data. Data set 1: one week of HTTP requests to the Music Machine Web site (requests, sessions, URLs). Data set 2: one week of HTTP requests to the University of Saskatchewan's WWW server (requests, sessions, URLs). 72

Ground Profiles. To evaluate the learned profiles, it should be shown that they are good representatives of the input data (the summarization ability of AISWUM). To show this ability, the distribution of the learned profiles should be compared with that of the input data, so we need some ground profiles. Ground profiles are extracted using Scalable K-Means. 73

Evaluation Metrics 74

Results (Music Machine) Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t. 75

Precision Distribution of precise antibodies per input category at time t. 76

Coverage Distribution of complete antibodies per input category at time t. 77

Results (Saskatchewan University) Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t. 78

Precision Distribution of precise antibodies per input category at time t. 79

Coverage Distribution of complete antibodies per input category at time t. 80

Evaluation Metrics. Precision: the overall level of precision of the learned antibodies with respect to the input data, that is, the ratio of learned antibodies that accurately represent the past input data to all learned antibodies. 81

Evaluation Metrics. Coverage: the overall coverage of the learned antibodies with respect to the input data, that is, the ratio of past input data that are summarized accurately by antibodies to all input data. 82
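Both ratios can be sketched in a few lines. The transcript does not say how "accurately represents" is decided, so the sketch assumes a Jaccard similarity with a hypothetical threshold `min_sim`. Antibodies and sessions are given as sets of URL IDs.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of URL IDs (assumed matching criterion)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def precision(antibodies, sessions, min_sim):
    """Share of antibodies that match at least one past session."""
    if not antibodies:
        return 0.0
    hits = sum(any(jaccard(ab, s) >= min_sim for s in sessions) for ab in antibodies)
    return hits / len(antibodies)


def coverage(antibodies, sessions, min_sim):
    """Share of past sessions that are matched by at least one antibody."""
    if not sessions:
        return 0.0
    hits = sum(any(jaccard(ab, s) >= min_sim for ab in antibodies) for s in sessions)
    return hits / len(sessions)


# Hypothetical learned antibodies and past sessions.
antibodies = [{0, 1, 2}, {7, 8}]
sessions = [{0, 1, 2}, {0, 1}, {4, 5}, {0, 1, 2, 3}]
p = precision(antibodies, sessions, min_sim=0.6)  # 0.5
c = coverage(antibodies, sessions, min_sim=0.6)   # 0.75
```

In this toy case one of the two antibodies matches no session, so precision is 0.5, while three of the four sessions are matched by some antibody, so coverage is 0.75.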

Results (Music Machine). Precision: ratio of learned antibodies that accurately represent past input data to all learned antibodies. Coverage: ratio of past input data that are summarized accurately by antibodies to all input data. 83

Results (Saskatchewan). Precision: ratio of learned antibodies that accurately represent past input data to all learned antibodies. Coverage: ratio of past input data that are summarized accurately by antibodies to all input data. 84

Results (contentment of 50 users)
           Maximum   Minimum   Average
State 1    41%       15%       28%
State 2    60%       40%       51%
State 3    67%       45%       56%
Configuration of each state (columns: Danger Theory, Weighted Items, Weighted Sessions): State 1: No. State 2: Yes, No. State 3: Yes. 85

Run Time. One scan of the data with non-optimized C++ code on a Pentium 4 PC took less than 6 min for the first dataset and less than 3 min for the second dataset. 86

Comparison with other methods
Methods: AIS-WUM, SKM, DBSCAN, BIRCH, aiNet, Fuzzy AIS, SOSDM
Reliability/insensitivity to initial conditions: Yes, No, Yes, No, Yes
Noise tolerance: Yes, No, Yes, No, Yes, Moderately
Need to scan before learning: No, Yes, No
Time complexity: O(N), O(N log N), O(N), O(N²), O(N)
Buffer data: No, Yes
Number of clusters specified: No, Yes, No, Yes, No, Yes
Handle evolving clusters: Yes, No, Yes
Automatic scale estimation: Yes, No, Yes, No
Clustering model: Network, Centroids, Medoids, Centroids, Network
Handle different similarity measures: Yes, No, Yes, No, Yes
Density/partition based: Density, Partition/Distance, Density, Partition, Partition/Distance, Density, Partition/Distance
87

Novelties of the proposed algorithm: low computational complexity; danger theory in two forms (directed mutation, weighted stimulation); learning the data in a single pass; natural mechanism; applicable to stream data; bi-functionality (frequent itemset mining plus finding centroids of clusters in large datasets); clear and fast identification of outliers. 88

Conclusion. A robust and scalable algorithm for frequent itemset mining was designed that is well suited to noisy, sparse data such as Web usage data. 89

Conclusion. The main factor behind the ability of the proposed algorithm to learn in a single pass lies in the richness of the immune network structure, which forms a dynamic synopsis of the data, and in danger theory, which decides which antigens are dangerous and when new antibodies are needed to combat antigens. 90

Publications
B. Hoda Helmi, Adel T. Rahmani, Nona Helmi, "An Evolutionary Control Model for a Generic Multiagent System Using Artificial Immune Systems", in Proceedings of the First Joint Congress on Fuzzy and Intelligent Systems, 2007, Ferdowsi University.
B. Hoda Helmi, Adel T. Rahmani, "Image Segmentation with a New Texture Feature Based on AIS", in Proceedings of the First Conference on Data Mining, AmirKabir University, 2007, Tehran, Iran. (in Farsi)
B. Hoda Helmi, Adel T. Rahmani, "An AIS Algorithm for Web Usage Mining with Directed Mutation", accepted in IEEE World Congress on Computational Intelligence, CEC division, 2008, Hong Kong.
B. Hoda Helmi, Adel T. Rahmani, "An Enhanced AIS for WUM, inspired by Danger Theory", submitted to ICEE 2008, Tarbiat Modarres University, 2008, Tehran, Iran. (in Farsi)
91

Publications
Adel T. Rahmani, B. Hoda Helmi, "EIN-WUM an AIS-based Algorithm for Web Usage Mining", submitted to Genetic and Evolutionary Computation Conference, 2008, Atlanta, Georgia.
B. Hoda Helmi, Adel T. Rahmani, "A New Web Usage Mining Method based on An Artificial Immune System Solution with Enhanced Network and Danger Theory", submitted to International Journal of Control, Automation, and Systems.
B. Hoda Helmi, Adel T. Rahmani, "Evolutionary based Combining of Evolved Neural Network Classifiers", accepted in IASTED International Conference on Signal Processing, Pattern Recognition and Applications, 2006, Austria. (unrelated)
92

The End. Thanks. 93

Somatic Hypermutation 94

Cross Reaction 95

Artificial Immune System. A framework for AIS (Representation; Affinity; Algorithms). Representation (shape-space): binary; integer; real-valued; symbolic. 96

Artificial Immune System. A framework for AIS (Representation; Affinity; Algorithms). Affinity: Euclidean; Manhattan; Hamming; … 97
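The three named distances can be written out for equal-length vectors as a minimal sketch. The function names and the sample vectors are illustrative only; which distance the proposed algorithm actually uses is not stated on this slide.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical binary antibody and antigen that differ in two positions.
antibody = [1, 0, 1, 1]
antigen = [0, 0, 1, 0]
d_euclidean = euclidean(antibody, antigen)
d_manhattan = manhattan(antibody, antigen)
d_hamming = hamming(antibody, antigen)
```

For 0/1 vectors such as the session encoding, the Manhattan and Hamming distances coincide, and the Euclidean distance is their square root.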

Artificial Immune System. A framework for AIS (Representation; Affinity; Algorithms). Algorithms: bone marrow; clonal selection; negative selection; positive selection; immune network. 98

Sessions (figure: per-URL durations for Session 1 to Session 5) 99

Affinity Measure 100

Precision versus Coverage 101