Selected topics in Ant 2002. By Hanh Nguyen.

1 Selected topics in Ant 2002 By Hanh Nguyen

2 Selected topics in Ant 2002: Homogeneous Ants for Web Document Similarity Modeling and Categorization; Ant Colonies as Logistic Processes Optimizers; Anti-pheromone as a Tool for Better Exploration of Search Space.

3 Homogeneous Ants for Web Document Similarity Modeling and Categorization

4 Outline: Abstract; Introduction; Ant Colony Models for Data Clustering; Homogeneous Multi-agent System for Document Clustering; Experimental Results; Concluding Remarks and Future Directions; References

5 Abstract: Apply the self-organizing, autonomous behavior of social insects to the organization of web documents.

6 Introduction. Why? The internet has been doubling in size every year, reaching an estimated 2.1 billion pages as of July 2001. Organizing and categorizing documents does not scale with the growth of the internet. Document clustering? The operation of grouping similar documents into classes, which can then be used to analyze the content. The ant clustering algorithm categorizes web documents into different interest domains.

7 Ant Colony Models for Data Clustering. Data clustering? The task of identifying groups of similar objects based on the values of their attributes. Messor sancta ants collect and pile up corpses to form “cemeteries” (Deneubourg et al.). In the model, f is the fraction of items in the neighborhood of the agent, and k_1 and k_2 are threshold constants.
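The formulas for this slide did not survive in the transcript. As a minimal sketch, assuming the form in which the Deneubourg et al. model is usually stated, the pick-up probability is (k_1 / (k_1 + f))^2 and the drop probability is (f / (k_2 + f))^2. The function names below are illustrative and not taken from the slides.

```python
def pick_probability(f: float, k1: float) -> float:
    """Probability that an unladen agent picks up an item.

    Assumed form: (k1 / (k1 + f))^2. It is close to 1 when few
    similar items are nearby (f << k1) and falls as f grows.
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f: float, k2: float) -> float:
    """Probability that a laden agent drops its item.

    Assumed form: (f / (k2 + f))^2. It is 0 in an empty
    neighborhood and rises toward 1 as f grows past k2.
    """
    return (f / (k2 + f)) ** 2
```

With these forms, isolated items are likely to be picked up and carried items are likely to be dropped next to dense piles, which is what makes piles grow.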

8 Ant Colony Models for Data Clustering. The model was later extended by Lumer & Faieta to include a distance function d between data objects. Here c is a cell, N(c) is the number of cells adjacent to c, and α is a constant.
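The density formula for this slide is also missing. A hedged sketch of a Lumer and Faieta style density is given below: the average of 1 - d(o_i, o_j) / α over the neighborhood, clipped at zero. Normalizing by the number of adjacent cells N(c) follows the slide's definitions and is an assumption here, as are the function and parameter names.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def local_density(
    item: T,
    neighbors: Iterable[T],
    distance: Callable[[T, T], float],
    alpha: float,
    n_cells: int = 8,
) -> float:
    """Similarity-weighted density around `item`.

    `neighbors` holds the items found in the adjacent cells (empty
    cells contribute nothing). `n_cells` is N(c), assumed to be the
    normalizing constant. Dissimilar neighbors (d > alpha) contribute
    negative terms, so the result is clipped at 0.
    """
    total = sum(1.0 - distance(item, other) / alpha for other in neighbors)
    return max(0.0, total / n_cells)
```

For example, with `distance=lambda a, b: abs(a - b)` and `alpha=1.0`, an item surrounded by two identical items in an 8-cell neighborhood gets a density of 2/8.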

9 Homogeneous Multi-agent System for Document Clustering. Main components: a colony of agents, feature vectors of the web documents, and a 2D grid. Rules: an agent moves one step at a time to an adjacent cell. A cell may hold at most one agent and at most one item at a time. Picking up or dropping an item is decided by the probabilities P_p and P_d. N(c) = 8; o_i is the item at cell i. Density: g(o_i) measures the similarity between o_i and the other items o_j, where j ∈ N(c).
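The movement rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a square grid with wrap-around edges (the experiments use a toroidal grid), a uniform random choice among free adjacent cells, and hypothetical names `adjacent_cells` and `step`.

```python
import random
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def adjacent_cells(cell: Cell, size: int) -> List[Cell]:
    """Return the 8 cells adjacent to `cell` on a size x size torus."""
    x, y = cell
    return [
        ((x + dx) % size, (y + dy) % size)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]


def step(agent: Cell, other_agents: Set[Cell], size: int,
         rng: random.Random) -> Cell:
    """Move one step to a random adjacent cell not held by another agent.

    The agent stays in place if every adjacent cell is occupied, so a
    cell never holds more than one agent.
    """
    free = [c for c in adjacent_cells(agent, size) if c not in other_agents]
    return rng.choice(free) if free else agent
```

After each move, an unladen agent standing on an item would evaluate P_p, and a laden agent standing on an empty cell would evaluate P_d.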

10 Homogeneous Multi-agent System for Document Clustering. Similarity measure: r is the number of terms common to doc_i and doc_j; m and n are the total numbers of terms in doc_i and doc_j, respectively; F is the term frequency.
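The similarity formula itself is missing from the transcript. The definitions (a sum over the r common terms, normalized by sums over the m and n terms of each document) are consistent with a cosine measure over term frequencies, so the sketch below assumes that form. Whitespace tokenization and the helper names are illustrative assumptions.

```python
import math
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count term occurrences using naive whitespace tokenization."""
    return Counter(text.lower().split())


def similarity(doc_i: Counter, doc_j: Counter) -> float:
    """Cosine similarity between two term-frequency vectors.

    The numerator runs over the terms common to both documents; each
    norm runs over all terms of one document. Returns 0.0 if either
    document is empty.
    """
    common = doc_i.keys() & doc_j.keys()
    numerator = sum(doc_i[t] * doc_j[t] for t in common)
    norm_i = math.sqrt(sum(v * v for v in doc_i.values()))
    norm_j = math.sqrt(sum(v * v for v in doc_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return numerator / (norm_i * norm_j)
```

A distance for the density function could then be taken as 1 minus this similarity, which is again an assumption rather than something stated on the slide.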

11 Homogeneous Multi-agent System for Document Clustering

12 Experimental Results. Experimental data: 84 web pages from 4 categories: Business, Computer, Health and Science. These web pages contain 17,776 distinct words. Setup: a 30x30 toroidal grid with 15 agents; t_max is 300,000; k_1 and k_2 are taken from [0.01, 0.2] and incremented by 0.05 for each run.
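To make the parameter range concrete, the sketch below enumerates the values reached when starting at 0.01 and adding 0.05 while staying within 0.2. The slide does not say whether k_1 and k_2 are varied together or independently, so only the per-parameter values are listed; `sweep_values` is a hypothetical helper.

```python
from typing import List


def sweep_values(start: float = 0.01, stop: float = 0.2,
                 step: float = 0.05) -> List[float]:
    """List start, start + step, ... up to and including stop.

    Values are rounded to 10 decimals to hide floating-point noise.
    """
    values = []
    i = 0
    while round(start + i * step, 10) <= stop:
        values.append(round(start + i * step, 10))
        i += 1
    return values


print(sweep_values())
```

Under this reading each threshold takes four values (0.01, 0.06, 0.11, 0.16), and the upper bound 0.2 itself is not reached.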

13 Experimental Results t = 0

14 Experimental Results t = 50,000

15 Experimental Results t = 200,000

16 Experimental Results t = 300,000

17 Experimental Result Table

18 Concluding Remarks and Future Directions. This is an initial study of ant-based clustering of web documents. The experiments produced nearly homogeneous clusters. Future work: a larger, time-dependent perceivable neighborhood for the agents; formulation of a stopping criterion based on the homogeneity and spatial distribution of clusters; introduction of pheromone traces.

19 References
1. Lawrence, S., Giles, C.L.: Accessibility of Information on the Web. Nature, 400. (1999)
2. Cyveillance: Sizing the Internet. A Cyveillance Study (2000)
3. Yahoo! Web Directory:
4. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM, NY (1999)
5. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, NY (1973)
6. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York, NY (1999)
7. Deneubourg, J.L., Goss, S., Franks, N.R., Sendova-Franks, A., Detrain, C., Chretien, L.: The Dynamics of Collective Sorting: Robot-like Ants and Ant-like Robots. Proc. Int. Conf. Sim. of Adap. Behavior: From Animals to Animats. MIT, MA (1990)

