Veljko Milutinović, Laslo Kraus, Jelena Mirković, Nela Tomča, Saša Slijepčević, Suzana Cvetićanin, Ljiljana Nešić, Mladen Mrkić, Vladan Obradović, Igor Čakulev
Intelligent Internet Search
Department of Computer Engineering
School of Electrical Engineering
University of Belgrade
POB 35-54, Belgrade, Serbia, Yugoslavia

Problem statement
The number of Internet presentations and Web servers grows exponentially
The variety of presentations grows, too
Search and retrieval of documents gets harder
Existing tools do not give satisfactory results

Existing solutions

Keyword search and document indexing - e.g. AltaVista
  + search is exhaustive
  - too many keywords result in too few documents found, and vice versa
  - it requires a large database of indexed documents

Following links - e.g. spiders
  + fast, no indexing and no database
  - it searches only a limited number of documents
  + possibility of changing the input parameters during the search
  - poor evaluation function

Our solution
Design of intelligent agents for Internet search

Two basic approaches:
  1. Simulated annealing - inherently serial
  2. Genetic algorithms - inherently parallel

Character of the search:
  1. Local search - following only the links of the input documents - Best First Search Algorithm
  2. Global search - following the links of the input documents and occasionally mutating them - Genetic Algorithm

Spider implementation:
  1. Static
  2. Mobile

Our research
Essence: creating a set of packages for experimenting in the domain of intelligent Internet search
All written in Sun Java - JDK 1.1
Lego approach - stand-alone applications that are easily interfaced with one another
Code and executable version available at
Further research in the mobile domain

Best First Search Algorithm
1. Select the initial WWW presentation or a set thereof (Input Set)
2. Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the CurrentConfiguration Set (CC Set)
3. Measure the fitness value for each document in the CC Set
4. Select the best one for the Output Set and add documents linked to it into the CC Set

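The best-first loop can be sketched in a few lines. This is a minimal illustration, not the Agent package itself: the in-memory WEB dictionary stands in for real fetching, the page names and keywords are invented, and the Jaccard score over keyword sets is only an assumed fitness function.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "web": URL -> (keywords of the page, outgoing links).
WEB: Dict[str, Tuple[Set[str], List[str]]] = {
    "a": ({"search", "agent", "internet"}, ["b", "c"]),
    "b": ({"search", "agent", "java"}, ["d"]),
    "c": ({"cooking", "recipes"}, ["e"]),
    "d": ({"search", "internet", "agent"}, []),
    "e": ({"travel"}, []),
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed fitness: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_first_search(web, input_urls: List[str], wanted: int) -> List[str]:
    # Keywords of the Input Set define what "similar" means.
    target = set().union(*(web[u][0] for u in input_urls))
    seen = set(input_urls)
    cc: List[str] = []  # CurrentConfiguration Set

    def add_links(url: str) -> None:
        for link in web[url][1]:
            if link in web and link not in seen:
                seen.add(link)
                cc.append(link)

    for url in input_urls:
        add_links(url)

    output: List[str] = []  # Output Set
    while cc and len(output) < wanted:
        # Measure the fitness of every document in the CC Set, keep the best.
        best = max(cc, key=lambda u: jaccard(web[u][0], target))
        cc.remove(best)
        output.append(best)
        add_links(best)  # documents linked to the winner join the CC Set
    return output


print(best_first_search(WEB, ["a"], 2))
```

Because only links are followed, a relevant page that nothing in the CC Set links to can never be reached by this loop.
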
Basic Genetic Algorithm
1. Initialize the population
   randomly pick a set of possible solutions
2. Select individuals for the mating pool
   measure the fitness value and pick the best ones
3. Perform crossover
   create new individuals using genetic material from parents in the mating pool
4. Perform mutation
   randomly create new individuals, completely unrelated to those in the mating pool
5. Insert offspring in the population
6. Is the stopping criterion satisfied?
   desired number of solutions is found or specified time for search has elapsed
   No? GOTO Step 2
   Yes? The end!

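The six steps map onto a short program. The sketch below is a toy under stated assumptions: individuals are bit strings, fitness is the number of ones, a generation budget replaces the time limit, and all parameter names and default values are illustrative.

```python
import random
from typing import List

Individual = List[int]


def fitness(ind: Individual) -> int:
    """Toy fitness (assumption): count the ones in the bit string."""
    return sum(ind)


def random_individual(rng: random.Random, length: int) -> Individual:
    return [rng.randint(0, 1) for _ in range(length)]


def crossover(rng: random.Random, a: Individual, b: Individual) -> Individual:
    """One-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def genetic_algorithm(length: int = 12, pop_size: int = 20, pool_size: int = 6,
                      mutants: int = 2, max_generations: int = 200,
                      seed: int = 0) -> Individual:
    rng = random.Random(seed)
    # Step 1: initialize the population randomly.
    population = [random_individual(rng, length) for _ in range(pop_size)]
    for _ in range(max_generations):
        population.sort(key=fitness, reverse=True)
        # Step 6: stop once a perfect individual exists (or the budget runs out).
        if fitness(population[0]) == length:
            break
        # Step 2: the mating pool holds the fittest individuals.
        pool = population[:pool_size]
        # Step 3: crossover between two distinct parents from the pool.
        offspring = [crossover(rng, *rng.sample(pool, 2))
                     for _ in range(pop_size - pool_size - mutants)]
        # Step 4: mutation creates individuals unrelated to the pool.
        offspring += [random_individual(rng, length) for _ in range(mutants)]
        # Step 5: insert the offspring into the population.
        population = pool + offspring
    return max(population, key=fitness)


best = genetic_algorithm()
print(best, fitness(best))
```

Keeping the mating pool in the next generation means the best fitness never decreases from one generation to the next.
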
Genetic Algorithm applied to Internet Search
1. Select the initial WWW presentation or a set thereof (Input Set)
2. Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the CurrentConfiguration Set (CC Set)
3. Measure the fitness value for each document in the CC Set
4. Select the best one for the Output Set and add documents linked to it into the CC Set
5. Mutate - e.g. by inserting documents from the Database of URLs

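A small sketch of how database mutation changes the search. It assumes an in-memory WEB dictionary instead of real fetching, a Jaccard keyword score as fitness, and a mutation_rate parameter that is not in the slides. The placement of the mutation step at the end of each round is also an assumption.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical data: URL -> (keywords, outgoing links). Page "x" is relevant
# but no other page links to it; it is known only through the URL database.
WEB: Dict[str, Tuple[Set[str], List[str]]] = {
    "a": ({"search", "agent", "internet"}, ["b", "c"]),
    "b": ({"search", "agent", "java"}, []),
    "c": ({"cooking"}, []),
    "x": ({"search", "agent", "internet", "genetic"}, []),
}
URL_DATABASE: List[str] = ["x"]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def genetic_search(web, database: List[str], input_urls: List[str],
                   wanted: int, mutation_rate: float = 0.5,
                   seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    target = set().union(*(web[u][0] for u in input_urls))
    seen = set(input_urls)
    cc: List[str] = []  # CurrentConfiguration Set

    def insert(url: str) -> None:
        if url in web and url not in seen:
            seen.add(url)
            cc.append(url)

    for url in input_urls:
        for link in web[url][1]:
            insert(link)

    output: List[str] = []  # Output Set
    while cc and len(output) < wanted:
        # Select the fittest document and follow its links.
        best = max(cc, key=lambda u: jaccard(web[u][0], target))
        cc.remove(best)
        output.append(best)
        for link in web[best][1]:
            insert(link)
        # Mutation: occasionally inject a document from the URL database.
        if rng.random() < mutation_rate:
            candidates = [u for u in database if u in web and u not in seen]
            if candidates:
                insert(rng.choice(candidates))
    return output


print(genetic_search(WEB, URL_DATABASE, ["a"], 2, mutation_rate=0.0))
print(genetic_search(WEB, URL_DATABASE, ["a"], 2, mutation_rate=1.0))
```

With mutation switched off the search degenerates to link following and never sees page "x"; with mutation on, "x" enters the CC Set from the database and outranks the unrelated page "c".
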
Mutation operator
Generational - generate a new URL
DB based - pick an existing URL from a database
Semantic - use some logical reasoning to direct the search

Package #1 - Spider
Spider - off-line browser
Author: Saša Slijepčević
Fetches all linked documents up to the specified depth and stores them on the local disk in a structure suitable for off-line browsing

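Depth-limited fetching and the mapping of URLs to local files can be illustrated as follows. This is a sketch only: the LINKS dictionary replaces real network access, the example.org URLs are placeholders, and the "mirror" directory layout is an assumption rather than the layout the Spider package uses.

```python
import posixpath
from collections import deque
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical link structure standing in for pages fetched over the network.
LINKS: Dict[str, List[str]] = {
    "http://example.org/": ["http://example.org/a.html",
                            "http://example.org/b.html"],
    "http://example.org/a.html": ["http://example.org/docs/"],
    "http://example.org/docs/": ["http://example.org/"],
}


def crawl(links: Dict[str, List[str]], start: str, max_depth: int) -> Dict[str, int]:
    """Breadth-first walk; returns each reachable URL with its link depth."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth[url] == max_depth:
            continue  # do not follow links beyond the requested depth
        for nxt in links.get(url, []):
            if nxt not in depth:  # skip pages already scheduled (handles cycles)
                depth[nxt] = depth[url] + 1
                queue.append(nxt)
    return depth


def local_path(url: str, root: str = "mirror") -> str:
    """Map a URL to a file path under root, one directory per host."""
    parts = urlparse(url)
    path = parts.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"  # directory URLs get an index file
    return posixpath.join(root, parts.netloc, path)


for page, level in crawl(LINKS, "http://example.org/", 2).items():
    print(level, page, "->", local_path(page))
```
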
Package #2 - Agent
Agent - program for the Best First Search Algorithm
Author: Nela Tomča
Starts from the input set of URLs and finds the documents most similar to them by following the links in the input documents

Package #3 - Generator
Generator - program for generation of a database of topic-sorted URLs
Authors: Mladen Mrkić, Vladan Obradović
It fills the existing database with URLs obtained from www.yahoo.com as a result of a query submitted by the user, under the specified category
[Diagram: yahoo, Database]

Package #4 - Pathfinder
Pathfinder - program for discovering all servers with the same suffix as the one submitted by the user
Author: Igor Čakulev
Example: for galeb.etf.bg.ac.yu it gives orao.etf.bg.ac.yu; zmaj.etf.bg.ac.yu; buef31.etf.bg.ac.yu; kiklop.etf.bg.ac.yu...

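The suffix relation in the example can be made concrete with a small filter. The slides do not say how Pathfinder discovers servers, so this sketch only shows the matching rule applied to a hypothetical list of already known host names; both function names are invented for illustration.

```python
from typing import List


def domain_suffix(host: str) -> str:
    """Drop the first label: galeb.etf.bg.ac.yu -> etf.bg.ac.yu."""
    _, _, suffix = host.partition(".")
    return suffix


def same_suffix(host: str, candidates: List[str]) -> List[str]:
    """Keep candidates that share the suffix of host, excluding host itself."""
    suffix = domain_suffix(host)
    return [c for c in candidates
            if c != host and domain_suffix(c) == suffix]


known_hosts = [  # hypothetical input list
    "orao.etf.bg.ac.yu",
    "www.bg.ac.yu",
    "zmaj.etf.bg.ac.yu",
    "galeb.etf.bg.ac.yu",
]
print(same_suffix("galeb.etf.bg.ac.yu", known_hosts))
```
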
Package #5 - Tropical
Tropical - program for performing genetic algorithm search with database mutation
Author: Jelena Mirković
[Diagram: Database]
Repeating the Hong Kong experiment:
Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.

Packages in progress - Space
Space - program for performing genetic algorithm search with database mutation and occasional spatial locality mutation
[Diagram: Database]

Packages in progress - Time
Time - program for performing genetic algorithm search with database mutation and occasional temporal locality mutation
[Diagram: Topic Database, Time Database]

Current System
The Vision
Newly open problems
Too many linked documents imply high network traffic
Disk space consumed increases exponentially with the number of linked documents, while only a small percentage of them is found to be useful
The program is unable to learn

Future directions
Implementation in the mobile domain
Autonomous agents that transport themselves to the host computer and examine documents there, transferring only the best ones to the home computer
  network traffic and disk usage decrease
Intelligent agents that remain active in the background
  able to learn and adapt to the user's needs

References
Goldberg, D., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts, USA.
Milojičić S., Musliner D., Shroeder-Preikschat W., "Agents: Mobility and Communication", Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January.
Mueller, J. P., "The Design of Intelligent Agents: A Layered Approach", Springer-Verlag, Germany.
Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.
Kraus, L., Milutinovic, V., "Technical Report on a New Genetic Algorithm for Internet Search Based on Principles of Spatial and Temporal Locality", Proceedings of the SinfoN '97, Zlatibor, Serbia, Yugoslavia, November.
Tomca, N., A Flexible Tool for Jaccard Score Evaluation, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, November. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October.
Slijepcevic, S., A Programmable Agent for Internet Retrieval, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, October. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.