Veljko Milutinović, Laslo Kraus, Jelena Mirković, Nela Tomča, Saša Slijepčević, Suzana Cvetićanin, Ljiljana Nešić, Mladen Mrkić, Vladan Obradović, Igor.

Intelligent Internet Search
Veljko Milutinović, Laslo Kraus, Jelena Mirković, Nela Tomča, Saša Slijepčević, Suzana Cvetićanin, Ljiljana Nešić, Mladen Mrkić, Vladan Obradović, Igor Čakulev
Department of Computer Engineering, School of Electrical Engineering, University of Belgrade
POB 35-54, Belgrade, Serbia, Yugoslavia

Problem statement
The number of Internet presentations and Web servers grows exponentially
The variety of presentations grows, too
Search and retrieval of documents gets harder
Existing tools do not give satisfactory results

Existing solutions
Keyword search and document indexing (e.g. Altavista)
  + search is exhaustive
  - too many keywords result in too few documents found, and vice versa
  - it requires a large database of indexed documents
Following links (e.g. spiders)
  + fast, no indexing and no database
  + possibility of changing the input parameters during the search
  - it searches only a limited number of documents
  - poor evaluation function

Our solution
Design of intelligent agents for Internet search
Two basic approaches:
  1. Simulated annealing - inherently serial
  2. Genetic algorithms - inherently parallel
Character of the search:
  1. Local search - following only the links of the input documents - Best First Search Algorithm
  2. Global search - following the links of the input documents and occasionally mutating them - Genetic Algorithm
Spider implementation:
  1. Static
  2. Mobile

Our research
Essence: creating a set of packages for experimenting in the domain of intelligent Internet search
All written in Sun Java - JDK 1.1
Lego approach - stand-alone applications, but easily interfaced with one another
Code and executable version available at
Further research in mobile domain

Best First Search Algorithm
1. Input Set: select the initial WWW presentation or a set thereof.
2. Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration Set (CC Set).
3. Measure the fitness value for each document in the CC Set.
4. Output Set: select the best one for the Output Set and add documents linked to it into the CC Set.
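The best-first loop on this slide can be sketched in Java as follows. This is a minimal illustration, not the Agent package itself. The in-memory link graph, the keyword sets and the class and method names are all hypothetical. The slide does not say how fitness is measured; the Jaccard score is used here only because the reference list mentions a Jaccard score evaluation tool.

```java
import java.util.*;

/** Illustrative best-first search over an in-memory link graph (no network access). */
final class BestFirstSearchSketch {

    /** Jaccard score of two keyword sets: |A intersect B| / |A union B|. Assumed fitness measure. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** links: url -> outgoing urls; keywords: url -> keywords of that document. */
    static List<String> search(Set<String> inputSet,
                               Map<String, List<String>> links,
                               Map<String, Set<String>> keywords,
                               int wanted) {
        // Pool the keywords of the input documents; fitness is similarity to this pool.
        Set<String> target = new HashSet<>();
        for (String url : inputSet) {
            target.addAll(keywords.getOrDefault(url, Collections.<String>emptySet()));
        }
        Set<String> visited = new HashSet<>(inputSet);
        Set<String> cc = new LinkedHashSet<>(); // Current Configuration Set
        for (String url : inputSet) addLinked(url, links, visited, cc);

        List<String> output = new ArrayList<>(); // Output Set
        while (output.size() < wanted && !cc.isEmpty()) {
            String best = null;
            double bestScore = -1.0;
            for (String url : cc) { // measure the fitness of every document in the CC Set
                double s = jaccard(target, keywords.getOrDefault(url, Collections.<String>emptySet()));
                if (s > bestScore) { bestScore = s; best = url; }
            }
            cc.remove(best);
            output.add(best);                    // select the best one for the Output Set
            addLinked(best, links, visited, cc); // add documents linked to it into the CC Set
        }
        return output;
    }

    private static void addLinked(String url, Map<String, List<String>> links,
                                  Set<String> visited, Set<String> cc) {
        for (String next : links.getOrDefault(url, Collections.<String>emptyList())) {
            if (visited.add(next)) cc.add(next);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b", "c"));
        links.put("b", Arrays.asList("d"));
        Map<String, Set<String>> kw = new HashMap<>();
        kw.put("a", new HashSet<>(Arrays.asList("java", "search", "agent")));
        kw.put("b", new HashSet<>(Arrays.asList("java", "search")));
        kw.put("c", new HashSet<>(Arrays.asList("cooking")));
        kw.put("d", new HashSet<>(Arrays.asList("java", "search", "agent")));
        System.out.println(search(Collections.singleton("a"), links, kw, 2)); // prints [b, d]
    }
}
```

Because the search only follows links out of documents it has already selected, it stays local: a document that no selected page links to can never enter the CC Set.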

Basic Genetic Algorithm
1. Initialize the population: randomly pick a set of possible solutions.
2. Select individuals for the mating pool: measure the fitness value and pick the best ones.
3. Perform crossover: create new individuals using genetic material from parents in the mating pool.
4. Perform mutation: randomly create new individuals, completely unrelated to those in the mating pool.
5. Insert offspring in the population.
6. Is the stopping criterion satisfied (the desired number of solutions is found, or the specified time for search has elapsed)?
   No: go to Step 2. Yes: the end.
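The six steps can be illustrated with a small Java sketch on a toy problem. Everything specific here is an assumption made for illustration: individuals are bit strings, fitness is the number of 1 bits, the mating pool is the best half of the population, crossover is one-point, and the stopping criterion is a perfect individual or a generation limit. Mutation follows the slide's wording and creates a brand-new random individual, not a modified copy of an existing one.

```java
import java.util.*;

/** Toy genetic algorithm on bit strings; fitness is the number of 1 bits (illustrative only). */
final class BasicGaSketch {

    static int fitness(boolean[] genes) {
        int ones = 0;
        for (boolean g : genes) if (g) ones++;
        return ones;
    }

    static boolean[] randomIndividual(int length, Random rnd) {
        boolean[] g = new boolean[length];
        for (int i = 0; i < length; i++) g[i] = rnd.nextBoolean();
        return g;
    }

    /** One-point crossover; requires parents of equal length >= 2. */
    static boolean[] crossover(boolean[] p1, boolean[] p2, Random rnd) {
        int cut = 1 + rnd.nextInt(p1.length - 1);
        boolean[] child = new boolean[p1.length];
        for (int i = 0; i < child.length; i++) child[i] = i < cut ? p1[i] : p2[i];
        return child;
    }

    /** Returns the fittest individual found; populationSize must be at least 2. */
    static boolean[] run(int populationSize, int length, int maxGenerations, long seed) {
        Random rnd = new Random(seed);
        // Step 1: initialize the population randomly.
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) population.add(randomIndividual(length, rnd));
        Comparator<boolean[]> byFitnessDesc = (x, y) -> Integer.compare(fitness(y), fitness(x));

        for (int gen = 0; gen < maxGenerations; gen++) {
            population.sort(byFitnessDesc);
            // Step 6: stop early once a perfect individual exists.
            if (fitness(population.get(0)) == length) break;
            // Step 2: the mating pool is the best half of the population.
            List<boolean[]> pool = new ArrayList<>(population.subList(0, populationSize / 2));
            // Step 3: crossover between randomly chosen parents from the pool.
            List<boolean[]> offspring = new ArrayList<>();
            while (offspring.size() < populationSize - pool.size() - 1) {
                boolean[] p1 = pool.get(rnd.nextInt(pool.size()));
                boolean[] p2 = pool.get(rnd.nextInt(pool.size()));
                offspring.add(crossover(p1, p2, rnd));
            }
            // Step 4: mutation adds one individual unrelated to the pool.
            offspring.add(randomIndividual(length, rnd));
            // Step 5: insert the offspring; the pool survives, so the best is never lost.
            population = new ArrayList<>(pool);
            population.addAll(offspring);
        }
        population.sort(byFitnessDesc);
        return population.get(0);
    }

    public static void main(String[] args) {
        boolean[] best = run(20, 16, 50, 42L);
        System.out.println("best fitness: " + fitness(best) + " of " + best.length);
    }
}
```

Keeping the mating pool in the next generation is a design choice of this sketch, not something the slide prescribes; it guarantees that the best fitness never decreases from one generation to the next.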

Genetic Algorithm applied to Internet Search
1. Input Set: select the initial WWW presentation or a set thereof.
2. Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration Set (CC Set).
3. Measure the fitness value for each document in the CC Set.
4. Output Set: select the best one for the Output Set and add documents linked to it into the CC Set.
5. Mutate, e.g. by inserting documents from the database of URLs (Database).
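The difference from the best-first search is the mutation step, which lets documents enter the CC Set even when nothing selected so far links to them. A minimal Java sketch under stated assumptions: the link graph, the URL database, the fitness function and the mutation rate are all hypothetical inputs, and one mutation attempt per iteration is a choice made here, not taken from the slides.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

/** Illustrative link-following search with database mutation (no network access). */
final class GeneticSearchSketch {

    static List<String> search(Collection<String> inputSet,
                               Map<String, List<String>> links,
                               List<String> urlDatabase,
                               ToDoubleFunction<String> fitness,
                               double mutationRate, int wanted, long seed) {
        Random rnd = new Random(seed);
        Set<String> seen = new HashSet<>(inputSet);
        List<String> cc = new ArrayList<>(); // Current Configuration Set
        for (String url : inputSet) expand(url, links, seen, cc);

        List<String> output = new ArrayList<>(); // Output Set
        while (output.size() < wanted) {
            // Mutation: occasionally insert a URL from the database into the CC Set.
            if (!urlDatabase.isEmpty() && rnd.nextDouble() < mutationRate) {
                String candidate = urlDatabase.get(rnd.nextInt(urlDatabase.size()));
                if (seen.add(candidate)) cc.add(candidate);
            }
            if (cc.isEmpty()) break; // nothing left to evaluate in this sketch

            // Measure fitness of each document in the CC Set and pick the best one.
            String best = cc.get(0);
            for (String url : cc) {
                if (fitness.applyAsDouble(url) > fitness.applyAsDouble(best)) best = url;
            }
            cc.remove(best);
            output.add(best);
            expand(best, links, seen, cc); // add documents linked to it into the CC Set
        }
        return output;
    }

    private static void expand(String url, Map<String, List<String>> links,
                               Set<String> seen, List<String> cc) {
        for (String next : links.getOrDefault(url, Collections.<String>emptyList())) {
            if (seen.add(next)) cc.add(next);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b"));
        Map<String, Double> scores = new HashMap<>();
        scores.put("b", 0.5);
        scores.put("x", 0.9);
        List<String> db = Arrays.asList("x"); // "x" is not reachable by links from "a"
        System.out.println(search(Arrays.asList("a"), links, db,
                u -> scores.getOrDefault(u, 0.0), 1.0, 2, 7L)); // prints [x, b]
    }
}
```

With the mutation rate set to 0 the sketch degenerates into the local best-first search, which is one way to compare the two strategies on the same input.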

Mutation operator
Generational - generate a new URL
DB based - pick an existing URL from a database
Semantic - use some logical reasoning to direct the search

Package #1 - Spider
Spider - off-line browser
Author: Saša Slijepčević
Fetches all linked documents up to the specified depth and stores them on the local disk in a structure suitable for off-line browsing
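The depth limit is the core of the Spider's traversal. A minimal sketch in Java, assuming a breadth-first traversal over an in-memory link graph; the slide does not state the traversal order, the names below are hypothetical, and the actual fetching and on-disk storage are left out.

```java
import java.util.*;

/** Illustrative depth-limited traversal; fetching and storing pages is omitted. */
final class DepthLimitedSpiderSketch {

    /** Returns every reachable URL with its link distance from start, up to maxDepth. */
    static Map<String, Integer> crawl(String start, Map<String, List<String>> links, int maxDepth) {
        Map<String, Integer> depthOf = new LinkedHashMap<>(); // keeps discovery order
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.remove();
            int depth = depthOf.get(url);
            if (depth == maxDepth) continue; // do not follow links beyond the limit
            for (String next : links.getOrDefault(url, Collections.<String>emptyList())) {
                if (!depthOf.containsKey(next)) { // skip documents already scheduled
                    depthOf.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        return depthOf;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("a", Arrays.asList("b", "c"));
        links.put("b", Arrays.asList("d"));
        links.put("c", Arrays.asList("a"));
        links.put("d", Arrays.asList("e"));
        System.out.println(crawl("a", links, 2)); // prints {a=0, b=1, c=1, d=2}
    }
}
```

A real spider would replace the map lookup with an HTTP fetch and link extraction, and would write each fetched document to disk at the point where it is added to the queue.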

Package #2 - Agent
Agent - program for the Best First Search Algorithm
Author: Nela Tomča
Starts from the input set of URLs and finds the documents most similar to them by following the links in the input documents

Package #3 - Generator
Generator - program for generating a database of topic-sorted URLs
Authors: Mladen Mrkić, Vladan Obradović
It fills the existing database with URLs obtained from www.yahoo.com as a result of a query submitted by the user, under the specified category
Diagram labels: yahoo, Database

Package #4 - Pathfinder
Pathfinder - program for discovering all servers with the same suffix as the one submitted by the user
Author: Igor Čakulev
Example: for galeb.etf.bg.ac.yu it gives orao.etf.bg.ac.yu; zmaj.etf.bg.ac.yu; buef31.etf.bg.ac.yu; kiklop.etf.bg.ac.yu...
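The slide does not say how Pathfinder discovers the servers, so the sketch below covers only the matching rule that the example implies: two hosts share a suffix when everything after the first label is identical. The candidate host list and all names are hypothetical; where the candidates come from is left open.

```java
import java.util.*;

/** Illustrative suffix matching on host names; server discovery itself is not shown. */
final class SuffixMatchSketch {

    /** Everything after the first label, including the leading dot; empty if there is no dot. */
    static String suffixOf(String host) {
        int dot = host.indexOf('.');
        return dot < 0 ? "" : host.substring(dot);
    }

    /** Keeps the candidates that end with the same suffix as host, excluding host itself. */
    static List<String> sameSuffix(String host, Collection<String> candidates) {
        String suffix = suffixOf(host).toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        if (suffix.isEmpty()) return result;
        for (String candidate : candidates) {
            boolean matches = candidate.toLowerCase(Locale.ROOT).endsWith(suffix);
            if (matches && !candidate.equalsIgnoreCase(host)) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList(
                "orao.etf.bg.ac.yu", "www.yahoo.com", "zmaj.etf.bg.ac.yu", "galeb.etf.bg.ac.yu");
        // prints [orao.etf.bg.ac.yu, zmaj.etf.bg.ac.yu]
        System.out.println(sameSuffix("galeb.etf.bg.ac.yu", known));
    }
}
```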

Package #5 - Tropical
Tropical - program for performing genetic algorithm search with database mutation
Author: Jelena Mirković
Diagram label: Database
Repeating the Hong Kong experiment:
Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.

Packages in progress - Space
Space - program for performing genetic algorithm search with database mutation and occasional spatial locality mutation
Diagram label: Database

Packages in progress - Time
Time - program for performing genetic algorithm search with database mutation and occasional temporal locality mutation
Diagram labels: Topic Database, Time Database

Current System

The Vision

Newly open problems
Too many linked documents imply high network traffic
Disk space consumed increases exponentially with the number of linked documents, while only a small percentage of them is found to be useful
Program is unable to learn

Future directions
Implementation in mobile domain
Autonomous agents that transport themselves to the host computer and examine documents there, transferring only the best ones to the home computer, so that network traffic and disk usage decrease
Intelligent agents that remain active in the background, able to learn and adapt to the user's needs

References
Goldberg, D., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts, USA
Milojičić S., Musliner D., Schroeder-Preikschat W., "Agents: Mobility and communication", Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January
Mueller, Joerg P., "The Design of Intelligent Agents: A layered approach", Springer-Verlag, Germany
Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997
Kraus, L., Milutinovic, V., "Technical Report on a New Genetic Algorithm for Internet Search Based on Principles of Spatial and Temporal Locality", Proceedings of the SinfoN '97, Zlatibor, Serbia, Yugoslavia, November
Tomca, N., A Flexible Tool for Jaccard Score Evaluation, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, November. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October
Slijepcevic, S., A Programmable Agent for Internet Retrieval, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, October. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.