1 / 76 Genetic Algorithms Authors: Aleksandra Popovic, Aleksandra Jankovic, Prof. Dr. Dusan Tosic,

Slides:



Advertisements
Similar presentations
1 An Adaptive GA for Multi Objective Flexible Manufacturing Systems A. Younes, H. Ghenniwa, S. Areibi uoguelph.ca.
Advertisements

Using Parallel Genetic Algorithm in a Predictive Job Scheduling
CSM6120 Introduction to Intelligent Systems Evolutionary and Genetic Algorithms.
21-May-15 Genetic Algorithms. 2 Evolution Here’s a very oversimplified description of how evolution works in biology Organisms (animals or plants) produce.
Genetic Algorithms Representation of Candidate Solutions GAs on primarily two types of representations: –Binary-Coded –Real-Coded Binary-Coded GAs must.
Institute of Intelligent Power Electronics – IPE Page1 Introduction to Basics of Genetic Algorithms Docent Xiao-Zhi Gao Department of Electrical Engineering.
1 Wendy Williams Metaheuristic Algorithms Genetic Algorithms: A Tutorial “Genetic Algorithms are good at taking large, potentially huge search spaces and.
1 Lecture 8: Genetic Algorithms Contents : Miming nature The steps of the algorithm –Coosing parents –Reproduction –Mutation Deeper in GA –Stochastic Universal.
COMP305. Part II. Genetic Algorithms. Genetic Algorithms.
COMP305. Part II. Genetic Algorithms. Genetic Algorithms.
Imagine that I am in a good mood Imagine that I am going to give you some money ! In particular I am going to give you z dollars, after you tell me the.
COMP305. Part II. Genetic Algorithms. Genetic Algorithms.
Genetic Algorithms Nehaya Tayseer 1.Introduction What is a Genetic algorithm? A search technique used in computer science to find approximate solutions.
Genetic Algorithms Jelena Mirković, Aleksandra Popović, Dražen Drašković, Veljko Milutinović School of Electrical Engineering, University of Belgrade Marko.
Genetic Algorithms Overview Genetic Algorithms: a gentle introduction –What are GAs –How do they work/ Why? –Critical issues Use in Data Mining –GAs.
Genetic Algorithms: A Tutorial
Veljko Milutinović, Laslo Kraus, Jelena Mirković, Nela Tomča, Saša Slijepčević, Suzana Cvetićanin, Ljiljana Nešić, Mladen Mrkić, Vladan Obradović, Igor.
Genetic Algorithm.
SOFT COMPUTING (Optimization Techniques using GA) Dr. N.Uma Maheswari Professor/CSE PSNA CET.
Genetic Algorithms Authors: Aleksandra Popovic, Drazen Draskovic, Veljko Milutinovic,
Schemata Theory Chapter 11. A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing Theory Why Bother with Theory? Might provide performance.
Optimization in Engineering Design Georgia Institute of Technology Systems Realization Laboratory Mixed Integer Problems Most optimization algorithms deal.
CS 484 – Artificial Intelligence1 Announcements Lab 3 due Tuesday, November 6 Homework 6 due Tuesday, November 6 Lab 4 due Thursday, November 8 Current.
Zorica Stanimirović Faculty of Mathematics, University of Belgrade
Boltzmann Machine (BM) (§6.4) Hopfield model + hidden nodes + simulated annealing BM Architecture –a set of visible nodes: nodes can be accessed from outside.
An Introduction to Genetic Algorithms Lecture 2 November, 2010 Ivan Garibay
Computational Complexity Jang, HaYoung BioIntelligence Lab.
Chapter 4.1 Beyond “Classic” Search. What were the pieces necessary for “classic” search.
1 “Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal combinations of things, solutions.
GENETIC ALGORITHM A biologically inspired model of intelligence and the principles of biological evolution are applied to find solutions to difficult problems.
Derivative Free Optimization G.Anuradha. Contents Genetic Algorithm Simulated Annealing Random search method Downhill simplex method.
GENETIC ALGORITHMS.  Genetic algorithms are a form of local search that use methods based on evolution to make small changes to a popula- tion of chromosomes.
1 A New Method for Composite System Annualized Reliability Indices Based on Genetic Algorithms Nader Samaan, Student,IEEE Dr. C. Singh, Fellow, IEEE Department.
2005MEE Software Engineering Lecture 11 – Optimisation Techniques.
Algorithms and their Applications CS2004 ( ) 13.1 Further Evolutionary Computation.
Exact and heuristics algorithms
Why do GAs work? Symbol alphabet : {0, 1, * } * is a wild card symbol that matches both 0 and 1 A schema is a string with fixed and variable symbols 01*1*
Chapter 9 Genetic Algorithms.  Based upon biological evolution  Generate successor hypothesis based upon repeated mutations  Acts as a randomized parallel.
Introduction to Genetic Algorithms. Genetic Algorithms We’ve covered enough material that we can write programs that use genetic algorithms! –More advanced.
Edge Assembly Crossover
Genetic Algorithms. 2 Overview Introduction To Genetic Algorithms (GAs) GA Operators and Parameters Genetic Algorithms To Solve The Traveling Salesman.
1. Genetic Algorithms: An Overview  Objectives - Studying basic principle of GA - Understanding applications in prisoner’s dilemma & sorting network.
Parallel Genetic Algorithms By Larry Hale and Trevor McCasland.
Waqas Haider Bangyal 1. Evolutionary computing algorithms are very common and used by many researchers in their research to solve the optimization problems.
D Nagesh Kumar, IIScOptimization Methods: M8L5 1 Advanced Topics in Optimization Evolutionary Algorithms for Optimization and Search.
Genetic Search Algorithms Matt Herbster. Why Another Search?  Designed in the 1950s, heavily implemented under John Holland (1970s)  Genetic search.
An Introduction to Genetic Algorithms Lecture 2 November, 2010 Ivan Garibay
Why do GAs work? Symbol alphabet : {0, 1, * } * is a wild card symbol that matches both 0 and 1 A schema is a string with fixed and variable symbols 01*1*
Genetic Algorithms. Underlying Concept  Charles Darwin outlined the principle of natural selection.  Natural Selection is the process by which evolution.
Genetic Algorithm Dr. Md. Al-amin Bhuiyan Professor, Dept. of CSE Jahangirnagar University.
Artificial Intelligence By Mr. Ejaz CIIT Sahiwal Evolutionary Computation.
1 Comparative Study of two Genetic Algorithms Based Task Allocation Models in Distributed Computing System Oğuzhan TAŞ 2005.
Genetic Algorithms. Solution Search in Problem Space.
EVOLUTIONARY SYSTEMS AND GENETIC ALGORITHMS NAME: AKSHITKUMAR PATEL STUDENT ID: GRAD POSITION PAPER.
Genetic Algorithms An Evolutionary Approach to Problem Solving.
Genetic Algorithms And other approaches for similar applications Optimization Techniques.
Genetic Algorithm. Outline Motivation Genetic algorithms An illustrative example Hypothesis space search.
 Presented By: Abdul Aziz Ghazi  Roll No:  Presented to: Sir Harris.
Presented By: Farid, Alidoust Vahid, Akbari 18 th May IAUT University – Faculty.
Introduction to Genetic Algorithms
Genetic Algorithms.
Artificial Intelligence (CS 370D)
Genetic Algorithms Authors: Aleksandra Popovic,
Genetic Algorithms, Search Algorithms
Genetic Algorithms: A Tutorial
GENETIC ALGORITHMS & MACHINE LEARNING
Dr. Unnikrishnan P.C. Professor, EEE
Traveling Salesman Problem by Genetic Algorithm
Genetic Algorithm Soft Computing: use of inexact t solution to compute hard task problems. Soft computing tolerant of imprecision, uncertainty, partial.
Genetic Algorithms: A Tutorial
Presentation transcript:

1 / 76 Genetic Algorithms Authors: Aleksandra Popovic, Aleksandra Jankovic, Prof. Dr. Dusan Tosic, Prof. Dr. Veljko Milutinovic,

2 / 76 Summary Slide What You Will Learn From This Tutorial? What You Will Learn From This Tutorial?

3 / 76 What You Will Learn From This Tutorial? What is a genetic algorithm? What is a genetic algorithm? Principles of genetic algorithms. Principles of genetic algorithms. How to design an algorithm? How to design an algorithm? Comparison of gas and conventional algorithms. Comparison of gas and conventional algorithms. Mathematics behind GA-s Mathematics behind GA-s Applications of GA Applications of GA –GA and the Internet –Genetic search based on multiple mutation approaches Part II Part I Part III

Part I: GA Theory What are genetic algorithms? How to design a genetic algorithm?

5 / 76 Genetic Algorithm Is Not......Gene coding...Gene coding

6 / 76 Genetic Algorithm Is... … Computer algorithm That resides on principles of genetics and evolution

7 / 76 Instead of Introduction... Hill climbing Hill climbing local global

8 / 76 Instead of Introduction…(2) Multi-climbers Multi-climbers

9 / 76 Instead of Introduction…(3) Genetic algorithm Genetic algorithm I am not at the top. My high is better! I am at the top Height is... I will continue

10 / 76 Instead of Introduction…(3) Genetic algorithm - few microseconds after Genetic algorithm - few microseconds after

11 / 76 GA Concept Genetic algorithm (GA) introduces the principle of evolution and genetics into search among possible solutions to given problem. Genetic algorithm (GA) introduces the principle of evolution and genetics into search among possible solutions to given problem. The idea is to simulate the process in natural systems. The idea is to simulate the process in natural systems. This is done by the creation within a machine of a population of individuals represented by chromosomes, in essence a set of character strings, that are analogous to the DNA, that we have in our own chromosomes. This is done by the creation within a machine of a population of individuals represented by chromosomes, in essence a set of character strings, that are analogous to the DNA, that we have in our own chromosomes.

12 / 76 Survival of the Fittest The main principle of evolution used in GA is “survival of the fittest”. The main principle of evolution used in GA is “survival of the fittest”. The good solution survive, while bad ones die. The good solution survive, while bad ones die.

13 / 76 Nature and GA... Nature Genetic algorithm ChromosomeString Gene Character Locus String position Genotype Population Phenotype Decoded structure

14 / 76 The History of GA Cellular automata Cellular automata –John Holland, university of Michigan, Until the early 80s, the concept was studied theoretically. Until the early 80s, the concept was studied theoretically. In 80s, the first “real world” GAs were designed. In 80s, the first “real world” GAs were designed.

15 / 76 Algorithmic Phases Initialize the population Initialize the population Select individuals for the mating pool Perform crossover Insert offspring into the population The End Perform mutation yes no no Stop?

16 / 76 Designing GA... Designing GA... How to represent genomes? How to represent genomes? How to define the crossover operator? How to define the crossover operator? How to define the mutation operator? How to define the mutation operator? How to define fitness function? How to define fitness function? How to generate next generation? How to generate next generation? How to define stopping criteria? How to define stopping criteria?

17 / 76 Representing Genomes... RepresentationExample string array of strings http avala yubc net ~apopovic tree - genetic programming > bxor or c ba

18 / 76 Crossover Crossover is concept from genetics. Crossover is concept from genetics. Crossover is sexual reproduction. Crossover is sexual reproduction. Crossover combines genetic material from two parents, in order to produce superior offspring. Crossover combines genetic material from two parents, in order to produce superior offspring. Few types of crossover: Few types of crossover: –One-point –Multiple point.

19 / 76 One-point Crossover Parent #1 Parent #

20 / 76 One-point Crossover Parent #1 Parent #

21 / 76 Mutation Mutation introduces randomness into the population. Mutation introduces randomness into the population. Mutation is asexual reproduction. Mutation is asexual reproduction. The idea of mutation is to reintroduce divergence into a converging population. The idea of mutation is to reintroduce divergence into a converging population. Mutation is performed on small part of population, in order to avoid entering unstable state. Mutation is performed on small part of population, in order to avoid entering unstable state.

22 / 76 Mutation Parent Child

23 / 76 About Probabilities... Average probability for individual to crossover is, in most cases, about 80%. Average probability for individual to crossover is, in most cases, about 80%. Average probability for individual to mutate is about 1-2%. Average probability for individual to mutate is about 1-2%. Probability of genetic operators follow the probability in natural systems. Probability of genetic operators follow the probability in natural systems. The better solutions reproduce more often. The better solutions reproduce more often.

24 / 76 Fitness Function Fitness function is evaluation function, that determines what solutions are better than others. Fitness function is evaluation function, that determines what solutions are better than others. Fitness is computed for each individual. Fitness is computed for each individual. Fitness function is application depended. Fitness function is application depended.

25 / 76 Selection The selection operation copies a single individual, probabilistically selected based on fitness, into the next generation of the population. The selection operation copies a single individual, probabilistically selected based on fitness, into the next generation of the population. There are few possible ways to implement selection: There are few possible ways to implement selection: –“Only the strongest survive” Choose the individuals with the highest fitness for next generationChoose the individuals with the highest fitness for next generation –“Some weak solutions survive” Assign a probability that a particular individual will be selected for the next generationAssign a probability that a particular individual will be selected for the next generation More diversityMore diversity Some bad solutions might have good parts!Some bad solutions might have good parts!

26 / 76 Selection - Survival of The Strongest Previous generation Next generation

27 / 76 Selection - Some Weak Solutions Survive Previous generation Next generation

28 / 76 Mutation and Selection... Phenotype D D D Selection Mutation Solution distribution

29 / 76 Stopping Criteria Final problem is to decide when to stop execution of algorithm. Final problem is to decide when to stop execution of algorithm. There are two possible solutions to this problem: There are two possible solutions to this problem: –First approach: Stop after production of definite number of generationsStop after production of definite number of generations –Second approach: Stop when the improvement in average fitness over two generations is below a thresholdStop when the improvement in average fitness over two generations is below a threshold

30 / 76 GA Vs. Ad-hoc Algorithms Genetic Algorithm Ad-hoc Algorithms Speed Human work Applicability Performance Slow * Generally fast Minimal Long and exhaustive General There are problems that cannot be solved analytically Excellent Depends * Not necessary!

31 / 76 Problems With Gas Sometimes GA is extremely slow, and much slower than usual algorithms Sometimes GA is extremely slow, and much slower than usual algorithms

32 / 76 Advantages of Gas Concept is easy to understand. Concept is easy to understand. Minimum human involvement. Minimum human involvement. Computer is not learned how to use existing solution, but to find new solution! Computer is not learned how to use existing solution, but to find new solution! Modular, separate from application Modular, separate from application Supports multi-objective optimization Supports multi-objective optimization Always an answer; answer gets better with time !!! Always an answer; answer gets better with time !!! Inherently parallel; easily distributed Inherently parallel; easily distributed Many ways to speed up and improve a GA-based application as knowledge about problem domain is gained Many ways to speed up and improve a GA-based application as knowledge about problem domain is gained Easy to exploit previous or alternate solutions Easy to exploit previous or alternate solutions

33 / 76 GA: An Example - Diophantine Equations Diophantine equation (n=4): Diophantine equation (n=4): A*x + b*y + c*z + d*q = s For given a, b, c, d, and s - find x, y, z, q For given a, b, c, d, and s - find x, y, z, q Genome: Genome: (X, y, z, p) = x yzq

34 / 76 GA:An Example - Diophantine Equations(2) Crossover Crossover Mutation Mutation ( 1, 2, 3, 4 ) ( 5, 6, 7, 8 ) ( 1, 6, 3, 4 ) ( 5, 2, 7, 8 ) ( 1, 2, 3, 4 ) ( 1, 2, 3, 9 )

35 / 76 GA:An Example - Diophantine Equations(3) First generation is randomly generated of numbers lower than sum (s). First generation is randomly generated of numbers lower than sum (s). Fitness is defined as absolute value of difference between total and given sum: Fitness is defined as absolute value of difference between total and given sum: Fitness = abs ( total - sum ), Algorithm enters a loop in which operators are performed on genomes: crossover, mutation, selection. Algorithm enters a loop in which operators are performed on genomes: crossover, mutation, selection. After number of generation a solution is reached. After number of generation a solution is reached.

Part II: Mathematics Behind GA-s Two methods for analyzing genetics algorithms: Schema analyses Mathematical modeling

37 / 76 Schema Analyses Weaknesses: In determining some characteristics of the population In determining some characteristics of the population Schema analyses makes some approximations that weaken it Schema analyses makes some approximations that weaken itAdvantages: A simple way to view the standard GA A simple way to view the standard GA They have made possible proofs of some interesting theorems They have made possible proofs of some interesting theorems They provide a nice introduction to algorithmic analyses They provide a nice introduction to algorithmic analyses

38 / 76 Schema Analyses… Schema – a template made up of a string of 1s, 0s, and *s, where * is used as a wild card that can be either 1 or 0 For example, H = 1 * * 0 * 0 is a schema. It has eight instances (one of which is ) It has eight instances (one of which is ) Order, o ( H ), the number of non-*, or defined, bits (in example 3) Order, o ( H ), the number of non-*, or defined, bits (in example 3) Defining length, d ( H ), greatest distance between two defined bits Defining length, d ( H ), greatest distance between two defined bits (in example H has a defining length of 3) Let S be the set of all strings of length l. There is possible schemas on S, but different subsets of S There is possible schemas on S, but different subsets of S Schema cannot be used to represent every possible population within S, Schema cannot be used to represent every possible population within S, but forms a representative subset of the set of all subsets of S but forms a representative subset of the set of all subsets of S

39 / 76 Schema analyses… The end-of-iterations conditions: expected number of instances of schema H as we iterate the GA expected number of instances of schema H as we iterate the GA M(H, t) – the number of instances of H at time t M(H, t) – the number of instances of H at time t f(x) – fitness of chromosome x f(x) – fitness of chromosome x - average fitness at time t : - average fitness at time t :, n=|S| - average fitness of instances of H at time t: - average fitness of instances of H at time t:

40 / 76 Schema Analyses… - If we completely ignore the effects of crossover and mutation, we get the expected value: we get the expected value: - Now we consider only the effects of crossover and mutation, which lower the number of instances of H in the population. Then we will get a good lower bound on E(m(H,t+1)) - probability that a random crossover bit is between the defining bits of H - probability that a random crossover bit is between the defining bits of H - probability of crossover occurring - probability of crossover occurring

41 / 76 Schema Analyses… - probability of an instance of H remaining the same after mutation; it is dependent on the order of H - probability of an instance of H remaining the same after mutation; it is dependent on the order of H - probability of mutation - probability of mutation - With the above notation, we have: Schema Theorem, provided by John Holland Schema Theorem, provided by John Holland

42 / 76 Schema analyses… The Schema Theorem only shows how schemas dynamically change, and how short, low-order schemas whose fitness remain above the average mean receive exponentially growing increases in the number of samples. It cannot make more direct predictions about the population composition, distribution of fitness and other statistics more directly related to the GA itself.

43 / 76 Mathematical Model One change is that one of the new individuals One change is that one of the new individuals will be immediately deleted (thus the loop is iterated n times, not n/2) S – set of all strings of length l S – set of all strings of length l N – size of S, or N – size of S, or column vector with rows such that the i-th component column vector with rows such that the i-th component is equal to the proportion of the population P as time t that has chromosome i is equal to the proportion of the population P as time t that has chromosome i column vector with rows such that i- th component column vector with rows such that i- th component is equal to the probability that chromosome i will be selected as a parent is equal to the probability that chromosome i will be selected as a parent

44 / 76 Mathematical Model… Example: l=2, 3 individuals in the population, two with chromosome 10 and one with chromosome 11, then If the fitness is equal to the number of 1s in the string, then

45 / 76 Mathematical Model… diagonal matrix with diagonal matrix with Relation between and : Relation between and : Goal: given a column vector, to construct a column-vector-valued function such that Goal: given a column vector, to construct a column-vector-valued function such that M represents recombination – composition of crossover and mutation M represents recombination – composition of crossover and mutation component wise sum of i and j mod 2 component wise sum of i and j mod 2 component wise product of i and j component wise product of i and j matrix whose i, j th entry is the probability that 0 result from the recombination of i and j matrix whose i, j th entry is the probability that 0 result from the recombination of i and j

46 / 76 Mathematical Model… permutation operator on permutation operator on Finally, is given by the following expression Finally, is given by the following expression With this expression, we can calculate explicitly the expected value of each generation from the proceeding generation. With this expression, we can calculate explicitly the expected value of each generation from the proceeding generation. I hope you now fully understand the mathematics behind GA. I hope you now fully understand the mathematics behind GA.

Part III: Applications of GAs GA and the Internet Genetic search based on multiple mutation approaches

48 / 76 Some Applications of Gas GA Internet search Data mining Software guided circuit design Control systems design Stock prize prediction Path finding Mobile robots search Optimization Trend spotting

Genetic Algorithm and the Internet The system designed by EBI Group, Faculty for Electrical Engineering, University of Belgrade

50 / 76 Algorithms Phases Process set of URLs given by user Select all links from input set Evaluate fitness function for all genomes Perform crossover, mutation, and reproduction Satisfactorysolutionobtained? The End

51 / 76 Introduction GA can be used for intelligent internet search. GA can be used for intelligent internet search. GA is used in cases when search space is relatively large. GA is used in cases when search space is relatively large. GA is adoptive search. GA is adoptive search. GA is heuristic search method. GA is heuristic search method.

52 / 76 System for GA Internet Search Designed at faculty for electrical engineering, university of belgrade Designed at faculty for electrical engineering, university of belgrade CONTROLPROGRAM AgentSpider Input set Topic Space Time Output set Current set Top data Net data Generator

53 / 76 Spider Spider is software packages, that picks up internet documents from user supplied input with depth specified by user. Spider is software packages, that picks up internet documents from user supplied input with depth specified by user. Spider takes one URL, fetches all links, and documents thy contain with predefined depth. Spider takes one URL, fetches all links, and documents thy contain with predefined depth. The fetched documents are stored on local hard disk with same structure as on the original location. The fetched documents are stored on local hard disk with same structure as on the original location. Spider’s task is to produce the first generation. Spider’s task is to produce the first generation. Spider is used during crossover and mutation. Spider is used during crossover and mutation.

54 / 76 Agent Agent takes as an input a set of urls, and calls spider, for every one of them, with depth 1. Agent takes as an input a set of urls, and calls spider, for every one of them, with depth 1. Then, agent performs extraction of keywords from each document, and stores it in local hard disk. Then, agent performs extraction of keywords from each document, and stores it in local hard disk.

55 / 76 Generator Generator generates a set of urls from given keywords, using some conventional search engine. Generator generates a set of urls from given keywords, using some conventional search engine. It takes as input the desired topic, calls yahoo search engine, and submits a query looking for all documents covering the specific topic. It takes as input the desired topic, calls yahoo search engine, and submits a query looking for all documents covering the specific topic. Generator stores URL and topic of given web page in database called topdata. Generator stores URL and topic of given web page in database called topdata.

56 / 76 Topic It uses topdata DB in order to insert random urls from database into current set. It uses topdata DB in order to insert random urls from database into current set. Topic performs mutation. Topic performs mutation.

57 / 76 Space Space takes as input the current set from the agent application and injects into it those urls from the database netdata that appeared with the greatest frequency in the output set of previous searches. Space takes as input the current set from the agent application and injects into it those urls from the database netdata that appeared with the greatest frequency in the output set of previous searches.

58 / 76 Time Time takes set of urls from agent and inserts ones with greatest frequency into DB netdata. Time takes set of urls from agent and inserts ones with greatest frequency into DB netdata. The netdata DB contains of three fields: URL, topic, and count number. The netdata DB contains of three fields: URL, topic, and count number. The DB is updated in each algorithm iteration. The DB is updated in each algorithm iteration.

59 / 76 How Does The System Work? CONTROLPROGRAM AgentSpider Input set Topic Space Time Output set Current set Top data Net data Generator command flow data flow

60 / 76 GA and the Internet: Conclusion GA for internet search, on contrary to other gas, is much faster and more efficient that conventional solutions, such as standard internet search engines. GA for internet search, on contrary to other gas, is much faster and more efficient that conventional solutions, such as standard internet search engines. INTERNET

61 / 76 Genetic Search Based on Multiple Mutation Approaches Concept and its improvements adapted to specific applications in e-business, and concrete software package Main problems in finding information on the Internet: How to find quickly and retrieve efficiently the potentially useful information considering the fact of the fast growth of the quantity and variety of Internet sites How to find quickly and retrieve efficiently the potentially useful information considering the fact of the fast growth of the quantity and variety of Internet sites Huge number of documents, many of which are completely unrelated to what the user originally attempted to find, searched with indexing engines Huge number of documents, many of which are completely unrelated to what the user originally attempted to find, searched with indexing engines Documents placed on the top of the result list are often less acceptable then the lower ones Documents placed on the top of the result list are often less acceptable then the lower ones Indexing process may take days, weeks, or even longer, because the volume of new information being created daily Indexing process may take days, weeks, or even longer, because the volume of new information being created daily

62 / 76 Links Based Approach The question is: How to locate and retrieve the needed information before it gets indexed? The efficient way to locate the new not-yet-indexed information: Using links-based approaches genetic search Using links-based approaches genetic search simulated annealing simulated annealing Best result: indexing - based approaches indexing - based approaches+ links - based approaches links - based approaches

63 / 76 Genetic Search Algorithm GENETIC ALGORITHM OF ZERO ORDER, with no mutation Start: Model Web presentation that contains all the needed types of information (fitness function is evaluated). It is assumes that it includes URL pointers to other similar Web presentations, and these are downloaded. The Web presentations that “survived” the fitness function are assumed to include additional URL pointers, and their related Web presentations are downloaded next. After the end-of-search condition is met, the Web presentations are ranked according to their fitness value.

64 / 76 Genetic Search Algorithm… Type of mutation: Topic-oriented database mutation Topic-oriented database mutation Semantic mutations Semantic mutations - based on the principles of spatial locality - based on the principles of temporal locality Logical reasoning and semantics consideration is involve in picking out URLs for mutation.

65 / 76 Innovations Required by Domain Area APPLICATION LEVEL APPLICATION LEVEL LEVEL OF THE GENERAL PROJECT APPROACH LEVEL OF THE GENERAL PROJECT APPROACH AND PRODUCT ARCHITECTURE AND PRODUCT ARCHITECTURE ALGORITHMIC LEVEL ALGORITHMIC LEVEL IMPLEMENTATION LEVEL IMPLEMENTATION LEVEL

66 / 76 Application Level Statistical analysis and data mining has to be performed, Statistical analysis and data mining has to be performed, in order to figure out the common and typical patterns of behavior and need in order to figure out the common and typical patterns of behavior and need The state-of-the-art of mutual referencing has to be determined The state-of-the-art of mutual referencing has to be determined The trends and asymptotic situations foreseen for the time of project finalization has to be determined The trends and asymptotic situations foreseen for the time of project finalization has to be determined

67 / 76 Level of the General Project Approach and Product Architecture Decisions have to be made about the most important goals to be achieved: Maximizing the speed of search Maximizing the speed of search Maximizing the sophistication of search Maximizing the sophistication of search Maximizing specific effects of interest for a given institution or a customer Maximizing specific effects of interest for a given institution or a customer Maximizing a combination of the above Maximizing a combination of the above Decision on this level affect the applicability of the final product / tool.

68 / 76 Algorithmic Level Develop an efficient mutation algorithm of interest for the application in the direction of database architecture and design in introducing the elements of semantic-based mutation Semantics-based mutations are especially of interest for chaotic markets, typical of new markets in developed countries or traditional markets in under-developed countries.

69 / 76 Semantics-based Mutation Mutation based on spatial localities After a “fruitful” Web presentation is reached (using a tradicional algorithm with mutation), the site of the same Internet service provider is searched for other presentations on the same or similar topic After a “fruitful” Web presentation is reached (using a tradicional algorithm with mutation), the site of the same Internet service provider is searched for other presentations on the same or similar topic Explanation : In chaotic markets, it is very unlikely that service/product offers from the same small geographic area each other on their Web presentations After a successful “side trip” based on spatial mutation, one continue with the traditional database mutation.

70 / 76 Semantics-based Mutation… Mutation based on temporal localities One comes back periodically to a Web presentation which was “fruitful” in the past One comes back periodically to a Web presentation which was “fruitful” in the past One comes back periodically to other Web presentations developed by the author who created some “fruitful” Web presentations in the past One comes back periodically to other Web presentations developed by the author who created some “fruitful” Web presentations in the past Temporal mutation can use direct revisits or a number of indirect forms or revisit.

71 / 76 Implementation Level Utilization of novel technologies, for maximal performance and minimal implementation complexity Utilization of novel technologies, for maximal performance and minimal implementation complexity Important for: - good flexibility - extendibility - reliability - availability Utilization of mobile platforms and mobile agents Utilization of mobile platforms and mobile agents

72 / 76 Implementation Level… Static agents Static agents - one has to download megabytes of information - treat that information with a decision-making code of size measured in kilobytes - derive the final business related decision, which is binary in size (one bit: yes or no) A huge amount of data is transferred through the network in vain, because only a small percent of fetched documents will turn out to be useful Mobile agents Mobile agents - they would browse through the network and perform the search locally, on the remote servers, transferring only the needed documents and data - they load the network only with kilobytes and a single bit

73 / 76 Simulation Result Links-based approach in the static domain Links-based approach in the static domain How various mutation strategies can affect the search efficiency How various mutation strategies can affect the search efficiency Set of software packages have developed, that would perform Internet search using genetic algorithms (by Veljko Milutinovic, Dragana Cvetkovic, and Jelena Mirkovic) Set of software packages have developed, that would perform Internet search using genetic algorithms (by Veljko Milutinovic, Dragana Cvetkovic, and Jelena Mirkovic) As the fitness function they have measured average Jaccard’s score for the output documents, while changing the type and rate of mutation As the fitness function they have measured average Jaccard’s score for the output documents, while changing the type and rate of mutation

74 / 76 Simulation Result… The simulation result for topic mutation The simulation result for temporal and spatial mutation combined with topic mutation

75 / 76 Simulation Result… The simulation result for topic, spatial and temporal mutation combined. Constant increase in the quality of pages found.

76 / 76 Conclusion: Evolution Tutorial download: galeb.etf.bg.ac.yu/~vm Option:Tutorials