

1 Evolving Hyper-Heuristics using Genetic Programming. Achiya Elyasaf. Supervisor: Moshe Sipper

2 Overview: Introduction (Searching Games State-Graphs: Uninformed Search, Heuristics, Informed Search); Evolving Heuristics; Previous Work (Rush Hour, FreeCell)

3 Representing Games as State-Graphs. Every puzzle/game can be represented as a state graph: in puzzles, board games, etc., every piece move leads to a different state; in computer war games, etc., the positions of the player and the enemy, together with all the parameters (health, shield, ...), define a state.

4 Rush Hour as a state-graph (illustration)

5 Searching Games State-Graphs: Uninformed Search. BFS: exponential in the search depth. DFS: linear in the length of the current search path, BUT we might "never" track down the right path, since games usually contain cycles. Iterative Deepening: a combination of BFS and DFS; each iteration performs a DFS with a depth limit, and the limit grows from one iteration to the next. Worst case: traverse the entire graph.
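
A minimal Python sketch of the iterative-deepening idea; the `neighbors` and `is_goal` callbacks are generic placeholders, not from the slides:

```python
def iterative_deepening(start, neighbors, is_goal, max_depth=50):
    """Depth-limited DFS restarted with a growing limit (sketch)."""
    def dls(state, depth, path):
        if is_goal(state):
            return path
        if depth == 0:
            return None
        for nxt in neighbors(state):
            if nxt in path:                  # skip cycles along the current path
                continue
            found = dls(nxt, depth - 1, path + [nxt])
            if found:
                return found
        return None

    for limit in range(1, max_depth + 1):    # the limit grows each iteration
        result = dls(start, limit, [start])
        if result:
            return result
    return None                              # worst case: whole graph traversed
```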

6 Searching Games State-Graphs: Uninformed Search. Most game domains are PSPACE-complete! Worst case: traverse the entire graph. We need an informed search!

7 Searching Games State-Graphs: Heuristics. h: states -> R. For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution. If h is perfect, an informed search that tries states with the lowest h-score first will simply stroll to the solution. For hard problems, finding such an h is hard, and a bad heuristic means the search might never track down the solution. We need a good heuristic function to guide the informed search.

8 Searching Games State-Graphs: Informed Search. Best-First search: like DFS, but selects the node with the best heuristic value first; not necessarily optimal, and might enter cycles (local extrema). A*: holds a closed list and an open list sorted by f = g + h; the best of all open nodes is selected. The maintenance and size of the open and closed lists can be prohibitive.
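
A sketch of A* as described above; the open list is a heap ordered by f = g + h, and `neighbors` is assumed to yield (state, edge-cost) pairs:

```python
import heapq
import itertools

def a_star(start, neighbors, is_goal, h):
    """A* sketch: open list ordered by f = g + h, plus a closed set."""
    counter = itertools.count()              # tie-breaker so states are never compared
    open_heap = [(h(start), next(counter), 0, start, [start])]
    closed = set()
    while open_heap:
        _f, _, g, state, path = heapq.heappop(open_heap)
        if is_goal(state):
            return path
        if state in closed:                  # already expanded via a cheaper path
            continue
        closed.add(state)
        for nxt, cost in neighbors(state):
            if nxt not in closed:
                heapq.heappush(open_heap,
                               (g + cost + h(nxt), next(counter),
                                g + cost, nxt, path + [nxt]))
    return None
```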

9 Searching Games State-Graphs: Informed Search (cont'd). IDA*: Iterative Deepening with A*. Expanded nodes are pushed onto the DFS stack by descending heuristic value. Let g(s) be the minimal depth of state s: only nodes with f(s) = g(s) + h(s) < depth-limit are visited. Yields a near-optimal solution (depending on the path limit). The heuristic needs to be admissible.
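
And a matching IDA* sketch under the same assumed interface; the bound grows each iteration to the smallest f-value that exceeded it:

```python
import math

def ida_star(start, neighbors, is_goal, h):
    """IDA* sketch: DFS bounded by f = g + h."""
    def search(state, g, bound, path):
        f = g + h(state)
        if f > bound:
            return f, None                   # report the f-value that broke the bound
        if is_goal(state):
            return f, path
        smallest = math.inf
        for nxt, cost in neighbors(state):
            if nxt in path:                  # avoid cycles along the current path
                continue
            t, found = search(nxt, g + cost, bound, path + [nxt])
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    bound = h(start)
    while True:
        bound, solution = search(start, 0, bound, [start])
        if solution is not None:
            return solution
        if bound == math.inf:                # whole reachable graph exhausted
            return None
```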

10 Iterative Deepening (illustration)

11 Best-First Search (illustration)

12 A* (illustration)

13 Overview (agenda, repeated): Introduction (Searching Games State-Graphs: Uninformed Search, Heuristics, Informed Search); Evolving Heuristics; Previous Work (Rush Hour, FreeCell)

14 Evolving Heuristics. For building blocks H1, ..., Hn (not necessarily admissible or in the same range), how should we choose the fittest heuristic? Minimum? Maximum? A linear combination? GA/GP may be used for: building new heuristics from the existing building blocks; finding weights for each heuristic (to apply a linear combination); finding conditions for applying each heuristic. H should probably fit the stage of the search, e.g., "goal" heuristics when we assume we are close.

15 Evolving Heuristics: GA. An individual is a weight vector: W1 = 0.3, W2 = 0.01, W3 = 0.2, ..., Wn = 0.1
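
A sketch of how such a weight vector is used, plus one assumed mutation operator; `blocks` stands in for the building-block heuristics H1, ..., Hn:

```python
import random

def combined_heuristic(weights, blocks):
    """The GA individual from the slide: a weight vector W1..Wn applied as a
    linear combination of the building-block heuristics H1..Hn (sketch)."""
    def h(state):
        return sum(w * block(state) for w, block in zip(weights, blocks))
    return h

def mutate(weights, sigma=0.05):
    """Assumed operator: Gaussian noise on one weight, clipped to [0, 1]."""
    child = list(weights)
    i = random.randrange(len(child))
    child[i] = min(1.0, max(0.0, child[i] + random.gauss(0.0, sigma)))
    return child
```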

16 Evolving Heuristics: GP. An individual is an expression tree, e.g.: If (H1 <= 0.4) And (H2 >= 0.7) Then H2 + (H1 * 0.1) Else H5 * (H1 / 0.1)
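
One way to execute such a tree; the nested-tuple encoding, the advisor lookup table `H`, and the protected division are assumptions, but the tree literal mirrors the slide's example:

```python
def evaluate(node, state, H):
    """Evaluate a GP tree encoded as nested tuples; H maps advisor names such as
    'H1' to functions over states (assumed encoding, not from the slides)."""
    op, *args = node
    if op == 'if':
        cond, if_true, if_false = args
        return evaluate(if_true if evaluate(cond, state, H) else if_false, state, H)
    if op == 'and':
        return all(evaluate(a, state, H) for a in args)
    if op in ('<=', '>=', '+', '*', '/'):
        x, y = (evaluate(a, state, H) for a in args)
        if op == '<=': return x <= y
        if op == '>=': return x >= y
        if op == '+':  return x + y
        if op == '*':  return x * y
        return x / y if y else 0.0           # protected division, common in GP
    if op == 'const':
        return args[0]
    if op == 'advisor':
        return H[args[0]](state)
    raise ValueError(f"unknown node type: {op}")

# The tree from this slide, in that encoding:
tree = ('if',
        ('and', ('<=', ('advisor', 'H1'), ('const', 0.4)),
                ('>=', ('advisor', 'H2'), ('const', 0.7))),
        ('+', ('advisor', 'H2'), ('*', ('advisor', 'H1'), ('const', 0.1))),
        ('*', ('advisor', 'H5'), ('/', ('advisor', 'H1'), ('const', 0.1))))
```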

17 Evolving Heuristics: Policies. A policy is a table of condition/result rows:

    Condition     | Result
    Condition 1   | Heuristics Weights 1
    Condition 2   | Heuristics Weights 2
    ...           | ...
    Condition n   | Heuristics Weights n
    (otherwise)   | Default Heuristics Weights
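
A sketch of applying a policy table of this shape; rows are (condition, weights) pairs scanned top-down, falling back to the default weights:

```python
def apply_policy(rows, default_weights, blocks, state):
    """First row whose condition holds supplies the weights (sketch)."""
    for condition, weights in rows:
        if condition(state):
            chosen = weights
            break
    else:
        chosen = default_weights
    return sum(w * block(state) for w, block in zip(chosen, blocks))
```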

18 Evolving Heuristics: Fitness Function (the formula on this slide was an image)
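
Since the formula did not survive the transcript, the stand-in below is only a plausible guess: it scores an individual by its average node reduction relative to a baseline solver on the training deals (`solve` and `baseline_nodes` are hypothetical):

```python
def fitness(heuristic, training_deals, solve, baseline_nodes):
    """Hypothetical fitness: mean node reduction vs. a baseline over the
    training deals; the slide's actual formula was an image."""
    scores = []
    for deal in training_deals:
        nodes = solve(deal, heuristic)       # nodes expanded, or None on failure
        if nodes is None:
            scores.append(0.0)               # unsolved deals score zero
        else:
            scores.append(max(0.0, 1.0 - nodes / baseline_nodes[deal]))
    return sum(scores) / len(scores)
```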

19 Overview (agenda, repeated): Introduction (Searching Games State-Graphs: Uninformed Search, Heuristics, Informed Search); Evolving Heuristics; Previous Work (Rush Hour, FreeCell)

20 Rush Hour. GP-Rush [Hauptman et al., 2009]; winner of a bronze HUMIE award.

21 Domain-Specific Heuristics. Hand-crafted heuristics/guides: Blocker estimation, a lower bound (admissible); Goal distance, a Manhattan distance; Hybrid blockers distance, combining the two above; Is Move To Secluded, did the car enter a secluded area?; Is Releasing Move.

22 Blockers Estimation. A lower bound on the number of steps to the goal, obtained by counting the moves needed to free blocking cars. Example: O is blocking RED; we need at least: free A; move A; move B; move C; move O. H = 4.
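
A simplified sketch of the idea, over an assumed grid encoding (`board[row][col]` is a car letter or '.'); the slide's full estimate also counts the moves needed to free the blockers themselves:

```python
def blockers_lower_bound(board):
    """Count the cars standing between the red car and the exit; each must move
    at least once, so the count is an admissible lower bound (sketch)."""
    red_row = next(r for r, row in enumerate(board) if 'R' in row)
    front = max(c for c, cell in enumerate(board[red_row]) if cell == 'R')
    return len({cell for cell in board[red_row][front + 1:] if cell != '.'})
```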

23 Goal Distance. Deduce the goal configuration and use the "Manhattan Distance" from it as the h measure (h = 16 in the example).

24 Hybrid. "Manhattan Distance" + Blockers Estimation (16 + 8 = 24 in the example).
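
A sketch of the goal-distance and hybrid advisors under the same assumed encoding, reusing `blockers_lower_bound` from the sketch above; how the goal configuration is deduced is not shown on the slides, so it is passed in:

```python
def car_positions(board):
    """Top-left cell of every car (helper; same assumed grid encoding)."""
    positions = {}
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell != '.' and cell not in positions:
                positions[cell] = (r, c)
    return positions

def goal_distance(board, goal_positions):
    """Sum of per-car Manhattan distances to a deduced goal configuration."""
    return sum(abs(r - goal_positions[car][0]) + abs(c - goal_positions[car][1])
               for car, (r, c) in car_positions(board).items())

def hybrid(board, goal_positions):
    """The slide's hybrid advisor: goal distance plus the blockers lower bound
    (16 + 8 = 24 in the slide's example)."""
    return goal_distance(board, goal_positions) + blockers_lower_bound(board)
```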

25 Is Move To Secluded. Moving C and A into the secluded areas (moves CL2, AR1) is always a good move!

26 Policy "Ingredients": Functions & Terminals.

    Condition terminals: IsMoveToSecluded, isReleasingMove, g, PhaseByDistance, PhaseByBlockers, NumberOfSyblings, DifficultyLevel, BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, ..., 0.9, 1
    Result terminals: BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, ..., 0.9, 1
    Condition function set: If, AND, OR, <=, >=
    Result function set: +, *

27 Coevolving (Hard) 8x8 Boards (board illustration)

28 Results. Average reduction of nodes required to solve test problems, relative to the number of nodes scanned by a blind search:

    Problem | Blind | H1  | H2  | H3  | Hc  | Policy
    6x6     | 100%  | 28% | 6%  | -2% | 30% | 60%
    8x8     | 100%  | 31% | 25% | 30% | 50% | 90%

29 Results (cont'd). Time (in seconds) required to solve problems JAM01 ... JAM40 (chart).

30 FreeCell. FreeCell remained relatively obscure until it was included in Windows 95. There are 32,000 deals (known as the Microsoft 32K), all of them solvable except game #11982, which has been proven unsolvable. Evolving hyper heuristic-based solvers for Rush Hour and FreeCell [Hauptman et al., SOCS 2010]. GA-FreeCell: Evolving Solvers for the Game of FreeCell [Elyasaf et al., GECCO 2011].

31 FreeCell (cont'd). As opposed to Rush Hour, blind search failed miserably. The best published solver to date solves 96% of the Microsoft 32K. Reasons: a high branching factor, and it is hard to generate a good heuristic.

32 Learning Methods: Random Deals. Which deals should we use for training? The first method tested: random deals, as we did in Rush Hour. Here it yielded poor results; this is a very hard domain.

33 Learning Methods: Gradual Difficulty. The second method tested: gradual difficulty. Sort the problems by difficulty; each generation, test the solvers against 5 deals from the current difficulty level plus 1 random deal.

34 Learning Methods: Hillis-Style Coevolution. The third method tested: Hillis-style coevolution using a "Hall of Fame". The deal population is composed of 40 deals (= 40 individuals) plus 10 hall-of-fame deals. Each hyper-heuristic is tested against 4 deal individuals and 2 hall-of-fame deals. The evolved hyper-heuristics failed to solve almost all of the Microsoft 32K! Why?

35 Learning Methods: Rosin-Style Coevolution. The fourth method tested: Rosin-style coevolution, in which each deal individual consists of 6 deals. Mutation and crossover operate on the lists of deal numbers (crossover diagram; see the sketch below).
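
A sketch of the deal-individual operators suggested by the diagram; the cut-point choice and the mutation rate are assumptions:

```python
import random

DEALS_PER_INDIVIDUAL = 6                 # from the slide: each individual holds 6 deals
MS_32K = range(1, 32001)                 # Microsoft 32K deal numbers

def crossover(p1, p2):
    """One-point crossover on the parents' deal lists, as in the diagram."""
    cut = random.randrange(1, DEALS_PER_INDIVIDUAL)
    return p1[:cut] + p2[cut:]

def mutate(individual, rate=1.0 / DEALS_PER_INDIVIDUAL):
    """Assumed mutation: independently replace each deal with a random deal."""
    return [random.choice(MS_32K) if random.random() < rate else deal
            for deal in individual]
```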

36 Results.

    Learning Method         | Run    | Node Reduction | Time Reduction | Length Reduction | Solved
    (baseline)              | HSD    | 100%           | 100%           | 100%             | 96%
    Gradual Difficulty      | GA-1   | 23%            | 31%            | 1%               | 71%
    Gradual Difficulty      | GA-2   | 27%            | 30%            | -3%              | 70%
    Gradual Difficulty      | GP     | -              | -              | -                | -
    Gradual Difficulty      | Policy | 28%            | 36%            | 6%               | 36%
    Rosin-style coevolution | GA     | 87%            | 93%            | 41%              | 98%
    Rosin-style coevolution | Policy | 89%            | 90%            | 40%              | 99%

37 Additional Proposed Research: Search Aspects of Our Method. In the literature: generally, in non-optimal search the objective is to find solutions that are as short as possible; admissible heuristics and memory-based heuristics are commonly used; advisors are hardly used, so critical domain knowledge is lost.

38 Additional Proposed Research: Search Aspects of Our Method (cont'd). Our objective is to reduce search resources. We use a wide range of domain knowledge, including Boolean advisors and highly underestimating advisors. We believe that our method can outperform previous ones, and we wish to introduce our acquired knowledge to the search community (in addition to the EA community).

39 Additional Proposed Research: Search Aspects of Our Method (cont'd). To achieve this goal we plan multiple sets of experiments comparing several search algorithms with different types of heuristics: admissible, non-admissible, underestimating advisors, overestimating advisors, and Boolean advisors.

40 Additional Proposed Research: Algorithmic Framework. We aim to create an algorithmic framework based on evolutionary algorithms. Using this framework, one could create a solver for different models and domains; for each model or domain, a strategy enabling efficient solutions will be evolved automatically.

41 Additional Proposed Research: Domain-Independent Planner. An immediate extension would be altering the method to evolve a planner for solving problems from different domains, without knowing the domains a priori. Algorithms for generating and maintaining agendas, policies, interfering sub-goals, relaxed problems, and other methodologies are readily available, provided we encode problems (e.g., Rush Hour, FreeCell) as planning domains. However, using evolution in conjunction with these techniques is non-trivial.

42 Additional Proposed Research: Domain-Independent Planner (cont'd). To achieve this goal we need to define domain-independent heuristics to be used as building blocks for the evolutionary process. There are several state-of-the-art heuristics for domain-independent planners, such as FF and HSP. Problem: using these heuristics as building blocks would take an unreasonable amount of time, since all of them must be calculated. Possible solution: it is crucial to find easy-to-calculate, domain-independent heuristics and advisors. A possible direction might be using the taxonomic syntax database described by Yoon et al. [Learning control knowledge for forward search planning].

43 Reference: Yoon et al., Learning control knowledge for forward search planning.

44 Additional Proposed Research: Probabilistic Models (MDP & POMDP). Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) describe more general problems: in an MDP we don't know how an action will end up (what the resulting state will be); in a POMDP, in addition, the agent doesn't even know for sure where it is. We believe that our model can be altered to support MDPs and POMDPs as well.

45 Thank you for listening. Any questions?

