Slide 1: Evolving Hyper-Heuristics using Genetic Programming
Achiya Elyasaf
Supervisor: Moshe Sipper
Slide 2: Overview
- Introduction
- Searching Games State-Graphs
  - Uninformed Search
  - Heuristics
  - Informed Search
- Evolving Heuristics
- Previous Work
  - Rush Hour
  - FreeCell
Slide 3: Representing Games as State-Graphs
Every puzzle/game can be represented as a state graph:
- In puzzles, board games, etc., every piece move leads to a different state.
- In computer war games, etc., the positions of the player and the enemy, together with all the parameters (health, shield, ...), define a state.
Slide 4: Rush Hour as a State-Graph (figure)
Slide 5: Searching Games State-Graphs: Uninformed Search
- BFS: exponential in the search depth.
- DFS: linear in the length of the current search path. BUT we might "never" track down the right path, since games usually contain cycles.
- Iterative Deepening: a combination of BFS and DFS. Each iteration performs a DFS with a depth limit, and the limit grows from one iteration to the next (see the sketch below).
- Worst case: traverse the entire graph.
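A minimal Python sketch of iterative deepening, assuming the game exposes hypothetical successors(state) and is_goal(state) callbacks (all names illustrative):

```python
# Depth-limited DFS: memory stays linear in the current path length,
# and the path itself doubles as a guard against cycles.
def depth_limited_dfs(state, limit, successors, is_goal, path=None):
    path = path or [state]
    if is_goal(state):
        return path
    if limit == 0:
        return None
    for child in successors(state):
        if child in path:  # avoid cycles along the current path
            continue
        result = depth_limited_dfs(child, limit - 1, successors, is_goal,
                                   path + [child])
        if result is not None:
            return result
    return None

def iterative_deepening(start, successors, is_goal, max_depth=100):
    # Each iteration runs DFS with a growing depth limit, combining
    # DFS's linear memory with BFS's level-by-level depth growth.
    for limit in range(max_depth + 1):
        solution = depth_limited_dfs(start, limit, successors, is_goal)
        if solution is not None:
            return solution
    return None
```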
Slide 6: Searching Games State-Graphs: Uninformed Search
- Most game domains are PSPACE-complete!
- Worst case: traverse the entire graph.
- We need an informed search!
Slide 7: Searching Games State-Graphs: Heuristics
- h: States → ℝ. For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution.
- If h is perfect, an informed search that expands states with the best (lowest) h-value first will simply stroll to the solution.
- For hard problems, finding a good h is hard.
- A bad heuristic means the search might never track down the solution.
- We need a good heuristic function to guide the informed search.
Slide 8: Searching Games State-Graphs: Informed Search
- Best-First Search: like DFS, but successors are expanded in order of heuristic value, best first (see the sketch below). Not necessarily optimal; might enter cycles (a local extremum).
- A*: holds a closed list and a sorted open list, ordered by f(s) = g(s) + h(s); the best of all open nodes is selected for expansion. The maintenance and memory cost of the open and closed lists is the main drawback.
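A minimal Python sketch of best-first search under the same hypothetical successors/is_goal interface, plus an h(state) callback; the closed set is one way to guard against the cycles mentioned above:

```python
import heapq

def best_first_search(start, successors, is_goal, h):
    # Priority queue ordered by h-value; the counter breaks ties so that
    # states themselves never need to be comparable.
    open_list = [(h(start), 0, start, [start])]
    closed = set()
    counter = 1
    while open_list:
        _, _, state, path = heapq.heappop(open_list)
        if is_goal(state):
            return path
        if state in closed:  # skip states we already expanded
            continue
        closed.add(state)
        for child in successors(state):
            if child not in closed:
                heapq.heappush(open_list,
                               (h(child), counter, child, path + [child]))
                counter += 1
    return None
```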
Slide 9: Searching Games State-Graphs: Informed Search (cont'd)
- IDA*: Iterative Deepening with A* (see the sketch below).
- Expanded nodes are pushed onto the DFS stack by descending heuristic value, so the most promising node is popped first.
- Let g(s_i) be the minimal depth of state s_i: only nodes with f(s) = g(s) + h(s) < depth-limit are visited.
- Near-optimal solution (depends on the path limit).
- The heuristic needs to be admissible.
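A minimal Python sketch of IDA*, again over hypothetical successors/is_goal/h callbacks; each iteration's bound grows to the smallest f-value that exceeded the previous bound:

```python
import math

def ida_star(start, successors, is_goal, h):
    def dfs(state, g, bound, path):
        f = g + h(state)
        if f > bound:
            return f, None  # report the overflow as a candidate next bound
        if is_goal(state):
            return f, path
        next_bound = math.inf
        for child in successors(state):
            if child in path:  # avoid cycles on the current path
                continue
            t, solution = dfs(child, g + 1, bound, path + [child])
            if solution is not None:
                return t, solution
            next_bound = min(next_bound, t)
        return next_bound, None

    bound = h(start)
    while bound < math.inf:
        bound, solution = dfs(start, 0, bound, [start])
        if solution is not None:
            return solution
    return None
```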
Slide 10: Iterative Deepening (figure)
Slide 11: Best-First Search (figure)
Slide 12: A* (figure)
Slide 13: Overview
- Introduction
- Searching Games State-Graphs
  - Uninformed Search
  - Heuristics
  - Informed Search
- Evolving Heuristics
- Previous Work
  - Rush Hour
  - FreeCell
Slide 14: Evolving Heuristics
- Given building blocks H1, ..., Hn (not necessarily admissible or in the same range), how should we choose the fittest heuristic? Minimum? Maximum? A linear combination?
- GA/GP may be used for:
  - building new heuristics from existing building blocks
  - finding weights for each heuristic (for applying a linear combination)
  - finding conditions for applying each heuristic
- H should probably fit the stage of the search, e.g., "goal" heuristics when we assume we're close.
Slide 15: Evolving Heuristics: GA
Example individual (a weight vector): W1 = 0.3, W2 = 0.01, W3 = 0.2, ..., Wn = 0.1
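A minimal sketch of how such a GA individual could be turned into a heuristic, assuming the building blocks H1..Hn are functions of a state (names illustrative):

```python
def make_weighted_heuristic(weights, building_blocks):
    # The evolved heuristic is the weighted sum of the building blocks.
    def h(state):
        return sum(w * hb(state) for w, hb in zip(weights, building_blocks))
    return h

# Hypothetical usage with the weights from the slide:
# h = make_weighted_heuristic([0.3, 0.01, 0.2, 0.1], [H1, H2, H3, Hn])
```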
Slide 16: Evolving Heuristics: GP
Example individual (an expression tree):
If (H1 ≤ 0.4) AND (H2 ≥ 0.7) Then H2 + (H1 * 0.1) Else H5 * (H1 / 0.1)
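A minimal sketch of evaluating such a GP individual; the nested-tuple encoding is an illustrative choice, and H1/H2/H5 are assumed building-block functions of a state:

```python
def eval_tree(node, state):
    if callable(node):                  # a building-block terminal
        return node(state)
    if isinstance(node, (int, float)):  # a numeric terminal
        return node
    op, *args = node
    vals = lambda: [eval_tree(a, state) for a in args]
    if op == "if":
        cond, then_b, else_b = args
        return eval_tree(then_b if eval_tree(cond, state) else else_b, state)
    if op == "and": return all(vals())
    if op == "<=":  a, b = vals(); return a <= b
    if op == ">=":  a, b = vals(); return a >= b
    if op == "+":   a, b = vals(); return a + b
    if op == "*":   a, b = vals(); return a * b
    if op == "/":   a, b = vals(); return a / b
    raise ValueError(f"unknown op {op}")

# The slide's tree in this encoding:
# tree = ("if", ("and", ("<=", H1, 0.4), (">=", H2, 0.7)),
#         ("+", H2, ("*", H1, 0.1)),
#         ("*", H5, ("/", H1, 0.1)))
```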
Slide 17: Evolving Heuristics: Policies

Condition   | Result
Condition 1 | Heuristics Weights 1
Condition 2 | Heuristics Weights 2
...         | ...
Condition n | Heuristics Weights n
(default)   | Default Heuristics Weights
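A minimal sketch of applying a policy individual, assuming conditions are predicates over a state and each result is a weight vector for the building blocks (all names illustrative):

```python
def make_policy_heuristic(rules, default_weights, building_blocks):
    # rules: ordered list of (condition, weights); the first rule whose
    # condition holds supplies the weights for the weighted sum.
    def h(state):
        for condition, weights in rules:
            if condition(state):
                break
        else:  # no condition matched: fall back to the default weights
            weights = default_weights
        return sum(w * hb(state) for w, hb in zip(weights, building_blocks))
    return h
```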
Slide 18: Evolving Heuristics: Fitness Function
Slide 19: Overview
- Introduction
- Searching Games State-Graphs
  - Uninformed Search
  - Heuristics
  - Informed Search
- Evolving Heuristics
- Previous Work
  - Rush Hour
  - FreeCell
Slide 20: Rush Hour
GP-Rush [Hauptman et al., 2009]: Bronze Humie award
Slide 21: Domain-Specific Heuristics
Hand-crafted heuristics / guides:
- Blocker estimation: a lower bound (admissible)
- Goal distance: Manhattan distance
- Hybrid blockers distance: combines the two above
- Is Move To Secluded: did the car enter a secluded area?
- Is Releasing Move
Slide 22: Blockers Estimation
- A lower bound on the number of steps to the goal, obtained by counting the moves needed to free the blocking cars (see the sketch below).
- Example: O is blocking RED. We need at least: free A; move A; move B; move C; move O. H = 4.
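A minimal sketch of the blockers estimation, using a hypothetical board API (blockers_on_red_path(), blockers_of(car)); counting each car at most once keeps the count a lower bound:

```python
def blockers_lower_bound(board):
    counted = set()

    def moves_to_free(car):
        # One move to clear this car, plus the moves needed to clear
        # whatever blocks it; each car is counted at most once.
        if car in counted:
            return 0
        counted.add(car)
        return 1 + sum(moves_to_free(b) for b in board.blockers_of(car))

    # Count the cars standing between the RED car and the exit.
    return sum(moves_to_free(car) for car in board.blockers_on_red_path())
```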
Slide 23: Goal Distance
- Deduce the goal, and use the "Manhattan distance" from the goal as the h measure.
- In the example board: h = 16.
Slide 24: Hybrid
- "Manhattan distance" + blockers estimation.
- In the example board: 16 + 8 = 24.
Slide 25: Is Move To Secluded
Moving C and A into the secluded areas is always a good move! (moves: CL2, AR1)
Slide 26: Policy "Ingredients" (Functions & Terminals)
- Condition terminals: IsMoveToSecluded, isReleasingMove, g, PhaseByDistance, PhaseByBlockers, NumberOfSyblings, DifficultyLevel, BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, ..., 0.9, 1
- Result terminals: BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, ..., 0.9, 1
- Condition functions: If, AND, OR, ≤, ≥
- Result functions: +, *
Slide 27: Coevolving (Hard) 8x8 Boards (figure: an evolved 8x8 board with cars RED, H, F, G, M, P, I, S, K)
Slide 28: Results
Average reduction of nodes required to solve test problems, relative to the number of nodes scanned by a blind search:

Problem ID | Blind | H1  | H2  | H3  | Hc  | Policy
6x6        | 100%  | 28% | 6%  | -2% | 30% | 60%
8x8        | 100%  | 31% | 25% | 30% | 50% | 90%
Slide 29: Results (cont'd)
Time (in seconds) required to solve problems JAM01–JAM40 (figure).
Slide 30: FreeCell
- FreeCell remained relatively obscure until Windows 95.
- The Microsoft 32K is a set of 32,000 deals, all solvable except game #11982, which has been proven unsolvable.
- "Evolving Hyper Heuristic-Based Solvers for Rush Hour and FreeCell" [Hauptman et al., SOCS 2010]
- "GA-FreeCell: Evolving Solvers for the Game of FreeCell" [Elyasaf et al., GECCO 2011]
Slide 31: FreeCell (cont'd)
- As opposed to Rush Hour, blind search failed miserably.
- The best published solver to date solves 96% of the Microsoft 32K.
- Reasons: a high branching factor, and it is hard to generate a good heuristic.
Slide 32: Learning Methods: Random Deals
- Which deals should we use for training?
- First method tested: random deals. This is what we did in Rush Hour.
- Here it yielded poor results: a very hard domain.
Slide 33: Learning Methods: Gradual Difficulty
- Second method tested: gradual difficulty.
- Sort the problems by difficulty.
- Each generation, test the solvers against 5 deals from the current difficulty level + 1 random deal.
Slide 34: Learning Methods: Hillis-Style Coevolution
- Third method tested: Hillis-style coevolution using a "Hall of Fame".
- The deal population is composed of 40 deals (= 40 individuals) + 10 deals that represent a hall of fame.
- Each hyper-heuristic is tested against 4 deal individuals and 2 hall-of-fame deals.
- The evolved hyper-heuristics failed to solve almost all of the Microsoft 32K! Why?
Slide 35: Learning Methods: Rosin-Style Coevolution
- Fourth method tested: Rosin-style coevolution.
- Each deal individual consists of 6 deals.
- Mutation and crossover operate on the deal numbers (figure: two parents, p1 and p2, exchange deal numbers during crossover, and mutation replaces a deal with a new one).
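A minimal sketch of these operators, assuming (per the slide) an individual is a list of 6 deal numbers from the Microsoft 32K; the one-point crossover and per-deal mutation rate below are illustrative choices:

```python
import random

DEALS_IN_INDIVIDUAL = 6
MS32K = range(1, 32001)  # Microsoft 32K deal numbers

def crossover(p1, p2):
    # One-point crossover: the child takes a prefix of deals from p1
    # and the remaining deals from p2.
    point = random.randint(1, DEALS_IN_INDIVIDUAL - 1)
    return p1[:point] + p2[point:]

def mutate(individual, rate=0.1):
    # Each deal is independently replaced by a fresh random deal.
    return [random.choice(MS32K) if random.random() < rate else d
            for d in individual]
```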
Slide 36: Results

Learning Method         | Run    | Node Reduction | Time Reduction | Length Reduction | Solved
–                       | HSD    | 100%           | –              | –                | 96%
Gradual Difficulty      | GA-1   | 23%            | 31%            | 1%               | 71%
Gradual Difficulty      | GA-2   | 27%            | 30%            | -3%              | 70%
Gradual Difficulty      | GP     | –              | –              | –                | –
Gradual Difficulty      | Policy | 28%            | 36%            | 6%               | 36%
Rosin-style coevolution | GA     | 87%            | 93%            | 41%              | 98%
Rosin-style coevolution | Policy | 89%            | 90%            | 40%              | 99%
Slide 37: Additional Proposed Research: Search Aspects of Our Method
In the literature:
- Generally, in non-optimal search the objective is to find solutions that are as short as possible.
- Admissible heuristics and memory-based heuristics are commonly used.
- Advisors are hardly used, and critical domain knowledge is lost.
Slide 38: Additional Proposed Research: Search Aspects of Our Method (cont'd)
- Our objective is to reduce search resources.
- We use a wide range of domain knowledge, including Boolean advisors and highly underestimating advisors.
- We believe that our method can outperform previous ones.
- We wish to introduce our acquired knowledge to the search community (in addition to the EA community).
Slide 39: Additional Proposed Research: Search Aspects of Our Method (cont'd)
To achieve this goal we plan multiple sets of experiments comparing several search algorithms with different types of heuristics:
- Admissible
- Non-admissible
- Underestimating advisors
- Overestimating advisors
- Boolean advisors
Slide 40: Additional Proposed Research: Algorithmic Framework
- We aim to create an algorithmic framework based upon evolutionary algorithms.
- Using this framework, one could create a solver for different models and domains.
- For each model or domain, an automatic strategy will be evolved, allowing efficient solutions.
Slide 41: Additional Proposed Research: Domain-Independent Planner
- An immediate extension would be altering the method to evolve a planner for solving problems from different domains, without knowing the domains a priori.
- Algorithms for generating and maintaining agendas, policies, interfering sub-goals, relaxed problems, and other methodologies are readily available, provided we encode problems (e.g., Rush Hour, FreeCell) as planning domains.
- However, using evolution in conjunction with these techniques is non-trivial.
Slide 42: Additional Proposed Research: Domain-Independent Planner (cont'd)
- To achieve this goal we need to define domain-independent heuristics to be used as building blocks for the evolutionary process.
- There are several state-of-the-art heuristics for domain-independent planners, such as FF and HSP.
- Problem: using these heuristics as building blocks would take an unreasonable amount of time, since all of them must be calculated.
- Possible solution: it is crucial to find easy-to-calculate heuristics and advisors for independent domains.
- A possible direction might be using the taxonomic syntax database described by Yoon et al. [Learning control knowledge for forward search planning].
Slide 43: Reference
Yoon et al., "Learning control knowledge for forward search planning".
Slide 44: Additional Proposed Research: Probabilistic Models: MDP & POMDP
- Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) are used to describe more general problems:
  - In an MDP we don't know how our action will end up (what the resulting state will be).
  - In addition, the agent in a POMDP world doesn't even know for sure where it is.
- We believe that our model can be altered to support MDPs and POMDPs as well.
Slide 45: Thank You
Thank you for listening. Any questions?