UCT for Tactical Assault Battles in Real-Time Strategy Games
Radha-Krishna Balla
19 February 2009
Overview
I. Introduction
II. Related Work
III. Method
IV. Experiments & Results
V. Conclusion
I. Introduction
Domain
- RTS games
  - Resource production
  - Tactical planning
    - Tactical Assault battles
RTS Game – Wargus
[Figure: screenshot of a typical battle scenario in Wargus]
Planning Problem
- Large state space
- Temporal actions
- Spatial reasoning
- Concurrency
- Stochastic actions
- Changing goals
II. Related Work
Related Work
- Board games (bridge, poker, Go): Monte Carlo simulations
- RTS games
  - Resource production: means-ends analysis (Chan et al.)
  - Tactical planning: Monte Carlo simulations (Chung et al.), Nash strategies (Sailer et al.), reinforcement learning (Wilson et al.)
- Bandit-based problems and Go: UCT (Kocsis et al., Gelly et al.)
Our Approach
- Monte Carlo simulations with the UCT algorithm
- Advantages
  - Complex plans from simple abstract actions
  - Exploration/exploitation tradeoff
  - Handles changing goals
III. Method
Method
- Planning architecture
- UCT algorithm
- Search space formulation
- Monte Carlo simulations
- Challenges
Planning Architecture
- Online planner
- State space abstraction: grouping of units
- Abstract actions (see the sketch below)
  - Join(G): merge the friendly groups in set G into a single group
  - Attack(f,e): send friendly group f to attack enemy group e
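To make the abstraction concrete, here is a minimal sketch of how the abstract state and the two abstract actions could be represented. The class and field names (Group, AbstractState, join, attack) are illustrative assumptions, not the planner's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    """An abstract group of units (fields are illustrative)."""
    units: list                 # unit ids in this group
    hit_points: int             # summed hit points of the units
    location: tuple             # centroid (x, y) of the unit positions
    current_action: str = "idle"

@dataclass
class AbstractState:
    friendly: dict              # group id -> Group
    enemy: dict                 # group id -> Group
    time: float = 0.0           # current game time

def join(G):
    """Abstract action Join(G): merge the friendly groups in G into one."""
    return ("join", tuple(sorted(G)))

def attack(f, e):
    """Abstract action Attack(f, e): friendly group f attacks enemy group e."""
    return ("attack", f, e)
```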
UCT Algorithm
- Exploration/exploitation tradeoff
- Monte Carlo simulation to get subsequent states
- Search tree
  - Root node: current state
  - Edges: available actions
  - Intermediate nodes: subsequent states
  - Leaf nodes: terminal states
- Rollout-based construction
- Value estimates
UCT Algorithm – Pseudo Code 1
At each interesting time point in the game:
    build_UCT_tree(current state)
    choose argmax action(s) based on the UCT policy
    execute the aggregated actions in the actual game
    wait until one of the actions gets executed

build_UCT_tree(state):
    for each UCT pass do
        run UCT_rollout(state)
(continued)
UCT Algorithm – Pseudo Code 2
UCT_rollout(state):   (recursive)
    if leaf node reached then
        estimate final reward
        propagate reward up the tree and update value functions
        return
    populate possible actions
    if all actions explored at least once then
        choose the action with the best value function
    else
        choose an unexplored action by random sampling
    run Monte Carlo simulation to get the next state from the current state and action
    call UCT_rollout(next state)
UCT Algorithm – Formulae
Action selection (the standard UCB1 rule):
$$\pi(s) = \arg\max_a \left[ Q(s,a) + c \sqrt{\frac{\ln n(s)}{n(s,a)}} \right]$$
Value update (incremental mean of rollout rewards):
$$n(s,a) \leftarrow n(s,a) + 1, \qquad Q(s,a) \leftarrow Q(s,a) + \frac{R - Q(s,a)}{n(s,a)}$$
Here $n(s)$ is the visit count of state $s$, $n(s,a)$ the visit count of action $a$ in $s$, $Q(s,a)$ the value estimate, $R$ the sampled rollout reward, and $c$ the exploration constant.
Search Space Formulation
- Abstract state
  - Friendly and enemy groups
  - Hit points
  - Location
  - Current actions
  - Current time
- Group hit points: sum of the hit points of the units in the group
- Mean location: centroid of the units' positions (see the sketch below)
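A minimal sketch of the two group-level calculations named above, assuming each unit is a (hit_points, x, y) tuple; the function names are illustrative.

```python
def group_hit_points(units):
    """Group hit points: sum of the members' hit points."""
    return sum(hp for hp, _, _ in units)

def group_location(units):
    """Mean location: centroid of the members' positions."""
    xs = [x for _, x, _ in units]
    ys = [y for _, _, y in units]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

units = [(60, 2, 3), (40, 4, 5)]   # invented two-unit group
print(group_hit_points(units))     # 100
print(group_location(units))       # (3.0, 4.0)
```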
Monte Carlo Simulations
- Domain-specific, based on actual game play in Wargus
- Join actions
- Attack actions
- Reward calculation via the objective function (sketched below)
  - Time
  - Hit points
- Note: partial simulations (time cutoff)
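The two reward signals drive the UCT(t) and UCT(hp) planners introduced later. A hedged sketch, reusing the illustrative AbstractState/Group classes from the planning-architecture sketch; the exact scaling used in the original work may differ.

```python
def reward_time(final_state, time_cap=10000):
    """UCT(t)-style objective: win as quickly as possible.
    Higher reward for shorter winning battles (scaling assumed)."""
    if sum(g.hit_points for g in final_state.enemy.values()) > 0:
        return 0.0                        # battle lost or cut off
    return 1.0 - final_state.time / time_cap

def reward_hit_points(final_state, total_hp):
    """UCT(hp)-style objective: win while preserving friendly hit points."""
    remaining = sum(g.hit_points for g in final_state.friendly.values())
    return remaining / total_hp           # fraction of hit points kept
```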
Domain-specific Challenges
- State space abstraction: grouping of units (proximity-based)
- Concurrency: aggregation of actions
  - Join actions: simple
  - Attack actions: complex (partial simulations)
Planning Problem – Revisited
- Large state space → abstraction
- Temporal actions → Monte Carlo simulations
- Spatial reasoning → Monte Carlo simulations
- Concurrency → aggregation of actions
- Stochastic actions → UCT (online planning)
- Changing goals → UCT (different objective functions)
IV. Experiments & Results
Experiments

| # | Scenario | Friendly groups | Friendly composition | Enemy groups | Enemy composition | 'Join' actions | 'Attack' actions | Total actions |
|---|----------|-----------------|----------------------|--------------|-------------------|----------------|-------------------|---------------|
| 1 | 2vs2 | 2 | {6,6} | 2 | {5,5} | 1 | 4 | 5 |
| 2 | 3vs2 | 3 | {6,2,4} | 2 | {5,5} | 3 | 6 | 9 |
| 3 | 4vs2_1 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 4 | 4vs2_2 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 5 | 4vs2_3 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 6 | 4vs2_4 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 7 | 4vs2_5 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 8 | 4vs2_6 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 9 | 4vs2_7 | 4 | {3,3,6,4} | 2 | {5,9} | 6 | 8 | 14 |
| 10 | 4vs2_8 | 4 | {3,3,3,6} | 2 | {5,8} | 6 | 8 | 14 |
| 11 | 2vs4_1 | 2 | {9,9} | 4 | {4,5,5,4} | 1 | 8 | 9 |
| 12 | 2vs4_2 | 2 | {9,9} | 4 | {5,5,5,5} | 1 | 8 | 9 |
| 13 | 2vs4_3 | 2 | {9,9} | 4 | {5,5,5,5} | 1 | 8 | 9 |
| 14 | 2vs5_1 | 2 | {9,9} | 5 | {5,5,5,5,5} | 1 | 10 | 11 |
| 15 | 2vs5_2 | 2 | {10,10} | 5 | {5,5,5,5,5} | 1 | 10 | 11 |
| 16 | 3vs4 | 3 | {12,4,4} | 4 | {5,5,5,5} | 3 | 12 | 15 |

Table 1: Details of the different game scenarios
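The action counts in Table 1 follow a simple pattern: with n_f friendly and n_e enemy groups there are C(n_f, 2) pairwise Join actions and n_f × n_e Attack actions. The check below is my inference from the table rows, not a formula stated on the slide.

```python
from math import comb

# (scenario, friendly groups, enemy groups, expected total actions)
rows = [("2vs2", 2, 2, 5), ("3vs2", 3, 2, 9), ("4vs2_1", 4, 2, 14),
        ("2vs5_1", 2, 5, 11), ("3vs4", 3, 4, 15)]

for name, nf, ne, expected in rows:
    joins = comb(nf, 2)        # join any pair of friendly groups
    attacks = nf * ne          # any friendly group can attack any enemy group
    assert joins + attacks == expected, name
    print(f"{name}: {joins} joins + {attacks} attacks = {joins + attacks}")
```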
Planners
- UCT planners
  - UCT(t): optimizes battle time
  - UCT(hp): optimizes remaining friendly hit points
- Number of rollouts: 5000
- Results averaged over 5 runs
Planners
- Baseline planners
  - Random
  - Attack-Closest
  - Attack-Weakest
  - Stratagus-AI
  - Human
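For contrast with UCT, the two scripted baselines are simple greedy policies. A minimal sketch, reusing the illustrative Group and attack() definitions from the planning-architecture sketch; the Stratagus-AI and Human baselines cannot be reduced to a few lines.

```python
def attack_closest(friendly, enemy):
    """Attack-Closest: each friendly group attacks the nearest enemy group."""
    def dist2(a, b):
        return ((a.location[0] - b.location[0]) ** 2
                + (a.location[1] - b.location[1]) ** 2)
    return [attack(f_id, min(enemy, key=lambda e_id: dist2(f, enemy[e_id])))
            for f_id, f in friendly.items()]

def attack_weakest(friendly, enemy):
    """Attack-Weakest: every friendly group attacks the enemy group with the fewest hit points."""
    weakest = min(enemy, key=lambda e_id: enemy[e_id].hit_points)
    return [attack(f_id, weakest) for f_id in friendly]
```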
Video – Planning in Action
- Simple scenario
- Complex scenario
Results
Figure 1: Time results for UCT(t) and baselines.
Figure 2: Hit point results for UCT(t) and baselines.
Figure 3: Time results for UCT(hp) and baselines.
Figure 4: Hit point results for UCT(hp) and baselines.
Results – Comparison
Figures 1–4 compared as a 2×2 grid: rows UCT(t) and UCT(hp); columns time results and hit point results.
Results
Figure 5: Time results for UCT(t) with varying rollouts.
V. Conclusion
Conclusion
- Hard planning problem tackled with little expert knowledge
- Supports different objective functions
Future Work
- Reducing computational time (engineering aspects)
- Machine learning techniques
- Beyond Tactical Assault
Thank you