1 Project Ideas
2 Algorithmic Evaluations/Comparisons
- Compare variants of (nested) policy rollout using different bandit algorithms
- Compare some variants of Monte-Carlo tree search
- Implement an algorithm from the literature and attempt to replicate results, e.g.
  - Forward Search Sparse Sampling (a type of Monte-Carlo tree search algorithm)
  - Anytime AO*
  - Least-Squares Policy Iteration
  - I could give other pointers depending on interests
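To make the first idea concrete, here is a minimal sketch of policy rollout where the choice of which root action to sample next is made by a bandit rule. Everything in it is an assumption for illustration: the toy walk domain (`GOAL`, `HORIZON`, `step`), the random `base_policy`, and the two allocation rules compared (uniform and UCB1). It is not taken from the course framework; a real project would swap in an actual domain and the bandit algorithms of interest.

```python
import math
import random

GOAL = 5          # hypothetical toy domain: walk on 0..GOAL, reward 1 on reaching GOAL
HORIZON = 10      # maximum number of steps per simulated trajectory
ACTIONS = (-1, +1)

def step(state, action):
    """Toy simulator: returns (next_state, reward, done)."""
    nxt = max(0, min(GOAL, state + action))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def base_policy(state, rng):
    """Base policy to be improved by rollout: uniformly random."""
    return rng.choice(ACTIONS)

def simulate(state, first_action, rng):
    """Return of taking first_action, then following the base policy."""
    total, action = 0.0, first_action
    for _ in range(HORIZON):
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
        action = base_policy(state, rng)
    return total

def rollout_action(state, budget, rng, bandit="ucb"):
    """Treat each root action as a bandit arm; spend `budget` simulations."""
    n = len(ACTIONS)
    counts, sums = [0] * n, [0.0] * n

    def mean(a):
        return sums[a] / counts[a] if counts[a] else float("-inf")

    for t in range(budget):
        if bandit == "uniform":
            arm = t % n                      # round-robin allocation
        elif 0 in counts:
            arm = counts.index(0)            # UCB1: try every arm once first
        else:
            arm = max(range(n), key=lambda a: mean(a)
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        sums[arm] += simulate(state, ACTIONS[arm], rng)
        counts[arm] += 1
    best = max(range(n), key=mean)
    return ACTIONS[best], counts
```

Calling `rollout_action(3, 400, random.Random(0), bandit="ucb")` and then the same with `bandit="uniform"` shows how the two rules spend the same simulation budget differently; a comparison project would measure decision quality as a function of that budget.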
3 Algorithmic Comparisons
- Compare some reinforcement learning algorithms across some interesting problems
  - E.g. compare TD-based vs. policy-gradient-based algorithms
- You could use the domains I have in the Java framework for evaluation
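As a rough illustration of what a TD-based vs. policy-gradient comparison could look like, the sketch below runs tabular Q-learning and REINFORCE on the same small problem. The chain domain (`N_STATES`, `HORIZON`, `GAMMA`, `step`) and all hyperparameters are assumptions made up for this example, and it is written in Python rather than against the Java framework, so treat it as a starting point rather than a reference implementation.

```python
import math
import random

N_STATES = 5      # hypothetical chain: states 0..4, start at 0, goal at state 4
HORIZON = 20      # maximum episode length
GAMMA = 0.95
LEFT, RIGHT = 0, 1

def step(state, action):
    """Toy simulator: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes, rng, alpha=0.5, eps=0.1):
    """TD-based: tabular Q-learning with epsilon-greedy exploration."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(HORIZON):
            # Explore with probability eps, and break ties randomly.
            if rng.random() < eps or q[s][LEFT] == q[s][RIGHT]:
                a = rng.randrange(2)
            else:
                a = LEFT if q[s][LEFT] > q[s][RIGHT] else RIGHT
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s = s2
    return q

def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(episodes, rng, lr=0.1):
    """Policy gradient: REINFORCE with a tabular softmax policy, no baseline."""
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, traj = 0, []
        for _ in range(HORIZON):
            probs = softmax(theta[s])
            a = LEFT if rng.random() < probs[LEFT] else RIGHT
            s2, r, done = step(s, a)
            traj.append((s, a, r))
            if done:
                break
            s = s2
        g = 0.0
        # Walk the episode backwards, accumulating the discounted return.
        for s, a, r in reversed(traj):
            g = r + GAMMA * g
            probs = softmax(theta[s])
            for b in (LEFT, RIGHT):
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += lr * g * grad
    return theta
```

Running `q_learning(500, random.Random(0))` and `reinforce(2000, random.Random(0))` gives a Q-table and a set of policy preferences to inspect side by side. A real comparison would go further: average over many seeds, plot return against the number of episodes, and repeat on several domains.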
4 Solve a Particular Problem
- Pick a challenging sequential decision-making problem
- Apply one or more of our planning/learning approaches to it and evaluate
- Problems from past projects:
  - Games: Tetris, Pokemon, Blokus, Chess, Backgammon, Othello, Clue, Space Wars (Galcon Fusion), StarCraft, Pac-Man
5 Solve a Particular Problem
- Problems from past projects:
  - Compiler scheduling
  - Adaptive Java program optimization
  - Forest Fire Management
  - Crop Management
  - Optimizing Policies for Network Protocols
  - Controllers for Real-Time Strategy Games
    - Subproblems of the game
  - Optimizing file sharing policies
- Reinforcement learning and Monte-Carlo were the most commonly applied solution approaches