Published by Aron Ball. Modified over 9 years ago.
1
External Memory Value Iteration
Stefan Edelkamp, Shahid Jabbar
Chair for Programming Systems, University of Dortmund, Germany
Blai Bonet
Departamento de Computacion, Universidad Simon Bolivar, Caracas, Venezuela
2
Motivation: Reinforcement Learning
- Aim: write a controller that acts successfully in the environment.
- Minimize cost / maximize reward.
[Figure: agent-environment loop labeled with action a_t, cost c_t, and state s_t.]
3
Motivation: External Reinforcement Learning
- Covers deterministic, non-deterministic, and probabilistic environments (and games).
- But what to do if the agent's state space or policy space is too large to be computed and stored in RAM?
- Disk space is cheap (500 GB ~ $100).
- External memory algorithm.
4
Overview
- Uniform Search Model
- Internal Memory Value Iteration
- Existing External Memory Model and BFS
- External Memory Value Iteration
- Experimental Highlights
- Summary & Outlook
6
Uniform Search Model
- Deterministic
- Non-deterministic
- Probabilistic
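The formulas for the three cases did not survive in this transcript. As a hedged sketch, one standard way to write the corresponding Bellman equations is shown here. The symbols are assumptions and are not taken from the slide: c(u,a) is the cost of action a in state u, Succ(u,a) the set of possible successors, and P_a(v | u) a transition probability. Terminal states have h = 0.

```latex
% Deterministic: each action has exactly one successor v = Succ(u,a)
h(u) = \min_{a \in A(u)} \bigl\{ c(u,a) + h(\mathrm{Succ}(u,a)) \bigr\}

% Non-deterministic: guard against the worst successor
h(u) = \min_{a \in A(u)} \Bigl\{ c(u,a) + \max_{v \in \mathrm{Succ}(u,a)} h(v) \Bigr\}

% Probabilistic (MDP): expected value over successors
h(u) = \min_{a \in A(u)} \Bigl\{ c(u,a) + \sum_{v \in \mathrm{Succ}(u,a)} P_a(v \mid u)\, h(v) \Bigr\}
```

All three share the same shape, a minimization over actions of cost plus an aggregate of successor values, which is what allows one search model to cover them.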
8
Internal Memory Value Iteration
- ε-optimal for solving MDPs, AND/OR trees, …
- Problem: the whole state space must fit in main memory.
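To make the memory problem concrete, here is a minimal sketch of in-memory value iteration for a cost-based MDP. The encoding (`MDP`, `value_iteration`, the dictionary layout) is hypothetical and chosen for illustration only. The point is that the table `h` holds one entry per state, all of it in RAM.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> action -> (cost, [(successor, probability), ...])
Outcome = Tuple[str, float]
MDP = Dict[str, Dict[str, Tuple[float, List[Outcome]]]]


def value_iteration(mdp: MDP, goals: Set[str], epsilon: float = 1e-4,
                    max_iters: int = 10_000) -> Tuple[Dict[str, float], int]:
    """Repeat Bellman backups until the largest change (residual) drops below epsilon."""
    # The whole value table lives in main memory: one entry per state.
    h = {s: 0.0 for s in mdp}
    for actions in mdp.values():
        for _, outcomes in actions.values():
            for v, _ in outcomes:
                h.setdefault(v, 0.0)

    for iteration in range(1, max_iters + 1):
        new_h = dict(h)
        residual = 0.0
        for s, actions in mdp.items():
            if s in goals or not actions:
                continue  # goal and dead-end states keep their value
            # Expected cost of the best action, using the previous iteration's values.
            best = min(cost + sum(p * h[v] for v, p in outcomes)
                       for cost, outcomes in actions.values())
            residual = max(residual, abs(best - h[s]))
            new_h[s] = best
        h = new_h
        if residual < epsilon:
            return h, iteration
    return h, max_iters


if __name__ == "__main__":
    chain = {"a": {"go": (1.0, [("b", 1.0)])},
             "b": {"go": (1.0, [("g", 1.0)])},
             "g": {}}
    print(value_iteration(chain, {"g"}))
```

Each backup reads the values of arbitrary successor states, so the access pattern is random. That is why swapping `h` out to virtual memory performs poorly.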
9
Why External Memory Algorithms?
- Search algorithms perform well only as long as they run entirely in RAM.
- Virtual memory slows them down.
[Figure: virtual address space from 0x000…000 to 0xFFF…FFF divided into memory pages; the example access pattern costs 7 I/Os.]
11
External Memory Model [Vitter and Shriver, 94]
- Internal memory holds M items; data moves between disk and memory in blocks of B items.
- Input of size N >> M.
- If the input size is very large, running time depends on the number of I/Os rather than on the number of instructions.
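In this model the two basic primitives are usually costed as scan(N) = Θ(N/B) and sort(N) = Θ((N/B) log_{M/B}(N/B)) I/Os. The sketch below counts block transfers for a textbook multiway merge sort under stated assumptions: one run-formation pass, then merge passes with fan-in M/B - 1, each pass reading and writing every block once. The function names `scan_ios` and `sort_ios` are hypothetical, and the numbers are for intuition only.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def scan_ios(n: int, b: int) -> int:
    """Block reads needed to scan n items sequentially with block size b."""
    return ceil_div(n, b)


def sort_ios(n: int, m: int, b: int) -> int:
    """Block transfers of a simple external merge sort (an assumed variant).

    Pass 1 forms sorted runs of m items each; every later pass merges up to
    (m // b - 1) runs at a time. Each pass reads and writes all blocks once.
    """
    blocks = ceil_div(n, b)
    runs = ceil_div(n, m)
    fan_in = max(2, m // b - 1)
    passes = 1  # run formation
    while runs > 1:
        runs = ceil_div(runs, fan_in)
        passes += 1
    return 2 * blocks * passes


if __name__ == "__main__":
    n, m, b = 1_000_000, 10_000, 100
    print("scan:", scan_ios(n, b), "I/Os")
    print("sort:", sort_ios(n, m, b), "I/Os")
```

Because the fan-in M/B is large in practice, the number of passes stays small, and sorting costs only a small multiple of scanning.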
12
External Breadth-First Search (Munagala and Ranade, SODA'99)
Example on a graph with nodes A, B, C, D, E:
- Open(0) = A
- Open(1) = B C
- Successors of Open(1): D A A D E
- External sort: A A D D E
- Compact: A D E
- Remove duplicates w.r.t. the 2 previous layers: Open(2) = D E
For undirected graphs, subtracting two layers is enough [Munagala & Ranade, 99]. For directed graphs, the longest back-edge has to be taken into account [Zhou & Hansen, 05].
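The layer construction can be simulated in a few lines. This is a sketch only: Python lists stand in for files on disk, `list.sort()` stands in for an external sort, and the edge set of the example graph is an assumption chosen to reproduce the successor sequence D A A D E from the slide.

```python
from typing import Dict, List


def compact(sorted_items: List[str]) -> List[str]:
    """Drop adjacent duplicates from a sorted list in one sequential scan."""
    return [x for i, x in enumerate(sorted_items) if i == 0 or x != sorted_items[i - 1]]


def subtract_sorted(a: List[str], b: List[str]) -> List[str]:
    """Return the items of sorted list a that are not in sorted list b (parallel scan)."""
    out, j = [], 0
    for x in a:
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            out.append(x)
    return out


def external_bfs_layers(adj: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Build BFS layers for an undirected graph, subtracting the two previous layers."""
    layers = [[start]]
    while True:
        succs = [v for u in layers[-1] for v in adj[u]]  # expansion
        succs.sort()                                     # stands in for external sort
        layer = compact(succs)
        for prev in layers[-2:]:                         # two layers suffice when undirected
            layer = subtract_sorted(layer, prev)
        if not layer:
            return layers
        layers.append(layer)


if __name__ == "__main__":
    # Assumed undirected edges: A-B, A-C, B-D, C-D, C-E
    graph = {"A": ["B", "C"], "B": ["D", "A"], "C": ["A", "D", "E"],
             "D": ["B", "C"], "E": ["C"]}
    print(external_bfs_layers(graph, "A"))
```

Every step is a sort or a sequential scan, so no random access into a hash table of visited states is needed.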
13
External Memory Algorithms for Implicit Graphs
- Frontier Search [Korf, 03]
- External A* [Edelkamp, Jabbar, Schrödl, 04]
- Structured Duplicate Detection [Zhou & Hansen, 04]
- Cost-Optimal External Planning [Edelkamp, Jabbar, 06]
- Model Checking for Linear Temporal Logic:
  - [Jabbar & Edelkamp, 05] for safety error detection
  - [Edelkamp & Jabbar, 06] for liveness detection (cycle)
  - [Barnat, Brim, Simecek, 07] for liveness detection (cycle)
- Real-Time Model Checking/Scheduling [Edelkamp, Jabbar, 06]
15
External Memory Algorithm for Value Iteration
- What makes value iteration different from the usual external memory search algorithms?
- Answer: information propagates from states to their predecessors, so edges are more important than states.
- Ext-VI works on edges.
16
External Memory Value Iteration
Phase I: Generate the edge space by external BFS.
    Open(0) = Init; i = 1
    while (Open(i-1) != empty)
        Open(i) = Succ(Open(i-1))
        Externally-Sort-and-Remove-Duplicates(Open(i))
        for loc = 1 to Locality(Graph)        (remove previous layers)
            Open(i) = Open(i) \ Open(i - loc)
        i++
    endwhile
Merge all BFS layers into one edge list on disk:
    Open_t = Open(0) ∪ Open(1) ∪ … ∪ Open(DIAM)
    Temp = Open_t
    Sort Open_t w.r.t. the successors; sort Temp w.r.t. the predecessors.
17
Working of Ext-VI, Phase II
[Figure: example graph with states 1 to 10; I marks the initial state and T the terminal states. Initial h-values: h(1)=3; h(2)=h(3)=h(4)=2; h(5)=h(6)=h(7)=h(9)=1; h(8)=h(10)=0.]
Temp: edge list on disk, sorted on predecessors
    {(Ø,1), (1,2), (1,3), (1,4), (2,3), (2,5), (3,4), (3,8), (4,6), (5,6), (5,7), (6,9), (7,8), (7,10), (9,8), (9,10)}
    h  = 3 2 2 2 2 1 2 0 1 1 1 1 0 0 0 0
Open_t: edge list on disk, sorted on successors
    {(Ø,1), (1,2), (1,3), (2,3), (1,4), (3,4), (2,5), (4,6), (5,6), (5,7), (3,8), (7,8), (9,8), (6,9), (7,10), (9,10)}
    h  = 3 2 2 2 2 2 1 1 1 1 0 0 0 1 0 0
    h' = 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0
Each value belongs to the successor state of the edge at the same position.
Alternate sorting and update until residual < epsilon.
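The update on this example can be traced with a small simulation. This is a sketch under assumptions, not the authors' implementation: the graph is treated as deterministic with unit edge costs, Python lists stand in for the two disk files, `sorted` stands in for the external sort, and `ROOT`, `backward_update`, and `ext_vi` are hypothetical names.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Edge = Tuple[int, int, int]  # (predecessor, successor, h(successor))
ROOT = 0  # stands in for the pseudo-predecessor Ø of the initial state

EDGES = [(ROOT, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 8),
         (4, 6), (5, 6), (5, 7), (6, 9), (7, 8), (7, 10), (9, 8), (9, 10)]
H0 = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 0}


def backward_update(temp: List[Edge], open_t: List[Edge], cost: int = 1):
    """One pass: scan temp (sorted on preds) and open_t (sorted on succs) in parallel."""
    # All edges leaving u are adjacent in temp, so h'(u) needs one sequential scan.
    updates = ((u, cost + min(h for _, _, h in group))
               for u, group in groupby(temp, key=lambda e: e[0]) if u != ROOT)
    current = next(updates, None)
    out, residual = [], 0
    for pred, succ, h in open_t:
        while current is not None and current[0] < succ:
            current = next(updates, None)
        # States without outgoing edges (terminals) keep their old value.
        new_h = current[1] if current is not None and current[0] == succ else h
        residual = max(residual, abs(new_h - h))
        out.append((pred, succ, new_h))
    return out, residual


def ext_vi(edges, h0: Dict[int, int], epsilon: float = 0.5):
    """Alternate sorting and updating until the residual falls below epsilon."""
    open_t = sorted(((p, v, h0[v]) for p, v in edges), key=lambda e: (e[1], e[0]))
    iterations = 0
    while True:
        temp = sorted(open_t, key=lambda e: (e[0], e[1]))  # stands in for external sort
        open_t, residual = backward_update(temp, open_t)
        iterations += 1
        if residual < epsilon:
            return {v: h for _, v, h in open_t}, iterations


if __name__ == "__main__":
    print(ext_vi(EDGES, H0))
```

Under these assumptions the first pass reproduces the h' row listed above. Both files are only read front to back, which is what makes the update I/O-efficient.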
18
Complexity Analysis, Phase I: External Memory Breadth-First Search
- Expansion: scanning the red bucket: O(scan(|E|)) I/Os.
- Duplicate removal: sorting the green bucket, which holds one state for every edge from the red bucket, then scanning and compacting it: O(sort(|E|)) I/Os.
- Subtraction: removing the states of the (duplicate-free) blue buckets from the green one: O(l × scan(|E|)) I/Os, where l is the locality of the graph.
- Complexity of Phase I: O(l × scan(|E|) + sort(|E|)) I/Os.
[Figure: red, green, and blue buckets on disk.]
19
Complexity Analysis, Phase II: Backward Update
- Update: simple block-wise scanning of the red and green files: O(scan(|E|)) I/Os.
- External sort: sorting the blue file with the updated values, to be used as the red file later: O(sort(|E|)) I/Os.
- Fast external sort: if |E| / M < max. number of file pointers, O(scan(|E|)) I/Os.
- Total complexity of Phase II: for t_max iterations, O(t_max × sort(|E|)) I/Os; with fast external sort, O(t_max × scan(|E|)) I/Os.
[Figure: three files on disk: sorted on predecessors, sorted on states, and updated h-values.]
21
Experiments: 3x3 Sliding Tiles Puzzle

p = 1.0; heuristic = 0
Alg.     |S|/|E|    RAM    #Iterations   Time
VI       181,440    21M    27            6.3
Ext-VI   483,839    11M    32            71.5

p = 0.9; heuristic = Manhattan distance
Alg.     |S|/|E|    RAM    #Iterations   Time
VI       181,440    21M    35            8.3
Ext-VI   967,677    12M    43            237.4

The numbers of iterations differ!
22
3x4 Sliding Tile Puzzle with p = 0.9 (state space: 12!/2 ≈ 239 × 10^6)
- With 2 gigabytes of RAM, VI could not generate the state space.
- External VI finished:
  - Took 45 GB of disk space for the edges; 1,357,171,197 edges in total.
  - Took 437 hours and 72 iterations to converge, with ε = 0.0001.
  - RAM used: 1.4 gigabytes.
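A quick back-of-the-envelope check of these figures is given below. The bytes-per-edge estimate assumes that 45 GB means 45 × 10^9 bytes, which the slide does not state. The variable names are illustrative.

```python
import math

# Reachable states of the 3x4 puzzle: half of all 12! tile permutations.
states = math.factorial(12) // 2

edges = 1_357_171_197          # reported on the slide
disk_bytes = 45 * 10**9        # assumption: decimal gigabytes

edges_per_state = edges / states
bytes_per_edge = disk_bytes / edges

print(f"states          = {states:,}")
print(f"edges per state = {edges_per_state:.2f}")
print(f"bytes per edge  = {bytes_per_edge:.1f}")
```

Under that assumption each edge record takes a few dozen bytes on disk, which puts the edge list far beyond the 2 GB of RAM mentioned above.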
23
Race Track Domain Example

Alg.     150x300 RaceTrack
VI       Out of memory: > 2 GB
LRTDP    Out of memory: > 2 GB; 12 hours
LDFS     Out of time: > 1.5 GB; 118 hours
Ext-VI   Converged! 1.6 GB; 91 hours
25
Summary
- Achievements:
  - First I/O-efficient disk-based algorithm for solving Markov Decision Processes.
  - I/O complexity analysis.
- Features:
  - General cost model.
  - Can pause and resume execution to add more hard disks.
- Refinements:
  - Disk space eaten by duplicate states: start "early" delayed duplicate detection.
26
Outlook
- Application to Bellman-Ford.
- Parallel External Value Iteration: during the internal update, the hard disk is not in use.
27
Thank you! Questions?