DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks. Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe.


1 DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks Manish Jain, Matthew E. Taylor, Makoto Yokoo, Milind Tambe

2 Motivation Real-world applications of mobile sensor networks ◦ Robots in an urban setting ◦ Autonomous underwater vehicles

3 Challenges Rewards are unknown Limited time horizon Anytime performance is important

4 Existing Models Distributed Constraint Optimization for sensor networks ◦ [Lesser03, Zhang03, …] Mobile sensor nets for communication ◦ [Cheng2005, Marden07, …] Factor graphs ◦ [Farinelli08, …] Swarm intelligence, potential games Other robotic approaches …

5 Contributions Propose new algorithms for DCOPs Seamlessly interleave distributed exploration and distributed exploitation Tested on physical hardware

6 Outline Background on DCOPs Solution Techniques Experimental Results Conclusions and Future Work

7 DCOP Framework [Figure: agents a1, a2, a3 in a chain; each link, (a1, a2) and (a2, a3), carries a reward table with entries 10, 0, 0, 6 over the agents' possible values.]
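The framework on this slide can be sketched in a few lines of Python. The agent names and the 10/0/0/6 rewards follow the slide's example; reading the four entries as a 2x2 table over two binary values, and the helper names `total_reward` and `solve_exhaustively`, are illustrative assumptions:

```python
# Sketch of the chain DCOP from the slide: agents a1, a2, a3 with binary
# constraints on links (a1, a2) and (a2, a3). The four table entries are
# read as a 2x2 matrix over each agent's two possible values (an assumption).
from itertools import product

TABLE = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}
CONSTRAINTS = [("a1", "a2", TABLE), ("a2", "a3", TABLE)]

def total_reward(assignment):
    """Sum the reward of every constraint under a full assignment."""
    return sum(t[(assignment[i], assignment[j])] for i, j, t in CONSTRAINTS)

def solve_exhaustively(agents=("a1", "a2", "a3"), domain=(0, 1)):
    """Brute-force the optimal assignment (fine at this toy scale)."""
    best = max(product(domain, repeat=len(agents)),
               key=lambda vals: total_reward(dict(zip(agents, vals))))
    return dict(zip(agents, best))

print(solve_exhaustively())   # {'a1': 0, 'a2': 0, 'a3': 0}, reward 20
```

In the real problem these tables are unknown to the agents, which is exactly why the exploration strategies later in the talk are needed.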

8 Applying DCOP (DCOP construct → domain equivalent) ◦ Agents → Robots ◦ Agent values → Set of possible locations ◦ Reward on the link → Signal strength between neighbors ◦ Objective: maximize net reward → Objective: maximize net signal strength

9 k-Optimality [Pearce07] 1-optimal solutions: no single agent can improve the reward by changing only its own value. [Figure: the chain DCOP with its reward tables; two 1-optimal solutions shown, with R = 12 and R = 6.]

10 MGM-Omniscient [Figure: chain of agents a1, a2, a3 with the shared reward table 10, 0, 0, 6.]

11 MGM-Omniscient [Figure: an agent computes a gain of 10 for its best unilateral move.]

12 MGM-Omniscient [Figure: gain messages 12 and 10 exchanged between neighbors.]

13 MGM-Omniscient Only one agent per neighborhood is allowed to change per round, which makes MGM a monotonic algorithm. [Figure: gains 12 and 10; agents a1, a2, a3 with their current values.]
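The MGM round illustrated on slides 10-13 can be sketched as follows. The reward table and chain are the slide's example; the synchronous loop, the id-based tie-break, and all function names are simplifying assumptions:

```python
# One MGM round on the chain DCOP: each agent computes the gain of its best
# unilateral move, shares it with neighbors, and only the agent with the
# largest gain in its neighborhood actually moves. Because no two
# neighbors ever move together, the global reward never decreases.
TABLE = {(0, 0): 10, (0, 1): 0, (1, 0): 0, (1, 1): 6}
NEIGHBORS = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
DOMAIN = (0, 1)

def local_reward(agent, value, assignment):
    """Reward on all links incident to `agent` if it takes `value`."""
    return sum(TABLE[(value, assignment[n])] for n in NEIGHBORS[agent])

def best_move(agent, assignment):
    """Best unilateral value and its gain over the agent's current value."""
    current = local_reward(agent, assignment[agent], assignment)
    value = max(DOMAIN, key=lambda v: local_reward(agent, v, assignment))
    return value, local_reward(agent, value, assignment) - current

def mgm_round(assignment):
    """One synchronous MGM round; returns the updated assignment."""
    moves = {a: best_move(a, assignment) for a in NEIGHBORS}
    new = dict(assignment)
    for agent, (value, gain) in moves.items():
        # Move only if the gain is positive and beats every neighbor's
        # gain (ties broken by agent id -- an assumed convention).
        if gain > 0 and all((gain, agent) > (moves[n][1], n)
                            for n in NEIGHBORS[agent]):
            new[agent] = value
    return new

print(mgm_round({"a1": 0, "a2": 1, "a3": 0}))   # only a2 moves: gain 20
```

Started from the all-1 assignment, no agent has a positive gain, which is precisely the 1-optimal local optimum of slide 9.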

14 Solution Techniques Static Estimation ◦ SE-Optimistic ◦ SE-Realistic Balanced Exploration using Decision Theory ◦ BE-Backtrack ◦ BE-Rebid ◦ BE-Stay

15 Static Estimation Techniques SE-Optimistic ◦ Always assume that exploration is better ◦ Greedy Approach

16 Static Estimation Techniques SE-Optimistic ◦ Always assume that exploration is better ◦ Greedy Approach SE-Realistic ◦ More conservative: assume exploration gives the mean reward ◦ Faster convergence
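The two static estimates can be contrasted in a short sketch. The bound of 100 and mean of 50 are made-up illustrative numbers; in the domain they would come from the signal-strength model, which this sketch assumes is known:

```python
# Static estimation: unexplored links get a fixed estimated reward, so the
# agents can run MGM as if the reward matrix were fully known.
MAX_REWARD = 100   # SE-Optimistic: an unexplored link is assumed this good
MEAN_REWARD = 50   # SE-Realistic: an unexplored link is assumed average

def estimated_reward(link, explored, *, optimistic):
    """Reward fed to MGM: the measured value if explored, else an estimate."""
    if link in explored:
        return explored[link]   # already measured: use the real signal strength
    return MAX_REWARD if optimistic else MEAN_REWARD

measured = {("a1", "a2"): 37}
print(estimated_reward(("a1", "a2"), measured, optimistic=True))    # 37
print(estimated_reward(("a2", "a3"), measured, optimistic=True))    # 100
print(estimated_reward(("a2", "a3"), measured, optimistic=False))   # 50
```

Under SE-Optimistic every unexplored position looks better than any measured one, so agents greedily keep exploring; under SE-Realistic an above-average measured position beats the estimate, so agents settle sooner.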

17 Balanced Exploration Techniques

18 Balanced Exploration Techniques BE-Backtrack ◦ Decision-theoretic limit on exploration ◦ Tracks the previous best location R_b ◦ State of the agent: (R_b, T)

20 Balanced Exploration Techniques Utility of Exploration [equation figure]

21 Balanced Exploration Techniques Utility of Backtrack after Successful Exploration [equation figure]

22 Balanced Exploration Techniques Utility of Backtrack after Unsuccessful Exploration [equation figure]
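The utilities on slides 20-22 were equation figures. A simplified one-step-lookahead version of the idea, NOT the paper's exact equations, can be sketched with a hypothetical reward distribution:

```python
# Simplified decision-theoretic sketch in the spirit of BE-Backtrack.
# State: (R_b, T) -- the best reward seen so far and the rounds remaining.
# Rewards of unexplored positions are drawn from an assumed known
# distribution (hypothetical numbers below).
DIST = {0: 0.25, 40: 0.5, 100: 0.25}   # reward -> probability

def backtrack_utility(r_b, t):
    """Return to the best known position and collect R_b for t rounds."""
    return r_b * t

def explore_utility(r_b, t):
    """Spend one round exploring, then act greedily: keep the new spot if
    it beats R_b, otherwise backtrack for the remaining t - 1 rounds."""
    if t <= 0:
        return 0.0
    return sum(p * max(r, r_b) * (t - 1) for r, p in DIST.items())

def decide(r_b, t):
    """Explore only when its expected utility beats sitting on R_b."""
    if explore_utility(r_b, t) > backtrack_utility(r_b, t):
        return "explore"
    return "backtrack"

print(decide(40, 10))   # explore: plenty of time to recover from a bad draw
print(decide(40, 2))    # backtrack: the horizon is too short to gamble
```

This captures the qualitative behavior the slides describe: exploration is worthwhile early, while a short remaining horizon or a high R_b pushes the agent to backtrack.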

23 Balanced Exploration Techniques BE-Rebid ◦ Allows agents to backtrack ◦ Re-evaluates at every time-step ◦ Allows for on-the-fly reasoning ◦ Uses the same equations as BE-Backtrack

24 Balanced Exploration Techniques BE-Stay ◦ Agents unable to backtrack ◦ Dynamic programming approach
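The dynamic program behind BE-Stay can be sketched in simplified form (the paper's derivation differs in detail). With no backtracking allowed, an agent at a spot worth r_cur either stays there for the rest of the horizon or moves on, gambling one round on a fresh draw from an assumed known reward distribution:

```python
# BE-Stay-style dynamic program (simplified sketch, hypothetical numbers).
from functools import lru_cache

DIST = ((0, 0.25), (40, 0.5), (100, 0.25))   # (reward, probability) pairs

@lru_cache(maxsize=None)
def value(r_cur, t):
    """Max expected cumulative reward with t rounds left, no backtracking."""
    if t <= 0:
        return 0.0
    stay = r_cur * t                                   # commit to this spot
    move = sum(p * value(r, t - 1) for r, p in DIST)   # exploring costs a round
    return max(stay, move)

print(value(100, 5))   # 500: at a great spot, staying dominates
```

Because a move forfeits the current position, the stay/move threshold rises as the horizon shrinks, which is why BE-Stay is the most conservative of the three balanced-exploration variants.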

25 Results

26 Results [Graph: learning curve; 20 agents, chain topology, 100 rounds]

27 Results (simulation) [Graph: chain topology, 100 rounds]

28 Results (simulation) [Graph: 10 agents, random graphs with 15-20 links]

29 Results (simulation) [Graph: 20 agents, 100 rounds]

30 Results (physical robots)

31 Results (physical robots) [Graph: 4 robots, 20 rounds]

32 Conclusions Provided new algorithms for DCOPs that address real-world challenges Demonstrated improvement on physical hardware

33 Future Work Scaling up the evaluation ◦ different approaches ◦ different parameter settings Examine alternate metrics ◦ battery drain ◦ throughput ◦ cost of movement Verify the algorithms in other domains

34 Thank You manish.jain@usc.edu http://teamcore.usc.edu/manish


