
1 Software Multiagent Systems: Lecture 13 Milind Tambe University of Southern California tambe@usc.edu

2 Teamwork When agents act together

3 Understanding Teamwork: ordinary traffic; driving in a convoy; two friends A & B driving together in a convoy; B secretly following A; a pass play in soccer; contracting with a software company; an orchestra.

4 Understanding Teamwork: not just a union of simultaneous coordinated actions, and different from contracting. Acting together toward a joint goal: co-labor → collaborate.

5 Why Teamwork? Why not: Master-Slave? Contracts?

6 Why Teams? Robust organizations: responsibility to substitute for teammates, mutual assistance, information communicated to peers. Still capable of structure (not necessarily flat): subteams, sub-subteams. Variations in capabilities and limitations.

7 Approach Theory Practical teamwork architectures

8 Taking a step back…

9 Key Approaches in Multiagent Systems: Market mechanisms (auctions). Distributed Constraint Optimization (DCOP) [figure: constraint graph over variables x1, x2, x3, x4]. Belief-Desire-Intention (BDI): logics and psychology, e.g. (JPG p) ≡ (MB ¬p) ∧ (MG p) ∧ (Until [(MB p) ∨ (MB □¬p)] (WMG p)). Distributed POMDPs. Hybrid DCOP/POMDP/auctions/BDI approaches: essential in large-scale multiagent teams; synergistic interactions.

10 Key Approaches for Multiagent Teams. [Table comparing Markets, DCOP, BDI, and Distributed POMDPs along four criteria: local interactions, uncertainty, local utility, and human usability & plan structure.] The BDI-POMDP hybrid combines strengths across these criteria.

11 Distributed POMDPs. Three papers on the web pages. What to read: ignore all the proofs and the complexity results; from the JAIR article, read the model and the results at the end; understand the fundamental principles.

12 Domain: Teamwork for Disaster Response

13 Multiagent Team Decision Problem (MTDP). S: {s1, s2, s3, …}, a single global world state per epoch. A = {A1, A2, …, An}: domain-level actions, where Ai is the set of actions for agent i; the agents choose a joint action.

14 MTDP. P: transition function P(s' | s, a1, a2, …, an). R_A: reward R(s, a1, a2, …, an), one common team reward rather than separate individual rewards; this is central to teamwork.

15 MTDP (cont'd). Ω: observations; each agent has its own finite set of possible observations. O: observation function O(destination state, joint action, joint observation) = P(o1, o2, …, on | a1, a2, …, an, s').
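
To make the tuple concrete, here is a minimal Python sketch of how <S, A, P, R_A, Ω, O> could be held in one container; the class and field names are illustrative assumptions, not notation from the lecture or the papers.

```python
# Minimal sketch of the MTDP tuple <S, A, P, R_A, Omega, O>.
# All names and the container layout are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Tuple

JointAction = Tuple[str, ...]        # one domain action per agent
JointObservation = Tuple[str, ...]   # one observation per agent

@dataclass
class MTDP:
    states: List[str]                            # S: global world states
    actions: List[List[str]]                     # A_i: action set of each agent i
    transition: Callable[[str, JointAction, str], float]   # P(s' | s, joint action)
    reward: Callable[[str, JointAction], float]  # R_A(s, joint action): one team reward
    observations: List[List[str]]                # Omega_i: observation set of each agent i
    observation_fn: Callable[[str, JointAction, JointObservation], float]  # Pr(joint obs | joint action, s')
```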

16 Simple Scenario. Cost of each action: -0.2. Agents must fight fires together. Each agent observes its own location and the fire status. [Figure: scenario with rewards +20 and +40.]

17 [Figure-only slide.]

18 MTDP Policy. The problem: find optimal JOINT policies, one policy for each agent. πi: action policy, mapping the agent's belief state into domain actions (Bi → A). Belief state: the agent's sequence of observations.
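
As a concrete illustration of an action policy πi: Bi → A where the belief state is an observation history, here is a small sketch; the class name, the fallback action, and the fire-fighting entries are made-up examples, not from the lecture.

```python
# Sketch: one agent's action policy as a lookup from observation history
# (its belief state on this slide) to a domain action. Illustrative only.
from typing import Dict, Tuple

ObservationHistory = Tuple[str, ...]

class AgentPolicy:
    def __init__(self, table: Dict[ObservationHistory, str], default: str):
        self.table = table      # explicit history -> action entries
        self.default = default  # action for histories not listed

    def act(self, history: ObservationHistory) -> str:
        return self.table.get(history, self.default)

# A joint policy is simply one such policy per agent (hypothetical entries):
joint_policy = [
    AgentPolicy({("see-fire",): "move-to-fire1"}, default="wait"),
    AgentPolicy({("see-fire",): "move-to-fire1"}, default="wait"),
]
```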

19 MTDP Domain Types. Collectively partially observable: the general case, no assumptions. Collectively observable: the team as a whole observes the state; for every joint observation there is a state s such that, for all other states s' not equal to s, Pr(o1, o2, …, on | s') = 0. (What are Pr(o1, o2, …, on | s) and Pr(s | o1, o2, …, on) in that case?) Individually observable: each agent observes the state; for every individual observation oi there is a state s such that, for all other states s' not equal to s, Pr(oi | s') = 0.
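
The collectively-observable condition can be checked mechanically on a small enumerated model; the sketch below assumes a dictionary holding Pr(o1, …, on | s), which is an illustrative data layout rather than anything from the papers.

```python
# Sketch: test the collectively-observable condition. obs_prob[(o, s)]
# holds Pr(o1,...,on | s) for joint observation o; layout is an assumption.
from typing import Dict, List, Tuple

def collectively_observable(obs_prob: Dict[Tuple[Tuple[str, ...], str], float],
                            states: List[str],
                            joint_observations: List[Tuple[str, ...]]) -> bool:
    for o in joint_observations:
        # Condition: each joint observation has non-zero probability in at
        # most one state, so observing o pins down the world state.
        support = [s for s in states if obs_prob.get((o, s), 0.0) > 0.0]
        if len(support) > 1:
            return False
    return True
```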

20 From MTDP to COM-MTDP. Two separate kinds of actions: communication actions vs. domain actions. Two separate reward types: communication rewards and domain rewards; the total reward is the sum of the two. Explicit treatment of communication enables analysis.

21 Communicative MTDPs (COM-MTDPs). Σ: communication capabilities, the possible "speech acts", e.g., "I am moving to fire1." R_Σ: communication cost (over messages), e.g., saying "I am moving to fire1" has a cost. Why ever communicate?

22 Two Stage Decision Process. [Figure: agent-world loop with two state estimators SE1 and SE2 producing belief states b1 and b2, which feed policies P1 and P2; communications flow to and from other agents.] P1: communication policy. P2: action policy. Two state estimators, two belief-state updates.

23 COM-MTDP Continued. Belief state: each Bi is a history of observations and communications. Two-stage belief update. Stage 1: pre-communication belief state for agent i, updated from observations only (the history b_i^0, b_i^1, …, b_i^(t-1) extended with the new observation ω_i^t). Stage 2: post-communication belief state for agent i, updated from observations and communications (the pre-communication belief extended with the messages Σ^t exchanged this epoch). In general, an agent cannot create a probability distribution over states from this information alone.
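
A tiny sketch of the two-stage update, keeping the belief state as a history (as on this slide) rather than a distribution; the ("obs", …)/("msg", …) encoding is an assumption made for illustration.

```python
# Sketch of the COM-MTDP two-stage belief update over histories.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CommBeliefState:
    # Each entry is ("obs", o) or ("msg", m); this encoding is an assumption.
    history: Tuple[Tuple[str, str], ...] = ()

    def pre_communication_update(self, observation: str) -> "CommBeliefState":
        # Stage 1: fold in the agent's own new observation.
        return CommBeliefState(self.history + (("obs", observation),))

    def post_communication_update(self, messages: Tuple[str, ...]) -> "CommBeliefState":
        # Stage 2: fold in the messages exchanged in this epoch.
        return CommBeliefState(self.history + tuple(("msg", m) for m in messages))
```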

24 COM-MTDP Continued. The problem: find optimal JOINT policies, one policy for each agent. πΣ: communication policy, mapping the pre-communication belief state into a message (Bi → Σ) for each agent. πA: action policy, mapping the post-communication belief state into domain actions (Bi → A) for each agent.

25 More Domain Types. General communication: no assumptions on R_Σ. Free communication: R_Σ(s, σ) = 0. No communication: R_Σ(s, σ) is negative infinity.

26 Teamwork Complexity Results (rows: communication; columns: individual observability / collective observability / collective partial observability).
No communication: P-complete / NEXP-complete / NEXP-complete.
General communication: P-complete / NEXP-complete / NEXP-complete.
Full communication: P-complete / P-complete / PSPACE-complete.

27 Classifying Different Models (by the same communication and observability dimensions as the previous slide).
No communication: MMDP, DEC-POMDP, POIPSG.
General communication: Xuan-Lesser model, COM-MTDP.

28 True or False: (1) If agents communicated all their observations at each step, the distributed POMDP would essentially be a single-agent POMDP. (2) In distributed POMDPs, each agent plans its own policy. (3) Solving a distributed POMDP with two agents is of the same complexity as solving two separate individual POMDPs.

29 Algorithms

30 NEXP-complete: no known efficient algorithms. Brute-force search: 1. Generate the space of possible joint policies. 2. For each policy in the policy space: 3. Evaluate over finite horizon T. Complexity: (number of joint policies) × (cost of evaluating one policy).
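
The sketch below makes the brute-force idea (and its blow-up) concrete for two agents: enumerate every deterministic policy, i.e. every assignment of an action to each observation history of length < T, then evaluate each joint policy. The evaluate() argument is a stub standing in for a T-step roll-out of the model; all names are illustrative assumptions.

```python
# Sketch: brute-force joint policy search for a finite horizon T.
from itertools import product
from typing import Callable, Dict, List, Tuple

History = Tuple[str, ...]

def all_histories(observations: List[str], horizon: int) -> List[History]:
    """All observation histories of length 0 .. horizon-1."""
    hists: List[History] = []
    for t in range(horizon):
        hists.extend(product(observations, repeat=t))
    return hists

def brute_force(actions: List[str], observations: List[str], horizon: int,
                evaluate: Callable[[List[Dict[History, str]]], float],
                n_agents: int = 2):
    hists = all_histories(observations, horizon)
    # One deterministic policy = one action per reachable observation history.
    single_agent_policies = [dict(zip(hists, choice))
                             for choice in product(actions, repeat=len(hists))]
    best_value, best_joint = float("-inf"), None
    for joint in product(single_agent_policies, repeat=n_agents):
        value = evaluate(list(joint))   # roll the joint policy out for T steps
        if value > best_value:
            best_value, best_joint = value, list(joint)
    return best_joint, best_value
```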

31 Locally optimal search: Joint Equilibrium-based Search for Policies (JESP).

32 Nash Equilibrium in Team Games: Nash equilibrium vs. the globally optimal reward for the team. General-sum game (agent A picks row x, y, or z; agent B picks column u or v): row x: (3,6), (7,1); row y: (5,1), (8,2); row z: (6,0), (6,2). Team game with one shared payoff per cell: row x: 9, 8; row y: 6, 10; row z: 6, 8.

33 JESP: Locally Optimal Joint Policy. Team game (agent A picks row x, y, or z; agent B picks column u, v, or w): row x: 9, 5, 8; row y: 6, 7, 10; row z: 6, 3, 8. Iterate, keeping one agent's policy fixed while the other optimizes; more complex policies are handled the same way (see the sketch below).
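
To see the local-optimum behaviour on this slide's payoff matrix, here is a small sketch of the alternating best-response loop; the matrix values are taken from the slide, while the starting point (x, u) and the loop structure are illustrative.

```python
# Sketch: JESP-style alternating maximization on the common-payoff game above.
payoff = {('x', 'u'): 9, ('x', 'v'): 5, ('x', 'w'): 8,
          ('y', 'u'): 6, ('y', 'v'): 7, ('y', 'w'): 10,
          ('z', 'u'): 6, ('z', 'v'): 3, ('z', 'w'): 8}
rows, cols = ['x', 'y', 'z'], ['u', 'v', 'w']

a, b = 'x', 'u'            # arbitrary initial joint choice
changed = True
while changed:
    changed = False
    new_a = max(rows, key=lambda r: payoff[(r, b)])   # fix B's choice, optimize A
    if new_a != a:
        a, changed = new_a, True
    new_b = max(cols, key=lambda c: payoff[(a, c)])   # fix A's choice, optimize B
    if new_b != b:
        b, changed = new_b, True

# From (x, u) this stops at (x, u) with value 9: a local optimum,
# while the global optimum is 10 at (y, w).
print(a, b, payoff[(a, b)])
```

This is exactly the local-optimum issue slide 38 raises, which random restarts mitigate.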

34 Joint Equilibrium-based Search. Description of the algorithm: 1. Repeat until convergence. 2. For each agent i: 3. Fix the policies of all agents apart from i. 4. Find the policy for i that maximizes the joint reward. Exhaustive-JESP does a brute-force search in the policy space of agent i, which is expensive.

35 JESP: Joint Equilibrium Search (Nair et al., IJCAI 03). Repeat until convergence to a local equilibrium; for each agent k: fix the policies of all agents except k, then find the optimal response policy for agent k. The optimal response for k, given the fixed policies of the others in the MTDP, is obtained by transforming the problem into a single-agent POMDP: the "extended" state is defined as the world state together with the other agents' observation histories, not as the world state alone; define a new transition function, a new observation function, and a multiagent belief state; dynamic programming over belief states then gives fast computation of the optimal response.
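
Here is a minimal sketch of the extended-state idea for two agents: with agent j's policy fixed, agent i's transition over extended states (world state plus j's observation history) can be written as below. The marginal observation function O_j and every name here are assumptions made for illustration, not the paper's exact formulation.

```python
# Sketch: extended-state transition for agent i's best-response POMDP,
# two-agent case. Names and the marginal O_j are illustrative assumptions.
from typing import Callable, Tuple

ExtState = Tuple[str, Tuple[str, ...]]   # (world state, agent j's observation history)

def extended_transition(
        P: Callable[[str, Tuple[str, str], str], float],    # P(s' | s, (a_i, a_j))
        O_j: Callable[[str, Tuple[str, str], str], float],   # marginal Pr(o_j | (a_i, a_j), s')
        policy_j: Callable[[Tuple[str, ...]], str],          # agent j's fixed policy
        e: ExtState, a_i: str, e_next: ExtState) -> float:
    """Pr(e' | e, a_i) once the other agent's policy is held fixed."""
    (s, hist_j), (s_next, hist_j_next) = e, e_next
    # j's history must grow by exactly one new observation.
    if len(hist_j_next) != len(hist_j) + 1 or hist_j_next[:-1] != hist_j:
        return 0.0
    o_j = hist_j_next[-1]
    a_j = policy_j(hist_j)            # j's action is determined by its fixed policy
    joint_action = (a_i, a_j)
    return P(s, joint_action, s_next) * O_j(s_next, joint_action, o_j)
```

A belief state for agent i is then a distribution over these extended states, which is what the dynamic program on this slide operates on.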

36 Extended State, Belief State. [Figure: sample progression of beliefs; HL and HR are observations, a2 is the Listen action.]

37 Run-time Results (time horizon T = 2, 3, 4, 5, 6, 7):
Exhaustive-JESP: 10, 317800, -, -, -, -
DP-JESP: 0, 0, 20, 110, 1360, 30030

38 Is JESP guaranteed to find the global optimum? No, it is only locally optimal; random restarts help. [Same team game as slide 33: row x: 9, 5, 8; row y: 6, 7, 10; row z: 6, 3, 8.]

39 Not All Agents are Equal Scaling up Distributed POMDPs for Agent Networks

40 Runtime [figure-only slide].

41 POMDP vs. distributed POMDP. Distributed POMDPs are more complex (joint transition and observation functions) but yield a better policy. With free communication, a distributed POMDP reduces to a single-agent POMDP. Less dependency between agents means lower complexity.

42 BDI vs. distributed POMDP. BDI teamwork vs. distributed POMDP teamwork: explicit joint goal vs. explicit joint reward; plan/organization hierarchies vs. unstructured plans/teams; explicit commitments vs. implicit commitments; no costs/uncertainties vs. costs & uncertainties included.

