A Decision-Theoretic Approach to Designing Proactive Communication in Multi-Agent Teamwork Thomas R. Ioerger, Yu Zhang, Richard Volz, John Yen (PSU-IST) Dept. of Computer Science Texas A&M University
2 Motivation Diagram labels: Agent, Multi-Agent Team. Agents share a large amount of knowledge about the teamwork. Hard-coded interactions among participants. High-frequency message exchange. Communication risk.
3 Challenging Issues in Designing Communication Protocols Each agent has incomplete information, from which uncertainties arise. Each agent has different problem-solving capabilities. Data are decentralized and the system lacks global control. Excessive or unrestricted communication limits scalability.
4 Our Approach and Its Contributions Proactive Communication OBPC: Reduction of communication load through OBservations. DIP: Dynamic estimation of the probability distribution of Information Production and need. DTPC: Decision-Theoretic determination of communication strategies.
5 Background
CAST (Collaborative Agents for Simulating Teamwork)
MALLET (Multi-Agent Logic-based Language for Encoding Teamwork)
(team-plan killwumpus (?w)
  (process
    (seq
      (agent-bind ?ca (constraint (play-role ?ca scout)))
      (DO ?ca (findwumpus ?w))
      (agent-bind ?fi (constraint ((play-role ?fi fighter) (closest-to-wumpus ?fi ?w))))
      (DO ?fi (movetowumpus ?w))
      (DO ?fi (shootwumpus ?w)))))
(ioper shootwumpus (?w)
  (pre-cond (wumpus ?w) (location ?w ?x ?y) (dead ?w false))
  (effect (dead ?w true)))
6 Overview Diagram components: CAST KB; Team Structure & Teamwork Procedure; Proactive Communication (OBPC, DIP, DTPC); Optimal Communication Strategy.
7 Agent Execution Cycle Execution cycle stages: Observe (Sense), Predict (Info. need and production), Decide (Strategy), Communicate (Information), Act (Effect).
8 Syntax of Observability ::= (CanSee )* (BelieveCanSee )* ::= ::= | ::= ( )* ::= ( ) ::= (DO ( )) ::= | ::=
9 Example Observability Rules
(CanSee ca (location ?o ?x ?y)
  (location ca ?xc ?yc)
  (location ?o ?x ?y)
  (inradius ?x ?y ?xc ?yc rca))
//The carrier can see the location property of an object.
(CanSee ca (DO ?fi (shootwumpus ?w))
  (play-role fighter ?fi)
  (location ca ?xc ?yc)
  (location ?fi ?x ?y)
  (adjacent ?xc ?yc ?x ?y))
//The carrier can see the shootwumpus action of a fighter.
(BelieveCanSee ca fi (location ?o ?x ?y)
  (location fi ?xi ?yi)
  (location ?o ?x ?y)
  (inradius ?x ?y ?xi ?yi rfi))
//The carrier believes the fighter is able to see the location property of an object.
(BelieveCanSee ca fi (DO ?f (shootwumpus ?w))
  (play-role fighter ?f)
  ( ?f fi)
  (location ca ?xc ?yc)
  (location fi ?xi ?yi)
  (location ?f ?x ?y)
  (inradius ?xi ?yi ?xc ?yc rca)
  (inradius ?x ?y ?xc ?yc rca)
  (adjacent ?x ?y ?xi ?yi))
//The carrier believes the fighter is able to see the shootwumpus action of another fighter.
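The location rules above can be read as predicates over positions. The following is a minimal Python sketch, not the CAST implementation. It assumes `inradius` means Euclidean distance within the observer's radius (the slides do not define the metric), and the names `Position`, `in_radius`, `can_see_location`, and `believes_can_see_location` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    x: int
    y: int


def in_radius(target: Position, observer: Position, radius: float) -> bool:
    # Assumption: Euclidean distance; the slides do not say which metric inradius uses.
    return math.hypot(target.x - observer.x, target.y - observer.y) <= radius


def can_see_location(observer: Position, obj: Position, radius: float) -> bool:
    # Mirrors (CanSee ca (location ?o ?x ?y) ... (inradius ?x ?y ?xc ?yc rca)).
    return in_radius(obj, observer, radius)


def believes_can_see_location(other: Position, obj: Position, other_radius: float) -> bool:
    # Mirrors the BelieveCanSee location rule: the same test, evaluated with the
    # other agent's believed position and sensing radius.
    return in_radius(obj, other, other_radius)
```

For example, `can_see_location(Position(0, 0), Position(3, 4), 5.0)` holds because the object lies exactly on the sensing radius.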
10 Proactive Communication Based on Observation ProactiveTell –A provider reasons about what information it will have. –A provider reasons about whether to deliver a piece of information when having the information. ActiveAsk –A needer reasons about what information it will need. –A needer reasons about whether to ask for a piece of information when needing the information.
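One possible reading of how observability reduces communication is sketched below in Python. This is an interpretation, not the authors' algorithm: it assumes a provider skips ProactiveTell when it believes the needer can observe the information itself, and a needer skips ActiveAsk when it can observe the information directly. The function names and boolean inputs are hypothetical.

```python
def should_proactive_tell(has_info: bool,
                          needer_needs_it: bool,
                          believes_needer_can_see: bool) -> bool:
    # Tell only when the provider holds the information, expects the needer to
    # need it, and does not believe the needer can observe it directly.
    return has_info and needer_needs_it and not believes_needer_can_see


def should_active_ask(needs_info: bool, can_see_it: bool) -> bool:
    # Ask only when the needer needs the information and cannot observe it itself.
    return needs_info and not can_see_it
```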
11 Evaluation Multi-Agent Wumpus World: 5 randomly generated worlds with 20×20 cells. 20 wumpuses, 8 pits, and 20 piles of gold per world. A team consists of 1 carrier and 3 fighters. The team goal is to kill wumpuses and get the gold without being killed.
12 Decision-Theoretic Proactive Communication Strategies Utility Function Cost Function Value Function Decision-Making
13 Decision-Making on Situation PA Situation PA: Provider produces a new piece of information. Legend: a: provider; b: needer; e: end. Decision-tree branches: a-b: ProactiveTell; a-b: Silence; b-a: Accept; b-a: Wait; b-a: Silence; b-a: ActiveAsk.
14 DM on Situation PB Situation PB: Provider receives a request for a piece of information. Decision-tree branches: a-b: Reply; a-b: WaitUntilNext. Both branches end (e).
15 DM on Situation NA Situation NA: Needer needs a piece of information. Legend: t: transfer; e: end. Decision-tree branches: b-a: ActiveAsk; b-a: Silence; b-a: Wait; a-b: Reply; a-b: WaitUntilNext; a-b: Silence; a-b: ProactiveTell.
16 DM on Situation NB Situation NB: Needer receives a piece of information. Decision-tree branch: b-a: Accept.
17 Utility Function
Parameters in the utility function:
–I: information about which communication occurs
–t: time of decision-making
–$t_1$: time at which I is needed
–$t_2$: time at which the value used for I is produced
–SU: situation at t
–S: strategy available at SU
–M: the set of messages involved in obtaining I
–E: environment state at t
$U(I, t, t_1, t_2, SU, S, M, E) = V(I, t, t_1, t_2, SU, S) - C(M)$
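The utility definition is a value term minus a message-cost term. A minimal Python sketch follows. It assumes that C(M) is the sum of the individual message costs, which the slides do not state, and the function name `utility` is hypothetical.

```python
from typing import Sequence


def utility(value: float, message_costs: Sequence[float]) -> float:
    # U = V - C(M). Assumption: C(M) is the sum of the per-message costs.
    return value - sum(message_costs)
```

With a value of 10.0 and two messages costing 1.0 and 2.5, the utility is 6.5. With no messages, the utility equals the value.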
18 Value Function $V(I, t, t_1, t_2, SU, S) = T(I, t, t_1, t_2, SU, S) + R(I, t, t_1, t_2, SU, S)$, where T is timeliness and R is relevance.
19 Timeliness Function
Timeliness: whether agents use a value that can be produced in time when they need I.
$d(I, t, t_1, t_2, SU, S) = \max(0, t_2 - t_1)$
$f_t$ is decreasing: $f_t(x) < f_t(y)$ if $y < x$
$T(I, t, t_1, t_2, SU, S) = f_t(d(I, t, t_1, t_2, SU, S))$
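The timeliness definition needs only a delay and a strictly decreasing function of that delay. The Python sketch below uses 1 / (1 + d / scale) as the decreasing function. That choice, along with the `scale` parameter and the function names, is an assumption for illustration; the slides only require that a larger delay yields a smaller value.

```python
def delay(t1: float, t2: float) -> float:
    # d = max(0, t2 - t1): how long after the need time t1 the value is produced at t2.
    return max(0.0, t2 - t1)


def timeliness(t1: float, t2: float, scale: float = 1.0) -> float:
    # Assumed f_t(d) = 1 / (1 + d / scale), strictly decreasing in d for scale > 0.
    d = delay(t1, t2)
    return 1.0 / (1.0 + d / scale)
```

A value produced before it is needed has zero delay and the maximum timeliness of 1.0; later production lowers the score.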
20 Relevance Function
Relevance: unprocessed, most recent, important.
$P(I, t, t_1, t_2, SU, S) = \Pr(I, t, t_1, t_2, \text{no other value for } I \text{ was produced in } \mathrm{Int}[t_1, t_2] \mid S, SU)$
$fr_I$ is increasing: $fr_I(x) < fr_I(y)$ if $x < y$
$R(I, t, t_1, t_2, SU, S) = fr_I(P(I, t, t_1, t_2, SU, S))$
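To make the relevance term concrete, the Python sketch below assumes that new values of I arrive as a Poisson process with a known `rate`, so the probability of no new value in an interval of length L is exp(-rate * L). The Poisson model, the `rate` and `weight` parameters, and the linear choice for the increasing function are all assumptions for illustration; the slides do not specify a production model.

```python
import math


def prob_no_new_value(t1: float, t2: float, rate: float) -> float:
    # Assumption: Poisson production with the given rate, so
    # P(no production in an interval of length L) = exp(-rate * L).
    return math.exp(-rate * abs(t2 - t1))


def relevance(t1: float, t2: float, rate: float, weight: float = 1.0) -> float:
    # Assumed fr_I(p) = weight * p, strictly increasing in p for weight > 0.
    return weight * prob_no_new_value(t1, t2, rate)
```

Under this model, the longer the gap between production and need, the more likely the value has been superseded and the lower its relevance.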
21 Cost Function
$C(M_i) = \begin{cases} 0 & \text{if } M_i = \emptyset \\ k_1 + k_2 \times \mathrm{len}(M_i) & \text{otherwise} \end{cases}$
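The cost function charges a fixed overhead plus a length-proportional term per message. The Python sketch below assumes the zero-cost case corresponds to sending no message (an empty string here). The default constants for `k1` and `k2` are placeholders, not values from the slides.

```python
def message_cost(message: str, k1: float = 1.0, k2: float = 0.1) -> float:
    # Assumption: an empty message models "nothing sent" and costs nothing.
    if not message:
        return 0.0
    # Fixed overhead k1 plus k2 per unit of message length.
    return k1 + k2 * len(message)
```

For instance, `message_cost("abcd", 1.0, 0.5)` gives 3.0, the fixed cost of 1.0 plus 0.5 for each of the four characters.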
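22 Expected Utility Table of $t_1$ and $t_2$ by strategy. Provider strategies: P.ProactiveTell, P.Silence, P.Reply, P.WaitUntilNext. Needer strategies: N.ActiveAsk (cases: provider chooses Reply; provider chooses WaitUntilNext), N.Silence, N.Wait (cases: provider chooses ProactiveTell; provider chooses Silence), N.Accept.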
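The decision step can be sketched as picking the strategy with the highest expected utility, where each strategy has several possible outcomes depending on how the other agent responds. The Python sketch below is illustrative only: the probabilities and utilities in the usage note are made-up numbers, and the names `expected_utility` and `best_strategy` are hypothetical.

```python
from typing import Dict, List, Tuple

Outcome = Tuple[float, float]  # (probability, utility) for one possible response


def expected_utility(outcomes: List[Outcome]) -> float:
    # E(U) = sum over outcomes of probability * utility.
    return sum(p * u for p, u in outcomes)


def best_strategy(options: Dict[str, List[Outcome]]) -> str:
    # Choose the strategy whose outcomes give the highest expected utility.
    return max(options, key=lambda name: expected_utility(options[name]))
```

As a usage example for a needer in Situation NA, with hypothetical numbers: `best_strategy({"ActiveAsk": [(0.7, 4.0), (0.3, 1.0)], "Wait": [(0.5, 5.0), (0.5, 0.0)], "Silence": [(1.0, 2.0)]})` selects `"ActiveAsk"`, whose two outcomes stand for the provider choosing Reply or WaitUntilNext.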
23 Strategies Situation PA: provider produces I. Options: ProactiveTell? Silence? Timeline markers on axis t (known and unknown regions): current time, next production, last sent, last not sent, last need aware of, unfulfilled need.
24 Strategies Situation PB: provider receives a request for I. Options: Reply? WaitUntilNext? Timeline markers on axis t (known and unknown regions): current time, next production, last production.
25 Strategies Situation NA: needer needs I. Options: ActiveAsk? Wait? Silence? Timeline markers on axis t (known and unknown regions): current time, next production, last I received, most recent production.
26 Strategies Situation NB: needer receives I. Option: Accept.
27 Summary Advantages of Approach: allows agents to make intelligent choices of communication policy based on: –frequencies: of needs, of sensing, of info. change –costs: of messages, plus penalties for delays in action or for acting with incorrect information
28 Criteria for Applicable Domains There are information needs among the team. Agents can communicate. There is uncertainty in the environment. –Stochastic properties of teamwork process. –Agents have incomplete/disjoint knowledge about the world. The team acts under critical time constraints, so proactive assistance becomes important.