
Smart Home Technologies Decision Making

Motivation
Intelligent Environments are aimed at improving the inhabitants' experience and task performance: they provide appropriate information and automate functions in the home. Prediction techniques can only determine what would happen next, not what should happen next, and automated functions can differ from the inhabitants' own actions. The computer therefore has to determine the actions that would optimize the inhabitant experience.

Decision Making
Decision making attempts to determine the actions the system should take in the current situation: should a function be automated, and what should be done next? Decisions should be based on the current context and the requirements of the inhabitants. Pre-programmed timers alone are not sufficient for automation; the decision maker has to take the incoming stream of data into account.

Decision Making in Intelligent Environments
Example decision making tasks in intelligent environments include the automation of physical devices (turn on lights, regulate heating and air conditioning, control media devices, automate lawn sprinklers, automate robotic components such as vacuum cleaners) and the control of information devices (provide recipe services in the kitchen, construct shopping lists, decide which types of alarms to display and where).

Decision Making in Intelligent Environments
The objectives of decision making are to optimize inhabitant productivity, minimize operating costs, and maximize inhabitant comfort. The decision making process has to be safe: decisions must never endanger inhabitants or cause damage, and they should stay within the range accepted by the inhabitants.

Example Task
Should a light be turned on? Decision factors include the inhabitant's location (current and future), the inhabitant's task, the inhabitant's preferences, the time of day, other inhabitants, energy efficiency, and security. Possible decisions: turn the light on, or do not automate.

Decision Making Approaches
Pre-programmed decisions: timer-based automation.
Reactive decision making systems: decisions are based on condition-action rules and driven by the available facts.
Goal-based decision making systems: decisions are made in order to achieve a particular outcome.
Utility-based decision making systems: decisions are made in order to maximize a given performance measure.

Reactive Decision Making

Goal-Based Decision Making

Utility-Based Decision Making

Qualities of Decision Making
Ideally, a decision making approach is complete (always makes a decision), correct (the decision is always right), natural (knowledge is easily expressed), efficient, and rational (decisions are made to maximize performance).

Decision-Making Techniques
Reactive decision making: rule-based expert systems.
Goal-based decision making: planning.
Decision-theoretic decision making: belief networks, Markov decision processes.
Learning techniques: neural networks, reinforcement learning.

Rule-Based Decision Making
Decisions are made based on rules and facts. Facts represent the state of the environment and are expressed in first-order predicate logic. Condition-action rules represent heuristic knowledge about what to do: they are implications that derive actions from logic sentences about the facts. The inference mechanism is deduction, {A, A → B} ⊢ B. The left-hand sides of the rules are matched against the set of facts, and rules whose left-hand side matches are active.

Rule-Based Inference
Rules define what actions should be executed for a given set of conditions (facts). Actions can either be external actions ("automation") or internal updates of the set of facts ("state update"). Rules are often heuristics provided by an expert. Multiple rules can be active at any given time, so conflict resolution decides which rule to fire, and active rules are scheduled to perform a sequence of actions.

Example
Facts:
CurrentTime = 6:30
Location(CurrentTime, bedroom)
CurrentDay = Monday
Rules:
Internal action: (CurrentDay = Monday) ^ (CurrentTime > 6:00) ^ (CurrentTime < 7:00) ^ (Location(CurrentTime, bedroom)) -> Set(Location(NextTime, bathroom))
External action: (Location(NextTime, X)) -> Action(TurnOnLight, X)
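To make the matching step concrete, here is a minimal Python sketch of forward chaining over the example facts and rules; the fact encoding, helper names, and firing scheme are simplifications invented for this sketch rather than the behavior of any particular expert system shell.

# Minimal forward-chaining sketch of the example above (illustrative only).
facts = {
    ("CurrentTime", "6:30"),
    ("CurrentDay", "Monday"),
    ("Location", "CurrentTime", "bedroom"),
}

def monday_morning_in_bedroom(facts):
    # Left-hand side: Monday, between 6:00 and 7:00, inhabitant in the bedroom.
    time = next(f[1] for f in facts if f[0] == "CurrentTime")
    day = next(f[1] for f in facts if f[0] == "CurrentDay")
    hour = int(time.split(":")[0])
    return day == "Monday" and 6 <= hour < 7 and ("Location", "CurrentTime", "bedroom") in facts

def predict_bathroom(facts):
    facts.add(("Location", "NextTime", "bathroom"))      # internal action: Set(...)

def next_location_known(facts):
    return any(f[0] == "Location" and f[1] == "NextTime" for f in facts)

def turn_on_light(facts):
    room = next(f[2] for f in facts if f[0] == "Location" and f[1] == "NextTime")
    print("Action: TurnOnLight,", room)                  # external action

rules = [(monday_morning_in_bedroom, predict_bathroom),
         (next_location_known, turn_on_light)]

fired, changed = set(), True
while changed:                  # naive inference loop: fire each active rule once
    changed = False
    for i, (condition, action) in enumerate(rules):
        if i not in fired and condition(facts):
            action(facts)
            fired.add(i)
            changed = True

Running the loop first fires the internal rule (asserting the predicted location) and then the external rule, printing the TurnOnLight action for the bathroom.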

Rule-Based Expert Systems
Intended to simulate (and automate) the human reasoning process. The domain is modeled in first-order logic: the state is represented by a set of facts, and internal rules model the behavior of the environment. Experts provide sets of heuristic condition-action rules; rules with internal actions can model the reasoning process, while rules with external actions indicate decisions the expert would make. The system can optionally be given queries by including them in the fact set.

Internal Rules
Internal rules have to model the behavior of the system.
Persistence over time, e.g.: (Location(CurrentTime, X)) ^ (NoMove(CurrentTime)) -> Set(Location(NextTime, X))
Dynamic behavior of devices, e.g.: (Temperature(CurrentTime, X)) ^ (HeatingOn) -> Set(Temperature(NextTime, X+2))
Behavior of the inhabitants, e.g.: (Location(CurrentTime, bedroom)) ^ (CurrentTime > 23:00) ^ (LightOn(CurrentTime, bedroom)) -> Action(TurnOffLight, bedroom)

Rule-Based Expert Systems
Figure: rule-based expert system architecture, consisting of working memory (facts), the rule base, and an inference engine made up of a pattern matcher, an agenda, and an execution engine.

Logic Inference Systems and Expert System Shells
Logic programming systems provide inference capabilities; examples are Prolog and OTTER. Expert system shells provide the infrastructure to build complete expert systems; examples are CLIPS (for C) and JESS (for Java).

Example System: IRoom [Kul02]
Initial versions of the MIT IRoom project used JESS as an inference engine to make decisions about activating devices. For example: if a person enters the room and the room is empty, then turn on the light. Rules are programmed by the system designer before the room is used and are then refined based on experience.
[Kul02] Ajay Kulkarni. Design Principles of a Reactive Behavioral System for the Intelligent Room.

Rule-Based Decision Making
Characteristics: complete and correct (given complete rules); natural (given expert-specified rules).
Advantages: permits the system to be programmed relatively efficiently by an expert; can address relatively complex systems.
Problems: the quality of the rules is essential; the behavior of the environment only mimics the expert; anticipating all possible contexts is difficult.

Planning Decisions
A planning system searches for a sequence of actions that can achieve a defined goal. States can be represented as logic sentences. Actions are defined as operators (symbolic representations of the conditions and effects of actions) that contain the preconditions and the effects of the actions. A goal is a set of states. The planning system uses constraints to efficiently search for a sequence of operators that leads from the start state to a goal state.

Example
Initial state: (Location(bedroom)) ^ (Light(bathroom, off))
Goal: Happy(Inhabitant)
Action 1: MakeHappy. Precondition: (Location(X)) ^ (Light(X, on)). Effect: Add Happy(Inhabitant).
Action 2: TurnOnLight(X). Precondition: Light(X, off). Effect: Delete Light(X, off), Add Light(X, on).
Action 3: Move(X, Y). Precondition: (Location(X)) ^ (Light(Y, on)). Effect: Delete Location(X), Add Location(Y).
Plan: Action 2, Action 3, Action 1
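The example can be run through a very simple forward search. The following Python sketch encodes the operators in a STRIPS-like style (the encoding and helper names are assumptions made for this sketch; real planners such as SNLP or GraphPlan work quite differently internally).

# STRIPS-like operators: (name, preconditions, delete-list, add-list).
operators = [
    ("TurnOnLight(bathroom)",
     {"Light(bathroom,off)"}, {"Light(bathroom,off)"}, {"Light(bathroom,on)"}),
    ("Move(bedroom,bathroom)",
     {"Location(bedroom)", "Light(bathroom,on)"}, {"Location(bedroom)"}, {"Location(bathroom)"}),
    ("MakeHappy",
     {"Location(bathroom)", "Light(bathroom,on)"}, set(), {"Happy(Inhabitant)"}),
]

initial = frozenset({"Location(bedroom)", "Light(bathroom,off)"})
goal = {"Happy(Inhabitant)"}

def plan(initial, goal, operators):
    """Breadth-first forward search over ground operators (no heuristics)."""
    frontier = [(initial, [])]
    seen = {initial}
    while frontier:
        state, actions = frontier.pop(0)
        if goal <= state:
            return actions
        for name, pre, delete, add in operators:
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + [name]))
    return None

print(plan(initial, goal, operators))
# -> ['TurnOnLight(bathroom)', 'Move(bedroom,bathroom)', 'MakeHappy']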

Example
Figure: partial-order plan for the example, linking Start (Location(bedroom), Light(bathroom, off)) through TurnOnLight and MoveTo to MakeHappy and Finish (Happy(Inhabitant)).

Example Planning Systems
Partial-order planners derive plans without having to find the actions in sequence: SNLP (Univ. of Washington); GraphPlan (CMU), which builds and prunes a graph of possible plans.
Conditional planners derive plans under uncertainty by constructing plans that work under given conditions: UCPOP (Univ. of Washington), a partial-order planner with universal quantification and conditional effects; CPOP; Sensory GraphPlan (CMU).

Planning Decisions
Characteristics: complete and correct (given complete rules); relatively natural formulation.
Advantages: permits a sequence of actions to be found that performs a given task; goals can be defined easily.
Problems: requires a complete description of the system; uncertainty is difficult to handle; planning is generally very complex.

Decision Theory
Decision theory addresses rational decision making under uncertainty. Uncertainty is represented using probabilities: uncertainty due to incomplete observability, due to nondeterministic action outcomes, and due to nondeterministic system behavior. Utility theory is used to achieve rational decisions: utility is a measure of the expected "value" of a given situation or decision, and rational decisions are the ones that yield the highest expected utility in the current situation.

Modeling Uncertainty
The current situation can be represented as a belief state, i.e. a probability distribution over the states indicating the likelihood that any given state x_i is the current state: {(x_1, P(x_1)), (x_2, P(x_2)), …, (x_n, P(x_n))}. The probability of a state can be expressed as the probability of all state attributes, P(x) = P(a_1, a_2, …, a_n). Uncertainties from incomplete observability and nondeterminism can be modeled as conditional probabilities. State transition model: P(x_i | x_j, d). Observation model: P(o | x).

Bayes Rule
Bayes rule permits cause and effect to be inverted when calculating probabilities: P(c | e) = P(e | c) P(c) / P(e). It is often easier to estimate the probability of an effect given its cause, P(e | c). The probability of a state given a set of sensor readings, P(x | o), can then be calculated from the observation probabilities P(o | x).
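A tiny Python sketch of this inversion, using hypothetical numbers for the prior and the sensor model (the rooms and probabilities are made up for illustration):

# Infer the inhabitant's location from one motion-sensor observation o.
prior = {"bedroom": 0.7, "bathroom": 0.3}        # P(x)
sensor = {"bedroom": 0.1, "bathroom": 0.9}       # P(o | x) for this observation o

evidence = sum(sensor[x] * prior[x] for x in prior)               # P(o)
posterior = {x: sensor[x] * prior[x] / evidence for x in prior}   # P(x | o)
print(posterior)   # {'bedroom': ~0.21, 'bathroom': ~0.79}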

Utility Theory
Utilities U(A) represent the "value" of a given situation or decision A and model preferences. The utility function for a particular system is not unique; only relative differences between utility values are important: U(A) > U(B) means A is preferred to B, and U(A) = U(B) means the agent is indifferent between A and B. Utilities for uncertain situations can be calculated as the expected value of the utility over all possibilities: U({(x_1, P(x_1)), …, (x_n, P(x_n))}) = Σ_i P(x_i) * U(x_i).
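As a small worked illustration of the formula (the states and utility values below are hypothetical):

# Expected utility of a belief state: sum of P(x_i) * U(x_i).
belief = {"lights_ok": 0.8, "lights_wrong": 0.2}      # {(x_i, P(x_i))}
utility = {"lights_ok": 10.0, "lights_wrong": -5.0}   # U(x_i)

expected_utility = sum(p * utility[x] for x, p in belief.items())
print(expected_utility)   # 0.8*10 + 0.2*(-5) = 7.0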

Rational Decisions
The rational decision is the one that leads to the highest expected utility. Rational decisions in decision theory require a complete causal model of the environment, P(x_i | x_j, d), complete knowledge of the observation (sensor) model, P(o | x_i), and knowledge of the utility function for all states, U(x_i).

Decision Networks
Decision networks combine Bayesian networks with decision theory. The Bayesian network represents a probabilistic model, in terms of attributes, of the current state and of the state resulting from a given decision: chance nodes represent attributes, and connections represent conditional effects. Additional nodes introduce decisions and utilities: the decision node represents the possible decisions, and the utility node calculates the utility of the decision.

Decision Network Example
Figure: a lawn-watering decision network with chance nodes (cloudy, rain, rain forecast, neighbor watering, lawn wet, lawn growth), a decision node (sprinklers), and a utility node.

Decision Networks
To determine rational decisions, the network has to be evaluated and utilities computed:
Set the evidence variables according to the current state.
For each possible action value of the decision node: set the decision node to that action, use belief-net inference to calculate the posterior probabilities of the parents of the utility node, and calculate the utility for the action.
Return the action with the highest utility.
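A Python sketch of this evaluation loop for the sprinkler example above; the belief-net inference step is replaced by a hypothetical stub function, and all probabilities and utility values are invented for illustration.

# Stand-in for belief-network inference over the chance nodes.
def posterior_lawn_growth(evidence, sprinklers_on):
    p_rain = 0.8 if evidence["cloudy"] else 0.1
    p_wet = 1.0 if sprinklers_on else p_rain
    return {"good": 0.2 + 0.7 * p_wet, "poor": 0.8 - 0.7 * p_wet}

utility_table = {("good", True): 8.0, ("good", False): 10.0,   # watering costs a little
                 ("poor", True): -2.0, ("poor", False): 0.0}

def best_decision(evidence):
    best, best_eu = None, float("-inf")
    for sprinklers_on in (True, False):            # each value of the decision node
        growth = posterior_lawn_growth(evidence, sprinklers_on)
        eu = sum(p * utility_table[(g, sprinklers_on)] for g, p in growth.items())
        if eu > best_eu:
            best, best_eu = sprinklers_on, eu
    return best, best_eu

print(best_decision({"cloudy": True}))    # with these numbers: do not water
print(best_decision({"cloudy": False}))   # with these numbers: water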

Decision Network Evaluation
Evaluation of the network involves computing the probabilities for all the chance nodes. Connections between nodes indicate conditional dependence, P(a_i | Parents(a_i)), so the values of chance nodes can be computed from the values of their parent nodes. Connections to the utility node represent the influence the given attribute has on the utility of the resulting state.

Decision Networks
Characteristics: complete and correct (given a complete network).
Advantages: takes uncertainty into account; makes optimal decisions; relatively compact representation.
Problems: requires a complete probabilistic description of the system; requires design of the utility function for all states.

Markov Decision Processes
Markov Decision Processes (MDPs) form a probabilistic model of all possible system behavior. An MDP can be described by a tuple <X, D, T, R> representing states, decisions (actions), transition probabilities, and reinforcements. The system has to obey the Markov assumption: P(x_t+1 | x_t, d_t, x_t-1, d_t-1, …, x_0) = P(x_t+1 | x_t, d_t). The reinforcement represents the instantaneous change in utility obtained in a given state; it models costs and payoffs and is generally sparse and delayed.

Utility Function for MDPs
In an MDP, the utility of a state under a given policy π can be defined as the expected sum of discounted reinforcements: U^π(x) = E[ Σ_t γ^t R(x_t) | x_0 = x, π ]. The optimal utility function U* can be computed using value iteration, and the optimal policy (decision strategy) can be extracted from the utility function.

MDP Example
S = {(1,1), (1,2), …, (4,3)}
A = {up, down, left, right}
T: P(intended direction) = 0.8, P(each right angle to intended) = 0.1
R: +1 at the goal, -1 at the trap, -0.04 in all other states
γ = 1
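A compact value-iteration sketch for a gridworld of this kind; the exact layout (goal at (4,3), trap at (4,2), wall at (2,2)) is an assumption following the standard textbook version of this example.

GAMMA = 1.0
STEP_REWARD = -0.04
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
PERP = {"up": ("left", "right"), "down": ("left", "right"),
        "left": ("up", "down"), "right": ("up", "down")}

states = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
terminal = {(4, 3): +1.0, (4, 2): -1.0}

def move(s, a):
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    return nxt if nxt in states else s      # bumping into wall/border: stay put

def transitions(s, a):
    # 0.8 in the intended direction, 0.1 for each right angle.
    left, right = PERP[a]
    return [(0.8, move(s, a)), (0.1, move(s, left)), (0.1, move(s, right))]

U = {s: 0.0 for s in states}
for _ in range(100):                         # fixed number of sweeps for brevity
    U_new = {}
    for s in states:
        if s in terminal:
            U_new[s] = terminal[s]
            continue
        U_new[s] = STEP_REWARD + GAMMA * max(
            sum(p * U[s2] for p, s2 in transitions(s, a)) for a in ACTIONS)
    U = U_new

policy = {s: max(ACTIONS, key=lambda a: sum(p * U[s2] for p, s2 in transitions(s, a)))
          for s in states if s not in terminal}
print(U)
print(policy)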

MDP Example
Figure: the optimal policy and the optimal utilities for the example grid.

Markov Decision Processes
Characteristics: complete and correct.
Advantages: takes transition uncertainty into account; makes optimal decisions; automatically calculates the utility function.
Problems: requires a complete probabilistic description of the system; requires complete observability of the state.

Partially Observable MDPs
Partially Observable Markov Decision Processes (POMDPs) extend MDPs by permitting states to be only partially observable. A system can be represented by a tuple <X, D, T, R, O, V>, where <X, D, T, R> is an MDP and O and V map observations about the state to probabilities of a given state: O = {o_i} is the set of observations, and V: V(x, o) = P(o | x). To determine an optimal policy, an optimal utility function over the belief states has to be computed.
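Acting in a POMDP requires maintaining the belief state after every decision and observation. The following Python sketch shows that belief update (a Bayes filter step); the states, transition model, and observation model are hypothetical toy numbers.

# Belief update: b'(x') is proportional to P(o | x') * Σ_x P(x' | x, d) * b(x).
states = ["bedroom", "bathroom"]
T = {  # P(x' | x, d) for one fixed decision d
    ("bedroom", "bedroom"): 0.7, ("bedroom", "bathroom"): 0.3,
    ("bathroom", "bedroom"): 0.1, ("bathroom", "bathroom"): 0.9,
}
O = {"bedroom": 0.2, "bathroom": 0.8}   # P(o | x') for the observed o

def belief_update(belief, T, O):
    predicted = {x2: sum(T[(x, x2)] * belief[x] for x in belief) for x2 in states}
    unnorm = {x2: O[x2] * predicted[x2] for x2 in states}
    z = sum(unnorm.values())
    return {x2: v / z for x2, v in unnorm.items()}

print(belief_update({"bedroom": 0.9, "bathroom": 0.1}, T, O))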

POMDPs
Characteristics: complete and correct.
Advantages: takes all uncertainty into account; makes optimal decisions.
Problems: requires a complete probabilistic description of the system; computing the optimal solution is so far intractable (dynamic decision networks and approximation techniques exist and work for small state spaces).

Learning Decisions
Learning techniques permit decisions to be learned from past experience and from feedback from the inhabitants or the environment.
Supervised learning requires the desired decision to be specified during training.
Reinforcement learning learns by experimentation from scalar reward feedback: inhabitant feedback (e.g. device interactions), explicit environment feedback (e.g. energy consumption), or implicit feedback (e.g. a prediction of the inhabitant's comfort).

Feedforward Neural Networks
Neural networks are a supervised learning mechanism that can be trained to make decisions based on a set of training examples. Training for reactive decisions involves the presentation of a set of examples (x_i, d(x_i)), where d(x_i) is the desired decision to be made in configuration x_i. Training for goal-based or utility-based decisions involves learning a model that maps an input (x_i, d(x_i)) to the outcome of the action, f(x_i, d(x_i)), and then selecting the decision with the best outcome.
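A minimal sketch of the reactive case: a small feedforward network trained on (x_i, d(x_i)) pairs. The features and training pairs are hypothetical, and scikit-learn's MLPClassifier stands in for the network described in the slides.

from sklearn.neural_network import MLPClassifier

# x = [hour_of_day, inhabitant_in_room, outdoor_light_level], d = turn light on?
X = [[7, 1, 0.2], [7, 0, 0.2], [13, 1, 0.9], [22, 1, 0.1], [22, 0, 0.1]]
d = [1, 0, 0, 1, 0]

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, d)
print(net.predict([[23, 1, 0.05]]))   # decision for a new situation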

Example System: Regulation in the Adaptive House [DLRM94]
A neural network learns to regulate the lights in the house to maintain a given light intensity.
1. It learns a network that predicts the light intensity if a given set of lights is turned on.
Input: the current light device levels (7 inputs), the current light sensor levels (4 inputs), and the new light device levels (7 inputs).
Output: the new light sensor levels (4 outputs).
[DLRM94] Dodier, R. H., Lukianow, D., Ries, J., & Mozer, M. C. (1994). A comparison of neural net and conventional techniques for lighting control. Applied Mathematics and Computer Science, 4.

Example System: Regulation in the Adaptive House (continued)
2. Decisions are made by comparing the output of the network for all possible decisions (i.e. combinations of lights to be turned on) with the desired light intensity and taking the decision that most closely matches it.
Figure: the decision step compares, for state x_i and each decision d, the prediction f(x_i, d) against the set point p.
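A sketch of this decision step in Python; the predictor below is a crude hypothetical stand-in for the trained network f(x_i, d), and the state features are invented for illustration.

from itertools import product

def predict_sensors(state, light_levels):
    # Stand-in for the learned model f(x_i, d): a crude linear guess.
    return [min(1.0, 0.2 * state["daylight"] + 0.3 * sum(light_levels))] * 4

def choose_lights(state, set_point, n_lights=7):
    best, best_err = None, float("inf")
    for levels in product([0.0, 1.0], repeat=n_lights):   # all on/off combinations
        pred = predict_sensors(state, levels)
        err = sum((p - s) ** 2 for p, s in zip(pred, set_point))
        if err < best_err:
            best, best_err = levels, err
    return best

print(choose_lights({"daylight": 0.1}, set_point=[0.6] * 4))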

Neural Networks
Characteristics: efficient.
Advantages: can learn arbitrary decision functions from training data; generalizes to new situations; fast decision making.
Problems: requires training data that contains the desired decisions or a goal/objective; requires the design of a sufficient input representation.

Reinforcement Learning
Reinforcement learning learns an optimal decision strategy from trial and error and sparse reward feedback. It is an on-line method to solve Markov Decision Processes (or, with extensions, POMDPs). The reward, R, is a signal encoding the instantaneous feedback to the system. The system learns a mapping from states to decisions, π*(x_i), which optimizes the expected utility.

Q-Learning
Q-learning is the most popular reinforcement learning technique for MDPs. It learns a utility function for state-action pairs, Q(x, d), with utility U(x) = max_d Q(x, d). It learns by experimentation: Q(x_i, d) is updated after each observed transition from state x_i by comparing the expected utility of (x_i, d) with the expectation computed after observing the actual outcome x_j:
Q(x_i, d) = Q(x_i, d) + α * (R(x_i) + γ max_d' Q(x_j, d') - Q(x_i, d))
Decisions are made to optimize the Q-values: π(x) = argmax_d Q(x, d).
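A tabular Q-learning sketch implementing the update rule above on a toy, hypothetical home scenario (state = (someone_home, light_on)); it is not the MavHome or Adaptive House implementation, and all parameters and reward values are invented.

import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
decisions = ["turn_on", "turn_off", "do_nothing"]

def next_occupancy():
    # Whether someone is home at the next step (random in this toy model).
    return random.random() < 0.5

def reward(someone_home, light_on):
    if someone_home and not light_on:
        return -1.0          # sitting in the dark
    if not someone_home and light_on:
        return -1.0          # wasted energy
    return 1.0 if someone_home else 0.0

Q = defaultdict(float)

def choose(state):
    if random.random() < EPS:                          # epsilon-greedy exploration
        return random.choice(decisions)
    return max(decisions, key=lambda d: Q[(state, d)])

state = (True, False)
for _ in range(20000):
    d = choose(state)
    someone_home, light_on = state
    if d == "turn_on":
        light_on = True
    elif d == "turn_off":
        light_on = False
    r = reward(someone_home, light_on)
    next_state = (next_occupancy(), light_on)
    # Q(x,d) <- Q(x,d) + alpha * (R + gamma * max_d' Q(x',d') - Q(x,d))
    best_next = max(Q[(next_state, d2)] for d2 in decisions)
    Q[(state, d)] += ALPHA * (r + GAMMA * best_next - Q[(state, d)])
    state = next_state

for s in [(True, False), (True, True), (False, True), (False, False)]:
    print(s, max(decisions, key=lambda d: Q[(s, d)]))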

Example System: Regulation in the Adaptive House [Moz98]
Neural network regulators can control lighting and heating to achieve a given set point. The set point itself is learned using reinforcement based on energy usage and on inhabitant interactions with light switches or thermostats.
[Moz98] Mozer, M. C. The neural network house: An environment that adapts to its inhabitants. In Proc. AAAI Spring Symposium on Intelligent Environments, Menlo Park, CA, 1998.

Example System: MavHome
MavHome uses Q-learning on a state space that includes the device status and the Active LeZi prediction: the state s_t at time t is s_t = (x_t, p_t). The reinforcement includes multiple metrics, such as energy usage and the number of inhabitant-device interactions. Decisions are device interactions plus an action representing the decision not to perform an action. The system operates event-driven, making a decision every time an event happens. The learner is pre-trained using the Active LeZi predictor.

Example System: MavHome Example task: getting up in the morning and taking a shower.

Example System: MavHome
The home learns to automate light activations so as to minimize energy usage without increasing the number of inhabitant interactions.

Reinforcement Learning
Characteristics: optimal policies (given enough training).
Advantages: can learn optimal decision strategies without explicit training examples; can deal with multiple objectives.
Problems: trial-and-error learning can lead to spurious actions and thus to potential safety issues; requires a complete state space representation; can be very complex.

Conclusions
Decision making is an integral component of intelligent environments: it automates devices and determines what information to present to inhabitants. Different decision making approaches (reactive, goal-based, or utility-based; programmed or learned) apply to different conditions based on the available information, and decision-making approaches can be "mixed". Many open issues remain: how to deal with the complexity of intelligent environments (hierarchical systems, multi-agent systems, etc.), and how to assure the safety and acceptability of learning decision makers?