Fuzzy Inference System Learning By Reinforcement Presented by Alp Sardağ

A Comparison of Fuzzy & Classical Controllers
- Fuzzy controller: an expert system based on if-then rules whose premises and conclusions are expressed by means of linguistic terms.
  - Rules are close to natural language.
  - Encodes a priori knowledge.
- Classical controller: needs an analytical model of the task.

Design Problem of FC
- A priori knowledge extraction is not easy:
  - disagreement between experts
  - a great number of variables is necessary to solve the control task

Self-Tuning FIS
- A direct teacher: based on an input-output set of training data.
- A distal teacher: does not give the correct actions, but the desired effect on the process.
- A performance measure: EA.
- A critic: gives rewards and punishments with respect to the state reached by the learner (RL methods).
- Note: no more than two fuzzy sets are activated for any input value.

Goal
To overcome the limitations of classical reinforcement learning methods: discrete state perception and discrete actions.
NOTE: in this paper a MISO FIS is used.

A MIMO FIS
The FIS is made of N rules whose symbols are defined as follows (the generic rule form is written out after this list):
- R_i: the i-th rule of the rule base
- S_j: the input variables
- L_j^i: the linguistic term of input variable S_j in rule R_i, with membership function μ_{L_j^i}
- y_1, ..., y_{N_O}: the output variables
- O_j^i: the linguistic term of output variable y_j in rule R_i
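Written out, the generic rule form implied by these symbol definitions reads as below. This is a reconstruction from the definitions above (if-then rules with linguistic premises and conclusions), not a verbatim copy of the slide's equation; n and N_O denote the numbers of input and output variables.

```latex
R_i:\ \text{IF } S_1 \text{ is } L_1^i \text{ and } \dots \text{ and } S_n \text{ is } L_n^i
\ \text{THEN } y_1 \text{ is } O_1^i \text{ and } \dots \text{ and } y_{N_O} \text{ is } O_{N_O}^i
```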

Rule Preconditions
- Membership functions are triangles and trapezoids (although not differentiable):
  - because they are simple
  - sufficient in a number of applications
- A strong fuzzy partition is used:
  - all values activate at least one fuzzy set, and the input universe is completely covered.

Strong Fuzzy Partition Example
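Since the partition figure is not reproduced in this transcript, here is a minimal Python sketch of a strong fuzzy partition on [0, 1]: triangular membership functions whose degrees sum to 1 for every input, so the universe is fully covered and at most two sets are active at once. The function name, the choice of three sets, and the unit interval are illustrative assumptions.

```python
import numpy as np

def strong_partition(x, n_sets=3, lo=0.0, hi=1.0):
    """Membership degrees of x in n_sets triangular fuzzy sets forming a strong partition."""
    centers = np.linspace(lo, hi, n_sets)     # evenly spaced set centers
    half_width = (hi - lo) / (n_sets - 1)     # neighbouring triangles overlap by half their base
    return np.maximum(1.0 - np.abs(x - centers) / half_width, 0.0)

degrees = strong_partition(0.25)
assert np.isclose(degrees.sum(), 1.0)         # degrees always sum to 1
assert (degrees > 0).sum() <= 2               # no more than two sets activated
```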

Rule Conclusions
- Each rule R_i has N_O corresponding conclusions.
- For each rule, the truth value with respect to the input vector S is computed, with the T-norm implemented by a product.
- The FIS outputs are computed from these truth values and the rule conclusions (see the formulas below).
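A standard formulation consistent with the bullets above (product t-norm, crisp conclusion o_k^i of rule R_i for output y_k, and a strong fuzzy partition so that the truth values sum to one); the slide's own equations are not reproduced here:

```latex
\alpha_i(S) = \prod_{j} \mu_{L_j^i}(S_j)
\qquad\text{(truth value of rule } R_i \text{)}
\\[4pt]
y_k(S) = \sum_{i=1}^{N} \alpha_i(S)\, o_k^i
\qquad\text{(FIS output; no normalisation needed since } \textstyle\sum_i \alpha_i(S) = 1\text{)}
```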

Learning
- The number and positions of the input fuzzy labels are set using a priori knowledge.
- Structural learning consists in tuning the number of rules.
- FACL and FQL are reinforcement learning methods that deal only with the conclusion part.

Reinforcement Learning NOTE: state observability is total.

Markovian Decision Problem
- S: a finite, discrete state set
- U: a finite, discrete action set
- R: primary reinforcements, R: S × U → ℝ
- P: transition probabilities, P: S × U × S → [0, 1]
- State evaluation function: see the formula below.
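The slide's own expression of the state evaluation function is not reproduced here; the standard discounted form is given below, where γ is the discount factor and the expectation is taken over a policy π (both assumptions of this reconstruction):

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t \,\middle|\, s_0 = s\right],
\qquad 0 \le \gamma < 1
```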

The Curse of Dimensionality
Some form of generalization must be incorporated into the state representation. Various function approximators are used:
- CMAC
- Neural networks
- FIS: the state-space encoding is based on a vector corresponding to the current state.

Adaptive Heuristic Critic
AHC is made of two components:
- Adaptive Critic Element: a critic developed adaptively from the primary reinforcements; it represents an evaluation function (the V(S) values) that is more informative than the one given by the environment through rewards and punishments.
- Associative Search Element: selects the actions that lead to better critic values.

FACL Scheme

The Critic
- At time step t, the critic value is computed with the conclusion vector.
- The TD error is computed from the reward and the successive state evaluations.
- The conclusion vector is adjusted with the TD-learning update rule (see the formulas below).
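The three quantities named above can be written in the standard actor-critic/TD form below. This is a reconstruction consistent with the slide (critic value as activation-weighted conclusion vector, TD error from successive evaluations, gradient-like update of v); the symbols β (critic learning rate) and Φ(S_t) (the vector of rule activations α_i(S_t)) are assumptions:

```latex
V_t(S_t) = \sum_{i} \alpha_i(S_t)\, v_i
\qquad
\tilde{\varepsilon}_{t+1} = r_{t+1} + \gamma\, V_t(S_{t+1}) - V_t(S_t)
\qquad
v_{t+1} = v_t + \beta\, \tilde{\varepsilon}_{t+1}\, \Phi(S_t)
```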

The Actor
- When rule R_i is activated, one of R_i's local actions is elected to participate in the global action, based on its quality.
- The global action is then triggered, where ε-greedy is a function implementing a mixed exploration-exploitation strategy (a sketch follows below).
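One way to write the action selection described above, assuming each rule R_i keeps a vector of local-action qualities w^i (the exact notation of the original slide is not reproduced):

```latex
a^i_t = \epsilon\text{-greedy}\big(w^i_t\big),
\qquad
U_t(S_t) = \sum_{i} \alpha_i(S_t)\, a^i_t
```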

Tuning Vector w
- The TD error serves as the improvement measure; except at the beginning of learning, the underlying evaluation function is a good approximation of the optimal one.
- The actor learning rule adjusts w (a plausible form is sketched below).
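A plausible form of the actor learning rule, in which the quality of the local action elected by rule R_i is reinforced in proportion to the TD error and the rule's activation; β_w is an assumed actor learning rate and this is not necessarily the paper's exact rule:

```latex
w^i_{t+1}\big[a^i_t\big] = w^i_t\big[a^i_t\big] + \beta_w\, \tilde{\varepsilon}_{t+1}\, \alpha_i(S_t)
```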

Meta Learning Rule
Update strategy for the learning rates:
- Every parameter should have its own learning rate (one per parameter, i = 1, ..., n).
- Every learning rate should be allowed to vary over time (so that the V values converge).
- When the derivative of a parameter has the same sign for several consecutive time steps, its learning rate should be increased.
- When the sign of the parameter's derivative alternates for several consecutive time steps, its learning rate should be decreased.
Delta-Bar-Delta rule (stated below):
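The Delta-Bar-Delta rule referred to above is the learning-rate adaptation rule of Jacobs (1988). In its usual statement, δ_t is the current derivative of a parameter, δ̄_t its exponential average, β_t the parameter's learning rate, and κ, φ, θ are fixed meta-parameters; the slide's own equation is not reproduced:

```latex
\bar{\delta}_t = (1-\theta)\,\delta_t + \theta\,\bar{\delta}_{t-1},
\qquad
\Delta\beta_t =
\begin{cases}
\kappa & \text{if } \bar{\delta}_{t-1}\,\delta_t > 0 \\
-\varphi\,\beta_t & \text{if } \bar{\delta}_{t-1}\,\delta_t < 0 \\
0 & \text{otherwise}
\end{cases}
```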

Execution Procedure
1. Estimation of the evaluation function corresponding to the current state.
2. Computation of the TD error.
3. Tuning of the parameter vectors v and w.
4. Estimation of the new evaluation function for the current state with the new conclusion vector v_{t+1}.
5. Learning rate updating with the Delta-Bar-Delta rule.
6. For each activated rule, election of the local action; computation and triggering of the global action U_{t+1}.
An illustrative sketch of one such step is given after this list.
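The Python sketch below walks through one execution step following the six steps above. All concrete choices (the toy triangular partition on [0, 1], the action set, the learning rates, reinforcing each rule's greedy action, and omitting eligibility traces and the Delta-Bar-Delta update) are assumptions made for the example, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N_RULES = 4                              # hypothetical number of rules
ACTIONS = np.array([-1.0, 0.0, 1.0])     # hypothetical discrete local action set
GAMMA, BETA_V, BETA_W, EPS = 0.95, 0.1, 0.05, 0.1

v = np.zeros(N_RULES)                    # critic conclusion vector
w = np.zeros((N_RULES, len(ACTIONS)))    # local action qualities, one row per rule

def activations(state):
    """Toy strong fuzzy partition of [0, 1]: triangular truth values summing to 1."""
    centers = np.linspace(0.0, 1.0, N_RULES)
    half_width = 1.0 / (N_RULES - 1)
    a = np.maximum(1.0 - np.abs(state - centers) / half_width, 0.0)
    return a / max(a.sum(), 1e-12)

def facl_step(state, new_state, reward):
    """One illustrative FACL execution step (steps 1-6 of the slide)."""
    global v, w
    phi_old, phi_new = activations(state), activations(new_state)
    v_old = phi_old @ v                                    # 1. evaluation of the current state
    td_error = reward + GAMMA * (phi_new @ v) - v_old      # 2. TD error
    v = v + BETA_V * td_error * phi_old                    # 3. tune the critic vector v ...
    greedy = w.argmax(axis=1)                              #    ... and the actor vector w
    w[np.arange(N_RULES), greedy] += BETA_W * td_error * phi_old
    v_new = phi_old @ v                                    # 4. re-evaluate with the updated v
    # 5. learning-rate adaptation (Delta-Bar-Delta) is omitted in this sketch
    explore = rng.random(N_RULES) < EPS                    # 6. epsilon-greedy local actions
    local = np.where(explore, rng.integers(len(ACTIONS), size=N_RULES), greedy)
    global_action = float(phi_new @ ACTIONS[local])        #    activation-weighted global action
    return global_action, td_error, v_new

# usage: one step with hypothetical scalar states in [0, 1] and a reward of 1.0
u, err, value = facl_step(0.3, 0.35, reward=1.0)
```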

Example

Example Cont.
- The number of rules is twenty-five.
- For the sake of simplicity, the discrete actions available are the same for all rules.
- The discrete action set:
- The reinforcement function:

Results
- Performance measure for distance:
- Results: