Final Exam: May 10 Thursday

Bayesian reasoning
If event E occurs, then the probability that event H will occur is p(H | E):
IF E (evidence) is true THEN H (hypothesis) is true with probability p
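For reference, this conditional probability is computed with Bayes' rule:
p(H | E) = p(E | H) × p(H) / p(E)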

Bayesian reasoning example: cancer and a diagnostic test
P(C) = 0.01, P(¬C) = 0.99
P(+|C) = 0.9, P(−|C) = 0.1
P(+|¬C) = 0.2, P(−|¬C) = 0.8
P(C|+) = ?
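Working this out with Bayes' rule (and the law of total probability for the denominator):
P(C|+) = P(+|C)·P(C) / [P(+|C)·P(C) + P(+|¬C)·P(¬C)] = (0.9 × 0.01) / (0.9 × 0.01 + 0.2 × 0.99) = 0.009 / 0.207 ≈ 0.043
So even after a positive test, the probability of cancer is only about 4.3%.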

Bayesian reasoning with multiple hypotheses and evidences
Expand the Bayesian rule to work with multiple hypotheses (H1 ... Hm) and evidences (E1 ... En), assuming conditional independence among the evidences E1 ... En.
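Under that independence assumption, the posterior for each hypothesis Hi takes the standard form:
p(Hi | E1 E2 ... En) = p(E1|Hi) × p(E2|Hi) × ... × p(En|Hi) × p(Hi) / Σk [ p(E1|Hk) × p(E2|Hk) × ... × p(En|Hk) × p(Hk) ],  k = 1 ... m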

Bayesian reasoning example: expert data (prior probabilities p(Hk) and conditional probabilities p(Ei|Hk) supplied by the expert).

The user observes evidences E3, E1, and E2.

Bayesian reasoning example: after the user observes E2, the expert system computes the posterior probabilities.
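Since the expert's probability table appears in the slides only as an image, here is a minimal sketch of the updating computation in Python; the hypotheses, priors, and likelihoods below are illustrative placeholders, not the lecture's values.

# Bayesian updating with multiple hypotheses and conditionally
# independent evidences. All numbers are hypothetical.
priors = {"H1": 0.40, "H2": 0.35, "H3": 0.25}            # p(Hk)
likelihoods = {                                           # p(Ei | Hk)
    "H1": {"E1": 0.3, "E2": 0.9, "E3": 0.6},
    "H2": {"E1": 0.8, "E2": 0.1, "E3": 0.7},
    "H3": {"E1": 0.5, "E2": 0.7, "E3": 0.9},
}

def posteriors(observed):
    # Return p(Hk | observed evidences) for every hypothesis.
    joint = {h: priors[h] for h in priors}
    for e in observed:
        for h in joint:
            joint[h] *= likelihoods[h][e]                 # multiply in p(Ei | Hk)
    total = sum(joint.values())                           # normalising constant
    return {h: joint[h] / total for h in joint}

print(posteriors(["E2"]))              # after the user reports E2
print(posteriors(["E3", "E1", "E2"]))  # after all three evidences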

Propagation of CFs
For a single antecedent rule, IF E THEN H {cf}, the certainty of the conclusion is
cf(H, E) = cf(E) × cf(R)
where cf(E) is the certainty factor of the evidence and cf(R) is the certainty factor of the rule.

Single antecedent rule example
IF the patient has a toothache THEN the problem is a cavity {cf 0.3}
The patient has a toothache {cf 0.9}
What is cf(cavity, toothache)?
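Using the single-antecedent propagation formula above: cf(cavity, toothache) = cf(E) × cf(R) = 0.9 × 0.3 = 0.27, i.e. the system is 27% certain that the problem is a cavity.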

Propagation of CFs (multiple antecedents)
For conjunctive rules of the form IF E1 AND ... AND En THEN H {cf}, the combined certainty of the evidence is the minimum over the antecedents. For two evidences E1 and E2:
cf(E1 AND E2) = min[cf(E1), cf(E2)]

Propagation of CFs (multiple antecedents)
For disjunctive rules of the form IF E1 OR ... OR En THEN H {cf}, the combined certainty of the evidence is the maximum over the antecedents. For two evidences E1 and E2:
cf(E1 OR E2) = max[cf(E1), cf(E2)]

Exercise
IF (P1 AND P2) OR P3 THEN C1 (0.7) AND C2 (0.3)
Assume cf(P1) = 0.6, cf(P2) = 0.4, cf(P3) = 0.2.
What are cf(C1) and cf(C2)?
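A worked solution, combining the min/max rules above and then propagating the evidence certainty through each conclusion's rule certainty factor (cf(H, E) = cf(E) × cf(R)):
cf(P1 AND P2) = min(0.6, 0.4) = 0.4
cf((P1 AND P2) OR P3) = max(0.4, 0.2) = 0.4
cf(C1) = 0.4 × 0.7 = 0.28
cf(C2) = 0.4 × 0.3 = 0.12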

Defining fuzzy sets with fit-vectors
A fuzzy set A can be defined as a fit-vector of membership/element pairs:
A = {μA(x1)/x1, μA(x2)/x2, ..., μA(xn)/xn}
So, for example:
Tall men = (0/180, 1/190)
Short men = (1/160, 0/170)
Average men = (0/165, 1/175, 0/185)

Qualifiers and hedges
What about linguistic values with qualifiers, e.g. very tall, extremely short, etc.?
Hedges are qualifying terms that modify the shape of fuzzy sets, e.g. very, somewhat, quite, slightly, extremely, etc.

Representing Hedges
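Commonly used mathematical representations of hedges (consistent with the "very tall men" membership values shown a few slides below):
very A: square the membership, μA(x)²  (concentration)
extremely A: cube the membership, μA(x)³
more or less A / somewhat A: take the square root, μA(x)^0.5  (dilation)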

Crisp Set Operations

Fuzzy Set Operations: Complement
To what degree do elements not belong to this set?
μ¬A(x) = 1 − μA(x)
tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190}
NOT tall men = {1/180, 0.75/182, 0.5/185, 0.25/187, 0/190}

Fuzzy Set Operations: Containment
Which sets belong to other sets?
tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190}
very tall men = {0/180, 0.06/182, 0.25/185, 0.56/187, 1/190}
Each element of the fuzzy subset has a smaller (or equal) membership than in the containing set.

Fuzzy Set Operations: Intersection
To what degree is the element in both sets?
μA∩B(x) = min[μA(x), μB(x)]

tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190}
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190}
tall men ∩ average men = {0/165, 0/175, 0/180, 0.25/182, 0/185, 0/190}
or tall men ∩ average men = {0/180, 0.25/182, 0/185}
μA∩B(x) = min[μA(x), μB(x)]

Fuzzy Set Operations: Union
To what degree is the element in either or both sets?
μA∪B(x) = max[μA(x), μB(x)]

tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190}
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190}
tall men ∪ average men = {0/165, 1/175, 0.5/180, 0.25/182, 0.5/185, 1/190}
μA∪B(x) = max[μA(x), μB(x)]
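A minimal Python sketch of these operations, using the membership values from the slides above (each fuzzy set represented as a dict from height to membership degree):

# Fuzzy complement, intersection, union, and the "very" hedge,
# applied to the "tall men" and "average men" sets from the slides.
tall    = {165: 0.0, 175: 0.0, 180: 0.0, 182: 0.25, 185: 0.5, 190: 1.0}
average = {165: 0.0, 175: 1.0, 180: 0.5, 182: 0.25, 185: 0.0, 190: 0.0}

def complement(a):
    return {x: 1.0 - mu for x, mu in a.items()}       # NOT A

def intersection(a, b):
    return {x: min(a[x], b[x]) for x in a}            # min rule

def union(a, b):
    return {x: max(a[x], b[x]) for x in a}            # max rule

def very(a):
    return {x: mu ** 2 for x, mu in a.items()}        # "very" hedge (concentration)

print(intersection(tall, average))   # 182 -> 0.25, everything else 0
print(union(tall, average))          # 175 -> 1.0, 185 -> 0.5, 190 -> 1.0, ...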

Choosing the Best Attribute: Binary Classification
We want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum value when it makes no distinction.
Information theory (Shannon and Weaver, 1949):
Entropy: a measure of the uncertainty of a random variable.
A coin that always comes up heads --> 0 bits
A flip of a fair coin (heads or tails) --> 1 bit
The roll of a fair four-sided die --> 2 bits
Information gain: the expected reduction in entropy caused by partitioning the examples according to this attribute.

Formula for Entropy
H(p1, ..., pn) = −p1 log2 p1 − ... − pn log2 pn
Examples:
Suppose we have a collection of 10 examples, 5 positive and 5 negative:
H(1/2, 1/2) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1 bit
Suppose we have a collection of 100 examples, 1 positive and 99 negative:
H(1/100, 99/100) = −0.01 log2 0.01 − 0.99 log2 0.99 ≈ 0.08 bits
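A quick check of these numbers as a small Python sketch:

import math

def entropy(probabilities):
    # H(p1, ..., pn) = -sum(pi * log2 pi); terms with pi = 0 contribute 0
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))     # 1.0 bit    (5 positive, 5 negative)
print(entropy([0.01, 0.99]))   # ~0.081 bits (1 positive, 99 negative)
print(entropy([0.25] * 4))     # 2.0 bits   (fair four-sided die)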

Information gain
Information gain (from an attribute test) = the difference between the original information requirement and the new requirement after the test.
In other words, Information Gain (IG) is the reduction in entropy obtained from the attribute test; choose the attribute with the largest IG.
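In the notation of Russell & Norvig (which these slides appear to follow), with p positive and n negative examples at a node, and an attribute A that splits them into subsets i containing pi positive and ni negative examples:
remainder(A) = Σi [(pi + ni) / (p + n)] · I(pi/(pi + ni), ni/(pi + ni))
IG(A) = I(p/(p + n), n/(p + n)) − remainder(A)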

Information gain (example)
For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit.
Consider the attributes Patrons and Type (and the others too): Patrons has the highest IG of all the attributes and so is chosen by the DTL algorithm as the root.
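Assuming the standard 12-example restaurant data from Russell & Norvig (which this example appears to use), the comparison works out to roughly:
IG(Patrons) = 1 − [(2/12)·I(0, 1) + (4/12)·I(1, 0) + (6/12)·I(2/6, 4/6)] ≈ 1 − 0.459 ≈ 0.541 bits
IG(Type) = 1 − [(2/12)·I(1/2, 1/2) + (2/12)·I(1/2, 1/2) + (4/12)·I(2/4, 2/4) + (4/12)·I(2/4, 2/4)] = 1 − 1 = 0 bits
so Patrons is far more informative than Type.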

Example continued
Decision tree learned from the 12 examples: it is substantially simpler than the "true" tree; a more complex hypothesis isn't justified by such a small amount of data.

Perceptrons
X = x1·w1 + x2·w2
Y = Ystep(X), where Ystep is the step activation function: output 1 if X ≥ Θ (the threshold), and 0 otherwise.

Perceptrons
How does a perceptron learn?
A perceptron has initial (often random) weights, typically in the range [−0.5, 0.5].
Apply an established training dataset.
Calculate the error as expected output minus actual output: e = Yexpected − Yactual
Adjust the weights to reduce the error.

Perceptrons
How do we adjust a perceptron's weights to produce Yexpected?
If e is positive, we need to increase Yactual (and vice versa).
Use the perceptron learning rule: wi ← wi + α · xi · e, where α is the learning rate (between 0 and 1) and e is the calculated error.

Perceptron example: AND
Train a perceptron to recognize logical AND.
Use threshold Θ = 0.2 and learning rate α = 0.1.

Perceptron example: AND (continued)
Repeat until convergence, i.e. the final weights no longer change and there is no error.
Use threshold Θ = 0.2 and learning rate α = 0.1.
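A minimal sketch of this training loop in Python, using the step activation and learning rule above with the slide's parameters (Θ = 0.2, α = 0.1); the initial weights are arbitrary illustrative values in [−0.5, 0.5], not necessarily the ones used in the lecture:

# Train a single perceptron to compute logical AND.
THETA = 0.2   # threshold (from the slides)
ALPHA = 0.1   # learning rate (from the slides)

training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.3, -0.1]   # initial weights (illustrative)

for epoch in range(100):
    total_error = 0
    for (x1, x2), y_expected in training_set:
        x_sum = x1 * w[0] + x2 * w[1]
        y_actual = 1 if x_sum >= THETA else 0   # step activation
        e = y_expected - y_actual               # error
        w[0] += ALPHA * x1 * e                  # perceptron learning rule
        w[1] += ALPHA * x2 * e
        total_error += abs(e)
    if total_error == 0:                        # converged: a whole epoch with no errors
        break

print("epochs:", epoch + 1, "final weights:", w)
# sanity check: the learned weights reproduce the AND truth table
for (x1, x2), y in training_set:
    assert (1 if x1 * w[0] + x2 * w[1] >= THETA else 0) == y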