CS 4100 Artificial Intelligence Prof. C. Hafner Class Notes March 27, 2012

Term Project Presentations
– Homework 6 is due Tuesday in class (hard copy)
– We need 4 teams to volunteer to make presentations on April 12!!
– The other 5 teams will make presentations on April 17 (last day)
– Each presentation will be strictly limited to 15 minutes, with 3 minutes for discussion/questions
– Make sure your slides/demos load immediately – we do not have time to wait for Google Docs exploration

Supervised Learning (cont.)
– Decision Tree Learning (actually, classification learning) – return for further discussion
– We also consider techniques for evaluating supervised learning systems
– Perceptrons/Neural Nets
– Naïve Bayes Classifiers
– April 3, 5, 10: finish ML, introduce NLP
/id3-c45.html#1

ID3 and C4.5 Golfing Example: Attributes (Decision: Play or Don't Play)

ID3 and C4.5 Golfing Example: Training Data (Decision: Play or Don't Play)

Stock Market Example

Table of Entropy values

Review the Algorithm
In the case of our golfing example, for the attribute Outlook we have
Info(Outlook,T) = 5/14 * I(2/5, 3/5) + 4/14 * I(4/4, 0) + 5/14 * I(3/5, 2/5) = 0.694
Consider the quantity Gain(X,T) defined as
Gain(X,T) = Info(T) - Info(X,T)
This represents the difference between the information needed to identify an element of T and the information needed to identify an element of T after the value of attribute X has been obtained; that is, it is the gain in information due to attribute X. In our golfing example, for the Outlook attribute the gain is:
Gain(Outlook,T) = Info(T) - Info(Outlook,T) = 0.940 - 0.694 = 0.246
If we instead consider the attribute Windy, we find that Info(Windy,T) is 0.892 and Gain(Windy,T) is 0.048. Thus Outlook offers a greater information gain than Windy.
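
These numbers can be checked with a short computation. Below is a minimal Python sketch of the entropy and gain formulas above, using the class counts from the standard 14-example golfing table (9 Play, 5 Don't Play); the helper function names are mine, not from the lecture.

```python
import math

def info(*fractions):
    """Entropy I(p1, p2, ...) in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in fractions if p > 0)

# Info(T): 9 "Play" and 5 "Don't Play" out of 14 training examples
info_T = info(9/14, 5/14)                      # about 0.940 bits

# Info(Outlook, T): weighted entropy of the sunny / overcast / rain partitions
info_outlook = (5/14) * info(2/5, 3/5) \
             + (4/14) * info(4/4, 0/4) \
             + (5/14) * info(3/5, 2/5)         # about 0.694 bits

gain_outlook = info_T - info_outlook           # about 0.246 bits
print(round(info_T, 3), round(info_outlook, 3), round(gain_outlook, 3))
```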

C4.5 Extension Example 1
Notice that in this example two of the attributes have continuous ranges, Temperature and Humidity. ID3 does not directly deal with such cases. We can handle attributes with continuous ranges as follows. Say that attribute Ci has a continuous range. We examine the values for this attribute in the training set. Say they are, in increasing order, A1, A2, ..., Am. Then for each value Aj, j = 1, 2, ..., m, we partition the records into those that have Ci values up to and including Aj, and those that have values greater than Aj. For each of these partitions we compute the gain, or gain ratio, and choose the partition that maximizes the gain. This makes Ci a Boolean (binary) attribute. In our golfing example, for Humidity, if T is the training set, we determine the information for each partition and find the best partition at 75; the range for this attribute then becomes {≤75, >75}. Notice that this method involves a substantial number of computations.
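
As a rough illustration of this threshold search, here is a short Python sketch; the humidity column and Play labels are assumed to follow the usual 14-row golfing table, and the function names are mine.

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def best_split(values, labels):
    """Try a binary split at every observed value; return the one with maximum gain."""
    base = entropy(labels)
    best = (None, -1.0)
    for t in sorted(set(values)):
        left  = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue
        gain = base - (len(left) / len(labels)) * entropy(left) \
                    - (len(right) / len(labels)) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

humidity = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
play     = ['N','N','Y','Y','Y','N','Y','N','Y','Y','Y','Y','Y','N']
print(best_split(humidity, play))   # -> (chosen threshold, information gain)
```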

C4.5 Extension Example 2

ID3 and C4.5 The ID3 algorithm (which we learned last time) is important not because it summarizes what we know, i.e., the training set, but because we hope it will correctly classify new cases. Thus when building classification models one should have both training data to build the model and test data to verify how well the model actually works. C4.5 is an extension of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on.
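
One simple way to follow that advice is a hold-out split: train on one part of the labeled data and measure accuracy on the part the learner never saw. The sketch below assumes hypothetical `build_tree` and `classify` functions standing in for an ID3/C4.5 implementation; they are not defined in the lecture.

```python
import random

def holdout_accuracy(examples, build_tree, classify, test_fraction=0.3):
    """Train on one portion of (attributes, label) pairs, score on the unseen rest."""
    rows = list(examples)
    random.shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    train, test = rows[:cut], rows[cut:]
    tree = build_tree(train)            # learn only from the training portion
    correct = sum(classify(tree, attrs) == label for attrs, label in test)
    return correct / len(test)
```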

Perceptrons and Neural Networks: Another Supervised Learning Approach

Perceptron Learning (Supervised)
– Assign random weights (or set all to 0)
– Cycle through input data until change < target
– Let α be the "learning coefficient"
– For each input:
  – If perceptron gives correct answer, do nothing
  – If perceptron says yes when answer should be no, decrease the weights on all units that "fired" by α
  – If perceptron says no when answer should be yes, increase the weights on all units that "fired" by α
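
A minimal Python sketch of this procedure is given below, assuming 0/1 inputs so that "units that fired" are simply the inputs equal to 1; the zero threshold, the default α, and the exact stopping test are my assumptions rather than details from the slide.

```python
def train_perceptron(data, n_inputs, alpha=0.1, max_epochs=100):
    """data: list of (inputs, target) pairs, inputs a list of 0/1, target 0 or 1."""
    w = [0.0] * n_inputs                      # start with all-zero weights
    for _ in range(max_epochs):
        changed = False
        for x, target in data:
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if output == target:
                continue                      # correct answer: do nothing
            delta = alpha if target == 1 else -alpha
            for i, xi in enumerate(x):
                if xi:                        # adjust only the units that "fired"
                    w[i] += delta
            changed = True
        if not changed:                       # a full pass with no change: stop
            break
    return w
```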

Naive Bayes Classifiers: Our next example of machine learning
– A supervised learning method
– Making an independence assumption, we can explore a simple subset of Bayesian nets, such that it is easy to estimate the CPTs from sample data
– Uses a technique called "maximum likelihood estimation"
  – Given a set of correctly classified representative examples
  – Q: What estimates of conditional probabilities maximize the likelihood of the data that was observed?
  – A: The estimates that reflect the sample proportions
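
The answer on the slide – estimates that reflect the sample proportions – amounts to counting. Here is a minimal Python sketch of maximum-likelihood estimates of P(class) and P(feature value | class) from labeled examples; the function name and the toy rows in the usage are mine, not data from the lecture.

```python
from collections import Counter, defaultdict

def estimate(examples):
    """examples: list of (feature_dict, class_label) pairs, assumed correctly labeled."""
    n = len(examples)
    class_counts = Counter(label for _, label in examples)
    cond_counts = defaultdict(Counter)          # (feature, class) -> value counts
    for feats, label in examples:
        for f, v in feats.items():
            cond_counts[(f, label)][v] += 1
    priors = {c: k / n for c, k in class_counts.items()}
    cpts = {key: {v: k / sum(ctr.values()) for v, k in ctr.items()}
            for key, ctr in cond_counts.items()}
    return priors, cpts

# Toy usage with made-up rows: the priors come out as 2/3 and 1/3,
# and P(Junior = yes | buys) comes out as 1.0
data = [({'Junior': 'yes'}, 'buys'),
        ({'Junior': 'no'},  'ignores'),
        ({'Junior': 'yes'}, 'buys')]
priors, cpts = estimate(data)
print(priors, cpts[('Junior', 'buys')])
```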

(Worked example slide: counts of students who were Juniors and who were Non-Juniors, used to estimate the probabilities as sample proportions; the figures from the original slide were not preserved.)

Naive Bayes Classifier with multi-valued variables
– Major: Science, Arts, Social Science
– Student characteristics: Gender (M, F), Race/Ethnicity (W, B, H, A), International (T/F)
– What do the conditional probability tables look like?
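
To make the question concrete, here is a rough sketch of the table shapes, assuming Major is the class variable being predicted from the three student characteristics; the entries are left as placeholders rather than real probabilities.

```python
majors = ['Science', 'Arts', 'Social Science']
features = {
    'Gender':         ['M', 'F'],
    'Race/Ethnicity': ['W', 'B', 'H', 'A'],
    'International':  ['T', 'F'],
}

# One prior table P(Major) with 3 entries, plus one table per characteristic
# with a row for each Major and a column for each value (3x2, 3x4, 3x2).
# The actual probabilities would be filled in from sample proportions.
prior = {m: None for m in majors}
cpts = {f: {m: {v: None for v in values} for m in majors}
        for f, values in features.items()}

for f, table in cpts.items():
    print(f, '-> rows:', len(table), 'values per row:', len(next(iter(table.values()))))
```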

Theoretical Foundation and Application to Text Classification – thanks to Prof. Daphne Koller at Stanford