CS 2750: Machine Learning Review


CS 2750: Machine Learning Review
Changsheng Liu, University of Pittsburgh, April 4, 2016

Plan for today
- Review some questions from HW 3
- Density estimation
- Mixtures of Gaussians
- Naïve Bayes

HW 3: please see the whiteboard.

Density Estimation
- Maximum likelihood
- Maximum a posteriori estimation

Density Estimation
- A set of random variables X = {X1, X2, …, Xd}
- A model of the distribution over the variables in X, with parameters Θ: P(X|Θ)
- Data D = {D1, D2, …, Dn}
- Objective: find the parameters Θ such that P(X|Θ) fits the data D best

Density Estimation
- Maximum likelihood: maximize the likelihood of the data, P(D|Θ, ξ)
- Maximum a posteriori probability (MAP): maximize the posterior over the parameters, P(Θ|D, ξ)
(Here ξ denotes the background knowledge assumed by the model.)
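For reference, the two objectives written out (using the same ξ notation as the slides; the second equality for MAP drops the Θ-independent normalizer P(D|ξ), which does not change the argmax):

```latex
\hat{\Theta}_{ML}  = \arg\max_{\Theta} P(D \mid \Theta, \xi),
\qquad
\hat{\Theta}_{MAP} = \arg\max_{\Theta} P(\Theta \mid D, \xi)
                   = \arg\max_{\Theta} P(D \mid \Theta, \xi)\, P(\Theta \mid \xi)
```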

A coin example
- A biased coin, with probability of heads θ
- Data: HHTTHHTHTHTTTHTHHHHTHHHHT (Heads: 15, Tails: 10)
- What is a good estimate of θ?
Slide from Milos

Maximum likelihood
- Use the frequency of occurrences: 15/25 = 0.6
- This is the maximum likelihood estimate: it maximizes the likelihood of the data, P(D | θ, ξ) = θ^15 (1 − θ)^10
Slide from Milos
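A minimal sketch of this computation in Python (the data string and counts come from the slide; the log-likelihood is just the standard Bernoulli one):

```python
import math

data = "HHTTHHTHTHTTTHTHHHHTHHHHT"   # the sequence from the slide
n_heads = data.count("H")            # 15
n_tails = data.count("T")            # 10

# Maximum likelihood estimate: the relative frequency of heads
theta_ml = n_heads / (n_heads + n_tails)
print(theta_ml)  # 0.6

# Log-likelihood of the data under a candidate theta:
# log P(D | theta) = N_H log(theta) + N_T log(1 - theta)
def log_likelihood(theta):
    return n_heads * math.log(theta) + n_tails * math.log(1 - theta)

# theta_ml maximizes it; compare against a nearby value
print(log_likelihood(0.6), log_likelihood(0.5))  # -16.83... > -17.32...
```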

Maximum likelihood
Maximizing log P(D | θ, ξ) = N_H log θ + N_T log(1 − θ) with respect to θ gives θ_ML = N_H / (N_H + N_T) = 15/25.
Slide from Milos

Maximum a posteriori estimate
Maximize the posterior P(θ | D, ξ) ∝ P(D | θ, ξ) P(θ | ξ), by Bayes' rule.
Slide from Milos

Maximum a posteriori estimate
Choose the prior from a family that keeps the posterior in the same form (a conjugate prior), for convenience.
Slide from Milos
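For the coin, the conjugate choice is a Beta prior; with N_H heads and N_T tails the MAP estimate then has a closed form (a standard result; α and β are the prior's hyperparameters, not notation taken from the transcript):

```latex
P(\theta) = \mathrm{Beta}(\theta \mid \alpha, \beta)
          \propto \theta^{\alpha-1}(1-\theta)^{\beta-1},
\qquad
\hat{\theta}_{MAP} = \frac{N_H + \alpha - 1}{N_H + N_T + \alpha + \beta - 2}
```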

Maximum a posteriori estimate Slide from Bishop

Prior × Likelihood ∝ Posterior (the product must be normalized to give the posterior)
Slide from Bishop
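Written out for the coin example (standard Beta–Bernoulli conjugacy):

```latex
\underbrace{\mathrm{Beta}(\theta \mid \alpha, \beta)}_{\text{prior}}
\;\times\;
\underbrace{\theta^{N_H}(1-\theta)^{N_T}}_{\text{likelihood}}
\;\propto\;
\underbrace{\mathrm{Beta}(\theta \mid \alpha + N_H,\; \beta + N_T)}_{\text{posterior}}
```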

The Gaussian Distribution Slide from Bishop
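The densities behind the figure, in the standard form (univariate, then d-dimensional multivariate):

```latex
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},
\qquad
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}
    \exp\!\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}
    \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}
```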

The Gaussian Distribution
- Diagonal covariance matrix
- Covariance matrix proportional to the identity matrix
Slide from Bishop
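The covariance structures being contrasted, written out for d dimensions (the fully general case is included for comparison):

```latex
\boldsymbol{\Sigma} \ \text{full (general)},
\qquad
\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2) \ \text{(diagonal)},
\qquad
\boldsymbol{\Sigma} = \sigma^2 \mathbf{I} \ \text{(isotropic)}
```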

Mixtures of Gaussians (1)
- Old Faithful data set
- A single Gaussian vs. a mixture of two Gaussians
Slide from Bishop

Mixtures of Gaussians (2)
- Combine simple models into a complex model: p(x) = Σ_{k=1}^{K} π_k N(x | μ_k, Σ_k), here with K = 3
- N(x | μ_k, Σ_k) is the k-th component; π_k is its mixing coefficient
Slide from Bishop

Mixtures of Gaussians (3) Slide from Bishop
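A minimal numerical sketch of evaluating a K = 3 mixture density in one dimension; the means, variances, and mixing coefficients below are made-up illustration values, not the ones from Bishop's figures:

```python
import math

# Hypothetical 1-D mixture: components (mean, variance) and mixing coefficients pi_k.
components = [(-2.0, 0.5), (0.0, 1.0), (3.0, 0.8)]
mixing = [0.3, 0.5, 0.2]               # must be non-negative and sum to 1

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x):
    # p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)
    return sum(pi * gaussian_pdf(x, m, v)
               for pi, (m, v) in zip(mixing, components))

print(mixture_pdf(0.0))   # density under the mixture at x = 0
```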

Bayesian Networks
- Directed Acyclic Graph (DAG)
- Nodes are random variables
- Edges indicate causal influences
Example network: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Slide credit: Ray Mooney

Conditional Probability Tables
Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents (the conditioning cases). Roots (sources) of the DAG that have no parents are given prior probabilities.

Priors: P(B) = .001, P(E) = .002

P(A | B, E):
  B=T, E=T: .95
  B=T, E=F: .94
  B=F, E=T: .29
  B=F, E=F: .001

P(J | A): A=T: .90, A=F: .05
P(M | A): A=T: .70, A=F: .01

Slide credit: Ray Mooney
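Given these CPTs, any full assignment factorizes into a product of local conditionals. A sketch of the classic Russell & Norvig query P(j, m, a, ¬b, ¬e), using the numbers from the table above:

```python
# CPT entries from the table above
P_B = 0.001            # P(Burglary)
P_E = 0.002            # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls | A)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls | A)

# P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(p)  # ~0.000628
```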

Conditional Independence
- a is independent of b given c: p(a | b, c) = p(a | c)
- Equivalently: p(a, b | c) = p(a | c) p(b | c)
- Notation: a ⫫ b | c
Slide from Bishop
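A tiny numerical check of the definition, on a made-up joint distribution constructed so that a ⫫ b | c holds by design:

```python
from itertools import product

# Build p(a, b, c) = p(c) p(a|c) p(b|c), which enforces a ⫫ b | c by construction.
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in product([0, 1], repeat=3)}

# Verify p(a, b | c) == p(a | c) p(b | c) for every assignment
for a, b, c in product([0, 1], repeat=3):
    pc = sum(v for (x, y, z), v in joint.items() if z == c)
    p_ab_c = joint[(a, b, c)] / pc
    p_a_c = sum(v for (x, y, z), v in joint.items() if x == a and z == c) / pc
    p_b_c = sum(v for (x, y, z), v in joint.items() if y == b and z == c) / pc
    assert abs(p_ab_c - p_a_c * p_b_c) < 1e-12
print("a is conditionally independent of b given c")
```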

Conditional independence via d-separation
- Let X, Y and Z be three sets of nodes
- If X and Y are d-separated by Z, then X and Y are conditionally independent given Z
- D-separation: A is d-separated from B given C if every undirected path between them is blocked by C
- A path is blocked by C if it contains a chain or fork node that is in C, or a collider (head-to-head node) that is not in C and has no descendant in C
Slide from Milos
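A hypothetical sketch of a d-separation check by path enumeration, applied to the burglary network from the earlier slides (fine for small graphs; real libraries use faster algorithms such as Bayes-ball):

```python
edges = [("Burglary", "Alarm"), ("Earthquake", "Alarm"),
         ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls")]

nodes = {n for e in edges for n in e}
parents = {n: set() for n in nodes}
children = {n: set() for n in nodes}
for u, v in edges:
    parents[v].add(u)
    children[u].add(v)

def descendants(n):
    out, stack = set(), [n]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def undirected_paths(x, y):
    # enumerate all simple paths between x and y, ignoring edge direction
    neigh = {n: parents[n] | children[n] for n in nodes}
    stack = [(x, [x])]
    while stack:
        cur, path = stack.pop()
        if cur == y:
            yield path
            continue
        for nxt in neigh[cur] - set(path):
            stack.append((nxt, path + [nxt]))

def blocked(path, C):
    # a path is blocked if some interior node blocks it
    for i in range(1, len(path) - 1):
        u, w, v = path[i - 1], path[i], path[i + 1]
        if u in parents[w] and v in parents[w]:      # collider u -> w <- v
            if w not in C and not (descendants(w) & C):
                return True                          # collider blocks
        elif w in C:                                 # chain or fork through w
            return True
    return False

def d_separated(x, y, C):
    return all(blocked(p, set(C)) for p in undirected_paths(x, y))

print(d_separated("JohnCalls", "MaryCalls", {"Alarm"}))  # True: fork blocked
print(d_separated("Burglary", "Earthquake", set()))      # True: collider blocks
print(d_separated("Burglary", "Earthquake", {"Alarm"}))  # False: explaining away
```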

D-separation Slide from Milos

Exercise Slide from Milos

Naïve Bayes as a Bayes Net
Naïve Bayes is a simple Bayes net: a class node Y with edges to each feature X1, X2, …, Xn.
The priors P(Y) and conditionals P(Xi | Y) of Naïve Bayes provide the CPTs for the network.
Slide credit: Ray Mooney
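A minimal sketch of inference in this network; the prior P(Y) and per-feature conditionals P(Xi | Y) below are made-up illustration values for a binary model with three features:

```python
# Hypothetical CPTs for a binary naive Bayes model.
p_y = {0: 0.6, 1: 0.4}                                # prior P(Y)
p_x_given_y = [                                       # P(X_i = 1 | Y) per feature
    {0: 0.2, 1: 0.7},
    {0: 0.5, 1: 0.9},
    {0: 0.1, 1: 0.4},
]

def posterior(x):
    # P(Y = y | x) is proportional to P(y) * prod_i P(x_i | y)
    scores = {}
    for y in (0, 1):
        s = p_y[y]
        for xi, cpt in zip(x, p_x_given_y):
            s *= cpt[y] if xi == 1 else 1 - cpt[y]
        scores[y] = s
    z = sum(scores.values())                          # normalize over y
    return {y: s / z for y, s in scores.items()}

print(posterior([1, 0, 1]))   # posterior over Y for one observation
```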