CS 2750: Machine Learning Review


CS 2750: Machine Learning Review
Changsheng Liu, University of Pittsburgh, April 4, 2016

Plan for today
- Review some questions from HW 3
- Density estimation
- Mixtures of Gaussians
- Naïve Bayes

HW 3: please see the whiteboard.

Density Estimation
- Maximum likelihood
- Maximum a posteriori estimation

Density Estimation
- A set of random variables X = {X1, X2, …, Xd}
- A model of the distribution over the variables in X, with parameters Θ: P(X|Θ)
- Data D = {D1, D2, …, Dn}
- Objective: find the parameters Θ such that P(X|Θ) fits the data D best

Density Estimation
- Maximum likelihood: maximize the likelihood of the data, P(D|Θ, ξ)
- Maximum a posteriori probability (MAP): maximize the posterior over the parameters, P(Θ|D, ξ)
(Here ξ denotes the background knowledge assumed by the model.)
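For reference, the two objectives written out (using the same ξ notation as the slides; the second equality for MAP drops the Θ-independent normalizer P(D|ξ), which does not change the argmax):

```latex
\hat{\Theta}_{ML}  = \arg\max_{\Theta} P(D \mid \Theta, \xi),
\qquad
\hat{\Theta}_{MAP} = \arg\max_{\Theta} P(\Theta \mid D, \xi)
                   = \arg\max_{\Theta} P(D \mid \Theta, \xi)\, P(\Theta \mid \xi)
```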

A coin example
- A biased coin, with probability of heads θ
- Data: HHTTHHTHTHTTTHTHHHHTHHHHT (Heads: 15, Tails: 10)
- What is a good estimate of θ?
Slide from Milos

Maximum likelihood
- Use the frequency of occurrences: 15/25 = 0.6
- This is the maximum likelihood estimate: it maximizes the likelihood of the data, P(D | θ, ξ) = θ^15 (1 − θ)^10
Slide from Milos
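A minimal sketch of this computation in Python (the data string and counts come from the slide; the log-likelihood is just the standard Bernoulli one):

```python
import math

data = "HHTTHHTHTHTTTHTHHHHTHHHHT"   # the sequence from the slide
n_heads = data.count("H")            # 15
n_tails = data.count("T")            # 10

# Maximum likelihood estimate: the relative frequency of heads
theta_ml = n_heads / (n_heads + n_tails)
print(theta_ml)  # 0.6

# Log-likelihood of the data under a candidate theta:
# log P(D | theta) = N_H log(theta) + N_T log(1 - theta)
def log_likelihood(theta):
    return n_heads * math.log(theta) + n_tails * math.log(1 - theta)

# theta_ml maximizes it; compare against a nearby value
print(log_likelihood(0.6), log_likelihood(0.5))  # -16.83... > -17.32...
```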

Maximum likelihood
Maximizing log P(D | θ, ξ) = N_H log θ + N_T log(1 − θ) with respect to θ gives θ_ML = N_H / (N_H + N_T) = 15/25.
Slide from Milos

Maximum a posteriori estimate
Maximize the posterior P(θ | D, ξ) ∝ P(D | θ, ξ) P(θ | ξ), by Bayes' rule.
Slide from Milos

Maximum a posteriori estimate
Choose the prior from a family that keeps the posterior in the same form (a conjugate prior), for convenience.
Slide from Milos
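For the coin, the conjugate choice is a Beta prior; with N_H heads and N_T tails the MAP estimate then has a closed form (a standard result; α and β are the prior's hyperparameters, not notation taken from the transcript):

```latex
P(\theta) = \mathrm{Beta}(\theta \mid \alpha, \beta)
          \propto \theta^{\alpha-1}(1-\theta)^{\beta-1},
\qquad
\hat{\theta}_{MAP} = \frac{N_H + \alpha - 1}{N_H + N_T + \alpha + \beta - 2}
```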

Maximum a posteriori estimate Slide from Bishop

Prior × Likelihood ∝ Posterior (the product must be normalized to give the posterior)
Slide from Bishop
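Written out for the coin example (standard Beta–Bernoulli conjugacy):

```latex
\underbrace{\mathrm{Beta}(\theta \mid \alpha, \beta)}_{\text{prior}}
\;\times\;
\underbrace{\theta^{N_H}(1-\theta)^{N_T}}_{\text{likelihood}}
\;\propto\;
\underbrace{\mathrm{Beta}(\theta \mid \alpha + N_H,\; \beta + N_T)}_{\text{posterior}}
```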

The Gaussian Distribution Slide from Bishop
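The densities behind the figure, in the standard form (univariate, then d-dimensional multivariate):

```latex
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},
\qquad
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}
    \exp\!\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}
    \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}
```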

The Gaussian Distribution
- Diagonal covariance matrix
- Covariance matrix proportional to the identity matrix
Slide from Bishop
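The covariance structures being contrasted, written out for d dimensions (the fully general case is included for comparison):

```latex
\boldsymbol{\Sigma} \ \text{full (general)},
\qquad
\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2) \ \text{(diagonal)},
\qquad
\boldsymbol{\Sigma} = \sigma^2 \mathbf{I} \ \text{(isotropic)}
```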

Mixtures of Gaussians (1)
- Old Faithful data set
- A single Gaussian vs. a mixture of two Gaussians
Slide from Bishop

Mixtures of Gaussians (2)
- Combine simple models into a complex model: p(x) = Σ_{k=1}^{K} π_k N(x | μ_k, Σ_k), here with K = 3
- N(x | μ_k, Σ_k) is the k-th component; π_k is its mixing coefficient
Slide from Bishop

Mixtures of Gaussians (3) Slide from Bishop
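A minimal numerical sketch of evaluating a K = 3 mixture density in one dimension; the means, variances, and mixing coefficients below are made-up illustration values, not the ones from Bishop's figures:

```python
import math

# Hypothetical 1-D mixture: components (mean, variance) and mixing coefficients pi_k.
components = [(-2.0, 0.5), (0.0, 1.0), (3.0, 0.8)]
mixing = [0.3, 0.5, 0.2]               # must be non-negative and sum to 1

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x):
    # p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)
    return sum(pi * gaussian_pdf(x, m, v)
               for pi, (m, v) in zip(mixing, components))

print(mixture_pdf(0.0))   # density under the mixture at x = 0
```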

Bayesian Networks
- Directed Acyclic Graph (DAG)
- Nodes are random variables
- Edges indicate causal influences
Example network: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Slide credit: Ray Mooney

Conditional Probability Tables
Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents (the conditioning cases). Roots (sources) of the DAG that have no parents are given prior probabilities.

Priors: P(B) = .001, P(E) = .002

P(A | B, E):
  B=T, E=T: .95
  B=T, E=F: .94
  B=F, E=T: .29
  B=F, E=F: .001

P(J | A): A=T: .90, A=F: .05
P(M | A): A=T: .70, A=F: .01

Slide credit: Ray Mooney
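Given these CPTs, any full assignment factorizes into a product of local conditionals. A sketch of the classic Russell & Norvig query P(j, m, a, ¬b, ¬e), using the numbers from the table above:

```python
# CPT entries from the table above
P_B = 0.001            # P(Burglary)
P_E = 0.002            # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls | A)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls | A)

# P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(p)  # ~0.000628
```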

Conditional Independence
- a is independent of b given c: p(a | b, c) = p(a | c)
- Equivalently: p(a, b | c) = p(a | c) p(b | c)
- Notation: a ⫫ b | c
Slide from Bishop
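A tiny numerical check of the definition, on a made-up joint distribution constructed so that a ⫫ b | c holds by design:

```python
from itertools import product

# Build p(a, b, c) = p(c) p(a|c) p(b|c), which enforces a ⫫ b | c by construction.
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in product([0, 1], repeat=3)}

# Verify p(a, b | c) == p(a | c) p(b | c) for every assignment
for a, b, c in product([0, 1], repeat=3):
    pc = sum(v for (x, y, z), v in joint.items() if z == c)
    p_ab_c = joint[(a, b, c)] / pc
    p_a_c = sum(v for (x, y, z), v in joint.items() if x == a and z == c) / pc
    p_b_c = sum(v for (x, y, z), v in joint.items() if y == b and z == c) / pc
    assert abs(p_ab_c - p_a_c * p_b_c) < 1e-12
print("a is conditionally independent of b given c")
```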

Conditional independence via d-separation
- Let X, Y and Z be three sets of nodes
- If X and Y are d-separated by Z, then X and Y are conditionally independent given Z
- D-separation: A is d-separated from B given C if every undirected path between them is blocked by C
- A path is blocked by C if it contains a chain or fork node that is in C, or a collider (head-to-head node) that is not in C and has no descendant in C
Slide from Milos
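A hypothetical sketch of a d-separation check by path enumeration, applied to the burglary network from the earlier slides (fine for small graphs; real libraries use faster algorithms such as Bayes-ball):

```python
edges = [("Burglary", "Alarm"), ("Earthquake", "Alarm"),
         ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls")]

nodes = {n for e in edges for n in e}
parents = {n: set() for n in nodes}
children = {n: set() for n in nodes}
for u, v in edges:
    parents[v].add(u)
    children[u].add(v)

def descendants(n):
    out, stack = set(), [n]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def undirected_paths(x, y):
    # enumerate all simple paths between x and y, ignoring edge direction
    neigh = {n: parents[n] | children[n] for n in nodes}
    stack = [(x, [x])]
    while stack:
        cur, path = stack.pop()
        if cur == y:
            yield path
            continue
        for nxt in neigh[cur] - set(path):
            stack.append((nxt, path + [nxt]))

def blocked(path, C):
    # a path is blocked if some interior node blocks it
    for i in range(1, len(path) - 1):
        u, w, v = path[i - 1], path[i], path[i + 1]
        if u in parents[w] and v in parents[w]:      # collider u -> w <- v
            if w not in C and not (descendants(w) & C):
                return True                          # collider blocks
        elif w in C:                                 # chain or fork through w
            return True
    return False

def d_separated(x, y, C):
    return all(blocked(p, set(C)) for p in undirected_paths(x, y))

print(d_separated("JohnCalls", "MaryCalls", {"Alarm"}))  # True: fork blocked
print(d_separated("Burglary", "Earthquake", set()))      # True: collider blocks
print(d_separated("Burglary", "Earthquake", {"Alarm"}))  # False: explaining away
```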

D-separation Slide from Milos

Exercise Slide from Milos

Naïve Bayes as a Bayes Net
Naïve Bayes is a simple Bayes net: a class node Y with edges to each feature X1, X2, …, Xn.
The priors P(Y) and conditionals P(Xi | Y) of Naïve Bayes provide the CPTs for the network.
Slide credit: Ray Mooney
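A minimal sketch of inference in this network; the prior P(Y) and per-feature conditionals P(Xi | Y) below are made-up illustration values for a binary model with three features:

```python
# Hypothetical CPTs for a binary naive Bayes model.
p_y = {0: 0.6, 1: 0.4}                                # prior P(Y)
p_x_given_y = [                                       # P(X_i = 1 | Y) per feature
    {0: 0.2, 1: 0.7},
    {0: 0.5, 1: 0.9},
    {0: 0.1, 1: 0.4},
]

def posterior(x):
    # P(Y = y | x) is proportional to P(y) * prod_i P(x_i | y)
    scores = {}
    for y in (0, 1):
        s = p_y[y]
        for xi, cpt in zip(x, p_x_given_y):
            s *= cpt[y] if xi == 1 else 1 - cpt[y]
        scores[y] = s
    z = sum(scores.values())                          # normalize over y
    return {y: s / z for y, s in scores.items()}

print(posterior([1, 0, 1]))   # posterior over Y for one observation
```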