Preliminaries: Independence


Preliminaries: Independence (Probabilistic Graphical Models: Introduction)

Independence

Independence

Joint distribution P(I, D, G):

  I    D    G    Prob.
  i0   d0   g1   0.126
  i0   d0   g2   0.168
  i0   d0   g3   0.126
  i0   d1   g1   0.009
  i0   d1   g2   0.045
  i0   d1   g3   0.126
  i1   d0   g1   0.252
  i1   d0   g2   0.0224
  i1   d0   g3   0.0056
  i1   d1   g1   0.06
  i1   d1   g2   0.036
  i1   d1   g3   0.024

Marginals P(I) and P(D):

  I    Prob.       D    Prob.
  i0   0.6         d0   0.7
  i1   0.4         d1   0.3

Marginal P(I, D):

  I    D    Prob.
  i0   d0   0.42
  i0   d1   0.18
  i1   d0   0.28
  i1   d1   0.12
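
These tables illustrate marginal independence: every entry of P(I, D) equals the product of the marginals, e.g. 0.6 * 0.7 = 0.42. A minimal Python sketch that checks this numerically from the values above (variable names are mine, not from the slides):

```python
from itertools import product

# Marginals from the slide
P_I = {"i0": 0.6, "i1": 0.4}
P_D = {"d0": 0.7, "d1": 0.3}

# Joint P(I, D) from the slide
P_ID = {
    ("i0", "d0"): 0.42, ("i0", "d1"): 0.18,
    ("i1", "d0"): 0.28, ("i1", "d1"): 0.12,
}

# Independence holds iff P(i, d) = P(i) * P(d) for every assignment
for i, d in product(P_I, P_D):
    assert abs(P_ID[(i, d)] - P_I[i] * P_D[d]) < 1e-9
print("I and D are independent in P")
```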

Conditional Independence

Conditional Independence

Starting from the joint P(I, D, G) above, conditioning on G = g1 gives P(I, D | g1):

  I    D    Prob.
  i0   d0   0.282
  i0   d1   0.02
  i1   d0   0.564
  i1   d1   0.134
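
Conditioning is just keeping the joint entries consistent with the evidence and renormalizing. A small sketch (names mine) that recovers P(I, D | g1) from the joint above and compares it with the product of its marginals; the printed columns differ (for example 0.282 vs. 0.255 for i0, d0), so I and D are no longer independent once the grade is observed:

```python
# Joint P(I, D, G) entries from the slide
P_IDG = {
    ("i0", "d0", "g1"): 0.126, ("i0", "d0", "g2"): 0.168,  ("i0", "d0", "g3"): 0.126,
    ("i0", "d1", "g1"): 0.009, ("i0", "d1", "g2"): 0.045,  ("i0", "d1", "g3"): 0.126,
    ("i1", "d0", "g1"): 0.252, ("i1", "d0", "g2"): 0.0224, ("i1", "d0", "g3"): 0.0056,
    ("i1", "d1", "g1"): 0.06,  ("i1", "d1", "g2"): 0.036,  ("i1", "d1", "g3"): 0.024,
}

# Condition on G = g1: keep the consistent entries and renormalize
p_g1 = sum(p for (i, d, g), p in P_IDG.items() if g == "g1")
P_ID_given_g1 = {(i, d): p / p_g1
                 for (i, d, g), p in P_IDG.items() if g == "g1"}

# Marginals of the conditional distribution
P_I_given_g1 = {i: sum(p for (i2, d), p in P_ID_given_g1.items() if i2 == i)
                for i in ("i0", "i1")}
P_D_given_g1 = {d: sum(p for (i, d2), p in P_ID_given_g1.items() if d2 == d)
                for d in ("d0", "d1")}

# Compare the conditional joint with the product of its marginals
for (i, d), p in sorted(P_ID_given_g1.items()):
    prod = P_I_given_g1[i] * P_D_given_g1[d]
    print(f"P({i},{d}|g1) = {p:.3f}   P({i}|g1)P({d}|g1) = {prod:.3f}")
```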

Conditional Independence

Conditioning on I = i0 instead, the conditional joint over S and G factorizes as the product of its marginals, P(S, G | i0) = P(S | i0) P(G | i0):

  P(S | i0):          P(G | i0):
  S    Prob.          G    Prob.
  s0   0.95           g1   0.2
  s1   0.05           g2   0.34
                      g3   0.46

  P(S, G | i0):
  S    G    Prob.
  s0   g1   0.19
  s0   g2   0.323
  s0   g3   0.437
  s1   g1   0.01
  s1   g2   0.017
  s1   g3   0.023
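
The same numeric check as before, now applied to the conditional tables: every entry of P(S, G | i0) equals P(S | i0) * P(G | i0), so S and G are independent given i0. A small sketch (names mine):

```python
from itertools import product

# Conditional marginals from the slide
P_S_i0 = {"s0": 0.95, "s1": 0.05}
P_G_i0 = {"g1": 0.2, "g2": 0.34, "g3": 0.46}

# Conditional joint P(S, G | i0) from the slide
P_SG_i0 = {
    ("s0", "g1"): 0.19, ("s0", "g2"): 0.323, ("s0", "g3"): 0.437,
    ("s1", "g1"): 0.01, ("s1", "g2"): 0.017, ("s1", "g3"): 0.023,
}

# Conditional independence given i0 holds iff the joint is the product of the marginals
for s, g in product(P_S_i0, P_G_i0):
    assert abs(P_SG_i0[(s, g)] - P_S_i0[s] * P_G_i0[g]) < 1e-9
print("S and G are independent given i0")
```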

END

Suppose θ is at a local minimum of a function J(θ). What will one iteration of gradient descent do?
  - Leave θ unchanged.
  - Change θ in a random direction.
  - Move θ towards the global minimum of J(θ).
  - Decrease θ.
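
At a local minimum the gradient is zero, so the update θ := θ - α J'(θ) leaves θ unchanged. A tiny sketch, with the function and values chosen by me purely for illustration:

```python
# Gradient descent step on J(theta) = (theta - 3)**2, whose minimum is at theta = 3
def grad_J(theta):
    return 2 * (theta - 3)

alpha = 0.1
theta = 3.0          # already at the (local = global) minimum, so grad_J(theta) == 0
theta_new = theta - alpha * grad_J(theta)
print(theta_new)     # 3.0 -> the step leaves theta unchanged
```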

Consider the weight update. Which of these is a correct vectorized implementation?
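
The update itself and the answer options did not survive in this transcript. Purely as an illustration of what "vectorized" means here, and assuming the per-weight update is the usual batch gradient-descent step for linear regression (an assumption, not necessarily the update this question refers to), the loop over weights collapses into a single matrix expression:

```python
import numpy as np

# Hypothetical setup: m examples, n features (not taken from the slide)
m, n = 100, 3
X = np.random.randn(m, n)        # design matrix
y = np.random.randn(m)           # targets
theta = np.zeros(n)              # weights
alpha = 0.01                     # learning rate

# Element-wise update: theta_j := theta_j - (alpha/m) * sum_i (x_i . theta - y_i) * x_ij
grad = np.zeros(n)
for j in range(n):
    grad[j] = np.sum((X @ theta - y) * X[:, j]) / m
theta_loop = theta - alpha * grad

# Vectorized form of the same update: theta := theta - (alpha/m) * X^T (X theta - y)
theta_vec = theta - (alpha / m) * X.T @ (X @ theta - y)

print(np.allclose(theta_loop, theta_vec))   # True: both compute the same step
```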

Fig. A corresponds to α = 0.01, Fig. B to α = 0.1, Fig. C to α = 1.

Factorized Representations: a Bayesian network over Cloudy, Sprinkler, Rain, and WetGrass, with a CPD for each node (Cloudy; Sprinkler and Rain each conditioned on Cloudy; the last node conditioned on Sprinkler and Rain); 8 independent parameters.
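
Assuming the structure suggested by the CPD headers (Sprinkler and Rain each depend on Cloudy, and the last node depends on Sprinkler and Rain), the factorized representation is the product of one CPD per node:

```latex
P(C, S, R, W) = P(C)\, P(S \mid C)\, P(R \mid C)\, P(W \mid S, R)
```

The number of independent parameters is the total number of free entries across these CPDs, which is what the "8 independent parameters" count refers to.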