Markov Chains as a Learning Tool

2 Markov Process – Simple Example
Weather:
  raining today     → 40% rain tomorrow, 60% no rain tomorrow
  not raining today → 20% rain tomorrow, 80% no rain tomorrow
This is a stochastic finite state machine with two states, rain and no rain.

3 Markov Process – Simple Example
Weather:
  raining today     → 40% rain tomorrow, 60% no rain tomorrow
  not raining today → 20% rain tomorrow, 80% no rain tomorrow

The transition matrix:

            Rain  No rain
  Rain      0.4   0.6
  No rain   0.2   0.8

Stochastic matrix: every row sums to 1.
Doubly stochastic matrix: rows and columns each sum to 1.
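As an aside (not part of the original slides), the arithmetic in the following slides is easy to check by machine. A minimal Python sketch of this transition matrix, assuming numpy is available:

```python
import numpy as np

# Weather chain; states: 0 = rain, 1 = no rain.
P = np.array([[0.4, 0.6],   # raining today:     40% rain, 60% no rain tomorrow
              [0.2, 0.8]])  # not raining today: 20% rain, 80% no rain tomorrow

# A stochastic matrix: every row sums to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# If it rains today, the distribution over tomorrow's weather is row 0.
print(P[0])  # [0.4 0.6]
```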

4 Markov Process
Markov property: X_{t+1}, the state of the system at time t+1, depends only on the state of the system at time t:

  Pr[X_{t+1} = x_{t+1} | X_1 = x_1, ..., X_t = x_t] = Pr[X_{t+1} = x_{t+1} | X_t = x_t]

(a chain X_1 → X_2 → X_3 → X_4 → X_5).
Stationary assumption: the transition probabilities are independent of the time t.
Let X_i be the weather on day i, 1 <= i <= t. By the Markov property, the distribution of X_{t+1} is determined by X_t alone, not by the full history X_1, ..., X_t.

5 Markov Process – Gambler's Example
  – The gambler starts with $10 (the initial state).
  – At each play, one of the following happens:
      the gambler wins $1 with probability p
      the gambler loses $1 with probability 1-p
  – The game ends when the gambler goes broke ($0) or gains a fortune of $100.
    (Both 0 and 100 are absorbing states.)
The chain is a walk on the states $0, $1, ..., $100: from each interior state it moves up with probability p and down with probability 1-p, starting at $10.

6 Markov Process
A Markov process is described by a stochastic FSM; a Markov chain is a random walk on this graph (a distribution over paths). The edge weights give the one-step transition probabilities Pr[X_{t+1} = y | X_t = x], and we can ask more complex questions, such as the probability of reaching a given state within k steps.
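A minimal simulation sketch of the gambler's chain (the function name and parameters are mine, not the slides'). For p = 0.5 the classical gambler's-ruin result says the chance of reaching $100 from $10 is 10/100 = 0.1, which the estimate should approach:

```python
import random

def gamble(start=10, goal=100, p=0.5, rng=random.Random(0)):
    """Walk the chain until an absorbing state (0 or goal) is reached."""
    x = start
    while 0 < x < goal:
        x += 1 if rng.random() < p else -1
    return x

runs = 10_000
fortunes = sum(gamble() == 100 for _ in range(runs))
print(fortunes / runs)  # estimate of Pr[reach $100 before $0]; about 0.1
```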

7 Markov Process – Coke vs. Pepsi Example
Given that a person's last cola purchase was Coke, there is a 90% chance that the next cola purchase will also be Coke. If a person's last cola purchase was Pepsi, there is an 80% chance that the next cola purchase will also be Pepsi.

The transition matrix:

           Coke  Pepsi
  Coke     0.9   0.1
  Pepsi    0.2   0.8

8 Markov Process – Coke vs. Pepsi Example (cont.)
Given that a person is currently a Pepsi purchaser, what is the probability that he will purchase Coke two purchases from now?

  Pr[Pepsi → ? → Coke] = Pr[Pepsi → Coke → Coke] + Pr[Pepsi → Pepsi → Coke]
                       = 0.2 * 0.9 + 0.8 * 0.2 = 0.34

9 Markov Process – Coke vs. Pepsi Example (cont.)
Given that a person is currently a Coke purchaser, what is the probability that he will buy Pepsi at the third purchase from now? This is the (Coke, Pepsi) entry of P^3. Squaring the matrix gives P^2 = [[0.83, 0.17], [0.34, 0.66]], so

  Pr[Coke → ? → ? → Pepsi] = 0.83 * 0.1 + 0.17 * 0.8 = 0.219

10 Markov Process – Coke vs. Pepsi Example (cont.)
Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now?
Let Q_i be the distribution in week i, with initial distribution Q_0 = (0.6, 0.4). Then Q_3 = Q_0 * P^3 = (0.6438, 0.3562); in particular, using the first column of P^3,

  Pr[X_3 = Coke] = 0.6 * 0.781 + 0.4 * 0.438 = 0.6438
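These numbers are reproducible with matrix powers; a short sketch, assuming numpy (states: 0 = Coke, 1 = Pepsi):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

print(np.linalg.matrix_power(P, 2)[1, 0])  # Pepsi -> Coke in two steps: 0.34
print(np.linalg.matrix_power(P, 3)[0, 1])  # Coke -> Pepsi in three steps: 0.219

Q0 = np.array([0.6, 0.4])                  # initial distribution
print(Q0 @ np.linalg.matrix_power(P, 3))   # [0.6438 0.3562]
```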

11 Markov Process – Coke vs. Pepsi Example (cont.)
Simulation: [Plot: Pr[X_i = Coke] against week i, for both starting states; the curves converge to the stationary distribution Pr[Coke] = 2/3.]
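The limit 2/3 can be verified by solving the stationarity condition π = πP together with the normalization that π sums to 1; a small sketch, assuming numpy:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stack the equations (P^T - I) pi = 0 and sum(pi) = 1, then least-squares solve.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # ~[0.6667 0.3333] -- Pr[Coke] tends to 2/3 regardless of the start
```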

12 How do we obtain the stochastic matrix?
  • Solve linear equations for the transition probabilities, when the dynamics of the system are known analytically.
  • Learn it from examples, e.g., which letters follow which letters in English words: mast, tame, same, teams, team, meat, steam, stem.

13 How do we obtain the stochastic matrix?
Counts table vs. stochastic matrix: count the letter transitions in the eight training words (\0 marks the end of a word; the \0 row holds the first-letter counts), then divide each row by its total:

        a    s    t    m    e    \0
  a     0    1/7  1/7  5/7  0    0
  e     4/7  0    0    1/7  0    2/7
  m     1/8  1/8  0    0    3/8  3/8
  s     1/5  0    3/5  0    0    1/5
  t     1/7  0    0    0    4/7  2/7
  \0    0    3/8  3/8  2/8  0    0
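This table can be reproduced in a few lines of Python; in this sketch '^' and '$' are my stand-ins for the word start and the slide's end marker \0:

```python
from collections import Counter, defaultdict

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]

# Count letter-to-letter transitions; '^' = word start, '$' = word end (\0).
counts = defaultdict(Counter)
for w in words:
    for cur, nxt in zip("^" + w, w + "$"):
        counts[cur][nxt] += 1

# Divide each row by its total to get the stochastic matrix.
probs = {cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
         for cur, row in counts.items()}
print(probs["s"])  # {'t': 0.6, 'a': 0.2, '$': 0.2} -- the s row above
```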

14 Application of the stochastic matrix
Using the stochastic matrix to generate a random word:
  • Generate a random first letter, weighted by the first-letter counts.
  • From each current letter, generate a random next letter, weighted by the transition counts, until the end marker \0 is drawn.
To draw each letter with a single random number, build the cumulative counts table A from the counts table C: if C[r,j] > 0, let A[r,j] = C[r,1] + C[r,2] + ... + C[r,j]. For the counts above this gives (entries shown only where C[r,j] > 0):

        a   s   t   m   e   \0
  a     -   1   2   7   -   -
  e     4   -   -   5   -   7
  m     1   2   -   -   5   8
  s     1   -   4   -   -   5
  t     1   -   -   -   5   7
  \0    -   3   6   8   -   -

15 Application of the stochastic matrix
Using the stochastic matrix to generate a random word:
  • Generate a random first letter: generate a random number x between 1 and 8 (the total number of words). If 1 <= x <= 3, the letter is 's'; if 4 <= x <= 6, the letter is 't'; otherwise it is 'm'. (Compare the cumulative \0 row: 3, 6, 8.)
  • From each current letter, generate a random next letter: suppose the current letter is 's', and generate a random number x between 1 and 5. If x = 1, the next letter is 'a'; if 2 <= x <= 4, the next letter is 't'; otherwise the current letter ends the word. (Compare the cumulative s row: 1, 4, 5.)
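The same generator as a self-contained sketch (the names are mine, not the slides'); drawing each next letter with probability proportional to its count is exactly what the cumulative table A does with one random number per step:

```python
import random
from collections import Counter, defaultdict

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]
counts = defaultdict(Counter)
for w in words:
    for cur, nxt in zip("^" + w, w + "$"):   # '^' = word start, '$' = \0
        counts[cur][nxt] += 1

def random_word(rng=random.Random(0)):
    word, cur = "", "^"
    while True:
        row = counts[cur]
        # Draw the next letter with probability proportional to its count.
        cur = rng.choices(list(row), weights=list(row.values()))[0]
        if cur == "$":
            return word
        word += cur

print([random_word() for _ in range(5)])  # short 'English-like' strings
```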

16 Supervised vs. unsupervised learning
  • Decision tree learning is supervised learning: we know the correct output of each training example.
  • Learning based on Markov chains is unsupervised learning: no example tells us which "next letter" is correct; the model is estimated from the raw sequences alone.

K-Nearest Neighbor
Features:
  • All instances correspond to points in an n-dimensional Euclidean space.
  • Classification is delayed until a new instance arrives.
  • Classification is done by comparing the feature vectors of the different points.
  • The target function may be discrete or real-valued.

1-Nearest Neighbor

3-Nearest Neighbor

20 Example: Identify Animal Type
14 examples, 10 attributes, 5 types. What is the type of this new animal?

K-Nearest Neighbor
  • An arbitrary instance x is represented by the feature vector (a_1(x), a_2(x), ..., a_n(x)), where a_i(x) denotes the i-th feature of x.
  • Euclidean distance between two instances:
      d(x_i, x_j) = sqrt( Σ_{r=1..n} (a_r(x_i) - a_r(x_j))^2 )
  • For a continuous-valued target function, predict the mean value of the k nearest training examples.
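A minimal k-NN sketch along these lines (the toy training set is made up for illustration):

```python
import math
from collections import Counter

def euclidean(x, y):
    """d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(train, query, k=3):
    """Majority vote among the k nearest neighbors (discrete target)."""
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
         ((5.0, 5.0), "-"), ((5.5, 4.5), "-")]
print(knn_classify(train, (1.1, 1.0)))  # '+': its 3 nearest are +, +, -
```

For a real-valued target, replace the vote with the mean of the k neighbors' values.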

Distance-Weighted Nearest Neighbor Algorithm
  • Assign weights to the neighbors based on their distance from the query point; a common choice is the inverse square of the distance.
  • If all training points are allowed to influence the query in this way, the method is Shepard's method.
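A sketch of the inverse-square weighting (toy data as before); letting every training point vote this way is Shepard's method:

```python
import math
from collections import defaultdict

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def weighted_classify(train, query, eps=1e-9):
    votes = defaultdict(float)
    for x, label in train:
        d = euclidean(x, query)
        votes[label] += 1.0 / (d * d + eps)  # eps guards an exact match (d = 0)
    return max(votes, key=votes.get)

train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
         ((5.0, 5.0), "-"), ((5.5, 4.5), "-")]
print(weighted_classify(train, (1.1, 1.0)))  # '+'
```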

Remarks
  + A highly effective inductive inference method, even with noisy training data and complex target functions.
  + The target function for the whole space may be described as a combination of less complex local approximations.
  + Learning is very simple.
  – Classification is time-consuming (except for 1-NN).