Published by Amberlynn Mosley. Modified over 9 years ago.
1
Random walk Presented by Changqing Li Mathematics Probability Statistics
2
What is a Random Walk? An intuitive understanding: a series of movements whose direction and size are randomly decided (e.g., the path a drunk person leaves behind). Formal definition: Let $x_0$ be a fixed vector in the d-dimensional Euclidean space $\mathbb{R}^d$ and $X_1, X_2, \dots$ a sequence of independent, identically distributed (i.i.d.) random variables taking values in $\mathbb{R}^d$. The discrete-time stochastic process $\{S_n\}$ defined by $S_n = x_0 + X_1 + \dots + X_n$ is called a d-dimensional random walk.
3
Why Random Walks? A random walk (RW) is a useful model for understanding stochastic processes across a variety of scientific disciplines. Random walk theory supplies the basic probability theory behind BLAST (the most widely used sequence alignment tool).
4
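Definitions (cont.) If $x_0 \in \mathbb{Z}^d$ and the RVs $X_i$ take values in $\mathbb{Z}^d$, then $\{S_n\}$ is called a d-dimensional lattice random walk. In the lattice walk case, if we only allow the jump from $x = (x_1, \dots, x_d)$ to $y = (x_1 + \epsilon_1, \dots, x_d + \epsilon_d)$, where each $\epsilon_k = -1$ or $1$, then the process is called a d-dimensional simple random walk.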
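The simple lattice walk can be sketched in a few lines of Python. This is only an illustration, assuming that every coordinate moves by -1 or +1 with equal probability on each step; the function name `simple_random_walk` and its parameters are hypothetical, not from the slides.

```python
import random

def simple_random_walk(x0, n_steps, seed=None):
    """Return the path of a d-dimensional simple random walk started at x0.

    Assumption (not from the slides): each coordinate independently moves
    by -1 or +1 with probability 1/2 at every step.
    """
    rng = random.Random(seed)
    position = list(x0)
    path = [tuple(position)]
    for _ in range(n_steps):
        # Add an independent +/-1 increment to every coordinate.
        position = [coord + rng.choice((-1, 1)) for coord in position]
        path.append(tuple(position))
    return path

# A 2-dimensional walk of 10 steps starting at the origin.
path = simple_random_walk((0, 0), n_steps=10, seed=1)
print(path)
```

Because the start point and all increments are integer vectors, every point of the path stays on the lattice.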
5
Definitions (cont.) A random walk is called a restricted walk if it is confined to the interval [a, b]. The endpoints a and b are called absorbing barriers if the walk stays there forever once it reaches them, or reflecting barriers if the walk bounces back upon reaching them.
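The two barrier types can be compared with a small simulation. The sketch below assumes a one-dimensional walk with steps of -1 or +1; the function `restricted_walk`, its `barrier` argument, and the "bounce one unit back inside" rule for reflection are illustrative choices, not definitions taken from the slides.

```python
import random

def restricted_walk(start, a, b, n_steps, barrier="absorbing", p_up=0.5, seed=None):
    """Simulate a +/-1 walk confined to [a, b] (requires a < b).

    barrier="absorbing":  once the walk reaches a or b it stays there.
    barrier="reflecting": a step that would leave [a, b] is bounced back
                          one unit inside the interval (an assumed rule).
    """
    if not (a < b and a <= start <= b):
        raise ValueError("need a < b and a <= start <= b")
    rng = random.Random(seed)
    pos = start
    path = [pos]
    for _ in range(n_steps):
        if barrier == "absorbing" and pos in (a, b):
            path.append(pos)          # absorbed: no further movement
            continue
        nxt = pos + (1 if rng.random() < p_up else -1)
        if barrier == "reflecting":
            if nxt < a:
                nxt = a + 1           # bounce off the lower endpoint
            elif nxt > b:
                nxt = b - 1           # bounce off the upper endpoint
        pos = nxt
        path.append(pos)
    return path

absorbed = restricted_walk(0, -3, 3, n_steps=200, barrier="absorbing", seed=0)
reflected = restricted_walk(0, -3, 3, n_steps=200, barrier="reflecting", seed=0)
print(absorbed[-1], min(reflected), max(reflected))
```

With absorbing barriers the path becomes constant after the first visit to an endpoint; with reflecting barriers it keeps moving but never leaves [a, b].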
6
Example: DNA sequence alignment modeled as RW
| |  |  ||| ||        |||
ggagactgtagacagctaatgctata
gaacgccctagccacgagcccttatc
Simple scoring scheme at each position: +1 for identical nucleotides, -1 for different nucleotides.
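To see the alignment as a walk, the ±1 scores can be accumulated from left to right. The following is a minimal sketch using the two sequences from this slide, compared case-insensitively; the helper name `match_scores` is hypothetical.

```python
from itertools import accumulate

seq1 = "ggagactgtagacagctaatgctata"
seq2 = "Gaacgccctagccacgagcccttatc"

def match_scores(s, t):
    """Score each aligned position: +1 for a match, -1 for a mismatch."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return [1 if x == y else -1 for x, y in zip(s.lower(), t.lower())]

steps = match_scores(seq1, seq2)
# The walk starts at 0; walk[i] is the accumulated score after position i.
walk = [0] + list(accumulate(steps))

print(steps.count(1))   # number of matching positions: 11
print(walk[-1])         # final accumulated score: -4
```

The 11 matches and 15 mismatches give a final score of 11 - 15 = -4, so the walk drifts downward, as expected when mismatches are more likely than matches.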
7
Example: simple RW Ladder point (LP): a point in the walk lower than any previously reached point. Excursion: the part of the walk from an LP until the highest point attained before the next LP. Excursion heights in the figure: 1, 1, 4, 0, 0, 0, 3. BLAST theory focuses on the maximum heights achieved by these excursions.
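Ladder points and excursion heights can be extracted in one pass over the steps. The sketch below treats the starting point 0 as the first ladder point and also reports the last, possibly unfinished, excursion; both conventions are assumptions, and `excursion_heights` is a hypothetical helper. It is applied to the ±1 match/mismatch scores of the two DNA sequences from the alignment example.

```python
def excursion_heights(steps):
    """Return the height of each excursion of the walk defined by `steps`.

    The height is measured from a ladder point to the highest point
    reached before the next ladder point.
    """
    heights = []
    position = 0
    ladder_level = 0   # level of the most recent ladder point
    peak = 0           # highest level reached since that ladder point
    for step in steps:
        position += step
        if position < ladder_level:
            # New ladder point: close the current excursion.
            heights.append(peak - ladder_level)
            ladder_level = position
            peak = position
        elif position > peak:
            peak = position
    heights.append(peak - ladder_level)  # last (possibly unfinished) excursion
    return heights

seq1 = "ggagactgtagacagctaatgctata"
seq2 = "gaacgccctagccacgagcccttatc"
steps = [1 if x == y else -1 for x, y in zip(seq1, seq2)]

print(excursion_heights(steps))  # [1, 1, 4, 0, 0, 0, 3]
```

The result agrees with the excursion list given on this slide, and its maximum, 4, is the quantity that BLAST theory is concerned with.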
8
Example: general RW Consider an arbitrary scoring scheme (e.g., a substitution matrix).
9
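General Walk Suppose that, in general, the possible step sizes are $-c, -c+1, \dots, 0, \dots, d-1, d$, with respective probabilities $p_{-c}, p_{-c+1}, \dots, p_d$. The mean step size is negative, i.e., $E(S) = \sum_{j=-c}^{d} j\,p_j < 0$. The moment-generating function (mgf) of the step size $S$ is $m(\theta) = \sum_{j=-c}^{d} p_j e^{j\theta}$.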
10
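General Walk There exists a unique positive $\theta^*$ such that $m(\theta^*) = 1$, where $m(\theta)$ is the mgf of the step size. To consider the walk that starts at 0, with a stopping boundary at $-1$ and no upper boundary, impose an artificial barrier at $y > 0$. The possible stopping points are then $-c, \dots, -1$ and $y, \dots, y+d-1$, where $c$ and $d$ are the largest downward and upward step sizes. Wald's identity states that $E\left[m(\theta)^{-N} e^{\theta T_N}\right] = 1$, where $N$ is the number of steps taken and $T_N$ is the total displacement when the walk stops.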
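The positive root of the mgf equation rarely has a closed form, but it is easy to find numerically. The sketch below uses plain bisection and assumes the step distribution is given as a dictionary mapping step size to probability; the names `mgf` and `solve_theta_star` and the example distribution are hypothetical. For a walk with steps -1 and +1 the root is known to be ln(q/p), which gives a check.

```python
import math

def mgf(theta, step_probs):
    """Moment-generating function m(theta) = sum_j p_j * exp(j * theta)."""
    return sum(p * math.exp(j * theta) for j, p in step_probs.items())

def solve_theta_star(step_probs, tol=1e-12):
    """Find the unique positive theta with m(theta) = 1 by bisection.

    Requires a negative mean step and at least one positive step size.
    Since m(0) = 1, m'(0) = E(S) < 0 and m is convex, m(theta) - 1 is
    negative just to the right of 0 and positive for large theta.
    """
    mean = sum(j * p for j, p in step_probs.items())
    if mean >= 0 or not any(j > 0 and p > 0 for j, p in step_probs.items()):
        raise ValueError("need negative mean and a possible positive step")
    hi = 1.0
    while mgf(hi, step_probs) <= 1.0:   # expand until m(hi) > 1
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mgf(mid, step_probs) < 1.0:
            lo = mid                    # still left of the root
        else:
            hi = mid                    # at or right of the root
    return (lo + hi) / 2.0

# Steps -1 and +1 with probabilities 0.75 and 0.25: theta* = ln(0.75/0.25) = ln 3.
theta_star = solve_theta_star({-1: 0.75, 1: 0.25})
print(theta_star, math.log(3))
```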
11
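General Walk Thus, taking $\theta = \theta^*$ in Wald's identity, $\sum_k P_k e^{k\theta^*} = 1$, where $P_k$ is the probability that the walk finishes at the point $k$. The mean number of steps until the walk stops, $E(N)$, is $E(N) = \frac{\sum_k k\,P_k}{E(S)}$, where $E(S)$ is the mean step size.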
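A small worked case may make these two formulas concrete. The sketch below assumes a walk with steps -1 and +1 only (probabilities 0.75 and 0.25, hypothetical values) and an upper barrier at y = 3, so the only stopping points are -1 and y. It solves Wald's identity for the stopping probabilities, computes the mean number of steps, and checks both against a direct propagation of probability mass; all names are illustrative.

```python
import math

p_up, p_down, y = 0.25, 0.75, 3          # hypothetical example values
theta_star = math.log(p_down / p_up)     # positive root of m(theta) = 1 for +/-1 steps

# Wald's identity at theta*: P_{-1} e^{-theta*} + P_y e^{y theta*} = 1,
# together with P_{-1} + P_y = 1, gives the stopping probabilities.
P_y = (1 - math.exp(-theta_star)) / (math.exp(y * theta_star) - math.exp(-theta_star))
P_minus1 = 1 - P_y

mean_step = p_up - p_down                # E(S) < 0
expected_steps = (y * P_y + (-1) * P_minus1) / mean_step

def absorb_by_iteration(p_up, p_down, y, n_iter=5000):
    """Check by pushing probability mass through the walk step by step."""
    mass = {0: 1.0}                      # probability of each unabsorbed position
    hit_low = hit_high = mean_n = 0.0
    for n in range(1, n_iter + 1):
        new_mass = {}
        for pos, w in mass.items():
            for step, prob in ((1, p_up), (-1, p_down)):
                nxt = pos + step
                if nxt <= -1:
                    hit_low += w * prob
                    mean_n += n * w * prob
                elif nxt >= y:
                    hit_high += w * prob
                    mean_n += n * w * prob
                else:
                    new_mass[nxt] = new_mass.get(nxt, 0.0) + w * prob
        mass = new_mass
    return hit_low, hit_high, mean_n

low, high, steps_mean = absorb_by_iteration(p_up, p_down, y)
print(P_minus1, P_y, expected_steps)     # approximately 0.975, 0.025, 1.8
print(low, high, steps_mean)
```

Both routes give a probability of about 0.025 of reaching the upper barrier and a mean of about 1.8 steps, which illustrates how rarely a walk with negative drift makes a high excursion.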
12
Random Walks in real life! In Supernova stars – how “star stuff” gets to be inside us (eventually!)
13
Random Walks (in your body) Cells inside your body. How two liquids (and air!) mix together (diffusion).
14
Random Walks and $$ (Wall Street) Stock market: modeling the future price of a stock.
15
Application: BLAST BLAST is the most frequently used method for assessing which DNA or protein sequences in a large database have significant similarity to a given query sequence. It is a procedure that searches for high-scoring local alignments between sequences and then tests the significance of the scores found via a P-value. The null hypothesis to be tested is that, for each aligned pair of amino acids, the two amino acids were generated by independent mechanisms.
16
BLAST: modeling The positions in the alignment are numbered from left to right as 1, 2, …, N. A score S(j, k) is allocated to each position where the aligned amino acid pair (j, k) is observed, where S(j, k) is the (j, k) element of the chosen substitution matrix. The accumulated score at position i is the sum of the scores for the amino acid comparisons at positions 1, 2, …, i. As i increases, the accumulated score undergoes a random walk.
17
BLAST: calculating parameters Let Y1, Y2, … be the respective maximum heights of the excursions of this walk after leaving one ladder point and before arriving at the next, and let Ymax be the maximum of these maxima. Ymax is in effect the test statistic used in BLAST, so it is necessary to find its null hypothesis distribution. The asymptotic probability distribution of any Yi is shown to be the geometric-like distribution, that is, $P(Y_i \ge y) \sim C e^{-\lambda y}$ as $y \to \infty$. The values of C and λ in this distribution depend on the substitution matrix used and the amino acid frequencies {pj} and {pj’}. The probability distribution of Ymax also depends on n, the mean number of ladder points in the walk.