Published by Gervase Charles. Modified over 6 years ago.
1
Repeat finding by normal approximation on whole genome shotgun assembling
Naotoshi Seo, Hiroshi Toyoizumi, Performance Evaluation Laboratory, University of Aizu
2
Abstract
The purpose of this thesis is to verify that repeat finding by normal approximation is more effective than repeat finding by the traditional Poisson approximation. We first derived a stochastic estimate, and then verified it using our simulator programs.
3
What is whole genome shotgun assembling?
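It is impossible to read a whole DNA sequence at once because it is too long. Therefore, the following procedure is required: the DNA is copied, the copies are cut into fragments by a restriction enzyme, the fragments are scanned, and the scanned fragments are joined at their overlaps (shotgun assembling). For example, the fragments AGCTGTGGAG and TGGAGCTTGA are assembled into AGCTGTGGAGCTTGA.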
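The assembling step on this slide can be sketched as follows. This is only a minimal illustration, assuming error-free reads and a greedy longest suffix/prefix overlap; the function name `merge_by_overlap` is hypothetical and is not taken from the presentation.

```python
def merge_by_overlap(left: str, right: str) -> str:
    """Merge two reads using the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, 0, -1):
        if left.endswith(right[:k]):
            # Keep `left` whole and append only the non-overlapping tail of `right`.
            return left + right[k:]
    return left + right  # no overlap found: plain concatenation


if __name__ == "__main__":
    # The two fragments from the slide share the overlap "TGGAG".
    print(merge_by_overlap("AGCTGTGGAG", "TGGAGCTTGA"))  # AGCTGTGGAGCTTGA
```

If the overlapping part were a repeat, the same suffix/prefix match could come from a different location in the genome, which is why repeats have to be detected first.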
4
Repeat
A repeat is a subsequence that appears with the same arrangement at more than one location in one genome sequence (ATTGAC in the figure). A repeat subsequence must not be used for overlap detection because its original location cannot be determined. So, methods to find repeats are needed.
5
How to find repeat
If the number of copies is 3, the redundancy of one subsequence is ordinarily 3. If the genome has another subsequence with the same arrangement, in short a repeat, the redundancy becomes 6. In practice, these numbers become smaller (4 in the figure) because the DNA is fragmented.
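The counting idea on this slide can be illustrated with a short sketch. The genome strings and fragment lists below are made-up examples, and the helper names are hypothetical; the only point is that a repeat doubles the count and that a cut inside the word lowers it.

```python
def count_occurrences(word: str, fragment: str) -> int:
    """Count (possibly overlapping) occurrences of `word` inside one fragment."""
    return sum(
        fragment.startswith(word, i)
        for i in range(len(fragment) - len(word) + 1)
    )


def redundancy(word: str, fragments: list[str]) -> int:
    """Total number of complete occurrences of `word` over all fragments."""
    return sum(count_occurrences(word, f) for f in fragments)


if __name__ == "__main__":
    word = "ATTGAC"
    # Three intact copies of a genome without a repeat: redundancy 3.
    print(redundancy(word, ["CCATTGACGG"] * 3))
    # Three intact copies of a genome that contains the word twice: redundancy 6.
    print(redundancy(word, ["CCATTGACGGATTGACTT"] * 3))
    # One copy was cut inside the word, so that occurrence is lost: redundancy 2.
    print(redundancy(word, ["CCATTGACGG", "CCATT", "GACGG", "CCATTGACGG"]))
```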
6
Estimation of the redundancy of subsequences with same arrangement
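Pc: cut probability
w: word (subsequence) length
Pm: probability of misreading a fragment
p: probability that a copy yields the complete subsequence
Probability of not being cut at all within length w
n: number of copies
N: redundancy of the subsequence (number of copies that yield the complete word without misreading), which follows a binomial distribution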
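The formulas on this slide did not survive extraction, so the following is only a sketch under stated assumptions: each of the w positions of the word is cut independently with probability Pc, a fragment is misread with probability Pm, and a copy therefore contributes a complete word with probability p = (1 - Pc)^w (1 - Pm). The redundancy over n copies is then Binomial(n, p). The function names are hypothetical.

```python
import math


def complete_probability(pc: float, w: int, pm: float) -> float:
    """Assumed model: no cut at any of the w positions, and no misreading."""
    return (1 - pc) ** w * (1 - pm)


def redundancy_pmf(n: int, p: float) -> list[float]:
    """Binomial(n, p) probabilities P(N = k) for k = 0..n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


if __name__ == "__main__":
    p = complete_probability(pc=1 / 500, w=100, pm=0.0)
    print(f"p = {p:.4f}")
    for k, prob in enumerate(redundancy_pmf(5, p)):
        print(k, f"{prob:.4f}")
```

With Pc = 1/500 and w = 100 this assumed model gives p close to 0.8, which is the order of magnitude the later slides mention for p.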
7
Comparing with our simulator
[Figure: estimation and simulator result]
The simulator result agrees with the estimation, so the estimation appears to be correct.
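The authors' simulator is not shown in the slides. A minimal Monte Carlo sketch of the same kind of check, assuming independent cuts at each of the w positions (probability Pc) and independent misreading of each copy (probability Pm), could look like this; all names are hypothetical.

```python
import random


def simulate_redundancy(n: int, w: int, pc: float, pm: float, rng: random.Random) -> int:
    """Count how many of n copies keep the word uncut and are read correctly."""
    count = 0
    for _ in range(n):
        uncut = all(rng.random() >= pc for _ in range(w))  # no cut at any position
        read_ok = rng.random() >= pm                       # fragment not misread
        if uncut and read_ok:
            count += 1
    return count


def mean_redundancy(trials: int, n: int, w: int, pc: float, pm: float, seed: int = 0) -> float:
    """Average simulated redundancy over many independent trials."""
    rng = random.Random(seed)
    return sum(simulate_redundancy(n, w, pc, pm, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n, w, pc, pm = 5, 100, 1 / 500, 0.0
    estimate = n * (1 - pc) ** w * (1 - pm)  # mean of the assumed binomial model
    print("estimated mean:", round(estimate, 3))
    print("simulated mean:", round(mean_redundancy(2000, n, w, pc, pm), 3))
```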
8
Approximation to another distribution
The redundancy follows a binomial distribution, which takes a long time to calculate. Therefore, it is better to approximate it by another distribution. Traditionally, it is approximated by a Poisson distribution.
9
A problem of traditional approximation and a prescription
Although the Poisson approximation is valid when n is sufficiently large and p is small, it is not valid in this case, because n is small (about 10) and p is large (about 0.8). A binomial distribution can be approximated not only by a Poisson distribution but also by a normal distribution.
10
Comparing a Poisson distribution and a normal distribution
[Figure: binomial distribution and Poisson distribution]
The binomial distribution resembles a normal distribution rather than a Poisson distribution.
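The comparison can be made numerical with a small sketch. It measures, for n = 10 and p = 0.8, half the sum of absolute differences between the binomial probabilities and each approximation over k = 0..k_max. The normal approximation is discretized with a continuity correction (mass between k - 0.5 and k + 0.5); that choice, the value of k_max and the function names are assumptions of this sketch, not something stated in the slides.

```python
import math


def binomial_pmf(n: int, p: float, k: int) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)  # comb is 0 for k > n


def poisson_pmf(lam: float, k: int) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def approximation_errors(n: int, p: float, k_max: int = 60) -> tuple[float, float]:
    """Return (Poisson error, normal error) against Binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    poisson_err = normal_err = 0.0
    for k in range(k_max + 1):
        exact = binomial_pmf(n, p, k)
        poisson_err += abs(exact - poisson_pmf(mu, k))
        normal_mass = normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
        normal_err += abs(exact - normal_mass)
    return poisson_err / 2, normal_err / 2


if __name__ == "__main__":
    poisson_err, normal_err = approximation_errors(10, 0.8)
    print(f"Poisson error: {poisson_err:.3f}, normal error: {normal_err:.3f}")
```

The Poisson distribution has variance np, while the binomial has the smaller variance np(1 - p), so for large p the Poisson curve is much wider than the binomial one.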
11
Necessary copy number for repeat finding
The right peak is the distribution of redundancies for a double repeat. The left one is the ordinary distribution with no repeat. When the number of copies is small, there is a large probability of misjudging whether a subsequence is a repeat or not. We assumed that the error is acceptable if its ratio is less than 0.05.
[Figure labels: big error, subtle error]
12
Necessary copy number for each distribution
With a binomial distribution, the error became less than 0.05 at n = 4 if a good threshold is used. With a normal approximation it did so at n = 5, and with a Poisson approximation at n = 28. In this case, a normal approximation needs only about 1/6 as many copies as the traditional Poisson approximation.
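How such a necessary copy number could be searched for is sketched below. The slides do not define the error precisely, so this sketch assumes it is the sum of the two misjudgment probabilities (a no-repeat word counted above the threshold plus a double-repeat word counted at or below it), minimized over integer thresholds. It also assumes p = (1 - 1/500)^100 and models a double repeat as 2n copies. The function names are hypothetical, and only the binomial and Poisson cases are covered.

```python
import math


def binomial_cdf(n: int, p: float, t: int) -> float:
    """P(X <= t) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min(t, n) + 1))


def poisson_cdf(lam: float, t: int) -> float:
    """P(X <= t) for X ~ Poisson(lam), accumulated term by term."""
    term = math.exp(-lam)
    total = term
    for k in range(1, t + 1):
        term *= lam / k
        total += term
    return total


def best_error(cdf_single, cdf_double, t_max: int) -> float:
    """Smallest combined error over integer thresholds t (judge 'repeat' if N > t)."""
    return min((1 - cdf_single(t)) + cdf_double(t) for t in range(t_max + 1))


def min_copies(model: str, p: float, limit: float = 0.05, n_max: int = 100):
    """Smallest n whose best combined error is below `limit`, or None if not found."""
    for n in range(1, n_max + 1):
        if model == "binomial":
            err = best_error(lambda t: binomial_cdf(n, p, t),
                             lambda t: binomial_cdf(2 * n, p, t), 2 * n)
        elif model == "poisson":
            err = best_error(lambda t: poisson_cdf(n * p, t),
                             lambda t: poisson_cdf(2 * n * p, t), 4 * n + 20)
        else:
            raise ValueError(f"unknown model: {model}")
        if err < limit:
            return n
    return None


if __name__ == "__main__":
    p = (1 - 1 / 500) ** 100  # assumed: Pc = 1/500, w = 100, no misreading
    print("binomial:", min_copies("binomial", p))
    print("poisson:", min_copies("poisson", p))
```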
13
The effective threshold value for repeat finding
The effective threshold value can be calculated as the x-coordinate of the intersection of the no-repeat distribution’s curve and the double-repeat distribution’s curve.
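A sketch of this calculation for the normal approximation follows. It assumes the no-repeat curve is Normal(np, np(1 - p)) and the double-repeat curve is Normal(2np, 2np(1 - p)), and it assumes p = (1 - Pc)^w with no misreading; setting the two densities equal gives a quadratic in x, of which the root between the two means is taken. Function names are hypothetical.

```python
import math


def normal_intersection(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """x between mu1 and mu2 where two normal densities are equal (needs var1 != var2)."""
    # Equal log-densities give a*x^2 + b*x + c = 0.
    a = 1 / (2 * var1) - 1 / (2 * var2)
    b = mu2 / var2 - mu1 / var1
    c = mu1**2 / (2 * var1) - mu2**2 / (2 * var2) - 0.5 * math.log(var2 / var1)
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]
    lo, hi = sorted((mu1, mu2))
    return next(r for r in roots if lo <= r <= hi)


def repeat_threshold(n: int, p: float) -> float:
    """Threshold between the no-repeat and double-repeat normal approximations."""
    mu, var = n * p, n * p * (1 - p)
    return normal_intersection(mu, var, 2 * mu, 2 * var)


if __name__ == "__main__":
    p = (1 - 1 / 500) ** 100  # assumed: Pc = 1/500, w = 100, no misreading
    print(round(repeat_threshold(5, p), 2))
```

A word whose observed redundancy lies above this value would be judged a repeat, and one below it a non-repeat.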
14
Experimental proof
Word length = 100
Columns: cut probability Pc, misreading probability Pm, # of copies n, threshold value, error ratio
1/500   5   5.8   0.0
1/5   12   11.4   0.0196
1/200   15   13.1   0.0035
We assumed that an error ratio of less than 0.05 is acceptable. This shows that repeat finding by normal approximation works well.
15
Conclusion
The number of copies could be decreased by using the normal approximation instead of the traditional Poisson approximation. Indeed, repeat finding with this small number of copies achieved good results. Therefore, it was verified that repeat finding by normal approximation is more effective than repeat finding by the traditional Poisson approximation.