1
Bioinformatics: Buzzword or Discipline?
2
Outline of the course
Analysis of one DNA sequence: shotgun sequencing, Markov-chain modeling, patterns and repeats.
Analysis of multiple DNA or protein sequences: dynamic programming alignments, substitution matrices.
BLAST: algorithm for sequence retrieval and comparison.
Refresher on Markov chains: capsule theory, Markov-chain Monte Carlo algorithms.
Hidden Markov models: Viterbi algorithm and its applications.
Evolutionary models: models of nucleotide mutation and substitution, recombination and genetic drift, with applications to genome evolution and gene mapping.
Molecular phylogenetics (tree making): distance matrix, maximum likelihood and parsimony.
Special topics: gene and protein networks, analysis of DNA-microarray data, …
3
30,000 genes make up only 3% of the genome
BCM-HGSC
4
Genome Sizes
Human: 3.0 x 10^9 base pairs
Mouse: 3.0 x 10^9
Drosophila: … x 10^8
Worm: … x 10^8
Dictyostelium: … x 10^7
Yeast: … x 10^7
Bacteria: … x 10^6
5
Shotgun Sequencing
High-accuracy sequence: < 1 error per 10,000 bases
6
The Human Genome: 3 Billion Base Pairs
Whole Genome Shotgun Strategy
3 billion bases
Libraries of clones: 3 kb, 10 kb, 50 kb
DNA sequence reads: 500 bases each (AGGCTCACTG)
BCM-HGSC
7
Statistical issues in shotgun strategy
Model for the random fragments: binomial/Poisson process
Coverage of sequence by random fragments
Mean number of contigs
Mean size of contigs
Coverage by anchored contigs
8
Binomial/Poisson Process
N fragments, each of length L, are scattered at random in an interval of length G.
Coverage: a = NL/G
Contig: union of overlapping fragments. We want the contigs to cover as much of G as possible.
Pr[#frags with left end in (x, x+h) = k] “is” binomial(N, h/G), or approximately Poisson(Nh/G) (when?).
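The "(when?)" can be explored numerically. The sketch below compares the binomial(N, h/G) and Poisson(Nh/G) probabilities for two settings with the same mean Nh/G = 2. The function names and the two parameter settings are illustrative choices, not taken from the slides. The usual rule of thumb is that the approximation is good when N is large and h/G is small.

```python
import math


def binomial_pmf(k, n, p):
    """P[X = k] for X ~ binomial(n, p); zero when k > n."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P[X = k] for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(n, p, k_max=20):
    """Largest pointwise difference between the two pmfs for k = 0..k_max."""
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    # Same mean N*h/G = 2 in both cases (illustrative values only).
    for n, p in ((10, 0.2), (1000, 0.002)):
        print(f"N={n:5d}  h/G={p:<6}  max |binomial - Poisson| = {max_pmf_gap(n, p):.5f}")
```

With the mean held fixed, the gap shrinks as N grows and h/G shrinks, which is the regime of shotgun sequencing (many fragments, each short relative to G).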
9
Mean number of contigs
E[#contigs] = N Pr[a frag is rightmost in a contig]
= N Pr[frag does not include the left end of any other frag]
= N exp(-NL/G) = (aG/L) exp(-a)
L = 800, G = 100,000
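As a rough check of the formula, the sketch below evaluates N exp(-NL/G) for the slide's values L = 800 and G = 100,000 and compares it with a small Monte Carlo simulation. The simulation details (uniform left ends on [0, G), number of trials, seed) are assumptions made for illustration. Unlike the formula, the simulation includes boundary effects, so a small discrepancy is expected.

```python
import math
import random


def expected_contigs(N, L, G):
    """Formula from the slide: N * exp(-N L / G)."""
    return N * math.exp(-N * L / G)


def simulate_contigs(N, L, G, trials=2000, seed=0):
    """Average number of contigs over random placements of N fragments."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        starts = sorted(rng.uniform(0, G) for _ in range(N))
        # All fragments have length L, so a new contig begins exactly when
        # the gap between consecutive left ends exceeds L.
        total += 1 + sum(1 for s, t in zip(starts, starts[1:]) if t - s > L)
    return total / trials


if __name__ == "__main__":
    L, G = 800, 100_000
    for N in (25, 75, 125, 250, 500):
        a = N * L / G
        print(f"N={N:4d}  a={a:4.1f}  formula={expected_contigs(N, L, G):7.2f}  "
              f"simulated={simulate_contigs(N, L, G, trials=500):7.2f}")
```

The formula peaks at a = 1, that is N = G/L = 125 fragments for these values, and falls off on either side.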
10
Mean contig size
E[S] = E[#frags-1] E[inter-epoch distance] + L
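The two expectations can be evaluated in closed form under the Poisson model. The following is a sketch of that calculation, not taken from the slides. It assumes left ends form a Poisson process of rate λ = N/G = a/L (λ is notation introduced here), and it reads "inter-epoch distance" as the distance between consecutive left ends within one contig, that is, an exponential gap X conditioned on X < L.

```latex
\begin{align*}
\Pr[\text{frag is rightmost in its contig}] &= e^{-a}
  \quad\Longrightarrow\quad E[\#\text{frags}] = e^{a},\\
E[\text{inter-epoch distance}] &= E[X \mid X < L]
  = \frac{1}{\lambda} - \frac{L\,e^{-a}}{1 - e^{-a}},
  \qquad X \sim \mathrm{Exp}(\lambda),\ \lambda = \frac{a}{L},\\
E[S] &= (e^{a} - 1)\left(\frac{1}{\lambda} - \frac{L\,e^{-a}}{1 - e^{-a}}\right) + L
  = \frac{L\,(e^{a} - 1)}{a}.
\end{align*}
```

As a sanity check, E[S] tends to L as a tends to 0 (isolated fragments) and grows without bound as coverage increases.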
11
Mean contig size
[Figure: E(S) plotted against coverage a]
12
Number of anchored contigs
#anchors = M, #frags = N
a = NL/G, b = ML/G
E[#anchored contigs] = Nb [exp(-a) - exp(-b)] / (b - a)
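The anchored-contig formula is easy to tabulate against the number of anchors. The sketch below is a direct transcription of the slide's expression. The function name, the example values, and the handling of the case b = a (where the expression is 0/0 and the limit N b exp(-a) is used) are additions for illustration.

```python
import math


def expected_anchored_contigs(N, M, L, G):
    """Slide formula: N b [exp(-a) - exp(-b)] / (b - a), with a = NL/G, b = ML/G."""
    a = N * L / G
    b = M * L / G
    if math.isclose(a, b):
        # The formula is 0/0 at b = a; its limit there is N * b * exp(-a).
        return N * b * math.exp(-a)
    return N * b * (math.exp(-a) - math.exp(-b)) / (b - a)


if __name__ == "__main__":
    # Illustrative values: N = 125 fragments with L = 800, G = 100,000 (a = 1).
    N, L, G = 125, 800, 100_000
    for M in (10, 50, 125, 250, 500, 2000, 10**6):
        print(f"M={M:8d}  b={M * L / G:9.2f}  "
              f"E[#anchored contigs]={expected_anchored_contigs(N, M, L, G):7.2f}")
```

For very dense anchors the value approaches N exp(-a), the mean number of ordinary contigs.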
13
Conclusions
Expected number of contigs first increases, then decreases with coverage.
Expected size of a contig increases with coverage.
Expected number of anchored contigs first increases, then decreases with anchor density.
Note: the computations do not account for boundary effects.