Bioinformatics: Buzzword or Discipline (???)


1 Bioinformatics: Buzzword or Discipline (???)

2 Outline of the course
Analysis of one DNA sequence: shotgun sequencing, Markov-chain modeling, patterns and repeats.
Analysis of multiple DNA or protein sequences: dynamic programming alignments, substitution matrices.
BLAST: algorithm for sequence retrieval and comparison.
Refresher on Markov chains: capsule theory, Markov-chain Monte Carlo algorithms.
Hidden Markov models: Viterbi algorithm and its applications.
Evolutionary models: models of nucleotide mutation and substitution, recombination and genetic drift, with applications to genome evolution and gene mapping.
Molecular phylogenetics (tree making): distance matrix, maximum likelihood and parsimony.
Special topics: gene and protein networks, analysis of DNA-microarray data, …

3 30,000 genes make up only 3% of the genome (BCM-HGSC)

4 Genome Sizes (base pairs)
Human: $3.0 \times 10^9$
Mouse: $3.0 \times 10^9$
Drosophila: […] $\times 10^8$
Worm: […] $\times 10^8$
Dictyostelium: […] $\times 10^7$
Yeast: […] $\times 10^7$
Bacteria: […] $\times 10^6$

5 Shotgun Sequencing
High-accuracy sequence: < 1 error per 10,000 bases

6 The Human Genome: 3 Billion Base Pairs
Whole Genome Shotgun Strategy
3 billion bases; libraries of clones (3 kb, 10 kb, 50 kb); DNA sequence reads of 500 bases each (e.g. AGGCTCACTG). (BCM-HGSC)

7 Statistical issues in shotgun strategy
Model for the random fragments: Binomial/Poisson process
Coverage of sequence by random fragments
Mean number of contigs
Mean size of contigs
Coverage by anchored contigs

8 Binomial/Poisson Process
N fragments, each of length L, are scattered at random over an interval of length G.
Coverage: $a = NL/G$.
Contig: a union of overlapping fragments. We want the contigs to cover as much of G as possible.
$\Pr[\#\{\text{frags with left end in } (x, x+h)\} = k]$ "is" binomial(N, h/G), or approximately Poisson(Nh/G) (when?).
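
The Poisson approximation is usually taken to hold when N is large and h/G is small. A minimal numerical sketch of that comparison follows; the values of N, G and h are illustrative assumptions, not taken from the slides.

```python
import math

def binomial_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P[X = k] for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Illustrative values (assumptions, not from the slides)
N = 1000        # number of fragments
G = 100_000     # length of the interval
h = 200         # width of the window (x, x+h)

p = h / G       # chance that a given fragment's left end falls in the window
lam = N * p     # Poisson mean N*h/G

# Print k, the binomial probability and the Poisson probability side by side
for k in range(6):
    print(k, round(binomial_pmf(k, N, p), 5), round(poisson_pmf(k, lam), 5))
```

With a small h/G the two columns agree closely; making h comparable to G makes the disagreement visible.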

9 Mean number of contigs
$E[\#\text{contigs}] = N \times \Pr[\text{a frag is rightmost in a contig}]$
$= N \times \Pr[\text{frag does not include the left end of any other frag}]$
$= N e^{-NL/G} = (aG/L)\, e^{-a}$
Example values: L = 800, G = 100,000
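
The formula can be checked against a small Monte Carlo experiment. The sketch below assumes fragments of equal length L with left ends uniform on [0, G]; the function names, the seed and the number of trials are arbitrary choices for illustration.

```python
import math
import random

def expected_contigs(N, L, G):
    """Slide formula: E[#contigs] = N * exp(-N*L/G)."""
    return N * math.exp(-N * L / G)

def simulate_contigs(N, L, G, rng):
    """Drop N left ends uniformly on [0, G] and count contigs.

    With equal-length fragments sorted by left end, a new contig
    starts whenever the gap between consecutive left ends is at
    least L. Fragments running past G are not treated specially.
    """
    starts = sorted(rng.uniform(0, G) for _ in range(N))
    contigs = 1  # the leftmost fragment always opens a contig
    for prev, cur in zip(starts, starts[1:]):
        if cur - prev >= L:
            contigs += 1
    return contigs

L, G = 800, 100_000        # values given on the slide
rng = random.Random(0)     # fixed seed (arbitrary)
trials = 200               # arbitrary number of repetitions

# Print coverage a, N, the formula value and the simulated mean
for a in (0.5, 1.0, 2.0, 4.0):
    N = round(a * G / L)
    sim = sum(simulate_contigs(N, L, G, rng) for _ in range(trials)) / trials
    print(a, N, round(expected_contigs(N, L, G), 1), round(sim, 1))
```

The formula peaks at a = 1, where N = G/L. The simulated mean sits slightly above the formula because the leftmost fragment always opens a contig, a boundary effect that the formula does not include.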

10 Mean contig size
$E[S] = E[\#\text{frags} - 1] \times E[\text{inter-epoch distance}] + L$
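
One way to evaluate this expression in closed form, assuming the Poisson model of slide 8, is sketched below. The intermediate steps are a reconstruction and do not appear on the slides. Here X denotes the distance between consecutive left ends, taken as exponential with rate N/G, and a fragment is rightmost in its contig with probability $e^{-a}$, so the number of fragments per contig is geometric with mean $e^{a}$.

```latex
\begin{align*}
E[\#\text{frags}] &= e^{a}, \\
E[\text{inter-epoch distance}] &= E[X \mid X < L]
   = \frac{L}{a} - \frac{L e^{-a}}{1 - e^{-a}}, \\
E[S] &= (e^{a} - 1)\left(\frac{L}{a} - \frac{L e^{-a}}{1 - e^{-a}}\right) + L
   = \frac{L}{a}\,(e^{a} - 1).
\end{align*}
```

As a sanity check, $E[S] \to L$ as $a \to 0$ (isolated fragments), and $E[S]$ grows with a.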

11 Mean contig size
[Plot: E(S) as a function of a]

12 Number of anchored contigs
#anchors = M, #frags = N
$a = NL/G$, $b = ML/G$
$E[\#\text{anchored contigs}] = N b\, \dfrac{e^{-a} - e^{-b}}{b - a}$
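
A minimal sketch that evaluates this formula follows. The function name and the handling of the case a = b (where the expression is 0/0 and its limit, $N a e^{-a}$, is used) are additions for illustration, not part of the slides.

```python
import math

def expected_anchored_contigs(N, M, L, G):
    """Slide formula: N*b*(exp(-a) - exp(-b)) / (b - a),
    with a = N*L/G and b = M*L/G."""
    a = N * L / G
    b = M * L / G
    if math.isclose(a, b):
        # The formula is 0/0 at a == b; use its limit N*a*exp(-a)
        return N * a * math.exp(-a)
    return N * b * (math.exp(-a) - math.exp(-b)) / (b - a)

L, G = 800, 100_000   # same example values as on slide 9
N = 125               # gives coverage a = 1 (illustrative choice)

# Print the number of anchors M and the expected number of anchored contigs
for M in (1, 10, 125, 1000, 100_000):
    print(M, round(expected_anchored_contigs(N, M, L, G), 2))
```

Two limits help to read the formula: for very many anchors it approaches $N e^{-a}$, the unanchored contig count, and for very few anchors it approaches $M(1 - e^{-a})$, the expected number of anchors covered by at least one fragment.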

13 Conclusions
The expected number of contigs first increases, then decreases with coverage.
The expected size of a contig increases with coverage.
The expected number of anchored contigs first increases, then decreases with anchor density.
Attention: the computations do not take boundary effects into account.

