1
Genome sequencing
2
Vocabulary
BAC: Bacterial Artificial Chromosome, a large-insert cloning vector propagated in E. coli (the yeast counterpart is the YAC, Yeast Artificial Chromosome)
PAC, cosmid, fosmid, plasmid: other cloning vectors for E. coli
Library: collection of fragments of a genome in cloning vectors
Draft: crude first-generation sequence assembly
Scaffold: set of ordered and oriented contigs, which can be anchored to a genetic map
3
Vocabulary 2
Minimal tiling path: minimal set of overlapping clones that together provide complete coverage across a genomic region
Coverage: the number of times a genomic region is represented in a collection of clones or sequence reads
Contig: contiguous sequence built from the alignment of overlapping reads
N50 length: the largest length L such that 50% of all nucleotides are contained in contigs of size at least L
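As a rough illustration of the coverage definition above, average coverage of a shotgun project can be estimated as the total number of sequenced bases divided by the genome size. This is a minimal sketch: the function name `average_coverage` and the example numbers are illustrative assumptions, not values from the slides.

```python
def average_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical project: 6 million reads of 500 bp over a 300 Mb genome.
print(average_coverage(6_000_000, 500, 300_000_000))  # 10.0
```

This gives only the genome-wide average; the coverage of any particular region varies around it, which is one reason gaps remain even at high average coverage.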
5
BAC by BAC vs. whole genome shotgun
6
BAC by BAC sequencing (slow)
7
Minimal tiling path
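One way to picture the selection of a minimal tiling path is as an interval-cover problem. The sketch below assumes that each clone's start and end position on the region is already known (for example from a physical map) and uses a standard greedy strategy; the `Clone` tuple layout, the function name, and the example coordinates are all hypothetical. It also treats a clone that starts exactly where the previous one ends as sufficient, whereas a real tiling path needs a verifiable overlap between neighbouring clones.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end), half-open interval [start, end)


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedy interval cover: repeatedly pick the clone that starts at or
    before the current position and extends furthest to the right."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    position = region_start
    i = 0
    while position < region_end:
        best = None
        # Scan every not-yet-seen clone that starts at or before `position`.
        while i < len(ordered) and ordered[i][1] <= position:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= position:
            raise ValueError(f"no clone covers position {position}")
        path.append(best)
        position = best[2]
    return path


# Hypothetical clone coordinates (in kb) across a 300 kb region.
clones = [("A", 0, 100), ("B", 50, 180), ("C", 90, 200), ("D", 150, 300), ("E", 190, 320)]
print([name for name, _, _ in minimal_tiling_path(clones, 0, 300)])  # ['A', 'C', 'E']
```

Clones B and D are skipped because C and E reach further to the right from the same covered position, so three clones suffice where five were available.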
10
Whole genome shotgun sequencing (WGSA: whole genome shotgun assembly)
11
Hybrid shotgun sequencing
12
N50
[Figure: cumulative contig content (in % of genome) plotted against contig size (in kb)]
Order contigs according to size
Compute cumulative size
N50 = contig size (sequence length) which marks 50% of genome content
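The procedure on this slide (order contigs by size, accumulate, read off the 50% mark) can be sketched in a few lines. One assumption to note: the sketch uses the total length of the contigs as the denominator, while the slide speaks of genome content; the two differ whenever the assembly does not span the whole genome. The function name `n50` and the example lengths are illustrative.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig length at which the cumulative size, taken from the
    largest contig downwards, first reaches 50% of the total contig length."""
    lengths = sorted(contig_lengths, reverse=True)  # order contigs by size
    if not lengths:
        raise ValueError("need at least one contig")
    half = sum(lengths) / 2
    cumulative = 0
    for length in lengths:
        cumulative += length  # compute cumulative size
        if cumulative >= half:
            return length
    raise AssertionError("unreachable: cumulative size always reaches half")


# Hypothetical contig sizes in kb; total 300, so the 50% mark is 150.
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```

In the example, the two largest contigs (80 + 70 = 150) already hold half of the total, so N50 is the size of the second one.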
13
Human genome
2001: two draft sequences published (public BAC by BAC sequence, Celera's WGSA)
– 90% of euchromatic sequence
– 150,000 gaps
– N50: 81 kb
– Error rate: 1:10,000
2004: finished public sequence
– 99% of euchromatic sequence
– 341 gaps
– N50: 38,500 kb
– Error rate: 1:100,000
14
The problem with complex genomes
– Gaps
– Orientation of contigs not known
– Near-identical repeats hard to resolve
15
Finishing the sequence
[Figure labels: Gap, Draft sequence]
16
Resolving repeats
17
Detecting and resolving repeats in WGSA
18
Clone orientation
19
Segmental duplications / gaps
Blue: duplications of size > 10 kb
Red: gaps of size > 300 kb