1
hard assembly
Jan Pačes, Institute of Molecular Genetics AS CR
2
problems
genomes: high GC content; repetitions (short ones with low informational content, long ones); polymorphism; "unreadable" sequences and "weird" structures
technologies: nonrandom libraries; wrong library sizes; erroneous or chimeric reads
3
sequencing technologies
ABI (Sanger)
454 (pyrosequencing)
Solexa (reversible terminators)
SOLiD (two-base ligation)
PacBio (SMRT)
4
example of errors in one technology
5
high GC regions are underrepresented
Aird et al. Genome Biology 2011
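To check one's own data for this bias, a common approach is to cut the genome into windows, bin the windows by GC fraction and compare mean read depth per bin; underrepresented high-GC bins then stand out. A minimal sketch, assuming per-window sequences and depths are already available; `gc_bin`, `coverage_by_gc` and the 10-bin layout are hypothetical choices, not taken from Aird et al.:

```python
from collections import defaultdict
from statistics import mean


def gc_bin(seq: str, n_bins: int = 10) -> int:
    """Index of the GC-content bin for seq (integer arithmetic avoids float edge errors)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    # a window with 100% GC is folded into the top bin
    return min(gc * n_bins // len(s), n_bins - 1)


def coverage_by_gc(windows, n_bins: int = 10) -> dict:
    """Mean read depth per GC bin for (window_sequence, depth) pairs.

    Keys are lower bin edges, e.g. 0.6 for windows with 60-70% GC.
    Empty windows are skipped.
    """
    depths = defaultdict(list)
    for seq, depth in windows:
        if seq:
            depths[gc_bin(seq, n_bins)].append(depth)
    return {i / n_bins: mean(d) for i, d in sorted(depths.items())}


# toy windows (hypothetical data): depth collapses as GC rises
windows = [("GGGGCCCCAT", 5), ("ATATATATGC", 30),
           ("ATATATATCG", 20), ("GCGCGCGCGC", 2)]
print(coverage_by_gc(windows))  # {0.2: 25, 0.8: 5, 0.9: 2}
```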
6
protocol optimization for high GC content
Aird et al. Genome Biology 2011
7
repetitions
[figure: scaffold, repetition]
8
repetitions
9
recognizing repetitions
RepeatMasker, RepeatModeler (RECON and RepeatScout)
position-aware assemblers: MIRA, MaSuRCA, SPAdes
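Beyond library-based tools, a collapsed repeat often betrays itself through read depth: a contig that absorbed two genomic copies collects roughly twice the typical depth. A minimal sketch of that heuristic, assuming per-contig mean depths are already known and that most contigs are single-copy (so the median approximates single-copy depth); `flag_collapsed_repeats` and the 1.8x cutoff are hypothetical, not part of any tool listed above:

```python
from statistics import median


def flag_collapsed_repeats(contig_depth: dict, min_ratio: float = 1.8) -> dict:
    """Return {contig: estimated copy number} for contigs whose depth is at
    least min_ratio times the median contig depth (hypothetical heuristic)."""
    single_copy = median(contig_depth.values())  # assumes most contigs are unique
    flagged = {}
    for name, depth in contig_depth.items():
        ratio = depth / single_copy
        if ratio >= min_ratio:
            # depth is roughly copy number x single-copy depth
            flagged[name] = round(ratio)
    return flagged


depths = {"c1": 30, "c2": 32, "c3": 29, "c4": 61, "c5": 150}
print(flag_collapsed_repeats(depths))  # {'c4': 2, 'c5': 5}
```

The median rather than the mean is used so that a few very deep repeat contigs do not inflate the single-copy estimate.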
10
k-mer distribution
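A k-mer distribution counts how many distinct k-mers occur once, twice, and so on. Error k-mers pile up at low multiplicity, genomic k-mers form a peak near the k-mer coverage, and dividing the total number of solid k-mers by the peak multiplicity gives a rough genome size. A pure-Python sketch, assuming the reads fit in memory (in practice a dedicated counter such as Jellyfish would produce the histogram); function names and the `min_depth` cutoff are illustrative:

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers, skipping any that contain an ambiguous base."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> Counter:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def estimate_genome_size(hist: Counter, min_depth: int = 2) -> float:
    """Genome size ~ (solid k-mer occurrences) / (multiplicity of the main peak).

    Multiplicities below min_depth are treated as sequencing errors and are
    excluded from both the peak search and the total.
    """
    solid = {m: n for m, n in hist.items() if m >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(solid, key=solid.get)
    total = sum(m * n for m, n in solid.items())
    return total / peak


# toy data: a 36-base "genome" read 10 times, plus one junk read
genome = "ATGCGTACGTTAGCCATGACTGGATCCAGTTACGAT"
reads = [genome] * 10 + ["GGGGGGGGGG"]
hist = kmer_histogram(count_kmers(reads, 7))
print(estimate_genome_size(hist, min_depth=5))  # 30.0 (= 36 - 7 + 1 k-mer positions)
```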
11
k-mer analysis
Jellyfish: fast, parallel k-mer counting for DNA
Quake: a package to correct substitution sequencing errors in experiments with deep coverage
khmer: trims off likely erroneous k-mers
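The idea behind abundance-based trimming is that a k-mer seen only once or twice in deep data most likely contains a sequencing error, so a read is cut where it first runs into such a k-mer. A minimal sketch of that idea with an exact in-memory counter; khmer itself relies on probabilistic counting structures and its precise trimming rule may differ. `trim_at_low_abundance`, the cutoff and the toy reads are hypothetical, and k-mers are counted on one strand only for brevity:

```python
from collections import Counter


def count_kmers(reads, k: int) -> Counter:
    """Count k-mers as they appear (no reverse-complement merging, for brevity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trim_at_low_abundance(read: str, counts: Counter, k: int, cutoff: int = 2) -> str:
    """Truncate the read so it ends with the last solid k-mer before the first
    k-mer whose abundance is below cutoff; return "" if the first k-mer is rare."""
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] < cutoff:
            # read[:i + k - 1] is exactly the span of k-mers 0 .. i-1
            return read[:i + k - 1] if i > 0 else ""
    return read


correct = "ACGTTGCATGCCAGTAAGCT"
err = "ACGTTGCATGCCAGTGAGCT"   # substitution A->G at position 15
counts = count_kmers([correct] * 5 + [err], k=5)
print(trim_at_low_abundance(err, counts, k=5))  # ACGTTGCATGCCAGT
```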
12
repetitions
[figure: repetition, scaffold]
13
filling gaps
GapCloser (part of SOAPdenovo)
GapFiller (part of SSPACE)
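Both gap fillers close gaps between contigs using reads. The general principle can be shown with a deliberately naive greedy extension: starting from the left flank, repeatedly append the bases that follow the current last k-mer in some read, until the first k bases of the right flank appear. This is an illustrative sketch, not the algorithm of GapCloser or GapFiller; `fill_gap`, `extend_once`, k = 15 and the simulated data are hypothetical, and the code ignores repeats (it trusts the first matching read):

```python
import random
from typing import List, Optional


def extend_once(seq: str, reads: List[str], k: int) -> str:
    """Bases that follow seq's last k bases in the first read containing them."""
    suffix = seq[-k:]
    for read in reads:
        pos = read.find(suffix)
        if pos != -1 and pos + k < len(read):
            return read[pos + k:]
    return ""


def fill_gap(left: str, right: str, reads: List[str], k: int = 15,
             max_steps: int = 1000) -> Optional[str]:
    """Greedily extend the left flank until the right flank's first k bases show up.

    Returns left + gap + right, or None if extension stalls or max_steps is hit.
    """
    seq = left
    anchor = right[:k]
    for _ in range(max_steps):
        # the anchor may overlap the end of the left flank but not lie fully inside it
        hit = seq.find(anchor, len(left) - k + 1)
        if hit != -1:
            return seq[:hit] + right
        ext = extend_once(seq, reads, k)
        if not ext:
            return None
        seq += ext
    return None


# simulated example: a 90-base region with a 30-base gap between the flanks
random.seed(7)
genome = "".join(random.choice("ACGT") for _ in range(90))
reads = [genome[s:s + 30] for s in range(10, 60, 5)]
filled = fill_gap(genome[:30], genome[60:], reads, k=15)
print(filled == genome)
```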
14
454 multiplicates (duplicate reads)
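In 454 data, artificial duplicates typically appear as reads that start at the same position with an identical beginning. A minimal sketch that groups reads by an exact prefix, assuming reads are held in a dict of ID -> sequence and that a hypothetical 25-base prefix is discriminating enough; dedicated tools such as cd-hit-454 cluster by similarity instead of exact identity:

```python
from collections import defaultdict


def find_prefix_duplicates(reads: dict, prefix_len: int = 25) -> list:
    """Group read IDs whose first prefix_len bases are identical.

    Only groups with more than one member are returned; keeping one read
    per group would deduplicate. Reads shorter than prefix_len are ignored.
    """
    groups = defaultdict(list)
    for read_id, seq in reads.items():
        if len(seq) >= prefix_len:
            groups[seq[:prefix_len]].append(read_id)
    return [ids for ids in groups.values() if len(ids) > 1]


reads = {
    "r1": "ACGT" * 10,
    "r2": "ACGT" * 8 + "TTTT",   # same start, different length and end
    "r3": "GGGG" * 10,
    "r4": "ACG",                 # too short to judge
}
print(find_prefix_duplicates(reads))  # [['r1', 'r2']]
```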
15
contig coverage by large-insert libraries
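With large-insert libraries one can count, at every contig position, how many read pairs span it (physical or clone coverage). Stretches where this drops to zero inside an otherwise well-spanned contig are worth inspecting as possible misjoins. A minimal sketch using a difference array, assuming pair spans are already given as half-open (start, end) contig coordinates from the outer end of one read to the outer end of its mate; `span_coverage` is a hypothetical helper:

```python
from itertools import accumulate


def span_coverage(contig_len: int, spans) -> list:
    """Per-base number of read-pair inserts covering each contig position.

    spans: iterable of half-open (start, end) intervals; parts outside the
    contig are clipped. Runs in O(contig_len + number of spans).
    """
    diff = [0] * (contig_len + 1)
    for start, end in spans:
        start, end = max(0, start), min(contig_len, end)
        if start < end:
            diff[start] += 1   # an insert begins here
            diff[end] -= 1     # ...and stops covering from here on
    return list(accumulate(diff[:contig_len]))


print(span_coverage(10, [(0, 5), (2, 8), (4, 10)]))  # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
```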
16
Illumina paired-end and mate-pair libraries
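One practical difference between the two library types is read orientation: Illumina paired-end reads normally point toward each other (FR), while Illumina mate-pair reads point away from each other (RF). Mate-pair libraries can also contain a share of short-insert FR pairs. A minimal sketch that classifies pairs mapped to the same contig, assuming only each read's leftmost mapped position and strand flag are known; `pair_orientation` is a hypothetical helper:

```python
def pair_orientation(pos1: int, rev1: bool, pos2: int, rev2: bool) -> str:
    """Classify a read pair mapped to the same contig.

    pos*: leftmost mapped coordinate of each read; rev*: True if the read
    aligned to the reverse strand. Returns "FR" (reads point toward each
    other), "RF" (reads point away from each other) or "same-strand".
    """
    if rev1 == rev2:
        return "same-strand"
    # order the two reads by position; the strand of the left read decides
    (_, left_rev), _ = sorted([(pos1, rev1), (pos2, rev2)])
    return "RF" if left_rev else "FR"


print(pair_orientation(100, False, 400, True))   # FR, as expected for paired-end
print(pair_orientation(100, True, 3000, False))  # RF, as expected for mate-pair
```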
17
highly polymorphic genomes
[figure: two copies of polymorphic contigs; scaffold]
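Condensing the two copies first requires finding them. One simple way, offered here as an assumption rather than the method used in the talk, is k-mer containment: if most of a shorter contig's k-mers (on either strand) also occur in another contig, the two are likely haplotypes of the same locus. SNPs lower the containment, so the threshold sits well below 1.0. Names, k and the threshold are hypothetical, and the all-pairs comparison is O(n^2), so this only suits small contig sets:

```python
import random
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> set:
    """Canonical k-mers, so a contig and its reverse complement give the same set."""
    return {min(seq[i:i + k], revcomp(seq[i:i + k]))
            for i in range(len(seq) - k + 1)}


def alternative_contig_pairs(contigs: dict, k: int = 21,
                             min_containment: float = 0.5) -> list:
    """Return (shorter, longer, containment) for contig pairs that look allelic."""
    kmers = {name: canonical_kmers(seq, k) for name, seq in contigs.items()}
    # sort by length so each pair comes out as (shorter, longer)
    names = sorted(contigs, key=lambda n: len(contigs[n]))
    pairs = []
    for short, long_ in combinations(names, 2):
        if not kmers[short]:
            continue
        containment = len(kmers[short] & kmers[long_]) / len(kmers[short])
        if containment >= min_containment:
            pairs.append((short, long_, containment))
    return pairs


# simulated haplotypes: b = a with two SNPs, stored on the opposite strand
random.seed(3)
hap_a = "".join(random.choice("ACGT") for _ in range(200))
mutate = {"A": "C", "C": "G", "G": "T", "T": "A"}
hap_b = hap_a[:50] + mutate[hap_a[50]] + hap_a[51:150] + mutate[hap_a[150]] + hap_a[151:]
other = "".join(random.choice("ACGT") for _ in range(200))
contigs = {"a": hap_a, "b": revcomp(hap_b), "c": other}
pairs = alternative_contig_pairs(contigs, k=15)
print([(s, l, round(c, 2)) for s, l, c in pairs])
```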
18
polymorphic assembly workflow
1. normal assembly
2. condensing alternative contigs
3. mapping to identify SNPs
4. "repair" reads
5. second "polymorphic" assembly
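The slides do not spell out how steps 3 and 4 are done. Under one plausible interpretation (an assumption, not stated in the source), reads are mapped back to the condensed assembly, sites with a well-supported second allele are called as SNPs, and reads are then "repaired" by replacing the minor allele with the major one so that the second assembly no longer splits the haplotypes. A minimal sketch on a toy pileup (position -> observed bases); `call_snps`, `repair_read` and both thresholds are hypothetical:

```python
from collections import Counter


def call_snps(pileup: dict, min_count: int = 3, min_frac: float = 0.2) -> dict:
    """Return {position: (major_allele, minor_allele)} for biallelic-looking sites."""
    snps = {}
    for pos, bases in pileup.items():
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        if len(counts) < 2:
            continue
        (major, _), (minor, n_minor) = counts.most_common(2)
        if n_minor >= min_count and n_minor / sum(counts.values()) >= min_frac:
            snps[pos] = (major, minor)
    return snps


def repair_read(seq: str, start: int, snps: dict) -> str:
    """Replace minor alleles at SNP sites with the major allele.

    start is the reference position of seq[0]; an ungapped alignment is assumed.
    """
    bases = list(seq)
    for i, base in enumerate(bases):
        site = snps.get(start + i)
        if site and base == site[1]:
            bases[i] = site[0]
    return "".join(bases)


pileup = {10: "AAAAAAAAAA", 11: "AAAAAGGGGG", 12: "CCCCCCCCCT", 13: "TTTTTTCCC"}
snps = call_snps(pileup)
print(snps)                           # {11: ('A', 'G'), 13: ('T', 'C')}
print(repair_read("AGTC", 10, snps))  # AATT
```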
20
G-quadruplex
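A G-quadruplex needs four runs of guanines close together. A widely used sequence-level screen is the consensus motif G3+N1-7G3+N1-7G3+N1-7G3+; its C-rich mirror image flags a potential G-quadruplex on the opposite strand, which matters for sequences written on the C-rich strand. A minimal regex sketch, assuming this strict motif definition is adequate; it is a motif scan only and says nothing about whether a structure actually forms. `find_g4_motifs` is a hypothetical name:

```python
import re

# four runs of >= 3 G separated by loops of 1-7 nt
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3}G{3,}")
# C-rich version: a putative G4 on the opposite strand
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3}C{3,}")


def find_g4_motifs(seq: str) -> list:
    """Return sorted (start, end, strand) tuples for putative G4 motifs.

    Matches are non-overlapping within each strand; overlapping or nested
    alternative motifs are not enumerated.
    """
    s = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_PLUS), ("-", G4_MINUS)):
        hits.extend((m.start(), m.end(), strand) for m in pattern.finditer(s))
    return sorted(hits)


print(find_g4_motifs("TTGGGAGGGAGGGAGGGTT"))  # [(2, 17, '+')]
```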
21
Chicken p53: coverage from RNA-seq data
AGCGACCCCCCCCCACCACCGCCACCACCACCTCTGCCATTGGCCGCCGCCGCCCCCCCCCCATTAAACCCCCCCACCCCCCCCCGCGCTGCCCCCTCCCCGGTGG
Coverage > 13,000X
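This snippet illustrates what makes such regions hard to sequence and assemble: it is dominated by G and C and contains long poly-C runs. Two quick measurements quantify that; the helper names are hypothetical, and the sequence is copied verbatim from the slide:

```python
from itertools import groupby

P53_SNIPPET = "AGCGACCCCCCCCCACCACCGCCACCACCACCTCTGCCATTGGCCGCCGCCGCCCCCCCCCCATTAAACCCCCCCACCCCCCCCCGCGCTGCCCCCTCCCCGGTGG"


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def longest_homopolymer(seq: str) -> tuple:
    """(length, base) of the longest run of a single base."""
    return max(((len(list(run)), base) for base, run in groupby(seq.upper())),
               default=(0, ""))


print(f"GC fraction: {gc_fraction(P53_SNIPPET):.2f}")
print("longest homopolymer (length, base):", longest_homopolymer(P53_SNIPPET))
```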
22
Chicken erythropoietin (EPO): coverage from RNA-seq data
CCCGCCCACCCCCACCCCCACCCGCACCCCCCACTCTCCCACCCCCACCCCCTTTTCTCCCACCCCCTCTTCTCCCACCCCCTTTTCCCCCCCTTCCTCCCCCCACTCCG CCCCCCCCCCGCCCCCTCCCCCCCCCCAGGTGAGGACCCT
Coverage > 500X from RNA-seq
(the EPO locus was not completed even with 1000X coverage of genomic Illumina data!)
23
chicken missing genes
24
that’s it, thank you
many thanks also to: Daniel Elleder, Tomáš Hron, Michal Kolář, Hynek Strnad