Published by Caroline Ford. Modified over 9 years ago.
1
Next-generation sequence data and de novo assembly for human genetics, by Jaap van der Heijden
2
De novo assembly
3
Overall idea
5
Repeats and non-random shearing
6
Scaffolding: with multiple libraries, contigs are ordered and oriented using mate pairs -> scaffolds
7
4 types of assemblers: greedy algorithms, overlap-layout-consensus, align-layout-consensus, BAC-by-BAC sequencing
8
Types of assemblers I: greedy algorithms. They repeatedly join the most similar reads and are easily confused by repeats.
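A minimal sketch of the greedy idea, assuming error-free reads, exact suffix/prefix overlaps and a hypothetical minimum overlap of 3 bases. It always merges the pair with the largest overlap, which is exactly why repeats confuse it: the locally best merge can join reads that come from different copies of a repeat.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining pieces stay separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For example, `greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"])` merges the three reads into the single contig `ATGGCGTGCAAT`.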
9
Types of assemblers II: overlap-layout-consensus. Nodes represent the ends of reads; edges represent similarity between reads (overlaps). The layout step removes redundant information; the consensus step builds the genome sequence.
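The three stages can be sketched as below. This is a toy under stated assumptions: one node per whole read (rather than per read end), exact overlaps, a single linear path, and a "consensus" that simply spells the path instead of voting over a multiple alignment. The layout step is shown as transitive reduction, one common way to remove redundant overlap edges.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads: list[str], min_len: int = 3) -> dict[int, dict[int, int]]:
    """Overlap step: edge i -> j weighted by the suffix/prefix overlap length."""
    graph: dict[int, dict[int, int]] = {i: {} for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = overlap(a, b, min_len)
                if olen:
                    graph[i][j] = olen
    return graph


def transitive_reduction(graph: dict[int, dict[int, int]]) -> dict[int, dict[int, int]]:
    """Layout step: drop edge i -> k when a path i -> j -> k already implies it."""
    reduced = {i: dict(edges) for i, edges in graph.items()}
    for i, edges in graph.items():
        for j in edges:
            for k in graph[j]:
                if k in edges and k != i:
                    reduced[i].pop(k, None)
    return reduced


def spell_path(reads: list[str], graph: dict[int, dict[int, int]]) -> str:
    """Simplified consensus: walk from the read without incoming edges."""
    has_incoming = {j for edges in graph.values() for j in edges}
    node = next(i for i in graph if i not in has_incoming)
    seq = reads[node]
    while graph[node]:
        nxt, olen = next(iter(graph[node].items()))
        seq += reads[nxt][olen:]
        node = nxt
    return seq
```

With reads `ATGGCGTGCA`, `GGCGTGCAAT` and `CGTGCAATCC`, the overlap graph contains a redundant edge from the first read to the third; after reduction the path spells `ATGGCGTGCAATCC`.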
10
Types of assemblers III: align-layout-consensus, a process called comparative assembly. The overlap stage of assembly is replaced by an alignment step against a reference. The layout stage is also greatly simplified by the additional constraints that the alignment to the reference provides.
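A toy version of comparative assembly, assuming reads are shorter than the reference and contain no indels, so "alignment" is ungapped placement at the offset with the fewest mismatches. Real aligners are far more sophisticated; the point is that reference coordinates give the layout for free, and consensus becomes a per-column majority vote.

```python
from collections import Counter


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def place_reads(reference: str, reads: list[str]) -> list[tuple[int, str]]:
    """Align step (toy): ungapped placement at the offset with the fewest mismatches."""
    placed = []
    for read in reads:
        offsets = range(len(reference) - len(read) + 1)
        best = min(offsets, key=lambda p: mismatches(reference[p:p + len(read)], read))
        placed.append((best, read))
    return sorted(placed)  # layout step: reads ordered by reference coordinate


def consensus(reference_length: int, placed: list[tuple[int, str]]) -> str:
    """Consensus step: majority base per reference column ('N' where no read covers it)."""
    columns = [Counter() for _ in range(reference_length)]
    for pos, read in placed:
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)
```

For a reference `GATTACAGCT` and reads from a sample carrying an A->C change at position 4, the consensus recovers the sample sequence `GATTCCAGCT` rather than the reference.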
11
Types of assemblers IV: BAC-by-BAC sequencing. The genome is broken into fragments cloned in BACs; each BAC's location is determined in the lab; a minimum tiling path is chosen (the whole genome is covered by at least one BAC); the selected BACs are sequenced.
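Once BAC locations are known, picking a minimum tiling path is an interval-cover problem. A minimal sketch, assuming each BAC's position is known exactly as a half-open interval `[start, end)` on the genome (BAC names and coordinates below are hypothetical): the greedy rule "among BACs starting inside the covered region, take the one reaching furthest" yields a minimum set.

```python
def minimum_tiling_path(bacs: list[tuple[str, int, int]], genome_length: int) -> list[str]:
    """bacs: (name, start, end) with 0-based half-open coordinates [start, end)."""
    ordered = sorted(bacs, key=lambda b: b[1])
    path: list[str] = []
    covered, i = 0, 0
    while covered < genome_length:
        best = None
        # consider every BAC that starts within the region covered so far
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in BAC coverage at position {covered}")
        path.append(best[0])
        covered = best[2]  # extend coverage as far as possible
    return path
```

BACs that were scanned but not chosen can never help later, because the chosen BAC already reaches at least as far, so one pass over the sorted list suffices.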
12
Lander-Waterman equation: reads fall like "raindrops" covering a tile; 8-10 fold coverage; about 5 contigs for a 1 Mb genome
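In its simplest form (ignoring the minimum overlap needed to detect a join), Lander-Waterman gives coverage c = N·L/G for N reads of length L on a genome of size G, a fraction e^(-c) of bases left uncovered, and about N·e^(-c) contigs. A small sketch of those formulas; the 500 bp read length in the usage note is an assumption (the slide does not state one).

```python
import math


def lander_waterman(genome_size: float, read_length: float, coverage: float) -> dict[str, float]:
    """Simplified Lander-Waterman statistics (minimum detectable overlap ignored)."""
    n_reads = coverage * genome_size / read_length  # from c = N * L / G
    return {
        "reads": n_reads,
        "fraction_uncovered": math.exp(-coverage),          # P(base not covered) = e^-c
        "expected_contigs": n_reads * math.exp(-coverage),  # N * e^-c
    }
```

Assuming 500 bp reads on a 1 Mb genome, `lander_waterman(1_000_000, 500, 8)["expected_contigs"]` is about 5.4, and at 10-fold coverage it drops to about 0.9, consistent with the slide's order of magnitude.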
13
Timeline
1975 Sanger sequencing
1990 First shotgun/EST assemblers (overlap-layout-consensus approach)
2000 Human shotgun assembly
2001 Mouse shotgun assembly
2005 Roche 454 available
2006 Solexa available
2007 Short-read assemblers (de Bruijn graphs)
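Short-read assemblers such as Velvet are built on de Bruijn graphs: instead of comparing reads with each other, every read is cut into k-mers, (k-1)-mers become nodes and each k-mer becomes an edge. A minimal sketch assuming error-free reads; it spells one unbranched path (a unitig) and stops where a repeat makes the path branch or merge.

```python
from collections import Counter, defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; every distinct k-mer adds an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, set[str]] = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def start_nodes(graph: dict[str, set[str]]) -> list[str]:
    """Nodes with no incoming edge are natural places to start a contig."""
    indegree = Counter(node for succ in graph.values() for node in succ)
    return sorted(n for n in graph if indegree[n] == 0)


def unitig(graph: dict[str, set[str]], start: str) -> str:
    """Spell the sequence from start until the path branches, merges or loops."""
    indegree = Counter(node for succ in graph.values() for node in succ)
    seq, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if indegree[nxt] != 1 or nxt in seen:
            break  # paths merge here (e.g. a repeat): stop the contig
        seq += nxt[-1]
        node = nxt
        seen.add(nxt)
    return seq
```

With reads `ATGGCGT`, `GGCGTGCA` and `GTGCAATC` and k = 4, the only start node is `ATG`, and the unitig spelled from it is `ATGGCGTGCAATC`. The cost grows with the number of distinct k-mers rather than with the number of read pairs.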
14
The complexity of sequence assembly
Long reads
–better identification
–much slower
Short reads
–faster to align
–more difficult with repeats
Number of reads, length of reads, mismatches
Algorithms can show quadratic or even exponential complexity
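Quadratic growth is easy to see in a naive overlap step that compares every read with every other read. A small sketch counting the unordered pairs, n(n-1)/2:

```python
def all_vs_all_pairs(n_reads: int) -> int:
    """Unordered read pairs a naive overlap step compares: n(n-1)/2."""
    return n_reads * (n_reads - 1) // 2


for n in (1_000, 2_000, 4_000):
    # doubling the number of reads roughly quadruples the work
    print(f"{n:>6} reads -> {all_vs_all_pairs(n):>12,} pairs")
```

For the 2 x 1,291,901 = 2,583,802 cDNA reads of the dragonfly project, this is about 3.3 x 10^12 pairs, before counting the cost of each comparison itself.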
16
3 NGS projects: dragonfly, medicinal maggots, EST comparison
17
Dragonfly (libelle): order Odonata; 3000 species, 90 in Europe; they undergo a morphic change
18
Pilot study for an African dragonfly: morphic change; some migrate, others don't; genetically divergent; many introns in the genome
19
Project questions: What are the homologies with other species? How big is the genome? Are there already sequences in GenBank, and are they present in the data?
20
Dragonfly project data
Genomic: single end, 1 x 1,147,762 reads, trimmed to 34 of 51 nucleotides, 39,023,908 nucleotides sequenced
cDNA: paired end, 2 x 1,291,901 reads, read length 51, 131,773,902 nucleotides sequenced
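The nucleotide totals follow from reads per end x read length x number of ends; the genomic total corresponds to 34 nucleotides per read. A quick check (the helper name is hypothetical):

```python
def nucleotides_sequenced(reads_per_end: int, read_length: int, ends: int = 1) -> int:
    """Total sequenced bases = reads per end * read length * number of ends."""
    return ends * reads_per_end * read_length


genomic = nucleotides_sequenced(1_147_762, 34)       # single end, trimmed to 34 nt
cdna = nucleotides_sequenced(1_291_901, 51, ends=2)  # paired end, 51 nt per read
print(f"genomic: {genomic:,}  cDNA: {cdna:,}")
```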
21
Dragonfly methods: assemble the cDNA; BLAST the resulting contigs to determine homologies; align genomic DNA to the contigs; calculate the genome size
22
Dragonfly assembly results
–total contigs: 3898
–average length of contigs: 176
–average coverage of contigs: 24
–contigs larger than 300 nucleotides: 800
–average length of contigs larger than 300: 508
–average coverage of contigs larger than 300: 15
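A minimal sketch of how such summary figures can be computed from an assembler's contig list. It assumes each contig is available as a (length, coverage) pair, that the averages are plain unweighted means over contigs, and that "larger than 300" means strictly greater than 300; the input below is toy data, not the dragonfly assembly.

```python
def mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def contig_stats(contigs: list[tuple[int, float]], min_length: int = 300) -> dict[str, float]:
    """contigs: (length, coverage) pairs; returns the slide's summary figures."""
    long = [(length, cov) for length, cov in contigs if length > min_length]
    return {
        "total_contigs": len(contigs),
        "avg_length": mean([length for length, _ in contigs]),
        "avg_coverage": mean([cov for _, cov in contigs]),
        "contigs_gt_min": len(long),
        "avg_length_gt_min": mean([length for length, _ in long]),
        "avg_coverage_gt_min": mean([cov for _, cov in long]),
    }


print(contig_stats([(100, 10.0), (400, 20.0), (600, 30.0)]))
```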
23
Dragonfly genes and homologies
Homologies with Libellula pulchella, Enallagma aspersum, Erythromma najas, Ischnura verticalis and many Drosophila species.
Criteria used in this analysis: an E-value below 1 x 10^-40 and a score above 200.
Found in the cDNA contigs: COII gene, accession number GQ256052.1 (partial); COI gene, accession number GQ256032.1 (partial); NDI gene, accession number GQ255994.1 (partial).
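Applying the two cut-offs is a simple filter over BLAST results. A sketch assuming BLAST+ tabular output (`-outfmt 6`) with its default 12 columns, and assuming the slide's "score" is the bit score; the example rows are hypothetical.

```python
import csv

# default column order of BLAST+ -outfmt 6
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(lines, max_evalue: float = 1e-40, min_bitscore: float = 200.0) -> list[dict]:
    """Keep hits with E-value below max_evalue and bit score above min_bitscore."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIELDS, row))
        if float(record["evalue"]) < max_evalue and float(record["bitscore"]) > min_bitscore:
            hits.append(record)
    return hits


rows = [
    "contig_1\tsubject_A\t98.5\t400\t6\t0\t1\t400\t10\t409\t1e-120\t650",
    "contig_2\tsubject_B\t80.0\t120\t24\t1\t5\t124\t1\t120\t3e-20\t95.1",
]
print([h["qseqid"] for h in significant_hits(rows)])
```

In practice `lines` would be an open file handle on the BLAST output; `csv.reader` accepts any iterable of strings.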
24
Dragonfly genome size: 30 genomic genes selected after BLAST (size 300-1500 nucleotides); alignment with Bowtie; genome size "calculation"
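The slide does not spell out the calculation. One common approach, assumed here, treats the selected genes as single copy: their read depth (aligned reads x read length / gene length) then approximates genome-wide depth, and genome size = total sequenced genomic bases / depth. The function and the toy numbers are hypothetical; the median is used so that a few multi-copy genes do not dominate.

```python
from statistics import median


def estimate_genome_size(total_bases: int, genes: list[tuple[int, int]], read_length: int) -> float:
    """Hypothetical estimator: sequenced bases divided by per-gene read depth.

    genes: (gene_length, reads_aligned) for each selected gene, e.g. counted
    from aligner output. Assumes the genes are single copy.
    """
    depths = [reads * read_length / length for length, reads in genes]
    return total_bases / median(depths)  # median damps the effect of multi-copy genes
```

For the dragonfly data, `total_bases` would be the 39,023,908 genomic nucleotides and `read_length` 34; the per-gene read counts would come from the Bowtie alignments, which are not given on the slides.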
25
Medicinal maggots: used to treat non-healing wounds. Genes revealed: signaling proteins (inhibitor of apoptosis protein 2); digestive enzymes (lipases, proteinases); antimicrobial peptides (AMPs: Lucilia defensin, diptericin)
26
Medicinal maggots data: 5 degenerate peptide sequences; 36 peptides; cDNA: 8,199,983 reads, read length 32, 262,399,456 nucleotides sequenced
27
Medicinal maggots question: have we sequenced (parts of) the genes corresponding to the peptides?
28
Medicinal maggots methods: build a local library of the peptides; assemble contigs (CLC bio, NextGENe, Velvet); BLAST the contigs against the peptides; find hits; make a coverage plot
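A coverage plot needs the read depth at each position of a contig. A minimal sketch, assuming the read alignments on a contig are already available as 0-based half-open `(start, end)` intervals from whichever aligner is used; a difference array keeps the cost linear in reads plus contig length. The resulting list can be handed to any plotting library.

```python
def coverage_profile(contig_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-base read depth from 0-based half-open (start, end) read intervals."""
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(contig_length, end)  # clip to the contig
        if start < end:
            diff[start] += 1  # a read begins here
            diff[end] -= 1    # and stops covering from here on
    depth, running = [], 0
    for pos in range(contig_length):
        running += diff[pos]
        depth.append(running)
    return depth


print(coverage_profile(10, [(0, 5), (3, 8), (6, 10)]))
```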
29
NextGENe assembly (maggots): number of contigs = 59,048; average length = 59; average coverage = 11; number of contigs > 300 = 719; average length > 300 = 661; average coverage > 300 = 64
30
CLC assembly: number of contigs = 78; average length = 2282; average coverage = 514
31
Velvet assembly: total contigs: 586; average length of contigs: 168; average coverage of contigs: 55; contigs larger than 300 nucleotides: 62; average length of contigs larger than 300: 779; average coverage of contigs larger than 300: 63
32
Genes found (maggots)
–C. vicina mRNA for arylphorin subunit A4 (Velvet)
–Drosophila willistoni GK21455 (Dwil\GK21455) mRNA (NextGENe)
–Lucilia cuprina clone sbsp9 serine proteinase mRNA (NextGENe)
33
EST comparison: traditional EST sequencing of a known library; assemblers: CLC bio, NextGENe, Velvet
34
EST comparison method: assemble the cDNA and match the contigs with the known ESTs
35
EST results
36
Conclusions: big differences between assemblers in coverage, length, number of nodes and sequence; x performs best on the EST test
37
Questions?