Published by Merry Stanley. Modified over 9 years ago.
1
Genomics Quick Start
Mikhail Dvorkin, Vladislav Isenbaev, Eugene Kapun
Scientific advisors:
Acad. Konstantin Skryabin, Bioengineering RAS
Prof. Anatoly Shalyto, SPbSU ITMO
2
Collaboration with Bioengineering RAS
Bioengineering RAS
– Conducts biological experiments
– Sets problems
– Provides biological data
SPbSU ITMO
– Develops algorithms and programs
Started at the end of 2009
Why us?
SPbSU ITMO: Genomics Quick Start 2
3
SPbSU ITMO at ACM ICPC
We train ETH Zürich
Maybe MIT? :-)
7
Genome Team
Coach: Georgiy Korneev
Members: Mikhail Dvorkin, Vladislav Isenbaev, Eugene Kapun
8
Problems Being Solved
De novo DNA assembly based on paired reads
– Generalized suffix tree traversal
– Reduction to single reads
DNA alignment with transfers
9
DNA Assembly 1: Generalized suffix tree traversal
10
Suffix Tree
Built upon reads
Arc weight: number and quality of reads
Possible extensions
Detection of erroneous nucleotides
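As an illustration of arcs weighted by read support, here is a minimal sketch. It is not the authors' implementation: it builds a naive suffix trie (quadratic in read length) rather than a true generalized suffix tree, and the weight counts reads only, ignoring quality. The names `build_suffix_trie` and `arc_weight` are hypothetical.

```python
def build_suffix_trie(reads):
    """Insert every suffix of every read; each arc counts the suffixes passing through it."""
    root = {}
    for read in reads:
        for start in range(len(read)):
            node = root
            for ch in read[start:]:
                # Each arc is stored as {"weight": count, "next": child arcs}.
                arc = node.setdefault(ch, {"weight": 0, "next": {}})
                arc["weight"] += 1
                node = arc["next"]
    return root


def arc_weight(root, path):
    """Weight of the last arc on `path`, or 0 if the path is absent from the trie."""
    node = root
    weight = 0
    for ch in path:
        if ch not in node:
            return 0
        weight = node[ch]["weight"]
        node = node[ch]["next"]
    return weight


reads = ["ACGT", "CGTA", "CGTT"]
trie = build_suffix_trie(reads)
print(arc_weight(trie, "CGT"), arc_weight(trie, "CGTA"))
```

An arc whose weight is much lower than that of its siblings is a candidate erroneous nucleotide under this simplified model.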
11
Building up a Contig
Start with a high-quality read
Use paired reads to select a nucleotide
– “Backward”: match the past
– “Forward”: match the future
Build up to a branch
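The "build up to a branch" step can be sketched as a greedy extension. This is a simplified illustration under stated assumptions, not the method from the slides: it ignores paired-read information and read quality, and picks the next nucleotide only from which nucleotides follow the contig's last `k` characters in the reads. The function name `extend_contig` and the parameter `k` are hypothetical.

```python
from collections import Counter


def extend_contig(seed, reads, k):
    """Extend `seed` to the right until the reads disagree (a branch) or run out.

    Assumes len(seed) >= k.
    """
    # For every k-mer seen in the reads, count which nucleotide follows it.
    votes = {}
    for read in reads:
        for i in range(len(read) - k):
            votes.setdefault(read[i:i + k], Counter())[read[i + k]] += 1

    contig = seed
    seen_tails = set()
    while True:
        tail = contig[-k:]
        options = votes.get(tail)
        # Stop at a dead end, at a branch, or when a tail repeats (a cycle).
        if not options or len(options) != 1 or tail in seen_tails:
            break
        seen_tails.add(tail)
        contig += next(iter(options))
    return contig


reads = ["ACGTAC", "GTACCA", "ACCATG"]
print(extend_contig("ACG", reads, 3))
```

Adding a read that continues `TAC` with a different nucleotide makes the extension stop at that point, which is where paired-read evidence would be needed to choose a direction.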
12
Results
Caenorhabditis elegans
Escherichia coli K-12
Mean coverage: 156125100
Contig length: 100%, 76%
13
DNA Assembly 2: Reduction to single reads
14
Concept
De Bruijn graph with all reads
Paired reads
– Path in the graph
– Low density
– Backtracking
– Slow
– Meet-in-the-middle
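The idea of connecting the two ends of a paired read by a path in the De Bruijn graph, searched from both sides, can be sketched as follows. This is a minimal illustration under assumptions not stated in the slides: vertices are k-mers, the path length in edges is known exactly, and only the existence of a path is checked. The names `de_bruijn`, `reachable`, and `path_exists` are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Vertices are k-mers; an edge joins consecutive k-mers within a read."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            a, b = read[i:i + k], read[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def reachable(start, edges, steps):
    """Set of vertices reachable from `start` in exactly `steps` edges."""
    frontier = {start}
    for _ in range(steps):
        frontier = {nxt for node in frontier for nxt in edges.get(node, ())}
    return frontier


def path_exists(succ, pred, source, target, length):
    """Meet in the middle: walk half the distance forward and the rest backward."""
    half = length // 2
    forward = reachable(source, succ, half)
    backward = reachable(target, pred, length - half)
    return bool(forward & backward)


succ, pred = de_bruijn(["ACGTACCATG"], 3)
print(path_exists(succ, pred, "ACG", "ATG", 7))
```

Searching half the distance from each end keeps both frontiers small compared with a one-sided search over the full length, which is the usual motivation for meet-in-the-middle.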
15
Error detection
Poorly covered vertices
– Erroneous
– Delete them
– Repeat
Paths
– Single reads
– Use another tool
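The "delete poorly covered vertices and repeat" loop might look like the sketch below. It makes assumptions beyond the slides: coverage is the plain k-mer count, a read is dropped as a whole if it contains any k-mer under the threshold, and the loop runs until nothing changes. The names `kmer_counts`, `filter_reads`, and `min_cov` are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Coverage of each k-mer (graph vertex) over all reads."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))


def filter_reads(reads, k, min_cov):
    """Repeatedly drop reads that contain a k-mer covered fewer than `min_cov` times."""
    reads = list(reads)
    while True:
        counts = kmer_counts(reads, k)
        kept = [
            read for read in reads
            if all(counts[read[i:i + k]] >= min_cov for i in range(len(read) - k + 1))
        ]
        if len(kept) == len(reads):  # nothing removed, so the result is stable
            return kept
        reads = kept


reads = ["ACGTAC"] * 3 + ["ACGAAC"]  # the last read carries one substitution
print(filter_reads(reads, 3, 2))
```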
16
Results
60% erroneous reads detected
< 0.1% errors left after one iteration
99.5% DNA coverage
17
DNA Alignment with Transfers
18
Concept
Parts
– Matched (small edit distance)
– Unmatched
Swapping allowed
Penalties
– Number of parts
– Edit distance in matched parts
– Length of unmatched parts
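The three penalty terms can be combined into a single score, as in the sketch below. The slides do not give the weights or the exact formula, so the linear combination and the default values of `part_cost`, `edit_cost`, and `gap_cost` are assumptions for illustration only, as is the tuple layout used for parts.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def alignment_penalty(parts, part_cost=5, edit_cost=1, gap_cost=1):
    """Sum of a per-part cost, edit distances of matched parts, and unmatched lengths.

    Each part is ("matched", piece_of_first, piece_of_second) or ("unmatched", piece).
    """
    total = 0
    for part in parts:
        total += part_cost
        if part[0] == "matched":
            total += edit_cost * edit_distance(part[1], part[2])
        else:
            total += gap_cost * len(part[1])
    return total


parts = [("matched", "ACGT", "AGT"), ("unmatched", "TTTT")]
print(alignment_penalty(parts))
```

Because the score does not depend on the order of the parts, swapping matched parts costs nothing extra in this simplified model.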
19
Implementation
First DNA
– Tear into small pieces
– Hash 'em and store 'em
Second DNA
– Tear into small pieces
– Look them up
– Build them up
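The hash, look up, and build up steps can be sketched with a k-mer index. This is a simplified illustration, not the authors' code: pieces are exact k-mers, only the first indexed position of each k-mer is used, and a hit is grown by exact matching only, whereas the slides allow a small edit distance. The names `index_kmers` and `matched_blocks` are hypothetical.

```python
from collections import defaultdict


def index_kmers(seq, k):
    """Hash every k-mer of the first sequence to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def matched_blocks(first, second, k):
    """Return (pos_in_first, pos_in_second, length) for exactly matching blocks."""
    index = index_kmers(first, k)
    blocks = []
    j = 0
    while j + k <= len(second):
        hits = index.get(second[j:j + k])
        if not hits:
            j += 1
            continue
        i = hits[0]  # simplification: take the first occurrence only
        length = k
        # Build the hit up to the right while both sequences keep agreeing.
        while (i + length < len(first) and j + length < len(second)
               and first[i + length] == second[j + length]):
            length += 1
        blocks.append((i, j, length))
        j += length
    return blocks


print(matched_blocks("AAAACCCCGGGG", "GGGGAAAACCCC", 4))
```

In the example the two sequences contain the same blocks in swapped order, and both blocks are recovered with their positions in each sequence.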
20
Results
Bacteria        1      2      3      4      5
1. NC_012759    100%   90%    88%    11%    0.1%
2. NC_012947    90%    100%   96%    11%    0.1%
3. NC_012967    89%    97%    100%   11%    0.1%
4. NC_010067    11% (columns 1-3)    100%   0.1%
5. NC_002952    0.1% (columns 1-4)          100%