Published by Jeffrey Atkinson. Modified over 6 years ago.
1
A machine-learning approach to combined evidence validation of genome assemblies
Jeong-Hyeon Choi, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert, John K. Colbourne
Presented by F A Rezaur Rahman Chowdhury
2
Mate-Pair Shotgun DNA Sequencing
Figure: DNA target sample; shear & size; clone & end sequence; end reads / mate pairs (figure labels: 550bp, 10,000bp)
3
Assembling the fragments
Note that contig orientation/order is not determined.
4
Building Scaffolds
Break DNA into random fragments
Sequence the ends of the fragments
Assemble the sequenced ends
Build scaffolds
We need to determine the relative order/orientation of contigs
Using forward-reverse constraints helps
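The forward-reverse constraint can be illustrated with a small sketch. It assumes each mate pair spanning two contigs A and B is summarised only by the strand ('+' or '-') on which each read aligns, and that mates of a correctly placed clone point toward each other. The function names and the majority vote are illustrative assumptions, not taken from the presentation.

```python
from collections import Counter

def link_constraint(strand_a, strand_b):
    """Constraint implied by one mate pair with reads on contigs A and B.

    Returns (flip_b, first): whether B must be reverse-complemented
    relative to A, and which contig comes first in the scaffold.
    """
    # Mates face each other, so contigs in the same orientation carry
    # reads on opposite strands; equal strands mean B must be flipped.
    flip_b = strand_a == strand_b
    # The contig holding the forward read is the upstream one.
    first = "A" if strand_a == "+" else "B"
    return flip_b, first

def vote(links):
    """Pick the constraint supported by the most mate pairs (hypothetical rule)."""
    counts = Counter(link_constraint(a, b) for a, b in links)
    return counts.most_common(1)[0][0]

# Three mate pairs link A and B; two agree, one is likely a chimeric clone.
links = [("+", "-"), ("+", "-"), ("+", "+")]
print(vote(links))
```

Real scaffolders also use the expected clone length to estimate the gap between contigs; that is omitted here.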
5
Assembly Overview
Assembly
Scaffolding
6
Assembly Algorithms
Greedy (TIGR, phrap, CAP3)
Graph-based greedy (Celera, Arachne)
De Bruijn graph, Euler-path based
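As a rough illustration of the de Bruijn graph idea named above, the sketch below builds the graph from a set of reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name and the toy read are assumptions for illustration; finding an Eulerian path through the graph, which is what spells out the assembly, is not shown.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the list of nodes it points to."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # A repeated k-mer adds a repeated edge, so this is a multigraph.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

print(de_bruijn_edges(["ACGT"], 3))
```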
8
Statistical Error detection
Significant deviations from average coverage.
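One simple way to flag such deviations is a Z-score test on per-position (or per-window) read coverage. The sketch below assumes coverage is already available as a list of numbers; the threshold of 3 standard deviations is an illustrative choice, not a value from the presentation.

```python
from statistics import mean, pstdev

def coverage_outliers(coverage, threshold=3.0):
    """Return indices whose coverage deviates strongly from the average."""
    mu = mean(coverage)
    sigma = pstdev(coverage)
    if sigma == 0:
        return []  # perfectly uniform coverage, nothing to flag
    return [i for i, c in enumerate(coverage)
            if abs(c - mu) / sigma > threshold]

# Twenty windows at 10x coverage and one at 40x (a possible collapsed repeat).
coverage = [10] * 20 + [40]
print(coverage_outliers(coverage))
```

Unusually high coverage can indicate a collapsed repeat, and unusually low coverage a weakly supported join, so both tails are flagged.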
9
Distribution of clone length
10
Good and Bad Clones
An intra-contig or intra-scaffold clone is called good if the absolute Z-score of its length is smaller than a threshold.
Half-placed clones are also bad.
Clones whose paired-end reads are placed in the same or outer orientation are bad as well.
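The rules above can be sketched as a small classifier. It assumes the mean and standard deviation of the clone library are already estimated, that a half-placed clone is represented by a length of None, and that read orientation is encoded as "inward", "outward" or "same". These encodings and the default threshold are hypothetical.

```python
def classify_clone(length, orientation, lib_mean, lib_sd, z_max=3.0):
    """Label a clone 'good' or 'bad' from its placement in the assembly."""
    if length is None:
        return "bad"  # half-placed: only one end read was placed
    if orientation in ("same", "outward"):
        return "bad"  # mates must point toward each other
    z = abs(length - lib_mean) / lib_sd
    return "good" if z < z_max else "bad"

# A 10 kbp library with a 1 kbp standard deviation (illustrative numbers).
print(classify_clone(10_400, "inward", 10_000, 1_000))
print(classify_clone(15_000, "inward", 10_000, 1_000))
print(classify_clone(None, "inward", 10_000, 1_000))
```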
11
Machine-learning approach
Combined-evidence assembly validation
Features are taken from the statistical approaches
Five different classifiers: J48 (decision tree), RF (random forest), RT (random tree), NB (naive Bayes), BN (Bayesian network)
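To make the idea of combining evidence concrete, here is a minimal Gaussian naive Bayes sketch in plain Python. It is not the authors' implementation: the two features (a coverage Z-score and the fraction of bad clones in a region), the labels and the training rows are all made up for illustration.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature mean/variance for each class."""
    model = {}
    for label in set(labels):
        subset = [r for r, l in zip(rows, labels) if l == label]
        prior = len(subset) / len(rows)
        # Small constant keeps the variance strictly positive.
        stats = [(mean(col), pvariance(col) + 1e-6) for col in zip(*subset)]
        model[label] = (prior, stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one feature row."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mu, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical training regions: (coverage Z-score, fraction of bad clones).
rows = [(0.2, 0.05), (0.5, 0.10), (0.1, 0.00),
        (3.5, 0.60), (4.0, 0.80), (2.9, 0.55)]
labels = ["ok", "ok", "ok", "misassembled", "misassembled", "misassembled"]

model = fit_gaussian_nb(rows, labels)
print(predict(model, (0.3, 0.05)))
print(predict(model, (3.8, 0.70)))
```

The point of the combined approach is that a region can look acceptable on any single statistic yet still be flagged when several weak signals agree.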
12
Evaluation
Simulated dataset with different error rates (0.001, 0.003, 0.005)
Draft assemblies of Drosophila (D. mojavensis, D. erecta and D. virilis)
13
ROC Curve
14
Simulated Data
15
Drosophila Assembly
16
Cross Species
17
Questions?