1
A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE
Presented By: Chinua Umoja
2
DRISEE
Designed primarily as a means to assess metagenomic sequencing error in samples submitted to MG-RAST.
MG-RAST (from website): an automated analysis platform for metagenomes providing quantitative insights into microbial populations based on sequence data.
The group also provides platform-independent code that allows users to perform analyses on their own.
4
Figure 2. DRISEE performance on simulated and real data.
Keegan KP, Trimble WL, Wilkening J, Wilke A, et al. (2012) A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE. PLoS Comput Biol 8(6): e doi: /journal.pcbi
5
Figure 3. Total DRISEE errors of genomic and metagenomic data produced by 454 and Illumina technologies.
6
Figure 4. DRISEE error profiles for metagenomic sequencing data sets.
7
Figure 5. DRISEE calculated Errors, separated by error type, for 454 and Illumina metagenomic samples.
9
Step 1: Sequence Data
Considers data in FASTA format in an attempt to keep DRISEE platform independent.
Input consists of all sequenced reads produced in a single run or sample.
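To make the input step concrete, here is a minimal sketch of reading FASTA records into memory. It is not DRISEE's own reader; the function name `read_fasta` and the example records are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may wrap, so collect them until the next header.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-read input; a real run would open the FASTA file instead.
example = io.StringIO(">read1\nACGT\nACGT\n>read2\nTTGA\n")
reads = list(read_fasta(example))
```

Because only sequence text is used, the same reader works regardless of which sequencing platform produced the reads.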
10
Step 2: Check for Random Sequencing
DRISEE was designed to consider sequences generated by random (shotgun) procedures.
Shotgun sequencing: a random fragment sequencing application that is used on a sample derived from a pool of organisms.
Samples that exhibit nonrandom sequence patterns in their prefixes are excluded.
11
Step 3: Screen Reads
Sequence data goes through a two-step filtering procedure to remove two types of reads:
Reads that exhibit uncharacteristic lengths (by default, farther than two standard deviations from the mean)
Reads with ambiguous base calls
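The two filters can be sketched as follows. This is an illustration under assumptions, not DRISEE's implementation: the name `screen_reads`, the use of the population standard deviation, and computing the mean before removing ambiguous reads are all choices made for this example.

```python
from statistics import mean, pstdev


def screen_reads(reads, max_sd=2.0):
    """Drop reads with uncharacteristic lengths or ambiguous base calls."""
    if not reads:
        return []
    lengths = [len(r) for r in reads]
    mu, sd = mean(lengths), pstdev(lengths)
    kept = []
    for r in reads:
        if abs(len(r) - mu) > max_sd * sd:
            continue  # length farther than max_sd standard deviations from the mean
        if set(r) - set("ACGT"):
            continue  # contains an ambiguous call such as N
        kept.append(r)
    return kept


# Hypothetical data: nine clean reads, one with an N, one far too long.
sample = ["ACGTACGTAC"] * 9 + ["ACGTACGTAN"] + ["ACGT" * 10]
screened = screen_reads(sample)
```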
12
Step 4: Bin Reads
A list of unique prefixes in the screened FASTA file is generated.
Remaining reads are grouped according to their prefix.
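Binning amounts to grouping reads by their leading bases. In the sketch below, the function name `bin_by_prefix` and the `prefix_len` parameter are hypothetical, and the slide does not state the prefix length, so the example uses a very short prefix purely for illustration.

```python
from collections import defaultdict


def bin_by_prefix(reads, prefix_len):
    """Group reads that share an identical prefix of prefix_len bases."""
    bins = defaultdict(list)
    for r in reads:
        if len(r) >= prefix_len:  # reads shorter than the prefix cannot be binned
            bins[r[:prefix_len]].append(r)
    return dict(bins)


# Hypothetical reads with a 4-base prefix for readability.
prefix_bins = bin_by_prefix(["ACGTAAAA", "ACGTAAAT", "TTTTCCCC", "ACG"], prefix_len=4)
```

The dictionary keys are the list of unique prefixes; the values are the reads in each bin.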
13
Step 5: Screen Bins
Bins are sorted by read abundance.
Bins that contain at least a minimum number of reads are processed further.
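A sketch of the bin screen, assuming bins are held as a mapping from prefix to reads. The name `screen_bins` and the `min_reads` parameter are hypothetical; the slide does not give the threshold, so the example value is arbitrary.

```python
def screen_bins(bins, min_reads):
    """Sort bins by read abundance and keep those with at least min_reads reads."""
    ranked = sorted(bins.items(), key=lambda item: len(item[1]), reverse=True)
    return [(prefix, reads) for prefix, reads in ranked if len(reads) >= min_reads]


# Hypothetical bins keyed by prefix.
candidate_bins = {
    "TTTT": ["TTTTCCCC"],
    "ACGT": ["ACGTAAAA", "ACGTAAAT", "ACGTAAAA"],
    "GGGG": ["GGGGAAAA", "GGGGAAAC"],
}
kept_bins = screen_bins(candidate_bins, min_reads=2)
```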
14
Step 6: Screen for Suspect Content
Bins can be screened for:
eukaryotic content
sequences with low complexity
known sequences that may exhibit an unusually high level of biological repetition
15
Step 7: End-Trim Reads
Reads within a bin are trimmed to a uniform length.
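One simple way to reach a uniform length is to trim every read to the length of the shortest read in its bin. That rule is an assumption made for this sketch, since the slide does not say how the target length is chosen; `end_trim` is a hypothetical name.

```python
def end_trim(reads):
    """Trim all reads in a bin to the length of the shortest read."""
    if not reads:
        return []
    target = min(len(r) for r in reads)
    return [r[:target] for r in reads]


trimmed = end_trim(["ACGTACGTAC", "ACGTACGT", "ACGTACGTA"])
```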
16
Step 8: Construct Consensus
All reads in a bin undergo multiple sequence alignment (MSA); DRISEE currently uses QIIME for this.
The MSAs are used to generate a consensus sequence for each bin.
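Once the reads in a bin are aligned, a consensus can be taken column by column. The sketch below assumes the alignment has already been produced elsewhere, that gaps are written as `-`, and that the consensus is a simple per-column majority vote; none of these details are confirmed by the slide, and `consensus` is a hypothetical name.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the most common symbol in each column of an alignment."""
    if len({len(r) for r in aligned_reads}) > 1:
        raise ValueError("aligned reads must all have the same length")
    # zip(*reads) walks the alignment one column at a time.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Hypothetical aligned bin: one substitution and one gap among four reads.
bin_consensus = consensus(["ACGTAC", "ACGTAC", "ACCTAC", "ACGT-C"])
```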
17
Step 9: Compare
A base-by-base comparison is made between each individual read in a bin and the final consensus sequence for that bin.
Only the non-prefix portion of the read is compared.
Each read is scored at each position as either a match or a mismatch:
MATCH: A, T, C, G, insertion or deletion
MISMATCH: Asub, Tsub, Csub, Gsub, insertion or deletion
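The comparison can be sketched as a function that labels each non-prefix position of one aligned read. Several details here are assumptions for illustration: the read and consensus are already aligned with `-` for gaps, a substitution is labeled with the read's base (for example `Csub`), and any position where exactly one of the two has a gap is labeled `indel`. The name `compare_read` is hypothetical.

```python
def compare_read(read, cons, prefix_len):
    """Label each non-prefix position of an aligned read against the consensus."""
    labels = []
    for r, c in zip(read[prefix_len:], cons[prefix_len:]):
        if r == c:
            labels.append(r)           # match: A, T, C, G (or a shared gap "-")
        elif "-" in (r, c):
            labels.append("indel")     # gap in the read or in the consensus only
        else:
            labels.append(r + "sub")   # substitution, named after the read's base
    return labels


# Hypothetical aligned pair with a 4-base prefix that is skipped.
labels = compare_read("ACGTACCT-A", "ACGTACGTA-", prefix_len=4)
```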
18
Step 10: Construct Bin-level DRISEE profiles
Deviations and matches for all reads in a bin are tallied with respect to position and the corresponding consensus.
The consensus-position-indexed table of matches and mismatches for a bin represents its bin-level DRISEE profile.
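A bin-level profile can be pictured as one tally per consensus position. The sketch below is an assumption-laden illustration rather than DRISEE's data structure: it stores a `Counter` per non-prefix position, uses labels such as `G`, `Csub`, and `indel`, and expects reads that are already aligned to the consensus. The name `bin_profile` is hypothetical.

```python
from collections import Counter


def bin_profile(reads, cons, prefix_len):
    """Tally match and mismatch labels per non-prefix consensus position."""
    profile = [Counter() for _ in cons[prefix_len:]]
    for read in reads:
        pairs = zip(read[prefix_len:], cons[prefix_len:])
        for pos, (r, c) in enumerate(pairs):
            if r == c:
                label = r              # match
            elif "-" in (r, c):
                label = "indel"        # insertion or deletion
            else:
                label = r + "sub"      # substitution, named after the read's base
            profile[pos][label] += 1
    return profile


# Hypothetical bin of three reads with a 2-base prefix.
profile = bin_profile(["AAGT", "AAGT", "AACT"], cons="AAGT", prefix_len=2)
```

Each list index corresponds to a consensus position after the prefix, so the table is indexed by consensus position.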
19
Step 11: Construct run-level DRISEE Profiles
DRISEE profiles for all considered bins in a given run can be combined to produce a DRISEE error profile for the run.
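Combining profiles can be sketched as position-wise addition of the per-bin tallies. The representation (a list of `Counter` objects per bin), the helper names `combine_profiles` and `percent_error`, and the definition of error as any label other than a plain base match are assumptions made for this example.

```python
from collections import Counter
from itertools import zip_longest


def combine_profiles(profiles):
    """Sum per-position tallies across bins; shorter profiles simply stop early."""
    combined = []
    for column in zip_longest(*profiles, fillvalue=Counter()):
        total = Counter()
        for counts in column:
            total.update(counts)  # adds the counts position by position
        combined.append(total)
    return combined


def percent_error(profile):
    """Percent of tallied calls at each position that are not A, C, G, or T matches."""
    rates = []
    for counts in profile:
        total = sum(counts.values())
        errors = sum(n for label, n in counts.items() if label not in "ACGT")
        rates.append(100.0 * errors / total if total else 0.0)
    return rates


# Two hypothetical bin-level profiles of different lengths.
bin_a = [Counter({"G": 2, "Csub": 1}), Counter({"T": 3})]
bin_b = [Counter({"G": 4})]
run_profile = combine_profiles([bin_a, bin_b])
run_error = percent_error(run_profile)
```

The same addition could be applied again across runs to build a profile for a group of samples.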
20
Step 12: Sample Group Level DRISEE profile Construction
A group-level profile is constructed by combining DRISEE profiles for all bins in a group of runs.
DRISEE profiles and the data they contain can be visualized directly or processed further to generate detailed analyses of the information they contain.