Presented By: Chinua Umoja

A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

DRISEE
DRISEE was designed primarily as a means to assess metagenomic sequencing error in samples submitted to MG-RAST. MG-RAST (from its website) is an automated analysis platform for metagenomes that provides quantitative insights into microbial populations. The group also provides platform-independent code that allows users to perform analyses on their own.

Figure 2. DRISEE performance on simulated and real data.
Keegan KP, Trimble WL, Wilkening J, Wilke A, et al. (2012) A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE. PLoS Comput Biol 8(6): e doi: /journal.pcbi

Figure 3. Total DRISEE errors of genomic and metagenomic data produced by 454 and Illumina technologies.

Figure 4. DRISEE error profiles for metagenomic sequencing data sets.

Figure 5. DRISEE calculated Errors, separated by error type, for 454 and Illumina metagenomic samples.

Step 1: Sequence Data
DRISEE considers data in FASTA format in an attempt to remain platform independent. The input consists of all sequenced reads produced in a single run or sample.
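As a minimal sketch of what reading FASTA input could look like, the Python snippet below collects reads into a dictionary keyed by read identifier. The function name `parse_fasta` and the example reads are hypothetical and are not taken from the DRISEE code.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {read_id: sequence} from FASTA-formatted lines (illustrative only)."""
    reads: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            reads[current_id] = ""
        elif current_id is not None:
            # Sequences may span several lines; join them into one string.
            reads[current_id] += line.upper()
    return reads


# Hypothetical two-read input.
example = """>read1 sample run
ACGTAC
GTTA
>read2
ACGTNN
""".splitlines()

reads = parse_fasta(example)
print(reads)
```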

Step 2: Check for Random Sequencing
DRISEE was designed to consider sequences generated by random (shotgun) procedures, that is, a random fragment sequencing application used on a sample derived from a pool of organisms. Samples that exhibit nonrandom sequence patterns in their prefixes are excluded.

Step 3: Screen Reads
Sequence data goes through a two-step filtering procedure that removes two types of reads: reads of uncharacteristic length (by default, those farther than two standard deviations from the mean length) and reads with ambiguous base calls.
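The two filters can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DRISEE implementation: the function name `screen_reads` is hypothetical, the population standard deviation is used (the slides do not say which estimator applies), and any character other than A, C, G, or T is treated as an ambiguous call.

```python
from statistics import mean, pstdev
from typing import List


def screen_reads(reads: List[str], max_sd: float = 2.0) -> List[str]:
    """Drop reads with outlying lengths or ambiguous bases (illustrative only)."""
    if not reads:
        return []
    lengths = [len(r) for r in reads]
    mu = mean(lengths)
    sd = pstdev(lengths)  # assumption: population standard deviation
    kept = []
    for read in reads:
        if abs(len(read) - mu) > max_sd * sd:
            continue  # length is farther than max_sd deviations from the mean
        if set(read.upper()) - set("ACGT"):
            continue  # contains an ambiguous base call such as N
        kept.append(read)
    return kept


# Hypothetical input: eight clean reads, one with an N, one much longer read.
reads = ["ACGTACGTAC"] * 8 + ["ACGTNCGTAC"] + ["A" * 30]
kept = screen_reads(reads)
print(len(kept))
```

In this example the mean length is 12 and the standard deviation is 6, so the 30-base read falls outside the two-deviation window and the read containing N is removed by the second filter.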

Step 4: Bin Reads
A list of the unique prefixes in the screened FASTA file is generated. The remaining reads are grouped according to their prefix.
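Grouping by prefix amounts to a dictionary keyed on the leading bases of each read. The sketch below assumes a caller-supplied `prefix_len`; the slides do not state the prefix length, so the value used here is chosen only to keep the example small, and the function name `bin_by_prefix` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def bin_by_prefix(reads: List[str], prefix_len: int) -> Dict[str, List[str]]:
    """Group reads that share the same leading prefix_len bases (illustrative only)."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        if len(read) < prefix_len:
            continue  # too short to carry a full prefix
        bins[read[:prefix_len]].append(read)
    return dict(bins)


# Hypothetical reads and a deliberately short prefix length.
reads = ["ACGTAA", "ACGTCC", "TTGAAA", "ACGTAG"]
bins = bin_by_prefix(reads, prefix_len=4)
print(bins)
```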

Step 5: Screen Bins
Bins are sorted by read abundance. Bins that contain at least a minimum number of reads are processed further.
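A short Python sketch of this screening step follows. The threshold `min_reads` is a hypothetical parameter (the slides do not give the value), and `screen_bins` is an illustrative name rather than a function from the DRISEE code.

```python
from typing import Dict, List, Tuple


def screen_bins(bins: Dict[str, List[str]], min_reads: int) -> List[Tuple[str, List[str]]]:
    """Sort bins by read count (largest first) and keep those with enough reads."""
    ranked = sorted(bins.items(), key=lambda item: len(item[1]), reverse=True)
    return [(prefix, members) for prefix, members in ranked if len(members) >= min_reads]


# Hypothetical bins keyed by prefix.
bins = {
    "TTGA": ["TTGAAA"],
    "ACGT": ["ACGTAA", "ACGTCC", "ACGTAG"],
    "GGCC": ["GGCCAT", "GGCCAA"],
}
selected = screen_bins(bins, min_reads=2)
print([prefix for prefix, _ in selected])
```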

Step 6: Screen for Suspect Content
Bins can be screened for eukaryotic content, sequences with low complexity, and known sequences that may exhibit an unusually high level of biological repetition.

Step 7: End-Trim Reads
Reads within a bin are trimmed to a uniform length.
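One simple way to obtain a uniform length is to cut every read in a bin to the length of the shortest read. The slides do not say how the target length is chosen, so this choice is an assumption, and `end_trim` is a hypothetical name.

```python
from typing import List


def end_trim(reads: List[str]) -> List[str]:
    """Trim each read to the length of the shortest read in the bin.

    Assumption: the shortest read sets the uniform length.
    """
    if not reads:
        return []
    target = min(len(read) for read in reads)
    return [read[:target] for read in reads]


# Hypothetical bin whose reads differ only in length.
trimmed = end_trim(["ACGTAAGG", "ACGTAA", "ACGTAAG"])
print(trimmed)
```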

Step 8: Construct Consensus
All reads in a bin undergo multiple sequence alignment (MSA), currently performed with QIIME. The MSA is used to generate a consensus sequence.
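The alignment itself is outside the scope of a short example, but deriving a consensus from reads that are already aligned can be sketched as a column-wise majority vote. This is an assumption about how a consensus might be built, not a description of what QIIME or DRISEE does internally. Gaps are written as `-`, ties go to the character seen first in the column, and `column_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import List


def column_consensus(aligned: List[str]) -> str:
    """Majority-vote consensus over equal-length, pre-aligned reads (illustrative only)."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(read) != width for read in aligned):
        raise ValueError("aligned reads must all have the same length")
    # zip(*aligned) yields one tuple of characters per alignment column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


# Hypothetical alignment of four reads; the third column disagrees.
consensus = column_consensus(["ACGTA", "ACGTA", "ACCTA", "AC-TA"])
print(consensus)
```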

Step 9: Compare
A base-by-base comparison is made between each individual read in a bin and the final consensus sequence for that bin. Only the non-prefix portion of the read is compared. The read is scored at each position as either a match or a mismatch:
MATCH: A, T, C, G, insertion or deletion
MISMATCH: Asub, Tsub, Csub, Gsub, insertion or deletion
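The scoring can be sketched in Python as follows, under several assumptions that do not come from the slides: the read and consensus are already aligned to equal length, gaps are written as `-`, insertions and deletions share a single `indel` label, and a substitution is labeled by the base observed in the read. The name `compare_to_consensus` is hypothetical.

```python
from typing import List, Tuple


def compare_to_consensus(read: str, consensus: str, prefix_len: int) -> List[Tuple[int, str, str]]:
    """Score each non-prefix position as (position, outcome, label)."""
    if len(read) != len(consensus):
        raise ValueError("read and consensus must be aligned to the same length")
    calls = []
    for pos in range(prefix_len, len(consensus)):
        r, c = read[pos], consensus[pos]
        if r == c:
            # A shared gap counts as a matching insertion or deletion.
            calls.append((pos, "match", "indel" if r == "-" else r))
        elif r == "-" or c == "-":
            calls.append((pos, "mismatch", "indel"))
        else:
            # Assumption: label the substitution by the base seen in the read.
            calls.append((pos, "mismatch", r + "sub"))
    return calls


# Hypothetical aligned read and consensus with a four-base prefix.
calls = compare_to_consensus("ACGTAC-T", "ACGTAGGT", prefix_len=4)
print(calls)
```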

Step 10: Construct Bin-level DRISEE profiles
Deviations and matches for all reads in a bin are tallied with respect to position and the corresponding consensus. The consensus-position-indexed table of matches and mismatches for a bin represents its bin-level DRISEE profile.
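A position-indexed tally of this kind can be sketched with nested counters. The example below is a simplification: it records only match and mismatch totals per position rather than per-base categories, it assumes reads are already aligned to the consensus length, and `bin_profile` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def bin_profile(aligned_reads: List[str], consensus: str, prefix_len: int) -> Dict[int, Dict[str, int]]:
    """Tally matches and mismatches per consensus position, skipping the prefix."""
    profile: Dict[int, Counter] = defaultdict(Counter)
    for read in aligned_reads:
        for pos in range(prefix_len, len(consensus)):
            outcome = "match" if read[pos] == consensus[pos] else "mismatch"
            profile[pos][outcome] += 1
    return {pos: dict(counts) for pos, counts in profile.items()}


# Hypothetical bin of three reads sharing the prefix ACGT.
profile = bin_profile(["ACGTAA", "ACGTAC", "ACGTTA"], "ACGTAA", prefix_len=4)
print(profile)
```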

Step 11: Construct run-level DRISEE Profiles
DRISEE profiles for all considered bins in a given run can be combined to produce the DRISEE error profile for the run.
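Combining profiles can be sketched as summing counts position by position. In the Python example below, each profile is assumed to be a dictionary mapping a position to match and mismatch counts. This representation, the function names, and the summary rate (mismatches divided by all scored positions) are illustrative assumptions, not the DRISEE definitions.

```python
from collections import Counter
from typing import Dict, Iterable

Profile = Dict[int, Dict[str, int]]


def combine_profiles(profiles: Iterable[Profile]) -> Profile:
    """Sum per-position counts across several bin-level profiles."""
    total: Dict[int, Counter] = {}
    for profile in profiles:
        for pos, counts in profile.items():
            total.setdefault(pos, Counter()).update(counts)
    return {pos: dict(counts) for pos, counts in sorted(total.items())}


def error_rate(profile: Profile) -> float:
    """Fraction of scored positions that are mismatches (assumed summary)."""
    matches = sum(c.get("match", 0) for c in profile.values())
    mismatches = sum(c.get("mismatch", 0) for c in profile.values())
    scored = matches + mismatches
    return mismatches / scored if scored else 0.0


# Hypothetical bin-level profiles from one run.
bin_a = {4: {"match": 18, "mismatch": 2}, 5: {"match": 19, "mismatch": 1}}
bin_b = {4: {"match": 27, "mismatch": 3}, 5: {"match": 30}}

run_profile = combine_profiles([bin_a, bin_b])
run_rate = error_rate(run_profile)
print(run_profile, run_rate)
```

The same summation would apply one level up, with run-level profiles as the inputs.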

Step 12: Sample Group Level DRISEE profile Construction
A sample-group-level profile is constructed by combining DRISEE profiles for all bins in a group of runs. DRISEE profiles and the data they contain can be visualized directly or processed further to generate detailed analyses of the information they contain.
