Presented By: Chinua Umoja


1 A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE
Presented By: Chinua Umoja

2 DRISEE
Designed primarily as a means to assess metagenomic sequencing error in samples submitted to MG-RAST.
MG-RAST (from its website): an automated analysis platform for metagenomes, providing quantitative insights into microbial populations based on sequence data.
The group also provides platform-independent code that allows users to perform analyses on their own.

4 Figure 2. DRISEE performance on simulated and real data.
Keegan KP, Trimble WL, Wilkening J, Wilke A, et al. (2012) A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE. PLoS Comput Biol 8(6): e doi: /journal.pcbi

5 Figure 3. Total DRISEE errors of genomic and metagenomic data produced by 454 and Illumina technologies.

6 Figure 4. DRISEE error profiles for metagenomic sequencing data sets.

7 Figure 5. DRISEE calculated Errors, separated by error type, for 454 and Illumina metagenomic samples.

9 Step 1: Sequence Data
DRISEE takes data in FASTA format, in an attempt to keep the method platform independent.
Input consists of all sequenced reads produced in a single run or sample.
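
As an illustration of the input step, here is a minimal sketch of a FASTA reader. The helper name `read_fasta` and the example records are hypothetical and are not taken from the DRISEE code.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence may be wrapped over several lines; collect the pieces.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical two-read input, held in memory instead of a file.
example = StringIO(">read1\nACGT\nACGT\n>read2\nTTGA\n")
records = list(read_fasta(example))
```

Working from plain FASTA, with no quality scores, is what lets the same input path serve any sequencing platform.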

10 Step 2: Check for Random Sequencing
DRISEE was designed to consider sequences generated by random (shotgun) procedures.
Shotgun sequencing: random fragment sequencing applied to a sample derived from a pool of organisms.
Samples that exhibit nonrandom sequence patterns in their prefixes are excluded.

11 Step 3: Screen Reads
Sequence data goes through a two-step filtering procedure that removes two types of reads:
Reads with uncharacteristic lengths (by default, more than two standard deviations from the mean length)
Reads with ambiguous base calls
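
The two filters can be sketched as follows. This is an illustration only: the function name `screen_reads`, the use of the population standard deviation, and the rule that any character other than A, C, G, or T counts as ambiguous are assumptions, not details from the slides.

```python
from statistics import mean, pstdev

def screen_reads(reads, max_sd=2.0):
    """Drop reads with outlying lengths or ambiguous base calls."""
    if not reads:
        return []
    lengths = [len(r) for r in reads]
    mu = mean(lengths)
    sd = pstdev(lengths)  # assumption: population standard deviation
    kept = []
    for r in reads:
        if abs(len(r) - mu) > max_sd * sd:
            continue  # length is farther than max_sd deviations from the mean
        if set(r) - set("ACGT"):
            continue  # assumption: anything outside A/C/G/T is ambiguous
        kept.append(r)
    return kept

# Hypothetical reads: nine clean, one with an N, one very short.
reads = ["ACGTACGT"] * 9 + ["ACGTNCGT", "AC"]
kept = screen_reads(reads)
```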

12 Step 4: Bin Reads
A list of the unique prefixes in the screened FASTA file is generated.
The remaining reads are grouped according to their prefix.
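
Binning by prefix amounts to grouping reads under a dictionary key. In this sketch the name `bin_by_prefix` and the `prefix_len` parameter are hypothetical; the slides do not state the prefix length, so it is left to the caller.

```python
from collections import defaultdict

def bin_by_prefix(reads, prefix_len):
    """Group reads that share the same first prefix_len bases."""
    bins = defaultdict(list)
    for r in reads:
        # Reads shorter than the prefix cannot be assigned to a bin.
        if len(r) >= prefix_len:
            bins[r[:prefix_len]].append(r)
    return dict(bins)

# Hypothetical reads with a 4-base prefix, chosen only for readability.
bins = bin_by_prefix(["ACGTAA", "ACGTCC", "TTTTGG", "AC"], prefix_len=4)
```

The keys of the returned dictionary are the list of unique prefixes.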

13 Step 5: Screen Bins
Bins are sorted by read abundance.
Bins that contain at least a minimum number of reads are processed further.
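
A possible sketch of the bin screen, sorting by abundance and applying a threshold. The name `screen_bins` and the `min_reads` parameter are hypothetical, and the threshold value below is chosen only for the example.

```python
def screen_bins(bins, min_reads):
    """Return (prefix, reads) pairs, most abundant first, above a size cutoff."""
    ranked = sorted(bins.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(prefix, reads) for prefix, reads in ranked if len(reads) >= min_reads]

# Hypothetical bins keyed by prefix.
bins = {
    "AAAA": ["AAAAC"],
    "ACGT": ["ACGTA", "ACGTC", "ACGTG"],
    "TTTT": ["TTTTA", "TTTTC"],
}
selected = screen_bins(bins, min_reads=2)
```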

14 Step 6: Screen for Suspect Content
Bins can be screened for:
Eukaryotic content
Low-complexity sequences
Known sequences that may exhibit an unusually high level of biological repetition

15 Step 7: End-Trim Reads
Reads within a bin are trimmed to a uniform length.
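
End-trimming could look like the sketch below. It assumes the uniform length is that of the shortest read in the bin, which the slides do not specify; `end_trim` is a hypothetical name.

```python
def end_trim(reads):
    """Trim every read in a bin to the length of the shortest read."""
    if not reads:
        return []
    n = min(len(r) for r in reads)  # assumption: shortest read sets the length
    return [r[:n] for r in reads]

trimmed = end_trim(["ACGTAC", "ACGT", "ACGTA"])
```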

16 Step 8: Construct Consensus
All reads in a bin undergo multiple sequence alignment (MSA); DRISEE currently uses QIIME for this.
The MSA is used to generate a consensus sequence.
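
To illustrate the consensus step only, the sketch below assumes the alignment has already been produced by an external tool and is given as equal-length rows with `-` for gaps. It takes a simple per-column majority vote, which is a stand-in and not necessarily how DRISEE derives its consensus; `consensus` is a hypothetical name.

```python
from collections import Counter

def consensus(aligned):
    """Majority-vote consensus over equal-length aligned rows."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned rows must be non-empty and of equal length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

# Hypothetical aligned bin; column 3 holds G, G, C and a gap.
cons = consensus(["ACGTA", "ACGTA", "ACCTA", "AC-TA"])
```

Ties in a column are resolved here by whichever symbol `Counter` saw first, so a real implementation would need an explicit tie rule.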

17 Step 9: Compare
A base-by-base comparison is made between each individual read in a bin and the final consensus sequence for that bin.
Only the non-prefix portion of the read is compared.
Each position of the read is scored as either a match or a mismatch:
MATCH: A, T, C, G
MISMATCH: Asub, Tsub, Csub, Gsub, insertion, or deletion
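
The comparison can be sketched as a per-position labeling of one aligned read against the consensus. Several details here are assumptions: both strings are aligned to the same length with `-` for gaps, a substitution is labeled by the base found in the read, and `score_read` and `prefix_len` are hypothetical names.

```python
def score_read(read, cons, prefix_len):
    """Label each non-prefix position of an aligned read against the consensus."""
    labels = []
    for r, c in zip(read[prefix_len:], cons[prefix_len:]):
        if r == c:
            labels.append(r)            # match, recorded by base
        elif r == "-":
            labels.append("deletion")   # consensus has a base, read has a gap
        elif c == "-":
            labels.append("insertion")  # read has a base, consensus has a gap
        else:
            labels.append(r + "sub")    # assumption: named after the read's base
    return labels

# Hypothetical read and consensus sharing the 4-base prefix ACGT.
labels = score_read("ACGTAC-TA", "ACGTACGTT", prefix_len=4)
```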

18 Step 10: Construct Bin-level DRISEE Profiles
Deviations and matches for all reads in a bin are tallied with respect to position and the corresponding consensus.
The resulting table of matches and mismatches, indexed by consensus position, is the bin-level DRISEE profile.
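
One way to picture a bin-level profile is a list with one counter per consensus position. The sketch assumes each read has already been scored into a list of labels such as `A`, `Tsub`, or `deletion`, all of equal length; `bin_profile` is a hypothetical name.

```python
from collections import Counter

def bin_profile(scored_reads):
    """Tally match and mismatch labels at each consensus position."""
    # zip(*scored_reads) gathers the labels of all reads position by position.
    return [Counter(column) for column in zip(*scored_reads)]

# Hypothetical scores for three reads over three positions.
profile = bin_profile([
    ["A", "C", "Tsub"],
    ["A", "C", "G"],
    ["A", "deletion", "G"],
])
```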

19 Step 11: Construct Run-level DRISEE Profiles
The DRISEE profiles for all considered bins in a given run can be combined to produce a DRISEE error profile for the run.
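
Combining profiles can be sketched as summing counters position by position. The sketch assumes each bin-level profile is a list of per-position counters and that positions are comparable across bins; `combine_profiles`, `percent_error`, and the label set are hypothetical.

```python
from collections import Counter

MATCH_LABELS = {"A", "C", "G", "T"}  # every other label counts as an error

def combine_profiles(profiles):
    """Sum per-position counters across bins, padding to the longest profile."""
    length = max((len(p) for p in profiles), default=0)
    combined = [Counter() for _ in range(length)]
    for p in profiles:
        for i, counts in enumerate(p):
            combined[i].update(counts)
    return combined

def percent_error(profile):
    """Errors as a percentage of all scored positions in a profile."""
    total = sum(sum(c.values()) for c in profile)
    errors = sum(n for c in profile for label, n in c.items()
                 if label not in MATCH_LABELS)
    return 100.0 * errors / total if total else 0.0

# Two hypothetical bin-level profiles of different lengths.
bin1 = [Counter({"A": 3}), Counter({"C": 2, "deletion": 1})]
bin2 = [Counter({"A": 1, "Tsub": 1})]
run = combine_profiles([bin1, bin2])
```

The same summation applies one level up, when the profiles of several runs are pooled.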

20 Step 12: Sample Group Level DRISEE Profile Construction
Group-level profiles are produced by combining the DRISEE profiles for all bins in a group of runs.
DRISEE profiles and the data they contain can be visualized directly or processed further to generate detailed analyses of the information they contain.
