RDP – Capturing the Unclassified
Use only on data that can be publicly shared. These are not secure tools.
Genboree RDP Output Tutorial 2 Dataset – QIIME chimeras removed – RDP Sample Period
Download files Raw.results.tar.gz
Unarchive and Decompress
– Use 7zip
– Seq.fna
Open in Bioedit
In Bioedit:
– Ctrl+A – to select all sequences
– Shift+Ctrl+C – to copy all sequence titles
In Excel:
– Paste into Excel.
– In Column B (or other): =left(a1,number_of_characters_in_titles)
– Ctrl+Shift+Down arrow
– Ctrl+D – to copy to all cells below
Check your work. Select only your samples. Do not select blank cells. Copy the correct titles.
In Bioedit:
– Paste over titles
– Save as: your_filename.fas
– In the pull down menu – choose fasta
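The title-trimming steps above can also be sketched in a few lines of Python. This is only a minimal illustration, not part of the tutorial workflow: the function name `truncate_fasta_titles` and the example titles are hypothetical, and the sketch assumes every title line starts with `>` and that keeping the first `n_chars` characters of each title is what you want, as with the =left() formula.

```python
def truncate_fasta_titles(lines, n_chars):
    """Yield FASTA lines with each title cut to its first n_chars characters."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep the '>' marker, then the first n_chars of the title text,
            # mirroring =left(a1, n_chars) applied to the copied titles.
            yield ">" + line[1:][:n_chars]
        else:
            # Sequence lines pass through unchanged.
            yield line


# Hypothetical example input; real titles come from Seq.fna.
example = [
    ">SampleA_0001 extra info",
    "ACGT",
    ">SampleB_0002 extra info",
    "GGCC",
]
truncated = list(truncate_fasta_titles(example, 7))
```

As with the spreadsheet route, check the result before using it: a title length that is too short can make different samples look identical.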
rdp.cme.msu.edu
Make an Account
For very tiny datasets
Do not navigate away
For pyrosequenced datasets
You can navigate away and pick up the results later.
Check in while running?
Done: Download
What do you get back?
– Confidence file
– Classifications
– Failed classifications: check this file. Problems have happened if it is not empty.
– Hierarchy
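The "problems have happened if not empty" check can be done by looking at the file size. The sketch below is a hedged illustration only: the function name and the file names are hypothetical (use whatever name your download gives the failed classifications file), and the temporary files exist just to show both outcomes.

```python
import os
import tempfile


def has_failed_classifications(path):
    """Return True if the file at path contains anything at all."""
    return os.path.getsize(path) > 0


# Demonstration with throwaway files; the names are made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "failed_empty.txt")
    open(empty_path, "w").close()  # zero-byte file: nothing failed

    nonempty_path = os.path.join(tmp, "failed_nonempty.txt")
    with open(nonempty_path, "w") as handle:
        handle.write("seq_17\n")  # one failed sequence recorded

    empty_result = has_failed_classifications(empty_path)
    nonempty_result = has_failed_classifications(nonempty_path)
```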
Open classifications in Excel
Focus on Phylum for the tutorial. Any level can be used.
For tutorial ease, condense sample periods
Keep it Tidy
Cut out what isn’t needed or used.
Confidence in the Classification
– Sort on the confidence level
– Odd groups – Leave in or take out?
– Replace those below your confidence level with an Unclassified_ label
=concatenate($column$row,cell)
$ keeps the column or row static in your formula as you drag to multiple cells
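The relabeling step can be sketched outside the spreadsheet as follows. This is an illustration under stated assumptions: the 0.8 cutoff is a placeholder, not a value from the tutorial, and the sketch assumes the Unclassified_ prefix is joined to the same taxon name that fell below the cutoff. The function name and example rows are hypothetical.

```python
def label_with_confidence(taxon, confidence, threshold=0.8):
    """Prefix a taxon with 'Unclassified_' when its confidence is below threshold.

    The default threshold is an assumption for the example; choose your own.
    """
    if confidence < threshold:
        return "Unclassified_" + taxon
    return taxon


# Hypothetical (phylum, confidence) pairs.
rows = [("Firmicutes", 0.95), ("Firmicutes", 0.42), ("Bacteroidetes", 0.80)]
labels = [label_with_confidence(taxon, conf) for taxon, conf in rows]
```

A value exactly at the threshold is kept as classified here; flip the comparison if you prefer the stricter rule.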
Copy to a new column, then Remove Duplicates
Even at the Phylum Level 60 categorical levels – (could be 2 for every known phylum)
To count by sample and phylum classification
=countifs($K:$K,$O2,$A:$A,P$1)
Learn how to stop recalculation and manually restart it – don’t crash your machine! You can easily cause hours of computation on large matrices!
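For comparison, the same sample-by-phylum count can be sketched in Python with a single pass over the data, which avoids the recalculation cost. This is a minimal sketch, not the tutorial's method: the function name and the example records are hypothetical, and the input is assumed to be (sample, phylum) pairs, one per classified sequence.

```python
from collections import Counter


def count_by_sample_and_phylum(records):
    """Build a phylum-by-sample count table from (sample, phylum) pairs."""
    records = list(records)
    counts = Counter(records)  # (sample, phylum) -> number of sequences
    samples = sorted({sample for sample, _ in records})
    phyla = sorted({phylum for _, phylum in records})
    # Rows are phyla and columns are samples, like the COUNTIFS grid.
    table = {
        phylum: {sample: counts[(sample, phylum)] for sample in samples}
        for phylum in phyla
    }
    return samples, table


# Hypothetical example records.
records = [
    ("S1", "Firmicutes"),
    ("S1", "Firmicutes"),
    ("S1", "Bacteroidetes"),
    ("S2", "Firmicutes"),
    ("S2", "Unclassified_Firmicutes"),
]
samples, table = count_by_sample_and_phylum(records)
```

Combinations that never occur are filled with 0, so every row has a value for every sample.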
Stop Automatic Recalculation
– In the Options menu, under Formulas
– Press F9 to recalculate manually
Fill Formulas and Check Cells
Copy Whole and Paste As Values
Sum Rows and Sort On (Your Favorite)
– Total is customary
– Can rearrange as needed
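Summing each row and sorting on the total can be sketched like this. The table layout (phylum mapped to per-sample counts), the function name, and the example numbers are all assumptions made for illustration.

```python
def sort_rows_by_total(table):
    """Return (taxon, counts, total) tuples ordered from largest total to smallest."""
    totals = {taxon: sum(counts.values()) for taxon, counts in table.items()}
    order = sorted(table, key=lambda taxon: totals[taxon], reverse=True)
    return [(taxon, table[taxon], totals[taxon]) for taxon in order]


# Hypothetical counts per phylum and sample.
example_table = {
    "Bacteroidetes": {"S1": 1, "S2": 2},
    "Firmicutes": {"S1": 5, "S2": 4},
    "Unclassified_Firmicutes": {"S1": 0, "S2": 1},
}
ranked = sort_rows_by_total(example_table)
ranked_names = [taxon for taxon, _, _ in ranked]
```

Sorting on a different key, such as a single sample's column, only requires changing the `key` function.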
Select Data and Titles Only
Make a 100% Stacked Chart
Not very pretty
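A 100% stacked chart rescales each sample so its categories sum to 100. The arithmetic behind it can be sketched as below; this computes the percentages only and does not draw anything. The function name, table layout, and example counts are hypothetical.

```python
def to_percentages(table):
    """Convert per-sample counts to percentages of each sample's total."""
    samples = sorted({sample for counts in table.values() for sample in counts})
    totals = {
        sample: sum(counts.get(sample, 0) for counts in table.values())
        for sample in samples
    }
    return {
        taxon: {
            # Guard against empty samples to avoid dividing by zero.
            sample: (100.0 * counts.get(sample, 0) / totals[sample]) if totals[sample] else 0.0
            for sample in samples
        }
        for taxon, counts in table.items()
    }


# Hypothetical counts per phylum and sample.
counts_table = {
    "Firmicutes": {"S1": 3, "S2": 1},
    "Bacteroidetes": {"S1": 1, "S2": 1},
}
percent_table = to_percentages(counts_table)
```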
Switch Perspectives
Size Correctly
To Compare to Genboree RDP must be run png.result.tar.gz
What did we learn?
Some Problems Commonly Encountered
Column formatting is not always followed with RDP output. To get a clean graph with all taxonomic levels in one column, you may need to sort and remove sections of data.
– Some have additional levels of classification
– Some have fewer levels of classification
Additional Levels of Classification
– Delete
– Move over
Fewer Levels of Classification
Common Trouble Makers:
– Bacteroidetes
– Verrucomicrobia
– Acidobacteria
– Dehalococcoidetes
– Cyanobacteria
– Chloroplast
– Deltaproteobacteria
– OD1_genera_incertae_sedis
– TM7_genera_incertae_sedis
– Armatimonadetes
– WS3_genera_incertae_sedis
Move Over
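Rows with additional or fewer levels can be located before fixing them by hand. The sketch below flags rows whose number of fields differs from the most common width. It is only a rough aid under stated assumptions: it assumes each row has already been split into fields, the function name and example rows are hypothetical, and it does not attempt the delete or move-over repair itself.

```python
from collections import Counter


def flag_irregular_rows(rows):
    """Return (row_index, extra_fields) for rows that differ from the usual width.

    extra_fields is positive for additional levels and negative for fewer.
    """
    if not rows:
        return []
    widths = Counter(len(row) for row in rows)
    expected = widths.most_common(1)[0][0]  # the most frequent field count
    return [
        (index, len(row) - expected)
        for index, row in enumerate(rows)
        if len(row) != expected
    ]


# Hypothetical rows: sequence id followed by taxonomic levels.
example_rows = [
    ["seq1", "Bacteria", "Firmicutes"],
    ["seq2", "Bacteria", "Proteobacteria"],
    ["seq3", "Bacteria", "Cyanobacteria", "Chloroplast"],
    ["seq4", "Bacteria"],
    ["seq5", "Bacteria", "Bacteroidetes"],
]
irregular = flag_irregular_rows(example_rows)
```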