Update on HTProcess Apps Sciplant May 8, 2014
HTProcessPipeline Purpose
– Provide a more functional set of commonly needed applications for RNA-Seq and genome assembly
– Provide tools that let bio-scientists spend more time considering the science of their data analysis path, and less time mousing, clicking, and typing
– Key attributes: pipeline analysis environment, documentation of the analysis, smart information management (including metadata)
Current Active List
HTProcess_fastqc-0.1
– Creates the main HTProcess directory of read files
– Creates a manifest file that describes the reads in a library of sequencing files
– Runs FastQC on each read file
– For paired read files, tests for proper pairing of reads
– Takes up to 3 different folders of reads: left reads, right reads, and unpaired reads
– Prepares a single report that opens in the user’s browser with a click
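The slide does not say how the pairing test works. As an illustration only, a read-level check can walk two FASTQ files in parallel and compare read names record by record. This sketch assumes that "proper pairing" means equal record counts and matching names once an older-style /1 or /2 mate tag is removed; `read_names` and `properly_paired` are hypothetical names, not part of HTProcess_fastqc.

```python
# Sketch only: a read-by-read pairing check for two FASTQ files.
# Assumption: "proper pairing" means both files contain the same number of
# records and the i-th read names match after dropping a /1 or /2 suffix.
from itertools import zip_longest
from typing import Iterable, Iterator


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read name from the header line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # First whitespace-separated token of the header, without the leading '@'.
            name = line.split()[0][1:]
            # Older Illumina headers mark mates with /1 and /2; drop that tag.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def properly_paired(left: Iterable[str], right: Iterable[str]) -> bool:
    """True if both inputs list the same reads in the same order."""
    for a, b in zip_longest(read_names(left), read_names(right)):
        if a != b:  # a None here means one file ran out of records first
            return False
    return True
```

Open file handles iterate line by line, so `properly_paired(open(path1), open(path2))` works as well as lists of lines.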
HTProcess_fastqc-0.1 Example: fastqc_summary.html
HTProcess_Reads
HTProcess.log
HTPROCESS1 Mon May 5 15:38:12 MST 2014
fastqc is finished testing 2 files in the first paired read directory.
fastqc is finished testing 2 files in the second paired read directory.
fastqc is finished testing 1 files in the directory for single reads.
Reads1 and Reads2 have the same number of files. Testing for valid pairing.
SRR sra_1.fastq,SRR sra_2.fastq properly ordered
SRR sra_1.fastq,SRR sra_2.fastq properly ordered
All Trim settings have been set to trim settings 1. Edit them on manifest_file.txt to customize trimming.
Starting creation of summary file for FASTQC reports
First Phase of HTPROCESS1 FINISHED Mon May 5 15:46:07 MST 2014
The summary file for all the FASTQC reports has been created.
HTPROCESS-FASTQC FINISHED Mon May 5 15:46:40 MST 2014
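The log shows a file-level check before any read-level one: the Reads1 and Reads2 directories must hold the same number of files, and each left/right pair must be "properly ordered". A minimal sketch of that step, assuming a hypothetical naming rule in which mates differ only by a trailing _1/_2 before a .fastq or .fq extension (the real rule used by HTProcess_fastqc is not shown):

```python
# Sketch only: match files from the Reads1 and Reads2 directories.
# Assumption (hypothetical naming rule): mates share a name apart from a
# trailing _1 / _2 just before the .fastq or .fq extension.
import re
from typing import List, Tuple

MATE_TAG = re.compile(r"_[12](?=\.f(?:ast)?q$)")


def pair_files(reads1: List[str], reads2: List[str]) -> List[Tuple[str, str]]:
    """Return (left, right) file pairs, or raise ValueError on a mismatch."""
    if len(reads1) != len(reads2):
        raise ValueError("Reads1 and Reads2 have different numbers of files")
    pairs = []
    # Sorting both listings puts matching mates at the same position.
    for left, right in zip(sorted(reads1), sorted(reads2)):
        # With the mate tag removed, the two names must be identical.
        if MATE_TAG.sub("", left) != MATE_TAG.sub("", right):
            raise ValueError(f"{left},{right} not properly ordered")
        pairs.append((left, right))
    return pairs
```

Failing loudly on the first mismatch mirrors the log's behaviour of confirming each pair before moving on to trimming settings.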
Manifest File - example
HTProcess1_Reads
Library_name=testfiles
Library_num=1
condition=testing
pairing=paired_and_unpaired
pair_spacing=400
pair_sd=35
pair_type=fragment
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=78
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=76
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=78
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=76
encoding_fragSc_1.fq=1.9
max_len_fragSc_1.fq=101
library_max=101
Paired Reads
!PPP SRR sra_1.fastq,SRR sra_2.fastq
!PPP SRR sra_1.fastq,SRR sra_2.fastq
Reads1
!XXX SRR sra_1.fastq
!XXX SRR sra_1.fastq
Reads2
!YYY SRR sra_2.fastq
!YYY SRR sra_2.fastq
ReadsS
!ZZZ fragSc_1.fq
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
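Because later apps read the manifest to find their inputs, it helps to see how its two kinds of entries could be separated. This sketch assumes one entry per line, `key=value` header lines, `!TAG value` file entries, and `!!!…!!!` section banners, all inferred from the example above; `parse_manifest` is a hypothetical helper, not the HTProcess code. Values are kept as lists because keys such as `encoding_SRR sra_1.fastq` repeat.

```python
# Sketch only: split an HTProcess-style manifest into settings and file entries.
# Assumptions (inferred from the example, not confirmed): one entry per line;
# header lines are key=value; file entries start with a single "!" then a tag
# (PPP, XXX, YYY, ZZZ, PairTrim, ...); "!!!...!!!" lines are section banners;
# any other line is a label or comment and is skipped.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_manifest(lines: Iterable[str]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    settings: Dict[str, List[str]] = defaultdict(list)
    entries: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!!!"):
            continue  # blank line or banner such as !!!TRIM SETTINGS!!!
        if line.startswith("!"):
            tag, _, value = line[1:].partition(" ")
            # File names may contain spaces, so keep the whole remainder.
            entries[tag].append(value)
        elif "=" in line:
            # Split on the last "=" since keys embed file names.
            key, _, value = line.rpartition("=")
            settings[key].append(value)  # keys like encoding_<file> can repeat
    return dict(settings), dict(entries)
```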
Apps for creating the input directories, and for both creating them and running HTProcess_fastqc
HTProcess_trimmomatic_0.32
– Trimmomatic is a multi-function trimmer for paired or unpaired reads
– Basic trimming of a given number of bases from either end
– Removes contaminants that match sequences given by the user in a separate FASTA file, e.g. adapter and primer sequences
– 2 different methods for quality trimming
– Allows 2 different programs or sets of settings to be used with the reads in a library
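The features above correspond to Trimmomatic's own processing steps: `HEADCROP` and `CROP` for fixed trimming, `ILLUMINACLIP` for clipping sequences from a FASTA file, and `SLIDINGWINDOW` or `MAXINFO` as two alternative quality trimmers. The sketch below assembles a paired-end command in Trimmomatic's PE argument order and maps the manifest's encoding labels to quality flags (Illumina 1.5 is Phred+64; Illumina 1.8 and later are Phred+33). Which steps the app actually exposes, the numeric values, the jar path, and the `TrmUn1_`/`TrmUn2_` unpaired names are all assumptions.

```python
# Sketch only: assemble a Trimmomatic 0.32 paired-end command line.
# The PE argument order and step names are Trimmomatic's; the chosen steps,
# numeric values, jar path and TrmUn*_ names are placeholders/assumptions.
from typing import List, Optional

# Manifest encoding labels -> Trimmomatic quality flags.
# Illumina 1.5 uses Phred+64; Illumina 1.8 and later use Phred+33.
PHRED = {"1.5": "-phred64", "1.9": "-phred33"}


def trimmomatic_pe(left: str, right: str, encoding: str, adapters: str,
                   headcrop: int = 0, crop: Optional[int] = None,
                   quality_mode: str = "SLIDINGWINDOW", minlen: int = 36,
                   jar: str = "trimmomatic-0.32.jar") -> List[str]:
    # PE output order: read 1 paired, read 1 unpaired, read 2 paired, read 2 unpaired.
    outputs = [f"TrmPr1_{left}", f"TrmUn1_{left}", f"TrmPr2_{right}", f"TrmUn2_{right}"]
    # Clip sequences listed in the user's FASTA file (adapters, primers).
    steps = [f"ILLUMINACLIP:{adapters}:2:30:10"]
    if headcrop:
        steps.append(f"HEADCROP:{headcrop}")  # remove a fixed number of leading bases
    if crop is not None:
        steps.append(f"CROP:{crop}")  # cut reads to this length from the 3' end
    # Two alternative quality-trimming methods.
    if quality_mode == "SLIDINGWINDOW":
        steps.append("SLIDINGWINDOW:4:20")
    elif quality_mode == "MAXINFO":
        steps.append("MAXINFO:40:0.5")
    else:
        raise ValueError(f"unknown quality mode: {quality_mode}")
    steps.append(f"MINLEN:{minlen}")  # drop reads that end up too short
    return ["java", "-jar", jar, "PE", PHRED[encoding], left, right, *outputs, *steps]
```

Returning an argument list rather than a shell string means the result can be passed straight to `subprocess.run(cmd, check=True)` without quoting problems for file names that contain spaces, such as `SRR sra_1.fastq`.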
Manifest File - example
HTProcess1_Reads
Library_name=testfiles
Library_num=1
condition=testing
pairing=paired_and_unpaired
pair_spacing=400
pair_sd=35
pair_type=fragment
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=78
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=76
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=78
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=76
encoding_fragSc_1.fq=1.9
max_len_fragSc_1.fq=101
library_max=101
Paired Reads
!PPP SRR sra_1.fastq,SRR sra_2.fastq
!PPP SRR sra_1.fastq,SRR sra_2.fastq
Reads1
!XXX SRR sra_1.fastq
!XXX SRR sra_1.fastq
Reads2
!YYY SRR sra_2.fastq
!YYY SRR sra_2.fastq
ReadsS
!ZZZ fragSc_1.fq
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
Change to 2 to use a separate program to trim the reads in this file!
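Switching an entry from 1 to 2 sends its files to the second program or settings set. A minimal sketch of how the trim entries could be grouped by that number, assuming the `!PairTrim <n> <left>,<right>` and `!SingleTrim <n> <file>` forms shown above; `group_by_trim_setting` is a hypothetical name.

```python
# Sketch only: group manifest trim entries by their setting number (1 or 2).
# Assumption: entries look like "!PairTrim <n> <left>,<right>" or
# "!SingleTrim <n> <file>", as in the manifest example.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_trim_setting(lines: Iterable[str]) -> Dict[int, List[Tuple[str, ...]]]:
    groups: Dict[int, List[Tuple[str, ...]]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if line.startswith(("!PairTrim ", "!SingleTrim ")):
            # Split off the tag and the number; the rest may contain spaces.
            _, setting, files = line.split(" ", 2)
            # Paired entries hold "left,right"; single entries hold one file.
            groups[int(setting)].append(tuple(files.split(",")))
    return dict(groups)
```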
Inputs for HTProcess_trimmomatic_0.32
Settings for trimmomatic
Output Files for HTProcess_trimmomatic_0.32
Combined unpaired reads for the entire library
Output Files for HTProcess_trimmomatic_0.32
Individual single read files for those who want to run all reads in a single library
Manifest File
HTProcess1_Reads
Library_name=testfiles
Library_num=1
condition=testing
pairing=paired_and_unpaired
pair_spacing=400
pair_sd=35
pair_type=fragment
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=78
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=76
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=78
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=76
encoding_fragSc_1.fq=1.9
max_len_fragSc_1.fq=101
library_max=101
Paired Reads
!PPP SRR sra_1.fastq,SRR sra_2.fastq
!PPP SRR sra_1.fastq,SRR sra_2.fastq
Reads1
!XXX SRR sra_1.fastq
!XXX SRR sra_1.fastq
Reads2
!YYY SRR sra_2.fastq
!YYY SRR sra_2.fastq
ReadsS
!ZZZ fragSc_1.fq
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
!!!TRIMMED READS!!!
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_S TrmS_testfiles.fastq
!!!TRIMMED ORPHAN AND INDIVIDUAL SINGLES!!!
Not used for normal analysis with a completely uniform library
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_fragSc_1.fq
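The trimmed-read entries follow a visible naming pattern: `TrmPr1_`/`TrmPr2_` prefixes for paired output, one pooled `TrmS_<Library_name>.fastq`, and `TrmSos_` files for orphans and individual singles. This sketch reproduces that pattern from a library name and its input files; it is inferred only from the example above, and `trimmed_names` is a hypothetical helper.

```python
# Sketch only: derive trimmed-output names with the prefixes seen in the
# manifest example. Assumption: one TrmSos_ file per left read file and one
# per single-read file, matching the example's TRIMMED_OS entries.
from typing import Dict, List, Tuple


def trimmed_names(library: str, pairs: List[Tuple[str, str]],
                  singles: List[str]) -> Dict[str, List[str]]:
    return {
        # Paired output keeps the left/right pairing on one entry.
        "TRIMMED_Pr": [f"TrmPr1_{left},TrmPr2_{right}" for left, right in pairs],
        # All unpaired reads for the library pooled into one file.
        "TRIMMED_S": [f"TrmS_{library}.fastq"],
        # Orphans and individual singles, kept per input file.
        "TRIMMED_OS": [f"TrmSos_{left}" for left, _ in pairs]
                      + [f"TrmSos_{single}" for single in singles],
    }
```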
Manifest File
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
!!!TRIMMED READS!!!
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_S TrmS_testfiles.fastq
!!!TRIMMED ORPHAN AND INDIVIDUAL SINGLES!!!
Not used for normal analysis with a completely uniform library
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_fragSc_1.fq
Keep track of which reads are to be used for which analysis path with the entries in the manifest file.
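A downstream app can then choose its inputs from these tagged entries alone. In this sketch, the "library" path (normal analysis of a uniform library) takes the `TRIMMED_Pr` pairs plus the pooled `TRIMMED_S` file, and the "individual" path swaps in the `TRIMMED_OS` files. The path names and the choice to keep the pairs in both paths are assumptions; `reads_for_path` is hypothetical.

```python
# Sketch only: pick the trimmed files for one analysis path from manifest lines.
# Assumption: "library" uses TRIMMED_Pr pairs + the pooled TRIMMED_S file;
# "individual" uses the pairs + the per-file TRIMMED_OS singles.
from typing import Iterable, List, Tuple


def reads_for_path(lines: Iterable[str],
                   path: str = "library") -> Tuple[List[Tuple[str, str]], List[str]]:
    pairs: List[Tuple[str, str]] = []
    pooled: List[str] = []
    individual: List[str] = []
    for raw in lines:
        # Tag is the first token; the value may contain spaces.
        tag, _, value = raw.strip().partition(" ")
        if tag == "!TRIMMED_Pr":
            left, right = value.split(",")
            pairs.append((left, right))
        elif tag == "!TRIMMED_S":
            pooled.append(value)
        elif tag == "!TRIMMED_OS":
            individual.append(value)
    if path == "library":
        return pairs, pooled
    if path == "individual":
        return pairs, individual
    raise ValueError(f"unknown analysis path: {path}")
```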
HTProcess_tophat
– Nearly finished
– Produces BAM files for all trimmed reads
– Will also produce a merged BAM file representing the whole library
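The per-read-set and whole-library BAM files can be pictured as a short command plan: one TopHat run per trimmed pair or single file (TopHat writes `accepted_hits.bam` into its `-o` directory), followed by `samtools merge`, which takes the output BAM first and then the inputs. The Bowtie index prefix, the `tophat_pr<i>`/`tophat_s<i>` directory names, and the `<library>_merged.bam` name are hypothetical; HTProcess_tophat's actual options are not described on the slide.

```python
# Sketch only: TopHat runs per trimmed read set, then one library-level merge.
# Assumptions: index prefix, output directory names and merged file name are
# hypothetical; only basic tophat -o and samtools merge usage is shown.
from typing import List, Optional, Tuple


def tophat_cmd(index: str, outdir: str, left: str, right: Optional[str] = None) -> List[str]:
    cmd = ["tophat", "-o", outdir, index, left]
    if right is not None:
        cmd.append(right)  # mate file for paired reads
    return cmd


def plan(index: str, library: str, pairs: List[Tuple[str, str]],
         singles: List[str]) -> List[List[str]]:
    cmds: List[List[str]] = []
    bams: List[str] = []
    for i, (left, right) in enumerate(pairs, 1):
        outdir = f"tophat_pr{i}"
        cmds.append(tophat_cmd(index, outdir, left, right))
        bams.append(f"{outdir}/accepted_hits.bam")  # TopHat's default BAM name
    for i, single in enumerate(singles, 1):
        outdir = f"tophat_s{i}"
        cmds.append(tophat_cmd(index, outdir, single))
        bams.append(f"{outdir}/accepted_hits.bam")
    # samtools merge: output file first, then every per-run BAM.
    cmds.append(["samtools", "merge", f"{library}_merged.bam", *bams])
    return cmds
```

Keeping the per-run BAM files alongside the merged one matches the slide's plan of producing both.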
Manifest file vs. Metadata
In the future, if metadata can be read and written by an app, then:
– The manifest file could be replaced by metadata
– The manifest file could be populated from metadata
– The metadata could be populated by the app, while a manifest file is still created as a more portable list of the files used
Mobile/Tablet Use?
– The HTProcess apps are written, in part, with the idea that tablet/touchscreen interfaces may be better supported by the DE
– HTProcess apps may work within a more pipeline-oriented interface in the DE, or in a separate but related interface
Additional Apps
– HTProcess_Kmergenie: analyze k-mer coverage of reads
– HTProcess_Cufflinks: if I have time
– RSEM (not HTProcess)
– Updates of older apps
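"K-mer coverage" is usually summarised as an abundance histogram: for each count n, how many distinct k-mers occur n times in the reads. KmerGenie works from histograms of this kind across several values of k. The sketch below only illustrates the histogram itself, not KmerGenie's method; unlike most real tools, it does not merge a k-mer with its reverse complement.

```python
# Sketch only: a k-mer abundance histogram for a set of reads.
# Simplification: k-mers are not canonicalised (reverse complements are
# counted separately), and k-mers containing N are skipped.
from collections import Counter
from typing import Dict, Iterable


def kmer_histogram(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Map abundance -> number of distinct k-mers seen that many times."""
    counts: Counter = Counter()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # ignore ambiguous bases
                counts[kmer] += 1
    # Count how many k-mers share each abundance value.
    return dict(sorted(Counter(counts.values()).items()))
```

For example, the read ACGTACGT with k = 4 gives ACGT twice and CGTA, GTAC, TACG once each, so the histogram is {1: 3, 2: 1}.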