

Update on HTProcess Apps Sciplant May 8, 2014

HTProcessPipeline Purpose
– Provide a more functional set of commonly needed applications for RNA-Seq and Genome Assembly
– Provide tools that allow bio-scientists to spend more time considering the science of their data analysis path, and less time mousing, clicking, and typing
– Key attributes: pipeline analysis environment, documentation of the analysis, smart information management (including metadata)

Current Active List

HTProcess_fastqc-0.1
– Creates the main HTProcess directory of read files
– Creates a manifest file to describe the reads in a library of sequencing files
– Runs FastQC on each read file. For paired read files, tests for proper pairing of reads
– Takes in up to 3 different folders of reads: left reads, right reads, and unpaired reads
– Prepares a single report readable within the user’s browser by clicking on it
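The pairing test mentioned above can be pictured roughly as follows. This is a minimal sketch, not the HTProcess implementation: it assumes the common convention that mates share a read ID once a trailing /1 or /2 is removed, and the function names `read_ids` and `properly_paired` as well as the sample reads are made up for illustration.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of each 4-line FASTQ record in an open text handle."""
    non_blank = (line for line in handle if line.strip())
    for index, line in enumerate(non_blank):
        if index % 4 == 0:  # first line of every record is the '@' header
            parts = line[1:].split()
            token = parts[0] if parts else ""
            # Assumed convention: drop a trailing /1 or /2 mate suffix.
            if token.endswith(("/1", "/2")):
                token = token[:-2]
            yield token


def properly_paired(left_handle, right_handle):
    """Return True if both files list the same read IDs in the same order."""
    pairs = zip_longest(read_ids(left_handle), read_ids(right_handle))
    # zip_longest pads the shorter file with None, so unequal lengths fail too.
    return all(left_id == right_id for left_id, right_id in pairs)


# Tiny in-memory example with hypothetical reads.
left = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n")
right = io.StringIO("@r1/2\nTGCA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n")
print(properly_paired(left, right))  # prints True
```

Comparing IDs record by record means a file pair that holds the same reads in a different order is reported as not properly ordered, which matches the wording of the log messages.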

HTProcess_fastqc-0.1 Example: fastqc_summary.html

HTProcess_Reads

HTProcess.log
HTPROCESS1 Mon May 5 15:38:12 MST 2014
fastqc is finished testing 2 files in the first paired read directory.
fastqc is finished testing 2 files in the second paired read directory.
fastqc is finished testing 1 files in the directory for single reads.
Reads1 and Reads2 have the same number of files. Testing for valid pairing.
SRR sra_1.fastq,SRR sra_2.fastq properly ordered
SRR sra_1.fastq,SRR sra_2.fastq properly ordered
All Trim settings have been set to trim settings 1. Edit them on manifest_file.txt to customize trimming.
Starting creation of summary file for FASTQC reports
First Phase of HTPROCESS1 FINISHED Mon May 5 15:46:07 MST 2014
The summary file for all the FASTQC reports has been created.
HTPROCESS-FASTQC FINISHED Mon May 5 15:46:40 MST 2014

Manifest File- example
HTProcess1_Reads
Library_name=testfiles
Library_num=1
condition=testing
pairing=paired_and_unpaired
pair_spacing=400
pair_sd=35
pair_type=fragment
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=78
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=76
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=78
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=76
encoding_fragSc_1.fq=1.9
max_len_fragSc_1.fq=101
library_max=101
Paired Reads
!PPP SRR sra_1.fastq,SRR sra_2.fastq
!PPP SRR sra_1.fastq,SRR sra_2.fastq
Reads1
!XXX SRR sra_1.fastq
!XXX SRR sra_1.fastq
Reads2
!YYY SRR sra_2.fastq
!YYY SRR sra_2.fastq
ReadsS
!ZZZ fragSc_1.fq
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
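To make the layout of the manifest concrete, here is a small sketch of how such a file could be read. The parsing rules are inferred only from the example on the slide (key=value settings, `!TAG value` entries, `!!!...!!!` banners, and plain section labels); the function name `parse_manifest` and the file names `a_1.fastq` and `a_2.fastq` are hypothetical, and the real HTProcess apps may read the file differently.

```python
def parse_manifest(text):
    """Split manifest text into key=value settings and '!TAG value' entries."""
    settings = {}
    entries = []  # (tag, value) pairs, kept in file order
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!!!"):
            continue  # skip blanks and banners such as !!!TRIM SETTINGS!!!
        if line.startswith("!"):
            tag, _, value = line[1:].partition(" ")
            entries.append((tag, value.strip()))
        elif "=" in line:
            key, _, value = line.partition("=")
            settings[key] = value
        # Anything else ('Paired Reads', 'Reads1', ...) is a section label.
    return settings, entries


# Shortened example with hypothetical paired file names.
example = """HTProcess1_Reads
Library_name=testfiles
pairing=paired_and_unpaired
pair_spacing=400
Paired Reads
!PPP a_1.fastq,a_2.fastq
ReadsS
!ZZZ fragSc_1.fq
!!!TRIM SETTINGS!!!
!PairTrim 1 a_1.fastq,a_2.fastq
!SingleTrim 1 fragSc_1.fq
"""
settings, entries = parse_manifest(example)
print(settings["Library_name"])
for tag, value in entries:
    print(tag, "->", value)
```

Values are kept as strings on purpose, since the sketch makes no assumption about which settings are numeric.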

Apps for creating the input directories, or for creating them and also running HTProcess_fastqc

HTProcess_trimmomatic_0.32
– Trimmomatic is a multi-function paired or unpaired read trimmer
– Basic trimming of a given number of bases on either end
– Removes contaminants that match sequences given by the user in a separate fasta file, e.g. adapter, primer sequences
– 2 different methods for quality trimming
– Allows for 2 different programs or sets of settings to be used with the reads in a library
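For readers unfamiliar with Trimmomatic, the sketch below assembles a paired-end command line of the kind such an app might run. It only builds the argument list and does not execute anything. The step names (ILLUMINACLIP, HEADCROP, SLIDINGWINDOW, MINLEN) are Trimmomatic's own; the numeric values are illustrative, the output file naming is hypothetical, and how HTProcess_trimmomatic actually invokes the tool is not shown in the slides. Trimmomatic documents SLIDINGWINDOW and MAXINFO as quality-trimming steps; whether these are the two methods the slide refers to is an assumption.

```python
def trimmomatic_pe_command(jar, left, right, out_prefix, adapters=None,
                           headcrop=0, window=(4, 15), minlen=36):
    """Build (but do not run) an argument list for Trimmomatic paired-end mode."""
    cmd = [
        "java", "-jar", jar, "PE", left, right,
        # Hypothetical output names: paired and unpaired survivors per mate.
        f"{out_prefix}_1P.fastq", f"{out_prefix}_1U.fastq",
        f"{out_prefix}_2P.fastq", f"{out_prefix}_2U.fastq",
    ]
    if adapters:
        # Remove contaminants listed in a user-supplied fasta file.
        cmd.append(f"ILLUMINACLIP:{adapters}:2:30:10")
    if headcrop:
        # Basic trimming: drop a fixed number of bases from the read start.
        cmd.append(f"HEADCROP:{headcrop}")
    # Quality trimming with a sliding window (window size, required quality).
    cmd.append(f"SLIDINGWINDOW:{window[0]}:{window[1]}")
    cmd.append(f"MINLEN:{minlen}")
    return cmd


cmd = trimmomatic_pe_command("trimmomatic-0.32.jar", "a_1.fastq", "a_2.fastq",
                             "trimmed", adapters="adapters.fa", headcrop=5)
print(" ".join(cmd))
```

Trimmomatic applies its steps in the order given, so the order in which the sketch appends them is part of the behaviour, not just formatting.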

Manifest File- example
HTProcess1_Reads
Library_name=testfiles
Library_num=1
condition=testing
pairing=paired_and_unpaired
pair_spacing=400
pair_sd=35
pair_type=fragment
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=78
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=76
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=78
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=76
encoding_fragSc_1.fq=1.9
max_len_fragSc_1.fq=101
library_max=101
Paired Reads
!PPP SRR sra_1.fastq,SRR sra_2.fastq
!PPP SRR sra_1.fastq,SRR sra_2.fastq
Reads1
!XXX SRR sra_1.fastq
!XXX SRR sra_1.fastq
Reads2
!YYY SRR sra_2.fastq
!YYY SRR sra_2.fastq
ReadsS
!ZZZ fragSc_1.fq
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
Callout on the trim-setting number (the 1 after !PairTrim or !SingleTrim): Change to 2 to use a separate program to trim the reads in this file!
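Editing the trim setting by hand is simple, but the same change can be scripted. The sketch below assumes, based only on the example above, that a trim line has the form "tag, setting number, file name(s)" separated by single spaces. The function name `set_trim_setting` and the file names `a_1.fastq` and `a_2.fastq` are hypothetical.

```python
def set_trim_setting(manifest_text, files, setting):
    """Return manifest text with the trim-setting number changed for one entry."""
    out = []
    for line in manifest_text.splitlines():
        parts = line.split(" ", 2)  # tag, setting number, file name(s)
        if (len(parts) == 3
                and parts[0] in ("!PairTrim", "!SingleTrim")
                and parts[2] == files):
            line = f"{parts[0]} {setting} {parts[2]}"
        out.append(line)
    return "\n".join(out)


manifest = (
    "!!!TRIM SETTINGS!!!\n"
    "!PairTrim 1 a_1.fastq,a_2.fastq\n"
    "!SingleTrim 1 fragSc_1.fq"
)
# Switch only the single-read file to trim settings 2.
print(set_trim_setting(manifest, "fragSc_1.fq", 2))
```

Lines that do not match the requested file are passed through unchanged, so the rest of the manifest is preserved.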

Inputs for HTProcess_trimmomatic_0.32

Settings for trimmomatic

Output Files for HTProcess_trimmomatic_0.32

Combined unpaired reads for the entire library

Output Files for HTProcess_trimmomatic_0.32 Individual single read files for those who want to run all reads in a single library

Manifest File
HTProcess1_Reads
Library_name=testfiles
Library_num=1
condition=testing
pairing=paired_and_unpaired
pair_spacing=400
pair_sd=35
pair_type=fragment
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=78
encoding_SRR sra_1.fastq=1.5
max_len_SRR sra_1.fastq=76
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=78
encoding_SRR sra_2.fastq=1.5
max_len_SRR sra_2.fastq=76
encoding_fragSc_1.fq=1.9
max_len_fragSc_1.fq=101
library_max=101
Paired Reads
!PPP SRR sra_1.fastq,SRR sra_2.fastq
!PPP SRR sra_1.fastq,SRR sra_2.fastq
Reads1
!XXX SRR sra_1.fastq
!XXX SRR sra_1.fastq
Reads2
!YYY SRR sra_2.fastq
!YYY SRR sra_2.fastq
ReadsS
!ZZZ fragSc_1.fq
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
!!!TRIMMED READS!!!
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_S TrmS_testfiles.fastq
!!!TRIMMED ORPHAN AND INDIVIDUAL SINGLES!!!
Not used for normal analysis with a completely uniform library
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_fragSc_1.fq

Manifest File
!!!TRIM SETTINGS!!!
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!PairTrim 1 SRR sra_1.fastq,SRR sra_2.fastq
!SingleTrim 1 fragSc_1.fq
!!!TRIMMED READS!!!
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_Pr TrmPr1_SRR sra_1.fastq,TrmPr2_SRR sra_2.fastq
!TRIMMED_S TrmS_testfiles.fastq
!!!TRIMMED ORPHAN AND INDIVIDUAL SINGLES!!!
Not used for normal analysis with a completely uniform library
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_SRR sra_1.fastq
!TRIMMED_OS TrmSos_fragSc_1.fq
Keep track of which reads are to be used for which analysis path with the entries in the manifest file
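As an illustration of using the tags to choose an analysis path, the sketch below collects the trimmed files by tag and leaves out the orphan and individual single files unless asked for them. The tag names come from the slide; the function name `trimmed_reads`, the result keys, and the paired file names are hypothetical, and this is not how any downstream HTProcess app is known to work.

```python
def trimmed_reads(manifest_text, include_orphans=False):
    """Group trimmed read files from manifest text by their tag."""
    wanted = {"!TRIMMED_Pr": "paired", "!TRIMMED_S": "single"}
    if include_orphans:
        wanted["!TRIMMED_OS"] = "orphan_or_individual"
    result = {kind: [] for kind in wanted.values()}
    for raw in manifest_text.splitlines():
        tag, _, value = raw.strip().partition(" ")
        if tag not in wanted:
            continue
        if tag == "!TRIMMED_Pr":
            # Paired entries hold two comma-separated file names.
            result["paired"].append(tuple(value.split(",")))
        else:
            result[wanted[tag]].append(value)
    return result


manifest = """!!!TRIMMED READS!!!
!TRIMMED_Pr TrmPr1_a_1.fastq,TrmPr2_a_2.fastq
!TRIMMED_S TrmS_testfiles.fastq
!!!TRIMMED ORPHAN AND INDIVIDUAL SINGLES!!!
!TRIMMED_OS TrmSos_a_1.fastq
!TRIMMED_OS TrmSos_fragSc_1.fq
"""
print(trimmed_reads(manifest))
print(trimmed_reads(manifest, include_orphans=True))
```

Leaving the `!TRIMMED_OS` files out by default mirrors the note in the manifest that they are not used for normal analysis of a uniform library.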

HTProcess_tophat
– Nearly finished
– Produces BAM files for all trimmed reads
– Will also produce a merged BAM file to reflect the whole library
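Since the app is described as nearly finished, the following is only a guess at the overall shape of such a step: one TopHat run per set of trimmed reads, then a `samtools merge` over the resulting BAM files. TopHat writes its alignments to `accepted_hits.bam` in the output directory given with `-o`; the directory layout, the merged file name, the index name, and the function name `tophat_commands` are all assumptions made for illustration. The sketch builds the commands without running them.

```python
def tophat_commands(index, out_root, paired, single):
    """Build TopHat commands per read set plus one samtools merge command."""
    commands = []
    bam_files = []
    # Each paired entry is (left, right); each single entry is one file.
    runs = [list(pair) for pair in paired] + [[reads] for reads in single]
    for number, reads in enumerate(runs, start=1):
        out_dir = f"{out_root}/run{number}"  # hypothetical layout
        commands.append(["tophat", "-o", out_dir, index, *reads])
        bam_files.append(f"{out_dir}/accepted_hits.bam")
    if len(bam_files) > 1:
        # Merge the per-run BAM files into one file for the whole library.
        commands.append(["samtools", "merge", f"{out_root}/merged.bam", *bam_files])
    return commands


for command in tophat_commands(
        "genome_index", "out",
        paired=[("TrmPr1_a_1.fastq", "TrmPr2_a_2.fastq")],
        single=["TrmS_testfiles.fastq"]):
    print(" ".join(command))
```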

Manifest file vs Metadata
In the future, if metadata can be read and written by an app, then:
– The manifest file could be replaced by metadata
– The manifest file could be populated by metadata
– The metadata could be populated by the app, but the manifest file could be created, too, for a more portable list of files used

Mobile/Tablet Use?
The HTProcess apps are written, in part, with the idea that tablet/touchscreen interfaces may be better supported by the DE.
HTProcess apps may work within a more pipeline-oriented interface within the DE or a separate/related interface.

Additional Apps
– HTProcess_Kmergenie: Analyze k-mer coverage of reads
– HTProcess_Cufflinks: If I have time
– RSEM (not HTProcess)
– Updates of older apps