MiSeq Validation Pipeline

MiSeq Validation Pipeline Michael Wornow

MiSeq FASTQ => ICE Entry
Diagram labels: Illumina MiSeq; Pipeline.py; Run name; Sample IDs; IGV XML; HTML; Excel
We want to take sequencing results from the MiSeq and compare them against a reference ICE entry to see whether they are valid.

MiSeq file terminology
Main Library
  SubLibrary1 (aka Pool1 or Sample1): 1-2 fastq.gz
  SubLibrary2
  SubLibrary3
1 Reference Sequence: FASTA from ICE entry
Each sequencing run is known as the “Main Library.” Within each Main Library are multiple “SubLibraries,” also called “Samples” or “Pools” (as they’re known on the PacBio). Each SubLibrary contains 1 or 2 fastq.gz files, depending on whether the reads were single-end or paired-end.
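The terminology above maps naturally onto a small data model. The following is only a sketch: the class and field names (MainLibrary, SubLibrary, fastq_files, reference_fasta) are hypothetical and are not taken from pipeline.py.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class SubLibrary:
    """One sample (pool) within a run; holds 1 or 2 fastq.gz files."""
    name: str
    fastq_files: List[Path]

    def __post_init__(self) -> None:
        # A SubLibrary has one file for single reads, two for paired reads.
        if len(self.fastq_files) not in (1, 2):
            raise ValueError(
                f"{self.name}: expected 1 or 2 fastq.gz files, "
                f"got {len(self.fastq_files)}"
            )

    @property
    def is_paired(self) -> bool:
        return len(self.fastq_files) == 2


@dataclass
class MainLibrary:
    """One sequencing run, with its reference FASTA and its SubLibraries."""
    name: str
    reference_fasta: Path
    sublibraries: List[SubLibrary] = field(default_factory=list)


# Hypothetical example values, for illustration only.
run = MainLibrary(
    name="demo_run",
    reference_fasta=Path("reference.fasta"),
    sublibraries=[
        SubLibrary("SubLibrary1", [Path("s1_R1.fastq.gz"), Path("s1_R2.fastq.gz")]),
        SubLibrary("SubLibrary2", [Path("s2_R1.fastq.gz")]),
    ],
)
```

Validating the file count at construction time means a malformed run is rejected before any alignment work starts.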

Pipeline Process
Diagram: SMB Server (SampleSheet.csv) => FASTQ => Joel’s Tools (Perl, NERSC) => BAM => GATK (Java) => VCF & BED => Ernst’s Postprocessing (Python) => IGV, HTML, Excel
FASTQ files for each sequencing run are pulled from JBEI’s SMB Server. The exact names of the sequencing results are taken from the SampleSheet.csv file, located in the MiSeqOutput folder of the SMB Server. These FASTQ files are passed to Joel’s Perl scripts, which were modified so that they are no longer NERSC-specific, to generate BAM files. GATK then generates VCF and BED files, and Ernst’s postprocessing scripts (also slightly modified) generate the IGV file and the HTML and Excel summaries.
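As an illustration of the SampleSheet.csv step, the sketch below pulls sample IDs out of a sample sheet. It assumes the common Illumina layout, with bracketed section headers and a [Data] section whose header row contains a Sample_ID column. How pipeline.py actually reads the file is not shown in the slides, and the function name and example values here are hypothetical.

```python
import csv
import io
from typing import List

# Hypothetical sample sheet contents, for illustration only.
EXAMPLE_SHEET = """[Header]
Experiment Name,demo_run

[Data]
Sample_ID,Sample_Name,index
118433,libA,TAAGGCG
118434,libB,CGTACTA
"""


def read_sample_ids(sample_sheet_text: str) -> List[str]:
    """Return the Sample_ID values from the [Data] section of a sample sheet."""
    lines = sample_sheet_text.splitlines()

    # Locate the [Data] section header (it may carry trailing commas).
    start = None
    for i, line in enumerate(lines):
        if line.strip().split(",")[0].strip().lower() == "[data]":
            start = i + 1
            break
    if start is None:
        raise ValueError("no [Data] section found")

    # Collect rows until the next section header, skipping empty rows.
    data_lines = []
    for line in lines[start:]:
        if line.lstrip().startswith("["):
            break
        if line.strip(" ,\t"):
            data_lines.append(line)

    reader = csv.DictReader(io.StringIO("\n".join(data_lines)))
    return [row["Sample_ID"] for row in reader if row.get("Sample_ID")]


sample_ids = read_sample_ids(EXAMPLE_SHEET)
```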

In-depth

Joel’s Tools
- Aligns SubLibrary’s reads to a reference sequence
- Generates: BAM for every SubLibrary
- BAM file: Alignment of a sequence to 1+ reference sequences

GATK
- Calculates coverage, depth, and finds SNPs in aligned reads
- Generates: VCF, BED, covdepth
- VCF: Info on SNPs (mutations that are only one nucleotide)
- BED: Annotations of aligned reads
- Covdepth: Coverage and depth of aligned reads

Ernst’s Post-processing
- Makes calls on coverage information
- Generates: call_summary.txt, IGV, HTML, Excel
- Call_summary.txt: Summary of calls for each SubLibrary
- IGV: Links to BED, BAM, and VCF files, to view in IGV Viewer
- HTML: Prettified version of call_summary.txt
- Excel: Prettified version of call_summary.txt

Workflow
1. Scientist runs MiSeq
2. Goes to Web Interface, submits Run Name, Sample IDs, Reference sequence, and email to Pipeline
3. Pipeline runs; scientist is emailed when it finishes
4. IGV file and Excel sheet automatically uploaded to ICE

Website
This website is up and running locally.
- Name of MiSeq Run: Dropdown select2 menu, autofills with all the folders currently on the SMB Server
- Sample IDs: Dropdown select2 menu, autofills with applicable sample IDs. A recently added button autofills this field with ALL samples for a given MiSeq run
- Email: User’s email

After the form is submitted, the website runs the command listed above in red. The pipeline.py itself is a command line utility with five flags:
-m MainLibraryName
-s SubLibrariesNames
-r ReferenceSequence
-e Email
-l LogFileLocation
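A command line interface with these five flags could be declared with argparse roughly as follows. This is a sketch, not the actual pipeline.py: the dest names, the help strings, the choice to make every flag required, and the assumption that -s accepts several space-separated names are all guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the five flags listed on the slide."""
    parser = argparse.ArgumentParser(
        prog="pipeline.py",
        description="Validate MiSeq sequencing results against a reference.",
    )
    parser.add_argument("-m", dest="main_library", required=True,
                        help="MainLibraryName (name of the MiSeq run)")
    # Assumption: sublibrary names are given as space-separated values.
    parser.add_argument("-s", dest="sublibraries", required=True, nargs="+",
                        help="SubLibrariesNames (one or more sample IDs)")
    parser.add_argument("-r", dest="reference", required=True,
                        help="ReferenceSequence")
    parser.add_argument("-e", dest="email", required=True,
                        help="Email of the submitting user")
    parser.add_argument("-l", dest="log_file", required=True,
                        help="LogFileLocation")
    return parser


# Hypothetical argument values, for illustration only.
args = build_parser().parse_args([
    "-m", "demo_run",
    "-s", "118433", "118434",
    "-r", "reference.fasta",
    "-e", "user@example.org",
    "-l", "pipeline.log",
])
```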

Actual pipeline.py Output

    mwornow-m:seqval mwornow-m$ python3 pipeline.py -m -s -r -e -l
    Get sublibraries fastq.gz and reference sequences FASTA from SMB…
    Running prep_ref…
    Picard CreateSequencDictionary
    Runtime.totalMemory()=128974848
    BWA Index
    Running beta_prep_setup_dirs…
    AAHBB_libName_libName libName /Users/mwornow-m/Desktop/seqval/MiSeqOutputFolder/118433_TAAGGCG.fastq.gz
    Running beta_slice_fq…
    200.9517548084259 seconds
    Running beta_run_alignments…
    BWA
    Picard FixMateInformation
    Picard MarkDuplicates
    Runtime.totalMemory()=128188416
    2226.508181810379 seconds
    Creating config.xml for postprocessing.sh script...
    Running postprocessing.sh…
    Running GATK Depth of Coverage...
    covdepth file generated
    Running GATK Unified Genotyper...
    snps.gatk.vcf file generated
    Running GATK Callable Loci...
    callable.bed file generated
    Running make_calls_gatk.py script...
    call_summary.txt file generated

Timing callouts on the slide: 3-5 mins; 30-40 mins with JGI sequences, 5 mins with JBEI sequences

This is the output of the actual pipeline.py running. The main bottleneck is beta_run_alignments.pl. With JGI’s longer fastq.gz files, it took 30 mins for a single SubLibrary. With JBEI’s fastq.gz files, however, it took only about 5 minutes for 5 SubLibraries.

What’s working
- User submits info to website
- Pipeline runs
  - Logs output
  - Runs Joel’s Tools and Ernst’s Scripts
- Generates file structure storing all files (bam, bed, vcf, call_summary.txt)
  - This file structure can be zipped and archived for later review of sequencing runs
- IGV file correctly generated
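Zipping the generated file structure for later review can be done with the standard library alone. The sketch below assumes each run's outputs live under a single directory; the function name archive_run and the directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def archive_run(run_dir: Path, archive_dir: Path) -> Path:
    """Zip the whole output directory of one run and return the zip's path.

    The archive is named after the run directory, and the directory name
    is kept as the top-level folder inside the zip.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    base_name = archive_dir / run_dir.name  # ".zip" is appended automatically
    archive_path = shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=str(run_dir.parent),
        base_dir=run_dir.name,
    )
    return Path(archive_path)
```

For example, archive_run(Path("output/demo_run"), Path("archives")) would write archives/demo_run.zip containing the bam, bed, vcf, and call_summary.txt files stored under output/demo_run.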

To do…
- Fix beta_run_alignments.pl and run_bwa.pl (explained on next slide)
- Have reference sequences come directly from ICE
- Upload IGV files to ICE reference entry
- Create interface in ICE to view IGV files
- Send email to user notifying that pipeline has finished

Pipeline | Correct
For some reason, my Pipeline.py outputs numbers that are very similar to, but slightly different from, those of Ernst’s JGI Pipeline. The “type” of each call (e.g. its color coding) is always correct, but the actual number in the circles can be off by 10-50 units. I’ve traced the error to the beta_run_alignments.pl script of Joel’s Tools (which itself calls run_bwa.pl), but the script’s extremely long running time has made it hard to debug. During the presentation, I asked Ernst what he thought about this discrepancy; he said that he wasn’t sure why the numbers weren’t coming out right, but that it might be OK.
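One way to narrow down a discrepancy like this is to compare the two pipelines' numbers side by side and look at the largest differences first. The sketch below assumes the numbers have already been extracted into dictionaries keyed by some label; the format of call_summary.txt is not shown in the slides, so the parsing step is left out, and all names and values here are hypothetical.

```python
from typing import Dict, List, NamedTuple


class CountDiff(NamedTuple):
    label: str
    mine: int
    reference: int

    @property
    def delta(self) -> int:
        # Positive when this pipeline reports more than the reference.
        return self.mine - self.reference


def diff_counts(mine: Dict[str, int], reference: Dict[str, int]) -> List[CountDiff]:
    """Return entries present in both results whose numbers differ,
    largest absolute difference first."""
    shared = set(mine) & set(reference)
    diffs = [
        CountDiff(label, mine[label], reference[label])
        for label in shared
        if mine[label] != reference[label]
    ]
    return sorted(diffs, key=lambda d: (-abs(d.delta), d.label))


# Hypothetical numbers, for illustration only.
my_counts = {"sample_a": 100, "sample_b": 250, "sample_c": 7}
jgi_counts = {"sample_a": 100, "sample_b": 210, "sample_c": 17}
mismatches = diff_counts(my_counts, jgi_counts)
```

Running the comparison on a single small SubLibrary first would keep each debugging iteration short despite the long running time of the alignment step.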