Konstantin Okonechnikov Qualimap v2: advanced quality control of

Presentation transcript:

Qualimap v2: advanced quality control of high-throughput sequencing data
Konstantin Okonechnikov, Max Planck Institute for Infection Biology, Molecular Biology Department
BOSC 2015, Dublin, 10.07.2015

Quality control of HTS data
High-throughput sequencing (HTS) offers high speed, deep coverage and various applications. However, there are platform-specific, protocol-based and analysis errors (duplicates, PCR, algorithm-induced biases, etc.).
Tools that perform quality control tasks: FastQC, Samtools, Picard tools, RSeQC, …, Qualimap.
Speaker notes: There are various solutions, and Qualimap is one of them. The second version of Qualimap appeared last year.

Qualimap
A Java application that computes statistics and graphs for the evaluation and quality control of HTS alignment data. Both GUI and command line interfaces are available.
Input: BAM/SAM, GTF/GFF/BED.
Output: HTML/PDF, simple text format.
Analysis modes: BAM QC, RNA-seq QC, Counts QC.
Speaker notes: So we provided our solution. Since we are here in the visualization Qualimap

Qualimap 2
A Java application that computes statistics and graphs for the evaluation and quality control of HTS alignment data. Both GUI and command line interfaces are available.
Input: BAM/SAM, GTF/GFF/BED.
Output: HTML/PDF, simple text format.
Analysis modes: BAM QC, Multi-sample BAM QC, RNA-seq QC + Counts QC (v2).
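As a rough illustration of the command line interface mentioned on this slide, the sketch below assembles a BAM QC invocation without running it. The options `-bam`, `-outdir` and `-outformat` are taken from the Qualimap manual and should be checked against the installed version; the helper `build_bamqc_command` and the file names are hypothetical.

```python
import shlex


def build_bamqc_command(bam_path, out_dir, out_format="PDF"):
    """Assemble (but do not run) a Qualimap BAM QC command line.

    Hypothetical helper: only the option names come from the Qualimap manual.
    """
    if out_format not in ("PDF", "HTML"):
        raise ValueError("out_format must be 'PDF' or 'HTML'")
    return [
        "qualimap", "bamqc",
        "-bam", bam_path,          # input alignment file
        "-outdir", out_dir,        # directory for the report
        "-outformat", out_format,  # report format
    ]


# Placeholder file names, for illustration only.
cmd = build_bamqc_command("sample.bam", "sample_bamqc", "HTML")
print(shlex.join(cmd))
```

Once Qualimap is on the PATH, the list could be passed to `subprocess.run(cmd, check=True)`.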

BAM QC mode
BAM file global statistics: alignment types, coverage analysis, insert size, duplication rate, etc.
Improvements and novelties: additional metrics for mapping quality, coverage and insert size; region/out-of-region analysis; exclusion of duplicate alignments; … many more …
Speaker notes: Global data (reference size, number of reads), coverage (mapped, paired, per chromosome), reads info (insert size, quality, homopolymers, duplication rate).
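To make these global statistics concrete, here is a minimal sketch that computes mean coverage, duplication rate and median insert size from a handful of toy alignment records. The record layout, the values and the function `per_base_coverage` are assumptions for illustration; this is not how Qualimap itself is implemented.

```python
from statistics import median

# Toy alignments: (reference start, read length, insert size, is_duplicate).
# All values are invented for illustration.
alignments = [
    (0, 50, 200, False),
    (10, 50, 220, False),
    (10, 50, 220, True),   # flagged as a duplicate of the previous read
    (60, 40, 180, False),
]
reference_size = 100


def per_base_coverage(alns, ref_size, skip_duplicates=False):
    """Count how many alignments overlap each reference position."""
    depth = [0] * ref_size
    for start, length, _insert, is_dup in alns:
        if skip_duplicates and is_dup:
            continue
        for pos in range(start, min(start + length, ref_size)):
            depth[pos] += 1
    return depth


depth = per_base_coverage(alignments, reference_size)
mean_coverage = sum(depth) / len(depth)

# Same metric with duplicate alignments excluded.
dedup_depth = per_base_coverage(alignments, reference_size, skip_duplicates=True)
dedup_mean_coverage = sum(dedup_depth) / len(dedup_depth)

duplication_rate = sum(1 for a in alignments if a[3]) / len(alignments)
median_insert_size = median(a[2] for a in alignments)

print(mean_coverage, dedup_mean_coverage, duplication_rate, median_insert_size)
```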

Novel mode: multi-sample BAM QC
Combined analysis of previously generated BAM QC results. Detailed plots: coverage, GC content, insert size, etc. Principal component analysis based on selected parameters.
Speaker notes: A new feature, available from the snapshots.
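The principal component analysis step can be sketched as follows, assuming each sample is summarised by three metrics (mean coverage, GC content, median insert size). The sample names and values are invented, and power iteration is used here only to keep the example dependency-free; the talk does not say how Qualimap computes its PCA.

```python
import math

# Hypothetical per-sample metrics: (mean coverage, GC content %, median insert size).
samples = {
    "s1": (30.0, 41.0, 200.0),
    "s2": (32.0, 42.0, 210.0),
    "s3": (10.0, 55.0, 150.0),
    "s4": (12.0, 54.0, 160.0),
}


def standardize(rows):
    """Centre each column and scale it to unit sample variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def first_component(data, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, p = len(data), len(data[0])
    cov = [
        [sum(r[i] * r[j] for r in data) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


z = standardize(list(samples.values()))
pc1 = first_component(z)

# Project every sample onto the first principal component.
scores = {
    name: sum(a * b for a, b in zip(row, pc1))
    for name, row in zip(samples, z)
}
for name, score in scores.items():
    print(name, round(score, 3))
```

In this toy data the two high-coverage samples and the two low-coverage samples fall on opposite sides of the first component, which is the kind of grouping a multi-sample PCA plot is meant to reveal.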

Redesigned RNA-seq quality control: RNA-seq QC + Counts QC
Analysis types: transcript coverage and proportion, GC bias, 5'-3' bias, counts analysis (expression level, gene types, etc.).
Novel: multi-sample analysis in Counts QC.
Speaker notes: The main properties are comparable to RSeQC and RNA-SeQC. The novelty is multi-sample analysis, for example saturation and expression analysis.
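One way to picture a 5'-3' bias metric is as the ratio of coverage near the 5' end of a transcript to coverage near its 3' end. The sketch below assumes per-base coverage listed in 5' to 3' orientation, a 100-base window and a ratio of means; these choices, the function name `five_three_bias` and the toy transcripts are illustrative assumptions, not Qualimap's documented definition.

```python
from statistics import mean, median


def five_three_bias(coverage, window=100):
    """Ratio of mean coverage in the first `window` bases to the last `window` bases.

    `coverage` is per-base depth in 5' to 3' transcript orientation.
    Returns None when the 3' window has no coverage.
    """
    if len(coverage) < 2 * window:
        raise ValueError("transcript shorter than two windows")
    five_prime = mean(coverage[:window])
    three_prime = mean(coverage[-window:])
    if three_prime == 0:
        return None
    return five_prime / three_prime


# Toy transcripts (invented): one with 3'-skewed coverage, one uniform.
skewed = [10] * 100 + [20] * 100 + [40] * 100
uniform = [25] * 300

ratios = [five_three_bias(skewed), five_three_bias(uniform)]
summary_bias = median(r for r in ratios if r is not None)
print(ratios, summary_bias)
```

A value close to 1 suggests even coverage along transcripts, while values well below 1 indicate reads piling up at the 3' end.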

Becoming more “open-source”
Source code repository: Bitbucket. Discussion forum: Google Groups.
User activity (09.2014 – 06.2015): 15 bug reports, 9 new issue suggestions, 3 bug fixes from users.
Supporters are mentioned in commits, news and the history log.
Speaker notes: Each user is mentioned in the log.

Thank you for your attention!
Useful links:
Web-site: http://qualimap.bioinfo.cipf.es/
Bitbucket: https://bitbucket.org/kokonech/qualimap
Reference: García-Alcalde F, Okonechnikov K, et al. "Qualimap: evaluating next-generation sequencing alignment data." Bioinformatics 28, no. 20 (2012): 2678-2679.
Speaker notes: This is it. If you are working with NGS alignment data, give Qualimap a try. We hope you will find it as useful as we do in our research. Thanks a lot.