Quality Control and Target Validation in Sequencing v1.0 Simon Andrews Laura Biggins Boo Virk.

Slides:



Advertisements
Similar presentations
Functional Genomics with Next-Generation Sequencing
Advertisements

Vanderbilt Center for Quantitative Sciences Summer Institute Sequencing Analysis Yan Guo.
IMGS 2012 Bioinformatics Workshop: RNA Seq using Galaxy
Visualising and Exploring BS-Seq Data
Gene SymbolGene DescriptionFold Lna21-SP21-S RP2retinitis pigmentosa 2 (X-linked recessive) DDAH1dimethylarginine dimethylaminohydrolase
Simon v2.3 RNA-Seq Analysis Simon v2.3.
 Genomic sequence of model eukaryote Saccharomyces cerevisiae completed in (12.1 Mb)  Despite 16 years of intense research, function of nearly.
Peter Tsai Bioinformatics Institute, University of Auckland
WV-INBRE West Virginia IDeA Network of Biomedical Research Excellence Managing the NextGen data pipeline Jim Denvir, Ph.D.
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Kate Milova MolGen retreat March 24, Microarray experiments: Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
RNA-seq Analysis in Galaxy
SeqMonk tools for methylation analysis
High Throughput Sequencing
Sequencing Data Quality Saulo Aflitos. Read (≈100bp) Contig (≈2Kbp) Scaffold (≈ 2Mbp) Pseudo Molecule (Super Scaffold) Paired-End Mate-Pair LowComplexityRegion.
NGS Analysis Using Galaxy
Detecting enriched regions (Chip- seq, RIP-seq) Statistical evaluation of enriched regions Data displayed in Genome Browser Detection of enriched motifs.
Bioinformatics and OMICs Group Meeting REFERENCE GUIDED RNA SEQUENCING.
Figure 1. Figure 2 Somatic Mutation Summary A A A98-07-F009a tumor mutant reads > 10 normal mutant read = 0 tumor mutant fraction.
Next Generation DNA Sequencing
RNA-Seq Analysis Simon V4.1.
Introduction To Next Generation Sequencing (NGS) Data Analysis
Presenting Results Laura Biggins v1.0 1.
Next Generation Sequencing pipeline: a joint LONI – BIRN [UCLA – UCI] collaborative project F. Macciardi – March 16, 2011.
Tsute (George) Chen Bioinformatics Core Department of Microbiology The Forsyth Institute March 24 th, 2015 HOMD A Tour to the Data and Tools.
Molecular and Genomic Evolution Getting at the Gene Pool.
Analysis and comparison of very large metagenomes with fast clustering and functional annotation Weizhong Li, BMC Bioinformatics 2009 Present by Chuan-Yih.
The end replication problem: -DNA polymerase requires an OH group to attach bases too -There is no OH group at the extreme 5’ end of the lagging strand.
Computational methods for genomics-guided immunotherapy Sahar Al Seesi Computer Science & Engineering Department, UCONN Immunology Department, UCONN Health.
Moderní metody analýzy genomu - analýza Mgr. Nikola Tom Brno,
Canadian Bioinformatics Workshops
User-friendly Galaxy interface and analysis workflows for deep sequencing data Oskari Timonen and Petri Pölönen.
HOMER – a one stop shop for ChIP-Seq analysis
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
 Facilities Open House Functional Genomics Facility Molishree Joshi, Ph.D. 6/1/2015 Contact Information:
Canadian Bioinformatics Workshops
QuasR: Quantify and Annotate Short Reads in R Anita Lerch, Dimos Gaidatzis, Florian Hahne and Michael Stadler Friedrich Miescher Institute for Biomedical.
Centralizing Bioinformatics Services: Analysis Pipelines, Opportunities, and Challenges with Large- scale –Omics, and other BigData High-Performance Computing.
Misleading bioinformatics: Mistakes, Biases, Mis-interpretations and how to avoid them Festival of Genomics 2017 Course Exercise Material:
Software: the good, the bad and the ugly
Simon v RNA-Seq Analysis Simon v
Virginia Commonwealth University
SeqMonk tools for methylation analysis
SeqMonk tools for methylation analysis
RNA Quantitation from RNAseq Data
Placental Bioinformatics
Cancer Genomics Core Lab
Biases and their Effect on Biological Interpretation
Understanding and Validating Experimental Expectations
S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.
Canadian Bioinformatics Workshops
Sequencing technology and assembly
The FASTQ format and quality control
Unit 4: Genetic Information, Variation and Relationships between Organisms Lesson 2 The Triplet Code A sequence of three DNA bases, called a triplet,
Artefacts and Biases in Gene Set Analysis
2nd (Next) Generation Sequencing
Quality control for Sequencing Experiments
ChIP-Seq Data Processing and QC
Eukaryote Regulation and Gene Expression
Exploring and Understanding ChIP-Seq data
Misleading Bioinformatics Mistakes, Biases, Mis-Interpretations and how to avoid them Festival of Genomics 2017.
Visualising and Exploring BS-Seq Data
A critical evaluation of HTQC: a fast quality control toolkit for Illumina sequencing data Chandan Pal, PhD student Sahlgrenska Academy Institute of.
Artefacts and Biases in Gene Set Analysis
BF nd (Next) Generation Sequencing
Evolution of Genomes Chapter 21.
Presentation transcript:

Quality Control and Target Validation in Sequencing v1.0 Simon Andrews Laura Biggins Boo Virk

Support service for bioinformatics – Academic – Babraham Institute – Commercial – Consultancy Support BI Sequencing Facility – HiSeq / MiSeq based sequencing service – Data Management / Processing / Analysis

Interests in QC Developed QC for in-house sequencing Developed QC packages – FastQC – BamQC – FastQ Screen Developed application specific QC – Bismark (bisulphite methylation) – HiCUP (Hi-C genome structure) Developed data visualisation QC – SeqMonk (generic sequencing visualisation / analysis) RNA-Seq QC Small RNA QC Duplication QC

Areas for today How do sequencing experiments go wrong – Learn from mistakes of others How to construct good QC – What should you run – What should you look for – How should you interpret / act What software exists – Review of existing QC packages / use cases

An example… PC1 PC2

Genes for PC1 (85 total) GeneDescription Arhgef4Rho guanine nucleotide exchange factor (GEF) 4 CflarCASP8 and FADD-like apoptosis regulator Als2amyotrophic lateral sclerosis 2 (juvenile) homolog (human) Cxcr2chemokine (C-X-C motif) receptor 2 Col4a3collagen, type IV, alpha 3 Sagretinal S-antigen Gpr35G protein-coupled receptor 35 Acmsdamino carboxymuconate semialdehyde decarboxylase Qsox1quiescin Q6 sulfhydryl oxidase O13RikRIKEN cDNA O13 gene Mrps14mitochondrial ribosomal protein S14 Scyl3SCY1-like 3 (S. cerevisiae) Ildr2immunoglobulin-like domain containing receptor 2 Atp1a2ATPase, Na+/K+ transporting, alpha 2 polypeptide Slamf8SLAM family member 8 Wdr38WD repeat domain 38 Exd1exonuclease 3'-5' domain containing 1 Serf2small EDRK-rich factor 2

Coverage of Raw Data Normal Gene PCA Gene

Using a different read mapper…

The Evolution of Sequencing Analysis Early Data Exploratory Tools Application specific tools Pipeline Development High throughput analysis Meta-analyses

Good Lessons from exploration and tool development General structure of the data Quantitation Points of commonality – Expectations – Reference points – Normalisation Statistics

Bad Lessons from exploration and tool development Failure modes – Contamination – Library failures Artefacts Biases Mis-interpretations

The Evolution of Sequencing Analysis Early Data Exploratory Tools Application specific tools Pipeline Development High throughput analysis Meta-analyses

Areas for today How do sequencing experiments go wrong – Learn from mistakes of others How to construct good QC – What should you run – What should you look for – How should you interpret / act What software exists – Review of existing QC packages / use cases