Canadian Bioinformatics Workshops

Slides:



Advertisements
Similar presentations
Create a Web Site with Publisher 2000 for Marilyn Seguins Class.
Advertisements

Vanderbilt Center for Quantitative Sciences Summer Institute Sequencing Analysis Yan Guo.
IMGS 2012 Bioinformatics Workshop: RNA Seq using Galaxy
Reference mapping and variant detection Peter Tsai Bioinformatics Institute, University of Auckland.
Variant Calling Workshop Chris Fields Variant Calling Workshop v2 | Chris Fields1 Powerpoint by Casey Hanson.
High Throughput Sequencing
Peter Tsai, Bioinformatics Institute.  University of California, Santa Cruz (UCSC)  A rapid and reliable display of any requested portion of genomes.
1000 Genomes SV detection Boston College Chip Stewart 24 November 2008.
NGS Analysis Using Galaxy
RExPrimer Pongsakorn Wangkumhang, M.Sc. Biostatistics and Informatics Laboratory, Genome Institute, National Center for Genetic Engineering and Biotechnology.
Copyright © Texas Education Agency, All rights reserved. 1 Web Technologies Website Development with Dreamweaver.
Customized cloud platform for computing on your terms !
Tutorial 1 Getting Started with Adobe Dreamweaver CS3
Variant Calling Workshop Chris Fields Variant Calling Workshop | Chris Fields | PowerPoint by Casey Hanson.
Computer Lab (I) Introduction of galaxy and UCSC genome browser.
Website Development with Dreamweaver
Session 1 SESSION 1 Working with Dreamweaver 8.0.
Galaxy for Bioinformatics Analysis An Introduction TCD Bioinformatics Support Team Fiona Roche, PhD Date: 31/08/15.
High Throughput Sequence (HTS) data analysis 1.Storage and retrieving of HTS data. 2.Representation of HTS data. 3.Visualization of HTS data. 4.Discovering.
NGS data analysis CCM Seminar series Michael Liang:
Next Generation DNA Sequencing
Introduction to ArcGIS for Environmental Scientists Module 1 – Data Visualization Chapter 4 - Layouts.
Mutation Calling IGV Exercises. Run IGV – Web search IGV (Integrative Genomics Viewer) – Go to Download page – may need to provide – Launch with.
BRUDNO LAB: A WHIRLWIND TOUR Marc Fiume Department of Computer Science University of Toronto.
Sackler Medical School
Identification of Copy Number Variants using Genome Graphs
Copyright OpenHelix. No use or reproduction without express written consent1.
Tutorial 6 High Throughput Sequencing. HTS tools and analysis Review of resequencing pipeline Visualization - IGV Analysis platform – Galaxy Tuning up.
Genome STRiP ASHG Workshop demo materials
UCSC Genome Browser Zeevik Melamed & Dror Hollander Gil Ast Lab Sackler Medical School.
Tools in Bioinformatics Genome Browsers. Retrieving genomic information Previous lesson(s): annotation-based perspective of search/data Today: genomic-based.
TRACKSTER &CIRCSTER DEMO Slides: /g/funcgen/trainings/visualization/Demos/Trackster+Circster.ppt Galaxy: Galaxy Dev:
Using Galaxy to build and run data processing pipelines Jelle Scholtalbers / Charles Girardot GBCS Genome Biology Computational Support.
IGV Demo Slides:/g/funcgen/trainings/visualization/Demos/IGV_demo.ppt Galaxy Dev: 0.
Canadian Bioinformatics Workshops
Visualizing data from Galaxy
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Galaxy for analyzing genome data Hardison October 05, 2010
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Konstantin Okonechnikov Qualimap v2: advanced quality control of
Using command line tools to process sequencing data
Lesson: Sequence processing
What is Microsoft Internet Explorer?
Integrative Genomics Viewer (IGV)
Quality Control & Preprocessing of Metagenomic Data
NGS Analysis Using Galaxy
Variant Calling Workshop
Canadian Bioinformatics Workshops
1. Introduction to Visual Basic
Figure 2: Make a component
ChIP-Seq Analysis – Using CLCGenomics Workbench
Section 3: Gene Technologies in Detail
SVs and CNVs They are often confused…
Prepared by Kimberly Sayre and Jinbo Bi
MapView: visualization of short reads alignment on a desktop computer
Exploring and Understanding ChIP-Seq data
Copy Number Variation Sequencing for Comprehensive Diagnosis of Chromosome Disease Syndromes  Desheng Liang, Ying Peng, Weigang Lv, Linbei Deng, Yanghui.
Genomic alterations in breast cancer cell line MDA-MB-231.
Material for today’s workshop is at:
Eric Samorodnitsky, Jharna Datta, Benjamin M
Storing and Accessing G-OnRamp’s Assembly Hubs outside of Galaxy
Yating Liu July 2018 G-OnRamp workshop
BF528 - Genomic Variation and SNP Analysis
Canadian Bioinformatics Workshops
Sequence Visualization
Introduction to RNA-Seq & Transcriptome Analysis
Welcome To Microsoft Word 2016
Presentation transcript:

Canadian Bioinformatics Workshops www.bioinformatics.ca

Module #: Title of Module 2

Module 3 Tools for HT-seq Data Visualization Sorana Morrissy Informatics for High Throughput Sequencing Data June 10-11, 2015

Learning Objectives of Module to appreciate the different data viz tools in genomics to know when to use a particular tool to gain more experience with genome browsers to become an expert in variation inspection single nucleotide and structural variants

Organization Part I Part II visualization tools genome browsers IGV overview Part II visualizing single nucleotide polymorphisms visualizing structural variants

Part I: Visualization Tools

Why visualize our data?

Anscombe’s quartet Each dataset (I, II, III, IV): Average y value = 7.50 and average x value = 9 The variance for x = 11 and for y = 4.12 The correlation between x and y is 0.816 in each case A linear regression for each dataset follows the equation y = 0.5x + 3

Anscombe’s quartet Each dataset (I, II, III, IV): Average y value = 7.50 and average x value = 9 The variance for x = 11 and for y = 4.12 The correlation between x and y is 0.816 in each case A linear regression for each dataset follows the equation y = 0.5x + 3

Preattentive vs attentive visual processing From: https://www.propublica.org/nerds/item/a-big-article-about-wee-things Preattentive processing: “cognitive operations that can be performed prior to focusing attention on any particular region of an image.” i.e. the stuff you notice right away.

Preattentive attributes From: http://www.perceptualedge.com/articles/ie/visual_perception.pdf encoded properly using preattentive attributes, outliers are easily identified visually

Why visualize? the human visual system is a low-cost* and high-performance sense maker, to identify patterns debugger, to identify issues and outliers * compared to cost of writing, debugging, and running computational scripts

Visualization Tools in Genomics there are over 40 different genome browsers, which to use? depends on task at hand kind and size of data data privacy

HT-seq Genome Browsers Integrative Genome Viewer UCSC Genome Browser Cancer Genome Browser Trackster (part of Galaxy) Savant Genome Browser task at hand : visualizing HT-seq reads, especially good for inspecting variants kind and size of data : large BAM files, stored locally or remotely data privacy : run on the desktop, can keep all data private UCSC Genome Browser has been retro-fitted to display BAM files Trackster is a genome browser that can perform visual analytics on small windows of the genome, deploy full analysis with Galaxy

Integrative Genomics Viewer (IGV) Desktop application for the interactive visual exploration of integrated genomic datasets Epigenomics RNA-Seq Microarrays NGS alignments http://www.broadinstitute.org/igv >85,000 registrations (2014) mRNA, CNV, Seq

Features With IGV you can… Explore large genomic datasets with an intuitive, easy-to- use interface. Integrate multiple data types with clinical and other sample information. View data from multiple sources: local, remote, and “cloud-based”. Automation of specific tasks using command-line interface

IGV data sources Local files TCGA GenomeSpace HTTP server FTP server - View local files without uploading. - View remote files without downloading the whole dataset.

Using IGV: the basics Launch IGV Select a reference genome Load data Navigate through the data WGS data SNVs structural variations

Launch IGV

Launch IGV 1. Select genome from the drop-down menu 2. Load data

Load data Select File > Load from Server… Open the Tutorials menu, select UI Basics

Screen layout

Screen layout menus toolbar genome ruler data panel tracks track names genome features

File formats and track types The file format defines the track type. The track type determines the display options For current list see: www.broadinstitute.org/igv/FileFormats

Viewing alignments Whole chromosome view

Viewing alignments – Zoom in How far do you need to zoom in to see the alignments? 30 kb or set a different threshold in Preferences Higher value  requires more memory Low coverage files  ok to use higher value Very deep coverage files  use lower value

Viewing alignments – Zoom in Bases that do not match the reference sequence are highlighted by color

SNVs and Structural variations Important metrics for evaluating the validity of SNVs: Coverage Amount of support Strand bias / PCR artifacts Mapping qualities Base qualities Important metrics for evaluating SVs: Insert size Read pair orientation

Viewing SNPs and SNVs

Viewing SNPs and SNVs

Viewing Structural Events Paired reads can yield evidence for genomic “structural events”, such as deletions, translocations, and inversions. Alignment coloring options help highlight these events based on: Inferred insert size (template length) Pair orientation (relative strand of pair)

Paired-end sequencing DNA or cDNA Fragment and size select

Paired-end sequencing DNA or cDNA Fragment and size select insert size Read from each end

Paired-end sequencing DNA or cDNA Fragment and size select insert size Read from each end Align to Reference inferred insert size

Interpreting inferred insert size The “inferred insert size” can be used to detect structural variants including Deletions Insertions Inter-chromosomal rearrangements: (Undefined insert size)

Deletion What is the effect of a deletion on inferred insert size?

Deletion Reference Genome Subject

Deletion Reference Genome Subject

Deletion Reference Genome Subject

Deletion Inferred insert size is > expected value Reference Genome inferred insert size Subject expected insert size

Color by insert size

Deletion Note drop in coverage Pairs with larger than expected insert size are colored red.

Insert size color scheme Smaller than expected insert size: Larger than expected insert size: Pairs on different chromosomes Each end colored by chromosome of its mate

Rearrangement CHR 1 CHR 6 TUMOR NORMAL

Rearrangement CHR 1 CHR 6 TUMOR NORMAL Color indicates mate is on chromosome 6

Interpreting Read-Pair Orientations Orientation of paired reads can reveal structural events: Inversions Duplications Translocations Complex rearrangements Orientation is defined in terms of read strand, left vs right, and read order, first vs second

Inversion Reference genome

Inversion Reference genome A B

Inversion Reference Genome A B Subject B A

Inversion Reference Genome A B Subject B A

Inversion Reference Genome A B Subject B A

Inversion Reference Genome A B Subject B A

Inversion Reference Genome A B Subject B A

Inversion Reference Genome A B Subject B A

Inversion Reference Genome A B Subject B A

Inversion Reference Genome A B

Inversion Anomaly: expected orientation of pair is inward facing ( ) Reference Genome A B Anomaly: expected orientation of pair is inward facing ( )

Inversion Reference Genome A B “Left” side pair

Inversion Reference Genome A B “Right” side pair

Color by pair orientation

Inversion Note drop in coverage at breakpoints

Visualization Lab