Sequence Visualization

Tutorials and References
IGV: Griffith Lab Tutorials (https://github.com/griffithlab/rnaseq_tutorial/)
IGV: Broad Institute of MIT & Harvard (http://software.broadinstitute.org/software/igv/)
Additional Reading: Oldies but goodies

Sequence Visualization - Motivation
High-throughput genomics is daunting at first.
Files contain millions of reads. Go through each one?
Our favorite file formats are easy for machines to read, not for us.

Google Maps Comparison
Would Google Maps be effective if it just spat out minimally formatted sets of coordinates?
The map to the left is a human-centered visual summary of how to get from LSEB to SED.
Layers beyond start, stop, and directions provide additional context.
Genome browsers (like IGV) provide a human-centered visual summary of one or many sequencing experiments.
I guess MapQuest is the equivalent of just receiving a set of coordinates.

Integrative Genomics Viewer
Genomic "address"

Why use a genome browser?
Visually confirm phenomena from sequencing experiments (seeing is believing). Left: visualization of a SNP identified in a lab-evolved strain of yeast.
Integrate multiple experiments on the same coordinate system, collapsing several files.
Communicate key findings from sequencing experiments.
What's better: handing your boss a list of SNPs, or showing them a few examples of the SNPs and comparing them visually to other genomic loci?
Ward et al. (2013): Latent regulatory potential of human-specific repetitive elements.
TE = transposable element. The term encompasses several classes of human genetic elements with viral origin. These elements have integrated into our genome and settled over time.

Commonly Used Genome Visualization Tools
Integrative Genomics Viewer: http://software.broadinstitute.org/software/igv/
UCSC Genome Browser: https://genome.ucsc.edu

Goals for this Lecture
Visualize a variety of genomic data
Quickly navigate around the genome
Learn how to visualize your own read alignments
Learn how to recognize SNPs and structural rearrangements

Integrative Genomics Viewer (IGV)
For sequence visualization, there isn't much important theory to go through (unless you're interested in details such as how coverage is calculated). These browsers are TOOLS, and you get to know a tool by taking it out for a spin. It's really the best way to familiarize yourself with something completely new. That's why I had you do the tutorial ahead of time: it works much better than having me lead a live demo or drone on about examples you haven't seen before. We can also take the time now to clarify anything that wasn't clear in the tutorial. Some of the instructions were indeed lacking.

IGV: Introduction to Usage
Download software from: http://software.broadinstitute.org/software/igv/download
Open up the application
Choose genome (e.g. hg38, mm10, or a custom genome), using the drop-down menu to select it
There's a lot of information here, presented in a way that makes sense to humans. Once you get used to how the information is presented, it starts to make sense to you. The tracks shown are just the ones from the tutorial. You can add any BAM, BED, etc. track from any experiment as long as it is mapped to the same reference. Different phenomena are peppered in here as well: SNPs are colored by nucleotide where reads differ from the reference, and the purple "I"-shaped marker shows insertions. You can look at the coverage histogram to judge whether a SNP is homozygous or heterozygous. You can zoom in and out, navigate within a local region by dragging, or input a new region of your choice.

IGV: Introduction to Usage
Download software from: http://software.broadinstitute.org/software/igv/download
Open up the application
Choose genome (e.g. hg38, mm10, or a custom genome)
Load alignment file(s)
Visualize alignments:
  Coverage plot shows the distribution of alignments
  Each elongated pentagon is a read
  Colored lines = differences from reference
  Reference sequence, amino acid sequences, and genes
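One way to make the homozygous-versus-heterozygous reading of the coverage plot concrete is to compute the non-reference allele fraction at a single position. The sketch below is only an illustration: the read counts, the function name `classify_site`, and the 0.2 and 0.8 cutoffs are assumptions chosen for the example, not IGV internals or a validated genotyping rule.

```python
def classify_site(base_counts, ref_base, het_min=0.2, hom_min=0.8):
    """Label one position from per-base read counts.

    het_min and hom_min are hypothetical cutoffs for illustration only.
    Returns (label, non_reference_fraction).
    """
    depth = sum(base_counts.values())
    if depth == 0:
        return "no coverage", 0.0
    alt_reads = depth - base_counts.get(ref_base, 0)
    alt_fraction = alt_reads / depth
    if alt_fraction >= hom_min:
        label = "homozygous variant"
    elif alt_fraction >= het_min:
        label = "heterozygous variant"
    else:
        label = "matches reference"
    return label, alt_fraction


# Made-up counts: roughly half the reads carry G at a site where the reference is A.
print(classify_site({"A": 14, "G": 16}, ref_base="A"))
# Made-up counts: nearly every read carries G.
print(classify_site({"A": 1, "G": 29}, ref_base="A"))
```

A real variant caller also weighs base qualities, mapping qualities, and strand bias, so this is best read as a description of what the eye does with the coverage bar.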

SNPs (From Lecture 8)
  reference: AA-TACGGACGGACTTTA
  read1:     AACTACGG-CGGACTTTA
  read2:     AACTACGG-CGGACTTTA
  read4:     AACTACGG-CGGACTTGA
  read5:     AACTACGG-CGGACTTGA
Labels: INsertion, DELetion, SNP
  samtools mpileup -u -v -r chr22:29268316-29300343 -d 150 -f ../06/ref/chr22.fa NA12878_phased_chr22.bam > NA12878_chr22_samtools_EWSR1.vcf
  gatk HaplotypeCaller \
    -L chr22:29268316-29300343 \
    -R ../06/ref/chr22.fa \
    -I NA12878_phased_chr22.bam \
    -O NA12878_chr22_gatk_EWSR1.vcf.gz \
    -ERC GVCF # BP_RESOLUTION
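To see where the insertion, deletion, and SNP sit in the toy alignment from this slide, the sketch below walks the columns and labels each one where reads disagree with the reference. The sequences are copied from the slide (which lists no read3); the function name `classify_columns` and the convention that "-" marks a gap are assumptions of this example, and this is far simpler than what samtools or GATK actually do.

```python
from collections import Counter

# Gapped alignment copied from the slide; "-" marks a gap.
reference = "AA-TACGGACGGACTTTA"
reads = {
    "read1": "AACTACGG-CGGACTTTA",
    "read2": "AACTACGG-CGGACTTTA",
    "read4": "AACTACGG-CGGACTTGA",
    "read5": "AACTACGG-CGGACTTGA",
}


def classify_columns(reference, reads):
    """Return {column: (kind, supporting_reads)} for columns where reads differ.

    Columns are 0-based positions in the gapped alignment.
    """
    events = {}
    for col, ref_base in enumerate(reference):
        kinds = Counter()
        for seq in reads.values():
            base = seq[col]
            if base == ref_base:
                continue
            if ref_base == "-":
                kinds["INS"] += 1   # read has a base the reference lacks
            elif base == "-":
                kinds["DEL"] += 1   # read lacks a base the reference has
            else:
                kinds["SNP"] += 1   # both have a base, but they differ
        if kinds:
            events[col] = kinds.most_common(1)[0]
    return events


for col, (kind, count) in classify_columns(reference, reads).items():
    print(f"column {col}: {kind} supported by {count} of {len(reads)} reads")
```

In this toy case the insertion and deletion are supported by every read, while the SNP is carried by only two of the four, which is the kind of pattern the coverage track in IGV makes visible at a glance.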

IGV: Visualize SNPs Identified From Variant Calling
How do we go from a set of labelled coordinates (e.g. a VCF file) to a human-centered visual summary?

IGV: Visualize SNPs Identified From Variant Calling
Load tracks (.BAM files, .VCF files, etc.). Here: alignment file for 1 sample.
Zoom into the locus of interest. Here: chrXIV of our custom genome.
Set visualization parameters (colors, shading, etc.). Here: paired-end reads colored by forward (red) or reverse (blue) read.
Use annotation (.GTF file) to identify which gene the SNP is in.
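Zooming to a locus of interest usually starts from the variant's coordinates. As a small sketch, the function below turns VCF data lines into `chrom:start-end` strings that can be typed into a genome browser's locus box. The VCF records shown are made up for illustration, and `vcf_to_loci` and its `flank` parameter are hypothetical names rather than part of any tool.

```python
def vcf_to_loci(vcf_text, flank=50):
    """Convert VCF data lines into 'chrom:start-end' locus strings.

    VCF positions are 1-based, so they can be used directly;
    `flank` bases of context are added on each side.
    """
    loci = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the header
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        start = max(1, pos - flank)
        loci.append(f"{chrom}:{start}-{pos + flank}")
    return loci


# Made-up records for illustration only.
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chrXIV\t1200\t.\tA\tG\t50\tPASS\t.\n"
    "chrXIV\t30\t.\tC\tT\t40\tPASS\t.\n"
)
print(vcf_to_loci(example_vcf))
```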

IGV: A Homopolymer Run
A homopolymer run is a long stretch consisting of a single base.
Look at the sequence here (all those Ts).
Such regions are difficult to map against, particularly at the ends of reads.
Here we see what the aligner interpreted as insertions or deletions in this homopolymer region.
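If you want to know ahead of time where such runs occur, they are easy to locate in the reference sequence itself. The sketch below uses a regular expression to find them; the function name `find_homopolymers` and the default minimum length of 6 are arbitrary choices for this example.

```python
import re


def find_homopolymers(sequence, min_length=6):
    """Return (start, end, base) for each single-base run of at least min_length.

    Coordinates are 0-based with an exclusive end, as in BED.
    """
    # One base, followed by at least (min_length - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(sequence.upper())]


toy_sequence = "ACG" + "T" * 8 + "ACGAAAC"
print(find_homopolymers(toy_sequence))
```

Indel calls that fall inside the reported intervals deserve a closer look in the browser before they are trusted.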

IGV: Coverage by GC percent
Benjamini & Speed (2012) proposed that the PCR step generates this GC bias.
Severity differs from experiment to experiment.
We see a concordance of GC content with coverage.
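The GC side of that comparison is simple to compute yourself. The sketch below reports the GC fraction of a sequence in fixed-size windows; the function name and window size are assumptions for illustration. Pairing each window with its mean coverage would be the next step toward checking the bias in your own data.

```python
def gc_fraction_by_window(sequence, window=100):
    """Return a list of (window_start, gc_fraction) over non-overlapping windows.

    The final window may be shorter than `window`.
    """
    fractions = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        fractions.append((start, gc / len(chunk)))
    return fractions


# Toy sequence: a GC-rich half followed by an AT-rich half.
print(gc_fraction_by_window("GGCCGGCCAATTAATT", window=8))
```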

IGV: Low Mapping Quality
Repetitive elements (tandem repeats, LINEs, SINEs, etc.) can have multiple nearly identical copies in the genome.
Reads from these elements map to several copies equally well, so they cannot be uniquely placed and are assigned a low mapping quality.
Low-mapping-quality reads are visualized as white, not grey: those are the white reads interspersed with the grey ones. Hover over individual reads to inspect the MAPQ scores.
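For intuition about the MAPQ numbers you see when hovering, the SAM format defines MAPQ as -10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer. The sketch below inverts that definition. Keep in mind that aligners differ in how they estimate this probability, so treat the result as a rough guide; the function name is just for this example.

```python
def mapq_to_error_probability(mapq):
    """Invert MAPQ = -10 * log10(P(mapping position is wrong))."""
    return 10 ** (-mapq / 10)


for mapq in (0, 10, 20, 60):
    print(mapq, mapq_to_error_probability(mapq))
```

Under this definition a MAPQ of 0 means the aligner has no confidence that the placement is right, which is what you expect for a read that fits several repeat copies equally well.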

IGV: Homozygous Deletion
All mate pairs that map here span the deletion.
Visually, the reference contains an "insert" of ~3kb relative to the sample.
Look at the sizes of other fragments for comparison.
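The comparison with other fragment sizes can be sketched numerically. The example below takes insert sizes from an unaffected background region and from the candidate region, then reports which local pairs are much longer than typical. All numbers are made up, and the function name and the 3-fold cutoff are arbitrary assumptions, not a recommended calling rule.

```python
from statistics import median


def flag_long_inserts(background_sizes, local_sizes, fold=3.0):
    """Return (flagged_sizes, flagged_fraction) for pairs in the local region.

    A pair is flagged when its insert size exceeds `fold` times the
    median insert size of the background region.
    """
    typical = median(background_sizes)
    flagged = [size for size in local_sizes if size > fold * typical]
    fraction = len(flagged) / len(local_sizes) if local_sizes else 0.0
    return flagged, fraction


# Made-up insert sizes (bp) for illustration only.
background = [310, 295, 320, 305, 300]
spanning_region = [3300, 3350, 3290]
print(flag_long_inserts(background, spanning_region))
```

When every pair over the region is flagged, that matches the slide's picture of a homozygous deletion; a mix of normal and long pairs would instead suggest that only one copy carries the deletion.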

Automating Tasks
IGV has its own set of commands that it recognizes.
You can load a bunch of tracks, for example, using successive "load" commands in a script file.
The commands can be harnessed to do cool things, like sweeping through a BED file and creating snapshots of all the regions. (That's our very own David Jenkins.)
As you get more used to looking at your alignment tracks, you'll want to start automating some of these tasks.
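As a minimal sketch of the BED-sweep idea, the function below writes an IGV batch script using the batch commands `new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot`, and `exit`. The function name, file names, and genome ID in the example are placeholders, and this is not the script referred to on the slide.

```python
def bed_to_igv_batch(bed_text, genome, tracks, snapshot_dir):
    """Build an IGV batch script that snapshots every region in a BED file."""
    lines = ["new", f"genome {genome}"]
    lines += [f"load {path}" for path in tracks]
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for record in bed_text.splitlines():
        # Skip blank lines, comments, and BED header lines.
        if not record.strip() or record.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = record.split()[:3]
        # BED starts are 0-based; IGV loci are 1-based.
        first_base = int(start) + 1
        lines.append(f"goto {chrom}:{first_base}-{end}")
        lines.append(f"snapshot {chrom}_{first_base}_{end}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"


# Placeholder inputs for illustration.
script = bed_to_igv_batch(
    "chr1\t99\t200\nchr2\t4999\t5500\n",
    genome="hg38",
    tracks=["sample.bam"],
    snapshot_dir="snapshots",
)
print(script)
```

Saving the returned text to a file and running it from IGV's Tools > Run Batch Script menu should produce one PNG per region.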

UCSC genome browser
The UCSC genome browser is a "site that contains the reference sequence and working draft assemblies for a large collection of genomes".
The genome browser itself is just one of many utilities of the genome.ucsc.edu website.
Other utilities of potential interest: the Table Browser, which serves annotations with many different options for formatting. Made to order. Endlessly useful.

Selecting which species to browse
This is the first screen you hit on the genome.ucsc.edu website.
A wide variety of species/references are available, everything from human to the Ebola virus.
All human references from hg16 to hg19 (and hg38)

UCSC genome browser interface
Shown here is the human reference genome (hg38) in some random window on chromosome 1. You can see how similar it is to the IGV interface that you're used to:
At the top, you have your navigation and zoom options.
You have the window you're currently in, how many bases long it is, and a search bar where you can query the location of something you might be interested in (genes, genome positions, etc.).
There's a schematic representation of where you are within the chromosome.

UCSC Live Demo

Where UCSC beats IGV
This is a screenshot of what you see when you scroll down on the genome browser page. There's a vast collection of annotation tracks that are readily available and quick to load onto the reference you're browsing. If you remember loading annotations from a server in IGV, the collection available there wasn't nearly as vast. Each of the above categories has many, many tracks that can be layered onto the reference you're looking at. You can select or hide as many of them as you like.

Options for viewing your own data
Online:
Individual tracks can be loaded using the "add custom tracks" option (not recommended)
Paste a link to a track or track hub hosted elsewhere
Other labs might host their data somewhere

Options for viewing your own data
Local:
A version of the UCSC genome browser can be downloaded (VirtualBox + GBiB)
Supports viewing custom tracks and local track hub configurations
Left: text files that configure a local track hub
It's a bit of a pain to set up the first time, but the upkeep is easy, and you can script the generation of these files as part of your analysis pipelines. All the "usual suspect" formats are supported as well. You can actually use IGV and UCSC pretty interchangeably if you run them locally.
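Scripting the track hub files might look like the sketch below, which builds the text of the three files a basic hub uses: `hub.txt`, `genomes.txt`, and a per-genome `trackDb.txt`. The hub name, email, URLs, and track details are placeholders, and only a minimal set of settings is written, so check the UCSC track hub documentation before relying on it.

```python
def build_track_hub(hub_name, genome, email, tracks):
    """Return {relative_path: file_text} for a minimal single-genome track hub.

    `tracks` is a list of dicts with keys: name, url, description, type.
    """
    hub_txt = "\n".join([
        f"hub {hub_name}",
        f"shortLabel {hub_name}",
        f"longLabel {hub_name} track hub",
        "genomesFile genomes.txt",
        f"email {email}",
    ]) + "\n"
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    stanzas = []
    for track in tracks:
        stanzas.append("\n".join([
            f"track {track['name']}",
            f"bigDataUrl {track['url']}",
            f"shortLabel {track['name']}",
            f"longLabel {track['description']}",
            f"type {track['type']}",
        ]))
    # Track stanzas are separated by a blank line.
    trackdb_txt = "\n\n".join(stanzas) + "\n"
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": trackdb_txt,
    }


# Placeholder values for illustration only.
hub_files = build_track_hub(
    hub_name="myLabHub",
    genome="hg38",
    email="someone@example.org",
    tracks=[{
        "name": "sample1_coverage",
        "url": "sample1.bw",
        "description": "Sample 1 read coverage",
        "type": "bigWig",
    }],
)
for path, text in hub_files.items():
    print(f"--- {path} ---\n{text}")
```

Writing each entry of the returned dictionary to disk at the end of an analysis pipeline keeps the hub in step with the data it describes.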

Other Fun Things from UCSC
BLAT = BLAST-like alignment tool. It is very quick but requires exact or nearly exact matches.

IGV vs. UCSC
Both are great and have very similar interfaces.
I've found IGV to be a bit faster locally, though I have absolutely nothing to substantiate this claim.
UCSC is much, much better for quick referencing.
In reality, the browser you end up using may be decided by what your supervisor/lab/company is already using.
There are also plenty of other browsers available aside from IGV and UCSC.

Other Genome Visualization Tools
Circos: http://circos.ca/software/
MizBee (A Multiscale Synteny Browser): http://www.cs.utah.edu/~miriah/mizbee/Overview.html