Canadian Bioinformatics Workshops
Module 4 Visual Analysis of HT-seq data
Module 4 bioinformatics.ca
Learning Objectives of Module
– to appreciate the different data visualization tools in genomics
– to know when to use a particular tool
– to gain more experience with genome browsers
– to become an expert in variant inspection: single nucleotide and structural variants
– to become familiar with next-gen variant analysis tools
Organization
Part I (9:00-10:30)
– genome browsers
– visualizing single nucleotide and structural variants
Part II (11:00-12:30)
– variant search engines
– finding disease-causing genetic mutations
Part I: browsing HT-seq data, inspecting variants
Why visualize our data?
Anscombe's quartet
four datasets with nearly identical summary statistics (mean and variance of x and y, correlation, regression line) that look completely different when plotted
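The Anscombe claim can be checked numerically. The sketch below uses Python's standard `statistics` module and the quartet's values as commonly published; the helper names `pearson` and `summarize` are illustrative only, not part of any course material.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from sums of deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarize(xs, ys):
    """Mean and sample variance of x and y, plus their correlation."""
    return {
        "mean_x": mean(xs), "var_x": variance(xs),
        "mean_y": mean(ys), "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarize(xs, ys)
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.2f}" for k, v in stats.items()))
```

Running it prints four nearly identical rows (mean of x = 9, sample variance of x = 11, mean of y ≈ 7.50, variance of y ≈ 4.12 to 4.13, r ≈ 0.82), even though scatter plots of the four datasets look nothing alike.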
Preattentive processing
when data are encoded properly (e.g. with a distinct color or shape), outliers are identified at a glance
Preattentive processing (video)
Why visualize?
the human visual system is a low-cost* and high-performance
– sense maker, to identify patterns
– debugger, to identify issues and outliers
* compared to the cost of writing, debugging, and running computational scripts
Visualization Tools in Genomics
Which tool to use?
there are over 40 different genome browsers; which one to use depends on the
– task at hand
– kind and size of data
– data privacy
HT-seq Genome Browsers
– task at hand: visualizing HT-seq reads; especially good for inspecting previously identified variants
– kind and size of data: large BAM files, stored locally or remotely
– data privacy: run on the desktop, so all data can be kept private
Examples: Integrative Genomics Viewer (IGV), Savant Genome Browser
You might also want to try
new web technologies are being applied to make HT-seq data browsing more interactive
– the UCSC Genome Browser has been retrofitted to display BAM files
– Trackster is a genome browser that can perform visual analytics on small windows of the genome, then deploy the full analysis with Galaxy
Examples: UCSC Genome Browser, Trackster (part of Galaxy)
Savant
desktop genome browser, designed for HT-seq data
– emphasis on manually inspecting single nucleotide and structural variants
Review: structural variation detection (covered in Module 3)
two complementary approaches:
– depth of coverage (DOC)
– paired-end mapping (PEM)
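The DOC idea can be sketched in a few lines: reads are sampled roughly uniformly across the genome, so the depth in a window relative to a genome-wide baseline approximates copy number (near 0 for a homozygous deletion, about 0.5 for a heterozygous deletion, above 1 for a duplication). This sketch assumes 0-based, half-open read intervals, a median baseline, and fixed thresholds (0.25, 0.75, 1.25) chosen only for illustration; `coverage` and `classify_windows` are hypothetical names. Real DOC callers also correct for GC content and mappability and segment the signal statistically.

```python
from statistics import mean, median


def coverage(read_intervals, region_length):
    """Per-base depth from 0-based, half-open [start, end) read alignments.

    Assumes every interval lies within [0, region_length].
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (region_length + 1)
    for start, end in read_intervals:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for pos in range(region_length):
        running += diff[pos]
        depth.append(running)
    return depth


def classify_windows(depth, window=100):
    """Flag fixed-size windows whose mean depth deviates from the median baseline.

    The thresholds are illustrative assumptions, not values from a published caller.
    """
    baseline = median(depth)
    calls = []
    for w_start in range(0, len(depth), window):
        w = depth[w_start:w_start + window]
        ratio = mean(w) / baseline if baseline else 0.0
        if ratio < 0.25:
            label = "candidate homozygous deletion"
        elif ratio < 0.75:
            label = "candidate heterozygous deletion"
        elif ratio > 1.25:
            label = "candidate duplication"
        else:
            continue  # depth close to baseline: no call
        calls.append((w_start, w_start + len(w), round(ratio, 2), label))
    return calls


if __name__ == "__main__":
    # Toy profile: baseline depth 10, with one window each at 5x, 0x and 15x.
    toy = [10] * 300 + [5] * 100 + [0] * 100 + [10] * 300 + [15] * 100 + [10] * 100
    for call in classify_windows(toy):
        print(call)
```

On the toy profile, `classify_windows` reports the 5x window as a candidate heterozygous deletion, the 0x window as a candidate homozygous deletion and the 15x window as a candidate duplication; all other windows sit at the baseline and produce no call.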
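PEM: small insertions
(figure: read pairs from the donor mapped to the reference)
the reads of a pair map closer together on the reference than the expected insert size
PEM: large insertions
(figure: read pairs from the donor mapped to the reference)
the insertion is longer than the insert size, so typically only one read of a pair maps near the breakpoint; its mate falls within the inserted sequence
PEM: deletions
(figure: read pairs from the donor mapped to the reference)
the reads of a pair map farther apart on the reference than the expected insert size
PEM: inversions
(figure: read pairs from the donor mapped to the reference)
one read inverted when mapped, so both reads of the pair map to the same strand
PEM: tandem duplications
(figure: read pairs from the donor mapped to the reference)
order of read mappings reversed: the pair maps in reverse-forward orientation instead of the expected forward-reverse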
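The five PEM signatures can be expressed as a rule-based classifier for a single read pair. This is a minimal sketch, assuming a forward-reverse (FR) library with a hypothetical mean insert size of 400 bp and standard deviation of 40 bp; `MappedRead` and `classify_pair` are illustrative names, and the mapped distance is approximated by the difference of the two leftmost coordinates. Real PEM callers only report an event when a cluster of discordant pairs supports it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappedRead:
    """Hypothetical minimal record for one mapped read of a pair."""
    pos: int     # 0-based leftmost reference coordinate
    strand: str  # "+" (forward) or "-" (reverse)


def classify_pair(r1: Optional[MappedRead], r2: Optional[MappedRead],
                  mean_insert: float = 400.0, sd_insert: float = 40.0,
                  n_sd: float = 3.0) -> str:
    """Assign a read pair to a PEM signature, assuming an FR library."""
    # Large insertion: a mate inside the inserted sequence does not map.
    if r1 is None or r2 is None:
        return "candidate large insertion"
    left, right = sorted((r1, r2), key=lambda r: r.pos)
    # Inversion: one read inverted when mapped, so both share a strand.
    if left.strand == right.strand:
        return "candidate inversion"
    # Tandem duplication: order of read mappings reversed (RF instead of FR).
    if left.strand == "-" and right.strand == "+":
        return "candidate tandem duplication"
    # Remaining pairs are FR; compare the mapped distance with the expected insert.
    distance = right.pos - left.pos
    if distance > mean_insert + n_sd * sd_insert:
        return "candidate deletion"
    if distance < mean_insert - n_sd * sd_insert:
        return "candidate small insertion"
    return "concordant"


if __name__ == "__main__":
    examples = {
        "expected": (MappedRead(1000, "+"), MappedRead(1400, "-")),
        "far apart": (MappedRead(1000, "+"), MappedRead(3000, "-")),
        "too close": (MappedRead(1000, "+"), MappedRead(1200, "-")),
        "same strand": (MappedRead(1000, "+"), MappedRead(1400, "+")),
        "RF order": (MappedRead(1000, "-"), MappedRead(1400, "+")),
        "mate lost": (MappedRead(1000, "+"), None),
    }
    for name, (a, b) in examples.items():
        print(f"{name:>11}: {classify_pair(a, b)}")
```

The orientation checks come before the distance checks because an inverted or reversed pair can have any apparent distance; only properly oriented pairs are meaningful to compare against the insert-size distribution.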
Structural Variants in Savant
Savant has a visualization mode for BAM files called "Matepair (Arc)" that is specialized for identifying structural variants using the PEM methodology
it connects the locations of paired mappings by an arc
– arc height represents the mapped distance
– arc color represents the relative orientation of the reads (useful for complex rearrangements, such as inversions)
Savant demo
Lab Time
We are on a Coffee Break & Networking Session
Quiz for Module 4 Part I
Question 1
Which visualization mode in Savant is best for finding SNPs? Why?
Question 2
Which visualization mode in Savant is best for finding structural variations? Why?
Question 3
What kind of event does this image depict? (e.g. chr1: 5,195, ,199,144)
A: Insertion
(figure: read pairs from the donor mapped to the reference)
Question 4
What kind of event does this image depict? (chr1: 26,489, ,490,661)
A: Deletion
(figure: read pairs from the donor mapped to the reference)
Question 5
What would a heterozygous deletion look like? (chr1: 31,574, ,578,242)
Question 6
What kind of event does this image depict? (chr1: 81,659, ,661,916)
A: Inversion
(figure: read pairs from the donor mapped to the reference)
one read inverted when mapped
Question 7
What kind of event does this image depict? (chr1: 11,050, ,055,457)
A: Tandem duplication
(figure: read pairs from the donor mapped to the reference)
order of read mappings reversed
Part II: visual analytics for variants
this is bonus material, covered if time permits
contact for
Genetic Variant Analysis
finding a disease-causing genetic mutation is "like trying to find a needle in a haystack" (or rather, a needlestack)
– lots of variants, many distractors
– many false positives: errors in sequencing, errors in variant prediction
– most true positives are not causal: not related to the phenotype of interest, or not damaging
Genetic Variant Analysis
filter variants based on quality, effect, and relevance to disease
(figure: variant calling → annotation → filtration → visualization; labels: Modules 1-3, Module 4.1)
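The three filters on this slide (quality, effect, relevance to disease) can be chained as successive predicates over annotated variants. The sketch below is hypothetical throughout: the `Variant` record, the Phred-scaled quality cutoff of 30, the set of effect terms (borrowed from Sequence Ontology names such as `missense_variant` and `stop_gained`) and the placeholder gene names `GENE_A`/`GENE_B` are assumptions, not MedSavant's data model.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float   # Phred-scaled call quality
    effect: str   # predicted consequence from the annotation step
    gene: str


# Assumed set of potentially damaging consequences (Sequence Ontology terms).
DAMAGING_EFFECTS = {
    "stop_gained", "frameshift_variant", "splice_donor_variant",
    "splice_acceptor_variant", "missense_variant",
}


def filter_variants(variants, disease_genes, min_qual=30.0):
    """Keep variants that pass quality, effect and disease-relevance filters."""
    kept = []
    for v in variants:
        if v.qual < min_qual:                 # quality: likely sequencing/calling error
            continue
        if v.effect not in DAMAGING_EFFECTS:  # effect: unlikely to be damaging
            continue
        if v.gene not in disease_genes:       # relevance: gene not linked to phenotype
            continue
        kept.append(v)
    return kept


if __name__ == "__main__":
    calls = [
        Variant("chr1", 100, "A", "G", 50.0, "missense_variant", "GENE_A"),
        Variant("chr1", 200, "C", "T", 10.0, "stop_gained", "GENE_A"),
        Variant("chr2", 300, "G", "A", 60.0, "synonymous_variant", "GENE_A"),
        Variant("chr3", 400, "T", "C", 60.0, "frameshift_variant", "GENE_B"),
    ]
    for v in filter_variants(calls, disease_genes={"GENE_A"}):
        print(v)
```

In this toy run only the first call survives: the second fails the quality cutoff (Q30 corresponds to a 1-in-1,000 error probability), the third is synonymous, and the fourth lies outside the hypothetical disease gene list.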
Existing Tools
– command line: powerful but not interactive
– Excel / genome browsers: interactive but not powerful
MedSavant, a variant search engine
MedSavant
visual analytics from variant calling to disease mutation discovery
(figure: variant calling → annotation → filtration → visualization; label: MedSavant)
MedSavant demo
You might also want to try
– VarSifter: works in memory, good for small projects
– Golden Helix SVS (commercial)
this space is evolving, which makes a comprehensive comparison difficult; there is much more commercial activity here than for genome browsers