Canadian Bioinformatics Workshops www.bioinformatics.ca.


Module 2 SNP & short-INDEL Discovery

SNP and short-INDEL Discovery (bioinformatics.ca)

Genetic Variations: SNPs & INDELs
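A minimal sketch of how a called variant can be sorted into the SNP vs. INDEL categories from its REF and ALT alleles. It assumes VCF-style allele strings, where an indel carries a shared leading "padding" base, and it only handles a single ALT allele; `classify_variant` is a hypothetical helper, not part of any tool named in this module.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a VCF-style REF/ALT pair (assumes indels carry a shared padding base)."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"          # single-base substitution
    if len(ref) < len(alt):
        return "insertion"    # ALT has extra bases after the padding base
    if len(ref) > len(alt):
        return "deletion"     # REF has extra bases after the padding base
    return "MNP"              # equal-length multi-base substitution


for ref, alt in [("G", "A"), ("T", "TCA"), ("GTC", "G")]:
    print(ref, alt, classify_variant(ref, alt))
```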

SNP Discovery: Goal
[Figure: aligned reads in which true SNPs must be distinguished from sequencing errors]

SNP Discovery: Base Qualities
[Figure: high-quality vs. low-quality base calls]
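Base qualities are reported on the Phred scale, Q = -10 log10(P_error), so Q20 means a 1% chance the base call is wrong and Q30 means 0.1%. The sketch below converts between the two and decodes a FASTQ-style quality string; it assumes the common Phred+33 ASCII offset, and the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Q = -10 * log10(P(error))."""
    return -10 * math.log10(p)


def decode_quality_string(qual: str, offset: int = 33) -> list[int]:
    """Decode ASCII-encoded qualities (Phred+33 assumed by default)."""
    return [ord(c) - offset for c in qual]


print(decode_quality_string("I5+#"))  # [40, 20, 10, 2]
```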

SNPs & Bayesian Statistics
[Figure: inputs to the Bayesian SNP model: allele call in read, base quality, # of individuals]
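A simplified sketch of the Bayesian idea for one diploid individual at a biallelic site: each read's allele call is weighted by its base quality, each read is assumed to come from either chromosome with probability 1/2, and the likelihoods are combined with a genotype prior. The heterozygosity-based prior (theta = 0.001) and the helper names are assumptions for illustration; production callers model considerably more (mapping quality, multiple individuals, and so on).

```python
import math


def base_likelihood(base: str, allele: str, q: int) -> float:
    """P(observed base | true allele), spreading the error evenly over the 3 other bases."""
    e = min(10 ** (-q / 10), 0.75)  # cap: 0.75 means the call is no better than random
    return 1 - e if base == allele else e / 3


def genotype_posteriors(pileup, ref, alt, theta=0.001):
    """Posterior over (ref,ref), (ref,alt), (alt,alt) given (base, phred_quality) tuples."""
    priors = {(ref, ref): 1 - 1.5 * theta, (ref, alt): theta, (alt, alt): theta / 2}
    log_scores = {}
    for (a1, a2), prior in priors.items():
        log_lik = 0.0
        for base, q in pileup:
            # the read is sampled from either chromosome with probability 1/2
            p = 0.5 * base_likelihood(base, a1, q) + 0.5 * base_likelihood(base, a2, q)
            log_lik += math.log(p)
        log_scores[(a1, a2)] = log_lik + math.log(prior)
    # normalise in log space for numerical stability
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {g: math.exp(s - top) / total for g, s in log_scores.items()}


post = genotype_posteriors([("A", 30)] * 5 + [("C", 30)] * 5, ref="A", alt="C")
print(max(post, key=post.get))  # ('A', 'C')
```

Because the prior strongly favours the reference homozygote, a heterozygous call needs several high-quality reads carrying the alternate allele.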

Genotyping & Consensus Generation
[Figure: aligned sequences AACGTTAGCATA and AACGTTCGCATA, which differ at a single A/C site]
– Haploid: strain 1 [A], strain 2 [C], strain 3 [A]
– Diploid: individual 1 [A/C], individual 2 [C/C], individual 3 [A/A]
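A minimal sketch of consensus generation at a single called site, using the example sequence above. A haploid genotype contributes one allele; a diploid heterozygote is written with an IUPAC ambiguity code (A/C becomes M). Representing heterozygotes with IUPAC codes is one common convention assumed here, and `consensus` is a hypothetical helper.

```python
# IUPAC nucleotide codes for one or two alleles
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus(reference: str, pos: int, genotype: tuple[str, ...]) -> str:
    """Write the called genotype at 0-based `pos` (haploid: 1 allele, diploid: 2)."""
    code = IUPAC[frozenset(genotype)]
    return reference[:pos] + code + reference[pos + 1:]


ref = "AACGTTAGCATA"
print(consensus(ref, 6, ("C",)))      # AACGTTCGCATA  (haploid strain with C)
print(consensus(ref, 6, ("A", "C")))  # AACGTTMGCATA  (diploid A/C heterozygote)
```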

Handling Trios
– Take advantage of duplicate data
– De novo mutation rate
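In a trio the child's two alleles should each be inherited from a parent, so the parental data act as a check on the child's calls. The sketch below tests Mendelian consistency and flags child alleles seen in neither parent; such sites are candidate de novo mutations but are more often genotyping errors. Helper names are hypothetical.

```python
from itertools import product


def mendelian_consistent(child, mother, father) -> bool:
    """True if the child genotype can be built from one maternal and one paternal allele."""
    c = sorted(child)
    return any(sorted((m, f)) == c for m, f in product(mother, father))


def candidate_de_novo(child, mother, father) -> bool:
    """True if the child carries an allele absent from both parents."""
    parental = set(mother) | set(father)
    return any(a not in parental for a in child)


print(mendelian_consistent(("A", "C"), ("A", "A"), ("C", "C")))  # True
print(candidate_de_novo(("A", "T"), ("A", "A"), ("A", "C")))     # True
```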

[Figure: 1000G Consortium, July 2010]

The power of imputation
[Figure: # of variant genotype calls vs. # of incorrect variant genotype calls; 1000G Consortium, July 2010]

[Figure: Nielsen et al., June 2011]

Variant-calling workflow (adapted from Mark DePristo, Broad Institute, February 2010)
– BAM files (200 GB): recalibrated base qualities (BQ), duplicates removed
– Raw variants (VCF, 1 GB): sites with non-reference bases are genotyped; tools: samtools, GATK Unified Genotyper, freeBayes, glfMultiples; time: 10 hours
– Filtered variants (VCF): separate true segregating variation from machine/alignment artifacts; expert user judgment (days) or GATK variant filtration (30 min)
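The filtering step can be as simple as hard thresholds on site-level annotations. The sketch below assumes each raw call has already been parsed into a dict with a numeric QUAL and combined depth DP; the thresholds and the filter labels ("LowQual", "LowDP") are illustrative assumptions, not recommended values or the behaviour of any specific tool.

```python
def hard_filter(records, min_qual=30.0, min_depth=10):
    """Annotate each record with a FILTER value and split into passing / failing lists.

    records: list of dicts with numeric "QUAL" and "DP" (combined depth) keys.
    """
    passed, failed = [], []
    for rec in records:
        reasons = []
        if rec["QUAL"] < min_qual:
            reasons.append("LowQual")
        if rec["DP"] < min_depth:
            reasons.append("LowDP")
        annotated = {**rec, "FILTER": ";".join(reasons) if reasons else "PASS"}
        (failed if reasons else passed).append(annotated)
    return passed, failed


raw = [{"QUAL": 50, "DP": 20}, {"QUAL": 10, "DP": 20}, {"QUAL": 50, "DP": 3}]
ok, bad = hard_filter(raw)
print(len(ok), [r["FILTER"] for r in bad])  # 1 ['LowQual', 'LowDP']
```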

QC: HapMap & dbSNP
International HapMap Project (phase III)
– 1301 individuals in 11 populations genotyped
– ~1 SNP per 2 kb
– Proxy for false negatives
dbSNP (build 130)
– 14 million SNPs in the human genome
– Varying quality
– Proxy for false positives
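A sketch of how the two reference sets serve as proxies, assuming sites are keyed by (chromosome, position). The fraction of HapMap sites recovered estimates sensitivity (its complement approximates false negatives); the fraction of calls absent from dbSNP ("novel") rises when false positives creep in. In practice the HapMap set should be restricted to sites polymorphic in the sequenced samples and covered by the data; that restriction is assumed to have been applied already.

```python
def qc_against_references(calls, hapmap_sites, dbsnp_sites):
    """All arguments are sets of (chrom, pos) keys."""
    sensitivity = len(calls & hapmap_sites) / len(hapmap_sites) if hapmap_sites else float("nan")
    novel_fraction = len(calls - dbsnp_sites) / len(calls) if calls else float("nan")
    return {"hapmap_sensitivity": sensitivity, "novel_fraction": novel_fraction}


calls = {("1", 100), ("1", 200), ("1", 300), ("1", 400)}
hapmap = {("1", 100), ("1", 200), ("1", 500)}
dbsnp = {("1", 100), ("1", 200), ("1", 300)}
print(qc_against_references(calls, hapmap, dbsnp))
```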

QC: Coverage
[Figure: Auton & Hernandez, Cornell University, June 2009]
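One common coverage check flags sites whose read depth is far above typical, since excess depth often points to collapsed repeats or mismapped paralogous reads. The sketch uses a cutoff of a fixed multiple of the median depth; the factor of 3 is an illustrative assumption.

```python
import statistics


def depth_outliers(depths, factor=3.0):
    """Return indices of sites whose depth exceeds factor * median depth."""
    cutoff = factor * statistics.median(depths)
    return [i for i, d in enumerate(depths) if d > cutoff]


print(depth_outliers([10, 12, 9, 11, 60, 10]))  # [4]
```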

QC: Inter-SNP Distance
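Distances between neighbouring SNP calls are a useful QC signal: true SNPs are spread out, whereas tight clusters often arise from misalignment, for example around an unmodelled indel. The sketch computes inter-SNP distances and flags clusters of at least `min_count` SNPs within `window` bp on the same chromosome; both parameter values are illustrative assumptions.

```python
from collections import defaultdict


def inter_snp_distances(positions):
    """Distances between consecutive SNP positions on one chromosome."""
    s = sorted(positions)
    return [b - a for a, b in zip(s, s[1:])]


def snp_clusters(sites, window=10, min_count=3):
    """Return (chrom, pos) sites that belong to a dense cluster of SNP calls."""
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    flagged = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        start = 0
        for end in range(len(positions)):
            # slide the left edge until the span is shorter than `window` bp
            while positions[end] - positions[start] >= window:
                start += 1
            if end - start + 1 >= min_count:
                flagged.update((chrom, p) for p in positions[start:end + 1])
    return flagged


print(inter_snp_distances([107, 100, 103, 500]))  # [3, 4, 393]
```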

QC: Hardy-Weinberg Violations
[Figure: Auton & Hernandez, Cornell University, June 2009. HapMap sites in red, other sites in blue; CEU, P(seg) > 0.5, coverage 2-5x]
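A sketch of a Hardy-Weinberg check for one biallelic site: estimate the allele frequency p from genotype counts, compare observed counts with the expected n p^2, 2npq, n q^2, and compute a 1-degree-of-freedom chi-square statistic. Sites with, for example, a large excess of heterozygotes often reflect calling artifacts. This uses the simple chi-square test; exact tests are generally preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic for departure from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_p_value(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


print(hwe_chi_square(25, 50, 25))  # 0.0
print(hwe_chi_square(50, 0, 50))   # 100.0
```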

QC: Other metrics
P(SNP)
– Determining the optimal P(SNP) threshold
Transitions:transversions (Ts:Tv)
– Adjusting filters so that the ratio approaches 2, roughly the value expected for true human SNPs genome-wide
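Transitions are purine-purine (A<->G) or pyrimidine-pyrimidine (C<->T) changes; all other substitutions are transversions. Random errors produce transversions twice as often as transitions (Ts:Tv around 0.5), so a call set whose ratio falls well below 2 is likely enriched for errors. A minimal sketch, assuming biallelic SNPs given as (REF, ALT) pairs with REF != ALT:

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ts_tv_ratio(snps) -> float:
    """Ts:Tv for a list of (REF, ALT) single-base substitutions."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
```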

Using multiple QC metrics
[Figure: Mark DePristo, Broad Institute, February 2010]

VCF
##fileformat=VCFv4.0
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA
rs G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
– FORMAT fields: GT = genotype, GQ = genotype quality, DP = read depth, HQ = haplotype qualities
– INFO fields: NS = # samples, DP = combined depth, AF = allele frequency, DB = in dbSNP?, H2 = in HapMap2?
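A minimal sketch of reading one tab-separated VCF data line: the eight fixed columns, the semicolon-separated INFO field (key=value pairs plus flags such as DB and H2), and per-sample values keyed by the FORMAT column. In GT, "|" marks a phased genotype and "/" an unphased one. The CHROM, POS, ID and sample names in the example are hypothetical placeholders; the remaining columns follow the slide. A real pipeline would normally use an established VCF library instead.

```python
def parse_vcf_data_line(line: str, sample_names: list[str]) -> dict:
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info_dict[key] = value
        elif entry and entry != ".":
            info_dict[entry] = True  # flag fields such as DB or H2
    samples = {}
    if len(fields) > 9:
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:GQ:DP:HQ
        for name, data in zip(sample_names, fields[9:]):
            samples[name] = dict(zip(keys, data.split(":")))
    return {
        "CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
        "ALT": alt.split(","), "QUAL": None if qual == "." else float(qual),
        "FILTER": flt, "INFO": info_dict, "samples": samples,
    }


# Hypothetical CHROM/POS/ID and sample names; other columns as on the slide.
line = "\t".join([
    "chr1", "1000", ".", "G", "A", "29", "PASS", "NS=3;DP=14;AF=0.5;DB;H2",
    "GT:GQ:DP:HQ", "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,.",
])
rec = parse_vcf_data_line(line, ["S1", "S2", "S3"])
print(rec["INFO"]["AF"], rec["samples"]["S2"]["GT"])  # 0.5 1|0
```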

VCF
[Figure: Mark DePristo, Broad Institute, February 2010]

Experimental Design: Tools
– BAM files
– BQ recalibration
– Duplicate filtering
– SNP discovery (samtools)
– SNP discovery (GATK)
– View SNPs and INDELs (IGV)
