Considerations for Analyzing Targeted NGS Data HLA

Tim Hague, CTO

Introduction
Human leukocyte antigen (HLA) is the major histocompatibility complex (MHC) in humans: a group of genes (a 'superregion') on chromosome 6 that essentially encodes cell-surface antigen-presenting proteins.

Functions
HLA genes have functions in:
- combating infectious diseases
- graft/transplant rejection
- autoimmunity
- cancer

Alleles
There is a large number of HLA alleles (and proteins). Many alleles are already known, and the number of known alleles keeps increasing.

HLA Class I
Gene       A      B      C
Alleles    2013   2605   1551
Proteins   1448   1988   1119

HLA Class II
Gene       DRA   DRB*   DQA1   DQB1   DPA1   DPB1
Alleles    7     1260   47     176    34     155
Proteins   2     901    29     126    17     134

HLA Class II: DRB alleles
Gene       DRB1   DRB3   DRB4   DRB5
Alleles    1159   58     15     20
Proteins   860    46     8      17

Analysis Challenges
HLA genes present specific analysis challenges regardless of the sequencing technology.

High Polymorphism
High rate of polymorphism: up to 100 times the average human mutation rate. The HLA-DRB1 and HLA-B loci have the highest sequence variation rate within the human genome.
High degree of heterozygosity: homozygotes are the exception in this region.

Duplications
High level of segmental duplication: many similar genes and many very similar pseudogenes. Duplicated segments can be more similar to each other within an individual than they are to the corresponding segments of the reference genome.

Complex Genetics
Particularly HLA-DRB*. The DR β-chain is encoded by 4 loci; however, no more than 3 functional loci are present in a single individual, and at most 2 per chromosome.

Mitigating Factors
It's not all bad news: many HLA alleles are already well known, both in terms of sequence and of frequency within the population. The HLA region is fairly small, so there is a high degree of linkage disequilibrium and therefore many known haplotypes.
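Linkage disequilibrium between two biallelic sites is commonly quantified with the textbook statistics D and r². The sketch below is a minimal illustration of those standard definitions; the haplotype frequencies are hypothetical and are not taken from this talk.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and allele B at site 2
    p_a, p_b: marginal frequencies of A and B (must be strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # deviation from what independent assortment predicts
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies: A and B always travel together on the same haplotype.
d, r2 = linkage_disequilibrium(p_ab=0.3, p_a=0.3, p_b=0.3)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

An r² close to 1 means one site predicts the other almost perfectly, which is what makes known haplotypes useful when resolving HLA types.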

Traditional Typing
- SSO (sequence-specific oligonucleotide probes): low resolution, high throughput, cheap
- SSP (sequence-specific primers): very fast results, low resolution
- SBT (sequence-based typing): high resolution, usually done by Sanger sequencing

NGS Typing
High resolution, an alternative to Sanger-based SBT. Why is it needed?

Sanger and HLA
Sanger data is still the gold standard in the genomic sequencing industry, even though it is very expensive compared to NGS. Its base error rate is about 1 in 1,000; if both forward and reverse reads are sequenced, the error rate drops to about 1 in 1,000,000. So why is it bad for HLA?
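The drop from 1 in 1,000 to 1 in 1,000,000 follows if we assume, as the slide implicitly does, that forward and reverse errors are independent, so a base is wrong only when both reads are wrong at that position:

```latex
p_{\text{both}} = p_{\text{fwd}} \cdot p_{\text{rev}} = 10^{-3} \cdot 10^{-3} = 10^{-6}
```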

Phase Resolution
- 2 copies of chromosome 6
- Many loci, many alleles
- Lots of heterozygosity

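Allele Phasing Problem
[Figure: a reference sequence and a consensus sequence with two heterozygous positions, T/A and G/T. The consensus alone cannot tell whether Allele 1 carries T...G and Allele 2 carries A...T, OR Allele 1 carries T...T and Allele 2 carries A...G.]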
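The ambiguity in the figure can be made concrete by enumerating every pair of alleles that is consistent with unphased heterozygous genotypes. This is a minimal sketch; representing each heterozygous site as a `(base, base)` tuple is an assumption made for illustration.

```python
from itertools import product


def phasings(het_sites):
    """Return every unordered pair of haplotypes consistent with unphased genotypes.

    het_sites: list of (base_a, base_b) tuples, one per heterozygous site.
    """
    seen = set()
    result = []
    for choice in product((0, 1), repeat=len(het_sites)):
        hap1 = "".join(site[c] for site, c in zip(het_sites, choice))
        hap2 = "".join(site[1 - c] for site, c in zip(het_sites, choice))
        key = frozenset((hap1, hap2))  # Allele 1 / Allele 2 labels are interchangeable
        if key not in seen:
            seen.add(key)
            result.append((hap1, hap2))
    return result


# The two heterozygous sites from the figure: T/A and G/T.
for hap1, hap2 in phasings([("T", "A"), ("G", "T")]):
    print(f"Allele 1: {hap1}  Allele 2: {hap2}")
```

With n heterozygous sites there are 2^(n-1) such pairs, so the ambiguity doubles with every additional heterozygous position.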

The Problem with Sanger
- There is only one signal.
- A high degree of heterozygosity means a high degree of ambiguity.
- Resolving it requires statistical techniques based on known allele frequencies, plus manual intervention by trained operators.
- Because ambiguity can only be resolved statistically, rare types can be assigned wrongly.
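The following sketch illustrates why frequency-based resolution can mis-assign rare types. It ranks candidate genotypes that are assumed to produce the same Sanger consensus by their Hardy-Weinberg expectation. The allele names and frequencies are invented for this example and do not correspond to real HLA alleles, and this is not the method of any particular typing product.

```python
# Hypothetical allele names and frequencies, invented for illustration only.
ALLELE_FREQ = {"X1": 0.20, "X2": 0.15, "Y1": 0.001, "Y2": 0.002}


def genotype_likelihood(a, b, freq):
    """Hardy-Weinberg expectation: p^2 for a homozygote, 2pq for a heterozygote."""
    pa, pb = freq[a], freq[b]
    return pa * pa if a == b else 2 * pa * pb


def rank_genotypes(candidates, freq):
    """Order ambiguous candidate genotypes from most to least likely."""
    return sorted(candidates, key=lambda g: genotype_likelihood(g[0], g[1], freq), reverse=True)


# Assumption: both genotypes give an identical Sanger consensus.
candidates = [("Y1", "Y2"), ("X1", "X2")]
ranked = rank_genotypes(candidates, ALLELE_FREQ)
print("Assigned genotype:", ranked[0])
```

If the sample actually carried the rare Y1/Y2 genotype, this procedure would still report X1/X2, because the data cannot distinguish the two and the common alleles win on frequency alone.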

Number of Potential Alleles
HLA typing by the Sanger method yields a consensus sequence in which heterozygous positions appear as IUPAC ambiguity codes (S, R, W, Y, M):
GGACSGGRASACACGGAAWGTGAAGGCCCACTCACAGACTSACCGAGYGRACCTGGGGACCCTGCGCGGCTACTACAACCAGAGCGAGGMCGGT
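A quick way to see how much ambiguity such a consensus carries is to count its two-base IUPAC codes and the number of phase combinations they allow. This is a minimal sketch. It counts phasings only. The actual number of candidate HLA alleles depends on which combinations exist in an allele database, which this code does not consult.

```python
# Two-base IUPAC nucleotide ambiguity codes, as used in Sanger consensus calls.
IUPAC_TWO_BASE = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def heterozygous_positions(consensus):
    """Return (0-based position, code, bases) for each two-base ambiguity code."""
    return [(i, c, IUPAC_TWO_BASE[c]) for i, c in enumerate(consensus) if c in IUPAC_TWO_BASE]


def phase_combinations(n_het):
    """Distinct unordered haplotype pairs for n heterozygous sites: 2^(n-1)."""
    return 2 ** (n_het - 1) if n_het > 0 else 1


# Consensus sequence copied from the slide.
consensus = (
    "GGACSGGRASACACGGAAWGTGAAGGCCCACTCACAGACTSACCGAGYGRACCTGGGG"
    "ACCCTGCGCGGCTACTACAACCAGAGCGAGGMCGGT"
)
het = heterozygous_positions(consensus)
print(len(het), "heterozygous positions")
print(phase_combinations(len(het)), "possible phase combinations")
```

Even this short stretch contains 8 heterozygous positions, which already allows 128 different phasings before any database lookup narrows them down.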

NGS Advantages
- Can reduce ambiguity
- Phase resolution: two signals, but lots of short reads
- Cheaper and faster than Sanger
- Less manual intervention required

NGS Data - Unphased

NGS Data - Phased
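Phase information in NGS data comes from individual reads that span more than one heterozygous site, because each such read shows which bases sit together on the same chromosome. This is a minimal sketch under two assumptions: the reads are already aligned, and each read is represented as a hypothetical position-to-base dictionary. A real pipeline would work from alignment records such as BAM files.

```python
from collections import Counter


def phase_two_sites(reads, site1, site2):
    """Tally (base_at_site1, base_at_site2) for reads that cover both sites.

    reads: list of dicts mapping reference position -> observed base.
    """
    combos = Counter()
    for read in reads:
        if site1 in read and site2 in read:
            combos[(read[site1], read[site2])] += 1
    return combos


# Hypothetical toy reads around two heterozygous sites at positions 100 and 130.
reads = [
    {100: "T", 130: "G"},
    {100: "T", 115: "C", 130: "G"},
    {100: "A", 130: "T"},
    {100: "A", 130: "T"},
    {130: "G"},  # covers only one site, so it carries no phase information
]
print(phase_two_sites(reads, 100, 130).most_common())
```

Two supported combinations, T...G and A...T, resolve the phase directly. The limitation is that short reads must actually span both sites, which is why sites far apart can remain unphased.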

NGS Approaches
- HLA*IMP: a chip-based imputation engine
- Reference-based alignment, followed by an HLA call based on the variants detected during alignment
- Search against a database of known alleles

NGS Reference-based
Fraught with difficulties:
- It is very hard to align reads to this region.
- The variant/HLA call is only as good as the alignment.
- No coverage = no call.
This approach has been attempted by the Broad Institute (HLA Caller) and Roche.
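"No coverage = no call" can be expressed as a simple gate in front of any variant or HLA call. This is a minimal sketch. The thresholds `min_depth` and `min_fraction` and the exon names are hypothetical choices for illustration, not values used in the talk.

```python
def call_status(depths, min_depth=10, min_fraction=0.9):
    """Classify a region from its per-base read depths.

    min_depth and min_fraction are illustrative thresholds, not values from the talk.
    Returns "no call" when too few bases reach min_depth.
    """
    if not depths:
        return "no call"
    covered = sum(1 for d in depths if d >= min_depth)
    fraction = covered / len(depths)
    return "callable" if fraction >= min_fraction else "no call"


# Hypothetical per-base depths for two exons after alignment.
exon_depths = {
    "exon_well_covered": [35, 40, 42, 38, 31, 29, 33, 36, 40, 41],
    "exon_no_coverage": [0] * 10,
}
for exon, depths in exon_depths.items():
    print(exon, call_status(depths))
```

Relaxing the aligner raises depth but, as the following slides show, also adds noise, so a depth gate alone cannot fix a poor alignment.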

Alignment Efforts
RainDance provides a targeted HLA amplification kit called HLAseq.
Target: the whole MHC superregion (except for some tandem repeat regions).
Goal: align this data before doing the variant/HLA call.

Diverse variant “density” in the MHC superregion, based on a single sample

Default BWA alignment: no coverage at an exon of HLA-DMB

Low coverage and orphaned reads at an HLA-DRB1 exon

BWA vs. a more permissive alignment: higher coverage = higher noise

Large targeted region without usable coverage

NGS Reference-based
Not providing enough coverage everywhere. What about de novo?

De novo assembly (MIRA)
- 287 contigs (longest contig: 2,199 bp)
- Mean contig size: 268 bp
- Median contig size: 209 bp
- Total consensus: 77,084 bp
- RainDance target: ~3,800,000 bp
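The summary statistics above can be reproduced from a list of contig lengths, and the last step shows how little of the target the assembly covers. This is a minimal sketch. The toy contig lengths are hypothetical. Only the totals in the final line come from the slide.

```python
from statistics import mean, median


def assembly_summary(contig_lengths, target_size):
    """Summarize contig lengths and compare total assembled bases to the target size."""
    total = sum(contig_lengths)
    return {
        "contigs": len(contig_lengths),
        "longest": max(contig_lengths),
        "mean": mean(contig_lengths),
        "median": median(contig_lengths),
        "total": total,
        "target_fraction": total / target_size,
    }


# Toy contig lengths (hypothetical); the real MIRA assembly had 287 contigs.
print(assembly_summary([2199, 400, 268, 209, 150], target_size=3_800_000))

# Using only the totals reported on the slide:
print(f"Assembled fraction of target: {77_084 / 3_800_000:.1%}")
```

With about 77 kb of consensus against a ~3.8 Mb target, the de novo assembly covers only around 2% of the region.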

NGS De Novo Alignment
Not enough contigs were produced, and there was not enough coverage of the target region. What about a hybrid approach?

De novo assembly with “backbone”
First, alignment to a backbone; then de novo assembly.
Backbone: 2,220 contigs from HG19 chr 6 (sum: 3,554,852 bp), covering almost the whole RainDance target.
Results: max reads per backbone contig: 197; max coverage: 71.

NGS Typing - Alignment Based
We tried:
- A Burrows-Wheeler aligner
- A more sensitive, seed-and-extend aligner
- A de novo aligner
- A 'hybrid' de novo aligner
The variant/HLA call is only as good as the alignment, and the alignments were not good enough.

NGS Database Based
Search against a 'database' of known alleles, such as the IMGT/HLA database, available from the EBI web site. Stanford, Conexio, JSI Medical, BC Cancer Agency and Omixon have all tried this approach.
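The core idea of the database-based approach is to score pairs of known alleles by how many reads they explain. This is a minimal sketch of that idea under strong simplifying assumptions. The allele names and sequences are hypothetical toy data, not IMGT/HLA entries. A read only "supports" an allele if it occurs verbatim in the allele sequence. This is not the algorithm of any tool named above, and those tools use real alignment that tolerates mismatches.

```python
from itertools import combinations_with_replacement


def best_allele_pair(reads, alleles):
    """Pick the allele pair (homozygous pairs included) that explains the most reads.

    alleles: dict name -> sequence. A read supports an allele if it occurs verbatim
    in that allele's sequence (no mismatches allowed in this sketch).
    """
    support = {name: {i for i, r in enumerate(reads) if r in seq} for name, seq in alleles.items()}
    best, best_score = None, -1
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        score = len(support[a] | support[b])  # reads explained by either allele
        if score > best_score:
            best, best_score = (a, b), score
    return best, best_score


# Hypothetical toy "database" and reads drawn from toy*01 and toy*02.
alleles = {
    "toy*01": "ACGTTGCAAGGT",
    "toy*02": "ACGATGCAAGCT",
    "toy*03": "ACGTTGCTAGGT",
}
reads = ["ACGTTG", "GCAAGGT", "ACGATG", "GCAAGCT"]
print(best_allele_pair(reads, alleles))
```

The scoring also hints at the homozygous-allele difficulty. A homozygous pair explains exactly the same reads as that allele alone, so with ambiguous data it can tie with a heterozygous pair. In that case this sketch simply keeps whichever pair it met first.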

DB-Based Approach
Advantages:
- Fewer mapping headaches
- Unambiguous results
- Potential to be fast
Difficulties:
- Novel allele detection
- Homozygous alleles
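Novel allele detection is hard for a database search because a new allele is, by definition, absent from the database. One simple signal is a set of reads that neither allele of the called pair explains. This is a minimal sketch with hypothetical toy sequences and the same exact-match simplification as before. Leftover reads can equally come from sequencing errors or from paralogs and pseudogenes, so they are only a hint that needs follow-up.

```python
def unexplained_reads(reads, allele_pair, alleles):
    """Return reads that do not occur verbatim in either allele of the called pair."""
    a, b = allele_pair
    return [r for r in reads if r not in alleles[a] and r not in alleles[b]]


# Hypothetical toy allele sequences and reads.
alleles = {"toy*01": "ACGTTGCAAGGT", "toy*02": "ACGATGCAAGCT"}
reads = ["ACGTTG", "GCAAGCT", "ACGATC"]  # the last read differs from toy*02 at one base

leftover = unexplained_reads(reads, ("toy*01", "toy*02"), alleles)
print(leftover)
```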

Results with exome data

Exon-level detail

Detailed results: short-read pileup
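A short-read pileup stacks every read over the allele sequence and counts the bases observed at each position, so disagreements stand out. This is a minimal sketch. It assumes reads have already been placed on a hypothetical allele sequence at known, gap-free offsets, with no insertions or deletions.

```python
from collections import Counter, defaultdict


def pileup(reads_with_offsets, reference):
    """Count the bases observed at each reference position.

    reads_with_offsets: list of (start_offset, read_sequence) pairs, assumed to be
    gap-free placements on the reference (no insertions or deletions).
    """
    columns = defaultdict(Counter)
    for start, seq in reads_with_offsets:
        for i, base in enumerate(seq):
            pos = start + i
            if pos < len(reference):  # ignore bases that hang off the end
                columns[pos][base] += 1
    return columns


def mismatch_positions(columns, reference):
    """Positions where at least one read disagrees with the reference base."""
    return sorted(pos for pos, counts in columns.items() if any(b != reference[pos] for b in counts))


reference = "ACGTTGCAAGGT"  # hypothetical allele sequence
reads = [(0, "ACGTTG"), (3, "TTGCAA"), (6, "CAAGCT")]
cols = pileup(reads, reference)
for pos in mismatch_positions(cols, reference):
    print(pos, reference[pos], dict(cols[pos]))
```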

Conclusions
- The DB-based approach to HLA typing is new but very promising.
- NGS approaches can resolve much of the ambiguity of Sanger SBT.
- The DB-based approach can also overcome the limitations of NGS reference-based alignment.

Conclusions
Available DB-based HLA typing tools differ in:
- Speed
- Sequencers supported
- Types of sequencing data supported (targeted, exome, whole genome)
- Ease of use
- Ambiguity of results
- Degree of manual intervention required
- Novel allele detection capabilities