Introduction  Human leukocyte antigen (HLA) is the major histocompatibility complex (MHC) in humans  Group of genes ('superregion') on chromosome 6.


1

2 Introduction  Human leukocyte antigen (HLA) is the major histocompatibility complex (MHC) in humans  Group of genes ('superregion') on chromosome 6  Essentially encodes cell-surface antigen-presenting proteins

3 Functions  HLA genes have functions in:  combating infectious diseases  graft/transplant rejection  autoimmunity  cancer

4 Alleles  Large number of alleles (and proteins)  Many alleles are already known  The number of known alleles is increasing

5 HLA Polymorphism

HLA Class I
Gene      A     B     C
Alleles   2013  2605  1551
Proteins  1448  1988  1119

HLA Class II
Gene      DRA  DRB*  DQA1  DQB1  DPA1  DPB1
Alleles   7    1260  47    176   34    155
Proteins  2    901   29    126   17    134

HLA Class II - DRB Alleles
Gene      DRB1  DRB3  DRB4  DRB5
Alleles   1159  58    15    20
Proteins  860   46    8     17
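To put the allele counts in perspective, here is a minimal sketch of how quickly the number of possible genotypes grows. It relies only on the standard counting result that a locus with n alleles allows n(n+1)/2 unordered allele pairs in a diploid individual. The allele counts come from the slide; the function name is illustrative.

```python
def diploid_genotypes(n_alleles: int) -> int:
    """Number of unordered allele pairs (homozygous pairs included) at one locus."""
    return n_alleles * (n_alleles + 1) // 2

# Allele counts for the class I genes as listed on the slide.
class_i_alleles = {"HLA-A": 2013, "HLA-B": 2605, "HLA-C": 1551}

for gene, n in class_i_alleles.items():
    print(gene, diploid_genotypes(n))
```

Each class I gene alone admits millions of possible allele pairs, which is one way to see why typing ambiguity is such a practical concern.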

6 Analysis Challenges  HLA genes have specific analysis challenges regardless of the sequencing technology  HLA is the most polymorphic region of the human genome, and is difficult to analyze with any technique (including NGS) – Many repeated structures and pseudogenes – Some of the HLA genes have complex genetics – Difficult to find the appropriate reference genome – Phasing the heterozygous positions separated by more than one read length is problematic

7 High Polymorphism High rate of polymorphism – up to 100 times the average human mutation rate  The HLA-DRB1 and HLA-B loci have the highest sequence variation rate within the human genome  High degree of heterozygosity – homozygotes are the exception in this region

8 Classical Alignment to a Reference

9 Duplications  High level of segmental duplications  Lots of similar genes and lots of very similar pseudogenes  Duplicated segments can be more similar to each other within an individual than they are to the corresponding segments of the reference genome

10 Homology, Pseudogenes & Repeats

11 Complex Genetics  Particularly HLA-DRB*  The DR β-chain is encoded by 4 loci; however, no more than 3 functional loci are present in a single individual, and no more than 2 per chromosome.

12 HG19 Haplotypes at the HLA Region

13 Mitigating Factors  It's not all bad news:  Many HLA alleles are already well known – both in terms of sequence and frequencies within the population  The HLA region is fairly small, so there is a high degree of linkage disequilibrium, and therefore lots of known haplotypes

14 Traditional Typing  SSO (sequence-specific oligonucleotide probes) – low resolution, high throughput, cheap  SSP (sequence-specific primers) – very fast results, low resolution  SBT (sequence-based typing) – high resolution, usually done by Sanger sequencing

15 NGS Typing  High resolution, an alternative to Sanger-based SBT  Why is it needed?

16 Sanger and HLA  Sanger data is still the gold standard in the genomic sequencing industry, even though it is very expensive compared to NGS  Error rate of about 1 in 1,000 bases; if both forward and reverse typing are done, the error rate drops to 1 in 1,000,000
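The drop from 1 in 1,000 to 1 in 1,000,000 follows from simple multiplication, under an assumption the slide does not state: that forward and reverse errors are independent. A minimal sketch of the arithmetic:

```python
from fractions import Fraction

# Per-base error rate for a single pass, as quoted on the slide.
single_pass_error = Fraction(1, 1000)

# Assumption (not stated on the slide): forward and reverse errors are
# independent, so a wrong base goes unnoticed only if both passes are wrong there.
both_passes_wrong = single_pass_error * single_pass_error

print(both_passes_wrong)  # 1/1000000
```

If the two passes share a systematic error source, the real combined rate would be higher than this product.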

17 Phase Resolution  2x chromosome 6  Many loci, many alleles  Lots of heterozygosity

18 Allele Phasing Problem  [Figure: a consensus sequence with two heterozygous positions (T/A and G/T), shown against the reference sequence, with two alternative assignments of the bases to Allele 1 and Allele 2]

19 The Problem with Sanger  There is only one signal  High degree of heterozygosity = high degree of ambiguity  Requires statistical techniques based on known allele frequencies, plus manual intervention by trained operators  Ambiguity can only be resolved statistically, which can lead to wrong assignment for rare types

20 The Problem with Sanger

21 HLA Typing by Sanger Method  Consensus sequence: GGACSGGRASACACGGAAWGTGAAGGCCCACTCACAGACTSACCGAGYGRACCTGGGGACCCTGCGCGGCTACTACAACCAGAGCGAGGMCGGT  [Figure: Number of Potential Alleles]
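The ambiguity in a Sanger consensus like the one on this slide can be made concrete with a short sketch. It counts the two-base IUPAC ambiguity codes (each marks a heterozygous position) and derives how many unordered haplotype pairs are consistent with the trace: k heterozygous positions allow 2^(k-1) phasings. This is a combinatorial upper bound that ignores which alleles are actually known; it is not the figure shown on the original slide, and the function names are illustrative.

```python
# IUPAC codes that stand for exactly two bases (a heterozygous Sanger call).
IUPAC_TWO_BASE = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}

def heterozygous_positions(consensus):
    """0-based indices of two-base ambiguity codes in the consensus."""
    return [i for i, base in enumerate(consensus) if base in IUPAC_TWO_BASE]

def phasing_count(consensus):
    """Unordered haplotype pairs consistent with the consensus (2^(k-1) for k >= 1)."""
    k = len(heterozygous_positions(consensus))
    return 1 if k == 0 else 2 ** (k - 1)

# Consensus sequence as printed on the slide.
slide_consensus = ("GGACSGGRASACACGGAAWGTGAAGGCCCACTCACAGACTSACCGAGYGRACC"
                   "TGGGGACCCTGCGCGGCTACTACAACCAGAGCGAGGMCGGT")

print(len(heterozygous_positions(slide_consensus)), phasing_count(slide_consensus))
```

In practice the candidate list is narrowed by comparing against known alleles and their frequencies, which is where the statistical resolution mentioned earlier comes in.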

22

23

24 NGS Advantages  Can reduce ambiguity  Phase resolution - two signals, but lots of short reads  Cheaper and faster than Sanger  Less manual intervention required
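A toy sketch of why short reads help with phase: a single read that covers two heterozygous positions shows which bases sit on the same chromosome. The reads, positions and function name below are made up for illustration (the T/A and G/T variants echo the phasing slide), and real phasing tools do considerably more than this.

```python
from collections import Counter

def linked_alleles(reads, site_a, site_b):
    """Count base combinations seen on single reads that cover both sites.

    reads: iterable of (start, sequence) with 0-based start coordinates.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if start <= site_a < end and start <= site_b < end:
            counts[(seq[site_a - start], seq[site_b - start])] += 1
    return counts

# Hypothetical reads drawn from two haplotypes, ATCGGA and AACTGA,
# which differ at position 1 (T/A) and position 3 (G/T).
reads = [(0, "ATCGG"), (1, "TCGGA"), (0, "AACTG"), (2, "CTGA")]

print(linked_alleles(reads, 1, 3))  # T travels with G, A travels with T
print(linked_alleles(reads, 0, 5))  # empty: no read spans both positions
```

The second call shows the limitation: positions further apart than one read length are never observed together, so their phase stays unresolved.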

25 NGS Data - Unphased

26 NGS Data - Phased

27 NGS Approaches  HLA*IMP – chip-based imputation engine  Reference-based alignment, followed by an HLA call based on the variants detected during alignment  Search against a database of known alleles

28 NGS Reference-Based  Fraught with difficulties  Very hard to align reads to this region  The variant/HLA call is only as good as the alignment  No coverage = no call  Has been attempted by Broad Institute (HLA Caller) and Roche
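"No coverage = no call" can be checked mechanically once per-base depths are available. The sketch below is a minimal illustration, assuming the depths have already been extracted from an alignment into a plain list; the function name and the threshold are hypothetical.

```python
def low_coverage_intervals(depths, min_depth=1):
    """Return half-open (start, end) intervals where depth < min_depth."""
    intervals = []
    start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos  # a low-coverage run begins here
        elif start is not None:
            intervals.append((start, pos))  # run ended just before pos
            start = None
    if start is not None:
        intervals.append((start, len(depths)))  # run extends to the end
    return intervals

# Made-up per-base depths across a short region.
depths = [5, 5, 0, 0, 3, 0, 7]
print(low_coverage_intervals(depths))               # [(2, 4), (5, 6)]
print(low_coverage_intervals(depths, min_depth=4))  # [(2, 6)]
```

Any exon overlapping one of the returned intervals cannot support a reliable variant call, and therefore not an HLA call either.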

29 Alignment Efforts  RainDance provides a targeted HLA amplification kit called HLAseq  Target: the whole MHC superregion (except for some tandem repeat regions)  Goal: align this data before doing the variant/HLA call

30 Diverse Variant “Density” in the MHC Superregion  Based on a single sample

31 Default BWA Alignment  No coverage at an exon of HLA-DMB

32 Default BWA Alignment  Low coverage and orphaned reads at an HLA-DRB1 exon

33 BWA vs More Permissive Alignment  Higher Coverage = Higher Noise

34 Default BWA Alignment  Large targeted region without usable coverage

35 NGS Reference-Based  Not providing enough coverage everywhere  What about de novo?

36 De Novo Assembly (MIRA)  287 contigs (longest contig: 2199 bp)  Mean contig size: 268 bp  Median contig size: 209 bp  Total consensus: 77084 bp  RainDance target: ~ 3800000 bp
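The assembly figures above translate into a very small share of the target. Here is a minimal sketch of that arithmetic, using only the numbers quoted on the slide. The helper names are illustrative, and the list of contig lengths in the usage line is made up because the slide gives only summary statistics.

```python
from statistics import mean, median

def contig_summary(lengths):
    """Summary statistics of the kind reported for the assembly."""
    return {
        "contigs": len(lengths),
        "longest": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "total": sum(lengths),
    }

def fraction_of_target(total_consensus_bp, target_bp):
    """Share of the targeted region represented by assembled consensus."""
    return total_consensus_bp / target_bp

# Total consensus and approximate RainDance target size from the slide.
print(f"{fraction_of_target(77084, 3_800_000):.1%}")  # 2.0%

# Usage with hypothetical contig lengths.
print(contig_summary([100, 200, 600]))
```

Roughly 2% of the target ends up in contigs, and this ignores any overlap or misassembly, so the usable share could be lower still.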

37 De Novo Assembly (MIRA)

38 NGS De Novo Alignment  Not enough contigs produced, not enough coverage of the target region  What about a hybrid approach?

39 De Novo Assembly with “Backbone”  First, alignment to backbone, then de novo assembly  Backbone: 2220 contigs from HG19 chr 6 (sum: 3554852 bp) → almost the whole RainDance target  Results: – Max reads / backbone contig: 197 – Max coverage: 71

40

41 NGS Typing - Alignment Based  We tried:  Burrows-Wheeler aligner (BWA)  More sensitive seed-and-extend aligner  De novo aligner  'Hybrid' de novo aligner – The variant/HLA call is only as good as the alignment – The alignments were not good enough

42 NGS Database Based  Search against a 'database' of known alleles  Such as the IMGT/HLA database, available from the EBI web site  Stanford, Connexio, JSI Medical, BC Cancer Agency and Omixon have all tried this approach

43 IMGT/HLA Database

44 DB Based Approach  Advantages:  Fewer mapping headaches  Unambiguous results  Potential to be fast  Difficulties:  Novel allele detection  Homozygous alleles
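The database idea can be illustrated with a deliberately tiny sketch: score every pair of known alleles by how many reads it explains and report the best pair. This is not how any of the tools named in the talk work. The exact-substring matching, the tie-break towards a homozygous call, the allele names and the sequences are all assumptions made up for illustration, and no real IMGT/HLA data is used.

```python
from itertools import combinations_with_replacement

def reads_explained(reads, allele_seqs):
    """Count reads that occur as an exact substring of at least one allele."""
    return sum(1 for read in reads if any(read in seq for seq in allele_seqs))

def best_allele_pair(reads, database):
    """Return ((allele_1, allele_2), explained_read_count) for the best pair."""
    def key(pair):
        score = reads_explained(reads, [database[name] for name in pair])
        # On equal score, prefer fewer distinct alleles (a homozygous call).
        return (score, -len(set(pair)))

    pairs = combinations_with_replacement(sorted(database), 2)
    best = max(pairs, key=key)
    return best, key(best)[0]

# Hypothetical database of known alleles (toy names, toy sequences).
toy_database = {
    "toy*01": "ACGTACGTAC",
    "toy*02": "ACGTTCGTAC",
    "toy*03": "ACGAACGTAG",
}

# The last read matches no known allele, as a read from a novel allele would.
reads = ["GTACG", "GTTCG", "CGTAC", "GTGCG"]
pair, explained = best_allele_pair(reads, toy_database)
print(pair, explained, "of", len(reads), "reads explained")
```

The two difficulties listed on the slide show up even here. The unexplained read hints at a novel allele that the database cannot name, and a homozygous sample is only called correctly because of the explicit tie-break.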

45 HLA Genotyping with NGS – Roche 454 Reads

46 HLA Genotyping with NGS – Illumina Reads

47 Results with Exome Data

48 Exon Level Detail

49 Detailed Results - Short Read Pileup

50 Conclusions  DB based approach to HLA typing is new but very promising  NGS approaches can resolve much of the ambiguity of Sanger SBT  DB based approach can also overcome the limitations of NGS reference-based alignment

51 Conclusions Available DB based HLA typing tools differ in:  Speed  Sequencers supported  Types of sequencing data supported (targeted, exome, whole genome)  Ease of use  Ambiguity of results  Degree of manual intervention required  Novel allele detection capabilities

52 Contact Tim Hague, CEO Omixon Biocomputing Solutions Tim.Hague@omixon.com +36 70 318 4878

53

