Published by Christal Haynes. Modified over 6 years ago.
1
The Application of NGS to HLA Typing Challenges in Data Interpretation
Marcelo A. Fernández Viña, Ph.D. Department of Pathology Medical School Stanford University
2
The HLA system
High degree of polymorphism at most of the expressed loci (function). Lack of a single predominant allele, high degree of heterozygosity (function). Strong linkage disequilibrium (unknown, function?).
3
543BRD 940 73 September 2016 3492
4
GENOMIC ORGANIZATION OF THE HLA GENES
HLA-A, HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DRB1, HLA-DPA1, HLA-DPB1
5
DRB1*08 Alleles INTRON 1 INTRON 1 INTRON 1 INTRON 2
6
8
Why not whole-genome sequencing?
Inadequate coverage of complex genomic regions, such as HLA: conventional WGS (30x average coverage) provides only sparse coverage of HLA. Complexities are due to indels; GC-rich regions and secondary structure; paralogous genes; and repeat regions across HLA loci. Cost: achieving adequate coverage of HLA with WGS would require >1,000x average coverage.
9
J Immunol. Jun 15;148(12): HLA-J, a second inactivated class I HLA gene related to HLA-G and HLA-A. Implications for the evolution of the HLA-A-related genes. Messer G, Zemmour J, Orr HT, Parham P, Weiss EH, Girdlestone J.
Ragoussis and co-workers described a class I HLA gene that maps to within 50 kb of HLA-A. Comparison of the nucleotide sequences of HLA-J alleles shows this gene is more related to HLA-G, A, and H. All alleles of HLA-J are pseudogenes because of deleterious mutations that produce translation termination in either exon 2 or exon 4. HLA-J appears, like HLA-H, to be an inactivated gene that results from duplication of an Ag-presenting locus related to HLA-A. Evolutionary relationships, as assessed by construction of trees, suggest that the four modern loci HLA-A, G, H, and J were formed by successive duplications from a common ancestral gene. In this scheme one intermediate locus gave rise to HLA-A and H, the other to HLA-G and J.
10
Alleles at different HLA loci (genes and pseudogenes) share nucleotide sequences
HLA_A and HLA-H (pseudogene) AA Codon A*24:02:01: GGC TAC GTG GAC GAC ACG CAG TTC GTG CGG TTC GAC AGC GAC GCC GCG AGC CAG AGG ATG GAG CCG CGG GCG CCG A*01:01:01: A A*02:01:01: A*25:01: A*32:01: T H*01:01:01: GGC TAC GTG GAC GAT ACG CAG TTC GTG CGG TTC GAC AGC GAC GCC GCG AGC CAG AGG ATG GAG CCG CGG GCG CCG HLA-A, B and HLA-H (pseudogene) AA Codon B*57:01: GAG AAC CTG CGG ATC GCG CTC CGC TAC TAC AAC CAG AGC GAG GCC G B*07:02: G A- CT- -G- G B*08:01: G A- CT- -G- G B*15:17:01: B*35:01:01: G A- CT- -G- G B*44:02:01: C -C B*51:01:01: H*01:01:01: GAG AAC CTG CGG ATC GCG CTC CGC TAC TAC AAC CAG AGC GAG GGC G A*24:02:01: GAG AAC CTG CGG ATC GCG CTC CGC TAC TAC AAC CAG AGC GAG GCC G A*01:01:01: C G-- -C- CT- -G- G A- - A*02:01:01: T- G G-- -C- CT- -G- G A*25:01: G A- - A*32:01: G
11
DRB Gene Content varies in Haplotypes Bearing Different DRB1 Allele-Sero-Groups – (Copy Number Variation has been known in HLA for more than 3 decades)
12
Nature. Jul 3-9;322(6074): Polymorphism of human Ia antigens: gene conversion between two DR beta loci results in a new HLA-D/DR specificity. Gorski J, Mach B.
Molecular mapping of the DR beta-chain region allows true allelic comparisons of the two expressed DR beta-chain loci, DR beta I and DR beta III. At the more polymorphic locus, DR beta I, the allelic differences are clustered and may result from gene conversion events over very short distances. The gene encoding the HLA-DR3/Dw3 specificity has been generated by a gene conversion involving the DR beta I and the DR beta III loci of the HLA-DRw6/Dw18 haplotype, as recipient and donor gene, respectively. The generation of HLA-DR polymorphism within the DRw52 supertypic group can thus be accounted for by a succession of gene duplication, divergence and gene conversion.
13
Alleles at different HLA-DRB loci share nucleotide sequences
AA Codon DRB1*01:01: CA CGT TTC TTG TGG CAG CTT AAG TTT GAA TGT CAT TTC TTC AAT GGG ACG GAG CGG GTG CGG TTG CTG GAA AGA DRB1*01: DRB1*03:01:01: GA- T-C TC- -C- -C- --G AC C --- DRB1*04:02: GA- --- G-- --A CA- --G C C C --- DRB1*07:01:01: C GG A- A-G C A- --C DRB1*11:01: GA- T-C TC- -C- -C- --G C C --- DRB1*11: GA- -T C- --G C C --- DRB1*13:01: GA- T-C TC- -C- -C- --G C C --- DRB3*01:01:02: GA- -T- -G C- --G AC C --- DRB3*02:02:01: GA- -T C- --G C G ---
14
Alleles at different HLA-DRB loci share nucleotide sequences Importance of determining Phase
AA Codon DRB1*01:01: CA CGT TTC TTG TGG CAG CTT AAG TTT GAA TGT CAT TTC TTC AAT GGG ACG GAG CGG GTG CGG TTG CTG GAA AGA DRB1*03:01:01: GA- T-C TC- -C- -C- --G AC C --- DRB1*13:01:01: GA- T-C TC- -C- -C- --G C C --- DRB1*13: GA- -T C- --G C C --- DRB3*01:01:02: GA- -T- -G C- --G AC C --- DRB3*02:02:01: GA- -T C- --G C G --- AA Codon DRB1*01:01: TGC ATC TAT AAC CAA GAG GAG TCC GTG CGC TTC GAC AGC GAC GTG GGG GAG TAC CGG GCG GTG ACG GAG CTG GGG DRB1*03:01:01:02 -A- T-- C G AA T DRB1*13:01:01:01 -A- T-- C G AA T DRB1*13: A- T-- C G AA T DRB3*01:01:02:01 -A- T-- C G T- C DRB3*02:02:01:02 CA- T-- C G A- -C G AA Codon DRB1*01:01: CGG CCT GAT GCC GAG TAC TGG AAC AGC CAG AAG GAC CTC CTG GAG CAG AGG CGG GCC GCG GTG GAC ACC TAC TGC DRB1*03:01:01: A G- CG AT DRB1*13:01:01: A A G-C GA DRB1*13: A A G-C GA DRB3*01:01:02: TC C A G- CG AT DRB3*02:02:01: A G- CA AT AA Codon DRB1*01:01: AGA CAC AAC TAC GGG GTT GGT GAG AGC TTC ACA GTG CAG CGG CGA G DRB1*03:01:01: TG DRB1*13:01:01: TG DRB1*13: DRB3*01:01:02: DRB3*02:02:01:
15
HLA typing using high throughput sequencing technologies.
Exon-wise amplification of few exons. Whole-gene amplification.
16
17
Sequencing workflow Fragmentation Ligate barcoded adaptors
Ex 1 Ex 2 Ex 3 Ex 4 Fragmentation Ligate barcoded adaptors Size select and purify bp fragments
18
Sequencing library Q/C
Ex 3 Ex 4 Ex 1 Ex 2
19
What would be the ideal sequencing machine?
High-throughput. Accurate. Long read length. Simple to use. Able to detect all types of genomic changes (SNPs, insertions or deletions, large-scale rearrangements, methylation).
20
High-throughput sequencing technologies: an overview
All platforms share core similarities: DNA templates are spatially segregated, with no physical separation step; DNA is sequenced through synthesis rather than termination; the DNA sequence is decoded by the emission of light or a pH change.
Platforms differ by: the specific method used to generate template libraries; the chemistries/approaches used to generate the sequence signal (light); throughput (number of bases sequenced per run); length of sequence read; error modalities and error rates (e.g. homopolymer regions).
21
The Platforms that we Tested
454-Roche: exon coverage only; multiplexing; workflow was demanding ( ).
PGM-Ion Torrent: instrument problems; reads too short; homopolymer problems ( ).
Pacific Biosciences: extremely long reads; throughput and workflow still in development, appears to be simple; base calling (15 percent error) by consensus; homopolymer problems (2014).
Illumina: lower error rate; robust instruments ( ); various systems.
22
Examples of ambiguities: exon shuffling, segmental exchange, substitutions in untested segments
23
Potential benefits of next-generation sequencing for HLA typing
Clonal template amplification in vitro to eliminate the problem of sequencing heterozygous DNA. Sufficiently long read length (300+ bp) to cover an entire exon (or more) in phase. Increased sequence coverage of HLA genes. Capability to multiplex patient specimens. Potential to complete run and data analysis within one week.
24
Practical Advantages of Extending Sequence Coverage
Test the complete gene. No assumptions made. Transplantation: detect mismatches thought to be absent. Mapping of disease susceptibility factors.
25
Many allele groups in HLA-A show one allele with an insertion of an extra ‘C’ after a run of seven ‘C’s
A* AC CCC CCC .AAG ACA CAT ATG ACC CAC CAC A*0104N C A* G A --G T A* A*0321N C A* G T A*3114N C--- --G T C A* AAAACGCATATGACTCACCAC A*0321N CAAGACACATATGACCCACCA MAARMSMMWWWK A* AAGACACATATGACCCACCAC A*02null CAAAACGCATATGACTCACCA MARAMMSMWWWK
26
Resolution of common and well documented null- alleles ( clinically relevant)
27
Detection of C*04:09N (common) and A*31:14N (rare) alleles in a single pass
A*31:01:02 (red line) shows interrupted coverage at the beginning of exon 4, while A*31:14N (blue line), which differs from A*31:01:02 by a one-base insertion, shows continuous coverage. C*04:01:01:01 (red line) shows interrupted coverage at the end of exon 7, while C*04:09N (blue line), which differs from C*04:01:01:01 by a one-base deletion, shows continuous coverage.
28
HLA Typing by NGS
Wang C, Krishnakumar S, Wilhelmy J, Babrzadeh F, Stepanyan L, Su LF, Levinson D, Fernandez-Viña MA, Davis RW, Davis MM, Mindrinos M. High-throughput, high-fidelity HLA genotyping with deep sequencing. Proc Natl Acad Sci U S A. 2012 May 29;109(22): doi: /pnas Epub 2012 May 15. PubMed PMID: ; PubMed Central PMCID: PMC
New methodology that leverages the power of next-generation sequencing (NGS) and long-range PCR. Interrogated the entire sequences of the class I genes and most of the extent of the class II genes in more than 9,000 subjects.
29
NGS HLA TYPING SYSTEMS
1. Sample collection
2. Long-range PCR
3. Quantification and pooling
4. Fragmentation
5. Library preparation
6. Sequencing
7. Data analysis
30
Data Analysis Shotgun sequencing
31
Genotype calling Genomic mapping cDNA mapping
A one-nucleotide difference in exon 3 distinguishes A*02:01:01:01 (A) from A*02:07 (G). The HLA-A type of the cell line BM9 is A*02:01:01:01. The top left panel shows the coverage plot when sequencing reads are mapped to the A*02:01:01:01 and A*02:07 genomic sequences. The top right panel shows the coverage plot when sequencing reads are mapped to the A*02:01:01:01 and A*02:07 cDNA sequences.
32
High-throughput, High resolution HLA genotyping
33
Data Analysis Steps De-multiplexing
De-multiplexing: identical barcodes at both ends of paired-end reads, lowering the chance of cross-contamination.
Mapping: competitive mapping. Reads are mapped against all available reference sequences, including those from pseudogenes, and the best alignments are passed on.
Filtering: best alignments; identical alignments (for cDNA only); paired-end alignment.
Genotype calling: limit the number of candidates (top 10 of each category: number of reads mapped, minimal coverage, minimal central coverage); enumerate the possible combinations of homozygous and heterozygous sets; rank those combinations on aggregated number of reads mapped, minimal coverage, and minimal central coverage.
De novo assembly: local de novo assembly can be performed to capture SNPs for a novel allele.
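The genotype-calling step can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function name `rank_genotypes`, the read IDs, and the coverage values are hypothetical toy data, and only two of the three ranking criteria named on the slide (aggregated reads mapped and minimal coverage) are used.

```python
from itertools import combinations_with_replacement

def rank_genotypes(reads_by_allele, min_cov_by_allele):
    """Rank homozygous and heterozygous candidate pairs (illustrative only).

    reads_by_allele:   allele name -> set of read IDs mapped to that allele
    min_cov_by_allele: allele name -> minimal coverage over the allele
    """
    scored = []
    # Pairs (a, a) are homozygous candidates, pairs (a, b) heterozygous ones.
    for a, b in combinations_with_replacement(sorted(reads_by_allele), 2):
        explained = len(reads_by_allele[a] | reads_by_allele[b])  # aggregated reads
        weakest = min(min_cov_by_allele[a], min_cov_by_allele[b])  # minimal coverage
        scored.append(((a, b), explained, weakest))
    # More reads explained first; ties broken by the higher minimal coverage.
    scored.sort(key=lambda s: (s[1], s[2]), reverse=True)
    return scored

# Hypothetical toy input: read IDs are integers, coverage values are made up.
reads = {
    "A*02:01:01:01": {1, 2, 3, 4, 5, 6},
    "A*02:07": {1, 2, 3, 4, 5},
    "A*24:02:01": {7, 8, 9, 10},
}
min_cov = {"A*02:01:01:01": 5, "A*02:07": 0, "A*24:02:01": 4}

ranked = rank_genotypes(reads, min_cov)
print(ranked[0])
```

In this toy input the pair A*02:01:01:01 + A*24:02:01 ranks first because together the two alleles account for all ten reads; a real caller would also restrict the candidate list (the slide mentions the top 10 per category) and include central-read coverage.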
34
Paired-end Sequencing
Pair-end reads Reference Sequence 1 ✕ Reference Sequence 2 ~500bp
35
Central Read Definition
36
Using Central Reads Coverage
On the regular coverage plot, the two candidates look similar. On the central-read coverage plot, the wrong candidate has much lower coverage than the authentic candidate.
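The transcript does not preserve the definition of a central read, so the following sketch rests on an assumption: a read counts as "central" for a position when the position lies in the middle part of the read, expressed here as a left-arm to right-arm length ratio between 0.5 and 2. The bounds `lo` and `hi`, the function names, and the read intervals are all illustrative.

```python
def coverage(reads, position):
    """Number of reads covering a position; reads are (start, end) half-open."""
    return sum(1 for start, end in reads if start <= position < end)

def central_coverage(reads, position, lo=0.5, hi=2.0):
    """Number of reads for which the position falls in the read's central part.

    Assumed definition: the ratio of bases left of the position to bases
    right of it, within the read, must lie between lo and hi.
    """
    count = 0
    for start, end in reads:
        left = position - start        # read bases to the left of the position
        right = end - 1 - position     # read bases to the right of the position
        if left > 0 and right > 0 and lo <= left / right <= hi:
            count += 1
    return count

# Hypothetical reads: two of them touch position 98 only with their ends.
reads = [(0, 100), (40, 140), (95, 195)]
print(coverage(reads, 98), central_coverage(reads, 98))
```

The two counts diverge when a position is covered mostly by read ends, which is the kind of difference the central-read plot is meant to expose.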
37
Complement Logics Resolved Difficult Alleles
C*03:03:01 and C*03:04:01:01 differ by a single base at the end of exon 2. Because some B alleles and C alleles are similar in this region, cDNA alignment shows little difference between the two candidates. With genomic alignment and the paired-end filter, the difference between the two candidates is greatly amplified, providing definitive evidence to call one versus the other.
38
Using Complement Logics
Panels: cDNA alignment; genomic alignment. Some short exons, such as exon 6 of some C alleles, are identical to those of B alleles. With cDNA alignment, it is hard to predict whether an alignment is authentic. With genomic alignment and paired-end reads, the neighboring polymorphic site provides sufficient information for this.
39
Maximize Usage of Computing Power for Speed
Raw reads B1 B2 B3 B4 B5 B1.1 B1.2 B1.3 B1.4 B1.5 B1.A B1.B B1.C B1.DPA B1.DPB Level of parallelism De-multiplexing One process * Mapping M-processes per barcode ***** One processes per barcode *** Merging, Filtering, De-multiplexing Several processes per barcode ***** Genotype calling Streaming SIMD Extensions-vectorized implementation of Smith-Waterman algorithm
40
User Friendly Interface
The data analysis pipeline runs with a single command: hla_pipeline.py -c config.file. Results are reviewed graphically through a web page. The two components will be merged into a single standalone program in about the next 6 months.
41
Interface Counting Logics Sample Info Genotypes Candidate Commenting
42
Interface Coverage plot Central Read Coverage plot Read tiling pattern
Reference alignment
43
Phasing Strategy de novo Assembly
Multiple fragments of similar sequences generated by NGS:
GCCAATGATGCACTGACTAGCCTAGCCACCC
TGCACTGACTAGCCTAGCCACCCGATCAGCTCC
CCGATCGATCGGGCATCGATCGATCGG
CTAGCCACCCGATCAGCTCCGATCGATCGGG
CTAGCCTAGCCACCCGATCAGCTCCGATC
Clustering of fragments based on similar sequences to create a contiguous sequence:
GCCAATGATGCACTGACTAGCCTAGCCACCC
        TGCACTGACTAGCCTAGCCACCCGATCAGCTCC
                CTAGCCTAGCCACCCGATCAGCTCCGATC
                     CTAGCCACCCGATCAGCTCCGATCGATCGGG
                                       CCGATCGATCGGGCATCGATCGATCGG
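As a rough illustration of how overlapping fragments can be joined into one contiguous sequence, here is a minimal greedy suffix-prefix merge applied to the five fragments on this slide. This is a textbook simplification, not the assembler used in the presented system; the function names and the `min_len` cutoff of 10 bases are assumptions.

```python
def overlap(a, b, min_len):
    """Longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=10):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)

fragments = [
    "GCCAATGATGCACTGACTAGCCTAGCCACCC",
    "TGCACTGACTAGCCTAGCCACCCGATCAGCTCC",
    "CCGATCGATCGGGCATCGATCGATCGG",
    "CTAGCCACCCGATCAGCTCCGATCGATCGGG",
    "CTAGCCTAGCCACCCGATCAGCTCCGATC",
]
contigs = greedy_assemble(fragments)
print(contigs)
```

Greedy merging is easily misled by repeats and sequencing errors, so it should be read only as a picture of the idea; for a heterozygous sample the fragments would also need to be separated by phase.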
44
Phasing Analysis Step1: Identify true polymorphic sites
The ratio between the minor and major alleles at a site must be above a set threshold for the site to be considered truly polymorphic. The polymorphic sites are determined by a statistical model.
Example 1: site 1 has 5x "T" and 6x "G"; site 2 has 5x "G" and 6x "A"; site 3 has 5x "C" and 6x "A". All 3 sites are true polymorphic sites.
Example 2: site 1 has 10x "T" and 1x "G" ("G" = noise); site 2 has 1x "G" and 10x "A" ("G" = noise); site 3 has 5x "C" and 6x "A" (true polymorphic site).
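The read counts from this slide can be run through a simple ratio test. The slide says the real decision comes from a statistical model, so the fixed cutoff below is only a stand-in: the function name `is_polymorphic` and the `min_ratio` value of 0.2 are assumptions.

```python
def is_polymorphic(base_counts, min_ratio=0.2):
    """Call a site polymorphic when the minor/major read-count ratio is high enough.

    base_counts: base -> number of reads showing that base at the site.
    min_ratio is an illustrative cutoff, not a value from the presentation.
    """
    ordered = sorted(base_counts.values(), reverse=True)
    major = ordered[0]
    minor = ordered[1] if len(ordered) > 1 else 0
    return minor / major >= min_ratio

# Read counts taken from the two examples on the slide.
example_1 = [{"T": 5, "G": 6}, {"G": 5, "A": 6}, {"C": 5, "A": 6}]
example_2 = [{"T": 10, "G": 1}, {"G": 1, "A": 10}, {"C": 5, "A": 6}]

print([is_polymorphic(site) for site in example_1])
print([is_polymorphic(site) for site in example_2])
```

With this cutoff all three sites of the first example pass, while in the second example only the third site does; the single "G" reads are treated as noise.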
45
Build Phase Resolved “Contigs”
Step 1: Identify polymorphic sites.
CCATGTTCCAATGATGCCCTGTGCATGCATCG
CCATGTGCCAATAATGCACTGTGCATGCATCG
Polymorphic sites: T/G, G/A, C/A.
Step 2: Determine which polymorphisms are linked together to resolve two contigs. T-G-C are linked; G-A-A are linked.
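The linkage step can be sketched by letting reads that span two neighboring polymorphic sites vote on which bases belong together. Everything below is illustrative: the reads are simulated, error-free windows cut from the slide's two sequences, and the names `windows`, `base_at`, and `phase` are hypothetical.

```python
from collections import Counter

HAP_1 = "CCATGTTCCAATGATGCCCTGTGCATGCATCG"
HAP_2 = "CCATGTGCCAATAATGCACTGTGCATGCATCG"

def windows(seq, size=12, step=3):
    """Simulate error-free reads as (start, bases) windows over a sequence."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]

def base_at(read, pos):
    """Base a read shows at position pos, or None if the read does not cover it."""
    start, bases = read
    return bases[pos - start] if start <= pos < start + len(bases) else None

def phase(reads, sites):
    """Chain adjacent polymorphic sites using reads that cover both of them."""
    first = Counter(b for b in (base_at(r, sites[0]) for r in reads) if b)
    haplotypes = [[base] for base, _ in first.most_common(2)]
    for prev, cur in zip(sites, sites[1:]):
        # Count which base at `cur` is seen together with each base at `prev`.
        pairs = Counter(
            (base_at(r, prev), base_at(r, cur))
            for r in reads
            if base_at(r, prev) and base_at(r, cur)
        )
        for hap in haplotypes:
            linked = {c: n for (p, c), n in pairs.items() if p == hap[-1]}
            hap.append(max(linked, key=linked.get))
    return sorted("-".join(h) for h in haplotypes)

reads = windows(HAP_1) + windows(HAP_2)
sites = [i for i, (a, b) in enumerate(zip(HAP_1, HAP_2)) if a != b]
phased = phase(reads, sites)
print(sites, phased)
```

For the slide's sequences the sketch recovers the same two chains, T-G-C and G-A-A. It assumes every pair of neighboring sites is spanned by at least one read; when no read or read pair bridges two sites, phase cannot be established across that gap.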
46
Best Matching Alleles, Consensus Alignment
Best Matching Alleles: compare contig sequences back to the database to find the best match. Phase Resolved Consensus: build phased contig sequences based on polymorphic linkage. Dynamic Phasing: call polymorphisms from de novo assembled, mapped, paired-end sequences.
47
Build Phased Contig Blocks
The “Detail Review” window can be used for in-depth review of HLA genotyping. It displays the contig alignment browser as well as other reference parameters (eReads and xReads). The “Contig alignment” browser indicates phased blocks.
48
Summary Broad coverage (exons & introns) and deep sequencing (> 50)
Paired-end sequencing Mapping Phasing: Complement logic (cDNA vs. genomic) Central read logic Build Contig blocks
49
Coverage variance
50
Genotype calling
51
Data Analysis Determination of number of reads
Barcodes specific for sample and locus (amplicon). Barcodes specific for sample (early pooling).
Informatics: mapping of reads; phasing reads; insertions and deletions; homozygous and heterozygous positions; reads from other loci; hybrid alleles, novel alleles.
52
NGS HLA TYPING SYSTEMS
1. Sample collection
2. Long-range PCR
3. Quantification and pooling
4. Fragmentation
5. Library preparation
6. Sequencing
7. Data analysis
53
Data Analysis Determination of number of reads
Barcodes specific for sample and locus (amplicon): technically unwieldy; easier interpretation by software (reads are assigned to the locus).
Barcodes specific for sample (early pooling): technically simple; software needs to be more sophisticated and needs to phase longer sequence stretches.
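A sample-level barcode scheme can be sketched as follows, combined with the rule from the data-analysis steps that both ends of a read pair must carry the same barcode. The barcode sequences, their 6-base length, the sample names, and the function name `demultiplex` are all hypothetical.

```python
def demultiplex(pairs, barcodes, bc_len=6):
    """Assign read pairs to samples, keeping a pair only if both ends agree.

    pairs:    iterable of (read_1, read_2) strings that start with a barcode
    barcodes: barcode sequence -> sample name (illustrative values)
    """
    bins = {sample: [] for sample in barcodes.values()}
    rejected = 0
    for read_1, read_2 in pairs:
        sample_1 = barcodes.get(read_1[:bc_len])
        sample_2 = barcodes.get(read_2[:bc_len])
        if sample_1 is not None and sample_1 == sample_2:
            # Strip the barcode before passing the reads on to mapping.
            bins[sample_1].append((read_1[bc_len:], read_2[bc_len:]))
        else:
            rejected += 1  # unknown barcode or mismatched ends
    return bins, rejected

barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
pairs = [
    ("ACGTACGGCTAC", "ACGTACCCGATC"),  # both ends sample_1
    ("TGCATGGGCTAC", "TGCATGTTGACC"),  # both ends sample_2
    ("ACGTACGGCTAC", "TGCATGCCGATC"),  # ends disagree: rejected
    ("NNNNNNAAAA", "ACGTACCCCC"),      # unknown barcode: rejected
]
bins, rejected = demultiplex(pairs, barcodes)
print({s: len(p) for s, p in bins.items()}, rejected)
```

With sample-only barcodes, as here, reads from all loci of a sample land in the same bin, which is why the software must then sort reads by locus and phase longer stretches.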
54
Alleles at different HLA-DRB loci share nucleotide sequences Importance of determining Phase
AA Codon DRB1*01:01: CA CGT TTC TTG TGG CAG CTT AAG TTT GAA TGT CAT TTC TTC AAT GGG ACG GAG CGG GTG CGG TTG CTG GAA AGA DRB1*03:01:01: GA- T-C TC- -C- -C- --G AC C --- DRB1*13:01:01: GA- T-C TC- -C- -C- --G C C --- DRB1*13: GA- -T C- --G C C --- DRB3*01:01:02: GA- -T- -G C- --G AC C --- DRB3*02:02:01: GA- -T C- --G C G --- AA Codon DRB1*01:01: TGC ATC TAT AAC CAA GAG GAG TCC GTG CGC TTC GAC AGC GAC GTG GGG GAG TAC CGG GCG GTG ACG GAG CTG GGG DRB1*03:01:01:02 -A- T-- C G AA T DRB1*13:01:01:01 -A- T-- C G AA T DRB1*13: A- T-- C G AA T DRB3*01:01:02:01 -A- T-- C G T- C DRB3*02:02:01:02 CA- T-- C G A- -C G AA Codon DRB1*01:01: CGG CCT GAT GCC GAG TAC TGG AAC AGC CAG AAG GAC CTC CTG GAG CAG AGG CGG GCC GCG GTG GAC ACC TAC TGC DRB1*03:01:01: A G- CG AT DRB1*13:01:01: A A G-C GA DRB1*13: A A G-C GA DRB3*01:01:02: TC C A G- CG AT DRB3*02:02:01: A G- CA AT AA Codon DRB1*01:01: AGA CAC AAC TAC GGG GTT GGT GAG AGC TTC ACA GTG CAG CGG CGA G DRB1*03:01:01: TG DRB1*13:01:01: TG DRB1*13: DRB3*01:01:02: DRB3*02:02:01:
55
Data Analysis Informatics: Mapping of Reads Phasing Reads
Reads from other Loci (Highly homologous genes, DQA2, DPA2, DQB2, DPB2, DRB2/6/7/8/9) Alleles with incomplete references (in general rare) Hybrid alleles Novel alleles
56
Pseudogene Disambiguation
SBT Result: A*02:01 NGS Result HLA-H (novel) A*02:01 TAC CAC CAG TAC GCC TAC GAC GGC AAG GAT TAC ATC GCC CTG AAA GAG GAC CTG CGC TCT TGG H*01:01 GAC CAC CAG TAC GCC TAC GAC AGC AAG GAT TAC ATC GCT CTG AAA GAG GAC CTG CGC TCC TGG (Alpha sample SBC060)
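The two segments above differ at only four positions, which is why reads from the HLA-H pseudogene can be mistaken for HLA-A unless both references compete for each read. The sketch below is a simplified stand-in for a real aligner: it counts mismatches over ungapped placements, assigns a read to the reference with the fewest mismatches, and leaves ties unassigned. The function names and the simulated reads are assumptions.

```python
A_0201 = ("TAC CAC CAG TAC GCC TAC GAC GGC AAG GAT TAC ATC GCC CTG AAA GAG "
          "GAC CTG CGC TCT TGG").replace(" ", "")
H_0101 = ("GAC CAC CAG TAC GCC TAC GAC AGC AAG GAT TAC ATC GCT CTG AAA GAG "
          "GAC CTG CGC TCC TGG").replace(" ", "")
references = {"HLA-A": A_0201, "HLA-H": H_0101}

def mismatches(read, ref):
    """Fewest mismatches over all ungapped placements of read on ref."""
    return min(
        sum(x != y for x, y in zip(read, ref[i:i + len(read)]))
        for i in range(len(ref) - len(read) + 1)
    )

def assign(read, refs):
    """Return the single best-matching reference, or None when references tie."""
    scores = {name: mismatches(read, seq) for name, seq in refs.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None

read_from_h = H_0101[15:45]  # spans two of the differing positions
read_from_a = A_0201[15:45]
read_shared = A_0201[24:36]  # identical in both references

print(assign(read_from_h, references),
      assign(read_from_a, references),
      assign(read_shared, references))
```

Reads that span a differing position are assigned to the correct locus, while reads from an identical stretch stay ambiguous; that ambiguity is what paired-end information and neighboring polymorphic sites are used to resolve.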
57
Hybrid allele carrying sequences of two loci
AA Codon DRB1*01:01: CTG GCT TTG GCT GGG GAC ACC CGA C|CA CGT TTC TTG TGG CAG CTT AAG TTT GAA TGT CAT TTC TTC AAT GGG ACG DRB1*14:54: A-- -| GA- T-C TC- -C- -C- --G DRB1*14: A-- -| GA- -T C- --G DRB3*02:02:01: C | GA- -T C- --G AA Codon DRB1*01:01: GAG CGG GTG CGG TTG CTG GAA AGA TGC ATC TAT AAC CAA GAG GAG TCC GTG CGC TTC GAC AGC GAC GTG GGG GAG DRB1*14:54: C C --- -A- T-- C G T DRB1*14: C G --- CA- T-- C G A- -C DRB3*02:02:01: C G --- CA- T-- C G A- -C AA Codon DRB1*01:01: TAC CGG GCG GTG ACG GAG CTG GGG CGG CCT GAT GCC GAG TAC TGG AAC AGC CAG AAG GAC CTC CTG GAG CAG AGG DRB1*14:54: C- --G --- C G- --- DRB1*14: G A- AA Codon DRB1*01:01: CGG GCC GCG GTG GAC ACC TAC TGC AGA CAC AAC TAC GGG GTT GGT GAG AGC TTC ACA GTG CAG CGG CGA G|TT GAG DRB1*14:54: A T TG |-C C-T DRB1*14: G- CA AT |-C C-T DRB3*02:02:01: G- CA AT |-C C-T AA Codon DRB1*01:01: CCT AAG GTG ACT GTG TAT CCT TCA AAG ACC CAG CCC CTG CAG CAC CAC AAC CTC CTG GTC TGC TCT GTG AGT GGT DRB1*14:54: G T DRB1*14: G T DRB3*02:02:01: G --- AA Codon DRB1*01:01: TTC TAT CCA GGC AGC ATT GAA GTC AGG TGG TTC CGG AAC GGC CAG GAA GAG AAG GCT GGG GTG GTG TCC ACA GGC DRB1*14:54: T A DRB1*14: T A DRB3*02:02:01: C A
58
Characterization of a rare allele with incomplete sequence
B*15:147 derives from B*15:01:01:01
62
SBT/SSO vs NGS Identifying a novel allele
S-101. Reference typing result: one allele is an exon 4 variant (?). NGS result: B*13:02:01, B*38:02:01 exon 4 variant (A to G, Lys to Arg, codon 186).
63
DPB1 Hybrid Alleles DPB1*04:02:01:01/DPA1*01:03:01:05 DPB1*463:01/
X(AAGG) DPB1*463:01/ DPA1*01:03:01:05 bp from exon2 Recombination area DPB1*04:02:01:01/DPA1*01:03:01:05 DPB1*03:01:01/ DPA1*01:03:01:03
64
Functional Significance
Characterization of a Novel allele through the evaluation of unmapped reads Functional Significance Subject with two closely related alleles included in the DPB1*04:02:01:01G DPB1*04:02:01G: DPB1*04:02:01:01 DPB1*04:02:01:02 DPB1*105:01 DPB1*463:01 DPB1*571:01 Identical Antigen Recognition Site Structure Different levels of Expression (we propose)
66
AA Codon DPB1*105: ATG ATG GTT CTG CAG GTT TCT GCG GCC CCC CGG ACA GTG GCT CTG ACG GCG TTA CTG ATG GTG CTG CTC ACA TCT DPB1*414: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** DPB1*463: AA Codon DPB1*105: GTG GTC CAG GGC AGG GCC ACT CCA G|AG AAT TAC CTT TTC CAG GGA CGG CAG GAA TGC TAC GCG TTT AAT GGG ACA DPB1*414: *** *** *** *** *** *** *** *** *| DPB1*463: | AA Codon DPB1*105: CAG CGC TTC CTG GAG AGA TAC ATC TAC AAC CGG GAG GAG TTC GTG CGC TTC GAC AGC GAC GTG GGG GAG TTC CGG DPB1*414: AA Codon DPB1*105: GCG GTG ACG GAG CTG GGG CGG CCT GAT GAG GAG TAC TGG AAC AGC CAG AAG GAC ATC CTG GAG GAG AAG CGG GCA DPB1*414: G AA Codon DPB1*105: GTG CCG GAC AGG ATG TGC AGA CAC AAC TAC GAG CTG GGC GGG CCC ATG ACC CTG CAG CGC CGA G|TC CAG CCT AGG DPB1*414: | A- DPB1*463: | A- AA Codon DPB1*105: GTG AAT GTT TCC CCC TCC AAG AAG GGG CCC TTG CAG CAC CAC AAC CTG CTT GTC TGC CAC GTG ACG GAT TTC TAC DPB1*414: C C A DPB1*463: C C A AA Codon DPB1*105: CCA GGC AGC ATT CAA GTC CGA TGG TTC CTG AAT GGA CAG GAG GAA ACA GCT GGG GTC GTG TCC ACC AAC CTG ATC AA Codon DPB1*105: CGT AAT GGA GAC TGG ACC TTC CAG ATC CTG GTG ATG CTG GAA ATG ACC CCC CAG CAG GGA GAT GTC TAC ACC TGC DPB1*414: C T- --- DPB1*463: C T- --- AA Codon DPB1*105: CAA GTG GAG CAC ACC AGC CTG GAT AGT CCT GTC ACC GTG GAG TGG A|AG GCA CAG TCT GAT TCT GCC CGG AGT AAG DPB1*414: C | DPB1*463: C | AA Codon DPB1*105: ACA TTG ACG GGA GCT GGG GGC TTC GTG CTG GGG CTC ATC ATC TGT GGA GTG GGC ATC TTC ATG CAC AGG AGG AGC DPB1*414: A AA Codon DPB1*105: AAG AAA G|TT CAA CGA GGA TCT GCA TAA DPB1*414: |** *** *** *** *** *** *** DPB1*463: |
70
DPB1 Intron 2 eSTR(AAGG) E2 264 E3 282 E4 111 DPB1 Fragment E2/E4 (~5.1kb) I2 I3 F_DPB1 R_DPB1 X(AAGG) 4172 – 4227 bp E5 22 E1 101 5’ UTR 3’ UTR Possible eSTR proximal to intron-2 splicing site STR length may play a regulatory role in the expression of DPB1
71
Intron2 (-43) Low High
72
STR Analysis : Short - Short
DPB1* 01:01:01e1, 05:01:01e1
73
STR Analysis : Short - Long
DPB1* 02:01:02, 13:01:01e1
74
STR Analysis : Long - Long
DPB1* 02:01:02, 02:01:02v3
75
Data Analysis Determination of number of reads
Barcodes specific for sample and locus (amplicon). Barcodes specific for sample (early pooling).
Informatics: mapping of reads; phasing reads; insertions and deletions; homozygous and heterozygous positions; reads from other loci; hybrid alleles, novel alleles.
76
Typing two DRB5 alleles All reads need to be accounted Correct genotype: DRB5*01:01:01, DRB5*01:08N
Candidate genotypes: DRB5*01:02, 01:08N; DRB5*01:01:01, 01:; DRB5*01:01:01, 01:08N. DRB5*01:02 and 01:08N are identical in exon 2 and differ by a 19 nt indel in exon 3. DRB5*01:01:01 and 01:02 are identical in exon 3 and differ by 3 nt substitutions in exon 2.
77
Must Know
Amplicon: size (homogeneous or variable according to allele families); preferential amplifications (locus or allele families).
Primers: multiplexed or single location; other genes co-amplified (DRB).
Software: binning of reads (to a given locus, to a given allele family), or no binning (possible interference in allele assignment).
Phasing: reads covering informative SNPs, central reads, assembly.
Utilization of reads.
78
Homozygous allele? Not exactly
DRB1 DQA1 DQB1 DRB1 DQA1 DQB1 Count *13:02:01 *01:02:01:04 *06:04:01/*06:09:01 21/553 *15:01:01:01 *01:02:01:03 *06:02:01/*06:03:01 423/553
79
~40Kb ~10Kb DRB1 DQA1 DQB1 *01:01:01/*01:03 *01:01:01 *05:01:01:0x
*10:01:01/*14:54:01 *01:05/*01:04:01:01 *05:01:01:02 *01:02:01 *01:01:02 *05:01:01:01 DQB1*05:01:01:0x =DQB1*05:01:01:01(intron 4) +DQB1*05:01:01:02 (intron 2)
80
My thought Process for Genotype Assignment
Examine the genotype assigned by the software through mapping. Check for a perfect match with the reference vs. no full match at the genomic level. Check for a match with the reference vs. no full match at the exon level. Check the completeness of the reference. Identify novel alleles: find the closest allele and check the differences and sequences. Examine by another phasing method (central reads, paired-end reads, assembly). Check LD tables (may help identify drop-outs).
81
Barcode performance
82
Data Analysis
Solid and simple logic: error is minor. Accurate. User-friendly interface for reviewing results. Fast: less than 2 hours for a sequencing run (12-24 samples). Ability to pick up new alleles. Stand-alone desktop solution. Ability to evaluate genotype assignment by a second method.
83
Our experience
Allele calls were made virtually by the software with no operator evaluation. Fourth-field data: in most instances no previous information. Haplotype associations stronger than expected. Several common allele subtypes distinguished at the fourth field. Specific allele associations became apparent without any assumptions made. These studies show the robustness and comprehensive coverage provided by the typing system.
84
Summary of State of the Art NGS for HLA
Application to HLA typing is feasible. Processes have been optimized. Current methods are appropriate for both registry typing and small-scale, quick-TAT typing. Extremely accurate and comprehensive. Great developments in informatics and analysis. Completion of the sequences of common alleles will be helpful. Studies in families may unravel limitations.