Published by Adela Greene, modified over 9 years ago
1
DTL Focus meeting: Using GRCh38 in NGS data analysis
Time slot    Speaker                      Subject
12:45-13:00                               Coffee/tea
13:00-13:20  Ies Nijman (UMCU)            Welcome & Introduction to GRCh38 (hg20)
13:20-13:40  Pieter Neerinx (UMCG)        Migration of tools, pipelines to support GRCh38
13:40-14:00  Pjotr Prins                  BWA handling of ALT contigs
14:00-14:10                               Tea break
14:10-14:30  Zuotian Tatum (LUMC)         New insights on Differential Gene Expression using GRCh38
14:30-14:50  Wibowo Arindrarto (LUMC)     Comparison of hg19 and GRCh38 in the study of the DUX4 gene
14:50-15:30  Ies Nijman (UMCU)            Wrap-up and open discussions
2
GRCh38 / hg20
3
Human genome build hg20
- Basic new assembly released 24 December 2013; now at GRCh38.p2 (8 December 2014)
- 5-7 megabases of sequence added to the primary reference
- Many regions corrected relative to hg19 (patches)
- 261 alternative loci: chromosomal regions with high variability (~66 Mb)
- 128 large unplaced sequence regions
- Human herpesvirus (EBV) mapping decoy (171 kb)
- Centromere sequences: gaps are replaced by sequence models of the centromere repeats
- New mitochondrial sequence: the revised Cambridge Reference Sequence (rCRS) from MITOMAP
- 4 pseudoautosomal (PAR) regions
This means that coordinates change! Lift-over strategies will not completely solve this.
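Why lift-over cannot fully solve the coordinate change can be illustrated with the block model that chain-file based lift-over uses: a position is translated only if it falls inside a gap-free aligned block between the two builds, and anything else has no counterpart. The Python sketch below is a simplification under stated assumptions: the blocks are made up (not real GRCh37-to-GRCh38 chain data), and it ignores strand flips and moves to other chromosomes.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (source_start, target_start, length): one gap-free aligned block,
# 0-based, half-open. Blocks must be sorted by source_start.
Block = Tuple[int, int, int]


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Translate a source-build position, or return None if it is unmappable."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    src, dst, length = blocks[i]
    if pos < src + length:
        return dst + (pos - src)
    return None  # pos lies in a gap between aligned blocks


# Hypothetical blocks: 500 bp of the old build have no counterpart in the new one.
toy_blocks = [(0, 0, 1000), (1500, 1200, 2000)]
print(lift_position(toy_blocks, 500), lift_position(toy_blocks, 1200), lift_position(toy_blocks, 1600))
```

With these toy blocks, position 500 maps to 500, position 1200 falls in a gap and returns `None`, and position 1600 maps to 1300.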
4
Human genome build hg20
5
- New gene build now available (20,364 coding genes; 2,101 in alternative loci)
- Only a few calling/annotation tools support hg20 so far (e.g. VEP)
- Ensembl's default genome is hg20! The latest hg19 site is maintained through an archive link.
- dbSNP locations are available for hg20
- 1000 Genomes data will be remapped and recalled (est. Q1/Q2 2015)
6
Human genome build hg20: challenges and opportunities
How should we use these alternative loci? In hg19 only a few were present, and they were mostly blissfully ignored.
- Challenge I: mapping strategies and tools need to change
  - In preparation: iBWA, srprism
  - BWA 0.7.12 (29 Dec 2014) supports ALTs in a two-step approach
- Challenge II: variant callers need to be aware of alternative references (and their context)
- Challenge III: how to display these data in genome browsers etc. while maintaining context?
- Challenge IV: nomenclature
The primary assembly contains all patches and fixes to hg19 and is still a good starting point.
7
What are these ALT loci?
- Scaffolds that provide an alternate representation of a locus found in the primary reference
- Long regions with clustered variation (e.g. LRC/KIR on chr19 and the MHC/HLA loci on chr6)
- Besides different haplo-variants of genes, they also contain genes not in the primary assembly (20 protein-coding, ~40 predicted protein-coding, pseudogenes, lincRNAs)
- Mind: NCBI and Ensembl represent ALTs differently. NCBI uses the primary chromosomes plus separate ALT loci, while Ensembl builds a complete new ALT chromosome (so including the sequence identical to the primary)
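The difference between the NCBI and Ensembl representations can be shown with toy sequences. In the NCBI style the ALT scaffold stays a separate sequence with a placement on the primary chromosome; the Ensembl style splices it into a full-length copy of that chromosome, so the flanks duplicate primary sequence. A minimal Python sketch, assuming a single contiguous placement region (real ALT placements come from alignments and can be more complex); the function name and sequences are hypothetical.

```python
def ensembl_style_alt_chromosome(primary: str, start: int, end: int, alt_scaffold: str) -> str:
    """Build a full alternative chromosome by replacing primary[start:end]
    (0-based, half-open placement of the ALT scaffold) with the ALT sequence.
    Everything outside the placement is copied unchanged from the primary."""
    if not 0 <= start <= end <= len(primary):
        raise ValueError("placement lies outside the primary chromosome")
    return primary[:start] + alt_scaffold + primary[end:]


toy_primary = "AAAACCCCGGGG"
toy_alt = "CCTTCC"  # NCBI style: kept as its own sequence, placed at [4, 8)
print(ensembl_style_alt_chromosome(toy_primary, 4, 8, toy_alt))
```

This prints `AAAACCTTCCGGGG`: the `AAAA` and `GGGG` flanks are the "identical sequence" that the Ensembl approach carries along.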
8
Usage scenarios
I: use the primary reference (top-level chromosomes)
II: use the primary reference + mapping decoys (Un + EBV)
  - Improves mapping accuracy
  - Only feed the primary reference to the variant caller
III: use the primary reference + ALT loci + mapping decoys (Un + EBV)
  - Improves mapping accuracy (?)
  - A: only feed the primary reference to the variant caller
  - B: run the variant caller on all loci…
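The scenarios differ only in which contigs go to the mapper and which go to the variant caller. The Python sketch below classifies contigs by their UCSC-style names (`_alt`, `_random`, `chrUn_`, `_decoy`, `chrEBV`) and picks the mapping and calling sets per scenario. Assumptions, labelled loudly: the naming pattern is the one used by the UCSC-style analysis sets, both unlocalized (`_random`) and unplaced (`chrUn_`) contigs are treated as the "Un" decoys, and the function and scenario keys are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Top-level chromosomes: chr1-chr22, chrX, chrY, chrM.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def contig_class(name: str) -> str:
    """Classify a contig by its UCSC-style name (assumed naming convention)."""
    if PRIMARY.match(name):
        return "primary"
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_decoy") or name == "chrEBV":  # check before the chrUn_ prefix
        return "decoy"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    return "other"


NON_ALT = {"primary", "unlocalized", "unplaced", "decoy"}
# Hypothetical keys: (contig classes used for mapping, classes given to the caller).
SCENARIOS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "I": ({"primary"}, {"primary"}),
    "II": (NON_ALT, {"primary"}),
    "IIIA": (NON_ALT | {"alt"}, {"primary"}),
    "IIIB": (NON_ALT | {"alt"}, NON_ALT | {"alt"}),
}


def select_contigs(names: List[str], scenario: str) -> Tuple[List[str], List[str]]:
    mapping_classes, calling_classes = SCENARIOS[scenario]
    mapping = [n for n in names if contig_class(n) in mapping_classes]
    calling = [n for n in names if contig_class(n) in calling_classes]
    return mapping, calling


contigs = ["chr1", "chr1_KI270706v1_random", "chrUn_KI270302v1",
           "chr19_KI270938v1_alt", "chrEBV", "chrM"]
print(select_contigs(contigs, "II"))
```

For scenario II this maps against everything except the ALT contig, but only `chr1` and `chrM` are handed to the variant caller.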
9
Adding the mapping decoys
Class         GRCh38_full_plus_analysisset (bp)   GRCh38_full_analysisset (bp)
Primary       3,088,286,401                       3,088,286,401
Unlocalized   6,978,808                           6,978,808
Unplaced      4,485,509                           4,485,509
ALT           109,535,387                         109,535,387
Decoy         5,964,345                           171,823
Total         3,215,250,450                       3,209,457,928
[Graphs: based on 11 X Ten WGS samples]
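The two totals can be checked directly: the four shared classes are identical in both sets, and only the decoy row differs (171,823 bp matches the EBV decoy alone). A short Python check, assuming the single-valued rows of the original slide apply to both columns, which the totals bear out:

```python
# Base pairs per sequence class, shared by both analysis sets.
SHARED_BP = {
    "Primary": 3_088_286_401,
    "Unlocalized": 6_978_808,
    "Unplaced": 4_485_509,
    "ALT": 109_535_387,
}
# Only the decoy content differs between the two sets.
DECOY_BP = {
    "GRCh38_full_plus_analysisset": 5_964_345,
    "GRCh38_full_analysisset": 171_823,
}


def total_bp(analysis_set: str) -> int:
    return sum(SHARED_BP.values()) + DECOY_BP[analysis_set]


for name in DECOY_BP:
    print(f"{name}: {total_bp(name):,} bp")
```

The difference between the two sets is 5,792,522 bp, which is entirely additional decoy sequence.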
10
Personalis, Inc. | Confidential and Proprietary
GRCh37.p13: improved alignments outside of fix-patch regions
[Figure (Jason Harris): hs37d5 vs. GRCh37.p13, regions outside of fix patches]
11
Heng Li: BWA approach to ALT mapping
- ALTs are supported from v0.7.11 onwards through an additional ID-list file, $ref.alt
- Advised: use the NCBI NGS analysis sets (3 flavors), whose sequences are slightly modified to facilitate mapping (hard-masked PAR and centromeric regions)
1. The original mapQ of a non-ALT hit is computed across non-ALT hits only. The reported mapQ of an ALT hit is always computed across all hits.
2. An ALT hit is only reported if its score is better than all overlapping non-ALT hits. A reported ALT hit is flagged with 0x800 (supplementary) unless there are no non-ALT hits.
3. The mapQ of a non-ALT hit is reduced to zero if its score is less than 80% (controlled by option -g) of the score of an overlapping ALT hit. In this case, the original mapQ is moved to the om tag.
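The three rules can be sketched in Python for the hits of a single read. This is an illustration of the rules as stated, not BWA's code, under these assumptions: all hits passed in overlap the same part of the read, only the best non-ALT and best ALT hit are considered, and `toy_mapq` is a hypothetical stand-in that does not reproduce BWA's actual mapQ formula. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SUPPLEMENTARY = 0x800  # SAM FLAG bit used for a reported ALT hit


@dataclass
class Hit:
    contig: str
    score: int        # alignment score of this hit
    is_alt: bool      # True if the contig is listed in the .alt file
    mapq: int = 0
    flag: int = 0
    tags: Dict[str, int] = field(default_factory=dict)


def toy_mapq(best: int, second: int) -> int:
    """Hypothetical mapQ (NOT BWA's formula): 60 if unique, else scaled by score gap."""
    if best <= 0:
        return 0
    return max(0, min(60, int(60 * (best - second) / best)))


def second_best(scores: List[int]) -> int:
    ordered = sorted(scores, reverse=True)
    return ordered[1] if len(ordered) > 1 else 0


def apply_alt_rules(hits: List[Hit], g: float = 0.8) -> List[Hit]:
    non_alt = [h for h in hits if not h.is_alt]
    alt = [h for h in hits if h.is_alt]
    best_non_alt = max(non_alt, key=lambda h: h.score, default=None)
    best_alt = max(alt, key=lambda h: h.score, default=None)
    reported: List[Hit] = []

    if best_non_alt is not None:
        # Rule 1: a non-ALT hit's mapQ is computed across non-ALT hits only.
        best_non_alt.mapq = toy_mapq(best_non_alt.score,
                                     second_best([h.score for h in non_alt]))
        # Rule 3: zero the mapQ if an overlapping ALT hit scores much better; keep it in om.
        if best_alt is not None and best_non_alt.score < g * best_alt.score:
            best_non_alt.tags["om"] = best_non_alt.mapq
            best_non_alt.mapq = 0
        reported.append(best_non_alt)

    if best_alt is not None:
        # Rule 2: report the ALT hit only if it beats every overlapping non-ALT hit.
        if best_non_alt is None or best_alt.score > best_non_alt.score:
            # Rule 1: an ALT hit's mapQ is computed across all hits.
            best_alt.mapq = toy_mapq(best_alt.score, second_best([h.score for h in hits]))
            if best_non_alt is not None:
                best_alt.flag |= SUPPLEMENTARY
            reported.append(best_alt)
    return reported


read_hits = [Hit("chr6", 100, False), Hit("chr6_GL000251v2_alt", 130, True)]
for h in apply_alt_rules(read_hits):
    print(h.contig, h.mapq, hex(h.flag), h.tags)
```

For these made-up scores the non-ALT hit keeps mapQ 60 only in `om` (its reported mapQ drops to 0, since 100 < 0.8 × 130), and the ALT hit is reported with the 0x800 flag.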
12
Heng Li: BWA approach
13
Variant calling on ALTs?
15
By including the ALT loci in mapping and calling we gain better haplotype-aware mappings and calls, but this is not clearly reflected in the VCF.
Adding 'haplotyping' to the VCF format (A. Quinlan, Virginia, GRC WS 2014)
16
Variant annotation on hg20 / ALTs
- Ensembl VEP
- SnpEff
- dbNSFP in next release (~May)
17
Nomenclature
- chr19_KI270938v1_alt
- CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1
- GenBank: KI270886.1
- RefSeq: NT_187640.1
hg38 / GRCh38, not hg20 please…
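Part of the nomenclature problem is mechanical: a UCSC-style contig name embeds the GenBank accession, with the version suffix ".N" written as "vN" (chr19_KI270938v1_alt corresponds to KI270938.1). A minimal Python sketch of that conversion, assuming only this naming pattern; GRC/Ensembl-style names such as CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1 and RefSeq NT_ accessions cannot be derived this way and need a lookup table such as the NCBI assembly report. The names `ContigName` and `parse_ucsc_contig` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ContigName(NamedTuple):
    chrom: str               # "19", "X", "Un", ...
    accession: str           # GenBank accession with version, e.g. "KI270938.1"
    kind: Optional[str]      # "alt", "random", "decoy", or None (e.g. plain chrUn_)


UCSC_CONTIG = re.compile(
    r"^chr(?P<chrom>[0-9]+|[XYM]|Un)_(?P<acc>[A-Z]+[0-9]+)v(?P<ver>[0-9]+)"
    r"(?:_(?P<kind>alt|random|decoy))?$"
)


def parse_ucsc_contig(name: str) -> Optional[ContigName]:
    """Split a UCSC-style non-primary contig name; return None if it does not match."""
    m = UCSC_CONTIG.match(name)
    if m is None:
        return None
    return ContigName(m["chrom"], f"{m['acc']}.{m['ver']}", m["kind"])


print(parse_ucsc_contig("chr19_KI270938v1_alt"))
print(parse_ucsc_contig("CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1"))
```

The first call yields chromosome 19, accession KI270938.1, kind "alt"; the GRC-style name does not match and returns `None`.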
18
"Everything is in a state of flux, including the status quo." (Robert Byrne)
- Even 1.5 years after the release, many things about using the full build remain uncertain.
- GATK is remarkably silent on this.
- Ewan Birney and Richard Durbin agreed on March 24th to rebuild a new reference/analysis set with a more standardized set of chromosomes, ALTs and decoys (pers. comm.).
- Heng Li: "The current BWA-MEM method is just a start. […] We may make changes. It is also possible that we might make a breakthrough on the representation of multiple genomes, in which case we can even get rid of ALT contigs for good."