Updating the human reference assembly V.A. Schneider, P. Flicek, T. Graves, T. Hubbard & D.M. Church for the Genome Reference Consortium

The GRCh38 human reference assembly is currently being processed and will be released this fall. If you have questions about this, let us know at

GRCh37.p13 Assembly Statistics

178 regions: 3.15% of chromosome sequence
131 FIX patches: add 6.8 Mb novel sequence
73 NOVEL patches: add >800 Kb novel sequence

Patch, alternate loci and assembly region data. FIX patches correct assembly errors. NOVEL patches represent sequence variants. Regions are domains where patches and alt loci align.

Reference Assembly Model

Graphical representation of GRCh37.p13. Ideograms represent the primary assembly unit. Sequences affiliated with chr. 6 are shown in greater detail. Alignments of alt loci and patch scaffolds to the primary assembly provide chromosome context.

How the Assembly is Changing

Figure labels: Unresolved Human Issues; Resolved for GRCh38 (n=122,922).

GRCh38: Tiling Path Updates

Several complex genomic regions have been retiled as a single haplotype. The KIR/LRC region of chr. 19, composed of mixed haplotypes in GRCh37, has been updated with clones from the CH17 library to represent the A01 haplotype. The LILRA3 gene is absent from this haplotype. There will be 35 alternate representations of this region in GRCh38. The 1q21 (middle), 1p11 (right) and 1q32 (not shown) regions, containing SRGAP family members, have also been retiled with the single CH17 haplotype in GRCh38.

GRCh38: Updating Individual Bases

A. Sources of candidate bases (top). Final distribution of attempted base updates (bottom).
B. Analysis of RP11 WGS reads aligned to GRCh37 at RP11-derived bases never seen in 1000 Genomes samples. 80% of sites are heterozygous in RP11, not sequencing errors.
C. NA12878 read alignments identify an erroneous GRCh37 base in the LIN37 CDS.

GRCh38: Capturing Missing Sequence

Sequence absent from GRCh37 is captured in various forms.

Above, left: Breakdown of 1000 Genomes decoy sequence by alignment to GenBank, RepeatMasker coverage, RepeatMasker class, and source.
Above, right: In GRCh38, modeled centromere sequences will be included.
Below:
A. Addition of new sequence at a GRCh37 chr. 17 gap partially captures a missing segmental duplication and adds KCNJ18.
B. Novel patch adds a sequence variant with a 40 kb repeat insertion.
C. Retiling of the chr. 6 peri-centromeric region and addition of chr. 3 unlocalized sequence corrects a collapsed duplication and captures missing PRIM2 gene copies.

Increased Allelic Diversity: A Means of Improving Alignments

Experiment: Using simulated 101 bp reads, determine the fate of reads derived from patch/alt regions that don't align to the chromosome when aligning to a target that only includes chromosome sequences.

Approach 1: Mask homologous regions of alts/patches.
Approach 2: Use an alt- and patch-aware aligner, such as SRPRISM (Agarwala, in press).

Above left: Simulated reads aligned with BWA to GRCh37 primary & MT only, or to GRCh37.p9 without and with masking of highly homologous sequence. Box: improved alignments at an alternate locus insertion.
Above right: Chr. 12 novel patch with insertion. NA12878 reads aligned to the full assembly with SRPRISM (top), primary only with SRPRISM (middle) and the 1000G reference with BWA (bottom).

Reads sourced from alt/patch unique sequence:
A. ~75% have an off-target alignment when the proper target is unavailable (GRCh37 primary only).
B. Roughly half of these are due to exact duplication and cannot be resolved without longer reads.

Above: Reads aligned to GRCh37.p9, without masking.
Left: Reads aligned to full GRCh37.p9 with masks for BWA and no masks for SRPRISM. Mask 1: mask the chromosome for fix patches and the alt/patch for alternate loci. Mask 2: mask only alts/patches.

Conclusion: Both masking and using an alternate-locus-aware aligner improve sequence alignments.
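
The masking idea in Approach 1 can be pictured with a minimal Python sketch. The poster does not say how the masks were built, so everything here is an assumption for illustration: the function name `mask_intervals`, the use of hard-masking with `N`, the toy scaffold, and the 0-based half-open coordinates are all hypothetical. The sketch replaces the parts of an alt/patch scaffold that duplicate the chromosome, so that only the unique sequence remains as an alignment target.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with each interval overwritten by `mask_char`.

    Intervals are 0-based, half-open (start, end) pairs. Hard-masking with N
    is an assumption of this sketch, not a method stated in the poster.
    """
    masked = list(sequence)
    for start, end in intervals:
        if start < 0 or end > len(sequence) or start > end:
            raise ValueError(f"interval ({start}, {end}) is out of range")
        # Overwrite the homologous stretch, keeping the sequence length unchanged
        # so downstream coordinates on the scaffold stay valid.
        masked[start:end] = mask_char * (end - start)
    return "".join(masked)


# Hypothetical alt scaffold: flanks duplicate the chromosome, the middle is unique.
alt_scaffold = "ACGTACGTTTGACCA"
homologous_flanks = [(0, 4), (11, 15)]
masked_scaffold = mask_intervals(alt_scaffold, homologous_flanks)
```

After masking, a read drawn from a flank has only one place to align (the chromosome), while a read from the unique insertion can still align to the scaffold.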