Activities in SVs, focusing on breakpoint characterization Mark Gerstein, Yale.

Slides:



Advertisements
Similar presentations
Methods to read out regulatory functions
Advertisements

Epigenetics Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520.
Targeted Data Introduction  Many mapping, alignment and variant calling algorithms  Most of these have been developed for whole genome sequencing and.
Computational analysis on genomic variation: Detecting and characterizing structural variants in the human genome Hugo Y. K. Lam Computational Biology.
Mining SNPs from EST Databases Picoult-Newberg et al. (1999)
Global dissection of cis and trans regulatory variations in Arabidopsis thaliana Xu Zhang Borevitz Lab.
Hybridization Diagnostic tools Nucleic acid Basics PCR Electrophoresis
Considerations for Analyzing Targeted NGS Data Introduction Tim Hague, CTO.
Epigenome 1. 2 Background: GWAS Genome-Wide Association Studies 3.
Forward genetics and reverse genetics
Modelling binding site with 3DLigandSite Mark Wass
Cryptic Variation in the Human mutation rate Alan Hodgkinson Adam Eyre-Walker, Manolis Ladoukakis.
SSAHA, or Sequence Search and Alignment by Hashing Algorithm, is used mainly for fast sequence assembly, SNP detection, and the ordering and orientation.
CS177 Lecture 10 SNPs and Human Genetic Variation
Nozomu TAKAHASHI June 11th, 2012
I519 Introduction to Bioinformatics, Fall, 2012
Genomics Method Seminar - BreakDancer January 21, 2015 Sora Kim Researcher Yonsei Biomedical Science Institute Yonsei University College.
The BreakSeq Project Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library Mark Gerstein.
LARVA: An integrative framework for Large-scale Analysis of Recurrent Variants in noncoding Annotations M Gerstein, Yale Slides freely downloadable from.
Chapter 5 The Content of the Genome 5.1 Introduction genome – The complete set of sequences in the genetic material of an organism. –It includes the.
Cancer genomics Yao Fu March 4, Cancer is a genetic disease In the early 1970’s, Janet Rowley’s microscopy studies of leukemia cell chromosomes.
High density array comparative genomic hybridisation (aCGH) for dosage analysis and rapid breakpoint mapping in Duchenne Muscular Dystrophy (DMD) Victoria.
Recombination breakpoints Family Inheritance Me vs. my brother My dad (my Y)Mom’s dad (uncle’s Y) Human ancestry Disease risk Genomics: Regions  mechanisms.
February 20, 2002 UD, Newark, DE SNPs, Haplotypes, Alleles.
The International Consortium. The International HapMap Project.
Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs
Chromosome inversions in human populations Marta Ruiz Fernández Master in Advanced Genetics 17 December 2014.
Biol 456/656 Molecular Epigenetics Lecture #5 Wed. Sept 2, 2015.
Supplemental Figure 1. False trans association due to probe cross-hybridization and genetic polymorphism at single base extension site. (A) The Infinium.
Finding genes in the genome
Outline Molecular Cell Biology Assessment Review from last lecture Role of nucleoporins in transcription Activators and Repressors Epigenetic mechanisms.
Content What is epigenetics?. The Mapping of the Human Genome Project 2000 A working draft but completed in 2003 Only 20,000–25,000 genes! Only 1.5% of.
Global Variation in Copy Number in the Human Genome Speaker: Yao-Ting Huang Nature, Genome Research, Genome Research, 2006.
How do eucaryotic gene activator proteins increase the rate of transcription initiation? 1.By activating directly on the transcription machinery. 2.By.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Single Nucleotide Polymorphisms (SNPs
Timing, rates and spectra of human germline mutation
Of Sea Urchins, Birds and Men
Breanna Perreault and Xiao Liu D145 Presentation
Mark Gerstein, Yale See last slide for more info.
Horizontal gene transfer and the history of life
Human Cells Human genomics
Stephen Clark – Reik Lab, Babraham Institute
Chromatin Regulation September 20, 2017.
Frequency of Nonallelic Homologous Recombination Is Correlated with Length of Homology: Evidence that Ectopic Synapsis Precedes Ectopic Crossing-Over 
Companion PowerPoint slide set DNA Methylation & Cadmium Exposure in utero An Epigenetic Analysis Activity for Students This teacher slide set was created.
Companion PowerPoint slide set DNA Methylation & Cadmium Exposure in utero An Epigenetic Analysis Activity for Students This teacher slide set was created.
Jin Zhang, Jiayin Wang and Yufeng Wu
Volume 153, Issue 4, Pages (May 2013)
Companion PowerPoint slide set DNA Methylation & Cadmium Exposure in utero An Epigenetic Analysis Activity for Students This teacher slide set was created.
Genetic-Variation-Driven Gene-Expression Changes Highlight Genes with Important Functions for Kidney Disease  Yi-An Ko, Huiguang Yi, Chengxiang Qiu, Shizheng.
Jianbin Wang, H. Christina Fan, Barry Behr, Stephen R. Quake  Cell 
Exploring and Understanding ChIP-Seq data
Copy Number Variation Sequencing for Comprehensive Diagnosis of Chromosome Disease Syndromes  Desheng Liang, Ying Peng, Weigang Lv, Linbei Deng, Yanghui.
A Comprehensive Strategy for Accurate Mutation Detection of the Highly Homologous PMS2  Jianli Li, Hongzheng Dai, Yanming Feng, Jia Tang, Stella Chen,
In these studies, expression levels are viewed as quantitative traits, and gene expression phenotypes are mapped to particular genomic loci by combining.
GersteinLab.org Overview
Scaling Computation to Keep Pace with Data Generation
Figure 1. Biased distribution of different SV classes
One SNP at a Time: Moving beyond GWAS in Psoriasis
BF528 - Genomic Variation and SNP Analysis
Haplotypes When the presence of two or more polymorphisms on a single chromosome is statistically correlated in a population, this is a haplotype Example.
Volume 17, Issue 5, Pages (May 2009)
XRCC3 Controls the Fidelity of Homologous Recombination
Making New Connections—Chromosome Conformation Capture for Identification of Disease-Associated Target Genes  Matthew T. Patrick, Lam C. Tsoi, Johann.
Presentation transcript:

Activities in SVs, focusing on breakpoint characterization Mark Gerstein, Yale

Our Activities Related to SVs ●SV calling (eg Retroduplications) ●Functional enrichment ●Breakpoints/Mechanism study

Breakpoint characterization in 1000G Breakseq #1 w/ ~2000 breakpoints [Lam et al. Nat. Biotech. (‘10)] Pilot Phase 1 “Integrated” & Phase 1 refined Phase 3 Count of deletions Pilot set Integrated set Phase 1 TEI NAHR NH VNTR 6,104 (6,299) 23,850 (23,655) 2,839 (2,644) Refined Phase 1 Phase 3 Exact match Number in parentheses: >50% reciprocal match

8,943 Deletion Breakpoints (Phase I Refined) FDR from IRS, PCR, and high-coverage trios –~7% for site existence –13% for site existence + sequence precision Multiple CNV callers Call merging Breakpoint assembly Data for 1,092 samples Mapping to junctions

Higher SNP Density and Relaxed Selection at NH Breakpoints 700 Kbps NH SNP density Conservation score +4% -4% 0

Higher SNP Density and Relaxed Selection at all Breakpoints 700 Kbps NH TEI NAHR SNP density Conservation score +4% -4% +4% -4% +4% -4% 0

SNP Density at NAHR is Driven by High C>T 700 Kbps NH TEI NAHR SNP density Conservation score +4% -4% +4% -4% +4% -4% 0 C>A C>G C>T T>A T>C T>G C>T outside CpG

NAHR breakpoint are associated with open chromatin environment Supported by Hi-C and Histone modification Hypothesis: Some NAHR deletions occur w/o cell Replication * H1 & GM12878 cells Abyzov et al ±0.5 ±1 ±1.5 ±2 ±2.5 Distance from breakpoints, Mbps TEI NAHR NH OpenClosed

Methylation pattern associated with breakpoints mechanisms Lower C>T in CpG around NAHR breakpoints –indicates lower methylation level in germline & embryonic cells Confirmed in male gamete +10% -5% C>T(CpG) NAHR CpG% C>T(all) GC% 0700 Kbps 6161 Distance from breakpoints, Kbps Hypomethylated regions in sperm TEI NAHR NH

Micro-homologies Identified around Breakpoints Breakpoints have Micro- homologous sequences with the template sites.

NH deletions are often coupled with micro-insertions Templates located at 2 characteristic distances from breakpoints, which tend to replicate late Suggests spatial & temporal configuration of DNA during template switching

More about breakpoints/mechanisms More accurate breakpoints and detailed investigation from the trios More mechanism assignment –Recombination effects on SV origin Ectopic recombination mediated by LTR/Alu/LINE1 Deletions close to recombination hotspots (population specific) –BreakSeq2 - developing a more accurate mechanism pipeline taking advantage of the longer reads Using high-res. Hi-C data to further investigate the relationship of high-order chromatin structure and mechanisms of bkpts –Mechanisms of bkpts; Special focus on NAHR –micro-insertion template (potential tool to automate template finding/micro-homology identification)

More Functional Characterization of SVs More functional enrichment –Detect regions with high/low variations in the trio child compared to the parents, and identify potential involved functions; compare the trend in multiple trios Functional region biased bkpts discovery SV-eQTL –Limited by sample size if only look at trios –Instead, pick out the eQTLs identified from populations, check whether they are significant in the trios, and why (might not have exactly matching populations)

More SV calling & retrodups Autonomous/non-autonomous mobile elements insertion polymorphism –Repetitive elements mapping strategies –Transgenerational presence/absence dimorphism More retroduplications - compare retroduplications in parents and child: –Inherited retroduplications –Newborn retroduplications in the child –Missing heritability –Parent-of-origin CNVnator refinements Use HiC data to investigate influence of SVs on the association between the genomic elements and its target region. –Specifically look into SV hotspot region

Acknowledgements Refined Phase 1 Breakpoints Analysis Alexej Abyzov, Shantao Li, Daniel Rhee Kim, Marghoob Mohiyuddin, Adrian Stuetz, Nicholas F. Parrish, Xinmeng Jasmine Mu, Wyatt Clark, Ken Chen, Matthew Hurles, Jan Korbel, Hugo Y.K. Lam, Charles Lee Other SV participants –Y Zhang, J Zhang, F Navarro, S Kumar

Lectures.GersteinLab.org Info about content in this slide pack Breakpoints analysis was from Abyzov et al. Nat. Comm. (’15, in press) General PERMISSIONS -This Presentation is copyright Mark Gerstein, Yale University, Please read permissions statement at -Feel free to use slides & images in the talk with PROPER acknowledgement (via citation to relevant papers or link to gersteinlab.org). -Paper references in the talk were mostly from Papers.GersteinLab.org. For SeqUniverse slide, please contact Heidi Sofia, NHGRI PHOTOS & IMAGES. For thoughts on the source and permissions of many of the photos and clipped images in this presentation see -In particular, many of the images have particular EXIF tags, such as kwpotppt, that can be easily queried from flickr, viz: