

ENCODE variation analysis

Analysis goals
- Quantify genetic variation in ENCODE regions
- Detect selective constraint in ENCODE features
- Develop rules for interpretation of functional variation
- Motivate experiments to test functional variation

Data
- ENCODE SNPs (HapMap resequencing)
- 5 kb HapMap SNPs
- DIPs (deletion/insertion polymorphisms)
- Gene expression variation

Metrics of variation
- Derived allele frequency spectrum (Manolis)
- Diversity/heterozygosity (Ewan)
- SNP density (Ewan, others)
- DIP density (Jim, Taane)
- LD/recombination (Daryl/Oxford)
- Regions of contiguous DNA without variation (Manolis)
- Accelerated (positively selected?) regions (Manolis)
- Standard tests of neutrality: McDonald-Kreitman, Tajima's D, etc. (Mike, others)
- Other non-parametric tests of selection (Andy)
- Tagging (Paul)
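
Two of the metrics listed above, the derived allele frequency spectrum and Tajima's D, can be sketched from per-site derived allele counts. This is a minimal illustration, not the analysis group's pipeline: the function names and the input layout (one derived-allele count per site, `n` sampled chromosomes) are assumptions, and the D statistic follows the standard Tajima (1989) constants.

```python
from collections import Counter
from math import sqrt


def derived_allele_spectrum(derived_counts, n):
    """Map derived-allele count k (0 < k < n) to the number of sites with that count.

    derived_counts: one integer per site (hypothetical input layout).
    n: number of sampled chromosomes.
    """
    return dict(Counter(k for k in derived_counts if 0 < k < n))


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site derived allele counts; None if it is undefined."""
    seg = [k for k in derived_counts if 0 < k < n]  # segregating sites only
    s = len(seg)
    if s == 0 or n < 4:  # the variance term is zero for n < 4
        return None
    # Nucleotide diversity: mean pairwise differences summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    if var <= 0:
        return None
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(var)
```

An excess of rare derived alleles pulls D below zero, which is one of the signals a constraint analysis would look for; an excess of intermediate-frequency alleles pushes it above zero.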

Analysis plans
Analysis with respect to genomic features:
- Calculate variability in a large number of genomic features with all metrics
- Correlate variability metrics with the "intensity" of the feature (e.g. levels of conservation with levels of variability)
- Variation, alternative splicing and expression
- Distance effects from genomic features
- Association of gene expression with SNPs (some is in UCSC and some will be provided by Manolis at the workshop)
Analysis independent of genomic features (in principle):
- Tag SNPs and comparison of resequencing data to the 5 kb map. It will be a good idea to see how well the 5 kb map captures variation within genomic elements. If we really aim to capture variation mainly in functional genomic elements (e.g. known regulatory regions, or nonsynonymous SNPs), how can we modify the tag algorithms?
- General description of levels of variation with respect to the functional content of the 44 ENCODE regions
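
The plan to correlate a variability metric with the "intensity" of a feature could be sketched with a rank correlation. The slides do not say which correlation measure was intended, so Spearman's rho is an assumption here, as are the function names and the idea of one conservation score and one diversity value per feature.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; None when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman(conservation_scores, diversity_values)` over a set of features would be negative if more strongly conserved features carry less variation.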

Diversity in features (Ewan Birney)
Table columns: av2pq/SNP, av2pq/pos, #snps
Table rows: Promoters; Region Rnd2; Completely Rnd; Exons; RRnd Exons; Overall
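
The column names in this table suggest average heterozygosity (2pq) per SNP and per base position. A minimal sketch under that reading follows; the interpretation of the column names, the function name and the inputs (allele frequencies of the SNPs in a feature class and the total length of that class in base pairs) are assumptions, not taken from the slide.

```python
def diversity_summary(allele_freqs, feature_length_bp):
    """Summarise diversity for one feature class.

    allele_freqs: frequency p of one allele at each SNP in the feature class.
    feature_length_bp: total number of bases covered by the feature class.
    """
    # Expected heterozygosity at a biallelic site is 2pq with q = 1 - p.
    het = [2.0 * p * (1.0 - p) for p in allele_freqs]
    n_snps = len(het)
    total = sum(het)
    return {
        "n_snps": n_snps,
        # Mean 2pq over the SNPs that were observed.
        "av2pq_per_snp": total / n_snps if n_snps else 0.0,
        # Summed 2pq spread over every base, so SNP density also contributes.
        "av2pq_per_pos": total / feature_length_bp,
    }
```

The per-SNP value reflects the frequency of the variants that exist, while the per-position value also drops when a feature simply contains fewer SNPs, which is why reporting both is informative.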

Derived allele frequency spectrum: CNS intersection (P = 0.003)

Derived allele frequency spectrum: Transfrags union (P = 0.204)
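
The slides do not state which test produced these P values. As one possible illustration only, a permutation test can compare the derived allele frequencies of SNPs inside a feature against those of background SNPs. The test statistic (difference in mean derived allele frequency), the function name and the parameters are all assumptions.

```python
import random


def permutation_p_value(feature_daf, background_daf, n_perm=1000, seed=0):
    """Two-sided permutation P value for a difference in mean derived allele frequency.

    feature_daf, background_daf: derived allele frequencies of SNPs in each set.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(feature_daf) - mean(background_daf))
    pooled = list(feature_daf) + list(background_daf)
    k = len(feature_daf)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign SNPs to the two sets at random
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small P value would indicate that the feature's SNPs sit at different frequencies than background SNPs, for example skewed toward rare derived alleles under constraint.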

Heterozygosity (Taane Clark)

Indels

Regions accelerated in humans

Selective constraints differ for genes expressed in different tissues (Nuria Lopez)

Genes expressed in more tissues are under stronger selective constraint (lower dN)

Tagging (Paul de Bakker)
ENCODE is a near-complete inventory of common (MAF ≥ 5%) sites.
How well do tag SNPs picked from thinned versions of ENCODE (to mimic ascertainment of Phase I and II) capture:
- all common variants?
- functional sites?

Coverage of common variants by tags picked from simulated Phase I and II HapMap
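
Coverage of this kind can be sketched as the fraction of common sites that are in strong LD with at least one tag. The snippet below is an illustration under stated assumptions: phased haplotypes coded 0/1 per site, hypothetical function names, and an r² threshold of 0.8, which is a common convention and is not given on the slide. The MAF cutoff of 5% is the one the slides use.

```python
def r_squared(a, b):
    """Pairwise r^2 between two biallelic sites, given 0/1 alleles per haplotype."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # one of the sites is monomorphic
        return 0.0
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / denom


def tag_coverage(sites, tag_ids, maf_min=0.05, r2_min=0.8):
    """Fraction of common sites with r^2 >= r2_min to at least one tag SNP.

    sites: dict mapping site id to a list of 0/1 alleles, one per haplotype.
    tag_ids: ids of the chosen tag SNPs (must be keys of sites).
    """
    common = []
    for site_id, alleles in sites.items():
        freq = sum(alleles) / len(alleles)
        if min(freq, 1 - freq) >= maf_min:
            common.append(site_id)
    if not common:
        return 0.0
    captured = [
        s for s in common
        if any(r_squared(sites[s], sites[t]) >= r2_min for t in tag_ids)
    ]
    return len(captured) / len(common)
```

To address the second question on the tagging slide, the same function could be run with `sites` restricted to functional sites, which gives coverage of that subset rather than of all common variants.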