Understanding GWAS SNPs Xiaole Shirley Liu Stat 115/215.



Pace of GWAS Studies

GWAS SNPs
Association ≠ causation
What is the most likely causal SNP / gene in LD with the genotyped SNP?
Use functional genomics to identify the disease tissue of origin
What is the SNP doing in non-coding regions? Regulatory SNPs (RSNPs)

Use Literature & Pathway Information to Identify Putative Causal SNPs / Genes

Each Gene has an NCBI Page

Especially Bibliography

And Pathways

Literature Mining Terms
Corpus: Collection of documents. E.g. all papers in PubMed
Term frequency: Number of times a word appears in a document. E.g. “polymerase” appeared 41 times in a paper
Document frequency: Number of documents a word appears in. E.g. 1234x papers have the word “transcription”
Collection frequency: Total number of times a word appears in a corpus. E.g. “transcription” appeared 6789X times in all PubMed-indexed papers
Stop words: Words in the corpus that contribute little to meaning. E.g. to, is, an
Stemming: Grouping together different variations of the same word. E.g. activate vs. activated vs. activating
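The terms above can be made concrete on a toy corpus. The following is a minimal Python sketch, not part of the original slides: the three "documents", the stop-word list, and the `crude_stem` helper are all hypothetical stand-ins (a real pipeline would use an established stemmer such as Porter's).

```python
from collections import Counter

# Toy corpus (hypothetical sentences, not real PubMed records).
corpus = [
    "the polymerase is activated to start transcription",
    "transcription factors activate the gene",
    "activating the polymerase is an early step",
]

STOP_WORDS = {"the", "is", "to", "an"}  # tiny illustrative stop list


def crude_stem(word):
    """Very rough suffix stripping, for illustration only."""
    for suffix in ("ating", "ated", "ate"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def tokenize(doc):
    """Lowercase, split on whitespace, drop stop words, stem what remains."""
    return [crude_stem(w) for w in doc.lower().split() if w not in STOP_WORDS]


docs = [tokenize(d) for d in corpus]

# Term frequency: count of a word within one document.
term_freq = [Counter(tokens) for tokens in docs]

# Document frequency: number of documents that contain the word.
doc_freq = Counter(word for tokens in docs for word in set(tokens))

# Collection frequency: total occurrences across the whole corpus.
coll_freq = Counter(word for tokens in docs for word in tokens)

print(doc_freq["transcription"], coll_freq["polymerase"], doc_freq["activ"])
```

Here activate, activated, and activating all collapse to the same stem, so they count as one term when document frequency is computed.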

Documents Represented as Vectors
A document is summarized as a vector of word counts
Each dimension contains the number of times a word appears
Similarity between two documents can be calculated by comparing their vectors
Example: “Our analysis includes comparison of amino acid environments with random control environments as well as with each of the other amino acid environments.”
    acid 2
    amino 2
    analysis 1
    comparison 1
    control 1
    environments 3
    […]
    our 1
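As a small illustration of the word-count representation, the sketch below (Python, added here and not from the original deck) builds the count vector for the quoted sentence. The tokenization rule, lowercase runs of letters, is an assumption; no stop-word removal or stemming is applied.

```python
import re
from collections import Counter

sentence = (
    "Our analysis includes comparison of amino acid environments "
    "with random control environments as well as with each of the "
    "other amino acid environments."
)


def word_count_vector(text):
    """Map each lowercase word to the number of times it occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


vector = word_count_vector(sentence)

# Print the vector in alphabetical order, one dimension per line.
for word in sorted(vector):
    print(word, vector[word])
```

To compare two documents, both count vectors would be laid out over the same vocabulary, with zeros for words a document does not contain.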

Comparing Two Documents
Intuitive comparison between two papers → correlation coefficient of their word occurrence vectors
Correlation measures the strength of the linear relationship between two random variables
    a = c(1, 3, 5, 1, 8, 20, 0, 0, 0, 3, 1)
    b = c(2, 3, 4, 0, 10, 25, 1, 0, 2, 4, 3)
    c = c(2, 0, 1, 10, 2, 4, 7, 1, 5, 0, 8)
    cor(a, b)  # correlated
    cor(b, c)  # not correlated
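For readers who want to see what the correlation computes, here is a Python sketch of the Pearson correlation coefficient (the default method of R's `cor`) applied to the same three vectors. The function name `pearson` is chosen here for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


# Word occurrence vectors from the slide.
a = [1, 3, 5, 1, 8, 20, 0, 0, 0, 3, 1]
b = [2, 3, 4, 0, 10, 25, 1, 0, 2, 4, 3]
c = [2, 0, 1, 10, 2, 4, 7, 1, 5, 0, 8]

print(round(pearson(a, b), 3))  # close to 1: the two papers use words similarly
print(round(pearson(b, c), 3))  # close to 0: little linear relationship
```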

Term Weighting Considerations
Give different terms different weight
Global weight
  – Document frequency: Fewer documents, more weight: log(N / df). E.g. progesterone vs gene
Local weight
  – Term frequency: More frequent, more weight, with sublinear scaling such as 1 + log(tf) or log(1 + tf). E.g. progesterone: 10 times in paper 1 vs 3 times in paper 2
  – Document length: Less weight for a longer document. E.g. a long paper vs a 3-page paper
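The two weight formulas can be tried out directly. This Python sketch uses the log(N / df) global weight and the 1 + log(tf) local weight from the slide; the corpus size and document frequencies are made-up numbers for illustration, and the document-length adjustment is left out because the slide gives no formula for it.

```python
import math


def global_weight(n_docs, doc_freq):
    """Rarer terms get more weight: log(N / df)."""
    return math.log(n_docs / doc_freq)


def local_weight(term_freq):
    """More frequent terms get more weight, sublinearly: 1 + log(tf)."""
    return 1 + math.log(term_freq) if term_freq > 0 else 0.0


N_DOCS = 1000  # hypothetical corpus size

# Hypothetical document frequencies: a rare term vs a ubiquitous one.
print(global_weight(N_DOCS, 10))   # e.g. "progesterone" in 10 documents
print(global_weight(N_DOCS, 500))  # e.g. "gene" in 500 documents

# The same term occurring 10 times in one paper vs 3 times in another.
print(local_weight(10), local_weight(3))
```

The logarithm keeps a term that occurs 10 times from counting ten times as much as a term that occurs once.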

Evaluating Relatedness of Papers
Related Articles
  – Similarity between two documents: Σ over all terms of (local wt1 × local wt2 × global wt)
  – Pre-computed related articles for each citation
  – Rank ordered by relevance
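The similarity sum can be sketched in a few lines of Python. This follows only the schematic formula on the slide (local weight in each document times the global weight, summed over shared terms); the weight functions, corpus size, document frequencies, and toy papers are assumptions, and the production PubMed scoring is more elaborate than this.

```python
import math
from collections import Counter


def similarity(doc1_tokens, doc2_tokens, doc_freq, n_docs):
    """Sum over shared terms of local wt1 x local wt2 x global wt."""
    tf1, tf2 = Counter(doc1_tokens), Counter(doc2_tokens)
    score = 0.0
    for term in tf1.keys() & tf2.keys():
        global_wt = math.log(n_docs / doc_freq[term])
        local1 = 1 + math.log(tf1[term])
        local2 = 1 + math.log(tf2[term])
        score += local1 * local2 * global_wt
    return score


# Hypothetical corpus statistics and token lists.
N_DOCS = 1000
DOC_FREQ = {"progesterone": 10, "gene": 500, "receptor": 50}

paper1 = ["progesterone"] * 3 + ["gene"] * 2 + ["receptor"]
paper2 = ["progesterone"] * 2 + ["gene"]
paper3 = ["gene"] * 4

# Rank candidate papers by similarity to paper1, most related first.
candidates = {"paper2": paper2, "paper3": paper3}
ranked = sorted(
    candidates,
    key=lambda name: similarity(paper1, candidates[name], DOC_FREQ, N_DOCS),
    reverse=True,
)
print(ranked)
```

Sharing the rare term "progesterone" contributes far more to the score than sharing the common term "gene", which is the point of the global weight.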

GRAIL: Gene Relationships Across Implicated Loci
Raychaudhuri et al., PLoS Genetics 2009


GRAIL on Height SNPs

GRAIL on Crohn’s Disease
Use literature / pathways to identify potential causal genes
Find likely reproducible SNP hits, and increase statistical power

GWAS SNPs
Association ≠ causation
What is the most likely causal SNP / gene in LD with the genotyped SNP?
Use functional genomics to identify the disease tissue of origin
What is the SNP doing in non-coding regions? Regulatory SNPs (RSNPs)

Identifying Causal Cell Type for Complex Disease
E.g. Rheumatoid Arthritis (RA)
Many cell types have been implicated over the years, including neutrophils, synoviocytes, and all classes of lymphocytes
It is difficult to establish causality for complex phenotypes in humans
Use expression data: comprehensive, unbiased, and publicly available

Immunological Genome Project
Start with a list of disease SNPs
Find genes near each SNP that are specifically expressed in a cell type
Identify cell types that have many such genes, more than expected by chance

Identifying Causal Cell Type for Complex Disease From Expression
Negative control: simulation from random sets of SNPs
P-value: proportion of simulations exceeding the observed enrichment
Hu et al., American Journal of Human Genetics, 2011
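The simulation-based P-value can be sketched as follows. This is a simplified Python illustration, not the published method of Hu et al.: the enrichment statistic is just a count of SNPs whose nearby gene is specifically expressed in the cell type, and all SNP names, gene names, and the `snp_to_gene` mapping are hypothetical.

```python
import random


def count_specific(snps, snp_to_gene, specific_genes):
    """Number of SNPs whose nearby gene is specific to the cell type."""
    return sum(1 for s in snps if snp_to_gene[s] in specific_genes)


def empirical_p_value(disease_snps, all_snps, snp_to_gene, specific_genes,
                      n_sim=2000, seed=0):
    """Proportion of random SNP sets at least as enriched as the observed set."""
    rng = random.Random(seed)
    observed = count_specific(disease_snps, snp_to_gene, specific_genes)
    exceed = 0
    for _ in range(n_sim):
        # Negative control: a random SNP set of the same size.
        random_snps = rng.sample(all_snps, len(disease_snps))
        if count_specific(random_snps, snp_to_gene, specific_genes) >= observed:
            exceed += 1
    return observed, exceed / n_sim


# Hypothetical toy data: 200 SNPs, one nearby gene each,
# 10% of genes specifically expressed in the cell type of interest.
all_snps = ["rs%d" % i for i in range(200)]
snp_to_gene = {"rs%d" % i: "gene%d" % i for i in range(200)}
specific_genes = {"gene%d" % i for i in range(20)}

# Disease SNPs: 8 of 10 sit near cell-type-specific genes.
disease_snps = ["rs%d" % i for i in range(8)] + ["rs100", "rs150"]

observed, p_value = empirical_p_value(
    disease_snps, all_snps, snp_to_gene, specific_genes
)
print(observed, p_value)
```

Repeating this for each cell type and comparing the P-values points to the cell types whose specifically expressed genes cluster near disease SNPs more often than chance would predict.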


GWAS SNPs
Association ≠ causation
What is the most likely causal SNP / gene in LD with the genotyped SNP?
Use functional genomics to identify the disease tissue of origin
What is the SNP doing in non-coding regions? eQTL and regulatory SNPs (RSNPs)

GWAS SNP Distribution
[Figure label: RSNP]

eQTL
eQTL: use expression as phenotype
  – Are there SNPs that are associated with expression changes?
  – Heritable genetic variation for transcription levels

RSNPs
A SNP influences TF binding, affecting downstream (disease-related) gene expression
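One common way to reason about a SNP's effect on TF binding is to score the reference and alternate alleles against the factor's position weight matrix. The Python sketch below illustrates that idea only; the 4-position motif, the background frequency, and the two sequences are hypothetical and do not correspond to any real transcription factor.

```python
import math

# Hypothetical 4-position motif: probability of each base at each position.
PWM = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds(site):
    """Log2-odds score of a site under the motif vs the background."""
    return sum(
        math.log2(PWM[i][base] / BACKGROUND) for i, base in enumerate(site)
    )


ref_site = "AGCT"  # reference allele C at the third position
alt_site = "AGTT"  # alternate allele T at the third position

delta = log_odds(alt_site) - log_odds(ref_site)
print(round(log_odds(ref_site), 2), round(log_odds(alt_site), 2), round(delta, 2))
```

A negative `delta` means the alternate allele matches the motif less well, which would be consistent with weaker predicted binding at that site.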

eQTL and RSNPs
eQTL: use expression as phenotype
  – Are there SNPs that are associated with expression changes?
  – Heritable genetic variation for transcription levels
RSNP: regulatory SNP
  – Much of the influential variation is located cis to the coding locus
  – In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.)
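Treating expression as the phenotype amounts to testing, for each SNP and gene pair, whether expression changes with genotype. A minimal Python sketch of that idea is a least-squares regression of expression on allele dosage (0, 1, or 2 copies of the alternate allele). The genotypes and expression values below are invented for illustration, and a real eQTL analysis would also adjust for covariates and multiple testing.

```python
def linear_regression(x, y):
    """Ordinary least squares of y on x: returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data for one SNP-gene pair across 10 individuals.
genotypes = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]  # alternate allele dosage
expression = [5.1, 4.8, 5.3, 6.0, 6.2, 5.9, 6.4, 7.1, 6.8, 7.3]

slope, intercept, r_squared = linear_regression(genotypes, expression)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```

The slope estimates the change in expression per copy of the alternate allele, and R^2 gives the fraction of expression variance explained by this SNP in the toy data.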

Huang et al., Nat Genet 2014

RSNPs from GWAS
Enriched in regulatory sequences (promoters and enhancers) that are identified through histone mark ChIP-seq or DNase-seq
Maurano et al., Science 2012

Highest Correlated Genes of Distal DHSs Harboring GWAS Variants

Trans-Effect of Cis-SNPs
Three risk loci for ESR1, MYC, and KLF4
Effect on TF expression is small, but much stronger when looking at the expression of their downstream target genes
Li et al., Cell 2013

Useful Tools to Understand RSNPs
Identify putative TFs whose binding might be influenced by SNPs, based on ENCODE ChIP-seq / DNase-seq data

Understanding GWAS SNPs
Association ≠ causation
Use literature and pathways to identify the putative causal SNP / gene in LD with the genotyped SNP
Use (cell-type-specific) expression and epigenomics to:
  – Identify the disease tissue of origin
  – Identify regulatory SNPs that affect TF binding and influence the expression of important downstream disease genes

Acknowledgement
Soumya Raychaudhuri
Manolis Dermitzakis