Variation and Functional Genomics


2 of 51 Overview of Talk
- SNPs and InDels
- Larger structural variants (CNVs)
- Phenotype data
- Individual genomes
- HapMap variations and genotypes
- Locus Specific Databases
- LRGs

3 of 51 Genomic Diversity
- SNPs (Single Nucleotide Polymorphisms): base pair substitutions
- InDels: insertions/deletions (frameshifts)
- Occur about once in every 300 bp (human)
- ~3 billion base pairs in mammalian genomes!
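As a rough check on the scale these two figures imply, the arithmetic can be sketched in Python. The function name is illustrative, and the inputs are only the rounded numbers quoted on the slide, so the result is an order-of-magnitude estimate rather than a count from any database.

```python
# Back-of-the-envelope estimate using the rounded figures from the slide.
GENOME_SIZE_BP = 3_000_000_000  # ~3 billion base pairs
BP_PER_VARIANT = 300            # roughly one variant every 300 bp (human)


def expected_variant_sites(genome_size_bp: int, bp_per_variant: int) -> int:
    """Return the number of variant sites implied by a uniform density."""
    if bp_per_variant <= 0:
        raise ValueError("bp_per_variant must be positive")
    return genome_size_bp // bp_per_variant


if __name__ == "__main__":
    sites = expected_variant_sites(GENOME_SIZE_BP, BP_PER_VARIANT)
    print(f"about {sites:,} variant sites")
```

At this density a genome of about 3 billion bp carries on the order of ten million variant sites.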

4 of 51 Functional Consequences
Type: SNPs in coding areas that alter the aa sequence
Consequence: Cause of most monogenic disorders, e.g. cystic fibrosis (CFTR), hemophilia (F8)
Type: SNPs in coding areas that don’t alter the aa sequence
Consequence: May affect splicing
Type: SNPs in promoter or regulatory regions
Consequence: May affect the level, location or timing of gene expression
Type: SNPs in other regions
Consequence: No direct known impact on phenotype; useful as markers

5 of 51 Sequence Polymorphisms: Effects
- Cause disease (a SNP in clotting factor IX creates a stop codon: haemophilia)
- Increase disease risk (a SNP in the LDL receptor reduces its efficiency: high cholesterol)
- Affect drug response (2 million hospitalized patients suffer serious adverse drug reactions, of which more than 100,000 are fatal*)

6 of 51 Studying variation – why?
- Determine disease risk
- Individualised medicine (pharmacogenomics)
- Forensic studies
- Biological markers
- Hybridisation studies, marker-assisted breeding
- Understanding evolution

7 of 51 Practical Applications

8 of 51 dbSNP
55 organisms covered:

9 of 51 Small Scale Sequence Variants
Most SNPs and InDels are imported from dbSNP (rs……):
- Imported data: alleles, flanking sequences, population frequencies
- Calculated data: position, transcript effect
For human also:
- HGMD (Human Gene Mutation Database)
- HGVS (Human Genome Variation Society)
- Affymetrix and Illumina variations
- Ensembl-called SNPs (from aligned individual genomes)
For mouse, rat, dog and chicken also:
- Sanger- and Ensembl-called SNPs (other strains/breeds)

10 of 51 SNPs and InDels in Ensembl
Non-synonymous: in coding sequence, resulting in an aa change
Synonymous: in coding sequence, not resulting in an aa change
Frameshift: in coding sequence, resulting in a frameshift
Stop lost: in coding sequence, resulting in the loss of a stop codon
Stop gained: in coding sequence, resulting in the gain of a stop codon
Essential splice site: in the first 2 or the last 2 base pairs of an intron
Splice site: 1-3 bp into an exon or 3-8 bp into an intron
Upstream: within 5 kb upstream of the 5' end of a transcript
Regulatory region: in a regulatory region annotated by Ensembl
5' UTR: in 5' UTR
Intronic: in intron
3' UTR: in 3' UTR
Downstream: within 5 kb downstream of the 3' end of a transcript
Intergenic: more than 5 kb away from a transcript
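The position-based rules in this table can be illustrated with a small Python sketch. It is not Ensembl's implementation. The function name, the forward-strand assumption and the exon representation are all hypothetical, and coding sequence and UTRs are lumped together as "exonic" because no CDS coordinates are modelled.

```python
from typing import List, Tuple

FLANK_BP = 5_000  # the 5 kb upstream/downstream window from the table


def classify_position(pos: int, exons: List[Tuple[int, int]]) -> str:
    """Classify a 1-based genomic position against one forward-strand transcript.

    exons: sorted, non-overlapping (start, end) pairs, inclusive coordinates.
    """
    tx_start, tx_end = exons[0][0], exons[-1][1]
    if pos < tx_start:
        return "upstream" if tx_start - pos <= FLANK_BP else "intergenic"
    if pos > tx_end:
        return "downstream" if pos - tx_end <= FLANK_BP else "intergenic"

    for i, (start, end) in enumerate(exons):
        if start <= pos <= end:
            # 1-3 bp into an exon, counted from an edge that borders an intron.
            near_left = i > 0 and pos - start < 3
            near_right = i < len(exons) - 1 and end - pos < 3
            return "splice site" if near_left or near_right else "exonic"

    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        if left_end < pos < right_start:
            # depth 1 is the first or last base of the intron
            depth = min(pos - left_end, right_start - pos)
            if depth <= 2:
                return "essential splice site"
            if depth <= 8:
                return "splice site"
            return "intronic"
    raise ValueError("exons must be sorted and non-overlapping")


if __name__ == "__main__":
    toy_exons = [(10_000, 10_100), (10_200, 10_300)]
    for p in (4_000, 9_000, 10_050, 10_101, 10_105, 10_150, 10_400):
        print(p, classify_position(p, toy_exons))
```

A reverse-strand transcript would swap the upstream and downstream labels. That case is left out to keep the sketch short.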

11 of 51 Small Scale Sequence Variants
Ensembl Region in Detail View
Colour-coded SNPs and InDels
Legend

12 of 51 Polymorphisms in Ensembl
Chicken, Chimp, Cow, Dog, Human, Mouse, Rat, Platypus, Tetraodon, Zebrafish
Plants (Rice, Arabidopsis, Grapevine, Brachypodium)
Yeast, Fly, Mosquito, Plasmodium falciparum

13 of 51 CNV in human
Structural variants track

14 of 51 Phenotype Data
Genome wide association data
159 annotations from EGA from NHGRI

15 of 51

16 of 51 Somatic Variations: COSMIC

17 of 51 Population Data in Ensembl

18 of 51 Population Data Variation tab: Population genetics

19 of 51 Variation Tab
- Flanking sequence
- Population genetics and LD plots
- Disease relationships (human)
- EGA, GWAS, HapMap, Clinical/LSDB
- Ancestral alleles

20 of 51 Variation Views
- View variations drawn on the sequence
  Gene tab: Sequence link; Transcript tab: Exons, cDNA, protein links
- View a table of variations for each transcript
  Gene tab: Variation Table
- View variations drawn along a transcript
  Gene tab: Variation Image

21 of 51 Comparison Views Human, Mouse, Rat, Dog and Cow have individual or strain comparisons: Comparison Image link at the left of the Transcript tab.

22 of 51 SNP Effect Calculator
- Click on Manage your data at the left of any page.
- Follow the link to “SNP Effect Predictor”.
- Paste in variation positions and alleles.

23 of 51 SNP Effect Calculator
Location, variation name in Ensembl, and consequence on the amino acid sequence are returned.
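To make "consequence on the amino acid sequence" concrete, here is a minimal Python sketch that compares a reference codon with a variant codon using the standard genetic code. It only illustrates the idea and does not reproduce the Ensembl tool. The function and variable names are hypothetical, and the consequence labels are borrowed from the Ensembl consequence table earlier in the talk.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_consequence(ref_codon: str, alt_codon: str) -> str:
    """Label a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "non-synonymous"


if __name__ == "__main__":
    for ref, alt in (("GAG", "GTG"), ("CTT", "CTC"), ("CGA", "TGA"), ("TAA", "CAA")):
        print(ref, "->", alt, codon_consequence(ref, alt))
```

Frameshifts and splice-site effects depend on position and length, not on a single codon, so they are outside the scope of this sketch.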

24 of 51 Ensembl Variation
- SNPs and InDels
- Larger structural variants (CNVs)
- Phenotype data
- Individual genomes (human)
- HapMap variations and genotypes
- Locus Specific Databases
- LRGs

25 of 51 Sequencing Individuals
- Venter and Watson genomes
- 1000 genomes project
- HapMap

26 of 51 First diploid genomes for human
- “The Diploid Genome Sequence of an Individual Human” PLoS Biology 5: (2007)
- “The Complete Genome of an Individual by Massively Parallel DNA Sequencing” Nature 452: (2008)
- “Accurate Whole Human Genome Sequencing Using Reversible Terminator Chemistry” Nature 456:53-59 (2008)
- “The Diploid Genome Sequence of an Asian Individual” Nature 456:60-65 (2008)
Craig Venter: Sequence & analysis ongoing since 2003
Jim Watson: 454 technology (7.4x), 100 million unpaired reads (25 billion bps), $1,000,000

27 of 51 Reference Sequence
The Human Genome Project gave the “average” DNA sequence of a small number of people.
- This helps us find out how a human develops and works
- Does not show us the DNA differences between different humans
- Does not reflect the major alleles

28 of 51 1000 Genomes Project
1000 genomes track in Region in Detail

29 of 51 HapMap
- A multi-country effort to identify and catalogue genetic similarities and differences in people.
- Collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States.
- All of the information generated by the project is released into the public domain.

30 of 51 HapMap (phase III)
Genotypes from 1115 individuals from 11 populations:
ASW: African ancestry in Southwest USA (71)
CEU: Utah residents with Northern and Western European ancestry from the CEPH collection (162)
CHB: Han Chinese in Beijing, China (70)
CHD: Chinese in Metropolitan Denver, Colorado (70)
GIH: Gujarati Indians in Houston, Texas (83)
JPT: Japanese in Tokyo, Japan (82)
LWK: Luhya in Webuye, Kenya (83)
MEX: Mexican ancestry in Los Angeles, California (71)
MKK: Maasai in Kinyawa, Kenya (171)
TSI: Toscani in Italia (77)
YRI: Yoruba in Ibadan, Nigeria (163)

31 of 51 Haplotyping A haplotype is a set of SNPs (on average ~25 kb) found to be statistically associated on a single chromatid and which therefore tend to be inherited together over time. Haplotyping involves grouping subjects by haplotypes.
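The slide does not say which statistic is meant by "statistically associated". The usual pairwise measures of linkage disequilibrium are D, D' and r², and the Python sketch below computes them for two biallelic SNPs on the assumption that these are the relevant quantities. The function name and the example frequencies are illustrative, not HapMap data.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Made-up frequencies: the two alleles always travel together.
    print(ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5))
    # Made-up frequencies: the two SNPs are independent.
    print(ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Values of D' and r² near 1 indicate SNPs that tend to be inherited together. Values near 0 indicate independence.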

32 of 51 Locus specific databases (LSDB)
Databases that focus on one gene or one disease
- e.g. p53, ABO, collagen
- e.g. albinism, cystic fibrosis, Alzheimer’s disease
User communities:
- Research groups: disease and function driven
- Clinicians: driven by genetic testing of patients

33 of 51 LSDBs
>1000 on the Human Genome Variation Society website

34 of 51 LSDB examples

35 of 51 Why is it difficult to merge these data?
Historical reasons. LSDBs sometimes:
- Use sequences which do not start at methionine
- Use transcript coordinates, not genomic
- Use a different transcript for reporting mutations
The reference sequence:
- Regularly changes with new assemblies/gene builds
- May contain minor alleles or rare alleles
- May be inaccurate
- Is missing genes (e.g. no α-haemoglobin: thalassaemia)
- Is a mixture of sequences from different individuals
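One of the obstacles listed above is that mutations reported in transcript coordinates must be converted before they can be placed on a genome. A minimal Python sketch of that conversion follows. The function name and the toy exon coordinates are hypothetical, and only a forward-strand transcript is handled.

```python
from typing import List, Tuple


def transcript_to_genomic(tx_pos: int, exons: List[Tuple[int, int]]) -> int:
    """Map a 1-based transcript position to a 1-based genomic position.

    exons: forward-strand exons as sorted (start, end) genomic pairs, inclusive.
    """
    if tx_pos < 1:
        raise ValueError("transcript positions are 1-based")
    remaining = tx_pos
    for start, end in exons:
        exon_length = end - start + 1
        if remaining <= exon_length:
            return start + remaining - 1
        remaining -= exon_length  # skip this exon and the intron after it
    raise ValueError("position lies beyond the end of the transcript")


if __name__ == "__main__":
    toy_exons = [(100, 109), (200, 209)]  # two 10 bp exons, made-up coordinates
    for tx_pos in (1, 10, 11, 20):
        print(tx_pos, "->", transcript_to_genomic(tx_pos, toy_exons))
```

The result depends entirely on which transcript and which assembly supply the exon coordinates, which is why a report against a different transcript or an older assembly cannot be merged directly.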

36 of 51 Ensembl and LRGs
- Define an exchange format for LRGs with the NCBI
- Create an LRG website
- Create a pipeline for receiving the data and creating an LRG
- Extend the Ensembl (e!) databases to store LRGs
- Develop an API to query LRGs and associated annotation
- Consult with the LSDBs to develop useful visualisation tools
- Build displays for LRG data and annotation

37 of 51 EGA: Repository for genotype data

38 of 51 Sequences Differing from the Reference
- Common coordinate system for reporting mutations and variation data (stable sequence)
- Locus Reference Genomic (LRG)
- Ensembl displays LRGs
- Project in collaboration with the NCBI and GEN2PHEN
- Extension of the RefSeq gene project
View and Request LRGs here:

39 of 51 Locus Reference Genomic
LRG = Genomic sequence for reporting mutations (containing transcript)
* Often differs from the reference assembly

40 of 51 LRGs in the Browser
LRG transcripts and underlying sequence can be viewed.
LRG_13
All LRGs

41 of 51 Variations Team
Fiona Cunningham, Pontus Larsson, Will McLaren, Graham Ritchie

42 of 51 Functional Genomics
(Wikipedia): Functional genomics is a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions.
In Ensembl:
- Regulatory build using ENCODE project information
- Promoters and Enhancers from CisRED and VISTA
- FlyReg features (for Drosophila)

43 of 51 ENCODE
Encyclopedia Of DNA Elements
Where are the promoter, enhancer, and other regulatory regions of the human genome?
Pilot project showed: Use chromatin accessibility and histone modification analysis to predict TSS
14 June 2007, Nature

44 of 51 Regulatory Build
- CTCF-binding sites
- DNase I hypersensitive sites
- TF binding sites
These are “core features”.
Overlapping methylation sites expand these regions.
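The idea that overlapping sites expand a core feature can be pictured as a simple interval operation. The Python sketch below is a toy illustration under that reading of the slide, not the actual Regulatory Build procedure. The function name, the single-pass expansion rule and the coordinates are all assumptions.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive


def expand_core_feature(core: Interval, supporting: Iterable[Interval]) -> Interval:
    """Grow a core feature to cover every supporting site that overlaps it.

    Only sites overlapping the original core are considered (single pass).
    """
    core_start, core_end = core
    new_start, new_end = core_start, core_end
    for site_start, site_end in supporting:
        overlaps_core = site_start <= core_end and site_end >= core_start
        if overlaps_core:
            new_start = min(new_start, site_start)
            new_end = max(new_end, site_end)
    return new_start, new_end


if __name__ == "__main__":
    core = (100, 200)                                # e.g. a TF binding site
    sites = [(150, 250), (50, 120), (300, 400)]      # made-up overlapping marks
    print(expand_core_feature(core, sites))
```

Sites that do not touch the core, such as the (300, 400) interval in the example, leave the region unchanged.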

45 of 51 The Regulation Tab

46 of 51 How to get there?

47 of 51 The Location Tab

48 of 51 BioMart

49 of 51 There are other sets…
- Sequence motifs determined by experimental and prediction tools.
- VISTA Enhancer Set: tissue-specific enhancers. Tested experimentally. Nucleic Acids Res January; 35(Database issue): D88–D92.

50 of 51 Gene Regulation Summary
Homo sapiens:
- DNase I hypersensitivity, CTCF binding sites, TF binding sites (core features)
- Histone modification data
- MeDIP-chip methylation data for 17 human tissues and cell lines
- VISTA Enhancer Assay
- cisRED motifs
- miRanda microRNA target prediction
- Expression Quantitative Trait Loci (eQTL) from the Sanger Institute
Mus musculus:
- DNase I hypersensitivity sites (ES cells)
- Histone modifications for ES, MEF, and NPC cells
- cisRED motifs
Danio rerio:
- ZFMODELS-enhancers
Drosophila melanogaster:
- REDfly TFBSs
- BioTIFFIN
- REDfly CRMs

51 of 51 Functional Genomics
eFG: Ian Dunham, Nathan Johnson, Daniel Sobral, Andy Yates
ENCODE: Steven Wilder, Damian Keefe