Lessons learnt from the 1000 Genomes Project about sequencing in populations
Gil McVean, Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford

Some questions: What has the 1000 Genomes Project told us about how to sequence (in) populations? What has the 1000 Genomes Project told us about populations?

Samples for the 1000 Genomes Project. Major population groups, each made up of subpopulations of c. 100 samples: GBR, FIN, TSI, IBS, CEU, JPT, CHB, CHS, CDX, KHV, GWB, GHN, YRI, MAB, LWK, MXL, CLM, ASW, AJM, ACB, PEL, PUR. Samples from South Asia.

The role of the 1000G Project in medical genetics: a catalogue of variants (95% of variants at ≥1% frequency in the populations of interest); a representation of ‘normal’ variation; a set of haplotypes for imputation into GWAS; a training ground for sequencing, statistical and computational technologies.

Samples for the 1000 Genomes Project: Pilot. TSI*, CEU, JPT, CHB, CHS*, YRI, LWK* (*exon pilot only)

Population-scale genome sequencing [figure: haplotypes; 2x and 10x coverage]

What has the project generated?

>15 million SNPs, >50% of them novel; dbSNP entries increased by 70%

A huge increase in the set of known structural variants

A robust and modular pipeline for analysis of population-scale sequence data

An efficient format for storing aligned reads and a set of tools to manipulate and view the files: the SAM/BAM format for storing (aligned) reads (Bioinformatics, 2009)
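
For orientation, a SAM text line carries 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields; BAM is the compressed binary form of the same records. The sketch below parses one SAM line in plain Python to show that layout. The names `SamRecord`, `parse_sam_line` and `is_unmapped` are hypothetical illustrations, not part of samtools or any library, and real work would use samtools or an established binding rather than this.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory fields of a SAM alignment line, plus raw optional tags."""
    qname: str    # query (read) name
    flag: int     # bitwise FLAG
    rname: str    # reference sequence name
    pos: int      # 1-based leftmost mapping position
    mapq: int     # mapping quality
    cigar: str    # CIGAR alignment string
    rnext: str    # reference name of the mate/next read
    pnext: int    # position of the mate/next read
    tlen: int     # observed template length
    seq: str      # read sequence
    qual: str     # base qualities, Phred+33 encoded
    tags: tuple   # optional TAG:TYPE:VALUE fields, kept as raw strings


def parse_sam_line(line: str) -> SamRecord:
    """Split one non-header SAM line into its fields (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10], tags=tuple(fields[11:]),
    )


def is_unmapped(rec: SamRecord) -> bool:
    """FLAG bit 0x4 marks an unmapped segment."""
    return bool(rec.flag & 0x4)
```

For example, `parse_sam_line("read1\t99\t20\t60000\t60\t10M\t=\t60150\t160\tACGTACGTAC\tIIIIIIIIII\tNM:i:0")` yields a mapped, first-in-pair record at position 60000 on chromosome 20 (the read itself is made up).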

An information-rich format for storing generic haplotype/genotype data and tools for manipulating the files
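
The format described here is, to our understanding, the Variant Call Format (VCF), which grew out of the project. In a VCF data line, columns 1 to 8 describe the site, column 9 lists the per-sample keys (e.g. `GT:GQ`), and each later column holds one sample's values; a `|` in GT marks a phased genotype (a haplotype assignment), `/` an unphased one. The sketch below reads genotypes from one line; `parse_gt`, `parse_vcf_record` and `alt_allele_count` are hypothetical helpers, and multi-line headers, INFO parsing and validation are deliberately left out.

```python
def parse_gt(gt):
    """Parse a VCF GT value such as '0|1', '1/1' or './.'.

    Returns (alleles, phased); missing alleles ('.') become None.
    """
    phased = "|" in gt
    alleles = tuple(None if a == "." else int(a)
                    for a in gt.replace("|", "/").split("/"))
    return alleles, phased


def parse_vcf_record(line, sample_names):
    """Parse one tab-separated VCF data line into a small dict (sketch only)."""
    fields = line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")          # e.g. ["GT", "GQ"]
    genotypes = {}
    for name, raw in zip(sample_names, fields[9:]):
        values = dict(zip(fmt_keys, raw.split(":")))
        genotypes[name] = parse_gt(values.get("GT", "."))
    return {
        "chrom": fields[0], "pos": int(fields[1]), "id": fields[2],
        "ref": fields[3], "alt": fields[4].split(","),
        "genotypes": genotypes,
    }


def alt_allele_count(record):
    """Count non-reference, non-missing alleles across all samples."""
    return sum(1 for alleles, _ in record["genotypes"].values()
               for a in alleles if a is not None and a > 0)
```

On the example site `20  14370  rs6054257  G  A  ...  GT:GQ  0|0:48  1|0:48  1/1:43`, the second sample comes back as the phased genotype `(1, 0)` and the third as the unphased `(1, 1)`.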

An understanding of the ‘rare functional variant load’ carried by individuals: c. 250 loss-of-function (LOF) variants per person and c. 75 HGMD ‘disease-causing mutation’ (DM) variants per person

USH2A: mutations cause Usher syndrome. 66 missense variants in dbSNP; 2/3 detected in the 1000 Genomes Pilot. One HGMD ‘disease-causing’ variant is homozygous in 3 YRI individuals; other reports indicate that it is not a genuine disease-causing variant.

Samples for the 1000 Genomes Project: Phase 1. GBR, FIN, TSI, CEU, JPT, CHB, CHS, YRI, LWK, MXL, CLM, ASW, PUR

Lessons learnt about sequencing in populations

Lesson 1. The low-coverage model works for variant discovery

A near-complete record of common variants (CEU)
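
Why low coverage per person can still find common variants becomes clearer with a deliberately simplified model. Assume each diploid sample gets Poisson-distributed depth with mean `depth`, split evenly between its two chromosomes; ignore sequencing and mapping error; and call a site discovered when at least `min_alt_reads` alternate-allele reads are seen summed across all samples. The number of carrier chromosomes is binomial in the sample. These parameters and the discovery rule are illustrative assumptions, not the project's actual calling criteria, and `discovery_power` is a hypothetical name.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def discovery_power(n_samples: int, freq: float, depth: float,
                    min_alt_reads: int) -> float:
    """Probability of seeing >= min_alt_reads alt reads pooled over all samples.

    Simplified model (assumption): m carrier chromosomes ~ Binomial(2n, freq),
    each chromosome yields Poisson(depth / 2) reads, so pooled alt reads given
    m are Poisson(m * depth / 2). No sequencing or mapping errors.
    """
    n_chrom = 2 * n_samples
    power = 0.0
    for m in range(n_chrom + 1):
        p_m = math.comb(n_chrom, m) * freq ** m * (1 - freq) ** (n_chrom - m)
        power += p_m * poisson_sf(min_alt_reads, m * depth / 2)
    return power
```

Comparing, say, `discovery_power(100, 0.05, 4.0, 2)` with `discovery_power(100, 0.01, 4.0, 2)` shows that under this model the main limit for rarer variants is whether any carrier is sampled at all, not the depth per person.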

Lesson 2. The low-coverage model works for SNP genotyping

A set of accurate genotypes/haplotypes (CEU)
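
Genotyping from low-coverage data generally starts from per-sample genotype likelihoods, which are then refined by sharing information across samples through linkage disequilibrium; the sketch below covers only the first step. It assumes a biallelic site, a single per-base error rate `error` (real callers use base and mapping qualities), and reads that each show either the reference or the alternate allele; `genotype_log_likelihoods` is a hypothetical name. With one alternate read, heterozygous and homozygous-alternate genotypes differ by only about 0.3 log10 units, which is why a single low-coverage sample cannot be genotyped confidently on its own.

```python
import math


def genotype_log_likelihoods(n_ref: int, n_alt: int, error: float = 0.01):
    """log10 P(reads | genotype) for genotypes with 0, 1 and 2 alt copies.

    Assumed model: each read independently shows the alt allele with
    probability (g/2)(1 - error) + (1 - g/2) * error, where g is the alt count.
    The binomial coefficient is omitted because it is the same for all g.
    """
    lls = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        lls.append(n_alt * math.log10(p_alt) + n_ref * math.log10(1 - p_alt))
    return lls
```

For instance, `genotype_log_likelihoods(1, 1)` favours the heterozygote, whereas `genotype_log_likelihoods(0, 1)` leaves heterozygote and homozygote close together.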

Lesson 3. The genome has a large grey area where variant calling is hard
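
One practical way to deal with this grey area is to define an ‘accessible’ portion of the genome and restrict or flag calls that fall outside it; to our knowledge the project released accessibility masks of this kind. The sketch below assumes a mask for one chromosome stored as sorted, non-overlapping BED-style intervals (0-based start, exclusive end) and tests 1-based VCF positions against it with binary search. `AccessibilityMask` is a hypothetical class, not a project tool.

```python
import bisect


class AccessibilityMask:
    """Hypothetical per-chromosome mask of 'callable' regions.

    Intervals are BED-style: 0-based start, exclusive end, non-overlapping.
    """

    def __init__(self, intervals):
        self.intervals = sorted(intervals)
        self.starts = [start for start, _ in self.intervals]

    def contains(self, pos_1based: int) -> bool:
        """True if a 1-based (VCF-style) position lies inside the mask."""
        pos0 = pos_1based - 1                       # convert to 0-based
        i = bisect.bisect_right(self.starts, pos0) - 1
        return i >= 0 and pos0 < self.intervals[i][1]

    def accessible_fraction(self, chrom_length: int) -> float:
        """Fraction of the chromosome covered by the mask."""
        return sum(end - start for start, end in self.intervals) / chrom_length
```

Keeping the coordinate conversion in one place matters because BED and VCF use different conventions (0-based half-open versus 1-based), a classic source of off-by-one errors.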

Lesson 4. Joint calling of different variant types substantially improves the quality of calls
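
Joint calling itself needs a full probabilistic model, which is beyond a short example. What can be shown simply is why SNP and indel calls interact: when they are called separately, misaligned reads around an indel often produce spurious SNPs nearby. The sketch below is a crude, hypothetical post-hoc heuristic, not joint calling. It flags SNPs within `window` bp of any indel on the same chromosome; joint calling aims to resolve such cases inside the model rather than discarding them.

```python
import bisect


def flag_snps_near_indels(snp_positions, indel_positions, window=10):
    """Return SNP positions lying within `window` bp of any indel position.

    Heuristic illustration only; all positions are on one chromosome.
    """
    indels = sorted(indel_positions)
    flagged = []
    for pos in snp_positions:
        # first indel at or after pos - window; flag if it is <= pos + window
        i = bisect.bisect_left(indels, pos - window)
        if i < len(indels) and indels[i] <= pos + window:
            flagged.append(pos)
    return flagged
```

With SNPs at 100, 150 and 500 and indels at 105 and 480, a 10 bp window flags only the SNP at 100; widening it to 20 bp also flags 500.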

Lesson 5. Managing uncertainty is important
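
A common way to carry uncertainty forward instead of forcing hard genotype calls is to combine genotype likelihoods with a population prior, then pass on posterior probabilities or the expected allele dosage (a ‘soft’ genotype) to downstream analyses such as association tests. The sketch assumes a Hardy-Weinberg prior at alternate-allele frequency `alt_freq` and takes log10 likelihoods for 0, 1 and 2 alternate copies as input; `genotype_posteriors` and `expected_dosage` are hypothetical names.

```python
def genotype_posteriors(log10_lik, alt_freq):
    """Posterior P(genotype | reads) for 0, 1, 2 alt copies.

    Assumption: Hardy-Weinberg prior with alternate-allele frequency alt_freq.
    """
    q = alt_freq
    priors = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    top = max(log10_lik)  # shift before exponentiating to avoid underflow
    unnorm = [p * 10 ** (ll - top) for p, ll in zip(priors, log10_lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def expected_dosage(posteriors):
    """Expected number of alt alleles, between 0 and 2."""
    return posteriors[1] + 2 * posteriors[2]
```

When the reads carry no information (equal likelihoods), the dosage falls back to the prior mean of 2 × `alt_freq`; when the likelihoods are decisive, it approaches the hard call, so the same downstream code handles both cases.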

Lesson 6. Data visualisation is key

Lessons learnt about populations

Closely related populations can have substantially different rare variants
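
A claim like this can be examined by stratifying variants by their allele count in one population and asking how often each class is also observed in another. The sketch assumes per-variant allele counts for two populations aligned on the same list of sites; `sharing_by_count` is a hypothetical name, and the counts in the usage line are invented for illustration, not project data.

```python
from collections import defaultdict


def sharing_by_count(counts_a, counts_b, max_count=5):
    """For variants seen 1..max_count times in population A, return the
    fraction also observed (count > 0) in population B, keyed by A's count."""
    seen = defaultdict(int)
    shared = defaultdict(int)
    for a, b in zip(counts_a, counts_b):
        if 1 <= a <= max_count:
            seen[a] += 1
            if b > 0:
                shared[a] += 1
    return {k: shared[k] / seen[k] for k in sorted(seen)}
```

With made-up counts `sharing_by_count([1, 1, 1, 1, 2, 2, 5], [0, 0, 0, 1, 1, 0, 3])` the result is `{1: 0.25, 2: 0.5, 5: 1.0}`: low-count classes are shared less often, which is the pattern to look for in real data.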

Spatial heterogeneity in non-genetic risk can differentially confound association studies for rare and common variants (Iain Mathieson)

Thanks to the many...
Steering Committee: co-chairs Richard Durbin and David Altshuler
Samples and ELSI Committee: co-chairs Aravinda Chakravarti and Leena Peltonen
Data Production Group: co-chairs Elaine Mardis and Stacey Gabriel
Analysis Group: co-chairs Gil McVean and Gonçalo Abecasis; subgroups in gene-targeted sequencing (Richard Gibbs) and population genetics (Molly Przeworski)
Structural Variation Group: co-chairs Matt Hurles, Charles Lee and Evan Eichler
DCC: co-chairs Paul Flicek and Steve Sherry