Sequencing of the South Asian Genome
Lamri Amel, Postdoctoral fellow

Presentation transcript:


Human genome sequencing projects (I)
- First draft of the Caucasian genome
- Final version
- 1st Korean (East Asian) genome sequenced
- 1st Han Chinese (East Asian) genome sequenced
- 1st Yoruban (African) genome sequenced
- 2010: first 2 Southern African genomes
Remark: all males.

Human genome sequencing projects (II)
What about the other ethnic groups?
- 1000G pilot project: N = 179 Caucasians, Africans and East Asians
- 1000G phase 1: N = 1,092 Caucasians, Africans, East Asians and Native Americans
- First South Asian (Indian) genome sequenced
- 1000G phase 3: N = 2,504 individuals from 26 populations (Caucasians, Africans, East Asians, Native Americans and South Asians (~500))
- 2014: the South Asian genome sequencing (N = 148 and N = 38)
12 years later …


Population sampling: 1000 Genomes (N = 489), Chambers et al. (N = 148), Wong et al. (N = 38). Map labels: UK, USA.

Sequencing and validation
Whole-genome sequencing:
- Wong et al.: N = 38, depth ×30
- Chambers et al.: N = 168, depth × […]
- 1000 Genomes Project: N = 489, depth ×8
The three studies have different sample sizes and different sequencing depths. What is sequencing depth?

Sequencing
DNA: AAATCTGTTCAACCATGCACAGTAATCGATTGACT
DNA sequencing produces overlapping reads (labelled "contigs" on the slide):
TGTTCAACCATGC
AACCATGCACAGTA
CACAGTAATCGAT
TAATCGATTGAC
Reconstruction: TGTTCAACCATGCACAGTAATCGATTGAC
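The reconstruction step on this slide can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the fragments are error-free, already sorted from left to right, and each one overlaps the previous one. The function names `overlap_length` and `merge_in_order` are made up for this sketch; real assemblers are far more involved.

```python
def overlap_length(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_in_order(fragments: list[str]) -> str:
    """Merge fragments that are already sorted by position, left to right."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        k = overlap_length(assembled, fragment)
        assembled += fragment[k:]  # append only the part that is not yet covered
    return assembled


# The four fragments shown on the slide
fragments = [
    "TGTTCAACCATGC",
    "AACCATGCACAGTA",
    "CACAGTAATCGAT",
    "TAATCGATTGAC",
]

reconstruction = merge_in_order(fragments)
print(reconstruction)
```

With the slide's four fragments, the merged string is the reconstruction shown on the slide.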

Sequencing depth: the number of times a base pair is covered by sequencing reads.
- ×4 (low coverage): less precise (sequencing errors); cheaper, so more samples can be genotyped for a fixed budget.
- ×60 (high coverage): genotype accuracy is higher; more expensive, so fewer samples can be genotyped for a fixed budget.
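As a rough illustration of this definition, the sketch below counts how many fragments cover each base of the example sequence from the sequencing slide. It assumes error-free fragments that can be placed by exact string matching, which real aligners do not require; `per_base_depth` is a hypothetical helper name.

```python
def per_base_depth(reference: str, fragments: list[str]) -> list[int]:
    """Count, for every reference position, how many fragments cover it."""
    depth = [0] * len(reference)
    for fragment in fragments:
        start = reference.find(fragment)  # exact match only (simplifying assumption)
        if start == -1:
            continue  # fragment cannot be placed; ignored in this sketch
        for i in range(start, start + len(fragment)):
            depth[i] += 1
    return depth


# Example sequence and fragments from the sequencing slide
reference = "AAATCTGTTCAACCATGCACAGTAATCGATTGACT"
fragments = [
    "TGTTCAACCATGC",
    "AACCATGCACAGTA",
    "CACAGTAATCGAT",
    "TAATCGATTGAC",
]

depth = per_base_depth(reference, fragments)
mean_depth = sum(depth) / len(reference)
print(depth)
print(round(mean_depth, 2))
```

In this toy case no base is covered more than three times and the ends of the sequence are not covered at all, which is why low-depth data leaves some genotypes uncertain or missing.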


Sequencing and validation
- Whole-genome sequencing: Wong et al. ×30, N = 38; Chambers et al. × 4.3 […]; 1000 Genomes Project ×8, N = 489
- Whole-exome sequencing: Wong et al. none; Chambers et al. × […]; 1000 Genomes Project × […]
- High-density genotyping microarray: […]

Whole genome sequencing / whole exome sequencing / genotyping arrays
- Whole genome sequencing: gathers all the genetic information; allows the identification of new variants; more expensive, so usually lower depth and/or fewer samples.
- Whole exome sequencing: loses important genetic information; allows the identification of new exonic variants only; cheaper, so deeper sequencing and/or more samples.
- Genotyping array: loses important genetic information (+++); no identification of new variants; cheaper (+++), so more samples can be genotyped (+++).


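Sequencing and validation
- Whole-genome sequencing: Wong et al. ×30, N = 38; Chambers et al. × […]; 1000 Genomes Project ×8, N = 489
- Whole-exome sequencing: Wong et al. none; Chambers et al. × […]; 1000 Genomes Project × […]
- High-density genotyping microarray: […]
- Sequencing of relatives: Wong et al. none; Chambers et al. none; 1000 Genomes Project ×47, N = 141 (129 trios, 12 duos)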


1000 Genomes Project populations
Population code | Population description | Super population code
CHB | Han Chinese in Beijing, China | EAS
JPT | Japanese in Tokyo, Japan | EAS
CHS | Southern Han Chinese | EAS
CDX | Chinese Dai in Xishuangbanna, China | EAS
KHV | Kinh in Ho Chi Minh City, Vietnam | EAS
CEU | Utah Residents with Northern and Western Ancestry | EUR
TSI | Toscani in Italia | EUR
FIN | Finnish in Finland | EUR
GBR | British in England and Scotland | EUR
IBS | Iberian Population in Spain | EUR
YRI | Yoruba in Ibadan, Nigeria | AFR
LWK | Luhya in Webuye, Kenya | AFR
GWD | Gambian in Western Divisions in the Gambia | AFR
MSL | Mende in Sierra Leone | AFR
ESN | Esan in Nigeria | AFR
ASW | Americans of African Ancestry in SW USA | AFR
ACB | African Caribbeans in Barbados | AFR
MXL | Mexican Ancestry from Los Angeles, USA | AMR
PUR | Puerto Ricans from Puerto Rico | AMR
CLM | Colombians from Medellin, Colombia | AMR
PEL | Peruvians from Lima, Peru | AMR
GIH | Gujarati Indian from Houston, Texas | SAS
PJL | Punjabi from Lahore, Pakistan | SAS
BEB | Bengali from Bangladesh | SAS
STU | Sri Lankan Tamil from the UK | SAS
ITU | Indian Telugu from the UK | SAS

A typical South Asian genome has between 4 and 4.2 million variants. Only 2% of these variants are rare (frequency < 0.5%).

In a typical South Asian genome, nonsynonymous and regulatory variants account for less than 0.5% of total variants (figure labels: […]% and 0.37%).


Shared rare variants: the rarest variants are most commonly shared with other ethnic groups of the same super population.

Population differentiation. Source: A. Auton et al., Nature 526 (2015), doi: /nature15393.

Figure 4. Enrichment for stratified genetic variants at genetic loci associated with respective phenotype in genome-wide association studies. Chambers JC, Abbott J, Zhang W, Turro E, Scott WR, et al. (2014) The South Asian Genome. PLoS ONE 9(8): e doi: /journal.pone

Why would anyone pay $120 million to sequence 2,500 human genomes?
Enthusiastic genetic researcher: "Waaah, look at these amazing 1000G results! Did you know that the average South Asian genome has 4M variants?"
Greedy businessman: "And you wasted $120M just to come up with that table?! Can we make money out of that?"

Whole genome sequencing applications: population history and evolution (demography, migration, admixture, selection)

Demography
- Shared demographic history for all humans beyond 150,000 to 200,000 years ago.
- European, Asian and American populations shared strong and sustained bottlenecks, between 15,[…],000 years ago.
- These bottlenecks were followed by extremely rapid inferred population growth, especially in Bangladesh. Reason?
Figure labels: bottleneck, growth.

Population structure
Estimation of the proportion of each genome derived from several (7) putative "ancestral populations", using a maximum likelihood approach. Each color represents a different ancestor.
Figure labels: east, west; USA and Barbados; UK, USA; North, South.
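To give a feel for what "maximum likelihood" means here, the following toy sketch estimates the ancestry proportion of one individual. It is a deliberately reduced version of the idea, not the method behind the figure: it assumes only 2 ancestral populations (the slide uses 7), allele frequencies that are already known (programs such as ADMIXTURE estimate them jointly with the proportions), independent SNPs, and a simple grid search. All frequencies and genotypes below are invented for illustration.

```python
import math


def log_likelihood(q_a, genotypes, freq_a, freq_b):
    """Log-likelihood of the genotypes when a fraction q_a of the genome comes
    from population A and the rest from population B.
    Each genotype is the number of copies (0, 1 or 2) of a given allele."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        f = q_a * pa + (1.0 - q_a) * pb  # expected allele frequency for this person
        total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


def estimate_ancestry(genotypes, freq_a, freq_b, steps=100):
    """Return the value of q_a on a regular grid that maximises the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freq_a, freq_b))


# Hypothetical allele frequencies at four SNPs in two ancestral populations
freq_a = [0.9, 0.8, 0.7, 0.9]
freq_b = [0.1, 0.2, 0.3, 0.1]

q_like_a = estimate_ancestry([2, 2, 2, 2], freq_a, freq_b)
q_like_b = estimate_ancestry([0, 0, 0, 0], freq_a, freq_b)
q_mixed = estimate_ancestry([1, 1, 1, 1], freq_a, freq_b)
print(q_like_a, q_like_b, q_mixed)
```

Each colored bar in a structure plot corresponds to such an estimated proportion, one per ancestral population and per individual.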


Population structure: South Asians derive from one common ancestor (admixture ignored).

Population structure: Out of Africa human evolution model

Input of genetic studies to historical discoveries in South Asia
- Genetics: Ancestral North Indian, Ancestral South Indian
- Language: Indo-European, Dravidian
- Admixture between the two groups: 1,900 to 4,200 years ago

Input of genetic studies to historical discoveries in South Asia
- From 3,000 years ago to now: striking reduction of gene flow. Admixture was replaced by strong endogamy.
- These observations are supported by written texts that suggest the establishment of the caste system during the same period.

Sources: Am J Hum Genet 2013; Nature 2009; Am J Hum Genet 2011. These studies did not use 1000G sequencing data.

Whole genome sequencing applications: imputation
SNPs: SNP1 SNP2 SNP3 SNP4 SNP5
Reference genome:
A T C G A
G C G C C
Your genotypes:
A ? ? G ?
A T ? G A
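The idea on this slide can be mimicked with a toy example: pick the reference haplotype that agrees best with the observed alleles and copy its alleles into the missing positions. This is only a sketch. It reads the slide's letters as two reference haplotypes (an assumption about the flattened figure), the function name `impute` is hypothetical, and real imputation software relies on statistical models over large reference panels rather than a single best match.

```python
def impute(genotype: str, reference_haplotypes: list[str]) -> str:
    """Fill '?' positions using the reference haplotype that matches the
    observed alleles at the largest number of SNPs."""

    def matches(haplotype: str) -> int:
        return sum(1 for g, h in zip(genotype, haplotype) if g != "?" and g == h)

    best = max(reference_haplotypes, key=matches)
    return "".join(h if g == "?" else g for g, h in zip(genotype, best))


# Reference haplotypes and a partially genotyped sample, as read from the slide
reference_haplotypes = ["ATCGA", "GCGCC"]

imputed = impute("A??G?", reference_haplotypes)
print(imputed)
```

Here the sample carries A at SNP1 and G at SNP4, which matches the first reference haplotype, so the missing alleles at SNP2, SNP3 and SNP5 are taken from it.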

Whole genome sequencing applications: imputation
- Imputation increases the amount of genotype data available in a study genotyped with an array.
- It increases genome coverage and hence the chance of detecting an association signal when performing a GWAS.
- Imputation is common practice in GWAS and uses 1000G data as a reference (thank you, 1000G!).

Take-home message
- The South Asian genome has both unique features and features shared with other genomes.
- The sequencing of human genomes offers an invaluable source of data with huge applications in health and research!

Thank you for your attention

References
- 1000 Genomes Project Consortium, 2015, Nature.
- Chambers et al., 2014, PLoS ONE.
- Wong et al., 2014, PLoS Genet.
- Moorjani et al., 2013, Am J Hum Genet.
- Reich et al., 2009, Nature.