1000G Phase 1 Release chr20 call sets
Ryan Poplin
Genome Sequencing and Analysis
Medical and Population Genetics
January 25, 2011
Data and Definitions -- Pipeline
Full indel cleaning process, including known indels
BAQ calculation using GATK implementation of H. Li
Called by main continental AP and by admixed+ AP
Variant quality score recalibration
Quality cut chosen using HapMap3.3 + Omni 2.5M chip sensitivity
Cut at 99.2% of accessible sites
Genotype refinement not yet done
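The sensitivity-based quality cut can be illustrated with a minimal sketch. It assumes each truth-set site (here standing in for HapMap3.3 + Omni 2.5M sites) carries a recalibrated score where higher means more confident, and it picks the lowest score that still retains the target fraction of those sites. The function name and the toy scores are hypothetical; this is not the GATK implementation.

```python
import math


def score_cut_for_sensitivity(truth_scores, target_sensitivity):
    """Return the lowest score threshold that keeps at least
    `target_sensitivity` of the truth-set sites (score >= threshold).

    Assumption: higher score = more confident call.
    """
    if not truth_scores:
        raise ValueError("truth_scores must not be empty")
    if not 0 < target_sensitivity <= 1:
        raise ValueError("target_sensitivity must be in (0, 1]")

    ranked = sorted(truth_scores, reverse=True)
    # Number of truth sites that must pass the cut; the small epsilon
    # guards against floating-point noise in the product.
    n_keep = math.ceil(target_sensitivity * len(ranked) - 1e-9)
    return ranked[n_keep - 1]


def sensitivity_at(truth_scores, threshold):
    """Fraction of truth-set sites with score >= threshold."""
    kept = sum(1 for s in truth_scores if s >= threshold)
    return kept / len(truth_scores)


# Toy example: 1000 truth sites with hypothetical scores 1..1000.
toy_scores = list(range(1, 1001))
cut = score_cut_for_sensitivity(toy_scores, 0.992)
```

With the toy scores, the cut lands at the score that keeps 992 of the 1000 truth sites; the same threshold would then be applied to all called variants.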
Data and Definitions – 1004 Samples
ASN = CHB + CHS + JPT
ASN+ = CHB + CHS + JPT + MXL + CLM + PUR
EUR = CEU + FIN + GBR + TSI + IBS
EUR+ = CEU + FIN + GBR + TSI + IBS + MXL + CLM + PUR + ASW
AFR = LWK + YRI + ASW
AFR+ = LWK + YRI + ASW + CLM + PUR
AMR = MXL + CLM + PUR
AMR+ = MXL + CLM + PUR + ASW
Note: these definitions differ from the other groups
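The panel definitions above can be written down as a small lookup table, which makes it easy to check which populations each "+" panel adds and which panels a given population contributes to. This is only a sketch: the population codes are taken from the slide, while the variable and function names are hypothetical.

```python
# Analysis panels (AP) as listed on the slide.
PANELS = {
    "ASN":  {"CHB", "CHS", "JPT"},
    "ASN+": {"CHB", "CHS", "JPT", "MXL", "CLM", "PUR"},
    "EUR":  {"CEU", "FIN", "GBR", "TSI", "IBS"},
    "EUR+": {"CEU", "FIN", "GBR", "TSI", "IBS", "MXL", "CLM", "PUR", "ASW"},
    "AFR":  {"LWK", "YRI", "ASW"},
    "AFR+": {"LWK", "YRI", "ASW", "CLM", "PUR"},
    "AMR":  {"MXL", "CLM", "PUR"},
    "AMR+": {"MXL", "CLM", "PUR", "ASW"},
}


def panels_for_population(pop):
    """Sorted names of every panel that includes population `pop`."""
    return sorted(name for name, pops in PANELS.items() if pop in pops)


def added_populations(base):
    """Populations that the '+' panel adds on top of its base panel."""
    return PANELS[base + "+"] - PANELS[base]


print(panels_for_population("ASW"))      # ['AFR', 'AFR+', 'AMR+', 'EUR+']
print(sorted(added_populations("AFR")))  # ['CLM', 'PUR']
```

Note that ASW is already part of the base AFR panel here, so AFR+ adds only CLM and PUR.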
Final chr20 callsets including fragment-based calling and contrastive VQSR clustering

Columns: # samples | Analysis Panel | Total # variants | dbSNP % (129) | # knowns | Known ti/tv | # novels | Novel ti/tv | Novel non-CpG ti/tv

Rows (only fragments of the values survive):
266 ASN 264, , ,
ASN+ 446, , ,
EUR 300, , ,
EUR+ 516, , ,
AFR 475, , ,
AFR+ 529, , ,
AMR 350, , ,
AMR+ 452, , ,
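For reference, the ti/tv columns report the ratio of transitions (A<->G, C<->T) to transversions (all other single-base changes) among the SNPs in a call set. A minimal sketch of that calculation follows, assuming biallelic SNPs given as (ref, alt) base pairs; the function name and the input are illustrative and are not taken from the call sets themselves.

```python
BASES = set("ACGT")
# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) SNPs."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ti / tv


# Illustrative input: three transitions and one transversion.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "T")]
print(titv_ratio(example))  # 3.0
```

Computing the ratio separately for known (dbSNP) and novel sites, as in the table columns, only requires splitting the input list before calling the function.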