1
Resolving membership in a study in shared aggregate genetics data
David W. Craig, Ph.D.
Investigator & Associate Director, Neurogenomics Division
dcraig@tgen.org
2
Genome-wide Association Studies
[Figure: Nature Reviews Genetics]
Genome-wide association studies (GWAS) genotype millions of single nucleotide polymorphisms (SNPs) across thousands of individuals.
SNPs are typically biallelic, and because humans are diploid, each person carries one of three genotypes at a SNP: CC/CT/TT, encoded as 00/01/11.
Because ancestral meiotic recombination has only partly broken up haplotypes, SNPs are not independent of neighboring variants; they are often in linkage disequilibrium (LD).
LD means that a SNP may be associated with disease because it is correlated with a different, functional variant.
Summary statistics for one SNP across hundreds or thousands of individuals: 33% C / 67% T in cases and 45% C / 55% T in controls; $P = 10^{-8}$; CC = 508 / CT = 250 / TT = 108; OR = 1.8.
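A minimal Python sketch of how per-SNP summary statistics like these are derived from genotype counts. The function names and the example counts and frequencies are hypothetical (they are not the slide's numbers), and the odds ratio shown is the allelic odds ratio, which is only one of several ways an OR can be reported.

```python
def allele_frequency(n_cc: int, n_ct: int, n_tt: int) -> float:
    """Frequency of the C allele from diploid genotype counts.

    Each person carries two alleles: CC contributes two C alleles, CT one, TT none.
    """
    total_alleles = 2 * (n_cc + n_ct + n_tt)
    return (2 * n_cc + n_ct) / total_alleles


def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of the allele in cases divided by its odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# Hypothetical counts: CC = 100, CT = 200, TT = 200 -> 400 C alleles out of 1000
print(allele_frequency(100, 200, 200))           # 0.4
print(round(allelic_odds_ratio(0.40, 0.30), 2))  # 1.56
```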
3
Resolving Identity from aggregate genetics data
GWAS are expensive, requiring genotyping of thousands of individuals, and often require consortiums of consortiums. Sharing individual-level data was, and is, a challenge; sharing summary-level data (meta-data) is a reasonable option.
In 2007, summary allele frequencies and genotype counts for all SNPs were routinely posted on the web.
In 2008, after broad deliberation with the scientific community, we published a forensics paper showing that individuals could still be resolved even when only crude estimates of allele frequency were available.
We use the term "resolve" deliberately: "identify" has multiple meanings, particularly in GWAS.
4
Example Aggregate Data
SNP          % A allele, ~500 cases   % A allele, ~500 controls
rs903252     25%                      26%
rs232323     15%                      15%
rs323555     29%                      29%
rs232343     73%                      75%
rs233432     21%                      22%
rs234312     5.1%                     5.1%
rs163232     3.1%                     2.8%
rs8392731    15%                      16%
rs238764     7.3%                     7.1%
rs383745     45%                      54%
Other aggregate SNP data types: genotypes, odds ratios, p-values, etc.
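As a sketch of where a table like this comes from, the snippet below (assuming NumPy, with hypothetical toy genotypes and SNP names) collapses an individuals × SNPs matrix of A-allele counts into one percentage per SNP for cases and one for controls. Only these column summaries would be shared; the individual rows they were computed from are not.

```python
import numpy as np


def percent_a_allele(a_counts: np.ndarray) -> np.ndarray:
    """Per-SNP percentage of A alleles.

    a_counts is an individuals x SNPs matrix holding each person's number of
    A alleles (0, 1 or 2); every person contributes two alleles per SNP.
    """
    n_people = a_counts.shape[0]
    return 100.0 * a_counts.sum(axis=0) / (2 * n_people)


# Hypothetical toy data: 4 cases and 4 controls genotyped at 3 SNPs.
cases = np.array([[2, 0, 1], [1, 0, 1], [0, 1, 1], [1, 0, 2]])
controls = np.array([[1, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 2]])

# Only these per-SNP columns would be released, not the rows they came from.
for snp, pct_cases, pct_controls in zip(
    ["snp_1", "snp_2", "snp_3"], percent_a_allele(cases), percent_a_allele(controls)
):
    print(f"{snp}  {pct_cases:5.1f}%  {pct_controls:5.1f}%")
```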
5
Visual example (SNP data as visualized)
[Figure: one individual's SNP genotypes shown as an image of 250,000 pixels; intensity AA = 1.0, AB = 0.5, BB = 0]
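A minimal sketch of this encoding, assuming NumPy. The slide does not say how SNPs were arranged in the image, so one pixel per SNP laid out in row-major order on a square 500 × 500 grid (500 × 500 = 250,000) is an assumption, as is the helper name `genotypes_to_image`.

```python
import numpy as np

# Pixel intensity per genotype, following the slide: AA = 1.0, AB = 0.5, BB = 0.
INTENSITY = {"AA": 1.0, "AB": 0.5, "BB": 0.0}


def genotypes_to_image(genotypes: list[str], side: int) -> np.ndarray:
    """Lay genotype intensities out row by row as a side x side grayscale image."""
    if len(genotypes) != side * side:
        raise ValueError(f"expected {side * side} genotypes, got {len(genotypes)}")
    values = np.array([INTENSITY[g] for g in genotypes], dtype=float)
    return values.reshape(side, side)


tiny = genotypes_to_image(["AA", "AB", "BB", "AB"], side=2)
print(tiny)

# 250,000 SNPs fill a 500 x 500 image
full = genotypes_to_image(["AB"] * 250_000, side=500)
print(full.shape)  # (500, 500)
```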
6
Merge 96 independent data images equally
7
After merging, individual images are still resolvable
[Figure panels: No adjustment | Auto contrast & smooth filter]
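The slides make this point visually. The simulation below is a numeric stand-in, not the auto-contrast and smoothing procedure itself: it merges 96 synthetic genotype images by pixel-wise averaging and then asks whether one contributor can still be picked out by comparing the person against an independent reference and against the merged image. Everything here is an assumption for illustration: synthetic individuals with independent SNPs (no LD), uniformly drawn allele frequencies, a 200 × 200 image instead of 500 × 500 to keep it fast, and hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

SIDE = 200          # hypothetical 200 x 200 image = 40,000 SNPs (the slides used 250,000)
N_SNPS = SIDE * SIDE
N_MERGED = 96       # number of images merged, as on the slide
# Assumed population allele frequencies, drawn independently per SNP (no LD).
pop_freq = rng.uniform(0.05, 0.95, N_SNPS)


def sample_images(n: int) -> np.ndarray:
    """Synthetic individuals: A-allele count / 2 gives AA = 1.0, AB = 0.5, BB = 0."""
    return rng.binomial(2, pop_freq, size=(n, N_SNPS)) / 2.0


def merge_equally(images: np.ndarray) -> np.ndarray:
    """Merge images with equal weight: the pixel-wise mean across individuals."""
    return images.mean(axis=0)


def membership_t(person: np.ndarray, reference: np.ndarray, mixture: np.ndarray) -> float:
    """Mean of |Y - Ref| - |Y - Mix| over pixels, divided by its standard error."""
    d = np.abs(person - reference) - np.abs(person - mixture)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))


members = sample_images(N_MERGED)
mixture = merge_equally(members)                    # the merged image that gets shared
reference = merge_equally(sample_images(N_MERGED))  # other people from the same population
outsider = sample_images(1)[0]                      # someone not in the mixture

t_member = membership_t(members[0], reference, mixture)
t_outsider = membership_t(outsider, reference, mixture)
print(f"member T = {t_member:.1f}, non-member T = {t_outsider:.1f}")
```

With these settings the contributor's statistic should land well above the non-member's, which stays near zero; the exact values depend on the random seed and image size.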
8
Conceptual Approach
SNP        Reference data set   Data set in question   Person of interest   Directional score
rs903252   25%                  35%                    100%                 +10
rs232323   15%                  13%                    50%                  -2
rs323555   29%                  39%                    100%                 +10
rs232343   73%                  51%                    0%                   +22
rs233432   21%                  32%                    100%                 +11
rs234312   5%                   15%                    50%                  +10
rs163232   3%                   0%                     0%                   +3
...        ...                  ...                    ...                  ...
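One formula that reproduces every directional score on this slide is $D = |Y - \text{Ref}| - |Y - \text{Mix}|$, where $Y$ is the person's genotype in the earlier encoding (AA = 100%, AB = 50%, BB = 0%), Ref is the reference data set's frequency, and Mix is the frequency in the data set in question. A positive score means the person's genotype sits closer to the data set in question than to the reference. The sketch below checks this against the slide's rows; the function name is hypothetical.

```python
def directional_score(person: float, reference: float, mixture: float) -> float:
    """|Y - Ref| - |Y - Mix|: positive when the person's genotype is closer to the
    data set in question than to the reference data set."""
    return abs(person - reference) - abs(person - mixture)


# Rows from the slide, in percent: (reference, data set in question, person of interest).
# Person of interest uses the earlier encoding: AA = 100%, AB = 50%, BB = 0%.
ROWS = {
    "rs903252": (25, 35, 100),
    "rs232323": (15, 13, 50),
    "rs323555": (29, 39, 100),
    "rs232343": (73, 51, 0),
    "rs233432": (21, 32, 100),
    "rs234312": (5, 15, 50),
    "rs163232": (3, 0, 0),
}

for snp, (ref, mix, person) in ROWS.items():
    print(snp, f"{directional_score(person, ref, mix):+g}")
```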
9
Equations (one approach of many)
With $\bar{D}$ the mean directional score over the $s$ SNPs and $\mathrm{sd}(D)$ their sample standard deviation:
$$\bar{D} = 9.1, \qquad \mathrm{sd}(D) = 7.4, \qquad s = 7$$
$$T = \frac{\bar{D}}{\mathrm{sd}(D)/\sqrt{s}} = \frac{9.1}{7.4/\sqrt{7}} \approx 3.2$$
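The arithmetic can be checked with Python's standard library, assuming sd(D) is the sample standard deviation (that is the choice that reproduces the slide's 7.4). The helper name `t_statistic` is hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic(scores: list[float]) -> float:
    """T = mean(D) / (sd(D) / sqrt(s)), with sd the sample standard deviation over s SNPs."""
    s = len(scores)
    return mean(scores) / (stdev(scores) / math.sqrt(s))


# Directional scores of the seven SNPs shown on the slide
scores = [10, -2, 10, 22, 11, 10, 3]
print(f"mean D = {mean(scores):.1f}, sd(D) = {stdev(scores):.1f}, T = {t_statistic(scores):.1f}")
# mean D = 9.1, sd(D) = 7.4, T = 3.2
```

Computed from the unrounded mean (64/7) and standard deviation, T is about 3.25, which rounds to the slide's 3.2.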
10
Resolving Individuals in Aggregate Data Sets
11
Results on pooled samples
12
Impact
NIH policy was changed: summary-level data are no longer freely available on the web in an openly distributed, unrestricted manner.
Additional papers refined the math and described the method's limitations.
13
Managing Risk
Distributing results of studies on human subjects inherently increases the risk that a person can be identified. Context is important.
Positive predictive value (PPV) can provide a measure of this risk, and PPV can also account for "at-risk" populations.
We are currently working with NIH on guidance for measuring the risk posed by a given dataset.
The approaches leveraged a critical concept, directionality, which is specific to genotype data and frequency tables. P-values are a fundamentally different data type with low information content.
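A minimal sketch of the PPV idea using Bayes' rule. Treating an "at-risk" population as one in which a larger fraction of the people being tested actually took part in the study (a higher prior) is our reading of the slide, and the sensitivity, false-positive rate, and priors below are hypothetical.

```python
def positive_predictive_value(sensitivity: float, false_positive_rate: float, prior: float) -> float:
    """Probability that a person flagged as a study member really is one (Bayes' rule).

    prior is the fraction of people being tested who truly are study participants.
    """
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: detects 90% of members, falsely flags 0.1% of non-members.
general = positive_predictive_value(0.90, 0.001, prior=0.001)  # 1 in 1,000 tested people took part
at_risk = positive_predictive_value(0.90, 0.001, prior=0.10)   # 1 in 10 tested people took part
print(f"{general:.2f} {at_risk:.2f}")  # 0.47 0.99
```

The same test goes from roughly a coin flip to near certainty purely because of who is being tested, which is one way to see why context matters.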
14
A new era
15
The era of whole-genome sequencing is approaching.
SNPs are common variants, usually defined as having a minor allele frequency greater than 1%.
Whole-genome and exome sequencing inherently measure rare variants as well.
Rare variants can be highly informative, particularly in combination.
Approaches for summarizing results without revealing identity need to be explored.
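A back-of-the-envelope sketch of why rare variants are informative in combination, assuming variants are carried independently. The carrier frequency of 0.1%, the population of about 8 billion, and the helper names are hypothetical.

```python
import math


def random_match_probability(carrier_freqs: list[float]) -> float:
    """Chance that an unrelated person carries every listed variant, assuming independence."""
    return math.prod(carrier_freqs)


def expected_coincidental_matches(carrier_freqs: list[float], population: int) -> float:
    """Expected number of people in a population who carry the same combination by chance."""
    return population * random_match_probability(carrier_freqs)


POPULATION = 8_000_000_000  # rough world population (approximate)
CARRIER_FREQ = 0.001        # hypothetical: each variant carried by 1 in 1,000 people

for k in range(1, 4):
    freqs = [CARRIER_FREQ] * k
    bits = -math.log2(random_match_probability(freqs))
    matches = expected_coincidental_matches(freqs, POPULATION)
    print(f"{k} variant(s): {bits:.1f} bits, ~{matches:,.0f} expected chance matches")
```

Under these assumptions each such variant adds about 10 bits, so three of them already narrow 8 billion people down to an expected handful of chance matches.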
16
Acknowledgements
Lab: Jennifer Dinh; Szabolcs Szelinger; Holly Benson; Meredith Sanchez-Castillo; Brooke Hjelm
Informatics: Nils Homer, Ph.D.; Tyler Izatt; Jessica Aldrich; Alexis Christoforides; Ahmet Kurdoglu; James Long; Shripad Sinari
Funding: NINDS U24NS051872; State of Arizona; NHGRI U01HG005210
This work: ENDGAME (NHLBI U01 HL086528)
17
Thank you