Published by Cecilia Catherine Evans. Modified over 9 years ago.
1
Vanderbilt’s DNA Databank: BioVU
2
Personalized Medicine
Integration of genomic information into clinical decision making
Personalized disease treatment and also preventative therapies
3
What is BioVU?
The move towards personalized medicine requires very large sample sets for discovery and validation
BioVU: biobank intended to support a broad view of biology and enable personalized medicine
Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out
Linked to Synthetic Derivative: de-identified EMR
Current sample number: 135,765
  o 120,705 adult samples
  o 15,099 pediatric samples
4
Patient Communication Modules
5
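[Diagram: the record of an eligible patient (John Doe) passes through a one-way hash to yield a research ID (A7CCF99DE65732….). The scrubbed record and the extracted DNA carry that same ID in place of the name.]
The “synthetic derivative” (SD): can be updated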
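The one-way hash step in the diagram can be sketched as follows. The slide does not name the hash function or say how it is keyed, so the use of HMAC with SHA-512, the `secret_key` value, and the `MRN-0001` identifier are all assumptions made only for illustration.

```python
import hashlib
import hmac

def research_id(patient_identifier: str, secret_key: bytes) -> str:
    """Derive a fixed research ID from a patient identifier.

    Assumption: a keyed SHA-512 digest. The slides only say "one way hash".
    The same input always gives the same ID, but the ID cannot be inverted
    to recover the identifier.
    """
    digest = hmac.new(secret_key, patient_identifier.encode("utf-8"), hashlib.sha512)
    return digest.hexdigest().upper()

# Hypothetical key and identifier, for illustration only.
secret_key = b"demo-key-not-for-production"
record_id = research_id("MRN-0001", secret_key)   # ID attached to the scrubbed record
sample_id = research_id("MRN-0001", secret_key)   # ID attached to the DNA sample
```

Because both the scrubbed record and the DNA sample are labeled with the same derived ID, they can be linked to each other without either one carrying the patient's name, and later updates to the record map to the same ID.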
6
The Synthetic Derivative
A derivative of the EMR: information content reduced by ‘scrubbing’ identifiers
Systematically shifted event dates
Contains ~1.9 million records
  o ~1 million with detailed longitudinal data
  o averaging 100,000 bytes in size
  o an average of 27 codes per record
Records are updated over time and are current through 4/30/11
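A minimal sketch of what systematic date shifting could look like. The slides do not describe the actual mechanism, so the per-record constant offset, the 364-day bound, and the function names below are assumptions. The point of the sketch is that shifting every event in a record by the same amount hides the true dates while preserving the intervals between events.

```python
import random
from datetime import date, timedelta

def record_offset(research_id: str, max_days: int = 364) -> int:
    """Pick one offset per record (assumed scheme), reproducible from its research ID."""
    rng = random.Random(research_id)  # seeding with the ID makes the offset repeatable
    return rng.randint(1, max_days)

def shift_events(events, offset_days: int):
    """Move every (date, label) event back by the same number of days."""
    return [(d - timedelta(days=offset_days), label) for d, label in events]

# Hypothetical record with two events five days apart.
events = [(date(2010, 1, 10), "admission"), (date(2010, 1, 15), "discharge")]
shifted = shift_events(events, record_offset("A7-demo"))
```

Reusing the same offset whenever a record is updated keeps newly added events consistent with those already shifted.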
7
Synthetic Derivative Data Types
Narratives, such as:
  o Clinical Notes
  o Discharge Summaries
  o History and Physicals
  o Problem Lists
  o Surgical Reports
  o Progress Notes
  o Letters
Diagnostic Codes, Procedural Codes
Forms (intake, assessment)
Reports (pathology, ECGs, echocardiograms)
Clinical Communications
Lab Values and Vital Signs
Medication Orders
TraceMaster (ECGs)
8
Synthetic Derivative vs. BioVU
[Diagram: scrubbed records in both resources carry the same hashed ID (A7CDE6532 ….)]
Synthetic Derivative: ~1.9 million
BioVU: ~135,000
9
Sample accrual Current accrual as of 2-13-2012: 135,765 samples 15,099 pediatric
10
BioVU Demographics
[Charts: age, gender, race]
11
BioVU Sample Management RTS SmaRTStore
12
Validation in BioVU
Sample handling algorithms
  o Gender match
  o 1/384 gender mismatches
Ancestry
  o Characterize sample ancestry, assess usefulness of ‘race’ as defined in EMR
  o Provide a panel of ancestry informative markers that define ancestry
  o No significant difference between the concordance of self-report or observer-report with genetic ancestry
Demonstration project (American Journal of Human Genetics, 2010)
  o Can known associations between genetic variants and common diseases be identified in the EMR?
13
The “demonstration project”
Genotype “high-value” SNPs in the first 8,000 samples accrued
  o including SNPs associated by replicated genome-wide experiments with common diseases & traits
    1. Atrial fibrillation
    2. Crohn’s disease
    3. Multiple Sclerosis
    4. Rheumatoid arthritis
    5. Type II Diabetes
Develop Natural Language Processing methods to identify cases and controls
Are genotype-phenotype relations replicated?
14
First results
[Figure: odds ratio for each marker (axis ticks 0.5, 1.0, 2.0, 5.0), grouped by disease: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes]
marker: gene / region
rs2200733: Chr. 4q25
rs10033464: Chr. 4q25
rs11805303: IL23R
rs17234657: Chr. 5
rs1000113: Chr. 5
rs17221417: NOD2
rs2542151: PTPN22
rs3135388: DRB1*1501
rs2104286: IL2RA
rs6897932: IL7RA
rs6457617: Chr. 6
rs6679677: RSBN1
rs2476601: PTPN22
rs4506565: TCF7L2
rs12255372: TCF7L2
rs12243326: TCF7L2
rs10811661: CDKN2B
rs8050136: FTO
rs5219: KCNJ11
rs5215: KCNJ11
rs4402960: IGF2BP2
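For readers unfamiliar with the quantity plotted in this figure, here is a minimal sketch of an odds ratio with a 95% confidence interval computed from a 2x2 table of carriers and non-carriers among cases and controls. The counts are invented for illustration and are not BioVU data. The simple carrier/non-carrier comparison and the Woolf (log) interval are assumptions, since the slides do not state which genetic model or interval method the study used.

```python
import math

def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio and Woolf confidence interval from a 2x2 table (all cells must be > 0)."""
    odds_ratio = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers + 1 / ctrl_carriers + 1 / ctrl_noncarriers
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts: 60 of 200 cases and 40 of 200 controls carry the risk allele.
or_value, ci_low, ci_high = odds_ratio_ci(60, 140, 40, 160)
```

An interval that excludes 1.0 is the usual sign that an association has been replicated in the direction the original study reported.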
16
Types of projects
Discovery or validation of genotype-phenotype relations for disease susceptibility or drug responses
Discovery of new disease/susceptibility genes: resequence in patients (obesity, Cushing's, susceptibility to infection, insomnia, pre-term birth)
Access samples without disease X, or “normals” of specified ancestry, or old normals
Phenome-wide association study (PheWAS): in development
17
Data Use Agreement
18
Genotyping Data Accrual
19
Common Diagnoses in BioVU
20
Examples of ICD-9 codes for rare diseases

Example Rare Disease            Number in SD    Number in BioVU
Microcephalus                   1,070           85
Pica                            115             22
Septicemic Plague               21              0
Pick’s Disease                  45              8
Acromegaly and Gigantism        571             123
Ehlers-Danlos Syndrome          285             34
Narcolepsy without Cataplexy    438             76
Spina Bifida                    1968            238
Stiff-Man Syndrome              82              17
Tourette Syndrome               667             34
Bell’s Palsy                    2534            402
Bulimia Nervosa                 919             88
Cushing’s                       1443            298
Peyronies Disease               694             157
Wilson’s Disease                140             49
Meningioma                      1444            355
Wegener’s                       363             141
39
Not included in SD searches:
  o Bone marrow transplant
  o SCID
Flagged compromised samples:
  o Transfusion within 2 weeks of blood draw
  o Leukemia
  o Myeloma
  o Lymphoma
  o Pre-leukemic states
53
General algorithm for determining EMR phenotype
Iteratively refine case definition through partial manual review until case definition yields PPV ≥ 95%
For small case sizes (~100), hand curate cases but use automated case definitions for others
For samples with inadequate counts of “Definite Cases”, manually review possible cases to determine true positives
For controls, exclude all potentially overlapping syndromes and possible matches, iteratively refine such that NPV ≥ 98%
Record categories:
  o Definite Cases (algorithm-defined)
  o Possible Cases (require manual review)
  o Controls (algorithm-defined)
  o Excluded (algorithm-defined)
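The PPV and NPV targets above can be checked against a manually reviewed subset with a few lines of code. This is a sketch only: the function names and the shape of the review data (pairs of algorithm label and chart-review label) are assumptions, while the 95% and 98% thresholds come from the slide.

```python
def predictive_values(review_pairs):
    """Compute (PPV, NPV) from (algorithm_says_case, reviewer_says_case) pairs."""
    tp = sum(1 for algo, truth in review_pairs if algo and truth)
    fp = sum(1 for algo, truth in review_pairs if algo and not truth)
    tn = sum(1 for algo, truth in review_pairs if not algo and not truth)
    fn = sum(1 for algo, truth in review_pairs if not algo and truth)
    ppv = tp / (tp + fp) if (tp + fp) else None  # undefined if nothing was called a case
    npv = tn / (tn + fn) if (tn + fn) else None  # undefined if nothing was called a control
    return ppv, npv

def definition_is_acceptable(review_pairs, ppv_target=0.95, npv_target=0.98):
    """True when the reviewed subset meets both targets; otherwise refine and review again."""
    ppv, npv = predictive_values(review_pairs)
    return ppv is not None and npv is not None and ppv >= ppv_target and npv >= npv_target

# Invented review results: 100 algorithm-defined cases and 100 controls checked by hand.
reviewed = (
    [(True, True)] * 97 + [(True, False)] * 3
    + [(False, False)] * 99 + [(False, True)] * 1
)
ok = definition_is_acceptable(reviewed)
```

In the iterative loop the slide describes, a failing check would send the case or control definition back for refinement before another round of partial manual review.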
54
The problem with ICD9 codes
ICD9 codes give both false negatives and false positives
False negatives:
  o Outpatient billing limited to 4 diagnoses/visit
  o Outpatient billing done by physicians (e.g., takes too long to find the unknown ICD9)
  o Inpatient billing done by professional coders:
    – omit codes that don’t pay well
    – can only code problems explicitly mentioned in documentation
False positives:
  o Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that are later determined to be incorrect
  o Billing the wrong code (perhaps it is easier for a busy clinician to find)
  o Physicians may bill for a different condition if it pays for a given treatment
    – Example: Anti-TNF biologics (e.g., infliximab) originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis
55
EMR Phenotyping
[Diagram: ICD-9s (≥3 codes) + Medications + Labs, together with Exclusions and Time Constraints, combine to define the PHENOTYPE]
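One way to read this diagram is as a rule that combines several kinds of evidence. The sketch below is an assumption about how the pieces fit together: it requires all of them at once, and it interprets the time constraint as a minimum span between the first and last qualifying code. Only the "≥3 codes" threshold comes from the slide. The `PatientRecord` fields and the 30-day default are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class PatientRecord:
    icd9_dates: List[date]      # dates on which a qualifying ICD-9 code was billed
    on_medication: bool         # a disease-specific medication appears in the record
    has_lab_evidence: bool      # a supporting lab result appears in the record
    has_exclusion: bool         # any exclusion criterion is met

def is_case(rec: PatientRecord, min_codes: int = 3, min_span_days: int = 30) -> bool:
    """Apply an assumed AND-combination of the evidence types in the diagram."""
    if rec.has_exclusion:
        return False
    if len(rec.icd9_dates) < min_codes:
        return False
    # Hypothetical time constraint: codes must be spread over at least min_span_days.
    span_days = (max(rec.icd9_dates) - min(rec.icd9_dates)).days
    if span_days < min_span_days:
        return False
    return rec.on_medication and rec.has_lab_evidence

# Hypothetical record: three codes over two months, with medication and lab support.
example = PatientRecord(
    icd9_dates=[date(2010, 1, 5), date(2010, 2, 1), date(2010, 3, 6)],
    on_medication=True,
    has_lab_evidence=True,
    has_exclusion=False,
)
```

Requiring repeated codes over time guards against a single billing entry for a suspected diagnosis that was later ruled out.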
56
Lessons from preliminary phenotype development
Eliminating negated and uncertain terms:
  – “I don’t think this is MS”, “uncertain if multiple sclerosis”
Delineating section tag of the note
  – “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”
Adding requirements for further signs of “severity of disease”
  – For MS: an MRI with T2 enhancement, myelin basic protein or oligoclonal bands on lumbar puncture, etc.
  – This could potentially miss patients with outside work-ups, however
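The first two lessons can be illustrated with a deliberately small filter that rejects a disease mention when the sentence is negated or uncertain, or when it sits under a family-history section tag. The cue list, the section pattern, and the function name are assumptions chosen to cover the slide's own examples. A production NLP pipeline would need far more than this.

```python
import re

# Hypothetical cue list: a few negation/uncertainty phrases, not a complete lexicon.
NEGATION_OR_UNCERTAINTY = re.compile(
    r"\b(don['’]t think|do not think|no evidence of|uncertain if|ruled out)\b",
    re.IGNORECASE,
)
# Hypothetical section tag: mentions under family history describe relatives.
FAMILY_HISTORY_TAG = re.compile(r"^\s*FAMILY (MEDICAL )?HISTORY\s*:", re.IGNORECASE)

def affirmed_mention(sentence: str, term: str) -> bool:
    """True if `term` appears as a whole word and is not negated, uncertain, or family history."""
    if not re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
        return False
    if FAMILY_HISTORY_TAG.match(sentence):
        return False
    if NEGATION_OR_UNCERTAINTY.search(sentence):
        return False
    return True

examples = [
    ("I don't think this is MS", "MS"),
    ("uncertain if multiple sclerosis", "multiple sclerosis"),
    ("FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.", "multiple sclerosis"),
    ("Patient has relapsing multiple sclerosis.", "multiple sclerosis"),
]
results = [affirmed_mention(sentence, term) for sentence, term in examples]
```

Only the last example counts as an affirmed mention. The first three are the kinds of false positives that a plain keyword search would return.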
57
Other lessons (more difficult to correct)
A number of incorrect ICD9 codes for RA and MS assigned to patients
Evolving disease
  – “Recently diagnosed with Susac’s syndrome - prior diagnosis of MS incorrect.” (Notes also included a thorough discussion of MS, ADEM, and Susac’s syndrome.)
Difference between two doctors:
  – Presurgical admission H&P includes “rheumatoid arthritis” in the past medical history
  – Rheumatology clinic visit notes say the diagnosis is “dermatomyositis” and never mention RA
Sometimes incorrect diagnoses are propagated through the record due to cutting-and-pasting / note reuse
62
ANALYSIS PLAN
1. Sample size estimation
2. Dependent/outcome variable
3. Independent variables (include SNPs, covariates, confounders)
   a. Should have race, gender, age in all plans
4. Statistical method proposed
   a. Type of model if appropriate
   b. How SNPs will be coded
5. Power calculation
6. Population stratification plans
7. QC plans
   a. Call rate, gender checks, HWE – these will be important to do on each dataset pulled to check for phenotype specific QC issues

PHENOTYPE PLAN
1. Trait of interest for study
2. Demographic constraints (e.g. gender, age, and/or ethnicity)
3. Cases and controls require outline of definition including:
   o Inclusion criteria (e.g. ICD9 codes, keyword search, medications, laboratory results)
   o Exclusion criteria (e.g. ICD9s, keywords, meds, labs, minimum data or follow up)
4. Validation plan for phenotype (e.g. manual review of all or some records)
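Two of the QC checks named in the analysis plan, call rate and HWE, are simple enough to sketch. The slides give no thresholds or methods, so the genotype encoding (`None` for a failed call) and the one-degree-of-freedom chi-square form of the Hardy-Weinberg test are assumptions. Exact tests are often preferred when genotype counts are small.

```python
def call_rate(genotypes):
    """Fraction of samples with a successful genotype call (None marks a failed call)."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) comparing genotype counts to Hardy-Weinberg expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    if p == 0 or q == 0:
        return 0.0                    # monomorphic marker: nothing to test
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented counts: the first marker fits HWE exactly, the second has too few heterozygotes.
in_equilibrium = hwe_chi_square(25, 50, 25)
out_of_equilibrium = hwe_chi_square(40, 20, 40)
rate = call_rate([0, 1, None, 2])
```

A statistic above about 3.84 corresponds to p < 0.05 at one degree of freedom. Running these checks on each dataset pulled, as the plan asks, helps catch problems that affect only one phenotype's samples.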
63
VICTR Funding
65
Investigator query cases controls + Data use agreement + IRB Approval
66
Investigator query cases controls + Data use agreement + IRB Approval Manual Review
67
Sample retrieval cases controls + Investigator query cases controls + Data use agreement + IRB Approval
68
Sample retrieval Genotyping, genotype-phenotype relations cases controls + Investigator query cases controls + Data use agreement + IRB Approval
69
BioVU Genotyping Process:
1. Investigator selects cases and controls from Synthetic Derivative
2. Investigator signals BioVU program to initiate sample selection
3. BioVU notifies DNA resources core that samples are ready for selection and picking
4. Samples are provided to appropriate lab and are genotyped
5. Investigator and BioVU program receive genotype data
6. Genotyped data analyzed by investigator
70
BioVU Requests 60 Total Requests 43 Approvals
71
BioVU: New Directions
Controls: a well characterized cohort of individuals without specific diseases across all ages to be used as controls
Plasma: expansion of BioVU to capture and store plasma to enable candidate proteomic/biomarker research
Mitochondrial SNP genotyping: expanding BioVU genotyping to include mitochondrial SNP genotyping and copy number variants
Mom-baby pairs: link pediatric DNA samples to maternal samples (mom-baby pairs resource)
Whole exome sequencing: expansion of BioVU sequencing activities to include whole exome sequencing on targeted populations
72
FAQ “answers”
SD access: “non-human subjects” IRB review (days)
Current access costs: $4/sample
Genotyping data: no charge
Genotyping:
  o Investigator-funded
    – Consider VICTR as a funding source
  o Genotyping/sequencing performed in VUMC Core Facilities
    – Justification must be provided for outside genotyping, including quality control plans
  o Genotype “redeposit” part of the data use agreement
73
Questions? Contact: Erica Bowton PhD BioVU Program Manager erica.bowton@vanderbilt.edu 322-1975