Vanderbilt’s DNA Databank: BioVU. Personalized Medicine Integration of genomic information into clinical decision making Personalized disease treatment.


1 Vanderbilt’s DNA Databank: BioVU

2 Personalized Medicine
Integration of genomic information into clinical decision making
Personalized disease treatment and also preventative therapies

3 What is BioVU?
The move towards personalized medicine requires very large sample sets for discovery and validation
BioVU: a biobank intended to support a broad view of biology and enable personalized medicine
Contains de-identified DNA extracted from leftover blood after clinically indicated testing of Vanderbilt patients who have not opted out
Linked to the Synthetic Derivative: a de-identified EMR
Current sample number: 135,765
o 120,705 adult samples
o 15,099 pediatric samples

4 Patient Communication Modules

5 [Diagram] Eligible patient “John Doe” is passed through a one-way hash to give A7CCF99DE65732…. The hashed ID labels both the scrubbed record and the extracted DNA.
The “synthetic derivative” (SD): can be updated
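A minimal sketch of the one-way hashing step, assuming a keyed SHA-512 digest. The slide says only "one way hash", so the algorithm, the key, and the name `research_id` are illustrative assumptions, not the BioVU implementation.

```python
import hashlib
import hmac


def research_id(medical_record_number: str, secret_key: bytes) -> str:
    """Map an identifier to a fixed-length research ID (hypothetical helper).

    The digest cannot be inverted to recover the input, but the same input
    always yields the same ID, so a scrubbed record and a DNA sample from
    the same patient can still be linked to each other.
    """
    digest = hmac.new(secret_key, medical_record_number.encode("utf-8"), hashlib.sha512)
    return digest.hexdigest().upper()


# Assumed example values; a real key would be stored and managed securely.
key = b"example-key-not-for-production"
rid_record = research_id("MRN-0001234", key)  # ID attached to the scrubbed record
rid_sample = research_id("MRN-0001234", key)  # ID attached to the extracted DNA
```

Because the mapping is deterministic, new clinical data for the same patient hashes to the same ID, which is consistent with the slide's note that the synthetic derivative can be updated.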

6 The Synthetic Derivative
A derivative of the EMR: information content reduced by ‘scrubbing’ identifiers
Systematically shifted event dates
Contains ~1.9 million records
o ~1 million with detailed longitudinal data
o averaging 100,000 bytes in size
o an average of 27 codes per record
Records are updated over time and are current through 4/30/11
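One way to picture "systematically shifted event dates" is a single per-record offset applied to every event, so the intervals between events are preserved. The sketch below assumes a backward shift of 1 to 365 days derived from the research ID. The range, the direction, and the function names are hypothetical, since the slide does not give the actual scheme.

```python
import hashlib
from datetime import date, timedelta


def shift_days(research_id: str, max_shift: int = 365) -> int:
    """Derive a stable negative offset (in days) from the research ID."""
    digest = hashlib.sha256(research_id.encode("utf-8")).digest()
    return -(int.from_bytes(digest[:4], "big") % max_shift + 1)


def shift_event_dates(research_id: str, events: list) -> list:
    """Apply the same offset to every (date, label) event of one record."""
    offset = timedelta(days=shift_days(research_id))
    return [(event_date + offset, label) for event_date, label in events]


# Made-up events for one record.
events = [(date(2010, 1, 1), "admission"), (date(2010, 1, 11), "discharge")]
shifted = shift_event_dates("A7CC-EXAMPLE", events)
```

The two shifted dates are still ten days apart, so longitudinal analyses on the shifted record see the same intervals as the original chart.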

7 Synthetic Derivative Data Types
Narratives, such as:
o Clinical Notes
o Discharge Summaries
o History and Physicals
o Problem Lists
o Surgical Reports
o Progress Notes
o Letters
Diagnostic Codes, Procedural Codes
Forms (intake, assessment)
Reports (pathology, ECGs, echocardiograms)
Clinical Communications
Lab Values and Vital Signs
Medication Orders
TraceMaster (ECGs)

8 Synthetic Derivative vs. BioVU
[Diagram] Both hold scrubbed records keyed by the same hashed ID (A7CDE6532 ….)
Synthetic Derivative: ~1.9 million
BioVU: ~135,000

9 Sample accrual
Current accrual as of 2-13-2012: 135,765 samples, of which 15,099 are pediatric

10 BioVU Demographics
[Charts: age, gender, race]

11 BioVU Sample Management RTS SmaRTStore

12 Validation in BioVU
Sample handling algorithms
o Gender match
o 1/384 gender mismatches
Ancestry
o Characterize sample ancestry, assess usefulness of ‘race’ as defined in EMR
o Provide a panel of ancestry informative markers that define ancestry
o No significant difference between the concordance of self-report or observer-report with genetic ancestry
Demonstration project – American Journal of Human Genetics, 2010
o Can known associations between genetic variants and common diseases be identified in the EMR?

13 The “demonstration project”
Genotype “high-value” SNPs in the first 8,000 samples accrued.
o including SNPs associated by replicated genome-wide experiments with common diseases & traits
1. Atrial fibrillation
2. Crohn’s disease
3. Multiple sclerosis
4. Rheumatoid arthritis
5. Type II diabetes
Develop Natural Language Processing methods to identify cases and controls
Are genotype-phenotype relations replicated?

14 First results
[Figure: odds ratio for each marker, listed by disease and gene / region; odds-ratio axis ticks at 0.5, 1.0, 2.0, 5.0]
Diseases: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes
Markers (gene / region): rs2200733 (Chr. 4q25), rs10033464 (Chr. 4q25), rs11805303 (IL23R), rs17234657 (Chr. 5), rs1000113 (Chr. 5), rs17221417 (NOD2), rs2542151 (PTPN22), rs3135388 (DRB1*1501), rs2104286 (IL2RA), rs6897932 (IL7RA), rs6457617 (Chr. 6), rs6679677 (RSBN1), rs2476601 (PTPN22), rs4506565 (TCF7L2), rs12255372 (TCF7L2), rs12243326 (TCF7L2), rs10811661 (CDKN2B), rs8050136 (FTO), rs5219 (KCNJ11), rs5215 (KCNJ11), rs4402960 (IGF2BP2)

15 First results
[Figure: same odds-ratio plot as slide 14]
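For readers unfamiliar with the quantity plotted in the first results, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2×2 table of cases and controls. It is a simplified carrier versus non-carrier illustration with made-up counts. The slides do not state which genetic model or interval method the demonstration project used.

```python
import math


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int,
               z: float = 1.96) -> tuple:
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Wald interval on the log odds ratio; all four cells must be
    non-zero for the formula to be defined.
    """
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    estimate = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Made-up counts, not BioVU data: 60/40 carriers among cases, 40/60 among controls.
estimate, lower, upper = odds_ratio(60, 40, 40, 60)
```

An interval that excludes 1.0 suggests the association seen in the EMR-derived cases and controls points in a consistent direction, which is the question the replication exercise asks.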

16 Types of projects
Discovery or validation of genotype-phenotype relations for disease susceptibility or drug responses
Discovery of new disease/susceptibility genes: resequence in patients (obesity, Cushing's, susceptibility to infection, insomnia, pre-term birth)
Access samples without disease X, or “normals” of specified ancestry, or old normals
Phenome-wide association study (PheWAS): in development

17 Data Use Agreement

18 Genotyping Data Accrual

19 Common Diagnoses in BioVU

20 Examples of ICD-9 codes for rare diseases
Example Rare Disease              Number in SD   Number in BioVU
Microcephalus                     1,070          85
Pica                              115            22
Septicemic Plague                 21             0
Pick’s Disease                    45             8
Acromegaly and Gigantism          571            123
Ehlers-Danlos Syndrome            285            34
Narcolepsy without Cataplexy      438            76
Spina Bifida                      1,968          238
Stiff-Man Syndrome                82             17
Tourette Syndrome                 667            34
Bell’s Palsy                      2,534          402
Bulimia Nervosa                   919            88
Cushing’s                         1,443          298
Peyronie’s Disease                694            157
Wilson’s Disease                  140            49
Meningioma                        1,444          355
Wegener’s                         363            141


39 Not included in SD searches:
o Bone marrow transplant
o SCID
Flagged compromised samples:
o Transfusion within 2 weeks of blood draw
o Leukemia
o Myeloma
o Lymphoma
o Pre-leukemic states


53 General algorithm for determining EMR phenotype
Iteratively refine case definition through partial manual review until case definition yields PPV ≥ 95%
For small case sizes (~100), hand curate cases but use automated case definitions for others
For samples with inadequate counts of “Definite Cases”, manually review possible cases to determine true positives
For controls, exclude all potentially overlapping syndromes and possible matches, iteratively refine such that NPV ≥ 98%
Resulting groups:
o Definite Cases (algorithm-defined)
o Possible Cases (require manual review)
o Controls (algorithm-defined)
o Excluded (algorithm-defined)
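The stopping rule above can be sketched as a check on a manually reviewed subset: compute PPV for the algorithm's cases and NPV for its controls, and keep refining the definition until both clear the stated thresholds. The function names and the review data below are hypothetical.

```python
def predictive_values(reviewed: list) -> tuple:
    """Compute (PPV, NPV) from (algorithm_says_case, chart_review_says_case) pairs."""
    tp = sum(1 for algo, truth in reviewed if algo and truth)
    fp = sum(1 for algo, truth in reviewed if algo and not truth)
    tn = sum(1 for algo, truth in reviewed if not algo and not truth)
    fn = sum(1 for algo, truth in reviewed if not algo and truth)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


def meets_targets(reviewed: list, min_ppv: float = 0.95, min_npv: float = 0.98) -> bool:
    """True when the definition reaches the thresholds named on the slide."""
    ppv, npv = predictive_values(reviewed)
    return ppv >= min_ppv and npv >= min_npv


# Made-up review of 200 charts: 96 true and 4 false cases, 99 true and 1 false controls.
review_round = ([(True, True)] * 96 + [(True, False)] * 4
                + [(False, False)] * 99 + [(False, True)] * 1)
ppv, npv = predictive_values(review_round)
```

If `meets_targets` returns False, the case or control definition would be tightened and a fresh sample reviewed, which is the iterative loop the slide describes.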

54 The problem with ICD9 codes
ICD9 codes give both false negatives and false positives
False negatives:
o Outpatient billing limited to 4 diagnoses/visit
o Outpatient billing done by physicians (e.g., takes too long to find the unknown ICD9)
o Inpatient billing done by professional coders:
   – omit codes that don’t pay well
   – can only code problems actually explicitly mentioned in documentation
False positives:
o Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that are later determined to be incorrect
o Billing the wrong code (perhaps it is easier for a busy clinician to find)
o Physicians may bill for a different condition if it pays for a given treatment
   – Example: Anti-TNF biologics (e.g., infliximab) originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis

55 EMR Phenotyping
[Diagram] Medications + Labs + ICD-9s (≥3 codes), together with Exclusions and Time Constraints, define the PHENOTYPE
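A minimal sketch of how the elements in this diagram might combine into a rule. The diagram does not say whether the criteria are joined with AND or OR, so this version assumes all inclusion criteria are required. The 30-day span used as the time constraint and the three output labels are also assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary extracted from the Synthetic Derivative."""
    code_dates: list            # dates on which a qualifying ICD-9 code was billed
    has_medication: bool        # a disease-specific medication was ordered
    has_abnormal_lab: bool      # a supporting lab value was found
    exclusion_hits: list = field(default_factory=list)  # matched exclusion criteria


def classify(patient: PatientSummary, min_codes: int = 3, min_span_days: int = 30) -> str:
    """Label a patient as 'excluded', 'definite case', 'possible case', or 'control candidate'."""
    if patient.exclusion_hits:
        return "excluded"
    enough_codes = len(patient.code_dates) >= min_codes
    spread_out = enough_codes and (
        (max(patient.code_dates) - min(patient.code_dates)).days >= min_span_days
    )
    if spread_out and patient.has_medication and patient.has_abnormal_lab:
        return "definite case"
    if patient.code_dates or patient.has_medication or patient.has_abnormal_lab:
        return "possible case"  # partial evidence: route to manual review
    return "control candidate"


# Made-up patients.
definite = PatientSummary([date(2009, 1, 5), date(2009, 3, 2), date(2009, 6, 9)], True, True)
partial = PatientSummary([date(2009, 1, 5)], False, False)
excluded = PatientSummary([date(2009, 1, 5)], True, True, ["overlapping syndrome"])
```

Requiring several codes spread over time is one way to reduce the false positives that a single suspected-diagnosis billing code can produce.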

56 Lessons from preliminary phenotype development
Eliminating negated and uncertain terms:
– “I don’t think this is MS”, “uncertain if multiple sclerosis”
Delineating section tag of the note
– “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”
Adding requirements for further signs of “severity of disease”
– For MS: an MRI with T2 enhancement, myelin basic protein or oligoclonal bands on lumbar puncture, etc.
– This could potentially miss patients with outside work-ups, however
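The first two lessons can be illustrated with a small keyword filter that drops negated or uncertain mentions and skips family-history sections. This is a deliberately simple sketch with an assumed cue list and an assumed one-line `SECTION: text` note layout. It is not the natural language processing pipeline used for BioVU.

```python
import re

# Assumed cue phrases; a real system would use a much richer negation model.
NEGATION_CUES = re.compile(
    r"\b(don['’]t think|do not think|uncertain if|no evidence of|ruled out)\b",
    re.IGNORECASE,
)
# The abbreviation is matched case-sensitively so "ms" (milliseconds) is ignored.
DISEASE_TERMS = re.compile(r"(?i:\bmultiple sclerosis\b)|\bMS\b")
SKIPPED_SECTIONS = {"FAMILY MEDICAL HISTORY", "FAMILY HISTORY"}


def affirmed_mentions(note: str) -> list:
    """Return sentences that mention the disease without negation or family context."""
    hits = []
    for line in note.splitlines():
        section = line.partition(":")[0].strip().upper()
        if section in SKIPPED_SECTIONS:
            continue  # the mention is about a relative, not the patient
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            if DISEASE_TERMS.search(sentence) and not NEGATION_CUES.search(sentence):
                hits.append(sentence.strip())
    return hits


note = "\n".join([
    "I don’t think this is MS.",
    "Uncertain if multiple sclerosis.",
    "FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.",
    "ASSESSMENT: Multiple sclerosis, relapsing.",
])
mentions = affirmed_mentions(note)
```

Only the assessment line survives the filter. The first three lines are the kinds of mention the slide warns would otherwise be counted as evidence of disease.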

57 Other lessons (more difficult to correct)
A number of incorrect ICD9 codes for RA and MS assigned to patients
Evolving disease
– “Recently diagnosed with Susac’s syndrome - prior diagnosis of MS incorrect.” (Notes also included a thorough discussion of MS, ADEM, and Susac’s syndrome.)
Difference between two doctors:
– Presurgical admission H&P includes “rheumatoid arthritis” in the past medical history
– Rheumatology clinic visit notes say the diagnosis is “dermatomyositis” and never mention RA
Sometimes incorrect diagnoses are propagated through the record due to cutting-and-pasting / note reuse


62 ANALYSIS PLAN
1. Sample size estimation
2. Dependent/outcome variable
3. Independent variables (include SNPs, covariates, confounders)
   a. Should have race, gender, age in all plans
4. Statistical method proposed
   a. Type of model if appropriate
   b. How SNPs will be coded
5. Power calculation
6. Population stratification plans
7. QC plans
   a. Call rate, gender checks, HWE: these will be important to do on each dataset pulled to check for phenotype-specific QC issues

PHENOTYPE PLAN
1. Trait of interest for study
2. Demographic constraints (e.g. gender, age, and/or ethnicity)
3. Cases and controls require outline of definition including:
   o Inclusion criteria (e.g. ICD9 codes, keyword search, medications, laboratory results)
   o Exclusion criteria (e.g. ICD9s, keywords, meds, labs, minimum data or follow up)
4. Validation plan for phenotype (e.g. manual review of all or some records)
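Two of the QC checks named in the analysis plan, call rate and Hardy-Weinberg equilibrium (HWE), can be sketched in a few lines. The example uses a one-degree-of-freedom chi-square test for HWE and made-up genotype counts. Any pass/fail thresholds would be a study-specific choice that the slide does not specify.

```python
import math


def call_rate(genotypes: list) -> float:
    """Fraction of samples with a non-missing genotype call (None means no call)."""
    called = sum(1 for genotype in genotypes if genotype is not None)
    return called / len(genotypes)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple:
    """Return (chi-square statistic, p-value) for deviation from HWE at one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up data: one SNP in equilibrium, one with no heterozygotes at all.
rate = call_rate(["AA", None, "AB", "BB"])
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
```

Running these checks on each dataset pulled, as the plan recommends, helps catch genotyping problems that affect only the samples selected for one phenotype.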

63 VICTR Funding


65 Investigator query (cases + controls) + Data use agreement + IRB Approval

66 Investigator query (cases + controls) + Data use agreement + IRB Approval; Manual Review

67 Investigator query (cases + controls) + Data use agreement + IRB Approval; Sample retrieval (cases + controls)

68 Investigator query (cases + controls) + Data use agreement + IRB Approval; Sample retrieval (cases + controls); Genotyping, genotype-phenotype relations

69 BioVU Genotyping Process
[Cycle diagram]
Investigator selects cases and controls from Synthetic Derivative
Investigator signals BioVU program to initiate sample selection
BioVU notifies DNA resources core that samples are ready for selection and picking
Samples are provided to appropriate lab and are genotyped
Investigator and BioVU program receive genotype data
Genotyped data analyzed by investigator

70 BioVU Requests
60 Total Requests
43 Approvals

71 BioVU: New Directions
Controls: a well characterized cohort of individuals without specific diseases across all ages to be used as controls
Plasma: expansion of BioVU to capture and store plasma to enable candidate proteomic/biomarker research
Mitochondrial SNP genotyping: expanding BioVU genotyping to include mitochondrial SNP genotyping and copy number variants
Mom-baby pairs: link pediatric DNA samples to maternal samples (mom-baby pairs resource)
Whole exome sequencing: expansion of BioVU sequencing activities to include whole exome sequencing on targeted populations

72 FAQ “answers”
SD access: “non-human subjects” IRB review (days)
Current access costs: $4/sample
Genotyping data: no charge
Genotyping:
o Investigator-funded
   – Consider VICTR as a funding source
o Genotyping/sequencing performed in VUMC Core Facilities
   – Justification must be provided for outside genotyping, including quality control plans
o Genotype “redeposit” part of the data use agreement

73 Questions?
Contact: Erica Bowton PhD, BioVU Program Manager
erica.bowton@vanderbilt.edu
322-1975

