Phenotype Capture in Genetic Variant Databases Peng Chen School of Computer and Information Science Supervisor: Dr Jan Stanek.


1 Phenotype Capture in Genetic Variant Databases
Peng Chen, School of Computer and Information Science, chepy049@mymail.unisa.edu.au
Supervisor: Dr Jan Stanek
Research Fields: Health Informatics, Health Computer Science, Health Information System

2 Outline
• Motivation
• Research Question
• Literature
• Methodology
• Phenotype Data Review Result
• The openEHR Archetypes Review Result
• Phenotype Capture Experiment Result
• Conclusion

3 Motivation
• 1950s health computer science, EHR (Electronic Health Record)
• Slow development
• Bio-medical research & EHR systems
• Genotype – Phenotype correlation

4 Research Question
Can the existing openEHR standard be used to capture and store phenotype data/clinical data?
Hypothesis one: most of the phenotype data in genetic variant databases is not coded, contains little clinical detail, and is not stored in a consistent manner.
Hypothesis two: openEHR is potentially suitable as a standard for storing phenotype data.

5 Literature
• Claustres et al. (2002) ‘Time for a Unified System of Mutation Description and Reporting: A Review of Locus-Specific Mutation Databases’
• Mitropoulou et al. (2010) ‘Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use’
• Spath & Grimson (2011) ‘Applying the archetype approach to the database of a biobank information management system’
• Chen et al. (2009) ‘Archetype-based conversion of EHR content models: pilot experience with a regional EHR system’

6 Methodology
• Criteria form for phenotype review
1. Storage: Collect phenotypes; Internal storage; Proprietary external storage; Foreign external storage
2. Terminology: Formal terminology; Proprietary terms (mapped to a recognised terminology); External terminology used directly; Recognised terminology
3. Coding standard: Formal coding standard; Proprietary codes (mapped to a recognised coding standard); External coding standard used directly; Recognised coding standard
4. Granularity: Overall granularity level; Partial fine-grained phenotypes
5. Curation: Curated
6. Multiple phenotypes: Single phenotype; Multiple phenotype
7. Case level: Variant-level phenotypes; Case-level phenotypes
8. Database: Database family; Platform
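One way to picture the criteria form is as a record filled in once per reviewed database. The sketch below is only an illustration: the class name `DatabaseReview`, its field names, the `tally` helper, and the three sample records are assumptions made for this example and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatabaseReview:
    """One reviewed database scored against the eight criteria (hypothetical layout)."""
    name: str
    collects_phenotypes: bool             # 1. Storage
    storage: Optional[str] = None         # "internal", "proprietary external", "foreign external"
    formal_terminology: bool = False      # 2. Terminology
    formal_coding_standard: bool = False  # 3. Coding standard
    fine_grained: bool = False            # 4. Granularity
    curated: bool = False                 # 5. Curation
    multiple_phenotypes: bool = False     # 6. Multiple phenotypes
    case_level: bool = False              # 7. Case level (False means variant level)
    family: Optional[str] = None          # 8. Database family
    platform: Optional[str] = None        # 8. Platform


def tally(reviews: List[DatabaseReview], criterion: str) -> int:
    """Count phenotype-collecting databases for which a boolean criterion holds."""
    return sum(
        1 for r in reviews
        if r.collects_phenotypes and getattr(r, criterion)
    )


# Hypothetical records, for illustration only.
reviews = [
    DatabaseReview("db-a", True, storage="internal", curated=True, case_level=True),
    DatabaseReview("db-b", True, storage="internal", formal_terminology=True),
    DatabaseReview("db-c", False),
]

print(tally(reviews, "curated"))
```

Keeping every criterion as a plain field makes it straightforward to count databases per criterion, which is the kind of summary a review of this sort reports.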

7 Methodology
• The openEHR phenotype capture model

8 Methodology
• Data integration workflow towards a proposed health care EHR integration architecture

9 Phenotype Data Review Result
• Reviewed 1224 databases; 978 collect phenotypes, all in internal storage.
• 40 (4.1%) have a formal terminology; 30 (3.1%) have a formal coding standard.
• 959 (98%) store low-granularity phenotype data.
• 604 (62%) are curated by experts.
• 534 (54.6%) store single-phenotype data; 444 (45.4%) store multiple-phenotype data.
• 757 (77.4%) store phenotypes on a case basis; 221 (22.6%) on a variant basis.
• Database families (family: number, platform):
LOVD: 614, MySQL
UMD: 13, 4D SQL DB
63% of databases are LOVD.
• Platforms (platform: number):
MySQL DB: 617
Web page table form: 209
Web page free text: 132
4D SQL DB: 13
PDF table form: 4
Excel table form: 2
Web page bar chart: 1
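The percentages on this slide can be recomputed from the counts. The short sketch below assumes that every percentage uses the 978 phenotype-collecting databases as its denominator; the variable and function names are made up for this example.

```python
# Counts as reported on the slide; 978 databases collect phenotypes.
TOTAL_COLLECTING = 978

counts = {
    "formal terminology": 40,
    "formal coding": 30,
    "low granularity": 959,
    "curated": 604,
    "single phenotype": 534,
    "multiple phenotypes": 444,
    "case basis": 757,
    "variant basis": 221,
    "LOVD family": 614,
}

platform_counts = {
    "MySQL DB": 617,
    "Web page table form": 209,
    "Web page free text": 132,
    "4D SQL DB": 13,
    "PDF table form": 4,
    "Excel table form": 2,
    "Web page bar chart": 1,
}


def share(count: int, total: int = TOTAL_COLLECTING) -> float:
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

# The platform counts should account for every phenotype-collecting database.
platforms_cover_all = sum(platform_counts.values()) == TOTAL_COLLECTING

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

A check like `platforms_cover_all` is a cheap way to confirm that a breakdown table and the headline total describe the same set of databases.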

10 Phenotype Data Review Result
• Phenotype samples:
Sample 1: ‘MRX’, ‘ARRP’, ‘AMD’, ‘arCRD’, ‘CIPA or HSN IV (H406Y + G613V are polymorphisms)’, ‘Type I, type II, non syndromic recessive’
Sample 2: ‘Failure to thrive; Pneumocystis carinii pneumonia; Diarrhea; Marked lymphopenia’
Sample 3:

11 The openEHR Archetypes Review Result
• Reviewed 283 existing openEHR archetypes
• Multilingual translation mechanism
• Term binding mechanism
Criteria: Result
Number of terms: 7361
Number of term bindings: 94
Coding system: SNOMED-CT, LOINC
Has term binding: 7 (2.5% of archetypes)
Has multilingual translations: 83 (29.3% of archetypes)
Languages: English, German, Arabic, Portuguese, Japanese, Russian, Dutch, Chinese, Spanish, Farsi
Compile failure: 14

12 The openEHR Archetypes Review Result
• Multilingual translation mechanism - example (ADL display)
ontology
    terminologies_available = <…>
    term_definitions = <
        …
        ["zh-cn"] = <
            items = <
                ...
                ["at0004"] = <
                    text = <…>
                    description = <…>
        …
        ["de"] = <
            items = <
                ...
                ["at0004"] = <
                    text = <…>
                    description = <"Der höchste arterielle Blutdruck eines Zyklus - gemessen in der systolischen oder Kontraktionsphase des Herzens.">
        …
        ["en"] = <
            items = <
                ...
                ["at0004"] = <
                    text = <…>
                    description = <…>
    >
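The translation mechanism keys every term definition first by language and then by at-code, so a display label is a two-step lookup. The sketch below mirrors that nesting with Python dictionaries. The `lookup_term` function, the fallback to an original language, and the placeholder English and German `text` values are assumptions for illustration; only the German description comes from the slide.

```python
from typing import Dict, Optional

# language -> at-code -> {"text": ..., "description": ...}
TermDefinitions = Dict[str, Dict[str, Dict[str, str]]]

term_definitions: TermDefinitions = {
    "en": {
        "at0004": {
            "text": "placeholder English text",            # hypothetical value
            "description": "placeholder English description",  # hypothetical value
        }
    },
    "de": {
        "at0004": {
            "text": "placeholder German text",              # hypothetical value
            "description": (
                "Der höchste arterielle Blutdruck eines Zyklus - gemessen in der "
                "systolischen oder Kontraktionsphase des Herzens."
            ),
        }
    },
}


def lookup_term(
    defs: TermDefinitions,
    at_code: str,
    language: str,
    original_language: str = "en",
) -> Optional[Dict[str, str]]:
    """Return the term in the requested language, else in the original language."""
    for lang in (language, original_language):
        term = defs.get(lang, {}).get(at_code)
        if term is not None:
            return term
    return None


print(lookup_term(term_definitions, "at0004", "de")["description"])
```

Because the at-code is the same in every language block, the data stored against `at0004` does not change when the display language does; only the label and description are swapped.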

13 • Multilingual translation mechanism - compare

14 The openEHR Archetypes Review Result
• Term binding mechanism (ADL display)
term_bindings = <
    ["SNOMED-CT"] = <
        items = <
            ["at0000"] = <…>
            ["at0004"] = <…>
            ["at0005"] = <…>
            ["at0013"] = <…>
>
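A term binding ties a local at-code to a code in an external terminology. The sketch below models the `term_bindings` section as a nested dictionary and measures how many at-codes are bound. The function names are hypothetical, and the bound values are deliberately labelled placeholders, not real SNOMED-CT codes, because the actual values are not recoverable from the slide.

```python
from typing import Dict, Iterable, Optional

# terminology -> at-code -> external code
TermBindings = Dict[str, Dict[str, str]]

term_bindings: TermBindings = {
    "SNOMED-CT": {
        "at0000": "PLACEHOLDER-0000",  # not a real SNOMED-CT code
        "at0004": "PLACEHOLDER-0004",  # not a real SNOMED-CT code
    }
}


def bound_code(bindings: TermBindings, terminology: str, at_code: str) -> Optional[str]:
    """Return the external code bound to an at-code, or None if it is unbound."""
    return bindings.get(terminology, {}).get(at_code)


def binding_coverage(
    bindings: TermBindings, terminology: str, at_codes: Iterable[str]
) -> float:
    """Fraction of the given at-codes that have a binding in the terminology."""
    codes = list(at_codes)
    if not codes:
        return 0.0
    bound = sum(1 for c in codes if bound_code(bindings, terminology, c) is not None)
    return bound / len(codes)


print(binding_coverage(term_bindings, "SNOMED-CT", ["at0000", "at0004", "at0005", "at0013"]))
```

A coverage figure of this kind is one way to express how sparsely archetype terms are bound to external terminologies.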

15 Phenotype Capture Experiment Result
• The chosen sample:
• The mapping of concepts:

16 Phenotype Capture Experiment Result
• The openEHR archetypes mapping:
Evaluation → Diagnosis
Observation → Symptom
Action → Treatment
No., Archetype: Entry items
1, openEHR-EHR-EVALUATION.problem-diagnosis.v1.adl: Diagnosis
2, openEHR-EHR-OBSERVATION.lab_test-full_blood_count.v1.adl: Platelet count
3, openEHR-EHR-ACTION.procedure.v1.adl: Procedure, Comments
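The archetype file names on this slide encode the entry class that each phenotype concept maps to. As a rough sketch, the class can be recovered by splitting the name. The `parse_archetype_file` function and the `ENTRY_CLASS_FOR` dictionary are hypothetical helpers written for this example; the three file names and the concept-to-class pairs are the ones listed on the slide.

```python
from typing import Dict, Tuple

# Concept kind -> openEHR entry class, as listed on the slide.
ENTRY_CLASS_FOR: Dict[str, str] = {
    "Diagnosis": "EVALUATION",
    "Symptom": "OBSERVATION",
    "Treatment": "ACTION",
}


def parse_archetype_file(filename: str) -> Tuple[str, str, str]:
    """Split 'openEHR-EHR-CLASS.concept.vN.adl' into (class, concept, version)."""
    name = filename[:-4] if filename.endswith(".adl") else filename
    head, rest = name.split(".", 1)          # 'openEHR-EHR-CLASS', 'concept.vN'
    concept, version = rest.rsplit(".", 1)   # 'concept', 'vN'
    rm_class = head.split("-", 2)[2]         # drop the 'openEHR-EHR-' prefix
    return rm_class, concept, version


archetype_files = [
    "openEHR-EHR-EVALUATION.problem-diagnosis.v1.adl",
    "openEHR-EHR-OBSERVATION.lab_test-full_blood_count.v1.adl",
    "openEHR-EHR-ACTION.procedure.v1.adl",
]

for f in archetype_files:
    print(parse_archetype_file(f))
```

Splitting on the first and last dot keeps hyphenated concept names such as `lab_test-full_blood_count` intact.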

17-20 Phenotype Capture Experiment Result
• Phenotype capture snapshots (four slides of screenshots)

21 A conceptual patient-centric EHR data warehouse schema

22 Conclusion
• The research results support both hypotheses and match the expected outcomes.
• The openEHR standard is potentially suitable for storing clinical data, and even for integrating health information systems.
• The multilingual translation mechanism and the term binding mechanism are two strong pieces of evidence that openEHR can support semantic interoperability between heterogeneous systems.
• We need international cooperation on managing the archetypes and completing a full set of archetypes for health concepts.
• We need international agreement on choosing terminologies and on enhancing them to resolve semantic conflicts.

23 Conclusion
• The philosophy and the future
A health care EHR integration architecture; Archetype-ontology; Cognitive IS; Human friendly; Robust, scalable, integrated; Semantic interoperability; Syntactic consistency; Data modelling neutral; Start from learning terms and concepts; IS essentially for communication; Ubiquitous information computing

