1
Synthetic Data Working Group
2
Agenda
- Quick review of kickoff material
- Example of the value of synthetic data
- Group exercise: what does success look like for this working group?
3
Principles
Why Synthetic Data
- Synthetic yet realistic
- High fidelity
  - Demographic fidelity
  - Clinical fidelity
  - Temporal fidelity
- Realistic population distribution
- Standards-based, easily ingestible records
- Highly configurable, profile-based
- Repeatable process
4
Synthetic Data Framework
Use Cases
- Intended purpose, users and high-level requirements
- Target data scope
Profiles
- Demographic distribution
- Disease or other clinical probabilities
Generation
- Synthetic records based on data profiles
- Clinical fidelity (no 100-pound babies)
Ingestion
- Ingestion mechanism and target database
- Retain clinical fidelity
Utilization
- Static snapshots for viewers and dashboards
- Temporal distribution for workflow and dynamic processes
5
Example
6
Group Activity: What does success look like?
- Objectives
- Metrics
- Characteristics
- Milestones
- High-level requirements
7
Use Cases
- Intended purpose, users and high-level requirements
- Target data scope
Define detailed data and usage requirements
- What data makes the Veteran population (or other target groups) unique (e.g., Agent Orange exposure and its relevance to comorbidities): define use cases and requirements for specific patient populations, such as Veterans or people from other parts of the world
- Need aging algorithms
- Cohorts by war area, geographically specific information (e.g., Ebola outbreaks), geospatial theater operations
- Occupational concerns (e.g., military personnel working near jet fuels)
- Need additional data elements that are not always captured in an EMR, such as relationship status and financial status (e.g., determinants of health), because they affect conditions such as PTSD and depression, as well as mental health and suicide prevention
- Appropriate terminology code sets (e.g., DSM-5 for mental health); DSM-5 is currently linked to ICD-10 codes for VA
- Non-clinical data (e.g., service connection)
- Mapping to the data sets and data models of target systems (e.g., VistA and Cerner)
- Pandemics
8
Profiles
- Demographic distribution
- Disease or other clinical probabilities
Profile reference sources:
  - U.S. Census Bureau: demographics
  - CDC: disease statistics
  - NIH: disease statistics
  - Peer-reviewed publications: disease statistics
- Combine disease statistics with “standards of care” to drive the application of care to the data probabilities
- Demographic distributions for age, gender, race, income and education that mirror real-world distributions
- For target populations (e.g., Veterans), expand the data set to include the items on the other slide; data.gov has good data sets for this type of information (search for “public health government” to find war-specific morbidities)
- Semantics: need to model relationships between morbidities, or between morbidities and related information such as meds, procedures, findings, etc.
- Synthea: disease models based on extensive research (not easy) and then captured in statistical models
- Mining ontologies for relationships: Value Set Authority from NLM (e.g., trauma value set), SOLOR, agency maps between terminologies, UMLS (meta-mapping between ontologies and terminology code sets), SNOMED browser/CSIRO; others: Medcin, 3M HDD, FDB
- Look for statistics for the data used in major scientific studies
- VA aggregated data sets (CDW) are available for asking questions about morbidity percentages and the like
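As a rough illustration of how a profile of demographic distributions and disease probabilities could drive record generation, here is a minimal Python sketch. The profile layout, the function names and every number in it are assumptions made for this example; the weights are placeholders, not Census, CDC or NIH figures, and this is not how Synthea or any other tool mentioned above is implemented.

```python
import random

# Hypothetical profile: categories and weights are illustrative placeholders,
# not real Census or CDC statistics.
PROFILE = {
    "age_band": {"18-34": 0.30, "35-64": 0.45, "65+": 0.25},
    "gender": {"female": 0.50, "male": 0.50},
    # Made-up probability of a condition given the age band.
    "diabetes_given_age": {"18-34": 0.02, "35-64": 0.10, "65+": 0.25},
}


def sample_category(dist, rng):
    """Draw one key from a {category: weight} mapping."""
    categories = list(dist)
    weights = [dist[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


def generate_patient(profile, rng):
    """Build one synthetic patient from the profile's distributions."""
    age_band = sample_category(profile["age_band"], rng)
    gender = sample_category(profile["gender"], rng)
    # The disease probability is conditioned on the sampled age band.
    has_diabetes = rng.random() < profile["diabetes_given_age"][age_band]
    return {"age_band": age_band, "gender": gender, "diabetes": has_diabetes}


def generate_population(profile, n, seed=0):
    """Generate n patients; a fixed seed keeps the process repeatable."""
    rng = random.Random(seed)
    return [generate_patient(profile, rng) for _ in range(n)]
```

Swapping in a different profile (for example, one weighted toward a Veteran cohort) would change the generated population without touching the generation code, which is the sense in which the process is profile-based and repeatable.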
9
Generation
- Synthetic records based on data profiles
- Clinical fidelity (no 100-pound babies)
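The clinical fidelity point (no 100-pound babies) can be read as a plausibility check applied to generated records. The sketch below is one possible shape for such a check, written in Python. The bounds table, field names and function names are hypothetical, and the weight limits are placeholders chosen for the example rather than clinical reference values.

```python
# Illustrative plausibility bounds: (exclusive upper age in years, (min lb, max lb)).
# Placeholder numbers for this sketch only, not clinical reference ranges.
WEIGHT_BOUNDS_LB = [
    (1, (3, 35)),      # infants under 1 year
    (12, (15, 200)),   # children under 12 years
    (130, (60, 700)),  # everyone else
]


def plausible_weight(age_years, weight_lb):
    """Return True if the weight falls inside the bounds for the age."""
    for max_age, (low, high) in WEIGHT_BOUNDS_LB:
        if age_years < max_age:
            return low <= weight_lb <= high
    return False  # age outside every band is treated as implausible


def split_by_fidelity(records):
    """Separate records into (accepted, rejected) using the weight check."""
    accepted, rejected = [], []
    for record in records:
        ok = plausible_weight(record["age_years"], record["weight_lb"])
        (accepted if ok else rejected).append(record)
    return accepted, rejected
```

A generator could either discard rejected records or regenerate them; which is appropriate depends on whether the target population distribution has to be preserved exactly.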
10
- Ingestion mechanism and target database
- Retain clinical fidelity
11
Utilization
- Static snapshots for viewers and dashboards
- Temporal distribution for workflow and dynamic processes
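To make the difference between the two utilization modes concrete, the following Python sketch spreads synthetic encounter dates across a time window (temporal distribution) and then takes an as-of cut of them (static snapshot). The function names, the uniform spread and the date range are all assumptions for illustration; a real workflow simulation would likely need a more realistic arrival pattern.

```python
import random
from datetime import date, timedelta


def spread_encounters(n, start, end, seed=0):
    """Assign n encounter dates uniformly between start and end (inclusive)."""
    rng = random.Random(seed)  # fixed seed keeps the schedule repeatable
    span_days = (end - start).days
    dates = [start + timedelta(days=rng.randint(0, span_days)) for _ in range(n)]
    return sorted(dates)


def snapshot(encounter_dates, as_of):
    """Static view: only the encounters that have occurred by the as_of date."""
    return [d for d in encounter_dates if d <= as_of]
```

A dashboard would consume the result of `snapshot` for a single date, while a workflow test could step the `as_of` date forward to replay encounters as they would arrive over time.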