Health Information - Retrieval, Analysis and Archival on the Cloud
Dr. K. Sudheer, Sandeep Kunkunuru, Pradeep Kandru, Balaji Chopparapu
Agenda
- Data collection - Techniques and tools
- Data cleansing, linking, transforming, enriching - Techniques and tools
- Access and collaboration - Patterns
- Data analytics and Machine Learning
- Standards
- Data lifecycle
- Cloud native architecture
- Q & A
- References
Data Collection - Techniques and Tools
- Public/Open/Government data sources, e.g., clinicaltrials.gov, PubMed
  - Bulk download from source sites
  - Web scraping of individual records using HTML parsers
  - API - collect data on demand
- Private/Hospital/Point-of-Care data sources, e.g., ultrasound devices, MRI, PET/CT
  - Scheduled direct data sync to cloud storage services
  - Event-driven processing - push data to an API
  - API - collect data on demand
  - Initial bulk data collection - database dumps, file storage, scan/OCR, data entry operations, etc.
- Direct upload by patients through mobile and/or email
  - Upload case sheets, diagnostic test results, family history reports, questionnaire responses
- Passive data collection
  - Wearables
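As an illustration of scraping individual records with an HTML parser, the sketch below uses Python's standard-library `html.parser`. The page layout (table cells marked `class="field-<name>"`), the class `TrialRecordParser` and the function `scrape_record` are hypothetical; real registry pages have their own markup, and their terms of use and any official API should be checked before scraping.

```python
from html.parser import HTMLParser


class TrialRecordParser(HTMLParser):
    """Collects text from <td> cells whose class is "field-<name>".

    Hypothetical markup convention, not the layout of any real registry page.
    """

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current_field = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            css = dict(attrs).get("class") or ""
            if css.startswith("field-"):
                self._current_field = css[len("field-"):]

    def handle_endtag(self, tag):
        if tag == "td":
            self._current_field = None

    def handle_data(self, data):
        text = data.strip()
        if self._current_field and text:
            prev = self.record.get(self._current_field)
            # Join multiple text chunks in one cell with a single space.
            self.record[self._current_field] = f"{prev} {text}" if prev else text


def scrape_record(html_text):
    """Parse one record page and return {field_name: text}."""
    parser = TrialRecordParser()
    parser.feed(html_text)
    parser.close()
    return parser.record


# Hypothetical page content, used only to exercise the parser.
SAMPLE_PAGE = """
<html><body><table>
  <tr><td class="field-trial_id">TRIAL-0001</td></tr>
  <tr><td class="field-title">Example study title</td></tr>
  <tr><td class="field-status">Recruiting</td></tr>
</table></body></html>
"""

if __name__ == "__main__":
    print(scrape_record(SAMPLE_PAGE))
```

In practice the HTML would come from an HTTP fetch of each record page; keeping parsing separate from fetching makes the parser easy to test offline.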
Data Linking, Preparation - Techniques and Tools
- Strong ids - email id, phone number, Aadhaar/government id number, patient registration number
- Weak ids - full name
- Additional ids - trial id, referring physician's email
- NLP techniques to identify ontology terms and their corresponding codes
- Parse metadata to extract ids, e.g., from a DICOM image header
- Image/video/audio segmentation
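The strong/weak id distinction can be sketched as deterministic record linkage: normalise each identifier, link automatically on a strong-id match, and only flag a weak (name-only) match for review. The `Record` fields, the normalisation rules (lower-cased email, last 10 phone digits) and the `link` function are assumptions for illustration, not a prescribed matching algorithm.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    source: str
    email: str = ""
    phone: str = ""
    full_name: str = ""


def normalize_email(email):
    return email.strip().lower()


def normalize_phone(phone):
    # Digits only, compared on the last 10 digits (assumed national format),
    # so "+91 98480 12345" and "098480-12345" produce the same key.
    return re.sub(r"\D", "", phone)[-10:]


def normalize_name(name):
    # Case-insensitive, collapse repeated whitespace.
    return " ".join(name.lower().split())


def link(a, b):
    """Return (strength, matched_field) or (None, None) if nothing matches."""
    if a.email and b.email and normalize_email(a.email) == normalize_email(b.email):
        return "strong", "email"
    pa, pb = normalize_phone(a.phone), normalize_phone(b.phone)
    if pa and pb and pa == pb:
        return "strong", "phone"
    if a.full_name and b.full_name and normalize_name(a.full_name) == normalize_name(b.full_name):
        # Weak id: names collide often, so flag for manual review, don't auto-merge.
        return "weak", "full_name"
    return None, None


if __name__ == "__main__":
    hospital = Record("hospital", email="A.Patient@Example.com", full_name="Asha  Patient")
    lab = Record("lab", email="a.patient@example.com")
    print(link(hospital, lab))
```

A production linker would typically add the additional ids (trial id, referring physician) as tie-breakers and keep an audit trail of why two records were merged.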
Access, Reporting and Collaboration - Patterns
- Direct web access to the cloud storage services
- Time-bound, permission-based access sharing, e.g.:
  - Read-only
  - Read-only with permission to annotate
- Bookmark content, generate permalinks, share them, and gather comments/annotations
- Notify all stakeholders as changes are collected
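Time-bound, permission-based sharing is often implemented with signed, expiring links (cloud object stores offer pre-signed URLs for this). The sketch below shows the underlying idea with an HMAC over resource, permission and expiry. The key, the URL parameters and the two permission names are hypothetical; a real deployment would use the storage provider's own signing mechanism and a managed secret.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"  # hypothetical key
PERMISSIONS = {"read", "read+annotate"}  # hypothetical permission names


def _signature(resource, permission, expires):
    msg = f"{resource}|{permission}|{expires}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_share_link(base_url, resource, permission, ttl_seconds, now=None):
    """Build a link granting `permission` on `resource` for `ttl_seconds`."""
    if permission not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({
        "resource": resource,
        "perm": permission,
        "expires": expires,
        "sig": _signature(resource, permission, expires),
    })
    return f"{base_url}?{query}"


def check_share_link(url, required_permission, now=None):
    """True if the link is untampered, unexpired and grants the permission."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    try:
        resource, perm, sig = params["resource"], params["perm"], params["sig"]
        expires = int(params["expires"])
    except (KeyError, ValueError):
        return False
    now = int(time.time()) if now is None else now
    if now > expires:
        return False
    expected = _signature(resource, perm, expires)
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    # "read+annotate" implies "read"; plain "read" does not allow annotating.
    return required_permission == "read" or perm == required_permission


if __name__ == "__main__":
    print(make_share_link("https://example.org/share", "study/42/series/7", "read", 3600))
```

Because the permission and expiry are inside the signed message, editing either in the URL invalidates the signature.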
Data Analytics - Techniques and Tools
- SQL-based interactive queries
- Descriptive statistics using tools such as R
- Predictive analytics using machine learning services
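A small sketch of the first two techniques: an interactive-style SQL aggregate followed by descriptive statistics. The slide names R; to keep a single language here, the sketch substitutes Python's `sqlite3` and `statistics` modules. The table `lab_result` and its rows are hypothetical sample data.

```python
import sqlite3
import statistics

# Hypothetical, de-identified lab results: (patient_id, test, value).
ROWS = [
    ("P1", "HbA1c", 6.1),
    ("P2", "HbA1c", 7.4),
    ("P3", "HbA1c", 5.6),
    ("P4", "HbA1c", 8.2),
    ("P1", "LDL", 130.0),
    ("P2", "LDL", 95.0),
]


def connect_demo():
    """Open an in-memory database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lab_result (patient_id TEXT, test TEXT, value REAL)")
    conn.executemany("INSERT INTO lab_result VALUES (?, ?, ?)", ROWS)
    return conn


def counts_by_test(conn):
    """SQL aggregate: number of results and mean value per test."""
    query = ("SELECT test, COUNT(*), AVG(value) FROM lab_result "
             "GROUP BY test ORDER BY test")
    return {test: (n, avg) for test, n, avg in conn.execute(query)}


def describe(conn, test):
    """Descriptive statistics for one test, computed client-side."""
    values = [v for (v,) in conn.execute(
        "SELECT value FROM lab_result WHERE test = ?", (test,))]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    conn = connect_demo()
    print(counts_by_test(conn))
    print(describe(conn, "HbA1c"))
```

Aggregates that SQL engines compute natively (counts, means) stay in the query; statistics the engine may lack, such as the standard deviation in SQLite, are computed after fetching.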
Standards
- Storage - DICOM
- Access & Interoperability - FHIR
- Exchange - HL7 v2, HL7 v3, EDI (X12), LOINC, CDISC ODM
- Stacks - SMART on FHIR
- Ontologies - ICD-10, SNOMED CT, UMLS, MeSH
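To show how an exchange standard and an access standard meet, the sketch below splits an HL7 v2 message into segments and fields (segments separated by carriage returns, fields by `|`, components by `^`) and maps the PID segment to a FHIR `Patient` resource. The sample message and the mapping are simplified assumptions: escape sequences, repetitions (`~`) and `identifier.system` URIs are not handled, and a production interface would use a full HL7 v2 library.

```python
# Hypothetical sample ADT message (MSH header + PID segment).
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202401150930||ADT^A04|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JOHN||19800101|M",
])

# HL7 v2 administrative sex codes mapped to FHIR gender codes (subset).
GENDER = {"M": "male", "F": "female", "O": "other", "U": "unknown"}


def parse_segments(message):
    """Return {segment_id: [list of field lists]}.

    For non-MSH segments fields[n] is field n (e.g. PID-5 is fields[5]);
    for MSH, fields[n] is MSH-(n+1) because MSH-1 is the '|' itself.
    """
    segments = {}
    for line in message.replace("\n", "\r").split("\r"):
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments


def pid_to_fhir_patient(pid):
    """Map PID-3 (id), PID-5 (name), PID-7 (birth date), PID-8 (sex)."""
    def field(n):
        return pid[n] if len(pid) > n else ""

    id_parts = field(3).split("^")
    name_parts = field(5).split("^")
    patient = {
        "resourceType": "Patient",
        "identifier": [{"value": id_parts[0]}],
        "name": [{"family": name_parts[0],
                  "given": [g for g in name_parts[1:2] if g]}],
        "gender": GENDER.get(field(8), "unknown"),
    }
    dob = field(7)
    if len(dob) >= 8:  # YYYYMMDD -> FHIR date YYYY-MM-DD
        patient["birthDate"] = f"{dob[0:4]}-{dob[4:6]}-{dob[6:8]}"
    return patient


if __name__ == "__main__":
    segs = parse_segments(SAMPLE_MESSAGE)
    print(segs["MSH"][0][8])  # message type (MSH-9)
    print(pid_to_fhir_patient(segs["PID"][0]))
```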
Data Lifecycle - Needs and Considerations
Data Collection → Data Cleansing → Data Linking/Preparation/Enrichment → Data Processing/Analysis → Access, Report, Collaborate → Data Archival
- Data collection - get the data from external data sources
- Data cleansing - audit and filter the data; report outliers
- Data preparation - integrate the various data sets, then enrich and transform them into a core data model
- Data processing - run interactive queries, generate summary statistics, and apply machine learning algorithms based on statistical models
- Access, report, collaborate - generate off-the-shelf and custom data analysis reports based on the processed data and/or the core data model
- Archive/purge data - archive data according to the data retention policy and/or purge it according to contractual agreements
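Two lifecycle steps lend themselves to a short sketch: cleansing (audit and filter, then report outliers) and the archive/purge decision. The outlier rule used here is Tukey's fences (1.5 × IQR beyond the quartiles), chosen as one common option rather than a method the slides prescribe; the plausibility range and the retention periods are hypothetical policy values.

```python
import statistics
from datetime import date


def cleanse(values, low=None, high=None):
    """Audit and filter one numeric field.

    Drops missing and implausible values (outside [low, high]) and reports,
    without removing, values outside Tukey's fences (1.5 * IQR).
    Returns (valid_values, dropped_count, outliers).
    """
    valid = [v for v in values
             if v is not None
             and (low is None or v >= low)
             and (high is None or v <= high)]
    dropped = len(values) - len(valid)
    q1, _, q3 = statistics.quantiles(valid, n=4)  # needs at least 2 values
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in valid if v < lo_fence or v > hi_fence]
    return valid, dropped, outliers


def retention_action(last_activity, today, archive_after_days, purge_after_days=None):
    """Return "keep", "archive" or "purge" for a record, by age in days."""
    age = (today - last_activity).days
    if purge_after_days is not None and age >= purge_after_days:
        return "purge"  # e.g. a contractual deletion deadline has passed
    if age >= archive_after_days:
        return "archive"  # move to cold storage per the retention policy
    return "keep"


if __name__ == "__main__":
    # Hypothetical systolic blood pressure readings (mmHg).
    readings = [118, 122, 125, None, 130, 121, 119, 210, 124, 0]
    print(cleanse(readings, low=50, high=300))
    print(retention_action(date(2022, 6, 1), date(2024, 1, 1), 365, 3650))
```

Outliers are reported rather than deleted because an extreme but plausible clinical value may be real; the decision to exclude it belongs to the analyst.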
Architecture
Q & A
References
Papers
- SMART - The Substitutable Medical Applications, Reusable Technologies Platform
  - Paper 1: The SMART Platform: early experience enabling substitutable applications for electronic health records
  - Paper 2: SMART on FHIR: a standards-based, interoperable apps platform for electronic health records
- FHIR - An illustrative usage in Radiology
- AWS - Reference Architecture for Healthcare Data
- AWS - Architecting for HIPAA in the Cloud