Joel Saltz MD, PhD Director Center for Comprehensive Informatics

Joel Saltz MD, PhD Director Center for Comprehensive Informatics
Data Archive Issues: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data Joel Saltz MD, PhD Director Center for Comprehensive Informatics

Objectives of Integrative Analysis
Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology) Integration of anatomic/functional characterization with multiple types of “omic” information Data modeling standards, semantics Curation and propagation of results

Informatics Approach Integrative analysis of complementary data sources to improve ability to forecast survival & response Leverage public “omic” datasets Create, analyze microscopy and Radiology datasets Radiology Imaging Patient Outcome “Omic” Data Pathologic Features

In Silico Center for Brain Tumor Research
Specific Aims: Influence of necrosis/ hypoxia on gene expression and genetic classification. Molecular correlates of high resolution nuclear morphometry. Gene expression profiles that predict glioma progression. 4. Molecular correlates of MRI enhancement patterns.

In Silico Center for Brain Tumor Research
Key Public Data Sets REMBRANDT: Gene expression and genomics data set of all glioma subtypes The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology Vasari Feature Set: Standardized annotation of gliomas of all subtypes

TCGA Research Network Digital Pathology Neuroimaging

Pathology Image Analysis and in Silico Example: Distinguishing Among the Gliomas
“There are also many cells which appear to be transitions between gigantic oligodendroglia and astrocytes. It is impossible to classify them as belonging in either group” Bailey P, Bucy PC. Oligodendrogliomas of the brain. J Pathol Bacteriol 1929: 32:735

Nuclear Qualities Oligodendroglioma Astrocytoma

Interaction between gene expression classification and Pathology Characterization
Hierarchical clustering of 176 Rembrandt samples using TCGA classification genes defines four major subtypes. Proneural Neural Mesenchymal Classical (Lee Cooper and Carlos Moreno)

Predicting Recurrence/Survival
Lee Cooper Carlos Moreno

TCGA: GBM Frozen Sections
179 cases were assessed for % necrosis on frozen section slides for TCGA quality assurance. Cox-based regression analysis for % necrosis vs. gene expression (795 probe sets; 647 distinct genes)

Network Analysis based on % Necrosis
Carlos Moreno David Gutman

Identify correlates of MRI enhancement patterns in astrocytic neoplasms with underlying vascular changes and gene expression profiles. Rim-enhancement Vascular Changes Rapid progression No enhancement Normal Vessels Stable lesion ?

Angiogenesis Segmentation
H&E Image Color Deconvolution Hematoxylin Eosin Eosin intensity image Eosin Image Spatial Norm. Density Image Calculation Boundary Smoothing Density Image Object ID Segmented Vessels Angiogenic Segmentation

Sharing and Archiving Multi-scale Integrative In Silico Experiments
Metadata Identifiers Federated Archives Semantic Query Support 9/17/20189/17/2018

Metadata Precisely and unambiguously describe an in silico experiment (reproducibility and reuse for new analyses) Experiment design and protocol Input datasets: context, characteristics, provenance Analysis methods and workflows: at different granularities Analysis results Semantic information: datasets, results, workflows Standards based, common data elements, and controlled vocabularies Versioned, scalable, and computable semantics Different metadata models for different data types Linkages through backbone models, common attributes, ontologies 9/17/20189/17/2018

Metadata: Semantic Information
Data and semantic models to represent virtual slide related image, annotation, markup and feature information. context relating to patient data, specimen preparation, special stains etc, human observations involving Path classification and characteristics and algorithm and human-described segmentations, features and classifications. Semantic description of the computation being carried out, semantic and concrete identification of input and output datasets. Desirable to have computable semantics 9/17/20189/17/2018

Identifiers Databases of in silico experiments may be updated over time Data may be moved to different archives Globally uniquely identify an in silico experiment and its components Should be possible to validate an identifier Scalable mechanisms for generation, assignment, and resolution of identifiers Leverage standards such as IHE PIX, LSID, URI 9/17/20189/17/2018

Federated Archives May not be possible to store in silico experiments and its components in a centralized location Multiple types of input data and results may be hosted in different systems caArray AIM and PAIS services PACS and NBIA Interoperability is critical Common interfaces Common information models: common data elements and controlled vocabularies 9/17/20189/17/2018

Semantic Query Capabilities
Ability to query metadata on experiments Ability to query data and analysis results Analysis results are semantically complex Features, shapes, annotations Use of domain ontologies Scalable mechanisms for query Large volumes of analysis results Millions of nuclei in high-resolution microscopy images 9/17/20189/17/2018

Challenges Structural and semantic metadata management: how to manage tradeoff between flexibility and curation Semantic modeling infrastructures and policies able to scale to handle distributed systems with an aggregate of 109 or more data models/concepts. How to curate changes to ontologies/data models and how to curate instance data in face of changing semantic models

Challenges Describe, store and replay analyses
Modeling needs to include a semantic description of the computation being carried out, curation/semantic description of input and output datasets. Identifiers and version control Computer assisted annotation and markup for very large datasets - large collection of cascaded algorithm/parameter dependencies challenge reproducibility

Thanks to: In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director) caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)

Thanks!

Joel Saltz MD, PhD Director Center for Comprehensive Informatics

Similar presentations

Presentation on theme: "Joel Saltz MD, PhD Director Center for Comprehensive Informatics"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Joel Saltz MD, PhD Director Center for Comprehensive Informatics

Similar presentations

Presentation on theme: "Joel Saltz MD, PhD Director Center for Comprehensive Informatics"— Presentation transcript:

Similar presentations

About project

Feedback