Download presentation
Presentation is loading. Please wait.
Published byRoy King Modified over 9 years ago
1
Domain agnostic tools for multi- scale/integrative sensor data analysis Joel Saltz MD, PhD Stony Brook University
2
Center for Comprehensive Informatics Integrative Biomedical Informatics Analysis Reproducible anatomic/functional characterization at fine level (Pathology) and gross level (Radiology) High throughput multi- scale image segmentation, feature extraction, analysis of features Integration of anatomic/functional characterization with multiple types of “omic” information Radiology Imaging Patient Outcome Pathologic Features “Omic” Data
3
Center for Comprehensive Informatics Overview Pathology Computer Aided Diagnosis Integrative analysis of tissue: pathology, radiology, ‘omics’ and outcome Management, query, analysis of integrative data High end Computing tools for multi-scale analysis Electronic health data: analytics, tools for Clinical phenotype characterization, population health
4
Pathology Computer Assisted Diagnosis Gurcan, Shamada, Kong, Saltz
5
Neuroblastoma Classification FH: favorable histology UH: unfavorable histology CANCER 2003; 98:2274-81 <5 yr Schwannian Development ≥50% Grossly visible Nodule(s) absent present Microscopic Neuroblastic foci absent present Ganglioneuroma (Schwannian stroma-dominant) Maturing subtype Mature subtype Ganglioneuroblastoma, Intermixed (Schwannian stroma-rich) FH Ganglioneuroblastoma, Nodular (composite, Schwannian stroma-rich/ stroma-dominant and stroma-poor) UH/FH* Variant forms* None to <50% Neuroblastoma (Schwannian stroma-poor) Poorly differentiated subtype Undifferentiated subtype Differentiating subtype Any age UH ≥200/5,000 cells Mitotic & karyorrhectic cells 100-200/5,000 cells <100/5,000 cells Any age ≥1.5 yr <1.5 yr UH FH ≥200/5,000 cells 100-200/5,000 cells <100/5,000 cells Any ageUH ≥1.5 yr <1.5 yr ≥5 yr UH FH UH FH
6
Computerized Classification System for Grading Neuroblastoma Background Identification Image Decomposition (Multi- resolution levels) Image Segmentation (EMLDA) Feature Construction (2 nd order statistics, Tonal Features) Feature Extraction (LDA) + Classification (Bayesian) Multi-resolution Layer Controller (Confidence Region) No Yes Image Tile Initialization I = L Background? Label Create Image I (L) Segmentation Feature Construction Feature Extraction Classification Segmentation Feature Construction Feature Extraction Classifier Training Down-sampling Training Tiles Within Confidence Region ? I = I -1 I > 1? Yes No TRAINING TESTING
8
Center for Comprehensive Informatics INTEGRATIVE ANALYSIS OF TISSUE: PATHOLOGY, RADIOLOGY, ‘OMICS’ AND OUTCOME
9
Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD= Joel Saltz)
10
Using TCGA Data to Study Glioblastoma Diagnostic Improvement Molecular Classification Predictors of Progression
11
Digital Pathology Neuroimaging TCGA Network
12
Center for Comprehensive Informatics Morphological Tissue Classification Nuclei Segmentation Cellular Features Lee Cooper, Jun Kong Whole Slide Imaging
13
Millions of Nuclei Defined by n Features Top-down analysis: use the features with existing diagnostic constructs
14
TCGA Whole Slide Images Jun Kong Step 1: Nuclei Segmentation Identify individual nuclei and their boundaries
15
Nuclear Analysis Workflow Describe individual nuclei in terms of size, shape, and texture Step 2: Feature Extraction Step 1: Nuclei Segmentation
16
Oligodendroglioma Astrocytoma Nuclear Qualities 110 Step 3: Nuclei Classification
17
Survival Analysis Human Machine
18
Gene Expression Correlates of High Oligo-Astro Ratio on Machine-based Classification Oligo Related Genes Myelin Basic Protein Proteolipoprotein HoxD1 Nuclear features most Associated with Oligo Signature Genes: Circularity (high) Eccentricity (low)
19
Millions of Nuclei Defined by n Features Bottom-up analysis: let nuclear features define and drive the analysis
20
Center for Comprehensive Informatics Lee Cooper, Carlos Moreno
21
Center for Comprehensive Informatics Consensus clustering of morphological signatures Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients Each possibility evaluated using 2000 iterations of K- means to quantify co-clustering Nuclear Features Used to Classify GBMs
22
Center for Comprehensive Informatics Clustering identifies three morphological groups Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides) Named for functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB) Prognostically-significant (logrank p=4.5e-4)
23
Center for Comprehensive Informatics Associations
24
Molecular and Pathology Correlates of MR Features Using TCGA Data MRIs of TCGA GBMs reviewed by 3-6 neuroradiologists using VASARI feature set and In Vivo Imaging tools MR Features compared to TCGA Transcriptional Classes, Genetic Alterations and Pathology NCI/in silico group led by Adam Flanders
25
VASARI Feature Set
26
26 Principal Investigator and Director: Haian Fu Co-Directors: Fadlo R. Khuri, Joel Saltz Project Manager: Margaret Johns Aim 1 Leader Yuhong Du Aim 2 Leader Carlos Moreno Cancer genomics- based HT PPI network discovery & validation Genomics informatics and data integration Emory CTD2 Center: High throughput protein-protein interaction interrogation in cancer Winship Cancer Institute Center for Comprehensive Informatics Emory Chemical Biology Discovery Center Emory Molecular Interaction Center for Functional Genomics (MicFG)
27
Center for Comprehensive Informatics MULTI-SCALE IMAGING: INTEGRATED STRUCTURE AND MOLECULAR CHARACTERIZATION Rich morphological and molecular characterizations of macroscopic tissue samples at microscopic resolution
29
Quantum Dot Immunohistochemistry, LCM + NGS, Imaging Mass Spec Imaging Excellent Spatial Resolution Limited Molecular Resolution Genomics Excellent Molecular Resolution Limited Spatial Resolution 1000’s of genes
30
Center for Comprehensive Informatics Integrative Multi-scale Biomedical Informatics Quantitative analyses of the interplay between morphology and spatially mapped genetics and molecular data to be used in studies that predict outcome and response to treatment Assemble, visualize and quantify detailed, multi- scale descriptions of tissue morphologic changes originating from a wide range of microscopy instruments Create/adapt computational and pattern recognition tools to integrate these descriptions with corresponding genomic, proteomic, glycomic, and clinical signatures.
31
Center for Comprehensive Informatics Driving Biomedical Problems Human: Lung Cancer Heterogeneity and Targeted Therapy (Khuri, Marcus) Human: Gastrointestinal Cancer Risk Stratification and Prevention (Bostick, Baron) Human and Mouse model: Glioma Microenvironment and Systems Biology (Brat, Mikkelsen) Mouse model: Role of PTEN in the orchestrated sequence of events, leading to tumor initiation (Leone) Mouse model: Role of Tn, STn tumor antigens in cancer initiation and progression, the impact of tissue-type specific alternations in Cosmc and the impact of altered expression of T-synthase (Cummings)
32
Correlating Imaging Phenotypes with Genomic Signatures: Scientific Opportunities Tumor heterogeneity Multiple definitions: Genetic, epigenetic heterogeneity within tumor Differences in microenvironments within tumor Phenome differences within tumor Heterogeneity involving primary and metastases Characterization: Imaging phenotype (radiology, pathology, optical…) Molecular phenotype Spatially characterized molecular phenotype (Laser captured microdissection, imaging mass spec, molecular imaging) …
33
Correlating Imaging Phenotypes with Genomic Signatures: Scientific Opportunities Clinical Approach and Use Development of imaging+analysis methods to characterize heterogeneity within a tumor at one time point evolution over time among different tumor types Development of imaging metrics that: can predict and detect emergence of resistance? correlates with genomic heterogeneity? correlates with habitat heterogeneity? can identify more homogeneous sub-types
34
Center for Comprehensive Informatics MANAGEMENT, QUERY, ANALYSIS OF INTEGRATIVE DATA Radiology Imaging Patient Outcome Pathologic Features “Omic” Data
35
Center for Comprehensive Informatics Large Scale Spatial Query, Analysis and Data Management Highly optimized spatial query and analyses Hadoop/HDFS, IBM DB2, optimized CPU/GPU spatial algorithms Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc. Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships Supported by two NLM R01 grants – Saltz/Foran PAIS Database Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide) Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc. Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships Support for high-level data statistical analysis
36
Center for Comprehensive Informatics Spatial Centric – Pathology Imaging “GIS” Point query: human marked point inside a nucleus. Window query: return markups contained in a rectangle Spatial join query: algorithm validation/comparison Containment query: nuclear feature aggregation in tumor regions Fusheng Wang
37
Center for Comprehensive Informatics VLDB 2012, 2013 Spatial Query, Change Detection, Comparison, and Quantification
38
Center for Comprehensive Informatics HIGH END COMPUTING TOOLS FOR MULTI-SCALE ANALYSIS Partnership with Oak Ridge National Laboratory (collaborators -- Scott Klasky, Jeff Vetter ) Also, aka Big Data
39
Macroscopic 3-D Tissue at Micron Resolution: OSU BISTI NBIB Center Big Data (2005) Associate genotype with phenotype Big science experiments on cancer, heart disease, pathogen host response Tissue specimen -- 1 cm 3 0.3 μ resolution – roughly 10 13 bytes Molecular data (spatial location) can add additional significant factor; e.g. 10 2 Multispectral imaging, laser captured microdissection, Imaging Mass Spec, Multiplex QD Multiple tissue specimens; another factor of 10 3 Total: 10 18 bytes – exabyte per big science experiment
40
Center for Comprehensive Informatics Integrate Information from Sensors, Images, Cameras Multi-dimensional spatial-temporal datasets – Radiology and Microscopy Image Analyses – Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution Remediation – Biomass monitoring and disaster surveillance using multiple types of satellite imagery – Weather prediction using satellite and ground sensor data – Analysis of Results from Large Scale Simulations – Square Kilometer Array – Google Self Driving Car Correlative and cooperative analysis of data from multiple sensor modalities and sources Equivalent from standpoint of data access patterns – we propose a integrative sensor data mini-App
41
Center for Comprehensive Informatics Core Transformations Data Cleaning and Low Level Transformations Data Subsetting, Filtering, Subsampling Spatio-temporal Mapping and Registration Object Segmentation Feature Extraction Object/Region/Feature Classification Spatio-temporal Aggregation Change Detection, Comparison, and Quantification
42
Center for Comprehensive Informatics Runtime Support Objectives - (Similar to what is required for most applications discussed today!) Coordinated mapping of data and computation to complex memory hierarchies Hierarchical work assignment with flexibility capable of dealing with data dependent computational patterns, fluctuations in computational speed associated with power management, faults Linked to comprehensible programming model – model targeted at abstract application class but not to application domain (In the sensor, image, camera case -- Region Templates) Software stack including coordinated compiler/runtime support/autotuning frameworks
43
Center for Comprehensive Informatics HPC Segmentation and Feature Extraction Pipeline Tony Pan, George Teodoro, Tahsin Kurc and Scott Klasky
44
Center for Comprehensive Informatics ELECTRONIC HEALTH DATA: ANALYTICS, TOOLS FOR CLINICAL PHENOTYPE CHARACTERIZATION, POPULATION HEALTH Andrew Post, Sharath Cholleti, Doris Gao, Joel Saltz, Bill Bornstein Emory David Levine, Sam Hohmann, UHC
45
Center for Comprehensive Informatics Find hot spots in readmissions within 30 days –Fraction of patients with a given principal diagnosis will be readmitted within 30 days? –Fraction of patients with a given set of diseases will be readmitted within 30 days? –How does severity and time course of co-morbidities affect readmissions? –Geographic analyses Compare and contrast with UHC Clinical Data Base –Repeat analyses across all 180+ UHC hospitals –Hospital to hospital differences –Ability to predict readmissions across hospitals Need a repeatable process that we can apply identically to both local and UHC data Clinical Phenotype Characterization and the Emory Analytic Information Warehouse
46
Center for Comprehensive Informatics 5-year Datasets from Emory and University Healthcare Consortium EUH, EUHM and WW (inpatient encounters) Removed encounter pairs with chemotherapy and radiation therapy readmit encounters (CDW data) Encounter location (down to unit for Emory) Providers (Emory only) Discharge disposition Primary and secondary ICD9 codes Procedure codes DRGs Medication orders (Emory only) Labs (Emory only) Vitals (Emory only) Geographic information (CDW only + US Census and American Community Survey) Analytic Information Warehouse
47
Center for Comprehensive Informatics Geographic Analyses UHC Medicine General Product Line (#15) Analytic Information Warehouse
48
Center for Comprehensive Informatics Predictive Modeling for Readmission Random forests (ensemble of decision trees) –Create a decision tree using a random subset of the variables in the dataset –Generate a large number of such trees –All trees vote to classify each test example in a training dataset –Generate a patient-specific readmission risk for each encounter Rank the encounters by risk for a subsequent 30- day readmission Analytic Information Warehouse
49
Center for Comprehensive Informatics Emory Readmission Rates for High and Low Risk Groups Generated with Random Forest
50
Center for Comprehensive Informatics Predictive Modeling Applied to 180 UHC Hospitals Readmission fraction of top 10% high risk patients
51
Center for Comprehensive Informatics Summary and Perspective Large scale integrative data analytic methods and tools to integrate clinical, molecular, Pathology, Radiology data (happy to discuss Radiology aspects off line) Characterize new cancer subtypes and biomarkers, predict outcome, treatment response Algorithms to quantify Pathology classification HPC/BIGDATA analysis pipelines Generate and manage nuanced temporal summary of patients health status, co-morbidities, treatment, treatment response
52
Center for Comprehensive Informatics Importance: Biomedical: generate basic insights into pathophysiology, clues to new treatments, better ways of evaluating existing treatments and core infrastructure needed for comparative effectiveness research studies Computer Science: general approaches to analysis and classification of very large datasets from low dimensional spatio-temporal sensors
53
Center for Comprehensive Informatics Thanks to: In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director) caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co- Directors; Tahsin Kurc, Himanshu Rathod Emory leads caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers) NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi
54
Center for Comprehensive Informatics The AIW Team Stakeholders –Joel Saltz, MD, PhD – CCI Director and ACTSI BIP Director –William Bornstein, MD, PhD – Emory Healthcare Chief Quality Officer –Dee Cantrell, RN – Emory Healthcare CIO –Marc Overcash – Emory Deputy CIO of Research and Health Sciences IT Project Team –Andrew Post, MD, PhD – AIW Project Lead & CCI Clinical Informatics Architect –Terry Willey, RN – IS Director of Business Strategy/Planning –Richie Willard – Project Manager –Tahsin Kurc, PhD – CCI Chief Software Architect –Sharath Cholleti, PhD – Research Scientist –Jingjing Gao, PhD – Biostatistician –Michel Mansour – Software Engineer –Himanshu Rathod – Software Engineer –Mike Torian – Data Warehouse Engineer –Michael Brown – Software Engineer –Geoff Milton – Software Engineer –Akshatha Kalsanka Pai – Software Engineer Analytic Information Warehouse
55
Thanks!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.