Page 1 Informatics Pilot Project
EDRN Knowledge System Working Group
San Antonio, Texas
January 21, 2001
Steve Hughes, Thuy Tran, Dan Crichton
Jet Propulsion Laboratory, California Institute of Technology
National Aeronautics and Space Administration
Page 2 Problem Definition and Proposal
- Overview
  - Problem: Specimen data is geographically distributed across heterogeneous data systems, making the location, retrieval and use of this data difficult.
  - Solution: Build a “data architecture” for the EDRN network
    - Use “metadata” as a key to interoperability
    - Provide services for data sharing, archiving and distribution
    - Provide a software framework that allows analysis tools to be plugged into the EDRN data enterprise
  - Benefit: Correlating data across multiple centers affords an opportunity to create new data sets and data awareness
    - Example: Find all prostate tissue samples for men ages 70 and older collected before 1980 from databases across the EDRN
Page 3 EDRN Data Architecture Evolution
Data System Evolution (from Locally Centralized Data to Interoperable & Distributed Databases)
- Local Database
  - Local Tools
  - No Data Sharing between Centers
  - No Common Data Elements
- Limited Data Sharing
  - Manual Data Sharing
  - Manual Correlation
  - Export/Import Data
  - Limited CDEs
- Full Data Sharing
  - Location Independence
  - Data Interchange
  - Data Sharing
  - Common CDEs between centers
  - Heterogeneous Systems
Page 4 Completed Steps for the Mockup Implementation
- Extracted Data from Partner Centers
  - Moffitt and San Antonio provided sample data sets to the DMCC and JPL
  - Used “synthesized” data in lieu of “sensitive” data
  - Preserved the original data structures provided by the centers
- Mapped Data Dictionary Terms
  - Mapped common models between the EDRN CDE, Moffitt and San Antonio for correlating data sets
  - Developed “Profiles” that represent data resources for San Antonio, Moffitt, DMCC, EDRN and NCI
- Hosted data and metadata “profiles” at JPL
- Integrated with an existing data sharing software framework developed by JPL called “OODT” or Object Oriented Data Technology
  - Framework developed to share space science datasets across NASA’s distributed Planetary Data System
- Built a user interface to demonstrate a use case scenario for interoperability and data sharing between the databases
Page 5 Goals for the Mockup Implementation
- Demonstrate the Return on Investment (ROI) achieved in “federating” (or linking) laboratory data systems together
  - Identify a scenario that demonstrates usability such as providing generic support for specimen data location and retrieval
- Use metadata (or profiles)
  - “Recipes” to describe what data (specimen) and resources are available
  - Communicate across systems
- Adoption of EDRN CDEs
  - Look for common models between systems
  - Understand how to relate center-specific metadata models
- Look for “low hanging” fruit
  - Centers with similar databases and data models
Page 6 EDRN Knowledge Architecture Mockup Implementation at JPL
[Architecture diagram. Labeled components:]
- EDRN “Mock” Query Interface
- OODT Middleware: Hosted at JPL
  - Query Manager
  - Metadata Profiles
  - San Antonio Product Exchange Server
  - Moffitt Product Exchange Server
- EDRN Mock Databases Hosted at JPL: San Antonio, Moffitt
[Flow labels:]
- In: Query, Out: Identified Resources
- In: Query, Out: Data Products (shown three times)
Page 7 Profile CDE Integration
- Describe specimen data, data servers, and other resources using metadata “profiles”
  - Use Common Data Element (CDE) set for specimen description and search attributes
  - Use industry standard metadata terminology such as Dublin Core
- Example Metadata Profiles:
  - Mockup EDRN H. Lee Moffitt Cancer Center Product Server
  - Mockup EDRN University of Texas, San Antonio Product Server
  - Mockup EDRN DMCC Fred Hutchinson Cancer Research Center Query Interface
  - Mockup EDRN DMCC Fred Hutchinson Cancer Research Center Web Site
  - Early Detection Research Network Web Site
  - EDRN Data Management and Coordinating Center Data Dictionary
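A minimal sketch of how a profile could advertise a resource so that a query can be routed to it. The slides do not show the actual profile format, so everything here is an assumption made for illustration: the `Profile` class, the CDE names (`ORGAN`, `GENDER`, `AGE`, `COLLECTION_YEAR`), and the `edrn.website` identifier. Only the three attribute names follow Dublin Core elements (Identifier, Title, Type).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    identifier: str                    # Dublin Core "Identifier"
    title: str                         # Dublin Core "Title"
    resource_type: str                 # Dublin Core "Type"
    elements: frozenset = frozenset()  # hypothetical searchable CDE names


# Illustrative profiles; the CDE names are assumptions, not the EDRN CDE set.
SPECIMEN_CDES = frozenset({"ORGAN", "GENDER", "AGE", "COLLECTION_YEAR"})
PROFILES = [
    Profile("edrn.moffitt",
            "Mockup EDRN H. Lee Moffitt Cancer Center Product Server",
            "product server", SPECIMEN_CDES),
    Profile("edrn.sanantonio",
            "Mockup EDRN University of Texas, San Antonio Product Server",
            "product server", SPECIMEN_CDES),
    Profile("edrn.website",
            "Early Detection Research Network Web Site",
            "web site"),
]


def resources_for(query_elements):
    """Return identifiers of profiles that advertise every queried element."""
    wanted = frozenset(query_elements)
    return [p.identifier for p in PROFILES if wanted and wanted <= p.elements]
```

With these assumed profiles, a query on `ORGAN` and `AGE` resolves to the two product servers, while the web site profile is skipped because it advertises no searchable elements.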
Page 8 Data Element Comparison Chart * As of 12/5/2000
Page 9 User Interface
- Provide a user interface to support various queries related to cancer specimen data:
  - Find all prostate tissue samples for all men collected from San Antonio and Moffitt databases
  - Find all prostate tissue samples for men ages 70 and older collected before 1980 from San Antonio and Moffitt databases sorted by Grade, Age, and Site
  - Find all breast tissue samples from women ages 50 and older from San Antonio* and Moffitt databases
  - Find all lung tissue samples from San Antonio* and Moffitt databases
* San Antonio database contains only prostate data
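The second query on this slide can be sketched as a filter followed by a three-key sort. This is a sketch only: the records are synthetic values invented for the example, and the field names (`site`, `organ`, `gender`, `age`, `year`, `grade`) are hypothetical rather than the mockup's actual data elements.

```python
# Synthetic records invented for illustration; field names are hypothetical.
SPECIMENS = [
    {"site": "Moffitt", "organ": "prostate", "gender": "M", "age": 74, "year": 1978, "grade": 2},
    {"site": "San Antonio", "organ": "prostate", "gender": "M", "age": 71, "year": 1975, "grade": 1},
    {"site": "San Antonio", "organ": "prostate", "gender": "M", "age": 65, "year": 1979, "grade": 3},
    {"site": "Moffitt", "organ": "prostate", "gender": "M", "age": 80, "year": 1985, "grade": 2},
    {"site": "Moffitt", "organ": "breast", "gender": "F", "age": 55, "year": 1990, "grade": 1},
]


def find_specimens(records, organ, gender=None, min_age=None, collected_before=None):
    """Filter by the given criteria, then sort by grade, age, and site."""
    hits = [
        r for r in records
        if r["organ"] == organ
        and (gender is None or r["gender"] == gender)
        and (min_age is None or r["age"] >= min_age)
        and (collected_before is None or r["year"] < collected_before)
    ]
    return sorted(hits, key=lambda r: (r["grade"], r["age"], r["site"]))


# "Prostate tissue samples for men ages 70 and older collected before 1980"
result = find_specimens(SPECIMENS, "prostate", gender="M",
                        min_age=70, collected_before=1980)
```

Criteria left as `None` are not applied, so the same function also covers the simpler queries, such as all lung samples from either database.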
Page 10 Key Challenges
- Local data dictionaries and associated data models
  - Different terms, data types, enumerated values, etc.
  - Different meanings and interpretations
- Different database product implementations
  - Filemaker Pro and Microsoft Access
  - Maintain the structural integrity of the data models
- EDRN CDEs exist for demographic data, but not specimen data*
  - JPL developed common CDEs between the two databases for the specimen data
* As of 12/5/2000
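A minimal sketch of reconciling different terms, data types, and enumerated values across two local dictionaries. The center dictionaries themselves are not shown in the slides, so every local field name, code value, and common element name below is hypothetical.

```python
# Hypothetical local-to-common mappings, invented for illustration.
MAPPINGS = {
    "moffitt": {
        "fields": {"Sex": "GENDER", "AgeAtCollection": "AGE"},
        "values": {"GENDER": {"M": "male", "F": "female"}},
        "types": {"AGE": int},
    },
    "sanantonio": {
        "fields": {"gender_cd": "GENDER", "pt_age": "AGE"},
        "values": {"GENDER": {1: "male", 2: "female"}},
        "types": {"AGE": int},
    },
}


def to_common(center, record):
    """Translate one local record into common element names and values."""
    mapping = MAPPINGS[center]
    common = {}
    for local_name, value in record.items():
        name = mapping["fields"].get(local_name)
        if name is None:
            continue  # no common element is defined for this local field
        # Translate enumerated codes, leaving unmapped values unchanged.
        value = mapping["values"].get(name, {}).get(value, value)
        # Coerce to the common data type when one is declared.
        convert = mapping["types"].get(name)
        common[name] = convert(value) if convert else value
    return common
```

Keeping the mapping as data outside the records leaves each center's original structure untouched, which matches the slide's point about maintaining the structural integrity of the data models.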
Page 11 Next Steps
- Focus the implementation of data sharing on defining a robust metadata infrastructure
  - Complete the EDRN CDE effort and begin a process of mapping the CDEs to the center databases
  - Reuse this mockup experience as an example!
- Incorporate feedback from mockup presentation
- Address IRB and security requirements related to data sharing
  - Encrypted and de-identified keys
  - Network and computer security access
- Connect to databases physically located at the centers
  - Implement data system interfaces to the remote databases
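The slide does not say how de-identified keys would be produced. One common approach, shown here purely as an assumption, is a keyed hash (HMAC-SHA256) of the local identifier, using a secret held by the originating center. The function name, the sample identifier, and the secret are all hypothetical.

```python
import hashlib
import hmac


def deidentified_key(local_id: str, secret: bytes) -> str:
    """Derive a stable pseudonymous key from a local identifier.

    The same (local_id, secret) pair always yields the same key, so records
    can still be correlated, but the key does not reveal the identifier.
    """
    return hmac.new(secret, local_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical identifier and secret, for illustration only.
key = deidentified_key("MRN-0001", b"center-held-secret")
```

Whether such a scheme satisfies a given IRB's requirements is a policy question that this sketch does not address.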
Page 12 Acknowledgements
- Lynn Anderson, H. Lee Moffitt Cancer Center
- Betsy Higgins, University of Texas, San Antonio
- Heather Kincaid, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center
- Mark Thornquist, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center
- Ziding Feng, Data Management and Coordinating Center, Fred Hutchinson Cancer Research Center
- Greg Downing, Office of Science Policy, Office of the Director, National Institutes of Health
- Sudhir Srivastava, National Cancer Institute
Page 13 Backup Slides
Page 14 EDRN Mockup Query Example
Page 15 EDRN Mockup Results – Query 1
Page 16 EDRN Mockup Results – Query 3
Page 17 EDRN Mockup Results – Query 4
Page 18 Detailed Search of Profiles
Page 19 Profiles of EDRN Resources
[Diagram relating Resource Profiles to EDRN Resources. Labeled items: EDRN Website, DMCC Website, DMCC Sample Interface, Moffitt Product Server, San Antonio Product Server, San Antonio Mockup DB, Moffitt Mockup DB]
Page 20 EDRN Mockup Data Flow
[Data flow diagram. Labeled components: Web, Search Web Page, Web server, search.jsp, QueryClient, Query Server, Profile Server jpl.edrn, Profile DB, Product Server edrn.moffitt, Moffitt “Mock” Database, Product Server edrn.sanantonio, San Antonio “Mock” Database, EDRN/NCI Resources]
[Message labels:]
- User query
- XSL (profiles or data products formatted)
- XMLQuery/IIOP (product search)
- XMLQuery/IIOP (profiles of resources to handle query)
- XMLQuery/IIOP (data results)
- XMLQuery/IIOP (profiles or data results as requested)
- XMLQuery/IIOP (no results) (shown twice)
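The labels on this slide suggest a two-step flow: the query server first asks the profile server which resources can handle a query, then sends a product search only to those servers and merges the data results. The sketch below illustrates that reading with plain in-process objects standing in for the XMLQuery/IIOP calls. It is not the OODT API: the class names, method names, query shape, and records are all hypothetical.

```python
class ProfileServer:
    """Knows which product servers can answer a query (via their profiles)."""

    def __init__(self, profiles):
        self.profiles = profiles  # server id -> set of organs it holds (assumed)

    def resources_for(self, query):
        return [sid for sid, organs in self.profiles.items()
                if query["organ"] in organs]


class ProductServer:
    """Wraps one center's database and returns matching records."""

    def __init__(self, records):
        self.records = records

    def search(self, query):
        return [r for r in self.records if r["organ"] == query["organ"]]


class QueryServer:
    """Routes a query using profiles, then merges the product results."""

    def __init__(self, profile_server, product_servers):
        self.profile_server = profile_server
        self.product_servers = product_servers

    def handle(self, query):
        results = []
        for sid in self.profile_server.resources_for(query):
            results.extend(self.product_servers[sid].search(query))
        return results


# Synthetic data; San Antonio holds only prostate records, as slide 9 notes.
profile_server = ProfileServer({
    "edrn.moffitt": {"prostate", "breast", "lung"},
    "edrn.sanantonio": {"prostate"},
})
query_server = QueryServer(profile_server, {
    "edrn.moffitt": ProductServer([
        {"site": "Moffitt", "organ": "prostate"},
        {"site": "Moffitt", "organ": "breast"},
    ]),
    "edrn.sanantonio": ProductServer([
        {"site": "San Antonio", "organ": "prostate"},
    ]),
})
```

Here `query_server.handle({"organ": "breast"})` contacts only the Moffitt product server, since the San Antonio profile does not advertise breast specimens.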