Informatics and the caTissue Wrapper for the Early Detection Research Network Chris A. Mattmann, Ph.D. Senior Computer Scientist Instrument Software/ Science.


1 Informatics and the caTissue Wrapper for the Early Detection Research Network
Chris A. Mattmann, Ph.D.
- Senior Computer Scientist, Instrument Software/Science Data Systems Section, NASA Jet Propulsion Laboratory
- Adjunct Assistant Professor, Computer Science Department, University of Southern California
- PMC Member, Lucene Project, Apache Software Foundation

2 Agenda
- Introduction
- What is the EDRN?
- Why is JPL involved?
- The EDRN Informatics Center
- ERNE - EDRN Resource Network Exchange
- Building the EDRN Knowledge Environment
- Questions
11-Jun-16 CAM-2 caBIG-May2010

3 The Early Detection Research Network
- EDRN is a network of 40+ institutions performing research geared towards the discovery of cancer biomarkers, which are early indicators of the onset of disease
- NCI/NIH-funded program
- NCI’s flagship program
- Recently renewed after the NIH Board of Scientific Advisors advised NIH of EDRN’s strategic value and existing successes
- A distributed informatics infrastructure connecting EDRN sites is at the core of the program

4 Types of information important in EDRN
- Specimen Inventories
  - At each site, map specimens collected (blood, sputum, etc.) to patient characteristics
- Studies and Protocols
  - Information about studies conducted in the EDRN and their results (publications, outputs)
- Biomarkers
  - Information about indicators of early disease
- Science Data
  - Outputs of experiments on specimens, regarding biomarkers, driven by particular studies and protocols

5 NASA’s Jet Propulsion Laboratory (JPL)
- Is a major national research and development (R&D) center supporting:
  - NASA programs
  - Defense programs
  - Civil programs of national importance compatible with JPL capabilities
- Currently 5000 employees located in Pasadena, CA on 177 acres
- Develops data- and software-intensive systems for mission and science programs

6 Where does JPL fit into NCI and Cancer Research?
- In 2001, JPL was hired in a consultant role by the Office of the Director at the U.S. NIH to support development of an informatics roadmap for biomarkers
- EDRN was seen as a pilot
  - Required a national architecture for bio-specimen sharing within NCI’s Early Detection Research Network (EDRN) program
- After success in linking together specimen databases from 9 sites distributed across the U.S., JPL and NCI created an interagency agreement and made JPL the PI for EDRN’s informatics center
- From this, the EDRN Informatics Center was born

7 The EDRN Informatics Center’s Role
The EDRN Informatics Center develops software solutions to support the Early Detection Research Network's research into cancer biomarkers and its development of cancer-fighting tools.
- Works closely with the EDRN Data Management and Coordinating Center (DMCC) at Fred Hutchinson Cancer Research Center and with NCI on informatics
- Defines the informatics architecture
- Deploys a national data grid for connecting distributed databases
- Develops the biomarker ontology
- Develops databases for biomarker research results

8 The original ERNE prototype
- ERNE: EDRN Resource Network Exchange
- Bio-specimen data grid system allowing users to search for bio-specimens located at distributed repositories throughout the U.S.
  - Moffitt Cancer Center, Tampa, Florida, USA
  - Fred Hutchinson Cancer Research Center, Seattle, USA
  - NYU Medical School
  - And 12 others
- Large success for EDRN and JPL
- ERNE was constructed using JPL’s Object Oriented Data Technology (OODT) framework
- Worked in large part due to the set of Common Data Elements (CDEs)
- ERNE was a pioneering project in terms of connecting highly distributed databases

9 What does OODT do?
- A data grid software infrastructure for constructing large-scale, distributed data-intensive systems
  - Java, C++, Perl, Python APIs
- Software available via the Apache Software Foundation (ASF)
  - Currently in the Apache Incubator
  - NASA’s first project to be hosted at Apache
  - http://incubator.apache.org/projects/oodt.html
- A set of canonical software components, connectors, and styles for designing data-intensive systems
- Deployed to planetary science, earth sciences, and biomedicine
- Runner-up, NASA Software of the Year (2003)

10 OODT Architectural Principles
- Division of Labor
  - Don’t make one component the workhorse!
- Technology Independence
  - Don’t lose out when a software vendor decides to charge you a lot of $$$ for their previously low-cost technology
- Metadata as a first-class citizen
  - Descriptions of resources come in handy
- Separation of software and data models
  - Allow each to evolve independently

11 How did we use OODT to build the initial ERNE prototype?

12 Integrating EDRN and non-EDRN Infrastructures
- The decoupling of the EDRN’s informatics architecture into well-defined components via OODT makes it easy to build interfaces to non-EDRN systems
  - Wrappers can be built to link non-EDRN systems
  - Translators can be constructed to deal with different semantic architectures
  - In the case of ERNE, a simple product server can negotiate between the non-EDRN system and EDRN
- caBIG
  - ERNE/caTissue Wrapper
- EDRN-Canary Collaboration
  - A cloud computing effort that shares raw science data via Amazon S3 between EDRN and the Canary group (using GenoLogics)
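
The wrapper and translator idea on this slide can be sketched roughly as follows. This is a toy illustration only, not OODT's actual product server API: `ProductServer`, `make_wrapper`, and the field names `SPECIMEN_TYPE` and `SITE` are all hypothetical names chosen for the example.

```python
class ProductServer:
    """Toy product server that forwards each query to its registered handlers."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        # A handler is any callable taking a dict of constraints and returning rows.
        self._handlers.append(handler)

    def query(self, constraints):
        results = []
        for handler in self._handlers:
            results.extend(handler(constraints))
        return results


def make_wrapper(records, translate):
    """Wrap a non-EDRN record store; `translate` converts a record to EDRN terms."""

    def handler(constraints):
        translated = (translate(record) for record in records)
        return [
            row for row in translated
            if all(row.get(key) == value for key, value in constraints.items())
        ]

    return handler


# Hypothetical external system with its own vocabulary.
external = [{"type": "Fluid", "site": "A"}, {"type": "Tissue", "site": "A"}]

server = ProductServer()
server.register(
    make_wrapper(
        external,
        lambda r: {"SPECIMEN_TYPE": r["type"].lower(), "SITE": r["site"]},
    )
)
hits = server.query({"SPECIMEN_TYPE": "fluid"})
```

In this sketch the translator is the only piece that knows the external vocabulary, so the server itself stays unchanged when another system is wrapped.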

13 caTissue Wrapper
- Recognized that sites were using caTissue to manage their specimens
  - UCSF (Esserman) was a pioneer in this regard
- Decided to build a reusable wrapper interface to easily export information from caTissue to ERNE
  - Plug-in to an ERNE product server
- Leveraged the EDRN CDEs (part of the EDRN ontology described earlier)
  - 26 in total
  - Categories
    - Specimen characteristics
    - Participant information
      - Demographics as well as 1st degree relatives
    - Site-specific details

14 ERNE CDE/Query Model
- Any CDE may be used in an ERNE query
- Any CDE may be returned from an ERNE query
- One row per matching specimen
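
A minimal sketch of this query model, assuming equality-only constraints. The CDE names (`SPECIMEN_ID`, `SPECIMEN_TYPE`, `PARTICIPANT_AGE`), the sample records, and the `erne_query` function are hypothetical and are not taken from ERNE itself.

```python
# Hypothetical specimen records keyed by CDE name; the real EDRN CDEs differ.
SPECIMENS = [
    {"SPECIMEN_ID": "S1", "SPECIMEN_TYPE": "blood", "PARTICIPANT_AGE": 61},
    {"SPECIMEN_ID": "S2", "SPECIMEN_TYPE": "sputum", "PARTICIPANT_AGE": 54},
    {"SPECIMEN_ID": "S3", "SPECIMEN_TYPE": "blood", "PARTICIPANT_AGE": 47},
]


def erne_query(specimens, constraints, return_cdes):
    """Return one row per specimen that satisfies every CDE constraint.

    Any CDE may appear in `constraints`, and any CDE may appear in `return_cdes`.
    """
    rows = []
    for specimen in specimens:
        if all(specimen.get(cde) == value for cde, value in constraints.items()):
            rows.append({cde: specimen.get(cde) for cde in return_cdes})
    return rows


rows = erne_query(
    SPECIMENS,
    {"SPECIMEN_TYPE": "blood"},
    ["SPECIMEN_ID", "PARTICIPANT_AGE"],
)
```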

15 caTissue Suite
- Object oriented
- Class CATISSUE_SPECIMEN
  - Subclasses CATISSUE_FLUID_SPECIMEN, _CELL_SPECIMEN, etc.
- Class CATISSUE_PARTICIPANT

16 caTissue to ERNE mapping process
- For each ERNE CDE:
  - Identify corresponding field(s) in class(es) in the caTissue Suite model
  - Create two functions:
    - To map a query constraint from ERNE to caTissue Suite
    - To map a return value from caTissue Suite to ERNE
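
The two-function pattern per CDE might look like the sketch below. The class name `CATISSUE_PARTICIPANT` comes from the slides; the CDE name `PARTICIPANT_GENDER`, the `GENDER` field, and the code values are assumptions made for illustration, and `CdeMapping` is a hypothetical helper rather than part of the actual wrapper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class CdeMapping:
    """Pairs the two mapping functions created for one ERNE CDE."""

    cde: str
    # ERNE constraint value -> (caTissue field, caTissue value)
    to_catissue_constraint: Callable[[Any], Tuple[str, Any]]
    # caTissue record -> ERNE return value
    to_erne_value: Callable[[Dict[str, Any]], Any]


# Assumed vocabularies on each side (hypothetical values).
ERNE_TO_CATISSUE_GENDER = {"M": "Male", "F": "Female"}
CATISSUE_TO_ERNE_GENDER = {v: k for k, v in ERNE_TO_CATISSUE_GENDER.items()}
GENDER_FIELD = "CATISSUE_PARTICIPANT.GENDER"  # field name is an assumption

gender_mapping = CdeMapping(
    cde="PARTICIPANT_GENDER",
    to_catissue_constraint=lambda value: (GENDER_FIELD, ERNE_TO_CATISSUE_GENDER[value]),
    to_erne_value=lambda record: CATISSUE_TO_ERNE_GENDER.get(record.get(GENDER_FIELD)),
)

constraint = gender_mapping.to_catissue_constraint("F")
returned = gender_mapping.to_erne_value({GENDER_FIELD: "Male"})
```

Keeping the two directions as separate functions lets a query be translated on the way in and each result translated on the way out, without either side knowing the other's vocabulary.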

17 caTissue ERNE mapping status
- Only 3 null mappings
- May be fixed by site-specific mappings
- Poor mappings being explored as potential options for site-specific mappings
- All mappings are recorded independently of the software and can be easily upgraded via configuration and XML files
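
As a rough sketch of what keeping mappings in XML, outside the software, could look like: the XML layout, the attribute names, and the CDE and field names below are invented for this example and do not reflect the wrapper's real configuration format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file; a <mapping> without a field is a null mapping.
MAPPING_XML = """
<mappings>
  <mapping cde="SPECIMEN_TYPE" class="CATISSUE_SPECIMEN" field="SPECIMEN_CLASS"/>
  <mapping cde="PARTICIPANT_GENDER" class="CATISSUE_PARTICIPANT" field="GENDER"/>
  <mapping cde="SMOKING_HISTORY"/>
</mappings>
"""


def load_mappings(xml_text):
    """Parse CDE mappings into {cde: (class, field)}; null mappings map to None."""
    table = {}
    for node in ET.fromstring(xml_text.strip()).findall("mapping"):
        field = node.get("field")
        table[node.get("cde")] = (node.get("class"), field) if field else None
    return table


mappings = load_mappings(MAPPING_XML)
null_cdes = sorted(cde for cde, target in mappings.items() if target is None)
```

Under this layout a site-specific mapping would be a change to the XML file only, with no code change.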

18 The EDRN Knowledge Environment
- ERNE was a great initial success, but there was a growing need to manage more than just specimen information within ERNE
- Need to manage
  - Biomarker information
  - Raw science data
  - Protocol and study tracking information
- We asked ourselves: how can we leverage our experience with OODT to build a knowledge environment for EDRN data?

19 Knowledge Environment Design
- Manages Science Data
- Manages Specimen Data
- Manages Biomarker Status
- Manages Study Data

20 Turning EKE into a Semantic Grid

21 EDRN Public Portal
- The public face of EDRN
- New look and feel
- New Google-like search
- Multi-level security
- Biomarkers and data integrated throughout
- Data is published via RDF
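
To make "published via RDF" concrete, here is a small sketch that writes statements in the N-Triples syntax. The `example.org` namespace, the `studiedIn` property, and the biomarker and protocol identifiers are placeholders; only the Dublin Core `title` property is a real vocabulary term, and nothing here is taken from the portal's actual RDF output.

```python
def ntriple(subject, predicate, obj, literal=True):
    """Format one RDF statement as a single N-Triples line."""
    if literal:
        # Escape backslashes and double quotes inside string literals.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


BASE = "http://example.org/edrn/"  # placeholder namespace, not the real portal
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

lines = [
    ntriple(BASE + "biomarker/1", DC_TITLE, "Example biomarker"),
    ntriple(
        BASE + "biomarker/1",
        BASE + "schema#studiedIn",  # hypothetical property
        BASE + "protocol/7",
        literal=False,
    ),
]
document = "\n".join(lines)
```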

22 EDRN applications and the Portal

23 EDRN Pioneering Accomplishments
- National data sharing architecture for specimens
  - UCSD latest site to go operational
  - 189,635 available specimens as of Feb 2009
- Biomarker database is in place
  - 60+ biomarkers from EDRN captured with deep content
- EDRN Catalog and Archive Service is in place
  - 27+ data sets and 1500+ products captured from EDRN
  - PLCO/Ovarian data is the latest data being captured
  - Data ingestion process and mechanisms defined
- “Google” for EDRN biomarker data is fully integrated
  - Text-based searching of study, biomarker, publication, and other information
- EDRN-wide multi-level security in place
- A distributed, semantic architecture is in place
- Successful pilot with data stored in the “cloud”

24 Get involved
- http://www.facebook.com/group.php?gid=56938589930
- http://twitter.com/edrn_ic
- http://cancer.jpl.nasa.gov
- http://edrn.jpl.nasa.gov/portal3.0

25 Acknowledgements
- NCI
  - Christos Patriotis
  - Sudhir Srivastava
- Data Management and Coordinating Center (DMCC) at Fred Hutchinson Cancer Research Center
  - Mark Thornquist, Suzanna Reid, Jackie Dahlgren
- Dartmouth College
  - Kristen Anton
- EDRN IC
  - Dan Crichton (PI), Andrew Hart, Steve Hughes, Heather Kincaid, Sean Kelly, John Tran, Thuy Tran

26 Questions
If you want more information, please contact:
- Chris.Mattmann@jpl.nasa.gov, EDRN Informatics Center Development Lead
- Dan.Crichton@jpl.nasa.gov, EDRN Informatics Center PI

