
1 1 ICR IRWG Update Feb 16, 2012

2 2 Information Representation Working Group (IRWG)
The caBIG Integrative Cancer Research (ICR) Information Representation Working Group (IRWG) promotes the use of standards and models in the development of semantically harmonized informatics systems to support the domain of life sciences research. This entails:
- Identifying and communicating the semantic requirements of the cancer research community based on their use cases
- Identifying, evaluating, adopting and/or adapting existing approaches, standards, and vocabularies to represent concepts and data in life sciences research
- Working with other stakeholders on the development, maintenance, and documentation of the Life Sciences Domain Analysis Model

3 3 IRWG Membership Gilberto Fragoso (NCI CBIIT) Bob Freimuth (WG Lead) Elaine Freund (WS Lead) Jason Hipp Jenny Kelley Juli Klemm (NCI Coordinator) Jim McCusker Lisa Schick (Analyst) Mukesh Sharma Grace Stafford Baris Suzek

4 4 IRWG Report LS DAM Collaborations HL7 Clinical Genomics W3C Health Care & Life Sciences LS Ontology Pilot IRWG Recommendations

5 5 Life Sciences Domain Analysis Model (LS DAM) Focused on hypothesis-driven and discovery science in the LS domain Foundational component for achieving semantic interoperability Aligned, on touch points, with the BRIDG model Priorities for each iteration driven by needs of community Based on use cases and input by subject matter experts

6 6 LS DAM Core Components
- Experiment: Conceptual representation of the process of experimentation in both hypothesis-driven and discovery research in the Life Sciences domain. Participation of HL7 CG WG.
- Molecular Biology: Modeling of the molecular components of cells and organisms, such as nucleic acids and proteins, and the identifying information held in databases for those components
- Specimen: Modeling of the collection, processing and storage of physical substances originally obtained from a biological entity
- Pathology Imaging: Modeling the activities that occur in the context of pathology imaging
- Common: Classes that represent common entities (e.g., Person, Organization)

7 7 LS DAM – History
- Release 1.0 (July 2009): Initial release
- Release 1.1 (October 2009)
- Release 1.2 (February 2010)
- Release 2.0 (October 2010): Experiment Core (collaboration with HL7 Clinical Genomics WG)
- Release 2.1 (February 2011)
- Release 2.2 (April 2011)
- Release 2.2.1 (May 2011)
- Release 2.2.2 (January 2012)

8 8 LS DAM Activities LS DAM 2.2.2 – latest release (Jan 2012) Changes to the Experiment Core 4 open items on Future Activities list closed LS DAM EA package re-organized 4 items added to Future Activities list https://wiki.nci.nih.gov/display/ICR/IRWG+Future+Activities Manuscript

9 9 Changes to the LS DAM Experiment Core
FGED and HL7 CG WG provided similar feedback on LS DAM 2.2:
- Flexibility provided in the Experiment Core will lead to varying implementations and lack of interoperability
- Distinction between ExperimentalStudy and Experiment not clear enough
IRWG addressed these concerns:
- Alignment with ISA-TAB
  - ISA-TAB Investigation → LS DAM ExperimentalStudy
  - ISA-TAB Study → LS DAM Experiment
  - ISA-TAB Assay → LS DAM PerformedActivity
- Changes to ExperimentalStudy, Experiment, ExperimentalItem, the Activity hierarchy, and supporting classes like ExperimentalParameter and ExperimentalFactor
- All details documented in the Release Summary and Change List artifacts that will be published with the release
Several change requests from the HL7 CG WG Omics DAM modeling effort to be considered during the last few weeks of January:
- Mostly tweaks to definitions and examples
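
The ISA-TAB alignment listed on this slide can be written down as a simple lookup table. The following is only a minimal sketch: the three concept pairs come from the slide, while the dictionary name, the helper function `to_ls_dam`, and the error handling are hypothetical and are not part of the LS DAM or ISA-TAB tooling.

```python
# ISA-TAB concept -> LS DAM class, as stated on the slide.
# The variable and function names are illustrative assumptions.
ISA_TAB_TO_LS_DAM = {
    "Investigation": "ExperimentalStudy",
    "Study": "Experiment",
    "Assay": "PerformedActivity",
}


def to_ls_dam(isa_tab_concept: str) -> str:
    """Return the LS DAM class aligned with an ISA-TAB concept."""
    try:
        return ISA_TAB_TO_LS_DAM[isa_tab_concept]
    except KeyError:
        # Only the three alignments above are documented on the slide.
        raise ValueError(
            f"no LS DAM alignment recorded for {isa_tab_concept!r}"
        ) from None


if __name__ == "__main__":
    for concept in ISA_TAB_TO_LS_DAM:
        print(f"ISA-TAB {concept} -> LS DAM {to_ls_dam(concept)}")
```

Keeping the alignment in one table makes the distinction between ExperimentalStudy and Experiment explicit for anyone moving ISA-TAB data into an LS DAM-based system.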

10 10 LS DAM Closed Items
- Create LS DAM User Guide
  - Will be published in the release package moving forward, starting with LS DAM 2.2.2
- Review StudySubject and SpecimenCollectionProtocolSubject relationship (both children of Subject)
  - Determined model was correct as it stands; no change
- Should LS DAM include BRIDG StudySubject?
  - BRIDG StudySubject removed from the model
- Reconsider ExperimentalItem-to-PerformedMaterialProcessStep association
  - Addressed through the changes to the Experiment Core
Reference: https://wiki.nci.nih.gov/display/ICR/IRWG+Future+Activities

11 11 LS DAM 2.2.2 Package Reorganization UML package structure was re-organized to leverage the core models (subject areas) defined within the LS DAM Experiment Core Molecular Biology Core Molecular Databases Core Specimen Core Additional core models may be added as the model evolves Some overview/introductory information provided directly within the EA package to provide first time user with better experience

12 12 LS DAM Future Activities
Additional requirements, enhancements/extensions, and follow-up items, captured during development and solicitation of feedback.
29 items in 3 categories:
- 7 high priority items
- 6 medium priority items
- 12 low priority items
- 4 unprioritized items
Full list located at https://wiki.nci.nih.gov/x/74DbAQ

13 13 LS DAM Manuscript "Life Sciences Domain Analysis Model" Robert R. Freimuth*, Elaine T. Freund*, Lisa Schick*, Mukesh Sharma*, Grace Stafford*, Baris Suzek*, Joyce Hernandez, Jason Hipp, Jenny Kelley, Konrad Rokicki, Sue Pan, Andrew Buckler, Todd Stokes, Anna Fernandez, Ian Fore, Kenneth H. Buetow, Juli Klemm *These authors have contributed equally Submitted to JAMIA on Dec 9, 2011 Based on LS DAM 2.2 Reviews received Jan 29, 2012

14 14 IRWG Report LS DAM Collaborations HL7 Clinical Genomics W3C Health Care & Life Sciences LS Ontology Pilot IRWG Recommendations

15 15 HL7 Clinical Genomics
Project Objective: To develop a robust Domain Analysis Model (DAM) which will eventually cover information needed for all -omics areas and support linking to clinical data results contained in other models such as CDISC BRIDG, LS DAM, or an EHR. Integration of clinical and research information is an important step toward facilitating translational medicine.
- Initial focus on microarray gene expression
- Represent the most important "common" concepts for Clinical Genomics that will have stability across technology platforms
- Liaisons to the caBIG® ICR WS-IRWG (LS DAM development)

16 16 HL7 CG Next Steps
- Review use cases and map gene expression microarray data to the -Omics DAM. Generate instance diagrams as needed.
- Continue -Omics model review (Experiment Model, LS DAM)
- Update model for the -Omics DAM based on discussions so far
- Prepare documents for HL7 ballot (storyboard, use cases, class/attribute definitions, data types, etc.)
- Update model as needed and prepare for ballot
- Work with HL7 O&O on Specimen DAM
- Work with HL7 CIC to identify touch points to their DAMs

17 17 W3C Overview of Activities

18 18 Translational Medicine Ontology and TMKB
Goal: demonstrate how information-based translational medicine activities can be made easier and more effective using semantic web technologies.
Use case: Alzheimer's Disease (AD)
Example questions:
- Clinic: Does Medicare D cover Donepezil?
  - Medicare D covers 2 brand name formulations of Donepezil: Aricept and Aricept ODT.
- Clinical Trial: Which AD patients do not carry the APOE4 allele and would therefore be good candidates for the clinical trial involving Bapineuzumab?
  - Of the four patients with AD, only one does not carry the APOE4 allele, and may be a good candidate for the clinical trial.
- Research: Which SNPs may be potential AD biomarkers?
  - PharmGKB reveals 63 SNPs.
Reference: Luciano, et al. The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside. J Biomed Sem, 2011. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3102889/
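
The clinical-trial question on this slide amounts to a filter over patient records: keep patients with an AD diagnosis who do not carry the APOE4 allele. The sketch below illustrates that filter in plain Python rather than with the semantic web stack used by the TMKB. The `Patient` class, the function name, the patient identifiers, and all four records are hypothetical and chosen only to mirror the slide's "one of four" outcome; they are not data from the TMKB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical minimal patient record (not a TMKB structure)."""
    patient_id: str
    diagnoses: frozenset
    alleles: frozenset


def trial_candidates(patients, diagnosis="AD", excluded_allele="APOE4"):
    """Return patients with the diagnosis who lack the excluded allele."""
    return [
        p for p in patients
        if diagnosis in p.diagnoses and excluded_allele not in p.alleles
    ]


# Invented example records: four AD patients, three of whom carry APOE4.
PATIENTS = [
    Patient("P1", frozenset({"AD"}), frozenset({"APOE4"})),
    Patient("P2", frozenset({"AD"}), frozenset({"APOE3", "APOE4"})),
    Patient("P3", frozenset({"AD"}), frozenset({"APOE3"})),
    Patient("P4", frozenset({"AD"}), frozenset({"APOE4"})),
]

if __name__ == "__main__":
    for patient in trial_candidates(PATIENTS):
        print(patient.patient_id)
```

In a knowledge base such as the one described in the cited paper, the same selection would typically be expressed as a query over linked data instead of a list comprehension.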

19 19 IRWG Report LS DAM Collaborations HL7 Clinical Genomics W3C Health Care & Life Sciences LS Ontology Pilot IRWG Recommendations

20 20 Life Sciences Ontology Pilot Subgroup (LS OPS) The LS DAM was developed in UML as per CBIIT practices Interaction with external communities revealed that ontologies are used more frequently to represent information An ontological representation of the LS DAM would not only enable the LS DAM to leverage semantics from existing ontologies that are widely used within the life sciences community in general, but would also facilitate the adoption of the LS DAM by members of those ontology- based communities LS OPS was formed to explore the feasibility of creating an ontological representation from the LS DAM

21 21 LS OPS: Goals To explore the feasibility of expressing a portion of the LS DAM as an ontology that uses components from existing ontologies developed by other groups To continue engaging a variety of groups from the broader scientific community with the intent to form collaborations, exchange information regarding domain information models/ontologies, and identify areas of overlap and gaps among the various communities To identify opportunities to reuse components from existing ontologies to express and extend the semantics of the LS DAM

22 22 LS OPS: Scope
In Scope
- Ontological representation of a portion of the LS DAM, as defined by the use case
- Explore UML-OWL conversion tools and high-level comparison
- Approximate semantic representation using existing ontologies
Out of Scope
- Ontological representation of the entire LS DAM
- Comprehensive survey of conversion tools
- Tool (software) development
- Precise semantic mapping between the LS DAM and other ontologies

23 23 LS OPS Activities Use Case Constrain LS DAM (UML) Create instance diagram and OWL ontology Explore automated UML-OWL conversion tools

24 24 LS OPS Use Case Based on "Copy number and targeted mutational analysis reveals novel somatic events in metastatic prostate tumors" Robbins et al, Genome Research 2011 Identification of somatic mutations in metastatic prostate cancer Tissue specimen collection and processing Array CGH and Next-Gen Sequencing Data analysis

25 25 Constrained LS DAM

26 26 LS DAM: Ontological Representation Information Sources Dublin Core Ontology NCI Thesaurus NCBI Taxonomy Phenotypic Quality Ontology (PATO) W3C Provenance Ontology (PROV) Relations Ontology (RO) Open Biological and Biomedical Ontologies (OBO) Experimental Factors Ontology (EFO) Cell Type Ontology (CL) vcard Friend of a Friend (FOAF) 18 of 20 classes in the constrained model were represented by components from existing ontologies

27 27 UML-OWL Conversion Tools Challenges No standard specification for UML-OWL conversion Variable level of tooling maturity and documentation Lack of common input/output formats Tools Protégé Enterprise Architect TopBraid Composer Eclipse TwoUse NCI-CBIIT Semantic Infrastructure Transform Libraries UML-OWL Generator

28 28 Comparison of UML-to-OWL Conversion Methods
Method: Using NCI-CBIIT Semantic Infrastructure Libraries and TopBraid Composer
  Level of difficulty: Hard
  Tool availability: A mixture of commercial and open source
  Customizability: Partial. NCI-CBIIT Semantic Infrastructure Libraries are open source and can be customized.
Method: Using Protégé and UML Backend
  Level of difficulty: Easy
  Tool availability: Open source (free for academic institutions)
  Customizability: Yes. The open source UML Backend Plug-in potentially can be extended given its licensing terms, or Protégé can be extended with a new plug-in.
Method: Using TwoUse OWLizer
  Level of difficulty: Medium
  Tool availability: Open source (free)
  Customizability: Partial. TwoUse OWLizer offers few customization options through its user interface (e.g., whether to process property domains/ranges, how to name properties, etc.)

29 29 Comparison of Manual and Automated Approaches
The optimal conversion process may include a combination of automated and manual techniques.
Automated Conversion
  Strengths: Better coverage; better scalability; supports reversibility; low conversion cost
  Weaknesses: Lack of mapping to existing ontologies; lack of standard conversion specifications
Manual Construction
  Strengths: Better knowledge representation; better reuse of existing ontologies
  Weaknesses: Labor-intensive
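
A combined approach of the kind suggested on this slide might look like the following sketch: an automated pass turns UML classes and their generalizations into OWL class declarations, and a small hand-curated table then links selected classes to terms in existing ontologies. Everything here is an assumption made for illustration. The `example.org` IRI is a placeholder, the function and variable names are hypothetical, and the link from Person to `foaf:Person` is an invented example of a manual mapping, not one reported by LS OPS. The class names Subject, SpecimenCollectionProtocolSubject, and Person are taken from elsewhere in this deck. Output is written in OWL Manchester Syntax, the format named in the LS OPS deliverables.

```python
# Placeholder namespace; the real LS DAM ontology IRI is not given in the slides.
LSDAM_IRI = "http://example.org/lsdam#"

# Automated input: UML class name -> parent class (None if no generalization).
UML_CLASSES = {
    "Subject": None,
    "SpecimenCollectionProtocolSubject": "Subject",
    "Person": None,
}

# Manual step: hand-curated links to existing ontology terms (illustrative only).
MANUAL_MAPPINGS = {
    "Person": "foaf:Person",
}


def to_manchester(uml_classes, manual_mappings):
    """Render UML classes plus manual mappings as OWL Manchester Syntax text."""
    lines = [
        f"Prefix: lsdam: <{LSDAM_IRI}>",
        "Prefix: foaf: <http://xmlns.com/foaf/0.1/>",
        f"Ontology: <{LSDAM_IRI.rstrip('#')}>",
        "",
    ]
    for name, parent in uml_classes.items():
        lines.append(f"Class: lsdam:{name}")
        superclasses = []
        if parent is not None:
            # Automated: a UML generalization becomes a SubClassOf axiom.
            superclasses.append(f"lsdam:{parent}")
        if name in manual_mappings:
            # Manual: reuse a term from an existing ontology.
            superclasses.append(manual_mappings[name])
        if superclasses:
            lines.append("    SubClassOf: " + ", ".join(superclasses))
        lines.append("")
    return "\n".join(lines)


OWL_TEXT = to_manchester(UML_CLASSES, MANUAL_MAPPINGS)

if __name__ == "__main__":
    print(OWL_TEXT)
```

The automated half scales with the size of the model, while the mapping table stays small and reviewable, which is one way to limit the labor-intensive part of manual construction.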

30 30 LS OPS Conclusions Producing an ontological representation of the LS DAM would be a worthwhile effort By aligning ourselves with projects on the national level we are facilitating interoperability not only within the caBIG program but also across the entire research community Scientific end users that would benefit Touchpoints between domains and data sources E.g., Cytoscape, BioPAX, Reactome, KEGG, GMOD, UniProt Research tooling and related infrastructure E.g., CardioSHARE, Eagle-I, MOBY, Bio2RDF, Array Express Adding value to existing ontologies E.g., EFO, OBI, W3C TMO, Protein Ontology (PRO)

31 31 LS OPS Conclusions
- There is a wealth of existing ontologies, widely used in the life sciences community, that could be leveraged to support the semantics represented in the LS DAM
  - 19 of 20 concepts in the sub-model could be represented using classes from existing ontologies
  - Note: relationships between classes and properties of classes may diverge more frequently between models
- Creating an ontological representation of the LS DAM and depositing it in BioPortal would make it more visible to the research community

32 32 LS OPS Recommendations
- Create an ontological representation for the full LS DAM
  - Will help make this more of a community resource, and it could help bridge the divides between domain groups
- Review existing ontologies
  - Identify gaps that could be filled or bridged by the LS DAM
- Investigate technological solutions
  - UML-to-OWL or OWL-to-UML conversion process
  - Develop best practices to ensure comprehensive and complete representation of UML models (e.g., LS DAM) in OWL
- Continue and expand outreach and collaborative activities
  - ICR Nanotechnology Working Group (NanoParticle Ontology)
  - HL7 Clinical Genomics Working Group (Experiment Core model)
  - W3C HCLS (Translational Medicine Ontology)
  - Ontology for Biomedical Investigations (OBI)
  - Functional Genomics Data Society (FGED)

33 33 LS OPS Deliverables
- Research-based use case
  - MS Word format
- Constrained LS DAM (UML)
  - RTF and EAP format
  - Browser version
- Ontological representation of the LS DAM (OWL)
  - Instance diagram in png format
  - Ontology in OWL, Manchester Syntax (omn format)
  - Browser version
- Final recommendations and report
  - https://wiki.nci.nih.gov/download/attachments/59212462/LS+OPS+Final+Report+1.0.doc

34 34 IRWG Report LS DAM Collaborations HL7 Clinical Genomics W3C Health Care & Life Sciences LS Ontology Pilot IRWG Recommendations

35 35 IRWG Recommendations
Project direction and milestones
- Ontological representation
  - See recommendations from LS OPS
- Expansion, e.g., animal models
- Guide for how to use the LS DAM (given different use cases)
  - Concrete examples of projects, technical walk-through
  - E.g., how to map legacy data to the LS DAM (as a common standard)
- Collaborate on a program-level clearinghouse for adoption of standards
  - Inventory, recommendations, strengths/limitations, how-to resources
  - Goal is to make standards (e.g., models, terminologies) more accessible

36 36 IRWG Recommendations
Continue and Expand Collaborations and Outreach
- Community-based organizations and research consortia
  - E.g., HL7 CG, W3C HCLS, BIRN
- Domains and data sources
  - E.g., Cytoscape, BioPAX, Reactome, KEGG, GMOD, UniProt
- Research tooling and related infrastructure
  - E.g., CardioSHARE, Eagle-I, MOBY, Bio2RDF, Array Express
- Ontologies
  - E.g., EFO, OBI, W3C TMO, Protein Ontology (PRO)

37 37 Resources
IRWG Wiki (https://wiki.nci.nih.gov/x/kQiG)
- Meeting notes and slides
- Subgroup activities and documentation
- Reference and supporting materials
- Direct links to the latest release of the LS DAM
- Collection point for feedback and requests related to the LS DAM
- Prioritized list of future activities
LS DAM Wiki (https://wiki.nci.nih.gov/x/cxRlAQ)
- UML model (EA file)
- Model documentation
- Experiment implementation guide (IG)
- Release summary
- Reference and supporting materials

