CaBIG™ Architecture Vocabularies and Common Data Elements Joint Workspace Face-to-Face Meeting University of Utah, Salt Lake City January 28-30, 2008.


1 caBIG™ Architecture Vocabularies and Common Data Elements Joint Workspace Face-to-Face Meeting University of Utah, Salt Lake City January 28-30, 2008

2 Introduction Goal: Provide an overview of the Arch/VCDE F2F meeting at the Huntsman Cancer Institute at the University of Utah. These slides will be posted with the meeting notes at the ICR WS gforge site: http://gforge.nci.nih.gov/projects/july2006-icr/ All meeting material, including presentations, is available at: http://gforge.nci.nih.gov/docman/index.php?group_id=357&selected_doc_group_id=2582&language_id=1 Presenters can provide more information.

3 Summary of Day 1 As covered at the ICR meeting on February 13, 2008, highlights: the idea of using design templates to guide/drive future caBIG/caGrid middleware development; a summary of last year's activities and future goals of the Arch/VCDE Workspaces; external entities using/testing caBIG technologies (National Public Health Grid / CDC); training activities (Silver to Grid Training Module); approval of the Gold Compatibility Guidelines.

4 Theme: The Expanding caBIG™ Community and the Impact on Technology

5 caCORE 4.0 Overview Denise Warzel - NCI Center for Biomedical Informatics and Information Technology (CBIIT). Overview of the caCORE product line: what is new in 4.0? caCORE-like systems are about methodologies, not simply tools: model-driven, agile development, object-oriented, open source, service-oriented architecture, XML schema transport, metadata registered per ISO 11179, and use of controlled vocabularies. caCORE products provide a framework that helps developers build interoperable applications, and enable discovery and the ability to move data around. Future plans for caCORE products: simplified and improved tools ("develop, access, consume"); support for HL7 datatypes, building a roadmap for HL7 caBIG participants; services to leverage semantic metadata; collaborative terminology and caDSR metadata development (Semantic MediaWiki, workflow support); simplified granular interoperability (object and CDE level); reusable 'plug-ins' to support building interoperable grid services, e.g. validation services; optimized caDSR and UIs to support faster/simplified access.

6 caGrid 1.x Scott Oster – Ohio State University. Overview of caGrid 1.x. caGrid 1.2 highlights: caCORE SDK 4.0 support, bug fixes, a new portal and more. Work in progress: a simplified alternative to GridFTP for binary data transfer (caGrid Transfer service); enhancements to the Data Service query language to meet community needs (CQL 2.0); designing an approach for metrics collection/statistics. Future focus (selected): incorporating outcomes from working groups such as ASBP, Workflow and HTP; integrating with forthcoming registered UML/XML binding information (Introduce integration, caDSR grid service, GME enhancements); tighter integration with other tools/projects relevant to the "caBIG Process", e.g. caCORE SDK; continuing improvement for more complex service requirements.

7 Theme: Domain Workspace Requirements and the Impact on caBIG™ Infrastructure: Data Services/Federated Query

8 caCORE SDK and caGrid Interface Satish Patel - NCICB, Dave Ervin – Ohio State University. Overview of SDK 4.0 and new features: re-architected system (concurrent connection to services, POJO); enhanced security (attribute-level security using CSM); enhanced code generation; performance improvement; relaxed restrictions on object/data model development (no required "id" attributes). New features in SDK 4.1: support for HL7 complex data types; freestyle search; a graphical installer to generate a system using the SDK. caGrid Data Services integration with the caCORE SDK is improved: data service styles; query processors based on a pluggable architecture; SDK/caGrid joint development efforts.

9 Federated Query Impact on caCORE APIs and caGrid Ian Fore - NCICBIIT, Dave Ervin - Ohio State University. Limitations of the different query layers (SQL -> Hibernate -> caCORE API/QBE -> CQL/DCQL). caTissue use cases and solutions using the different query layers. Planned features in CQL 2.0, based on use cases from TBPT and IVI: association population (going beyond targeted objects); typed attributes (date, boolean, etc.) for binary operators (e.g. equal, not equal); query modifiers (distinct, min, max). DCQL 2.0 will build on CQL 2.0.
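To make the planned typed attributes and query modifiers more concrete, here is a minimal Python sketch that filters in-memory records with a typed (date) comparison and then applies a distinct, min or max modifier. It is not CQL and calls no caGrid or caCORE API; the record fields (`type`, `collected`, `quantity`) and the `run_query` function are hypothetical.

```python
from datetime import date

# Hypothetical records standing in for objects returned by a data service.
SPECIMENS = [
    {"type": "Tissue", "collected": date(2007, 3, 1), "quantity": 2.5},
    {"type": "Blood", "collected": date(2007, 6, 15), "quantity": 5.0},
    {"type": "Tissue", "collected": date(2007, 9, 30), "quantity": 1.0},
]


def run_query(records, attribute, predicate=None, modifier=None):
    """Filter records with a typed predicate, then apply an optional modifier.

    modifier is None, "distinct", "min" or "max". This mirrors the planned
    CQL 2.0 modifiers in spirit only; it is not CQL.
    """
    values = [r[attribute] for r in records if predicate is None or predicate(r)]
    if modifier is None:
        return values
    if modifier == "distinct":
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(values))
    if modifier == "min":
        return min(values)
    if modifier == "max":
        return max(values)
    raise ValueError(f"unsupported modifier: {modifier}")


if __name__ == "__main__":
    # Typed comparison: the attribute is compared as a date, not as a string.
    collected_after_april = lambda r: r["collected"] > date(2007, 4, 1)
    print(run_query(SPECIMENS, "type", modifier="distinct"))  # ['Tissue', 'Blood']
    print(run_query(SPECIMENS, "quantity", collected_after_april, modifier="max"))  # 5.0
```

Comparing real `date` values rather than strings is the point of typed attributes: a string comparison would only order dates correctly if every value happened to use the same zero-padded format.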

10 Bulk Data Transfer Martin Morgan – Fred Hutchinson Cancer Research Center, Shannon Hastings – Ohio State University. BDT (or HTP) requirements: data transfer and parsing related; 'workflow' related (interactive, stateful, cooperative); implementation related (secure, strongly typed, interoperable). Available BDT solutions: GridFTP, WS-Enumeration, endpoint references. Issues with existing solutions: installation/configuration/platform/usability issues for GridFTP. New solution: "caGrid Transfer" with caGrid 1.2.

11 Theme: Domain Workspace Requirements and the Impact on caBIG™ Infrastructure: Metadata

12 caDSR and Population Sciences Paul Courtney - Pop Sci SIG Lead, Dartmouth Medical School. Overview of population science (goals, tools, data of interest). How can current caDSR metadata serve population scientists? There needs to be recognition that the place of a CDE within a questionnaire provides the context and semantics for that CDE. Forms-level metadata: about the questionnaire/survey tool as a whole, and about the administration of the questionnaire/survey tool (Form Builder is currently being worked on to accommodate these needs). Future efforts: bring population scientists into the process of defining forms-level metadata requirements; move the process of bringing in questionnaires from manual curation to UML modeling; identify population science/public health ontologies to be used; applications that can link epidemiological and socio-economic status (SES) data.

13 caTissue Suite Dynamic Extensions George Komatsoulis - NCICB, Denise Warzel - NCICB, Poornima Govindrao – Persistent Systems (caTissue Suite developer team), Ian Fore - NCICB. caTissue overview. Motivation for caTissue Suite Dynamic Extensions: it is impossible to imagine research software that isn't extensible, and commercial research software often provides extensibility. Current DE implementation: form builder UI; XMI import and export; system-generated data entry forms to accept user input; integration with the caTissue query interface to query across static and dynamic classes; capture of metadata for UML, caDSR data, UI controls and the database. Implications for the Arch/VCDE Workspaces (tooling, review and mentoring process) will be discussed in a working group, as decided in the break-out session.

14 Theme: Domain Workspace Requirements and the Impact on caBIG™ Infrastructure: Security

15 caGrid/GAARDS Security Overview (Stephen Langella – OSU): services and tools for the administration and enforcement of security policy in an enterprise Grid. caBIG Clinical Trials Suite Requirements (Edmond Mulaire – SemanticBits): CCTS needs single sign-on (SSO). caXchange Requirements (Kalpesh Patel – Ekagra Software): caXchange acts as a proxy for the message originator; only Grid-authenticated users should be able to submit messages; caXchange must be able to act on behalf of the authenticated user. WebSSO Solutions/Implementation (Kunal Modi – Ekagra Software): WebSSO provides single sign-on capabilities for web applications as well as grid services using a single solution. Credential Delegation Service (Stephen Langella – OSU): CDS, a WSRF-compliant Grid service, enables users/services (delegators) to delegate their Grid credentials to other users/services (delegatees) such that the delegatees may act on the delegator's behalf.
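The delegation idea can be illustrated with a toy, in-memory model: a delegator names who may act on its behalf and for how long, and each later request is checked against that record. This Python sketch is a hypothetical illustration only; `Delegation`, `DelegationRegistry` and their methods are invented names, and it does not reproduce the CDS API, Grid certificates or proxy credentials.

```python
import time
from dataclasses import dataclass


@dataclass
class Delegation:
    delegator: str                  # identity whose credential is delegated
    allowed_delegatees: frozenset   # identities allowed to act on the delegator's behalf
    expires_at: float               # time (seconds) after which the delegation is void


class DelegationRegistry:
    """Toy, in-memory model of credential delegation.

    Hypothetical illustration only; it is not the GAARDS CDS API and does not
    handle certificates, proxies or signatures.
    """

    def __init__(self):
        self._delegations = {}
        self._next_ref = 1

    def delegate(self, delegator, delegatees, lifetime_s, now=None):
        """Record a delegation and return an opaque reference to it."""
        now = time.time() if now is None else now
        ref = self._next_ref
        self._next_ref += 1
        self._delegations[ref] = Delegation(delegator, frozenset(delegatees), now + lifetime_s)
        return ref

    def act_as(self, ref, caller, now=None):
        """Return the identity the caller may act as, or raise PermissionError."""
        now = time.time() if now is None else now
        d = self._delegations.get(ref)
        if d is None:
            raise PermissionError("unknown delegation")
        if now >= d.expires_at:
            raise PermissionError("delegation expired")
        if caller not in d.allowed_delegatees:
            raise PermissionError(f"{caller} is not an allowed delegatee")
        return d.delegator


if __name__ == "__main__":
    registry = DelegationRegistry()
    # A user delegates to a proxy service (cf. the caXchange requirement above).
    ref = registry.delegate("alice", ["caxchange"], lifetime_s=3600, now=0)
    print(registry.act_as(ref, "caxchange", now=60))  # alice
```

The explicit `now` parameter is only there to make the expiry check easy to demonstrate and test; a real service would rely on credential lifetimes rather than a caller-supplied clock.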

16 geWorkbench/caGrid/TeraGrid Interface and Demo. Introduction to the TeraGrid Workgroup (Scott Oster – OSU): TeraGrid is an NSF high-end computing infrastructure. Background on geWorkbench and the geWorkbench/caGrid/TeraGrid project (Christine Hung – Columbia University): geWorkbench is a platform for genomics data integration with tools to manage, analyze, annotate and visualize data; description of the steps to establish the geWorkbench/caGrid/TeraGrid interface. Demo (Christine Hung – Columbia University, Ravi Madduri – Argonne National Lab): running geWorkbench's Hierarchical Clustering Service using the caGrid/TeraGrid gateway.

17 Security Working Group George Komatsoulis - NCICB, Marsha Young - Booz Allen Hamilton. Overview of working groups and current status. caBIG™ initiatives for federated authentication and authorization. caBIG™ Data Sharing and Security Framework (DSSF): determine which data can be shared; identify the necessary access and data security controls (authentication, authorization).

18 ICR Workflow Working Group Activities Ravi Madduri – Argonne National Lab. Background, goals, issues and activities. Review existing workflow authoring tools and suggest a tool that can be extended for use with caBIG services: Taverna. Implement (and execute) a workflow for a specific scientific domain using this tool and existing caGrid data and analytical services: demo. Demo: a simple (yet typical) use case for microarray analysis: 1. Locate the datasets of interest; 2. Obtain the data (caArray); 3. Preprocess the data; 4. Cluster the data (GenePattern).

19 caGrid Portal Joshua Phillips – Ohio State University. Goal of the caGrid Portal (http://cagrid-portal.nci.nih.gov/): provide visualization of caGrid functionality and demonstrate how caGrid supports semantic and syntactic interoperability. Demo of current functionality: discovery, metadata exploration, status monitoring, identity federation, data service query, query sharing. Future direction: demonstrate use of semantic metadata (grid-join); support new CQL and DCQL features; expose workflow functionality; increase support for knowledge-sharing features.

20 caBIG/ONIX Collaboration Max Wilkinson - Scientific IT Analyst, ONIX Platform Development, UK NCRI Informatics Coordination Unit. Oncology Information Exchange (ONIX) goal: using technology to make use of information relating to cancer cause, prevention and cure; using informatics to maximise the impact of cancer research through better data sharing. Broadly similar goals to caBIG.

21 Analytical Service Best Practices Working Group Activities Baris Suzek – Georgetown University, Shannon Hastings – Ohio State University. Charter and objectives. Issues and solutions: model reuse; XSD reuse and/or generation; process used for service development. Recommended process for future development by the caGrid team: top down (cleaner) vs. bottom up. Outstanding issues: generic parameters. Next steps (presented to the group on February 27th).

22 Breakout Sessions. Working Session – Gold Review Process and Review Criteria: level of CDE reuse; high-impact/standard CDEs; backbone model. Dynamic Extensions: convene a working group consisting of TBPT, VCDE and Arch representatives. Semantic Discovery and Query: convene a Semantic Query Working Group; explore Semantic Web technologies to assist in discovering a set of data services that could collaborate to answer a query expressed in terms of concepts (rather than data types).

23 Birds of a Feather Session: Semantic MediaWiki. Hands-On Introduction (Frank Hartel – NCICB): Biomedical Grid Terminology (BiomedGT) is an open, collaboratively developed terminology for translational research. Demo of the BiomedGT MediaWiki for collaborative terminology development (Harold Solbrig – Apelon).

24 Theme: Arch/VCDE Workspace Requirements and the Impact on caBIG™ Infrastructure: Metadata (Day 3)

25 caDSR/GME Mapping Denise Warzel - NCICB, Scott Oster – Ohio State University. Problem: achieving caBIG interoperability goals on the grid requires not only sound handling of both syntax and semantics, but also a formal binding between them. Previous "solution": require an XML Schema for each package that followed a namespace construction rule (implicit binding). Solution: planned mapping rules that specify how a given UML entity is represented in XML over the grid; the mapping is maintained in the caDSR, with lookup and query available through the caDSR grid service.
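The difference between the previous implicit binding and the planned explicit mapping can be sketched as a registry lookup with a rule-based fallback. In this Python sketch the `gme://project.context/version/package` namespace pattern, the dictionary standing in for mappings held in the caDSR, and all project, package and class names are illustrative assumptions; it does not call the caDSR grid service or GME.

```python
def implicit_namespace(project, context, version, package):
    """Implicit binding: derive the namespace purely from a naming rule.

    The gme:// pattern is an illustrative assumption, not a normative rule.
    """
    return f"gme://{project}.{context}/{version}/{package}"


def xml_binding(project, context, version, package, uml_class, mappings):
    """Resolve the (namespace, element name) pair for a UML class.

    An explicit, registered mapping wins; otherwise fall back to the rule,
    using the class name as the element name.
    """
    key = (project, version, f"{package}.{uml_class}")
    if key in mappings:
        return mappings[key]
    return implicit_namespace(project, context, version, package), uml_class


if __name__ == "__main__":
    # Hypothetical registered mapping whose namespace and element name
    # do not follow the implicit rule.
    registered = {
        ("Example", "1.0", "org.example.domain.Specimen"):
            ("http://example.org/specimen/1.0", "specimen"),
    }
    print(xml_binding("Example", "Example", "1.0", "org.example.domain", "Specimen", registered))
    print(xml_binding("Example", "Example", "1.0", "org.example.domain", "Participant", registered))
```

With only the implicit rule, a service whose schema did not follow the naming convention could not be bound to its model; an explicit, queryable mapping removes that constraint.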

26 High-Impact Common Data Element Identification Process CDE Leadership Group – VCDE Workspace, Mukesh Sharma – Washington University St. Louis. Introduction: 'high impact' or standard CDEs are pervasive across many developer projects and are 'touch points' for semantic interoperability; to permit interoperability, the classes and attributes in the UML models for different applications should be semantically annotated to be the same. Approach: manual (review models, find common objects/classes, expand "backbone" models, propose standards) and automated (use caDSR metadata to identify CDEs). Next steps: complete the manual review (15/76 models completed to date); complete the automated review using CRS v2.0 software; determine the extension of the backbone model based on the findings; create standard CDEs that may be shared more broadly.
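One simple way to approximate the automated approach is to count how many distinct models reuse each CDE and rank the most widely shared ones as candidates. The following Python sketch is an assumption about how such a count might work, not a description of the CRS v2.0 software; the model names and CDE identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def rank_cdes_by_reuse(model_cdes, min_models=2):
    """Rank CDEs by how many distinct models reference them.

    model_cdes maps a model name to the CDE identifiers its classes and
    attributes are annotated with. Returns (cde_id, model_count) pairs for CDEs
    used by at least min_models models, most widely reused first (ties broken
    by CDE id so the output is deterministic).
    """
    models_per_cde = defaultdict(set)
    for model, cde_ids in model_cdes.items():
        for cde_id in cde_ids:
            # A set ignores repeated uses of the same CDE within one model.
            models_per_cde[cde_id].add(model)
    ranked = [(cde, len(models)) for cde, models in models_per_cde.items()
              if len(models) >= min_models]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical annotations; real input would come from caDSR metadata.
    annotations = {
        "modelA": ["CDE-1", "CDE-2", "CDE-3", "CDE-1"],
        "modelB": ["CDE-1", "CDE-3"],
        "modelC": ["CDE-1", "CDE-4"],
    }
    print(rank_cdes_by_reuse(annotations))  # [('CDE-1', 3), ('CDE-3', 2)]
```

Counting distinct models rather than raw occurrences keeps one heavily annotated model from inflating a CDE's apparent reach.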

27 Terminology Metadata: Extension of the Service Meta Model Tom Johnson - Mayo Clinic Goal: Identify and model metadata needed to discover vocabularies on the grid Standards considered Dublin Core ISO 11179-2/3/6: classification, registries, admin National Center for Biomedical Ontology (NCBO) BioPortal and more Next steps Model harmonization w/ recommended ISO 11179 superclasses Change caGrid tooling to capture additional metadata when registering terminology Create custom discovery client for terminology services, to take advantage of additional metadata in support of identified use cases Vote taken on criteria identified, not model per se [APPROVED]

28 The Vocabulary Resources: LexBIG, EVS and the NCBO Browser Tom Johnson - Mayo Clinic. Overview of vocabulary resources: LexGrid – raw content: a model and data storage that defines concepts and properties as well as relations and associations, and supports loaders/representations such as OWL, OBO, RRF, Protégé and XML. LexBIG API – allows you to fetch data. EVS caCORE API – for the distributed environment, where LexBIG is local and the caCORE externalization talks to the database. BioPortal – web-based features driven by the infrastructure; can choose code systems in addition to text, etc. Future browser support: OpenPortal, a collaborative effort to develop an open, site-neutral and easily extensible web service allowing users to browse, search and visualize ontologies stored in LexGrid repositories.

29 The Vocabulary Review (the process and the resource together): LOINC James Cimino – Columbia University. Lab LOINC® (Logical Observation Identifiers Names and Codes) is a clinical terminology important for laboratory test orders and results. Results of the review process: met most criteria (where lacking, primarily in documentation). Lessons learned from the review process (generally good, but needs): more active participation by the developer, especially with respect to documentation; content available in a standard exchange format and QC'd to make sure all reviewers have access; reviewers experienced with the domain and vocabulary (evaluation experience is helpful because there is a steep learning curve); notes and examples on the criteria matrix. Vote: Lab LOINC approved as caBIG terminology [APPROVED].

30 caGrid Queries into the Ontologic Space James Buntrock - Mayo Clinic Harold Solbrig - Apelon Motivation: Leverage additional semantics used for caGrid application development Provide next generation of design time activities (e.g. CDE/Model Reuse) Provide next generation semantically aware services for runtime activities (e.g. NLP) Semantic Query WG Charter: Use cases for search, retrieval, and aggregation from one or more data nodes on caGrid leveraging the semantics in vocabulary Utilize or inform future caGrid runtime and design components Semantic Query WG Deliverables: White Paper that discusses the use cases and the modifications to the caGrid software and design activities Review by VCDE and Arch Workspaces with recommendations Construction of a prototype or proof of concept implementation Evaluation of the benefits and costs of supporting semantic query capabilities on caGrid

31 Additional Information These slides will be posted with the meeting notes at the ICR WS gforge site: http://gforge.nci.nih.gov/projects/july2006-icr/ All Joint Arch/VCDE WS presentations are available at: http://gforge.nci.nih.gov/docman/index.php?group_id=357&selected_doc_group_id=2582&language_id=1 Presenters can provide more information.

32 Acknowledgements Elaine Freund Grace A. Stafford Brian Davis Li Kramer

33 Additional Slides

34 Introduction to Huntsman Cancer Institute at University of Utah Joyce A. Mitchell, PhD, FACMI, FACMG, Associate Vice President, Health Sciences Information Technology; Chair, Department of Bioinformatics, University of Utah. Welcome. Huntsman Cancer Institute's mission. Cancer genetic research at HCI. Utah Population Database: largest genetic database in the world (6.5 million individuals); large pedigrees enable genetics; used in the identification of major cancer genes such as BRCA1 and BRCA2. Department of Biomedical Informatics and its activities.

35 Middleware and Use of Design Templates in Translational Research Joel Saltz, MD, PhD, Chair, Department of Biomedical Informatics, OSU College of Medicine. Idea of using design templates to guide/drive future caBIG/caGrid middleware development. Design templates for translational research: coordinated systems-level attack on a focused problem; prospective clinical research study; multiscale investigations that encompass genomics, epigenetics, (micro)anatomic structure and function; secondary data analysis; adaptive image-guided intervention; ad-hoc discovery, query and invocation of discrete services. caGrid-related middleware challenges: data and analytical services; support for federated querying and grid services/workflow; semantic infrastructure; security; governance of middleware development.

36 Theme: The Evolution of the Architecture and VCDE Workspaces in the Context of the Expanding caBIG™ Community

37 Impact of the caBIG™ Enterprise on the Architecture and VCDE Workspaces: EY2 Challenges Avinash Shanbhag - Architecture Workspace Lead, NCICB; George Komatsoulis – VCDE Workspace Lead, NCICB. Perspective on the past year: Policies – compatibility/mentoring guidelines, compatibility review process, security policies, vocabulary review guidelines and bronze-level certification; Infrastructure – caGrid 1.1 support for semantic/syntactic interoperability, deployment tools; Standards; Domain Workspace Assistance – support for domain workspaces (ICR, CTMS, ...) and working groups (ASBP, TeraGrid, Security). Goals for the next year (no major change in direction): Gold compatibility – review process and mentorship; high-impact data standards – data standard submissions from Domain Workspaces (e.g. BRIDG); infrastructure enhancement per needs; training and documentation; Security – bridging policies and technologies for service configuration; a Federated Vocabulary Environment using caGrid; community expansion – adoption/adaptation support; integration with other biomedical research grids/organizations – NCRI/ONIX (UK), National Health Information Network. Plan: working towards goals.

38 External Communities Investigating/Leveraging caBIG™ Governance and Technology Ken Hall – BearingPoint, Scott Halpine - SCI Group. National Public Health Grid: 23 programs in the local health departments (HDs) and 19 programs in the state HDs; there are 3000 local HDs and 50 state HDs. Public health informatics challenges (not that different from caBIG): public health data are widely distributed; the volume of public health data is growing rapidly; there are many cultural, social and political impediments to data sharing; a stronger economic model is required for long-term financial sustainability; uniquely dynamic, complex and global in scale; many redundant systems, application silos and data silos. Current thinking: explore/leverage existing Grid technologies and align with other nationwide health initiatives. Pilot study at the Centers for Disease Control and Prevention (CDC): a silver-level compatible data service as a "proof of concept".

39 Updates to Silver Compatibility Checklist Guide to Mentors – VCDE/Architecture Workspaces Revision proposal and approval to allow Java primitive data types Change wording in checklist to: Class and Attribute datatypes must be approved by the VCDE and Architecture workspaces, and/or mapped to the equivalent datatype in the caDSR per the datatypes white paper [APPROVED]

40 Compatibility Guidelines version 3.0 Final Approval Gold Compatibility Guidelines Working Group – VCDE/Architecture Workspaces. Overview of changes to Compatibility Guidelines v3.0; responses to VCDE/Architecture Workspace participants; discussion and vote [APPROVED]. Next steps: send for review by NCI Senior Leadership; release to the caBIG community and collect comments; kick-off of 4 Gold compatibility review process working groups (Vocabulary, Architecture, Information Model, Common Data Elements); update as new needs emerge (version 3.1, ...). Initial discussions around the responsibilities and issues relevant to the working groups.

41 Silver to Grid Training Module Development - Overview and Demos Baris Suzek, Peter McGarvey – Protein Information Resource, Georgetown University. Overview of a hands-on training module covering all the steps to develop a data service, from an idea to a caGrid data service. Description of the codebase and individual lessons for WS participants' input. Challenges experienced, lessons learned and recommendations. Demo Lesson: Practical Metadata Reuse – finding and reusing different components, including standards, models and CDEs, with current tools and repositories (UML Model Browser, EA, SIW, caDSR, ...). Demo Lesson: Using caGrid for Semantic Interoperability – use caGrid service APIs (live demo); discover information resources (standard vocabularies); query resources using a standard language, CQL (standard APIs); identify ways to combine information from multiple resources (CDEs). Next steps: identification of volunteers to review individual sections.

42 Compatibility Review Software Hands-On Training Robert Freimuth - Mayo Clinic Poornima Govindrao - Persistent Systems A system with the goal to make the process of compatibility review more efficient and reduce the administrative overhead Demonstration of workflow back and forth between developers and reviewers Hands-on mock review for participants

43 XC F2F action items for ICR Current Working Groups: Continue the HTP WG. The Grid response to HTP is caTransfer (a stateless transfer service over http/https), and the Grid team would like to continue with the HTP WG. Define more workflows, and a greater variety of them, in the context of the use cases for continued software development (by the caGrid team); include translational workflows. The ASBP WG should address where to draw the line between the benefits of semantic interoperability and the benefits of increased speed of analytical services. Interoperability: proactively define CDEs for the ICR domain. Define the points and junctions where ICR will connect with other translational tools, and engage the workspaces in the identification process; CTMS and Imaging were specifically called out.

44 XC F2F action items for ICR (continued) Other: Select ICR end-user volunteers to review the caGrid training (possibly for next year's program). Engage in Dynamic Extensions use and development. Create or identify tools for the semi-automatic construction of workflows to develop pipelines. ICR and VCDE to engage in and identify projects/targets for standardizing the transfer of structured chunks of data, in order to use tools like Taverna more efficiently.

