Arch/VCDE Workspace Requirements and the Impact on caBIG™ Infrastructure: Metadata caDSR/GME Mapping Denise Warzel, Scott Oster caBIG Architecture/VCDE.

Arch/VCDE Workspace Requirements and the Impact on caBIG™ Infrastructure: Metadata caDSR/GME Mapping Denise Warzel, Scott Oster caBIG Architecture/VCDE Joint Face-to-Face Salt Lake City, UT January 28th-30th, 2008

2 Agenda Problem Statement caCORE Implementation Current Status and Plans Future Work caGrid Integration

caCORE Team Developers, QA, Technical Writers Lockheed Martin Management Systems Designers Eddie VanArsdell Sarah Gill Ann Wiley Northern Taiga Ventures Wendy Erickson-Hirons Science Applications International Corporation (SAIC) Johnita Beasley Rui Chen Jyothsna Chilukuri Tommie Curtis Brenda Maeske Mary Cooper Konrad Rokicki Tracy Safran Ye Wu Mayo (LexBig Team) Thomas Johnson James Buntrock Russ Hamm Terrapin Systems LLC (TerpSys) Yeon Choi Andrea Johnson Ralph Rutherford Vanessa Caldwell Gavin Brennan Norval Johnson Sriram Kalyanasundaram Alan Klink Cuong Nguyen Bob Wysong Craig Fee John White Claire Wolfe

caCORE Team Developers, QA, Training, Technical Writers NCI Frank Hartel George Komatsoulis Sichen Liu Dianne Reeves Avinash Shanbhag Denise Warzel – caCORE Product Line, caDSR Oracle Steve Alred Prerna Aggarwal Sharad Bhardwaj Christophe Ludet Brett Novak Ekagra Software Technologies, Ltd Satish Patel JJ Maurer Denis Avdic Nadine ScenPro Jennifer Brush Larry Hebel Sumana Hedge Charles Yaghmour Northrop Grumman Wilberto Garcia Kim Ong David Yee

5 caGrid 1.1 Team Ohio State University - Biomedical Informatics Department Dave Ervin Shannon Hastings Tahsin Kurc Stephen Langella Scott Oster - Chief Architect - caGrid Joel Saltz University of Chicago/Argonne National Laboratory Ravi Madduri Ian Foster SemanticBits, LLC. Joshua Phillips Duke Comprehensive Cancer Center Patrick McConnell Ekagra Software Technologies, Ltd. Vijay Parmar

6 caGrid 1.1 Team (Continued) Northern Taiga Ventures, Inc. (NTVI) Wendy Erickson-Hirons Science Applications International Corporation (SAIC) Aynur Abdurazik Ye Wu Terrapin Systems LLC (TerpSys) Chet Bochan Vanessa Caldwell Craig Fee Alan Klink Gavin Brennan NCI - Center for Biomedical Informatics and Information Technology (CBIIT) Todd Cox George Komatsoulis Denise Warzel Wendy Patterson Mary Jo Deering Booz Allen Hamilton Michael Keller Arumani Manisundaram

7 Problem Statement Achieving caBIG interoperability goals on the grid requires not only sound handling of both syntax and semantics, but also a formal binding between them. Previously, the binding was: implicit and based upon overly constraining assumptions (in some cases); difficult to make use of programmatically; difficult to verify compliance with, and therefore difficult to enforce.

8 Example of Grid Data Syntax XML of CQL Results from a query to a Data Service Infrastructure capable of mapping directly back to corresponding programming language objects Semantics of data are explicitly bound to the conceptual model (DomainModel) the user queried against, and implicitly associated with this XML via its namespace

9 Example Syntax Use Cases I’m writing an application to process arbitrary data types, how do I know what the data will look like? I want to build a grid service, how do I describe the inputs and outputs of its operations? How can I be sure data I am getting is valid? I don’t want to work with WSDL or XML, can’t I just use an object-oriented client API to talk to grid services?

10 Example of Grid Data Semantics Snippet of XML representation of DomainModel metadata from a Data Service Describes the caDSR-registered logical structure, used to query the service; annotated with concept codes from EVS Independent of the syntax used to realize the data types on the grid (such as in the results of a query)

11 Example Semantics Use Cases What are the service addresses of all the running Data Services exposing data based on concept C16612? What are the permissible values of Taxon.abbreviation in the caBIO 3.1 model? Are there any Analytical Services on the grid which provide operations over a data type that is semantically interoperable with the caBIO 3.1 Tissue.organ? How do I learn about the data available from an arbitrary Data Service?

12 Example Semantic/Syntax Binding Use cases I’ve received a generic data set (such as a CQL results object), how can I understand the semantics of the contained data? I want to describe a workflow using either logical models or EVS concepts, how can I figure out how the XML is structured such that I can use BPEL to extract data? Better yet, is there a graphical tool that can do the transformation for me? I registered my model in caDSR, why is Introduce asking me for XML Schemas?

13 Realizing the Binding Use Cases Need the capability to formally describe, register, and access the association between a given data type’s syntax and semantics The foundation of the grid data syntax is XML Schema; XML Schemas are identified by their namespaces The foundation of the grid data semantics are the EVS annotated, caDSR data models; data models are identified by their caDSR identifier and version The binding between syntax and semantics is essentially a mapping from the UML/caDSR universe to the XML/GME universe

14 Connecting UML and XML: Current Implementation Previous “solution” was to require an XML Schema for each package that followed a namespace construction rule Had several limitations: Several caBIG projects wish to reuse existing XML Schemas, which have existing namespaces (can’t be changed for interoperability reasons) Some uniqueness constraints assumed are not strictly enforced Does not support “reverse lookup” (XSD -> caDSR) Does not provide fine grain mapping necessary for some applications/infrastructure Somewhat prescriptive (namespaces, naming conventions, etc) when ideal solution is entirely descriptive

15 Connecting UML and XML: The complexities There is a large variety of options for how a conceptual UML model can be represented in XML (many in use within caBIG as well as relevant external XML standards) Assumptions on namespaces or structure don’t work Most applications/services “don’t care,” so the process must be simple for the default case, but sufficient for the others Attribute or element? Different names? What about Collection associations?

16 Connecting UML and XML: The solution Planned definition of mapping rules that specify how a given UML entity is represented in XML over the grid Maintained in the caDSR; lookup and query available through the caDSR grid service

17 Partial Mapping Example Project: BookStore Version: 1.0 Project’s Namespace (gme://example.com/version/1.0) XPath: title Element: Book Namespace: gme://bookstore.example.com/version/1.0 Attribute Mapping

18 Connecting UML and XML: The solution Value: “gme://caMOD.caBIG/3.0” Default heuristic: gme://{projectName}.{contextName}/{version} The full package path name “gme://caMOD.caBIG/3.0/gov.nih.nci.camod.domain” (2 tags, 1 for Source, 1 for Target) {rolename}/{XMLElement of Class} “resident/Person” “addressCollection/Address” (2 tags) The full package path name And {className} “gme://caMOD.caBIG/3.0/gov.nih.nci.camod.domain” “Agent” “@agentId”

19 GME Namespace Generation: Current Thinking. Project (CS) level: default heuristic gme://{projectName}.{contextName}/{version}; GME_XMLNamespace = "gme://caMOD.caBIG/3.0". Package (CSI) level: default heuristic is the full package path name; GME_XMLNamespace = "gme://caMOD.caBIG/3.0/gov.nih.nci.camod.domain". Associations: default heuristic {rolename}/{XMLElement of Class}, one tag for Source and one for Target; GME_SourceXMLLocRef = "resident/Person" or "addressCollection/Address". Class level (2 tags): package name, default heuristic is the full path name, GME_XMLNamespace = "gme://caMOD.caBIG/3.0/gov.nih.nci.camod.domain" -> Object Class; class name, default heuristic {className}, GME_XMLElement = "Agent" -> Object Class. Attribute level: default heuristic @{attribute name} -> CDE; GME_XMLLocReference = "@agentId" -> CDE. Association level: GME_TargetXMLLocRef = "toClass2"; GME_SourceXMLLocRef = "company".
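
The default heuristics on this slide can be sketched in a few lines of Python. This is only an illustration of the string patterns listed above, not the SIW or caDSR implementation; the `Project` class and all function names are hypothetical, and the package level is assumed to be the project namespace followed by the full package path, as in the slide's caMOD example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical stand-in for a caDSR project (Classification Scheme)."""
    name: str      # e.g. "caMOD"
    context: str   # e.g. "caBIG"
    version: str   # e.g. "3.0"


def project_namespace(project: Project) -> str:
    # Project (CS) level: gme://{projectName}.{contextName}/{version}
    return f"gme://{project.name}.{project.context}/{project.version}"


def package_namespace(project: Project, package: str) -> str:
    # Package (CSI) level: project namespace plus the full package path name
    return f"{project_namespace(project)}/{package}"


def class_tags(project: Project, package: str, class_name: str) -> dict:
    # Class level carries two tags: the package namespace and the element name
    return {
        "GME_XMLNamespace": package_namespace(project, package),
        "GME_XMLElement": class_name,
    }


def attribute_loc_ref(attribute_name: str) -> str:
    # Attribute level: @{attribute name}
    return f"@{attribute_name}"


def association_loc_ref(role_name: str, target_element: str) -> str:
    # Association ends: {rolename}/{XMLElement of Class}
    return f"{role_name}/{target_element}"


camod = Project(name="caMOD", context="caBIG", version="3.0")
print(project_namespace(camod))
print(class_tags(camod, "gov.nih.nci.camod.domain", "Agent"))
print(attribute_loc_ref("agentId"))
print(association_loc_ref("resident", "Person"))
```

Because these are defaults only, a model that reuses an external schema would override the generated values rather than call these helpers.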

20 Options for creating Schema -> GME Namespace names Options for generating/creating GME namespace names: 1. Semantic Integration Workbench (SIW) – generate and review GME namespace tagged values based on current heuristic ‘Generate GME Namespace Tags’ ‘Clear GME Namespace Tags’ Project -> Classification Scheme; Package -> Classification Scheme Item; Class -> Object Class; Associations -> Object Class Relationships; Attribute -> CDE 2. caAdapter – map an existing XSD to a domain model Useful if you are trying to use an externally defined schema with your service caAdapter inserts the tags for you 3. Manually – EA or ARGO UML

21 Impact on caCORE Tools Objective: caDSR becomes the authoritative source for registering the GME Namespace information Impacted tools: SIW, UML Loader, caAdapter, caCORE SDK, Introduce UML Loader – load the names into caDSR as alternate names UML Loader – validate that all GME tags are present for the model UML Loader – validate that all GME tags are valid URIs UML Loader – classify alternate names by Project/Version (reuse) SIW – Feature to remove all GME related tagged values from an XMI file SIW – Roundtrip inserts GME tags from caDSR to XMI file SDK/Introduce – use the GME tagged values to generate the schema New Services for retrieving GME namespace names using model
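
The two UML Loader validation steps (tags present, tags are valid URIs) could look roughly like the sketch below. It is not the UML Loader's code: the function names are hypothetical, the required-tag list covers only the class-level tags named on slide 19, and treating "valid URI" as "has the `gme` scheme and a non-empty authority" is an assumption made for illustration.

```python
from urllib.parse import urlparse

# Class-level tag names taken from the "GME Namespace Generation" slide.
REQUIRED_CLASS_TAGS = ("GME_XMLNamespace", "GME_XMLElement")


def is_valid_gme_uri(value: str) -> bool:
    # Assumption: a usable namespace has the gme scheme and an authority part.
    parsed = urlparse(value)
    return parsed.scheme == "gme" and bool(parsed.netloc)


def validate_class_tags(tags: dict) -> list:
    """Return a list of human-readable problems; empty means the tags pass."""
    errors = []
    for name in REQUIRED_CLASS_TAGS:
        if not tags.get(name):
            errors.append(f"missing tag: {name}")
    namespace = tags.get("GME_XMLNamespace")
    if namespace and not is_valid_gme_uri(namespace):
        errors.append(f"not a valid gme:// URI: {namespace}")
    return errors


good = {
    "GME_XMLNamespace": "gme://caMOD.caBIG/3.0/gov.nih.nci.camod.domain",
    "GME_XMLElement": "Agent",
}
bad = {"GME_XMLNamespace": "caMOD.caBIG/3.0"}

print(validate_class_tags(good))
print(validate_class_tags(bad))
```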

22 Issues Model owner names are not unique identifiers for the objects they name -> XML Schemas based on model owner names are not syntactically ‘interoperable’ with other semantically equivalent or similar objects -> Human readable, but not ‘meaningful’ Does it matter? Is “Query Semantic Metadata” to find equivalence sufficient? Is there a better way?

23 Assessing Schema interoperability? Can these data be combined? Looks Promising! 7 attributes 9 attributes

24 Equivalent? 2390874v1.0 2529585v1.0 2529586v1.0 2529587v1.0 2529588v1.0 2529589v1.0 2529590v1.0 2529591v1.0 2437803v1.0 2438153v1.0 2438154v1.0 2438155v1.0 2438156v1.0 2438157v1.0 2438158v1.0 2438218v1.0 2438219v1.0 2438153v1.0 2438220v1.0 2438221v1.0

25 Unique Identifiers reveal that items are not semantically the same 2390874v1.0: 2529585v1.0 Protocol Agent Identifier java.lang.Long; 2529586v1.0 Protocol Agent NSC Code java.lang.Long; 2529587v1.0 Protocol Agent Cancer Molecular Analysis Project Protocol Agent Flag java.lang.Boolean; 2529588v1.0 Protocol Agent NCI Concept Code java.lang.String; 2529589v1.0 Protocol Agent Comment java.lang.String; 2529590v1.0 Protocol Agent Source java.lang.String; 2529591v1.0 Protocol Agent Name java.lang.String 2437803v1.0: 2438153v1.0 Pharmacologic Substance Identifier java.lang.Long; 2438154v1.0 Pharmacologic Substance NSC Code java.lang.Long; 2438155v1.0 Pharmacologic Substance Cancer Molecular Analysis Project Pharmacologic Substance Flag java.lang.Boolean; 2438156v1.0 Pharmacologic Substance NCI Concept Code java.lang.String; 2438157v1.0 Pharmacologic Substance Comment java.lang.String; 2438158v1.0 Pharmacologic Substance Source java.lang.String; 2438218v1.0 Pharmacologic Substance Name java.lang.String …. Question: Would using standard names in schemas facilitate interoperability? If so, which to use? Public ID and Version are persistent, not user friendly System Generated names are consistent, human friendly
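
The point of this slide can be reproduced with the identifiers it lists. The sketch below, which is an illustration and not a caDSR tool, compares the two classes first by the shape of their attributes (the Java types) and then by caDSR public ID and version; the function names are hypothetical and only the attributes shown on the slide are included.

```python
# Attributes listed on the slide, keyed by caDSR public ID and version.
protocol_agent = {
    "2529585v1.0": ("Protocol Agent Identifier", "java.lang.Long"),
    "2529586v1.0": ("Protocol Agent NSC Code", "java.lang.Long"),
    "2529587v1.0": ("Protocol Agent Cancer Molecular Analysis Project "
                    "Protocol Agent Flag", "java.lang.Boolean"),
    "2529588v1.0": ("Protocol Agent NCI Concept Code", "java.lang.String"),
    "2529589v1.0": ("Protocol Agent Comment", "java.lang.String"),
    "2529590v1.0": ("Protocol Agent Source", "java.lang.String"),
    "2529591v1.0": ("Protocol Agent Name", "java.lang.String"),
}
pharmacologic_substance = {
    "2438153v1.0": ("Pharmacologic Substance Identifier", "java.lang.Long"),
    "2438154v1.0": ("Pharmacologic Substance NSC Code", "java.lang.Long"),
    "2438155v1.0": ("Pharmacologic Substance Cancer Molecular Analysis Project "
                    "Pharmacologic Substance Flag", "java.lang.Boolean"),
    "2438156v1.0": ("Pharmacologic Substance NCI Concept Code", "java.lang.String"),
    "2438157v1.0": ("Pharmacologic Substance Comment", "java.lang.String"),
    "2438158v1.0": ("Pharmacologic Substance Source", "java.lang.String"),
    "2438218v1.0": ("Pharmacologic Substance Name", "java.lang.String"),
}


def same_structure(a: dict, b: dict) -> bool:
    # Structural comparison: do both classes carry the same multiset of types?
    return sorted(t for _, t in a.values()) == sorted(t for _, t in b.values())


def shared_public_ids(a: dict, b: dict) -> set:
    # Semantic comparison: which registered data elements do both classes reuse?
    return set(a) & set(b)


print(same_structure(protocol_agent, pharmacologic_substance))
print(shared_public_ids(protocol_agent, pharmacologic_substance))
```

The structural check passes while the identifier check finds nothing in common, which is the "looks promising, but not semantically the same" situation the slides describe.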

26 Harmonization and Mapping Services XML Schema Based on Model Owners Names 007 Taxol 007 Taxol … Values and structures are the same, but classes and attributes are still NOT semantically the same

27 Harmonization and Mapping Services XML Schema based on caDSR Identifiers? <Agent cadsr-identifier="2390874v1.0"> 007 Taxol <Agent cadsr-identifier="2437803v1.0"> 007 Taxol … Values and structures are the same, but classes and attributes are still NOT semantically the same

28 Harmonization and Mapping Services XML Schema Based on System Generated Names? <Agent caDSRidentifier="Protocol_Agent"> 007 Taxol <Agent caDSRidentifier="Pharmacologic_Substance"> 007 Taxol … Values and structures are the same, but classes and attributes are still NOT semantically the same

29 caGrid Integration Grid Access: The caDSR Grid Service will provide grid access to the binding information registered in the caDSR Replace existing heuristic use: Metadata Creation/Annotation: caDSR Grid Service provides the ability to annotate ServiceMetadata instances generated by Introduce with the caDSR/EVS extract information Introduce Data Type Browsing: Provide the ability to seamlessly locate and leverage XML Schemas needed for a service by selecting caDSR data models New Potentials: Workflow Creation User Interface Workflows are described with BPEL, which uses XML constructs for things like data extraction; will be able to provide an interface to seamlessly describe workflows at the “object layer”

31 Backup Slides

32 Conceptual View of the Problem

33 XML Schema Existence Rules Each caDSR data object (as part of a specific Project version) used in the grid must have its XML format modeled in an XML Schema that is registered in the GME Within each data object, the format of its attributes and associations must be modeled in the corresponding XML Schema type Every caDSR Package (as part of a specific Project version) must have a corresponding XML Schema that is registered in the GME, even if it just imports a series of other XML Schemas and doesn’t define any of its own types Every caDSR Project must have a corresponding XML Schema registered in the GME; it will most likely just import a series of other XML Schemas (corresponding to its Packages’ schemas), though it may define its own types These rules let the XSD modeler create any level of schema granularity from 1 per project, down to 1 per Class, but there is always a defined way to retrieve all of the types for a given Project, Package, and Class
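
A checker for these existence rules might be sketched as follows. This is an illustration under stated assumptions, not GME or caDSR code: registered schemas are modeled as a plain dictionary keyed by namespace, each with the types it defines and the namespaces it imports, and the function names and the `Person` class are hypothetical.

```python
def reachable_types(namespace, schemas, _seen=None):
    """Collect the types a schema defines itself or reaches through imports."""
    seen = set() if _seen is None else _seen
    if namespace in seen or namespace not in schemas:
        return set()
    seen.add(namespace)  # guards against circular imports
    schema = schemas[namespace]
    types = set(schema["types"])
    for imported in schema["imports"]:
        types |= reachable_types(imported, schemas, seen)
    return types


def check_existence(project_ns, packages, schemas):
    """packages maps a package namespace to the set of class names it holds."""
    problems = []
    if project_ns not in schemas:
        problems.append(f"no XML Schema registered for project {project_ns}")
    for package_ns, classes in packages.items():
        if package_ns not in schemas:
            problems.append(f"no XML Schema registered for package {package_ns}")
            continue
        missing = classes - reachable_types(package_ns, schemas)
        for class_name in sorted(missing):
            problems.append(
                f"no XML type for class {class_name} reachable from {package_ns}"
            )
    return problems


project_ns = "gme://caMOD.caBIG/3.0"
package_ns = "gme://caMOD.caBIG/3.0/gov.nih.nci.camod.domain"
schemas = {
    # The project schema defines nothing itself and only imports its package.
    project_ns: {"types": set(), "imports": [package_ns]},
    package_ns: {"types": {"Agent"}, "imports": []},
}
packages = {package_ns: {"Agent", "Person"}}

print(check_existence(project_ns, packages, schemas))
```

Following imports is what allows any granularity from one schema per project down to one per class while still giving a defined way to collect every type.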

34 Binding the XML Schemas and caDSR The preceding rules only detail the necessity of existence of the various XML Schemas and schema entities; they do not specify how a particular instance can be located A detailed bidirectional mapping/binding between the schema entities and caDSR items is required to support the use cases
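
One way to picture the bidirectional binding is a registry that indexes each mapping from both sides. The sketch below is a toy in-memory model, not the caDSR API or the caDSR Grid Service; the class names, method names, and the public ID `1234567` are hypothetical, and the namespace and element are borrowed from the BookStore example on slide 17.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CadsrItem:
    """A caDSR-registered item, identified by public ID and version."""
    public_id: str
    version: str


@dataclass(frozen=True)
class XmlLocation:
    """Where the item appears in XML: schema namespace plus element or path."""
    namespace: str
    path: str


class BindingRegistry:
    def __init__(self) -> None:
        self._to_xml = {}     # CadsrItem -> XmlLocation
        self._to_cadsr = {}   # XmlLocation -> CadsrItem

    def register(self, item: CadsrItem, location: XmlLocation) -> None:
        # Store both directions so the XSD -> caDSR "reverse lookup" is possible.
        self._to_xml[item] = location
        self._to_cadsr[location] = item

    def xml_for(self, item: CadsrItem) -> Optional[XmlLocation]:
        return self._to_xml.get(item)

    def cadsr_for(self, location: XmlLocation) -> Optional[CadsrItem]:
        return self._to_cadsr.get(location)


registry = BindingRegistry()
book = CadsrItem(public_id="1234567", version="1.0")  # hypothetical identifier
book_xml = XmlLocation("gme://bookstore.example.com/version/1.0", "Book")
registry.register(book, book_xml)

print(registry.xml_for(book))
print(registry.cadsr_for(book_xml))
```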

35 Mapping Realization Planned to be maintained in the caDSR. API to query and look up bindings. Access will be available through the caDSR Grid Service.
