Adapting an Existing Data Service to be caBIG™ Silver-level Compliant
Peter Hussey, LabKey Software, Inc., Seattle, WA USA
Contact:


Introduction

The caCORE SDK is the technology foundation for caBIG™-compliant applications. The SDK is based on a software development paradigm that starts with an abstract model of the entities represented in a particular application. Real-world examples of such entities include identified peptides in an MS2 run or microarray test results. Entities are usually related to other entities in known ways. For example, a single MS2 “run” entity must have one or more “FASTA databases” and may have zero or more “identified peptides”. Generally, the interesting entities in an application are those stored in the database. There is often a close correspondence between a row (record) in a SQL table in the application's database and an instance (single entity) of a class of similar entities exposed by the application. The caCORE SDK architecture is based largely on this 1-to-1 correspondence between an application class and a SQL table.

At the core of the generated caCORE web application is Hibernate, an open-source middleware layer that maps Java programming objects to SQL tables and vice versa (Figure 3). The caCORE SDK build process translates the UML model into configuration files that allow Hibernate to construct complex queries by translating relationships between objects into SQL JOIN constructs. Hibernate allows programmers to issue database queries in a simple “Query By Example” (QBE) format. The use of Hibernate in the caCORE runtime yields several benefits: It avoids mixing SQL commands into application code, a common source of bugs in web database applications. It is highly configurable, allowing the developer to tune the way Hibernate translates object access into SQL.
It supports a standardized “Hibernate Query Language” (HQL) that looks like SQL but works unchanged across all supported relational databases, allowing the developer to issue more complex queries than can be expressed through the standard QBE mechanism. The caCORE SDK allows a developer or analyst to turn knowledge of the application model into a working web database application that would otherwise be very difficult and expensive to build from scratch.

[Figure 2 diagram labels: UML Class Model, UML Data Model, Web Application, SQL Schema (Tables), API libraries, SDK Build Process]

One of the design goals of the caCORE architecture is to create an interoperability standard that is not tied to a single programming language. So in the caCORE development paradigm, the developer describes objects and their relationships in the Unified Modeling Language (UML). UML is a high-level, primarily graphical approach to defining a programming project. UML is implemented by a number of tools, including Enterprise Architect and ArgoUML, the two tools supported by the current caCORE SDK (version 4.0). UML modeling, however, is only partly standardized.
It is very difficult to transfer a model between tools without losing information.

caCORE SDK Development Process

The caCORE Application Paradigm

There are three phases in the caCORE development paradigm:

1. Create the UML model elements using the UML modeling tool. This is a painstaking task for any moderately complex real-world application. The application object model is essentially specified twice: as a UML Class model and as a UML Data model. The Class model corresponds to the objects in the application that a developer will ultimately use to access the data service. The Data model describes the implementation of those classes in a relational database; in most cases a single SQL table corresponds to a single Class object. The data objects are linked together through a set of specific relationships and attribute values that must all match exactly, yet each is specified and visible in a separate property dialog within Enterprise Architect. (Note: the 4.0 SDK has added a very useful validation step to the build process that should make it much easier to track down and fix inconsistencies and omissions in the UML models than was the case for the LabKey/CPAS team.) Figure 1 shows a small subset of the LabKey/CPAS UML model in a diagram that combines some of the class elements and the data elements.

2. Register the classes and attributes of the UML model objects with NCI’s Enterprise Vocabulary Services (EVS) and the Cancer Data Standards Repository (caDSR). The common data element identifiers resulting from this step are incorporated into the class model objects as additional tagged values.

3. Run the SDK build process, creating three runtime entities from the model (Figure 2):
- Database definition scripts, in the form of SQL CREATE TABLE commands.
- A web application that implements the UML Class model and can translate requests for objects into SQL commands.
- A set of programming interface libraries that enable applications to query, insert, update and delete application objects over several different communication channels, including local Java applications and web service calls.

Most large-scale, team-built applications are not built using an application-generator approach, and LabKey/CPAS is one such application. Yet LabKey/CPAS still needs to participate in the interoperability of caBIG. For these situations, the caCORE SDK can be used to generate a web application that runs in parallel with an existing application and exposes a caBIG™ silver-compliant programming interface over the data managed by the non-caCORE application. The main prerequisite for this architecture is that the data to be exposed is held in a relational database. We also made the major simplification that the caCORE-generated web application would expose only read-only interfaces, which is allowed and appropriate for caBIG™ compliance. Even within this simplified target, we encountered difficulties in the following areas:

SQL schema implementation differences from caCORE. The caCORE SDK makes several assumptions regarding the database schema that may not hold for an existing application:
- A class in the object model to be exposed corresponds 1-to-1 with a table in the SQL schema.
- The object identifier maps to a single integer primary key in the corresponding relational table.
- A relationship between Class objects corresponds to a foreign key in the SQL tables.

Security integration. An existing application will likely have some security implementation that logically should extend to the caBIG™ interface. The caCORE SDK documentation, however, discusses only the implementation of security in a new application, not integration with an existing security model.
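To make the three schema assumptions concrete, here is a minimal sketch of what the build step's "SQL CREATE TABLE" output implies: one table per class, a single integer `id` primary key, and one foreign-key column per relationship. The `ModelClass` structure and `create_table_sql` function are hypothetical illustrations written for this note; the real caCORE SDK works from the UML model itself and its generated scripts may differ in naming and detail. SQLite stands in for the production database.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ModelClass:
    """Hypothetical stand-in for one UML class plus its Data-model table."""
    name: str                                     # class name == table name (1-to-1)
    attributes: dict[str, str]                    # attribute -> SQL column type
    references: dict[str, str] = field(default_factory=dict)  # role -> target class


def create_table_sql(cls: ModelClass) -> str:
    """Emit one CREATE TABLE statement that follows the three caCORE assumptions."""
    columns = ["id INTEGER PRIMARY KEY"]          # single integer object identifier
    columns += [f"{attr} {sql_type}" for attr, sql_type in cls.attributes.items()]
    for role, target in cls.references.items():   # each relationship -> foreign key
        columns.append(f"{role}_id INTEGER REFERENCES {target}(id)")
    return f"CREATE TABLE {cls.name} (\n  " + ",\n  ".join(columns) + "\n);"


# Toy model (hypothetical): each identified peptide belongs to one MS2 run.
run = ModelClass("Run", {"description": "TEXT"})
peptide = ModelClass("Peptide", {"sequence": "TEXT", "score": "REAL"}, {"run": "Run"})

conn = sqlite3.connect(":memory:")
for cls in (run, peptide):
    ddl = create_table_sql(cls)
    print(ddl)
    conn.execute(ddl)
```

An existing application's schema rarely fits this mold exactly (composite keys, generic columns, differently named tables), which is the gap the LabKey/CPAS view layer is meant to bridge.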
caCORE Runtime Architecture

[Figure 3 diagram labels: SQL database, caCORE web application, Hibernate, Domain model, Local Java lib, Local JSP, Remote WS, Remote WS lib]

In a software application based on the caCORE design, developers write web pages and program-to-program applications using the API generated by the SDK build process. The web application handles both read and write access to the underlying SQL database in order to support the creation and management of application objects.

Challenges in Adapting an Existing Application to caCORE

The National Cancer Institute’s caBIG™ initiative aims for interoperability of bioinformatics applications. caBIG™ envisions achieving this by encouraging all applications to implement a standard programming interface and to register their terms and data objects with a centralized service. The required programming interface is essentially defined in terms of the behavior of applications built using the caCORE Software Development Kit (SDK). The caCORE SDK is designed and documented for building a new application from scratch; little is documented on how one might achieve caBIG™ silver-level compliance in an application not built with the caCORE SDK. This poster describes the caCORE SDK development and build process and how the LabKey team changed it to work with their existing proteomics platform software. The LabKey/CPAS solution creates a parallel web application that supports the caBIG™ programming interface and accesses CPAS data through a SQL view layer.

The LabKey/CPAS Solution

LabKey/CPAS resolved these challenges by creating a SQL view layer. In our solution, the Data model defines a virtual schema in a database schema named “cabig”. We then created a set of SQL views with the same names and the same columns as the UML Data model. The caCORE-generated web application interacts with these views as if they were tables; it cannot tell the difference.
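The view-layer idea described above can be sketched with SQLite: views carry the names and columns the UML Data model expects, while the base tables keep the existing application's own conventions. Everything here is an assumption for illustration: the base-table and column names are hypothetical (not CPAS's actual schema), the `Deleted` filter is an invented example of a "fix-up", and because SQLite has no `CREATE SCHEMA`, a `cabig_` name prefix stands in for the cabig schema. The final query is hand-written, but it has the shape the poster describes Hibernate producing: a relationship between objects turned into a SQL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical base tables owned by the existing (non-caCORE) application,
# using that application's own naming conventions.
conn.executescript("""
CREATE TABLE ms2_Runs (RunId INTEGER PRIMARY KEY, Descr TEXT, Deleted INTEGER);
CREATE TABLE ms2_Peptides (PeptideId INTEGER PRIMARY KEY, Run INTEGER, Seq TEXT);
INSERT INTO ms2_Runs VALUES (1, 'yeast digest', 0), (2, 'discarded run', 1);
INSERT INTO ms2_Peptides VALUES (100, 1, 'PEPTIDEK'), (101, 1, 'LGEYGFQNALIVR'), (102, 2, 'AAAAK');

-- Views named after the (hypothetical) Data model tables "Run" and "Peptide",
-- exposing exactly the Data model's columns; "cabig_" stands in for the schema.
CREATE VIEW cabig_Run AS
    SELECT RunId AS id, Descr AS description
    FROM ms2_Runs
    WHERE Deleted = 0;
CREATE VIEW cabig_Peptide AS
    SELECT PeptideId AS id, Seq AS sequence, Run AS run_id
    FROM ms2_Peptides;
""")

# A relationship query of the kind an ORM generates: the Run-to-Peptide
# association becomes a JOIN on the foreign-key column, issued against views.
rows = conn.execute("""
    SELECT r.description, p.sequence
    FROM cabig_Run r JOIN cabig_Peptide p ON p.run_id = r.id
    ORDER BY p.id
""").fetchall()
print(rows)
```

Peptide 102 drops out because its run is filtered by `cabig_Run`, even though the query never mentions `Deleted`: the caller sees only the Data-model view of the data.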
Under the covers, the view layer passes the queries through to the original base tables (managed by the non-caCORE application) and fixes up the differences along the way. We wrapped the cabig view definition scripts into a new LabKey/CPAS module, along with a small set of UI changes that configure and test caBIG™ access for a given folder.

Adapting an existing application to enable caBIG™ access proved to be relatively straightforward once we settled on the basic approach of running the caCORE SDK-generated web application in parallel with LabKey/CPAS. In our design, the SDK-generated application accesses the relational data through a set of views that handle some of the tricky mapping and security problems. The views also act as a buffer between the underlying base tables and the web application, allowing names to change on one side without affecting the other. The caCORE-like web application supports the read-only API libraries as if it were a standard, “pure” caCORE application. This is the basis of CPAS’ successful caBIG™ silver-level compliance validation.

The view layer solves the issues described above:

Security Integration: Since data access in LabKey/CPAS is granted on a folder-by-folder basis, we wanted to enable or disable caBIG access by folder. We added a single true/false “caBIGPublished” column to our existing core.Containers table. This bit is turned on and off by the “Publish” button on a project’s Permissions page. The corresponding Containers view in the cabig schema includes the restriction “WHERE caBIGPublished=true”. All of the other view definitions in the cabig schema include an inner join to the cabig.Containers view. As a result, the caBIG interface sees only data in containers that have been published.

Data Model Compliance: Most of the underlying CPAS tables have a single integer primary key, but a few have two-column integer keys.
To meet caCORE’s requirement for a single-column key, the SQL view definition computes the key as a sum expression: SELECT (( * op.propertyid)+op.objectid) AS id, ... As a second example, the PeptidesData table in CPAS stores score values from different search engines in generically named “ScoreX” columns. For caBIG, we chose to represent the scores for different engines as different objects (preserving the 1-to-1 paradigm). We handled this difference in the view layer by creating one view per search engine, with the appropriate filter.

[Figure 4 diagram labels: SQL database, caCORE web application, caCORE API, Search Application, LabKey/CPAS, cabig Views, Client API, Data-Driven pages, Script apps]

Figure 1. The caCORE development paradigm starts with the creation of Class and Data models in UML. Objects in these models have relationships and attributes that must be specified exactly for a successful build. At right is a combined Class and Data model diagram showing a small subset of the models developed to achieve caBIG silver compliance for LabKey/CPAS.
Figure 2. The caCORE SDK Build process.
Figure 3. The caCORE runtime architecture.
Figure 4. The caCORE implementation for CPAS.
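The Security Integration and Data Model Compliance fixes can be sketched together in SQLite. This is an illustration under stated assumptions, not the CPAS view scripts: all table names, column names, and engine names are hypothetical; SQLite stores booleans as 0/1, and a `cabig_` prefix stands in for the cabig schema. The poster's key expression leaves its multiplier unstated, so `KEY_MULTIPLIER` below is our own hypothetical choice. For the combined id to be unique, the multiplier must exceed the largest `objectid`.

```python
import sqlite3

# Hypothetical constant (the poster's value is not given). It must exceed the
# largest objectid so that distinct (propertyid, objectid) pairs get distinct ids.
KEY_MULTIPLIER = 1_000_000

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
-- Existing containers table plus the added publish flag.
CREATE TABLE core_Containers (EntityId INTEGER PRIMARY KEY, Name TEXT, caBIGPublished INTEGER);
INSERT INTO core_Containers VALUES (10, 'public-project', 1), (20, 'private-project', 0);

-- Hypothetical table with a two-column integer key.
CREATE TABLE exp_ObjectProperty (
    objectid INTEGER, propertyid INTEGER, container INTEGER, value TEXT,
    PRIMARY KEY (objectid, propertyid));
INSERT INTO exp_ObjectProperty VALUES (5, 1, 10, 'a'), (5, 2, 10, 'b'), (7, 1, 20, 'secret');

-- Hypothetical peptide table with generic score columns shared by all engines.
CREATE TABLE ms2_PeptidesData (
    RowId INTEGER PRIMARY KEY, container INTEGER, SearchEngine TEXT, Score1 REAL, Score2 REAL);
INSERT INTO ms2_PeptidesData VALUES
    (1, 10, 'EngineA', 45.2, 0.01), (2, 10, 'EngineB', 61.0, 33.5), (3, 20, 'EngineA', 10.0, 0.5);

-- Only published containers are visible through the cabig layer.
CREATE VIEW cabig_Containers AS
    SELECT EntityId AS id, Name AS name FROM core_Containers WHERE caBIGPublished = 1;

-- Inner join to cabig_Containers enforces publishing; the composite key
-- collapses into one integer id.
CREATE VIEW cabig_ObjectProperty AS
    SELECT (({KEY_MULTIPLIER} * op.propertyid) + op.objectid) AS id, op.value AS value
    FROM exp_ObjectProperty op INNER JOIN cabig_Containers c ON op.container = c.id;

-- One view per search engine, each filtered to that engine's rows.
CREATE VIEW cabig_EngineAPeptide AS
    SELECT p.RowId AS id, p.Score1 AS primary_score, p.Score2 AS secondary_score
    FROM ms2_PeptidesData p INNER JOIN cabig_Containers c ON p.container = c.id
    WHERE p.SearchEngine = 'EngineA';
""")

property_rows = conn.execute("SELECT id, value FROM cabig_ObjectProperty ORDER BY id").fetchall()
engine_a_rows = conn.execute("SELECT id FROM cabig_EngineAPeptide ORDER BY id").fetchall()
print(property_rows)
print(engine_a_rows)
```

Because every view joins through `cabig_Containers`, clearing a container's publish flag hides its rows from every view at once. No per-view security logic is needed.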