December 2006 Federated Query Ian Fore, NCICBIIT David Ervin, Ohio State University Arch \ VCDE Face-to-Face Meeting Salt Lake City, UT January 29, 2008.



Data Services Overview

Part 1: Introduction (30 min)
  - Overview
  - CQL
  - Current query capabilities provided by caGrid
  - caTissue
  - Object model
  - Example caTissue Queries
Part 2: Team Assignments (2 hours)
  - Develop use cases for federated queries on caGrid
  - Detail specific queries to be performed over specific services
  - Categorize these use cases by:
      - Queries that can currently be performed on caGrid
      - Queries that are currently difficult to perform on caGrid
      - Queries that cannot currently be performed on caGrid
Part 3: Team Briefings and Summary (30 min)

Query types
  - Aggregate queries (count)
  - Multiple queries on associations
  - Hierarchical queries
  - Temporal queries
  - Returning data from more than one object
Most of these have an issue in Silver and caGrid compatible systems

Query Layers

    Layer                        Query language
    Database                     SQL
    Object Relational Mapping    Hibernate
    caCORE like API              Query by example, Hibernate
    caGrid service               CQL/DCQL

  - caTissue caCORE like API examples
  - Demo of caTissue federated query using silver compatible API
  - Overview of caGrid queries - CQL and DCQL

Sources of information
  - caCORE SDK Programmers Reference Guide
  - caCORE 3.2 Technical Guide

Aggregate queries (count)
  - Using "group by"
      - List the number of patients registered to and specimens collected on a protocol by protocol basis
  - Using "having"
      - Find all specimens which have been thawed more than three times
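The "having" use case can be tried out against a toy table. The following is only a sketch under stated assumptions: the `thaw_event` table and its columns are hypothetical and are not the caTissue schema, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Hypothetical, simplified schema: one row per thaw event (not the caTissue schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thaw_event (specimen_label TEXT, thawed_on TEXT)")
rows = [("s1", f"2008-01-0{d}") for d in range(1, 5)]   # s1 thawed 4 times
rows += [("s2", f"2008-01-0{d}") for d in range(1, 4)]  # s2 thawed 3 times
rows += [("s3", "2008-01-01")]                          # s3 thawed once
conn.executemany("INSERT INTO thaw_event VALUES (?, ?)", rows)

# GROUP BY collapses the events per specimen; HAVING filters on the aggregate.
thawed_more_than_three = conn.execute(
    """
    SELECT specimen_label, COUNT(*) AS thaw_count
    FROM thaw_event
    GROUP BY specimen_label
    HAVING COUNT(*) > 3
    ORDER BY specimen_label
    """
).fetchall()
print(thawed_more_than_three)
```

Only `s1` passes the filter, because `s2` was thawed exactly three times and the use case asks for more than three.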

Count specimens by Protocol and Type

Count specimens by Protocol and Type - HQL

    select p.title, s.type, count(distinct r), count(s)
    from CollectionProtocol p
      join p.registrationCollection r
      join r.specimenCollectionGroupCollection g
      join g.specimenCollection s
    group by p.title, s.type

    Protocol               Specimen type   No registered   No of specimens
    Trial one              Plasma          3               5
    Woodward Colon Trial   Frozen Tissue   2               3
    Woodward Colon Trial   Plasma          2               3
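The same "group by" shape can be reproduced in plain SQL for experimentation. This is a minimal sketch, assuming a hypothetical four-table schema (`protocol`, `registration`, `scg`, `specimen`) that only mimics the association chain in the HQL above. The rows are invented for illustration and are not the data shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables mirroring protocol -> registration -> group -> specimen.
    CREATE TABLE protocol     (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE registration (id INTEGER PRIMARY KEY, protocol_id INTEGER);
    CREATE TABLE scg          (id INTEGER PRIMARY KEY, registration_id INTEGER);
    CREATE TABLE specimen     (id INTEGER PRIMARY KEY, scg_id INTEGER, type TEXT);

    INSERT INTO protocol     VALUES (1, 'Trial one');
    INSERT INTO registration VALUES (1, 1), (2, 1);
    INSERT INTO scg          VALUES (1, 1), (2, 2);
    INSERT INTO specimen     VALUES (1, 1, 'Plasma'), (2, 1, 'Plasma'),
                                    (3, 2, 'Plasma'), (4, 2, 'Serum');
    """
)

# COUNT(DISTINCT r.id) counts registrations once even when one registration
# contributes several specimens; COUNT(s.id) counts every specimen row.
counts = conn.execute(
    """
    SELECT p.title, s.type, COUNT(DISTINCT r.id), COUNT(s.id)
    FROM protocol p
      JOIN registration r ON r.protocol_id = p.id
      JOIN scg g          ON g.registration_id = r.id
      JOIN specimen s     ON s.scg_id = g.id
    GROUP BY p.title, s.type
    ORDER BY p.title, s.type
    """
).fetchall()
print(counts)
```

The `distinct` matters here: without it, the registration count would equal the specimen count, because the join produces one row per specimen.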

Multiple queries on associations
  - Find specimens that were fixed in formalin for 30 minutes or less and were embedded in low melting point paraffin

Fixation and embedding

[Instance diagram: Block, Specimen Event, Fixation event, Embedded event]

Fixation and embedding - HQL

    select block.label
    from TissueSpecimen block,
         FixedEventParameters fix,
         EmbeddedEventParameters embed
    where fix member of block.specimenEventCollection
      and embed member of block.specimenEventCollection
      and fix.durationInMinutes <= 30
      and embed.embeddingMedium like '%Low%'
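What makes this query awkward for simpler query layers is that two independent conditions must hold on two different members of the same event collection. The sketch below restates that logic over plain Python objects. The `Event` and `Block` classes and the `kind` labels are hypothetical stand-ins for the caTissue classes, and the medium match is case-insensitive here, which may differ from how `like` behaves on a given database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    kind: str                                # "fixed" or "embedded" (hypothetical labels)
    duration_minutes: Optional[int] = None
    embedding_medium: Optional[str] = None


@dataclass
class Block:
    label: str
    events: List[Event] = field(default_factory=list)


def fixed_briefly_and_low_melt(blocks, max_minutes=30):
    """Labels of blocks with a short fixation event AND a low-melt embedding event."""
    matches = []
    for block in blocks:
        # Condition 1: some event in the collection is a short fixation.
        short_fix = any(
            e.kind == "fixed"
            and e.duration_minutes is not None
            and e.duration_minutes <= max_minutes
            for e in block.events
        )
        # Condition 2: some (other) event in the same collection is a low-melt embedding.
        low_melt = any(
            e.kind == "embedded"
            and e.embedding_medium is not None
            and "low" in e.embedding_medium.lower()
            for e in block.events
        )
        if short_fix and low_melt:
            matches.append(block.label)
    return matches


blocks = [
    Block("b1", [Event("fixed", duration_minutes=20),
                 Event("embedded", embedding_medium="Low melting point paraffin")]),
    Block("b2", [Event("fixed", duration_minutes=45),
                 Event("embedded", embedding_medium="Low melting point paraffin")]),
    Block("b3", [Event("fixed", duration_minutes=10),
                 Event("embedded", embedding_medium="Paraffin")]),
]
print(fixed_briefly_and_low_melt(blocks))
```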

Hierarchical queries
  - Find all RNA extracts derived from specimens where the tissue fixative was not formalin

[Instance diagram: Block, Slide, RNA, Fixation, Specimen Event]

Specimen fow1 - formalin fixation

Specimen for1 - ethanol fixation

RNA extracts where block not formalin fixed - HQL

    select rna.label, fix.fixationType
    from MolecularSpecimen rna, FixedEventParameters fix
      join rna.parentSpecimen slide
      join slide.parentSpecimen block
    where fix member of block.specimenEventCollection
      and fix.fixationType not like '%formalin%'

    RNA specimen label   Block fixed in
    frna1                Ethanol, 70%
    frna2                Ethanol, 70%
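The query above walks exactly two parent links (RNA to slide to block), so the depth of the hierarchy is fixed in the query text. A minimal SQL sketch of the same known-depth walk follows. The `specimen` and `fixation` tables are hypothetical simplifications, the rows are invented, and SQLite's `LIKE` is case-insensitive for ASCII, which may not hold on other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: specimens form a parent/child tree via parent_id.
    CREATE TABLE specimen (id INTEGER PRIMARY KEY, label TEXT, parent_id INTEGER);
    CREATE TABLE fixation (specimen_id INTEGER, fixation_type TEXT);

    INSERT INTO specimen VALUES
        (1, 'block1', NULL), (2, 'slide1', 1), (3, 'rna1', 2),
        (4, 'block2', NULL), (5, 'slide2', 4), (6, 'rna2', 5);
    INSERT INTO fixation VALUES (1, 'Formalin'), (4, 'Ethanol, 70%');
    """
)

# Two self-joins hard-code the depth: rna -> slide -> block.
rna_not_formalin = conn.execute(
    """
    SELECT rna.label, fix.fixation_type
    FROM specimen rna
      JOIN specimen slide ON rna.parent_id = slide.id
      JOIN specimen block ON slide.parent_id = block.id
      JOIN fixation fix   ON fix.specimen_id = block.id
    WHERE fix.fixation_type NOT LIKE '%formalin%'
    """
).fetchall()
print(rna_not_formalin)
```

If an RNA extract were derived directly from a block, or through an extra intermediate specimen, this query would miss it, which is the sense in which the depth has to be known in advance.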

Temporal queries
  - Find all specimens collected from participants older than 70
  - Find all specimens that were thawed for more than 20 minutes

Specimens where age at collection > 50

Specimens where age at collection > 50 : SQL

    SELECT c.birth_date,
           s.specimen_class,
           e.event_timestamp collection_time,
           datediff(e.event_timestamp, c.birth_date) "age in days at collection",
           e.event_timestamp - c.birth_date
    FROM catissue_participant c,
         catissue_coll_prot_reg r,
         catissue_specimen_coll_group g,
         catissue_specimen s,
         catissue_specimen_event_param e,
         catissue_coll_event_param ce
    WHERE c.birth_date IS NOT NULL
      AND r.identifier = g.collection_protocol_reg_id
      AND r.participant_id = c.identifier
      AND s.specimen_collection_group_id = g.identifier
      AND e.specimen_id = s.identifier
      AND ce.identifier = e.identifier
      AND e.event_timestamp > date_add(c.birth_date, INTERVAL 50 YEAR)

Specimens where age at collection > 50 : HQL

    select p, e
    from Participant p
      join p.collectionProtocolRegistrationCollection r
      join r.specimenCollectionGroupCollection g
      join g.specimenCollection s
      join s.specimenEventCollection e
    where e.timestamp - p.birthDate > (:ageinyears*365*24*60*60*1000)

HQL + database specific SQL

    select p, e
    from Participant p
      join p.collectionProtocolRegistrationCollection r
      join r.specimenCollectionGroupCollection g
      join g.specimenCollection s
      join s.specimenEventCollection e
    where (datediff(e.timestamp, p.birthDate) > (:ageinyears*365))
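Both HQL variants above approximate a year as 365 days, whereas the SQL version adds a calendar interval to the birth date. The two can disagree near a birthday because of leap days. The sketch below illustrates the difference; the function names are hypothetical, and the 29 February handling is an assumption chosen to fall back to 28 February.

```python
from datetime import date


def age_exceeds_approx(birth, collected, years):
    """Day-count test in the style of the HQL: every year counted as 365 days."""
    return (collected - birth).days > years * 365


def age_exceeds_calendar(birth, collected, years):
    """Calendar test: the collection date falls after the Nth birthday."""
    try:
        nth_birthday = birth.replace(year=birth.year + years)
    except ValueError:
        # 29 February birthday and the target year is not a leap year.
        nth_birthday = birth.replace(year=birth.year + years, day=28)
    return collected > nth_birthday


birth = date(1950, 1, 1)
just_before_50th = date(1999, 12, 25)   # one week before the 50th birthday

print(age_exceeds_approx(birth, just_before_50th, 50))
print(age_exceeds_calendar(birth, just_before_50th, 50))
```

For this participant, the twelve leap days between 1950 and 1999 push the day count past 50 × 365 a few days before the actual 50th birthday, so the approximate test reports "older than 50" while the calendar test does not.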

Returning data from more than one object
  - Illustrated in many of the examples above

Other issues
  - Security
  - Instance level authorization

Summary of Query Capabilities

Columns: Layer | Language | Aggregate | Temporal | Attributes from Multiple Objects | Multiple Queries on Associations | Hierarchical Queries

Raw DB | SQL | Yes | Yes (Design Dependent) | Yes | Yes (with known depth)
ORM | HQL | Yes | Sort-of (DB dependent SQL) | Yes | Yes (with known depth)
Silver Application Service | HQL | ""
Silver Application Service | query by example | No | Yes | No | Yes (with known object model)
Gold | CQL (caGrid 1.0+) | Some | No | Yes (with groups) | ""
Gold | CQL 2 (caGrid 2.0+) | More? | No | ""

CQL 2 - New Query Capabilities
  - Design addresses use cases from multiple sources
      - TBPT
      - IVI
      - caGrid Users List
  - CQL and DCQL
      - CQL for single data service queries
      - DCQL for federated queries
  - CQL 2.0
      - In development now
      - Targeted for caGrid 2.0

CQL 2 – Association Population
  - "Find Studies matching some criteria, and return them with associated Patients who participated, but not the Investigator of the study"
  - Population of associated objects
      - CQL 1.0 can only return targeted data types
      - CQL 2.0 allows population of associated types by name or depth
      - Recursive definition to populate associations of associations
  - Simplifies association retrieval
      - CQL 1.0 required multiple queries based on identifier attributes
      - Some associations impossible to resolve without bidirectional association definition
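The idea of populating associations "by name or depth" with a recursive definition can be sketched over a toy object graph. Everything here is hypothetical: the dictionary layout, the `populate` function, and its parameters are illustration only and say nothing about how CQL 2 is actually specified or implemented.

```python
def populate(obj, names=None, depth=0):
    """Return obj's attributes plus selected associations, filled in recursively.

    names: dict of association name -> nested names dict (or None for a leaf).
    depth: number of association hops to follow when no names are given.
    """
    result = dict(obj["attrs"])
    if names is not None:
        wanted = names                          # by name: follow only what was asked for
    elif depth > 0:
        wanted = dict.fromkeys(obj["assoc"])    # by depth: follow every association
    else:
        wanted = {}
    for name, nested in wanted.items():
        next_depth = 0 if names is not None else depth - 1
        result[name] = [
            populate(target, nested, next_depth)
            for target in obj["assoc"].get(name, [])
        ]
    return result


patient = {"attrs": {"id": "P1"}, "assoc": {}}
investigator = {"attrs": {"name": "Dr. A"}, "assoc": {}}
study = {
    "attrs": {"title": "S1"},
    "assoc": {"patients": [patient], "investigator": [investigator]},
}

# By name: Patients are populated, the Investigator is left out.
by_name = populate(study, names={"patients": None})
# By depth: every association one hop away is populated.
by_depth = populate(study, depth=1)
print(by_name)
print(by_depth)
```

The by-name call corresponds to the quoted use case: studies come back with their patients attached but without the investigator.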

CQL 2 – Typed Attributes
  - Typed Attribute Values
      - Avoid confusion when passing typed data in query
      - Date, Boolean, etc. conform to XML base data types
  - Binary and Unary attributes
      - Binary attributes have name, value, and predicate
          - Equal, not equal, like, less than, greater than, less or equal, greater or equal
      - Unary attributes have only name and predicate
          - Is null, is not null
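The split between binary attributes (name, value, predicate) and unary attributes (name, predicate) can be illustrated with a small evaluator. This is a sketch under assumptions: the predicate spellings, the `matches` function, and the rule that a missing value never satisfies a binary predicate are all hypothetical and not taken from the CQL 2 design.

```python
import operator
import re
from datetime import date


def _like(value, pattern):
    """SQL-style LIKE where % matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Binary predicates compare the attribute against a supplied, typed value.
BINARY_PREDICATES = {
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "LIKE": _like,
    "LESS_THAN": operator.lt,
    "GREATER_THAN": operator.gt,
    "LESS_OR_EQUAL": operator.le,
    "GREATER_OR_EQUAL": operator.ge,
}

# Unary predicates need only the attribute itself.
UNARY_PREDICATES = {
    "IS_NULL": lambda v: v is None,
    "IS_NOT_NULL": lambda v: v is not None,
}


def matches(obj, name, predicate, value=None):
    actual = obj.get(name)
    if predicate in UNARY_PREDICATES:
        return UNARY_PREDICATES[predicate](actual)
    if actual is None:
        return False  # assumption: a missing value never satisfies a binary predicate
    return BINARY_PREDICATES[predicate](actual, value)


specimen = {"label": "frna1", "collected": date(2007, 10, 10), "comment": None}
print(matches(specimen, "collected", "LESS_THAN", date(2008, 1, 29)))
print(matches(specimen, "label", "LIKE", "frna%"))
print(matches(specimen, "comment", "IS_NULL"))
```

Passing a real `date` rather than a string is the point of typed values: a comparison on text such as "10/10/2007" versus "1/29/2008" would order the two dates incorrectly.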

CQL 2 – Query Modifiers
  - Query modifier restricts results
  - Named attributes
      - A list of named attributes of the target data type may be returned
  - Distinct Attribute
      - Single named attribute with distinct values
  - Aggregations
      - Min, max, and count of a named attribute
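The three kinds of modifier can be mimicked over a list of result records to show what each one returns. This is a minimal sketch: the function names and the record layout are hypothetical, and ignoring null values in aggregations is an assumption rather than documented CQL 2 behaviour.

```python
def named_attributes(results, names):
    """Return only the listed attributes of each result, as tuples."""
    return [tuple(record[n] for n in names) for record in results]


def distinct_attribute(results, name):
    """Return the distinct values of one attribute, in first-seen order."""
    return list(dict.fromkeys(record[name] for record in results))


def aggregate(results, name, operation):
    """Return the min, max, or count of one attribute (nulls ignored, by assumption)."""
    values = [record[name] for record in results if record[name] is not None]
    if operation == "count":
        return len(values)
    if operation == "min":
        return min(values)
    if operation == "max":
        return max(values)
    raise ValueError(f"unsupported aggregation: {operation}")


specimens = [
    {"label": "s1", "type": "Plasma", "quantity": 2.0},
    {"label": "s2", "type": "Plasma", "quantity": 0.5},
    {"label": "s3", "type": "Frozen Tissue", "quantity": None},
]
print(named_attributes(specimens, ["label", "type"]))
print(distinct_attribute(specimens, "type"))
print(aggregate(specimens, "quantity", "max"))
```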

Federated Query Today
  - Distributed aggregations
      - Broadcast queries to multiple services
      - Identical data types on each service
  - Distributed joins
      - Disparate data types on each service
      - Potentially disparate data models
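A distributed aggregation amounts to sending the same query to every service that exposes the same data type and then combining the answers. The sketch below models each data service as a plain list of records. The service names, the record layout, and `broadcast_count` are hypothetical, and no real caGrid client call is shown.

```python
def broadcast_count(services, predicate):
    """Send the same filter to every service and combine the per-service counts."""
    per_service = {
        name: sum(1 for record in records if predicate(record))
        for name, records in services.items()
    }
    return per_service, sum(per_service.values())


# Each entry stands in for a remote data service exposing the same data type.
services = {
    "site-a": [{"type": "Plasma"}, {"type": "Plasma"}, {"type": "Frozen Tissue"}],
    "site-b": [{"type": "Plasma"}],
    "site-c": [],
}

per_service, total = broadcast_count(services, lambda r: r["type"] == "Plasma")
print(per_service, total)
```

This only works because every service holds the same data type; once the types differ between services, the problem becomes a distributed join.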

Federated Query Language (DCQL)
  - DCQL derives from CQL
      - Hierarchical approach
      - Recursively defined
  - Data Model drives the query
      - Underlying data service domain models used
  - CQL is context dependent
      - Depends on data model of target service
  - DCQL identifies context
      - Identifies service each part of the query targets

Federated Query Language (DCQL)
  - Foreign Association
      - Describes a relationship between the containing object and an object on another data service
      - Processing results in a new CQL subquery directed to the target data service
  - Join Condition
      - Specifies attribute relationship
      - Local and remote attribute names
      - Predicate (EQUAL, LESS_THAN, etc)
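One way to picture the processing of a foreign association is as a loop: for each local object, take the value of its local join attribute and issue a subquery to the target service on the remote attribute. The sketch below does this with in-memory lists. All names are hypothetical, only the EQUAL predicate is modelled, and how caGrid actually schedules or batches these subqueries is not stated in the slides.

```python
import operator

# Only EQUAL is modelled; other predicates would need an agreed operand order.
PREDICATES = {"EQUAL": operator.eq}


def query_service(records, attribute, predicate, value):
    """Stand-in for sending a CQL subquery to a remote data service."""
    compare = PREDICATES[predicate]
    return [r for r in records if compare(r[attribute], value)]


def resolve_foreign_association(local_objects, local_attr,
                                remote_records, remote_attr, predicate="EQUAL"):
    """Pair each local object with the remote objects satisfying the join condition."""
    joined = []
    for obj in local_objects:
        # One subquery per local object, parameterised by its join attribute value.
        hits = query_service(remote_records, remote_attr, predicate, obj[local_attr])
        joined.extend((obj, hit) for hit in hits)
    return joined


# Local service: specimens. Remote service: images keyed by a shared patient id.
specimens = [{"label": "s1", "patient_id": "P1"}, {"label": "s2", "patient_id": "P2"}]
images = [{"image": "i1", "subject": "P1"}, {"image": "i2", "subject": "P1"}]

pairs = resolve_foreign_association(specimens, "patient_id", images, "subject")
print([(s["label"], i["image"]) for s, i in pairs])
```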

DCQL 2
  - DCQL builds on and extends CQL
      - DCQL 2 will follow the same pattern
  - Potential additional functionality for DCQL 2
      - Incorporation of new features of CQL 2
      - Returning data from multiple data types at top level
          - Described via some join
  - Use cases
      - From TBPT
      - From IVI

    select rna.label, fix.fixationType
    from MolecularSpecimen rna, FixedEventParameters fix
      join rna.parentSpecimen slide
      join slide.parentSpecimen block
    where fix member of block.specimenEventCollection
      and fix.fixationType not like '%formalin%'