1
December 2006
Federated Query
Ian Fore, NCI CBIIT; David Ervin, Ohio State University
Arch / VCDE Face-to-Face Meeting, Salt Lake City, UT, January 29, 2008
2
Data Services Overview
Part 1: Introduction (30 min)
- Overview
- CQL
- Current query capabilities provided by caGrid
- caTissue object model
- Example caTissue queries
Part 2: Team Assignments (2 hours)
- Develop use cases for federated queries on caGrid
- Detail specific queries to be performed over specific services
- Categorize these use cases as:
  - Queries that can currently be performed on caGrid
  - Queries that are currently difficult to perform on caGrid
  - Queries that cannot currently be performed on caGrid
Part 3: Team Briefings and Summary (30 min)
3
Query types
- Aggregate queries (count)
- Multiple queries on associations
- Hierarchical queries
- Temporal queries
- Returning data from more than one object
Most of these are problematic in Silver- and caGrid-compatible systems.
4
Query Layers (layer: query language)
- Database: SQL
- Object Relational Mapping: Hibernate (HQL)
- caCORE-like API: query by example, Hibernate (HQL)
- caGrid service: CQL/DCQL
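Query by example, listed for the caCORE-like API layer, matches stored objects against a partially filled-in template object. The Python sketch below illustrates only that idea, under stated assumptions: `Specimen` and `query_by_example` are hypothetical names, not the caCORE API, and matching is plain equality on every non-null field of the template.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class Specimen:
    # Hypothetical, simplified stand-in for a caTissue Specimen (not the real class).
    label: Optional[str] = None
    specimen_class: Optional[str] = None
    specimen_type: Optional[str] = None

def query_by_example(example: Specimen, candidates: List[Specimen]) -> List[Specimen]:
    """Return the candidates whose fields equal every non-None field of the example."""
    criteria = {f.name: getattr(example, f.name)
                for f in fields(example) if getattr(example, f.name) is not None}
    return [c for c in candidates
            if all(getattr(c, name) == value for name, value in criteria.items())]

store = [
    Specimen("s1", "Fluid", "Plasma"),
    Specimen("s2", "Tissue", "Frozen Tissue"),
    Specimen("s3", "Fluid", "Plasma"),
]
plasma = query_by_example(Specimen(specimen_type="Plasma"), store)
print([s.label for s in plasma])  # ['s1', 's3']
```

Because a template in this sketch can only say "this field equals this value", range conditions (such as a fixation time of 30 minutes or less) and counts cannot be stated with it.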
5
caTissue caCORE-like API examples
- Demo of a caTissue federated query using the Silver-compatible API
- Overview of caGrid queries: CQL and DCQL
6
Sources of information
- caCORE SDK Programmer's Reference Guide
- caCORE 3.2 Technical Guide
8
Aggregate queries (count)
- Using “group by”: list the number of patients registered to, and specimens collected on, each protocol
- Using “having”: find all specimens that have been thawed more than three times
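No query is shown for the “having” case. Here is a sketch that assumes a hypothetical two-table schema (`specimen`, `thaw_event`) rather than the real caTissue tables. It counts thaw events per specimen with GROUP BY and keeps only the groups that pass the HAVING condition, run through Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical minimal schema: one row per thaw event on a specimen.
# Table and column names are illustrative, not the caTissue schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE thaw_event (id INTEGER PRIMARY KEY,
                         specimen_id INTEGER REFERENCES specimen(id),
                         thawed_at TEXT);
INSERT INTO specimen VALUES (1, 'A'), (2, 'B');
INSERT INTO thaw_event (specimen_id, thawed_at) VALUES
  (1, '2007-01-01'), (1, '2007-02-01'), (1, '2007-03-01'), (1, '2007-04-01'),
  (2, '2007-01-15'), (2, '2007-02-15');
""")

# GROUP BY collapses the thaw events per specimen; HAVING then filters on the
# aggregate, which a WHERE clause (applied before grouping) cannot do.
rows = conn.execute("""
SELECT s.label, COUNT(t.id) AS thaw_count
FROM specimen s JOIN thaw_event t ON t.specimen_id = s.id
GROUP BY s.id, s.label
HAVING COUNT(t.id) > 3
""").fetchall()
print(rows)  # [('A', 4)]
```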
9
Count specimens by Protocol and Type
10
Count specimens by Protocol and Type - HQL
select p.title, s.type, count(distinct r), count(s)
from CollectionProtocol p
  join p.registrationCollection r
  join r.specimenCollectionGroupCollection g
  join g.specimenCollection s
group by p.title, s.type

Protocol | Specimen type | No. registered | No. of specimens
Trial one | Plasma | 3 | 5
Woodward Colon Trial | Frozen Tissue | 2 | 3
Woodward Colon Trial | Plasma | 2 | 3
11
Multiple queries on associations
Find specimens that were fixed in formalin for 30 minutes or less and were embedded in low melting point paraffin
12
Fixation and embedding
13
[Instance diagram: Block, Fixation event, Embedded event, Specimen Event]
14
Fixation and embedding - HQL
select block.label
from TissueSpecimen block, FixedEventParameters fix, EmbeddedEventParameters embed
where fix member of block.specimenEventCollection
  and embed member of block.specimenEventCollection
  and fix.fixationType like '%formalin%'
  and fix.durationInMinutes <= 30
  and embed.embeddingMedium like '%Low%'
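The HQL needs two different members of the same `specimenEventCollection`: one fixation event and one embedding event. A single joined event row cannot satisfy both conditions, because no single event is both. The sketch below assumes a hypothetical flattened `specimen_event` table with a `kind` discriminator (not the caTissue schema). It expresses each membership test as its own EXISTS subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (id INTEGER PRIMARY KEY, label TEXT);
-- One table for all events, with a kind column (hypothetical layout).
CREATE TABLE specimen_event (
  id INTEGER PRIMARY KEY, specimen_id INTEGER, kind TEXT,
  fixation_type TEXT, duration_minutes INTEGER, embedding_medium TEXT);
INSERT INTO specimen VALUES (1, 'blk1'), (2, 'blk2'), (3, 'blk3');
INSERT INTO specimen_event
  (specimen_id, kind, fixation_type, duration_minutes, embedding_medium) VALUES
  (1, 'fixed', '10% formalin', 25, NULL),
  (1, 'embedded', NULL, NULL, 'Low melting point paraffin'),
  (2, 'fixed', '10% formalin', 45, NULL),
  (2, 'embedded', NULL, NULL, 'Low melting point paraffin'),
  (3, 'fixed', '10% formalin', 20, NULL);
""")

# Each EXISTS tests a different member of the same event collection, mirroring
# the two "member of" clauses in the HQL.
rows = conn.execute("""
SELECT s.label FROM specimen s
WHERE EXISTS (SELECT 1 FROM specimen_event f
              WHERE f.specimen_id = s.id AND f.kind = 'fixed'
                AND f.fixation_type LIKE '%formalin%'
                AND f.duration_minutes <= 30)
  AND EXISTS (SELECT 1 FROM specimen_event e
              WHERE e.specimen_id = s.id AND e.kind = 'embedded'
                AND e.embedding_medium LIKE '%Low%')
ORDER BY s.label
""").fetchall()
print(rows)  # [('blk1',)]
```

`blk2` fails the 30-minute limit, and `blk3` has no embedding event, so only `blk1` qualifies.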
15
Hierarchical queries
Find all RNA extracts derived from specimens where the tissue fixative was not formalin
17
[Instance diagram: Block, Slide, RNA, Fixation, Specimen Event]
18
Specimen fow1 - formalin fixation
19
Specimen for1 - ethanol fixation
20
RNA extracts where block not formalin fixed - HQL
select rna.label, fix.fixationType
from MolecularSpecimen rna, FixedEventParameters fix
  join rna.parentSpecimen slide
  join slide.parentSpecimen block
where fix member of block.specimenEventCollection
  and fix.fixationType not like '%formalin%'

RNA specimen label | Block fixed in
frna1 | Ethanol, 70%
frna2 | Ethanol, 70%
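The HQL walks a fixed path (RNA to slide to block), so it only finds RNA extracted from a slide cut from a block. The sketch below swaps in a different technique, a recursive common table expression. This lets SQL follow `parent_id` links to any depth. The schema and labels are hypothetical, and SQLite supports `WITH RECURSIVE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen (id INTEGER PRIMARY KEY, label TEXT, kind TEXT,
                       parent_id INTEGER REFERENCES specimen(id));
CREATE TABLE fixation (specimen_id INTEGER, fixation_type TEXT);
-- Hypothetical lineages: block -> slide -> RNA, and block -> RNA directly.
INSERT INTO specimen VALUES
  (1, 'blkA', 'Tissue', NULL), (2, 'sldA', 'Tissue', 1), (3, 'rnaA', 'Molecular', 2),
  (4, 'blkB', 'Tissue', NULL), (5, 'rnaB', 'Molecular', 4),
  (6, 'blkC', 'Tissue', NULL), (7, 'rnaC', 'Molecular', 6);
INSERT INTO fixation VALUES
  (1, 'Ethanol, 70%'), (4, 'Ethanol, 70%'), (6, '10% formalin');
""")

# lineage pairs every RNA specimen with each of its ancestors, at any depth.
rows = conn.execute("""
WITH RECURSIVE lineage(rna_id, ancestor_id) AS (
  SELECT id, parent_id FROM specimen WHERE kind = 'Molecular'
  UNION ALL
  SELECT l.rna_id, s.parent_id
  FROM lineage l JOIN specimen s ON s.id = l.ancestor_id
  WHERE s.parent_id IS NOT NULL
)
SELECT r.label, f.fixation_type
FROM lineage l
JOIN specimen r ON r.id = l.rna_id
JOIN fixation f ON f.specimen_id = l.ancestor_id
WHERE f.fixation_type NOT LIKE '%formalin%'
ORDER BY r.label
""").fetchall()
print(rows)  # [('rnaA', 'Ethanol, 70%'), ('rnaB', 'Ethanol, 70%')]
```

`rnaB` was extracted directly from its block, so the fixed two-level HQL path would miss it. `rnaC` is excluded because its block was formalin fixed.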
21
Temporal queries
- Find all specimens collected from participants older than 70
- Find all specimens that were thawed for more than 20 minutes
22
Specimens where age at collection > 50
23
Specimens where age at collection > 50 : SQL
SELECT c.birth_date, s.specimen_class,
       e.event_timestamp collection_time,
       datediff(e.event_timestamp, c.birth_date) "age in days at collection",
       e.event_timestamp - c.birth_date
FROM catissue_participant c, catissue_coll_prot_reg r,
     catissue_specimen_coll_group g, catissue_specimen s,
     catissue_specimen_event_param e, catissue_coll_event_param ce
WHERE c.birth_date IS NOT NULL
  AND r.identifier = g.collection_protocol_reg_id
  AND r.participant_id = c.identifier
  AND s.specimen_collection_group_id = g.identifier
  AND e.specimen_id = s.identifier
  AND ce.identifier = e.identifier
  -- collected after the participant's 50th birthday (MySQL date_add syntax)
  AND e.event_timestamp > date_add(c.birth_date, INTERVAL 50 YEAR)
24
Specimens where age at collection > 50 : HQL
select p, e
from Participant p
  join p.collectionProtocolRegistrationCollection r
  join r.specimenCollectionGroupCollection g
  join g.specimenCollection s
  join s.specimenEventCollection e
where e.timestamp - p.birthDate > (:ageinyears*365*24*60*60*1000)

select p, e
from Participant p
  join p.collectionProtocolRegistrationCollection r
  join r.specimenCollectionGroupCollection g
  join g.specimenCollection s
  join s.specimenEventCollection e
where datediff(e.timestamp, p.birthDate) > (:ageinyears*365)

HQL + database-specific SQL
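Both HQL queries treat a year as 365 days. Because leap days accumulate, a participant can be counted as older than the threshold before the birthday that actually crosses it. The Python sketch below contrasts that day-count approximation with a calendar-aware age for a participant born 1950-03-01 whose specimen was collected on 2000-02-29. The function names are hypothetical.

```python
from datetime import date

def age_in_years(birth: date, at: date) -> int:
    """Completed years between birth and at, using calendar dates."""
    # Subtract one if the birthday has not yet occurred in the year of `at`.
    before_birthday = (at.month, at.day) < (birth.month, birth.day)
    return at.year - birth.year - int(before_birthday)

def approx_over(birth: date, at: date, years: int) -> bool:
    """The approximation used by datediff(...) > years*365."""
    return (at - birth).days > years * 365

birth, collected = date(1950, 3, 1), date(2000, 2, 29)
print(age_in_years(birth, collected))     # 49
print(approx_over(birth, collected, 50))  # True: 18262 days > 18250
```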
25
Returning data from more than one object
Illustrated in several of the examples above, e.g. protocol title with specimen type and counts, and RNA label with block fixation type
26
Other issues
- Security
- Instance-level authorization
27
Summary of Query Capabilities
Capabilities compared: aggregate; temporal; attributes from multiple objects; multiple queries on associations; hierarchical queries. Entries are listed in the order they appear on the slide.
- Raw DB (SQL): Yes; Yes (design dependent); Yes; Yes (with known depth)
- ORM (HQL): Yes; Sort-of (DB-dependent SQL); Yes; Yes (with known depth)
- Silver (Application Service HQL): same as ORM (HQL)
- Silver (Application Service query by example): No; Yes; No; Yes (with known object model)
- Gold (CQL, caGrid 1.0+): Some; No; Yes (with groups); ditto (as in the row above)
- Gold (CQL 2, caGrid 2.0+): More?; No; ditto (as in the row above)
28
CQL 2 - New Query Capabilities
- Design addresses use cases from multiple sources: TBPT, IVI, caGrid Users List
- CQL and DCQL
  - CQL for single data service queries
  - DCQL for federated queries
- CQL 2.0
  - In development now
  - Targeted for caGrid 2.0
29
CQL 2 – Association Population
“Find Studies matching some criteria, and return them with associated Patients who participated, but not the Investigator of the study”
- Population of associated objects
  - CQL 1.0 can only return targeted data types
  - CQL 2.0 allows population of associated types by name or depth
  - Recursive definition to populate associations of associations
- Simplifies association retrieval
  - CQL 1.0 required multiple queries based on identifier attributes
  - Some associations are impossible to resolve without a bidirectional association definition
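The two population modes, by name and by depth, can be illustrated on an in-memory object graph. This is a sketch only: the dict-based graph, the `_assoc` key, and the function names are assumptions for illustration and do not reflect the CQL 2 schema.

```python
from typing import Any, Dict

# Hypothetical object graph: attributes are plain keys, and the "_assoc" key
# maps association role names to lists of related objects.
Obj = Dict[str, Any]

def attributes_only(obj: Obj) -> Obj:
    return {k: v for k, v in obj.items() if k != "_assoc"}

def populate_by_name(obj: Obj, spec: Dict[str, dict]) -> Obj:
    """Populate only the named associations; spec nests recursively.

    {"patients": {"specimens": {}}} fills in patients and, for each patient,
    that patient's specimens.
    """
    result = attributes_only(obj)
    for role, child_spec in spec.items():
        related = obj.get("_assoc", {}).get(role, [])
        result[role] = [populate_by_name(child, child_spec) for child in related]
    return result

def populate_by_depth(obj: Obj, depth: int) -> Obj:
    """Populate every association, stopping after `depth` levels."""
    result = attributes_only(obj)
    if depth > 0:
        for role, related in obj.get("_assoc", {}).items():
            result[role] = [populate_by_depth(child, depth - 1) for child in related]
    return result

study = {
    "title": "Study 1",
    "_assoc": {
        "patients": [{"id": "p1"}, {"id": "p2"}],
        "investigator": [{"name": "Investigator A"}],
    },
}
# The slide's example: Patients are returned, the Investigator is not.
print(populate_by_name(study, {"patients": {}}))
# {'title': 'Study 1', 'patients': [{'id': 'p1'}, {'id': 'p2'}]}
```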
30
CQL 2 – Typed Attributes
- Typed attribute values
  - Avoid confusion when passing typed data in a query
  - Date, Boolean, etc. conform to XML Schema base data types
- Binary and unary attributes
  - Binary attributes have a name, value, and predicate: equal, not equal, like, less than, greater than, less than or equal, greater than or equal
  - Unary attributes have only a name and predicate: is null, is not null
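The split between binary and unary attributes can be sketched as two small data types plus an evaluator over a record. The class names, enum values, and the SQL-style LIKE translation below are hypothetical illustrations of the idea, not the CQL 2 schema. The key point is that the value carries its real type (here a `date`), so comparisons follow that type.

```python
import operator
import re
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Any, Dict, Union

class BinaryPredicate(Enum):
    EQUAL = "equal"
    NOT_EQUAL = "not equal"
    LIKE = "like"
    LESS_THAN = "less than"
    GREATER_THAN = "greater than"
    LESS_OR_EQUAL = "less or equal"
    GREATER_OR_EQUAL = "greater or equal"

class UnaryPredicate(Enum):
    IS_NULL = "is null"
    IS_NOT_NULL = "is not null"

@dataclass
class BinaryAttribute:
    name: str
    predicate: BinaryPredicate
    value: Union[bool, int, float, str, date]  # a typed value, not its string form

@dataclass
class UnaryAttribute:
    name: str
    predicate: UnaryPredicate

def sql_like(actual: str, pattern: str) -> bool:
    """SQL-style LIKE: % matches any run of characters, _ matches one."""
    regex = "".join(".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
                    for ch in pattern)
    return re.fullmatch(regex, actual, flags=re.DOTALL) is not None

OPS = {
    BinaryPredicate.EQUAL: operator.eq,
    BinaryPredicate.NOT_EQUAL: operator.ne,
    BinaryPredicate.LIKE: sql_like,
    BinaryPredicate.LESS_THAN: operator.lt,
    BinaryPredicate.GREATER_THAN: operator.gt,
    BinaryPredicate.LESS_OR_EQUAL: operator.le,
    BinaryPredicate.GREATER_OR_EQUAL: operator.ge,
}

def matches(record: Dict[str, Any], attr: Union[BinaryAttribute, UnaryAttribute]) -> bool:
    actual = record.get(attr.name)
    if isinstance(attr, UnaryAttribute):
        # Unary predicates only test whether a value is present.
        is_null = actual is None
        return is_null if attr.predicate is UnaryPredicate.IS_NULL else not is_null
    if actual is None:
        return False  # as in SQL, a null never satisfies a binary comparison
    return OPS[attr.predicate](actual, attr.value)

participant = {"lastName": "Smith", "birthDate": date(1950, 3, 1), "ethnicity": None}
born_before_1960 = BinaryAttribute("birthDate", BinaryPredicate.LESS_THAN, date(1960, 1, 1))
print(matches(participant, born_before_1960))  # True
print(matches(participant, UnaryAttribute("ethnicity", UnaryPredicate.IS_NULL)))  # True
```

Passing the same dates as strings would compare them lexically, so '10/1/1950' would sort before '9/1/1950'. This is the kind of confusion that typed values avoid.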
31
CQL 2 – Query Modifiers
A query modifier restricts what is returned:
- Named attributes: a list of named attributes of the target data type may be returned
- Distinct attribute: a single named attribute, returning only its distinct values
- Aggregations: min, max, and count of a named attribute
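The three modifiers can be sketched as post-processing steps over the target objects a query matched. This is a minimal illustration with hypothetical function names. It assumes, as SQL's COUNT(column) does, that nulls are skipped by the aggregations.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

def named_attributes(results: List[Record], names: List[str]) -> List[Record]:
    """Return only the listed attributes of each result object."""
    return [{name: r.get(name) for name in names} for r in results]

def distinct_attribute(results: List[Record], name: str) -> List[Any]:
    """Return the distinct values of one attribute, in first-seen order."""
    return list(dict.fromkeys(r.get(name) for r in results))

def aggregate(results: List[Record], name: str, function: str) -> Optional[Any]:
    """min, max, or count over the non-null values of one attribute."""
    values = [r[name] for r in results if r.get(name) is not None]
    if function == "count":
        return len(values)
    if function not in ("min", "max"):
        raise ValueError(f"unsupported aggregation: {function}")
    if not values:
        return None
    return min(values) if function == "min" else max(values)

specimens = [
    {"label": "s1", "type": "Plasma", "quantity": 2.0},
    {"label": "s2", "type": "Frozen Tissue", "quantity": 0.5},
    {"label": "s3", "type": "Plasma", "quantity": None},
]
print(named_attributes(specimens, ["label"]))     # [{'label': 's1'}, {'label': 's2'}, {'label': 's3'}]
print(distinct_attribute(specimens, "type"))      # ['Plasma', 'Frozen Tissue']
print(aggregate(specimens, "quantity", "count"))  # 2
print(aggregate(specimens, "quantity", "max"))    # 2.0
```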
32
Federated Query Today
- Distributed aggregations
  - Broadcast queries to multiple services
  - Identical data types on each service
- Distributed joins
  - Disparate data types on each service
  - Potentially disparate data models
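A distributed aggregation sends one query to several services that expose the same data type, then merges the results. The sketch below shows only that control flow. The `Service` callable interface and the stub sites are hypothetical stand-ins for real caGrid data services.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# Hypothetical service interface: takes a query, returns result objects.
Service = Callable[[str], List[Record]]

def broadcast(query: str, services: Dict[str, Service]) -> List[Record]:
    """Send the same query to every service in parallel and merge the results.

    Each result is tagged with the name of the service that returned it.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(service, query) for name, service in services.items()}
        merged: List[Record] = []
        for name, future in futures.items():
            # result() waits for the call and re-raises any exception it raised.
            merged.extend({**record, "_service": name} for record in future.result())
    return merged

# Stub services standing in for two sites that expose the same data type.
def site_a(query: str) -> List[Record]:
    return [{"label": "a-1"}, {"label": "a-2"}]

def site_b(query: str) -> List[Record]:
    return [{"label": "b-1"}]

results = broadcast("<query over Specimen>", {"siteA": site_a, "siteB": site_b})
print([r["label"] for r in results])  # ['a-1', 'a-2', 'b-1']
```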
33
Federated Query Language (DCQL)
- DCQL derives from CQL
  - Hierarchical approach
  - Recursively defined
- The data model drives the query
  - Underlying data service domain models are used
- CQL is context dependent
  - Depends on the data model of the target service
- DCQL identifies context
  - Identifies the service that each part of the query targets
34
Federated Query Language (DCQL)
- Foreign association
  - Describes a relationship between the containing object and an object on another data service
  - Processing results in a new CQL subquery directed to the target data service
- Join condition
  - Specifies the attribute relationship
  - Local and remote attribute names
  - Predicate (EQUAL, LESS_THAN, etc.)
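One way to process a foreign association follows the bullets above. First, run the subquery on the target service. Then use the join condition to turn the returned attribute values into a restriction on the local query. The sketch below models queries as Python filter functions. `JoinCondition`, the stub services, and the `mrn` attribute are hypothetical, and this is not the DCQL engine's actual implementation.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Filter = Callable[[Record], bool]

# Each predicate is applied as compare(local_value, remote_value).
PREDICATES = {
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "LESS_THAN": operator.lt,
    "GREATER_THAN": operator.gt,
}

@dataclass
class JoinCondition:
    local_attribute: str    # attribute on the containing (local) object
    foreign_attribute: str  # attribute on the object from the other service
    predicate: str = "EQUAL"

def resolve_foreign_association(remote_service: Callable[[Filter], List[Record]],
                                remote_filter: Filter,
                                join: JoinCondition) -> Filter:
    """Run the foreign subquery first, then turn its results into a local filter."""
    remote_values = [r[join.foreign_attribute] for r in remote_service(remote_filter)]
    compare = PREDICATES[join.predicate]
    return lambda rec: any(compare(rec.get(join.local_attribute), v) for v in remote_values)

# Stub data for two hypothetical services.
PATIENTS = [{"mrn": "100", "age": 72}, {"mrn": "200", "age": 45}]
SPECIMENS = [{"label": "s1", "patient_mrn": "100"}, {"label": "s2", "patient_mrn": "200"}]

def patient_service(flt: Filter) -> List[Record]:
    return [p for p in PATIENTS if flt(p)]

def specimen_service(flt: Filter) -> List[Record]:
    return [s for s in SPECIMENS if flt(s)]

local_filter = resolve_foreign_association(
    patient_service, lambda p: p["age"] > 70, JoinCondition("patient_mrn", "mrn"))
print([s["label"] for s in specimen_service(local_filter)])  # ['s1']
```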
35
DCQL 2
- DCQL builds on and extends CQL; DCQL 2 will follow the same pattern
- Potential additional functionality for DCQL 2
  - Incorporation of new features of CQL 2
  - Returning data from multiple data types at the top level, described via some join
- Use cases
  - From TBPT
  - From IVI
Example (returns attributes of two data types at the top level):
select rna.label, fix.fixationType
from MolecularSpecimen rna, FixedEventParameters fix
  join rna.parentSpecimen slide
  join slide.parentSpecimen block
where fix member of block.specimenEventCollection
  and fix.fixationType not like '%formalin%'