ICR ASBP Analysis Service Parameters Proposal Draft proposal In alphabetical order… Brian Davis, Kiran Keshav, Ted Liefeld, Curt Lockshin, Patrick McConnell.


1 ICR ASBP Analysis Service Parameters Proposal Draft proposal In alphabetical order… Brian Davis, Kiran Keshav, Ted Liefeld, Curt Lockshin, Patrick McConnell, Martin Morgan, Sal Mungal, Jared Nedzel, Baris Suzek, Claire Wolfe. Nov 8, 2007

2 Analytic Services are DIFFERENT from Data Services (1)
Differences
Data Services, e.g. caArray
  - Long lifetimes: remain useful for many years; may be extended/grow, but seldom disappear
  - Grow/change slowly
  - Few in number: 10s-100s of services
Analytic Services, e.g. Hierarchical Clustering
  - Short(er) lifetimes: replaced by newer algorithms or variants frequently, e.g. BLAST has 13 variants at http://www.ncbi.nlm.nih.gov/BLAST/
  - Change often: some GenePattern algorithms have had >10 updates; parameters added/removed, implementations improved
  - Many in number: GenePattern + Bioconductor + geWorkbench have >400 between them

3 Analysis Services are DIFFERENT from Data Services (2)
Class registration of input and output for caGrid supports (relatively stable) data services
  - Data models have long lives
  - Overhead of registration is small compared to service implementation
  - Registered classes remain valid for a long period
  - Geared towards supporting new services, starting from a new data model to be put on the grid
Analytic services: more of them, more variable, shorter lifespan
  - Overhead of class registration is a significant portion of development effort
  - (Many) analytic services are preexisting: GenePattern + Bioconductor + geWorkbench have >400, ~9 on caGrid
  - Developers must ‘go back’ to re-model the service parameters in the caBIG way
  - Parameters change often; each version may have different parameters
Conclusion: need to modify the registration process in caBIG to get more analytic services on caGrid

4 Process for analytic services
  - Model reused classes (Solution: Service Loader)
  - Modeling parameters
  - SIW Roundtrip partially working for reused model (Solution: bug fixes in SIW 3.2.1 + Service Loader)
  - Re-annotating reused classes (Solution: Service Loader)
  - Annotation of parameter classes
  - Reloading re-used classes (Solution: Service Loader)
  - Loading parameter classes
Outstanding issues in RED

5 caGrid and analytical services: steps in the Introduce toolkit
Steps
  - Import datatypes
  - Create operations
  - Add Service Metadata and Domain Model
  - Create Skeleton / Implement Methods
Artifacts and repositories shown: XSD File, XML File, caDSR, GME
Issues
  - Loading classes to caDSR/schemas to GME before development
  - XSD generation - Wrong XSD using EA and caCORE SDK not used for Analytical Services
  - Redefinition of interfaces/operations modeled in EA
  - Annotation and tagging (CDE id) of parameter classes in EA (needs caDSR load)
Outstanding issues in RED

6 Issues and Solutions: 1) Use specialized “service loader” and improvement in “Roundtrip”
Issue 1 - Model Reuse
  - Significant time investment to reuse models
  - Hard to include in UML
  - Round trip did not work well
  - Required re-annotation, re-generation of XSDs
Solution 1 - Use New “Service Loader” to Register Reuse of Models
  - Significantly reduce registration time (~2 developer FTE weeks) to register models reusing other models
  - Replace with Service Loader based process
    - Re-used models not included in UML unless modified/extended
    - Register model re-use in Introduce; recorded in service metadata
    - Service metadata submitted to Service Loader to record use in caDSR
  - Prevents partially re-used model mismatch problems (e.g. GP/caArray/caB2B)
NOTE: Still need to “test drive” the new process to ensure it works
  - Created Demo Service to test using the Service Loader process for model reuse
  - Used as an example for new Analysis Service developers
  - Used to provide scaffolding for developing a white paper describing how to create analysis services

7 Exploring Further Solutions & Additional Time Savings: Parameters
Issue 2 - Modeling and registration of parameters
  - Parameters change frequently, requiring model changes, re-annotation, and loading into caDSR (3 months?)
  - Parameters, unlike input and output data classes, are not intended for semantic interoperability or reusability
    - Parameters are not semantically rich or meaningfully annotatable
    - Parameters are meaningful only within the context of the service
Solution 2 - Treat parameters differently
  - Time savings estimated at ~1 developer FTE week of effort per service over 2-3 calendar months
  - Additional curator FTE savings due to reduced model loading workload

8 Proposal - Generic Parameter Passing Model Use a generic parameter model to pass parameters to the services - Reuse model and allow Service Loader to register our model reuse - This model registered once, reused often Simple reusable metadata model facilitates auto-generation of Parameter metadata & service implementation
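
A minimal sketch of what a generic parameter passing model could look like, written in Python for brevity. The names `Parameter`, `ParameterSet` and `execute_analysis` are hypothetical illustrations and are not the draft model from the proposal; the parameter names are borrowed from the example on slide 16, and the accession value is a placeholder. The point is only that one name/value structure, registered once, can carry the parameters of any analytic service.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Parameter:
    """One name/value pair; values travel as strings (hypothetical design)."""
    name: str
    value: str


@dataclass
class ParameterSet:
    """Generic container reused by every service instead of per-service classes."""
    parameters: list = field(default_factory=list)

    def add(self, name, value):
        # Store every value as a string so the model stays service-independent.
        self.parameters.append(Parameter(name, str(value)))
        return self

    def get(self, name, default=None):
        for p in self.parameters:
            if p.name == name:
                return p.value
        return default


def execute_analysis(params):
    """Stand-in for a service operation that looks up what it needs by name."""
    k = int(params.get("num.neighbors", "50"))
    return f"finding {k} neighbors of {params.get('gene.accession')}"


params = ParameterSet().add("gene.accession", "EXAMPLE_ACCESSION").add("num.neighbors", 10)
print(execute_analysis(params))
```

Because the service reads parameters by name, adding or removing a parameter in a new version of an algorithm changes the service code and its metadata but not the registered model.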

9 Proposal 2- Generic Parameters Metadata Model Extend caGrid Service Metadata (already supported) with Parameter metadata Model (as discussed with caGrid Team) All metadata is handled at caGrid level Draft model:

10 Exploring Further Solutions & Additional Time Savings: Parameters
Issue 2 - Modeling and registration of parameters
  - Parameters change frequently, requiring model changes, re-annotation, and loading into caDSR (3 months?)
  - Parameters, unlike input and output data classes, are not intended for semantic interoperability or reusability
    - Parameters are not semantically rich or meaningfully annotatable
    - Parameters are meaningful only within the context of the service
Solution 2 - Treat parameters differently
  - Time savings estimated at ~1 developer FTE week of effort per service over 2-3 calendar months
  - Additional curator FTE savings due to reduced model loading workload
  - Generic Parameter Passing Model reused in Domain Model
  - Generic Parameter Metadata Model to be in service metadata ONLY
    - Enhanced service metadata to define parameters
    - Parameters are NOT registered as CDEs in caDSR (not semantically annotated)
    - The parameters are found in the index service

11 Pros and Cons
Pros:
  - SAVE TIME (~1 developer FTE week per service)
  - More analytic services on caGrid/available to caBIG
  - Actual parameters and descriptions of parameters are still available at Grid level
  - No caDSR/GME registration dependency (if all classes are reused)
Cons:
  - No parameter re-use
  - No concept-based discovery of services
  - No semantic interoperability based on parameters (is this likely, anyway?)
  - No CDEs for parameters
  - A different place to look for parameter metadata (not caDSR)
  - Proposed model is not appropriate for non-caGrid services
    - Could be adapted to support non-caGrid silver services

12 Time Savings from Adoption of Proposals
Time to semantically annotate and grid-enable an Analytic Service
Option | Total Calendar Time | Total Developer Time | Comments
No Change to Process | 6-9 Months | 5 FTE Weeks (200 hours) | Estimated time it took for Reference Implementations in 2006-7 (GenePattern, geWorkbench, Bioconductor)
Adoption of Analytic Service Loader Proposal | 3.5 Months | 3 FTE Weeks (120 hours) | Parameters still need to be registered in domain model
Adoption of Analytic Service Loader AND Parameter Proposal | 1.5 Months | 2 FTE Weeks (80 hours) | Time for using service loader and using generic model proposal
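
The developer-time figures on this slide can be checked with a few lines of arithmetic. The sketch below only restates the slide's own numbers; the 40-hour FTE week is implied by "5 FTE Weeks (200 hours)", and the dictionary keys are shorthand labels introduced here.

```python
HOURS_PER_FTE_WEEK = 40  # implied by the slide: 5 FTE weeks = 200 hours

# Total developer time per service, in hours, as given on the slide.
developer_hours = {
    "no change": 200,
    "service loader": 120,
    "service loader + generic parameters": 80,
}

baseline = developer_hours["no change"]
for option, hours in developer_hours.items():
    saved = baseline - hours
    print(f"{option}: {hours / HOURS_PER_FTE_WEEK:.0f} FTE weeks, "
          f"saves {saved} h ({saved / baseline:.0%})")
```

The differences match the per-proposal estimates given earlier in the deck: about 2 developer FTE weeks saved by the Service Loader, and about 1 further FTE week saved by the generic parameter proposal.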

13 NEXT STEPS
Suggested Next Steps
  - Modifications to proposal based on input from NCI
  - Create final draft of generic parameters proposal
  - Meeting with caGrid team on extension to Service Metamodel
  - Presentation to Arch and VCDE Workspaces
  - Develop proof of concept services
    - Test drive of Analytic Service Loader
    - Test use of generic parameter services
  - Register generic parameter models in caDSR

14 Appendix: additional slides Extra slides Not in any order

15 ASBP Meeting Logistics
  - Meeting: 2nd Friday of every month
  - Next Meeting: November 9th @ 2:00 pm EST
  - Topics
    - Continued development of demonstration service & white paper
    - Analysis of CGEMs model for demo service (Hrishi)
    - Follow-up about registration of Parameters
  - liefeld@broad.mit.edu

16 Parameter Modeling Comparison
  - Modeling one parameter set (below)
  - Including all caDSR tags & stereotypes
  - Clean up of tags, adding Concept codes
  - Generating extended metadata using new model for GenePattern modules
  - EA -> Schema -> JAXB -> custom Java code
Example parameter metadata for the executeAnalysis operation (values in transcript order; element names not shown):
  - java.lang.String | reference gene accession from data file to find neighbors for | true | gene.accession | 1
  - java.lang.String | 50 | number of neighbors to find | true | num.neighbors | 2
  - …
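
To illustrate how parameter metadata of this kind could be used without any caDSR registration, here is a minimal Python sketch. The `ParameterMetadata` class, its field names and the `resolve` helper are assumptions made for illustration, not the draft metadata model; only the parameter names, type, descriptions and the default of 50 come from the example above, and the accession value is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParameterMetadata:
    """Description of one parameter, kept in service metadata (hypothetical fields)."""
    name: str
    type_name: str
    description: str
    default: Optional[str] = None


# Values taken from the slide's example; the field layout is an assumption.
EXECUTE_ANALYSIS_PARAMS = [
    ParameterMetadata("gene.accession", "java.lang.String",
                      "reference gene accession from data file to find neighbors for"),
    ParameterMetadata("num.neighbors", "java.lang.String",
                      "number of neighbors to find", default="50"),
]


def resolve(metadata, supplied):
    """Check supplied name/value pairs against the metadata and fill in defaults."""
    known = {m.name for m in metadata}
    unknown = sorted(set(supplied) - known)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    resolved = {}
    for m in metadata:
        if m.name in supplied:
            resolved[m.name] = supplied[m.name]
        elif m.default is not None:
            resolved[m.name] = m.default
        else:
            raise ValueError(f"missing parameter with no default: {m.name}")
    return resolved


print(resolve(EXECUTE_ANALYSIS_PARAMS, {"gene.accession": "EXAMPLE_ACCESSION"}))
```

A client that reads such metadata from the service can build and check a request by name alone, which is the kind of grid-level availability of parameter descriptions the proposal relies on.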

17 Modeling Time Comparison
Measure | Parameter Set | Generic Parameter metadata generation
Time to model & first pass annotation | 95 min | ~1 min
Estimated registration & semantic annotation | ?? working days, 2-3 calendar months | 0 min
# of parameters | 4 | 270
# of value domains | 2 | 125
# of modules | 1 | 82
Estimate for all modules to draft introductory stage | ~3 and a half person weeks + registration & XSD | ~1 min
One-time cost to create metadata-generation code | | <120 minutes
Note: This does not include the semantic annotation or XSD editing, which are typically the most time-consuming portions of the process
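
The all-modules estimate can be roughly reproduced from the per-module figure. This sketch assumes the module counts on the slide read as 1 (the single modeled parameter set) versus 82 (the GenePattern modules), and assumes a 40-hour person week; both are interpretations rather than figures stated outright.

```python
MINUTES_PER_MODULE = 95      # time to model and first-pass annotate one parameter set
MODULES = 82                 # assumed reading of the slide's module count
HOURS_PER_PERSON_WEEK = 40   # assumption: a 40-hour working week

# Scale the single-module modeling time up to every module.
total_hours = MINUTES_PER_MODULE * MODULES / 60
person_weeks = total_hours / HOURS_PER_PERSON_WEEK
print(f"{total_hours:.0f} hours, about {person_weeks:.1f} person weeks")
```

Under those assumptions the result is a little over three person weeks, in line with the slide's estimate of roughly three and a half, before any registration, semantic annotation or XSD work is counted.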

18 caGrid Service Metadata From caDSR Registration

19 Current caGrid Parameter Modeling
A current caGrid parameter representation
Cons:
  - Modeling, semantic annotation and caDSR registration: significant cost
Pros:
  - CDE-based discovery
  - Parameter information on caDSR

