DAS Advance Search and its prototype implementation in MyDas
Gustavo Adolfo Salazar Orejuela
Supervised by: Nicola Mulder, Henning Hermjakob
DAS workshop
Outline
1. Distributed Annotation System
2. Problem Definition
3. Proposed Solution
   1. Advance query
   2. DAS Query Language
   3. Response
   4. Prototype Implementation
4. Future Work
5. Acknowledgements
DAS Distributed Annotation System
Problem Definition ?
Proposed Solution: Advance query
A new argument, query, is added to the features command, so the request for this command is now defined as:
SERVER/das/DSN/features? [;segment=RANGE] [;type=TYPE] [;category=CATEGORY] [;feature_id=ID] [;maxbins=BINS] [;query=DASQUERY]
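As a rough illustration of the extended request, the following Python sketch assembles a features URL with semicolon-separated arguments, including the proposed query argument. The function name build_features_url and the host example.org are hypothetical; only the argument names come from the request definition above.

```python
from urllib.parse import quote


def build_features_url(server, dsn, query=None, **args):
    """Assemble SERVER/das/DSN/features with optional arguments.

    `args` holds the classic arguments (segment, type, category, ...);
    `query` is the proposed new argument carrying a DAS query string.
    """
    parts = [f"{name}={quote(str(value), safe=':,')}" for name, value in args.items()]
    if query is not None:
        # Percent-encode spaces and quotes, but keep the query operators readable.
        parts.append("query=" + quote(query, safe=":*()"))
    url = f"{server.rstrip('/')}/das/{dsn}/features"
    # Arguments are joined with ';' as in the request definition.
    return url + ("?" + ";".join(parts) if parts else "")


if __name__ == "__main__":
    # example.org is a placeholder host, not a real DAS server.
    print(build_features_url("http://example.org", "uniprot2probes",
                             query="typeLabel:affy_mouse430a_2 AND featureId:*"))
```

Keeping query as a separate keyword makes it explicit that the new argument is additive: a request without it is built exactly as before.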
Proposed Solution: DAS Query Language
Based on Lucene. A query is broken into terms and operators:
- Terms, e.g. "alpha helix"
- Fields, e.g. type
- Conditions, e.g. type:"alpha helix"
- Term modifiers, e.g. type:alpha*
- Operators, e.g. typeCvId:CV:00001 AND featureLabel:"one Feature"
- Grouping, e.g. (typeCvId:CV:00001 AND featureLabel:"one Feature") OR typeId:twoFeatureTypeIdOne
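The building blocks listed above can be composed programmatically. The sketch below uses three hypothetical helpers (condition, all_of, any_of) to assemble query strings in the style of the slide's examples. It is an assumption-laden illustration, not part of MyDas, and it does not escape Lucene special characters inside terms.

```python
def condition(field, term):
    """Return a field:term condition, quoting terms that contain whitespace."""
    if any(ch.isspace() for ch in term):
        term = '"' + term.replace('"', '\\"') + '"'
    return f"{field}:{term}"


def all_of(*clauses):
    """Group clauses so that every one of them must match (AND)."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Group clauses so that at least one of them must match (OR)."""
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    query = any_of(
        all_of(condition("typeCvId", "CV:00001"),
               condition("featureLabel", "one Feature")),
        condition("typeId", "twoFeatureTypeIdOne"),
    )
    print(query)
```

Always wrapping groups in parentheses produces a slightly more verbose string than the hand-written example, but it removes any dependence on operator precedence.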
Proposed Solution: DAS Query Language
Defined fields: featureId, featureLabel, segmentId, segmentLabel, segmentStart, segmentStop, typeId, typeCvId, typeLabel, typeCategory, type, methodId, methodCvId, methodLabel, method, start, stop, score, orientation, phase, note, link, target, parent, part, all
Reporting capability
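A client could check a query against the defined fields before sending it. The following sketch is one possible way to do that with a regular expression; the function name unknown_fields and the extraction rule (a field is a run of letters followed by a colon, at the start of the query or after whitespace or an opening parenthesis) are assumptions made for illustration, not part of the proposal.

```python
import re

# Field names as listed on the slide.
DEFINED_FIELDS = {
    "featureId", "featureLabel", "segmentId", "segmentLabel", "segmentStart",
    "segmentStop", "typeId", "typeCvId", "typeLabel", "typeCategory", "type",
    "methodId", "methodCvId", "methodLabel", "method", "start", "stop",
    "score", "orientation", "phase", "note", "link", "target", "parent",
    "part", "all",
}


def unknown_fields(query):
    """Return the field names used in `query` that are not defined."""
    # Blank out quoted phrases so colons inside them are not read as fields.
    unquoted = re.sub(r'"[^"]*"', '""', query)
    used = re.findall(r'(?:^|[\s(])([A-Za-z]+):', unquoted)
    return sorted(set(used) - DEFINED_FIELDS)


if __name__ == "__main__":
    print(unknown_fields('typeCvId:CV:00001 AND colour:red'))
```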
Proposed Solution: Response
The document returned by the features request does not need to be extended, because it already supports more than one segment. From the relax-ng …
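To make the multi-segment point concrete, the sketch below parses a small features document containing two segments and groups the feature ids by segment. The element names follow the usual DAS features layout (DASGFF, GFF, SEGMENT, FEATURE); the sample document, its ids, and the function name features_by_segment are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: a single response carrying features from two segments.
SAMPLE = """<DASGFF><GFF>
  <SEGMENT id="segA" start="1" stop="100">
    <FEATURE id="f1" label="one Feature"/>
  </SEGMENT>
  <SEGMENT id="segB" start="1" stop="250">
    <FEATURE id="f2" label="two Feature"/>
    <FEATURE id="f3" label="three Feature"/>
  </SEGMENT>
</GFF></DASGFF>"""


def features_by_segment(xml_text):
    """Map each segment id to the list of feature ids it contains."""
    root = ET.fromstring(xml_text)
    return {
        seg.get("id"): [feat.get("id") for feat in seg.findall("FEATURE")]
        for seg in root.iter("SEGMENT")
    }


if __name__ == "__main__":
    for segment_id, feature_ids in features_by_segment(SAMPLE).items():
        print(segment_id, feature_ids)
```

A client that already iterates over SEGMENT elements needs no change to consume the result of a query that matches features on several segments.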
Proposed Solution: Prototype Implementation
- MyDas has been extended to support this capability. It is still a beta version, downloadable only through SVN: SNAPSHOT
- Lucene is used to build an index of the data source, which adds the advanced search capability.
- The entry_point capability is required in order to go through all the features of each entry point.
- Lucene is also used to construct the data source. This may cause a synchronisation error with the data, but it avoids requiring the feature_id capability.
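The indexing step can be pictured with a toy inverted index. The code below is plain Python, not Lucene and not MyDas code: it only illustrates why walking every entry point once lets a later query find features without knowing their segment. All ids, field values, and function names are hypothetical.

```python
from collections import defaultdict


def build_index(features_by_entry_point):
    """Index every feature of every entry point by (field, value)."""
    index = defaultdict(set)
    for segment_id, features in features_by_entry_point.items():
        for feature in features:
            doc = (segment_id, feature["featureId"])
            index[("segmentId", segment_id)].add(doc)
            for field, value in feature.items():
                index[(field, value)].add(doc)
    return index


def search(index, field, value):
    """Return the (segment id, feature id) pairs matching field:value exactly."""
    return sorted(index.get((field, value), set()))


if __name__ == "__main__":
    # Hypothetical data source: two entry points with invented features.
    source = {
        "P1": [{"featureId": "probe_a", "typeLabel": "chipX"},
               {"featureId": "probe_b", "typeLabel": "chipY"}],
        "P2": [{"featureId": "probe_c", "typeLabel": "chipX"}],
    }
    index = build_index(source)
    print(search(index, "typeLabel", "chipX"))
```

Because the index is a copy of the data taken at build time, it can drift from the underlying source, which is the synchronisation risk mentioned above.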
Proposed Solution: Data Source - Uniprot 2 Probes
It maps microarray probes to UniProtKB accession numbers.
srv/das/uniprot2probes/features?segment=Q58EV5
srv/das/uniprot2probes/features?query=segmentId:Q58EV5
srv/das/uniprot2probes/features?feature_id=234173_s_at.affy_hg_u133b
srv/das/uniprot2probes/features?query=featureId:234173_s_at.affy_hg_u133b
srv/das/uniprot2probes/features?query=typeLabel:affy_mouse430a_2
srv/das/uniprot2probes/features?query=typeLabel:affy_mouse430a_2%20AND%20featureId:*
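The paired URLs above suggest that a classic argument can be restated as a query on the matching field. The sketch below captures just the two correspondences shown (segment to segmentId, feature_id to featureId); the function name to_query is hypothetical, and segment ranges or other arguments are deliberately left unsupported because the examples do not cover them.

```python
# Correspondences taken from the example URL pairs; nothing else is assumed.
EQUIVALENT_FIELD = {
    "segment": "segmentId",
    "feature_id": "featureId",
}


def to_query(argument, value):
    """Restate a classic features argument as a single query condition."""
    if argument not in EQUIVALENT_FIELD:
        raise ValueError(f"no query equivalent known for argument {argument!r}")
    return f"{EQUIVALENT_FIELD[argument]}:{value}"


if __name__ == "__main__":
    print(to_query("segment", "Q58EV5"))
    print(to_query("feature_id", "234173_s_at.affy_hg_u133b"))
```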
Future work
- Pagination of the features command
- Clients using this capability: JsDas + Advance Search + Uniprot2probes
- Proserver implementation?
Acknowledgments
Supervisors
  Doctor Nicola Mulder
  Henning Hermjakob
University of Cape Town
  CBIO laboratory
EBI
  Rafael Jimenez
  Andy Jenkinson
DAS Community
  Jonathan Warren
Questions??