SRI International Bioinformatics: The Structured Advanced Query Page
Tomer Altman & Mario Latendresse, Bioinformatics Research Group, SRI International


1 The Structured Advanced Query Page
Tomer Altman & Mario Latendresse
Bioinformatics Research Group, SRI International
August 18, 2009

2 Introduction
BioVelo is a query language: like SQL, but simpler and easier to learn.
Documentation: http://biocyc.org/bioveloLanguage.html
The Free-Form Advanced Query Page (FFAQP) allows Web submission of BioVelo queries.
The Structured Advanced Query Page (SAQP) is a Web page for interactively constructing advanced and precise queries to PGDBs. Queries are translated to BioVelo and sent to the server for processing.
SAQP: http://biocyc.org/query.html
Documentation: http://biocyc.org/webQueryDoc.html

3 Why a query interface?
It allows structured access to the rich data representation stored in a PGDB.
Most advanced databases offer a high-level, declarative method of access (e.g., SQL).
It provides an intermediate level of access between graphically browsing the PGDB and programmatically processing the data in Lisp.

4 The Structured Advanced Query Page
'Advanced', in that it allows you to ask more complicated queries than the basic search interface. In other words, the SAQP lets you search for a precise set of answers under simple or complex conditions.
'Structured', in that it is a dynamic HTML form that makes queries easier to craft, but trades away some of the flexibility and power of the Free-Form Advanced Query Page (FFAQP) for that simplicity.
'Page', in that it is accessed via the BioCyc Web interface (www.biocyc.org/query.html) or from your own Pathway Tools Web server.

5 SAQP Architecture
The SAQP is built on top of BioVelo, a high-level, functional, declarative query language, which is in turn built on top of Pathway Tools.
Every result page shows the equivalent BioVelo code that the SAQP generated and that produced the results.
You don't need to know anything about BioVelo to use the SAQP, but it may help later if you need to write even more complicated queries using the Free-Form Advanced Query Page (FFAQP).

6 The Structure of the SAQP
1. Database specification
2. Class specification
3. 'Where' constraints on attributes of the class
4. Output attributes description
5. Data format (HTML vs. TXT)
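Since every SAQP query is translated into BioVelo, the parts listed above can be pictured as fields of a small record that renders itself as a BioVelo string. The sketch below is hypothetical: `SearchComponent` and `to_biovelo` are illustrative names, not part of Pathway Tools, and the output only mimics the query shapes shown on the "Examples of BioVelo Queries" slide; the SAQP's actual generated text may differ. The data-format choice is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class SearchComponent:
    """Hypothetical model of one SAQP search component (not a Pathway Tools API)."""
    orgid: str                                   # database, e.g. "ecoli"
    class_name: str                              # class, e.g. "proteins"
    var: str = "p"                               # generator variable
    where: list = field(default_factory=list)    # (attribute, operator, value) triples
    outputs: list = field(default_factory=list)  # attribute names to display

    def to_biovelo(self) -> str:
        # Head: a single expression, or a tuple of expressions for several outputs.
        heads = [f"{self.var}^{attr}" for attr in self.outputs] or [self.var]
        head = heads[0] if len(heads) == 1 else "(" + ", ".join(heads) + ")"
        # Body: the generator first, then one condition per 'where' constraint.
        body = [f"{self.var} <- {self.orgid}^^{self.class_name}"]
        body += [f"{self.var}^{attr} {op} {value}" for attr, op, value in self.where]
        return f"[{head} : {', '.join(body)}]"

query = SearchComponent("ecoli", "proteins",
                        where=[("dna-footprint-size", "<", 10)],
                        outputs=["name", "dna-footprint-size"])
print(query.to_biovelo())
# [(p^name, p^dna-footprint-size) : p <- ecoli^^proteins, p^dna-footprint-size < 10]
```

With no outputs and no constraints, the same record reduces to the simplest query form, `[r : r <- ecoli^^reactions]`.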

7 Example #1
A simple query usually consists of querying a particular database about a particular class.
Find all the proteins in E. coli K-12. Display the protein names.

8 Structure of the Results
A line showing the equivalent BioVelo expression that the SAQP generated to answer the query.
An HTML table of the results, with each entry hyperlinked to the matching Pathway Tools Web page.
If the text data format was requested, a tab-delimited text file is generated, containing just the table data.
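A tab-delimited file is easy to produce and re-read with Python's standard `csv` module. This is a minimal sketch of the format, not the server's own code: the rows are hypothetical placeholders, and the optional header row is an assumption, since the slide says only that the file holds "just the table data".

```python
import csv
import io

def to_tab_delimited(rows, header=None):
    """Render result rows as tab-delimited text, one line per row."""
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)  # optional column-name row (an assumption)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical rows: (protein name, DNA footprint size).
rows = [("protein-A", 8), ("protein-B", 6)]
print(to_tab_delimited(rows, header=["name", "dna-footprint-size"]), end="")
```

Reading such a file back uses `csv.reader(f, delimiter="\t")`, which also handles any fields the writer had to quote.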

9 Example #2
Find all the proteins of E. coli K-12 whose DNA-FOOTPRINT-SIZE is smaller than 10. Display the protein name and the DNA footprint size.

10 Example #3
In EcoCyc, display polypeptides constrained by experimentally determined molecular weight and isoelectric point (pI):
The experimental molecular weight should be between 50 and 100 kDa.
The pI should be less than 7.
Display the polypeptide name, the experimental molecular weight, and the pI.
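The two constraints combine as a range test and an upper bound joined by "and". The Python sketch below uses hypothetical records and assumes "between" is inclusive; it also shows one way to handle polypeptides with no recorded experimental value. How BioVelo itself treats missing attribute values is not stated here, so the skip-if-missing rule is an assumption.

```python
# Hypothetical polypeptide records; None marks a value the database does not record.
polypeptides = [
    {"name": "polypeptide-1", "exp_mw_kd": 72.0, "pi": 5.4},
    {"name": "polypeptide-2", "exp_mw_kd": 120.0, "pi": 6.1},
    {"name": "polypeptide-3", "exp_mw_kd": 55.0, "pi": 8.2},
    {"name": "polypeptide-4", "exp_mw_kd": None, "pi": 4.9},
]

def matches(p) -> bool:
    mw, pi = p["exp_mw_kd"], p["pi"]
    # Assumption: records missing either value are skipped rather than compared.
    if mw is None or pi is None:
        return False
    # Assumption: "between 50 and 100 kDa" is inclusive at both ends.
    return 50 <= mw <= 100 and pi < 7

result = [(p["name"], p["exp_mw_kd"], p["pi"]) for p in polypeptides if matches(p)]
print(result)
```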

11 Example #4
The SAQP allows quantifiers on relations between PGDB classes. Extending Example #3, we now want only the proteins for which at least one of the encoding genes lies within the first 500 kilobases of the E. coli chromosome.
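"At least one of the genes" is an existential quantifier over the protein-to-gene relation. In Python it maps onto `any()`, and the universal form ("every gene") would map onto `all()`. The data are hypothetical, and measuring "within the first 500 kilobases" by a gene's left-end position is an assumption of this sketch.

```python
# Hypothetical data: gene left-end positions in base pairs, and the genes encoding each protein.
gene_left_end = {"gene-1": 120_000, "gene-2": 2_300_000, "gene-3": 480_500}
proteins = {
    "protein-A": ["gene-1"],
    "protein-B": ["gene-2"],
    "protein-C": ["gene-2", "gene-3"],  # encoded by two genes
    "protein-D": [],                     # no known gene
}
LIMIT_BP = 500_000  # first 500 kilobases

def has_gene_in_region(genes) -> bool:
    # Existential quantifier: True if some encoding gene qualifies; any([]) is False.
    return any(gene_left_end[g] < LIMIT_BP for g in genes)

selected = [name for name, genes in proteins.items() if has_gene_in_region(genes)]
print(selected)
```

Note that a protein with no known gene never satisfies "at least one", whereas it would vacuously satisfy an `all()` condition.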

12 Example #5: Queries with Several Components
A second search component searches, for each element found by the first search component, possibly another database and another class of objects. This is called a 'cross-product' search.
Any number of search components can be added. In general, each new search component is evaluated for each set of objects found by the previous components.
Some restraint is needed to avoid building a query that takes too long to answer. (The server imposes a limit of a few minutes per query.)
Example: Search for MetaCyc pathways in the taxonomic range of Bacteria that also exist in E. coli K-12, matching on the common-name attribute.
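A cross-product search behaves like nested loops: the second component runs once per result of the first, and a condition keeps the matching pairs. The Python sketch below uses hypothetical pathway names, not real MetaCyc or EcoCyc data. It also counts the comparisons, which is why adding components can quickly approach the server's time limit.

```python
# Hypothetical pathway records: (common name, taxonomic range).
metacyc_pathways = [
    ("pathway alpha", {"Bacteria"}),
    ("pathway beta", {"Bacteria", "Archaea"}),
    ("pathway gamma", {"Eukaryota"}),
]
ecoli_pathways = ["pathway beta", "pathway delta"]

# Component 1: MetaCyc pathways whose taxonomic range includes Bacteria.
first = [name for name, taxa in metacyc_pathways if "Bacteria" in taxa]

# Component 2 runs once per element of component 1 (the cross product);
# the condition keeps pairs whose common names match.
pairs = [(m, e) for m in first for e in ecoli_pathways if m == e]

# The work grows with the product of the component sizes.
comparisons = len(first) * len(ecoli_pathways)
print(pairs, comparisons)
```

For a pure equality match a set lookup would be faster, but the nested form mirrors the per-element search the slide describes.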

13 Introduction to BioVelo
BioVelo is based on set and list comprehensions. In mathematics, a set comprehension describes a set of values, as in:
{x | x in Prime, x > 100}
The output is 'x'; the body has a generator 'x in Prime' and a condition 'x > 100'. Several generators and several conditions can be used.
BioVelo uses a concise syntax:
1) [ output-expression : generator, condition, ... ]
2) a generator has the form v <- database^^class
3) a condition uses logical and relational operators
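Python's set comprehensions follow the same mathematical notation, so the example above translates almost word for word. Because Prime is infinite, this sketch bounds it to numbers below 150. The helper `is_prime` is illustrative and uses plain trial division.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# Prime is infinite, so this sketch bounds it to numbers below 150.
prime = {n for n in range(150) if is_prime(n)}

# {x | x in Prime, x > 100}: output x, generator "x in Prime", condition "x > 100".
result = {x for x in prime if x > 100}
print(sorted(result))
```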

14 Syntax of BioVelo (the big picture)
1. [ head-output : generators, conditions, ... ]
2. { head-output : generators, conditions, ... }
3. The comma can be read as "and".
4. Head-output is a single expression or a tuple of expressions: (exp1, exp2, ..., expn)
5. To get objects from a database: orgid^^class-name
6. Typical generator: var <- ecoli^^proteins
7. To access an attribute value: object^attribute
8. Conditions are formed with variables, constants, and logical and relational operators.
9. Special biological functions: reaction-to-genes, enzyme-to-genes, pathway-to-reactions, etc.
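The two bracket forms and the `^^` and `^` operators can be mimicked with a toy in-memory database. Everything here is hypothetical: `pgdb`, `instances`, and `attr` are illustrative stand-ins, not Pathway Tools functions. The sketch also assumes, as the bracket choice suggests, that `[ ]` builds a list (order and duplicates kept) while `{ }` builds a set.

```python
# Hypothetical in-memory stand-in for a PGDB: orgid -> class name -> list of objects.
pgdb = {
    "ecoli": {
        "proteins": [
            {"name": "protein-A", "dna-footprint-size": 8},
            {"name": "protein-B", "dna-footprint-size": 6},
            {"name": "protein-C", "dna-footprint-size": 8},
        ],
    },
}

def instances(orgid, class_name):
    """Plays the role of orgid^^class-name: all objects of a class."""
    return pgdb[orgid][class_name]

def attr(obj, attribute):
    """Plays the role of object^attribute (None if the attribute is absent)."""
    return obj.get(attribute)

# [ p^dna-footprint-size : p <- ecoli^^proteins ]  (list: order and duplicates kept)
as_list = [attr(p, "dna-footprint-size") for p in instances("ecoli", "proteins")]

# { p^dna-footprint-size : p <- ecoli^^proteins }  (set: duplicates collapse)
as_set = {attr(p, "dna-footprint-size") for p in instances("ecoli", "proteins")}
print(as_list, as_set)
```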

15 Examples of BioVelo Queries
[r : r <- ecoli^^reactions]
[p^name : p <- ecoli^^proteins]
[p^?name : p <- ecoli^^proteins]
[p^?name : p <- ecoli^^proteins, p^dna-footprint-size < 10]
[(g^?name, g^left-end-position) : g <- ecoli^^genes, g^left-end-position < 153000]
[(g^?name, k) : g <- ecoli^^genes, k := abs(g^left-end-position - g^right-end-position) + 1, k < 200]
[(r^?name, c^?name) : r <- ecoli^^reactions, c <- r^left, c in r^right]
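The last two queries use features with close Python counterparts: the local binding `k := ...` matches Python's assignment expression (Python 3.8+), and a second generator ranging over an attribute of the first (`c <- r^left`) matches a nested `for` clause. The gene and reaction records below are hypothetical, and compounds are plain name strings rather than database objects.

```python
# Hypothetical gene records (positions in base pairs).
genes = [
    {"name": "gene-1", "left-end-position": 1000, "right-end-position": 1150},
    {"name": "gene-2", "left-end-position": 5000, "right-end-position": 6200},
    {"name": "gene-3", "left-end-position": 9000, "right-end-position": 9099},
]

# [(g^?name, k) : g <- ecoli^^genes, k := abs(left - right) + 1, k < 200]
# The condition runs before the output expression, so k is bound when it is used.
short_genes = [
    (g["name"], k)
    for g in genes
    if (k := abs(g["left-end-position"] - g["right-end-position"]) + 1) < 200
]

# Hypothetical reactions with compound names on each side.
reactions = [
    {"name": "reaction-1", "left": ["compound-A", "compound-B"], "right": ["compound-A", "compound-C"]},
    {"name": "reaction-2", "left": ["compound-D"], "right": ["compound-E"]},
]

# [(r^?name, c^?name) : r <- ecoli^^reactions, c <- r^left, c in r^right]
both_sides = [
    (r["name"], c)
    for r in reactions
    for c in r["left"]      # second generator ranges over an attribute of the first
    if c in r["right"]
]
print(short_genes, both_sides)
```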

16 BioVelo Grammar in EBNF

17 BioVelo, Table of Operators (1)

18 BioVelo, Table of Operators (2)

19 BioVelo, Special Functions

20 BioVelo and the Free-Form Advanced Query Page (FFAQP)
Any BioVelo query can be entered at the FFAQP.
The FFAQP provides interactive online documentation.
The FFAQP can be reached from the Search -> Advanced command on the menu bar, or from the SAQP by clicking the button "Switch to Free Form Advanced Query Page".
Here is a demo…

21 BioVelo from Pathway Tools (Desktop)
BioVelo queries can be executed from the Lisp prompt with the bv Lisp function. The query is given to bv as a Lisp string. The result is displayed and placed on the answer-list.
Examples:
1. (bv "[ r : r 4]")
2. (bv "[(g,s) : g <- ecoli^^genes, s <- g^synonyms, s ~= \"b0[0-9]\"]")
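The second example pairs each gene with every synonym matching the pattern `b0[0-9]`. The Python sketch below reproduces that logic on hypothetical gene records. It assumes `~=` means "the regular expression matches somewhere in the string" (Python's `re.search`); if BioVelo anchors the match, `re.fullmatch` or `re.match` would be the closer analogue. Note that the inner quotes must be escaped (`\"`) inside the Lisp string, whereas a Python raw string needs no escaping here.

```python
import re

# Hypothetical gene records with synonym lists.
genes = [
    {"name": "gene-1", "synonyms": ["b0001", "alias-x"]},
    {"name": "gene-2", "synonyms": ["b1234"]},
    {"name": "gene-3", "synonyms": []},
]

# Assumption: ~= behaves like an unanchored regex search.
pattern = re.compile(r"b0[0-9]")

# [(g, s) : g <- ecoli^^genes, s <- g^synonyms, s ~= "b0[0-9]"]
pairs = [(g["name"], s) for g in genes for s in g["synonyms"] if pattern.search(s)]
print(pairs)
```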

22 BioVelo from Pathway Tools (Desktop)
The result can be further stored or manipulated using Lisp.
Objects are always returned as frame-structures, not frame-ids, so that multiple databases can be handled without worrying about orgids (database identifiers).
However, not all functions in Pathway Tools take frame-structures at "face value": the currently selected organism must match the frame-structure's database.
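The reason for returning frame-structures can be illustrated with an object that carries its own database identifier, so identical frame ids from two databases stay distinct. This is only an analogy in Python: `FrameRef`, `describe`, and `current_orgid` are hypothetical names, not Pathway Tools' Lisp frame-structures, and the orgid "metacyc" is a placeholder. The check in `describe` mirrors the slide's caveat that some functions still depend on the currently selected organism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameRef:
    """Hypothetical stand-in for a frame-structure: the object knows its own database."""
    orgid: str
    frame_id: str

current_orgid = "ecoli"  # hypothetical "currently selected organism"

def describe(frame: FrameRef) -> str:
    # A function that works relative to the current organism must refuse
    # a frame that belongs to a different database.
    if frame.orgid != current_orgid:
        raise ValueError(f"{frame.frame_id} belongs to {frame.orgid}, not {current_orgid}")
    return f"{frame.orgid}:{frame.frame_id}"

a = FrameRef("ecoli", "FRAME-1")
b = FrameRef("metacyc", "FRAME-1")  # same frame id, different database
print(a == b, describe(a))
```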

