JQL : The Java Query Language

JQL : The Java Query Language
Darren Willis, David J Pearce, James Noble

Object Querying JQL is a simple extension for Java.
JQL adds support for Object Querying. Object Querying helps us pick sets of objects from other sets. JQL can simplify working with collections.

Object Querying With JQL: Without JQL: for(Object a : collection1)
for(Object b : collection2) if(a.equals(b)) results.add(new Object[]{a,b}); With JQL: selectAll(Object a = collection1, Object b = collection :a.equals(b)); This is a nested loop, very common, as any self respecting programmer would have written. It loops through both collections, pairing objects together for the query expression. The JQL expression here does, semantically the same thing. It’s a select statement, similar to other query languages like SQL. It defines a variable a to range over collection1 and a variable b to range over collection2. it applies the expression ‘a.equals(b)’ to each pairing, and returns a java collection of the results.

Hypothetical IM Client
A Hypothetical Instant Messaging Client Session Window String name; int id; show_msg(Msg); enter_msg(); String name; send_msg(Msg); recv_msg(Msg); So, just for a simple example, here we have a couple of classes from a hypothetical instant messaging client. We’ve got session objects, to represent a particular chat session with one of our friends, and Window objects, to represent the graphical window we’ve got that actually shows our chats. So our window passes what we’ve typed over to the program, which sends it through to the Session, and when the Session gets a new message it passes it back and this gets sent to the appropriate window. Now, we’ve decoupled our Session from our Windows; they don’t know about each other. We’ve got a list of each of them, and quite commonly we’ll want to match Windows with a particular Session based on their ‘name’ field. This is what we might term, in databases, as ‘joining’ the sets on the ‘name’ field.

Querying : Nested Loop We're joining the sets of Sessions and Windows.
An easy way – the nested loop join: ArrayList results = new Arraylist(); for(Session s : sessions) for(Window w : windows) if(w.name.equals(s.name)) results.add(new Object[w,s]); This takes O(|sessions| * |windows|) time. So, like we say – we’re joining the sets of Buddies and Windows, on the ‘name’ field of each. So, first off, again here’s the easy, obvious solution – a nested loop join. So this version goes through all the buddies, and for each one it goes through all the windows, and tries to match them together. And it takes time on there order of the size of the sessions list by the size of the windows list. This isn’t necessarily too bad – but for large sets it can be a pain. Now, this is something we’re probably doing a lot in our program, so we want it to be fast. So, because we’re clever programmers, we investigate using another well-known join technique – the hash join.

Querying : Hash Join Another join technique, from databases:
ArrayList results = new ArrayList(); Hashtable nameIndex = new Hashtable(); for(Session s : sessions) nameIndex.put(s.name,s); } for(Window w : windows){ results.add(names.get(w.name)); (This is a simplified version) The hash join – used an awful lot in databases. This does the same thing as the Nested Loop from last slide; it’s another version of a solution to this. In brief, how it works: it first builds a hashtable, in which we store each session, keyed by their name. Once we’ve built this hashtable, getting the result requires simply scanning over the windows set and looking up the name for each window in the hashtable. So this version is more Order (|sessions| + |windows|), although there are constant factors that could prove important. We should note that this is a less obvious solution; a programmer who hasn’t heard of the hash join isn’t so likely to just come up with it.

Joining Dilemma A B A B A B
Now, we have a dilemma with these joins – which join do we use, and when? We might have sets A and B of roughly equal size here – which one do we hash? What about when sets A and B are both small – in this case, a nested loop join could be better (the overhead of setting up the hashtable can be prohibitive). Or what if A and B’s sizes change during the execution – we might have to change which one we hash, or even switch from using a hashtable back to a nested loop join. A B

JQL for Collection Operations
results = selectAll(Session s = sessions, Window w = windows: s.name.equals(w.name)); Let the Query Evaluator take care of the details! So when using JQL, this is all we write. We don’t need to supply details, we don’t need to try and guess about the size of the sets at runtime or what the best way to do it is. We let the query evaluator take care of the details!

Domain Variable Sources
Passing in a Domain Variable source selectAll(Window w = windows :w.ID == 5) Domain Variable Definition without a source selectAll(Window w : w.ID == 5) Now something we’re just drawing attention to here is that domain variables can be initialised with and without a collection as a source – if we give it one, it ranges over all objects from that collection, otherwise it ranges over all instances of objects of that class.

w.ID > 5 && w.ID < 10 && w.name == s.name && s.isActive() && ...
Query Expressions w.ID == 5 w.ID == 5 && w.name == s.name w.ID > 5 && w.ID < 10 && w.name == s.name && s.isActive() && ... Now here’s what the query expressions themselves look like. As you can see they can get quite complex – you can just keep chaining on ANDs. We do allow arbitraty method calls, as you can see here with the b.isActive()

Method calls are allowed Side effects are possible
Query Expressions Method calls are allowed Side effects are possible These expressions do not short circuit So what else can we do with queries? Some points to note about queries – as I mentioned, Method calls on the objects involved in the queries ARE allowed. However, we ignore the fact they they could have side effects. Due to the optimisations going on, it’d be very difficult to reliably say which objects the query methods will be called on, so caution is advised there. Also, as the optimiser re-orders query expressions depending on the size of the input sets etc, the query expressions do NOT short circuit – so you don’t need to be clever in how you specify the ordering of the expression, the evaluator can do that for you.

BinTree Aliasing So here’s another wee example of something you can do with queries that is quite neat. Here we have a bit of a bintree – a bunch of nodes, with left and right fields. We want to make sure there is no aliasing in the tree, which can lead to nasty bugs. Here’s an example of aliasing – now, if your tree wound up with this in it, you’d almost certainly hit an infinite loop at some point. Now, what’s going on in aliasing is that some node is ending up with more than one parent – so it’d be useful to check that that’s not happening in an assertion throughout our program. Unfortunately, it’s hard to write this concisely in an assert statement – we can’t quantify over the Binary Tree objects, so we can’t check them all, so we don’t have the information necessary.

A Query Invariant Uses Object Tracking to check all BinTrees.
assert null == selectA(BinTree a, BinTree b : (a != b && a.left == b.left) || (a != b && a.right == b.right)|| a.left == b.right); So, we use a query to do it! Okay, so this query is a little more complex, so I’ll just go through it. We take two domain variables, a and b – and you’ll note we don’t supply a source, so a and b are going to range over EVERY bintree object in the program. Then, we compare a and b’s trees, covering all the cases in which aliasing could happen. Now, you’ll notice we need to guard against the case that a == b, because both variables are ranging over ALL objects. Anyway, this gives us a relatively concise way to check our previous problem. We could possibly do this in Java with a static list of BinTree objects that we update whenever we create a new BinTree, but that’s quite a hassle, quite a mess, and hard to disable. Now, we provide, with JQL, an ASPECT to track the objects in the system. I’ll just describe it briefly now. Uses Object Tracking to check all BinTrees.

Object Tracking public class GroupChat{ private List participants;
public SomeClass(){ participants = new ArrayList(); ... } public void addChatter(String name){ participants.add(new Session(name)); Object Tracker So here’s the object tracker, in brief. It’s just an aspect that hangs around the program and intercepts calls to new. It then makes a WEAK reference to each object that is created and stores it in its own stores. We use WEAK references, so garbage collection is not interfered with.

Expression Ordering w.ID == 5 && w.name == s.name w.ID = 5
Queries get broken down along &&s w.ID == 5 && w.name == s.name Becomes two subqueries: w.ID = 5 w.name == s.name JQL arranges these into a query pipeline. For efficient execution of queries it’s important to break the query down into subqueries and tackle these piecemeal. We do this simply by splitting them along the conjunctions. So this query becomes two smaller queries, and JQL orders these in a ‘query pipeline’ in the evaluator.

A Query Pipeline Sessions Windows w.name == s.name w.ID == 5
Here’s a query pipeline, for this query below. It’s in two stages, separated with vertical dashed lines. The first stage is simply – you can see its taking input from the set ‘Windows’ and evaluating the expression “w.ID == 5” for those objects – it passes these results along to stage two, which takes the results from the previous stage and mixes them with the objects from the sessions set. Temp. Results w.ID == 5 && w.name == s.name

Ordering the Pipeline There are two factors to consider: Cost
Very dependent on input sizes Input sizes are dependent on selectivity… Selectivity - The proportion of results rejected by the stage - We estimate it based on the expression used: ==, < >, !=, aMethod() - Or, we can test it with sampling Now, the ordering of this pipeline is extremely important for an optimal query execution. We have to consider two main factors for ordering the pipeline – the cost of each join, and the selectivity for each join. The cost is dependent on the input of the join, while the selectivity depends on the expression for the join. The selectivity of a join influences the cost of joins further down the pipeline – so finding the optimal ordering is non-trivial. We can’t know the selectivity without executing the join, but we need it, so we estimate it. We have a heuristic, we use relative values for the expressions in the join. You can see here from most to least selective we have equals, comparisons, not equals, and arbitrary method calls. Alternatively, for large enough queries we have the option of using sampling, where we evaluate the join for 10 or so objects randomly chosen, and estimate the selectivity from that sample. This gives surprisingly useful results.

Configuring Join Order
Two strategies to decide Join Order: Exhaustive Search - Good results but expensive - Requires searching n! possible orderings Maximum Selectivity Heuristic - Less optimal order, less overhead. So we use two strategies to determine the optimal join ordering, once we have costs and selectivites for all the joins in the query. First the exhausitve search – this goes through all the possible combinations and figures out the total cost for the whole query. This gives very good results, but it’s expensive – there are n factorial possible orderings, after all. The second strategy is the maximum selectivity heuristic – it uses a simpler strategy, and simply does the most selective joins first. This is not necessarily the best idea, but it’s usually pretty good, and for long pipelines and small queries the time taken to find the optimal ordering could be excessive. So, with all this guff about performance, how do we actually do?

Querying Performance Four versions of: selectAll(Integer a=as,
Integer b=bs, Integer c=cs, Integer d=ds: a == b && b != c && c < d); A benchmark! This is an arbitrary little 3-stage query on 4 sets of Integer objects. The four lines correspond to 4 implementations of the query expression on the right (read it out). Two of these lines are JQL, and two are simulated ‘hand coded’ implementations. First, the black line here represents ‘handpoor’ – a poorly implemented hand coded solution. This is just four nested loops with some short circuiting, so its pretty much n to the four. The orange line down the bottom here is the other hand-coded version – this is ‘hand opt’, an optimised version that uses the optimal join ordering, a sorted join and a hash join, and achieves the very best possible result. As you can see it does MUCH better than handpoor. Now, the blue and the green lines here are JQL doing this query, using the two query ordering strategies from before. The green is the exhaustive, the blue is the maximum selectivity. Now, they’re very close – it seems the exhaustive search has found a more optimal ordering than the maximum selectivity, so in this case the overhead has paid off. Now, we think that this performance is pretty good – the optimal version is doing better, but JQL is far far better than the hand-poor implementation.

HANDOPT Implementation
for(Integer i1 : grp) { int a=i1; for(Integer i3 : array3) { int c=i3; if(b != c) { for(int x=array4.size();x!=0;x=x1){ int d=array4.get(x-1); if(c<d){ Object[] t = new Object[4]; t[0]=a; t[1]=b; t[2]=c; t[3]=d; matches.add(t); } else { break; } } return matches; HashMap<Integer,ArrayList<Integer>> map; map = new HashMap<Integer,ArrayList<Integer>>( ); for(Integer i1 : array1) { ArrayList<Integer> grp = map.get(i1); if(grp == null) { grp = new ArrayList<Integer>(); map.put(i1,grp); } grp.add(i1); ArrayList<Object[]> matches = new ArrayList<Object[]>(); Collections.sort(array4); for(Integer i2 : array2) { int b=i2; ArrayList<Integer> grp = map.get(i2); if(grp != null) { So what we’ve got here is the implementation of this HANDOPT benchmark. Don’t worry about reading all this, it’s up here merely to show just how much you need to type to get this kind of performance. And, to compare:

Performance Discussion
selectAll(Integer a=as, Integer b=bs, Integer c=cs, Integer d=ds : a==b && b!=c && c < d); JQL's performance is pretty good! The average programmer is more likely to code HANDPOOR than HANDOPT.

Object Tracker Performance
Object tracker overhead varies greatly.

Other Querying Systems
Cω/LINQ(C#) Provide similar querying capabilities. Query-Based Debuggers QBD, DQBD by Lencevicius et al. Used querying and object tracking for debugging. Relational Databases Much prior work in query optimisation.

In Development/Consideration
Caching Both simple, and incrementalised Other join methods Again, from databases – Merge, Sort, etc Tool Support Eclipse/Netbeans/etc

Conclusions Queries are neat
They are a powerful, simple, useful abstraction. JQL provides efficient and easy support for querying in Java. For more information on JQL:

JQL : The Java Query Language

Similar presentations

Presentation on theme: "JQL : The Java Query Language"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

JQL : The Java Query Language

Similar presentations

Presentation on theme: "JQL : The Java Query Language"— Presentation transcript:

Similar presentations

About project

Feedback