JQL : The Java Query Language


JQL : The Java Query Language Darren Willis, David J Pearce, James Noble

Object Querying JQL is a simple extension for Java. JQL adds support for Object Querying. Object Querying helps us pick sets of objects from other sets. JQL can simplify working with collections.

Object Querying
Without JQL:
    for(Object a : collection1)
        for(Object b : collection2)
            if(a.equals(b))
                results.add(new Object[]{a,b});
With JQL:
    selectAll(Object a = collection1, Object b = collection2 : a.equals(b));
The first version is a nested loop, the very common form that any self-respecting programmer would have written. It loops through both collections, pairing objects together for the query expression. The JQL expression does, semantically, the same thing. It’s a select statement, similar to those in other query languages like SQL. It defines a variable a to range over collection1 and a variable b to range over collection2. It applies the expression ‘a.equals(b)’ to each pairing, and returns a Java collection of the results.

Hypothetical IM Client
A Hypothetical Instant Messaging Client
    Session: String name; send_msg(Msg); recv_msg(Msg);
    Window: String name; int id; show_msg(Msg); enter_msg();
So, just for a simple example, here we have a couple of classes from a hypothetical instant messaging client. We’ve got Session objects, to represent a particular chat session with one of our friends, and Window objects, to represent the graphical window we’ve got that actually shows our chats. So our window passes what we’ve typed over to the program, which sends it through to the Session, and when the Session gets a new message it passes it back and this gets sent to the appropriate window. Now, we’ve decoupled our Sessions from our Windows; they don’t know about each other. We’ve got a list of each of them, and quite commonly we’ll want to match Windows with a particular Session based on their ‘name’ field. This is what we might term, in databases, ‘joining’ the sets on the ‘name’ field.

Querying : Nested Loop
We're joining the sets of Sessions and Windows. An easy way is the nested loop join:
    ArrayList results = new ArrayList();
    for(Session s : sessions)
        for(Window w : windows)
            if(w.name.equals(s.name))
                results.add(new Object[]{w,s});
This takes O(|sessions| * |windows|) time.
So, like we say, we’re joining the sets of Sessions and Windows, on the ‘name’ field of each. First off, again, here’s the easy, obvious solution: a nested loop join. This version goes through all the sessions, and for each one it goes through all the windows, and tries to match them together. It takes time on the order of the size of the sessions list times the size of the windows list. This isn’t necessarily too bad, but for large sets it can be a pain. Now, this is something we’re probably doing a lot in our program, so we want it to be fast. So, because we’re clever programmers, we investigate using another well-known join technique: the hash join.
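For anyone who wants to run the nested loop join, here is a minimal, self-contained sketch. The Session and Window classes are hypothetical stand-ins with just the fields the join needs; they are not the classes from any real IM client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NestedLoopJoin {
    // Hypothetical stand-ins for the Session and Window classes on the slide.
    static class Session {
        final String name;
        Session(String name) { this.name = name; }
    }

    static class Window {
        final String name;
        final int id;
        Window(String name, int id) { this.name = name; this.id = id; }
    }

    // Pairs each Session with every Window of the same name.
    // Every pairing is tested, so this is O(|sessions| * |windows|).
    static List<Object[]> join(List<Session> sessions, List<Window> windows) {
        List<Object[]> results = new ArrayList<>();
        for (Session s : sessions)
            for (Window w : windows)
                if (w.name.equals(s.name))
                    results.add(new Object[] { w, s });
        return results;
    }

    public static void main(String[] args) {
        List<Session> sessions = Arrays.asList(new Session("alice"), new Session("bob"));
        List<Window> windows = Arrays.asList(new Window("bob", 1), new Window("carol", 2));
        for (Object[] pair : join(sessions, windows)) {
            Window w = (Window) pair[0];
            Session s = (Session) pair[1];
            System.out.println("window " + w.id + " <-> session " + s.name);
        }
    }
}
```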

Querying : Hash Join
Another join technique, from databases:
    ArrayList results = new ArrayList();
    Hashtable nameIndex = new Hashtable();
    for(Session s : sessions){
        nameIndex.put(s.name,s);
    }
    for(Window w : windows){
        results.add(nameIndex.get(w.name));
    }
(This is a simplified version)
The hash join is used an awful lot in databases. It does the same thing as the nested loop from the last slide; it’s another solution to the same problem. In brief, how it works: it first builds a hashtable, in which we store each session, keyed by its name. Once we’ve built this hashtable, getting the result requires simply scanning over the windows set and looking up the name of each window in the hashtable. So this version takes time on the order of |sessions| + |windows|, that is O(|sessions| + |windows|), although there are constant factors that could prove important. We should note that this is a less obvious solution; a programmer who hasn’t heard of the hash join isn’t so likely to just come up with it.
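A slightly fuller version of the hash join might look like the sketch below. It is an illustration under assumptions rather than JQL's own implementation: the Session and Window classes are hypothetical stand-ins, and each name maps to a list of sessions so that duplicate names and windows with no matching session are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoin {
    // Hypothetical stand-ins for the Session and Window classes on the slide.
    static class Session {
        final String name;
        Session(String name) { this.name = name; }
    }

    static class Window {
        final String name;
        final int id;
        Window(String name, int id) { this.name = name; this.id = id; }
    }

    static List<Object[]> join(List<Session> sessions, List<Window> windows) {
        // Build phase: index every session by its name.
        Map<String, List<Session>> nameIndex = new HashMap<>();
        for (Session s : sessions)
            nameIndex.computeIfAbsent(s.name, k -> new ArrayList<>()).add(s);

        // Probe phase: one lookup per window instead of a scan of all sessions.
        List<Object[]> results = new ArrayList<>();
        for (Window w : windows) {
            List<Session> matches = nameIndex.get(w.name);
            if (matches == null) continue; // no session with this name
            for (Session s : matches)
                results.add(new Object[] { w, s });
        }
        return results;
    }

    public static void main(String[] args) {
        List<Session> sessions = Arrays.asList(new Session("alice"), new Session("bob"));
        List<Window> windows = Arrays.asList(new Window("bob", 1), new Window("carol", 2));
        System.out.println(join(sessions, windows).size() + " matching pair(s)");
    }
}
```

Hashing the sessions rather than the windows is an arbitrary choice here; which side to hash is exactly the kind of decision the talk wants to hand to the query evaluator.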

Joining Dilemma
Now, we have a dilemma with these joins: which join do we use, and when? We might have sets A and B of roughly equal size. Which one do we hash? What about when sets A and B are both small? In this case, a nested loop join could be better (the overhead of setting up the hashtable can be prohibitive). Or what if A and B’s sizes change during the execution? We might have to change which one we hash, or even switch from using a hashtable back to a nested loop join.

JQL for Collection Operations
    results = selectAll(Session s = sessions, Window w = windows : s.name.equals(w.name));
Let the Query Evaluator take care of the details!
So when using JQL, this is all we write. We don’t need to supply details, we don’t need to try and guess about the size of the sets at runtime or what the best way to do it is. We let the query evaluator take care of the details!

Domain Variable Sources
Passing in a Domain Variable source:
    selectAll(Window w = windows : w.ID == 5)
Domain Variable Definition without a source:
    selectAll(Window w : w.ID == 5)
Now something we’re just drawing attention to here is that domain variables can be initialised with or without a collection as a source. If we give a variable a source, it ranges over all objects from that collection; otherwise it ranges over all instances of that class.

Query Expressions
    w.ID == 5
    w.ID == 5 && w.name == s.name
    w.ID > 5 && w.ID < 10 && w.name == s.name && s.isActive() && ...
Now here’s what the query expressions themselves look like. As you can see they can get quite complex; you can just keep chaining on ANDs. We do allow arbitrary method calls, as you can see here with s.isActive().

Query Expressions
 - Method calls are allowed
 - Side effects are possible
 - These expressions do not short circuit
So what else can we do with queries? Some points to note about queries: as I mentioned, method calls on the objects involved in the queries ARE allowed. However, we ignore the fact that they could have side effects. Due to the optimisations going on, it’d be very difficult to reliably say which objects the query methods will be called on, so caution is advised there. Also, as the optimiser re-orders query expressions depending on the size of the input sets etc., the query expressions do NOT short circuit. So you don’t need to be clever in how you specify the ordering of the expression; the evaluator can do that for you.

BinTree Aliasing So here’s another wee example of something you can do with queries that is quite neat. Here we have a bit of a bintree – a bunch of nodes, with left and right fields. We want to make sure there is no aliasing in the tree, which can lead to nasty bugs. Here’s an example of aliasing – now, if your tree wound up with this in it, you’d almost certainly hit an infinite loop at some point. Now, what’s going on in aliasing is that some node is ending up with more than one parent – so it’d be useful to check that that’s not happening in an assertion throughout our program. Unfortunately, it’s hard to write this concisely in an assert statement – we can’t quantify over the Binary Tree objects, so we can’t check them all, so we don’t have the information necessary.

A Query Invariant
Uses Object Tracking to check all BinTrees.
    assert null == selectA(BinTree a, BinTree b :
        (a != b && a.left == b.left) ||
        (a != b && a.right == b.right) ||
        a.left == b.right);
So, we use a query to do it! Okay, so this query is a little more complex, so I’ll just go through it. We take two domain variables, a and b, and you’ll note we don’t supply a source, so a and b are going to range over EVERY BinTree object in the program. Then, we compare a and b’s trees, covering all the cases in which aliasing could happen. Now, you’ll notice we need to guard against the case that a == b, because both variables are ranging over ALL objects. Anyway, this gives us a relatively concise way to check our previous problem. We could possibly do this in Java with a static list of BinTree objects that we update whenever we create a new BinTree, but that’s quite a hassle, quite a mess, and hard to disable. Now, we provide, with JQL, an ASPECT to track the objects in the system. I’ll just describe it briefly now.
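To see what the invariant is checking, here is a rough plain-Java equivalent. It is only a sketch: the BinTree class is a hypothetical two-field node, the list of all nodes is passed in by hand (standing in for what object tracking would supply), and the null guards are an added assumption so that two leaves with empty children are not reported as aliased.

```java
import java.util.Arrays;
import java.util.List;

class AliasCheck {
    // Hypothetical minimal tree node with the two fields the query inspects.
    static class BinTree {
        BinTree left, right;
    }

    // Returns true if some node is the child of more than one parent slot.
    // 'all' must contain every BinTree we want checked.
    static boolean hasAliasing(List<BinTree> all) {
        for (BinTree a : all) {
            for (BinTree b : all) {
                boolean sameLeft = a != b && a.left != null && a.left == b.left;
                boolean sameRight = a != b && a.right != null && a.right == b.right;
                // No a != b guard here: one node whose left and right
                // are the same object is aliasing too.
                boolean crossed = a.left != null && a.left == b.right;
                if (sameLeft || sameRight || crossed) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BinTree root = new BinTree(), l = new BinTree(), r = new BinTree();
        root.left = l;
        root.right = r;
        System.out.println("aliased before: " + hasAliasing(Arrays.asList(root, l, r)));
        r.left = l; // l now has two parents
        System.out.println("aliased after: " + hasAliasing(Arrays.asList(root, l, r)));
    }
}
```

Having to build and maintain that list of nodes by hand is the hassle the notes mention.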

Object Tracking
    public class GroupChat{
        private List participants;
        public GroupChat(){
            participants = new ArrayList();
            ...
        }
        public void addChatter(String name){
            participants.add(new Session(name));
        }
    }
So here’s the object tracker, in brief. It’s just an aspect that hangs around the program and intercepts calls to new. It then makes a WEAK reference to each object that is created and keeps it in its own store. We use WEAK references, so garbage collection is not interfered with.
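The weak-reference idea can be sketched without any aspect machinery. The ObjectTracker class below is hypothetical and is not JQL's tracker: instead of intercepting calls to new, it relies on the caller passing each new object to track() by hand.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical hand-rolled tracker: remembers objects without keeping them alive.
class ObjectTracker<T> {
    private final List<WeakReference<T>> refs = new ArrayList<>();

    // Registers obj and returns it, so creation can be wrapped: track(new Foo()).
    T track(T obj) {
        refs.add(new WeakReference<>(obj));
        return obj;
    }

    // Returns the tracked objects that are still reachable,
    // dropping entries whose referent has been garbage collected.
    List<T> live() {
        List<T> out = new ArrayList<>();
        Iterator<WeakReference<T>> it = refs.iterator();
        while (it.hasNext()) {
            T obj = it.next().get();
            if (obj == null) {
                it.remove();
            } else {
                out.add(obj);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ObjectTracker<StringBuilder> tracker = new ObjectTracker<>();
        StringBuilder kept = tracker.track(new StringBuilder("kept"));
        tracker.track(new StringBuilder("dropped")); // no strong reference is held
        System.out.println("live objects: " + tracker.live().size());
        System.out.println("still tracking: " + kept);
    }
}
```

Because the tracker only holds weak references, the second StringBuilder may or may not still be listed when live() runs; that depends on when the garbage collector gets to it.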

Expression Ordering
Queries get broken down along &&s.
    w.ID == 5 && w.name == s.name
Becomes two subqueries:
    w.ID == 5
    w.name == s.name
JQL arranges these into a query pipeline.
For efficient execution of queries it’s important to break the query down into subqueries and tackle these piecemeal. We do this simply by splitting them along the conjunctions. So this query becomes two smaller queries, and JQL orders these in a ‘query pipeline’ in the evaluator.

A Query Pipeline
[Figure: a two-stage pipeline for the query w.ID == 5 && w.name == s.name. Stage one reads Windows and applies w.ID == 5; its temporary results feed stage two, which combines them with Sessions using w.name == s.name.]
Here’s a query pipeline for the query w.ID == 5 && w.name == s.name. It’s in two stages, separated with vertical dashed lines. The first stage is simple: you can see it’s taking input from the set ‘Windows’ and evaluating the expression “w.ID == 5” for those objects. It passes these results along to stage two, which takes the results from the previous stage and mixes them with the objects from the Sessions set.
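Written out by hand, the two stages might look something like this. It is a sketch of the pipeline's shape only, not of JQL's evaluator: the Session and Window classes are hypothetical stand-ins, stage two uses a plain nested loop, and names are compared with equals() rather than the == shown on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QueryPipeline {
    // Hypothetical stand-ins for the Session and Window classes.
    static class Session {
        final String name;
        Session(String name) { this.name = name; }
    }

    static class Window {
        final String name;
        final int id;
        Window(String name, int id) { this.name = name; this.id = id; }
    }

    static List<Object[]> run(List<Session> sessions, List<Window> windows) {
        // Stage 1: w.ID == 5, reading only from the windows set.
        List<Window> temp = new ArrayList<>();
        for (Window w : windows)
            if (w.id == 5)
                temp.add(w);

        // Stage 2: w.name == s.name, combining stage 1's survivors with sessions.
        List<Object[]> results = new ArrayList<>();
        for (Window w : temp)
            for (Session s : sessions)
                if (w.name.equals(s.name))
                    results.add(new Object[] { w, s });
        return results;
    }

    public static void main(String[] args) {
        List<Session> sessions = Arrays.asList(new Session("alice"), new Session("bob"));
        List<Window> windows = Arrays.asList(new Window("bob", 5), new Window("bob", 6));
        System.out.println(run(sessions, windows).size() + " result(s)");
    }
}
```

Running the cheap, selective filter first means the join in stage two sees far fewer windows, which is the point of ordering the pipeline.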

Ordering the Pipeline
There are two factors to consider:
Cost
 - Very dependent on input sizes
 - Input sizes are dependent on selectivity…
Selectivity
 - The proportion of results rejected by the stage
 - We estimate it based on the expression used: ==, < >, !=, aMethod()
 - Or, we can test it with sampling
Now, the ordering of this pipeline is extremely important for an optimal query execution. We have to consider two main factors for ordering the pipeline: the cost of each join, and the selectivity of each join. The cost is dependent on the input of the join, while the selectivity depends on the expression for the join. The selectivity of a join influences the cost of joins further down the pipeline, so finding the optimal ordering is non-trivial. We can’t know the selectivity without executing the join, but we need it, so we estimate it. We have a heuristic: we use relative values for the expressions in the join. You can see here, from most to least selective, we have equals, comparisons, not equals, and arbitrary method calls. Alternatively, for large enough queries we have the option of using sampling, where we evaluate the join for 10 or so objects randomly chosen, and estimate the selectivity from that sample. This gives surprisingly useful results.
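The sampling idea can be sketched in a few lines. This is an illustration under assumptions, not JQL's sampler: the SelectivitySampler class and its estimate() method are hypothetical, and it samples random pairs drawn with replacement from the two input lists.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.BiPredicate;

class SelectivitySampler {
    // Estimates selectivity as the fraction of sampled pairs the join rejects.
    // Returns 0.0 when there is nothing to sample.
    static <A, B> double estimate(List<A> as, List<B> bs,
                                  BiPredicate<A, B> join,
                                  int samples, Random rnd) {
        if (as.isEmpty() || bs.isEmpty() || samples <= 0) return 0.0;
        int rejected = 0;
        for (int i = 0; i < samples; i++) {
            A a = as.get(rnd.nextInt(as.size()));
            B b = bs.get(rnd.nextInt(bs.size()));
            if (!join.test(a, b)) rejected++;
        }
        return (double) rejected / samples;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        List<Integer> ys = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        Random rnd = new Random();
        // Equality should reject most pairs; inequality should reject few.
        System.out.println("x == y: " + estimate(xs, ys, (x, y) -> x.equals(y), 10, rnd));
        System.out.println("x != y: " + estimate(xs, ys, (x, y) -> !x.equals(y), 10, rnd));
    }
}
```

With only 10 or so samples the estimate is noisy, but for ordering a pipeline it only has to rank the stages relative to each other.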

Configuring Join Order
Two strategies to decide Join Order:
Exhaustive Search
 - Good results but expensive
 - Requires searching n! possible orderings
Maximum Selectivity Heuristic
 - Less optimal order, less overhead.
So we use two strategies to determine the join ordering, once we have costs and selectivities for all the joins in the query. First, the exhaustive search: this goes through all the possible orderings and figures out the total cost for the whole query. This gives very good results, but it’s expensive; there are n factorial possible orderings, after all. The second strategy is the maximum selectivity heuristic: it simply does the most selective joins first. This is not necessarily the best ordering, but it’s usually pretty good, and for long pipelines and small queries the time taken to find the optimal ordering could be excessive. So, with all this guff about performance, how do we actually do?
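Both strategies can be sketched side by side. Everything here is an assumption made for illustration: the Stage class, the selectivity numbers, and especially the toy cost model, in which each stage pays one unit per candidate it receives and passes on the fraction it does not reject. JQL's real cost model is not described in the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class JoinOrdering {
    // A pipeline stage with a hypothetical estimated selectivity
    // (the fraction of candidates it rejects).
    static class Stage {
        final String expr;
        final double selectivity;
        Stage(String expr, double selectivity) {
            this.expr = expr;
            this.selectivity = selectivity;
        }
    }

    // Assumed toy cost model: one unit of work per candidate entering a stage.
    static double cost(List<Stage> order, double inputSize) {
        double total = 0, size = inputSize;
        for (Stage s : order) {
            total += size;
            size *= (1.0 - s.selectivity);
        }
        return total;
    }

    // Maximum selectivity heuristic: most selective stage first.
    static List<Stage> maxSelectivity(List<Stage> stages) {
        List<Stage> order = new ArrayList<>(stages);
        order.sort(Comparator.comparingDouble((Stage s) -> s.selectivity).reversed());
        return order;
    }

    // Exhaustive search: try all n! orderings and keep the cheapest.
    static List<Stage> exhaustive(List<Stage> stages, double inputSize) {
        List<Stage> best = new ArrayList<>(stages);
        permute(new ArrayList<>(stages), 0, inputSize, best);
        return best;
    }

    private static void permute(List<Stage> work, int k, double inputSize, List<Stage> best) {
        if (k == work.size()) {
            if (cost(work, inputSize) < cost(best, inputSize)) {
                best.clear();
                best.addAll(work);
            }
            return;
        }
        for (int i = k; i < work.size(); i++) {
            Collections.swap(work, k, i);
            permute(work, k + 1, inputSize, best);
            Collections.swap(work, k, i); // undo before trying the next choice
        }
    }

    public static void main(String[] args) {
        List<Stage> stages = Arrays.asList(
                new Stage("b != c", 0.1), new Stage("a == b", 0.9), new Stage("c < d", 0.5));
        for (Stage s : maxSelectivity(stages)) System.out.println("heuristic: " + s.expr);
        for (Stage s : exhaustive(stages, 100)) System.out.println("exhaustive: " + s.expr);
    }
}
```

Under this uniform cost model the two strategies pick the same order. They can diverge once stages differ in per-candidate cost, for example a hash join versus a nested loop, and that is when the exhaustive search earns its overhead.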

Querying Performance
Four versions of:
    selectAll(Integer a=as, Integer b=bs, Integer c=cs, Integer d=ds : a == b && b != c && c < d);
A benchmark! This is an arbitrary little 3-stage query on 4 sets of Integer objects. The four lines correspond to 4 implementations of the query expression on the right (read it out). Two of these lines are JQL, and two are simulated ‘hand coded’ implementations. First, the black line here represents ‘handpoor’, a poorly implemented hand-coded solution. This is just four nested loops with some short circuiting, so it’s pretty much n to the four. The orange line down the bottom here is the other hand-coded version. This is ‘handopt’, an optimised version that uses the optimal join ordering, a sorted join and a hash join, and achieves the very best possible result. As you can see it does MUCH better than handpoor. Now, the blue and the green lines here are JQL doing this query, using the two query ordering strategies from before. The green is the exhaustive, the blue is the maximum selectivity. Now, they’re very close. It seems the exhaustive search has found a better ordering than the maximum selectivity heuristic, so in this case the overhead has paid off. Now, we think that this performance is pretty good: the optimal version is doing better, but JQL is far, far better than the handpoor implementation.

HANDOPT Implementation
    HashMap<Integer,ArrayList<Integer>> map;
    map = new HashMap<Integer,ArrayList<Integer>>();
    for(Integer i1 : array1) {
        ArrayList<Integer> grp = map.get(i1);
        if(grp == null) {
            grp = new ArrayList<Integer>();
            map.put(i1,grp);
        }
        grp.add(i1);
    }
    ArrayList<Object[]> matches = new ArrayList<Object[]>();
    Collections.sort(array4);
    for(Integer i2 : array2) {
        int b=i2;
        ArrayList<Integer> grp = map.get(i2);
        if(grp != null) {
            for(Integer i1 : grp) {
                int a=i1;
                for(Integer i3 : array3) {
                    int c=i3;
                    if(b != c) {
                        for(int x=array4.size();x!=0;x=x-1){
                            int d=array4.get(x-1);
                            if(c<d){
                                Object[] t = new Object[4];
                                t[0]=a; t[1]=b; t[2]=c; t[3]=d;
                                matches.add(t);
                            } else { break; }
                        }
                    }
                }
            }
        }
    }
    return matches;
So what we’ve got here is the implementation of this HANDOPT benchmark. Don’t worry about reading all this; it’s up here merely to show just how much you need to type to get this kind of performance. And, to compare:

Performance Discussion selectAll(Integer a=as, Integer b=bs, Integer c=cs, Integer d=ds : a==b && b!=c && c < d); JQL's performance is pretty good! The average programmer is more likely to code HANDPOOR than HANDOPT.

Object Tracker Performance Object tracker overhead varies greatly.

Other Querying Systems
Cω/LINQ (C#)
 - Provide similar querying capabilities.
Query-Based Debuggers
 - QBD, DQBD by Lencevicius et al.
 - Used querying and object tracking for debugging.
Relational Databases
 - Much prior work in query optimisation.

In Development/Consideration
Caching
 - Both simple, and incrementalised
Other join methods
 - Again, from databases: Merge, Sort, etc.
Tool Support
 - Eclipse/Netbeans/etc.

Conclusions
Queries are neat
 - They are a powerful, simple, useful abstraction.
 - JQL provides efficient and easy support for querying in Java.
For more information on JQL: www.mcs.vuw.ac.nz/~darren/jql/