Infovis using VTK and Qt
  DOECGF 2005
  Andy Wilson
  Sandia National Laboratories
  Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Outline
  Driving Problems
  Why VTK?
  Adapting VTK for Infovis
  Applications: CallView
  Lessons Learned
  What’s next?
Driving Problems
  Customers are asking for tools that are more about infovis than traditional scivis.
    Map concepts against one another and display as a graph
      Patent database
      LDRD calls and ideas
    Social and semantic networks
      Who knew whom, when?
  We also want the ability to annotate data inside Paraview.
    Pass along annotations like any other attribute
Okay, how do we do that?
  Possible options:
    Adopt an existing toolkit.
    Write our own.
    Adapt something we already have.
Option 1: Adopt an existing toolkit
  Plenty of candidates
    Several toolkits presented at a seminar before Vis last year
  Don’t need to reinvent the wheel
  Java toolkits make portability easier
  Not a panacea…
    Licensing issues?
      If proprietary, how much does it cost?
      If open, which license does it use?
    We may still have to re-invent other components
    How does the toolkit work with Paraview, VTK, others?
Option 2: Write our own
  Toolkit will do exactly what we want
    Play nicely with VTK, Paraview, perhaps even Ensight…
  No license problems
  Large initial investment
  Duplication of effort
  Difficult to collaborate with others
Option 3: Modify existing code
  Some initial investment
    Reinvent the wheel but not the whole car
  Toolkit will do most of what we want
    May not be a perfect fit
  Take advantage of existing code
    There is already a skilled developer community
    Portability may come nearly for free
  Contribute back to the community
  Potential license problems
Adapting VTK for Infovis
  Why VTK?
    We already know how to use it
    Widely deployed within Sandia via Paraview
    Good working relationship with Kitware
    We already have permission to release code as open source within VTK
    Portability
    Multiple language bindings
  There may be other toolkits with most of these advantages…
Infrastructure Problems
  String support
    Add string as a new data type in VTK arrays
    VTK assumes that array elements are numbers
    Major problem: Strings are variable length!
  I/O
    Need a robust way to read and write string data
  Filtering
    What should happen when a numeric filter encounters string data?
  Backward compatibility is critical!
Problem 1: String data type
  We need to be able to treat a string as a POD type.
    std::string is close enough
  Bigger problem: vtkDataArray and its subclasses assume that their contents are numeric.
    Tried subclassing. It didn’t work.
  Solution: Change the class hierarchy.
    Factor out the numeric assumption
  Backward compatibility is critical!
Old VTK data array hierarchy
  vtkObject
    vtkDataArray
      vtkBitArray
      vtkIntArray
      vtkShortArray
      vtkFloatArray
      vtkCharArray
      vtkLongArray
New VTK data array hierarchy
  vtkObject
    vtkAbstractArray
      vtkDataArray
        vtkBitArray
        etc…
      vtkStringArray
      Other arrays…
  Backward compatibility is critical!
  What we’ve done here is to factor out the array-management aspects of vtkDataArray into a new class, vtkAbstractArray. The interfaces specific to numeric data types remain in vtkDataArray, leaving us free to define arrays of other data types. We’ve also changed a few of the fundamental containers that hold collections of these arrays to use abstract arrays instead of data arrays. We did this specifically so that we could handle strings, but it’s already having side effects: a few other classes in VTK like vtkCellArray are going to become much cleaner under this new hierarchy.
  The best part about this is that the interface to vtkDataArray doesn’t break at all. It gains one or two new methods that are perfectly safe for users and developers to ignore unless they want to use the new array types.
Changes to VTK I/O
  The elements of a vtkStringArray do not all have the same size.
    The components of each element do, though!
  Flatten each string array into two separate arrays
    One long block of characters
    An array of offsets
  Use standard VTK I/O code to read and write those arrays
  Since we’re not tying the I/O code to the standard 8-bit character, this leaves the door open for international character sets…
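The flattening step above can be sketched in a few lines. This is a minimal illustration in Python, not the VTK implementation: the function names `flatten_strings` and `unflatten_strings`, the UTF-8 encoding, and the convention of storing N+1 offsets for N strings are all assumptions made for the example.

```python
from typing import List, Tuple


def flatten_strings(strings: List[str], encoding: str = "utf-8") -> Tuple[bytes, List[int]]:
    """Pack variable-length strings into one byte block plus an offsets array.

    Assumed convention: offsets has len(strings) + 1 entries, and string i
    occupies block[offsets[i]:offsets[i + 1]].
    """
    block = bytearray()
    offsets = [0]
    for s in strings:
        block.extend(s.encode(encoding))
        offsets.append(len(block))  # end of this string = start of the next
    return bytes(block), offsets


def unflatten_strings(block: bytes, offsets: List[int], encoding: str = "utf-8") -> List[str]:
    """Rebuild the original list of strings from the two flat arrays."""
    return [
        block[offsets[i]:offsets[i + 1]].decode(encoding)
        for i in range(len(offsets) - 1)
    ]


if __name__ == "__main__":
    block, offsets = flatten_strings(["vtk", "", "infovis"])
    print(block, offsets)
    print(unflatten_strings(block, offsets))
```

Both flat arrays have fixed-size elements, so any reader or writer that already handles numeric arrays can store them unchanged. Offsets count encoded bytes rather than characters, which is what lets multi-byte character sets round-trip.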
Changes to VTK Filtering
  Add a new IsNumeric() method to vtkAbstractArray
  Filters that only copy and select data remain unchanged
  Filters that work with a specific data type must check for that data type
    That’s the way it is already!
  Since we’re adding a new data type that doesn’t fit some of the old assumptions, we have to expose a way for people to check for it.
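The filtering rule above can be illustrated with a toy model. The classes below are hypothetical Python stand-ins for vtkAbstractArray, vtkDataArray, and vtkStringArray, and `select_filter` and `scale_filter` are invented names; only the idea of an IsNumeric-style check comes from the slide.

```python
class AbstractArray:
    """Stand-in for the array-management layer: a name plus a list of values."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def is_numeric(self):
        raise NotImplementedError


class DataArray(AbstractArray):
    """Stand-in for numeric arrays."""

    def is_numeric(self):
        return True


class StringArray(AbstractArray):
    """Stand-in for string arrays."""

    def is_numeric(self):
        return False


def select_filter(array, indices):
    """Copy/select filter: never inspects the values, so it works on any array type."""
    return type(array)(array.name, [array.values[i] for i in indices])


def scale_filter(array, factor):
    """Numeric filter: must check the data type before doing arithmetic."""
    if not array.is_numeric():
        raise TypeError(f"array '{array.name}' is not numeric; cannot scale it")
    return DataArray(array.name, [v * factor for v in array.values])


if __name__ == "__main__":
    labels = StringArray("label", ["a", "b", "c"])
    print(select_filter(labels, [2, 0]).values)
    print(scale_filter(DataArray("x", [1, 2]), 2).values)
```

In this sketch the selection filter needs no changes to carry strings, while the numeric filter fails loudly instead of silently misinterpreting string data.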
Okay, now what?
  This relatively minor change lets us pass string data through the VTK pipeline just like any other data type.
  That was the only real obstacle to using VTK for infovis.
  Now, on to what users really care about: the applications.
CallView: Who is the customer?
  Customer: Sandia LDRD (Lab Directed Research & Development) program office
  Proposal submission season every March
  About 15 broad research categories
  About 100 specific thrusts (topics of interest)
  Typical submission season gets ~1200 short proposals
CallView: What is the problem?
  The people submitting ideas for research projects don’t know what area & thrust to submit to.
  The people in the LDRD office want to explore the submissions.
    How well do the ideas match the areas & thrusts?
    Are ideas being submitted to the appropriate areas?
  Infovis to the rescue!
CallView: Server function
  Store a set of documents in a local database
    Initially, thrust text only
    Ideas as well (after submission season closes)
  Compute similarities between all pairs of documents
  Do force-directed graph layout using the results
  Also allow clients to compare arbitrary chunks of text against all documents in a set
Use case #1: I have an idea!
  Researcher has an idea for an LDRD proposal.
  Send idea text to CallView as part of submission process.
  Results include a list of the top 5 most similar thrusts.
  Researcher is not forced to submit to those: they are only a suggestion.
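The ranking step in this use case might look like the sketch below. The deck does not describe how the text analysis component scores similarity, so bag-of-words cosine similarity is used purely as a stand-in; the function names and the sample thrust texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def top_similar_thrusts(idea_text, thrusts, k=5):
    """Return up to k (thrust_name, score) pairs, most similar first.

    thrusts maps a thrust name to its descriptive text. Ties are broken by name
    so the ordering is deterministic.
    """
    idea = term_counts(idea_text)
    scored = [(cosine_similarity(idea, term_counts(text)), name) for name, text in thrusts.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


if __name__ == "__main__":
    thrusts = {
        "graphs": "graph layout and network visualization",
        "materials": "novel alloys and ceramics",
        "text": "text analysis of documents",
    }
    for name, score in top_similar_thrusts("visualization of a social network graph", thrusts, k=2):
        print(name, round(score, 3))
```

The result is only a ranked suggestion list, matching the slide: nothing in the sketch forces the idea into the top-ranked thrust.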
Use case #1: Results page
Use case #2: Analyzing all idea submissions
  Investment area managers want to see how the ideas map to the areas and thrusts.
    Construct a document set containing all ideas and thrusts for a given year
    Build a semantic graph using the similarities between all pairs within that set
    Allow users to explore that graph
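As a rough sketch of the graph-building step, the code below scores every pair of documents and keeps the pairs above a cutoff as weighted edges. The Jaccard word-overlap measure, the `min_weight` cutoff, and the function names are assumptions for illustration; the deck does not say which similarity measure or threshold CallView uses.

```python
from itertools import combinations


def jaccard(text_a, text_b):
    """Word-set overlap between two texts; a simple stand-in similarity measure."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def build_similarity_graph(documents, similarity=jaccard, min_weight=0.2):
    """Return weighted edges (id_a, id_b, weight) for all sufficiently similar pairs.

    documents maps a document id (idea or thrust) to its text. Every unordered
    pair is scored once, so the cost grows quadratically with the set size.
    """
    edges = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(documents.items()), 2):
        weight = similarity(text_a, text_b)
        if weight >= min_weight:
            edges.append((id_a, id_b, weight))
    return edges


if __name__ == "__main__":
    docs = {
        "idea1": "graph layout for networks",
        "thrust1": "graph layout algorithms",
        "thrust2": "ceramic materials",
    }
    print(build_similarity_graph(docs))
```

An edge list of this shape is the kind of input a force-directed layout would consume, with the weights pulling similar documents closer together.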
Use case #2: Front end
Use case #3: Analyzing some submissions
  LDRD area managers want to see the ideas most relevant to their investment area
    Construct a document set with all the ideas
    For each thrust within an investment area, find all the ideas similar to that thrust
    Sort the results by similarity
    Flag those ideas that are very similar to a thrust but were not submitted to it.
  How can they improve the process so that this doesn’t happen as frequently?
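The flagging step can be sketched as a filter over precomputed similarity scores. The data layout, the function name `flag_misrouted`, and the 0.8 cutoff for "very similar" are all hypothetical; the deck gives no threshold.

```python
def flag_misrouted(similarity, submitted_to, threshold=0.8):
    """Find ideas that score highly against a thrust they were not submitted to.

    similarity:   {(idea_id, thrust_id): score}, assumed to be computed elsewhere
    submitted_to: {idea_id: thrust_id} recording where each idea was actually sent
    Returns (idea_id, thrust_id, score) tuples sorted by descending score.
    """
    flagged = [
        (idea, thrust, score)
        for (idea, thrust), score in similarity.items()
        if score >= threshold and submitted_to.get(idea) != thrust
    ]
    flagged.sort(key=lambda row: (-row[2], row[0], row[1]))
    return flagged


if __name__ == "__main__":
    similarity = {
        ("idea1", "T1"): 0.90,
        ("idea1", "T2"): 0.85,
        ("idea2", "T1"): 0.30,
        ("idea3", "T2"): 0.95,
    }
    submitted_to = {"idea1": "T1", "idea2": "T1", "idea3": "T1"}
    for row in flag_misrouted(similarity, submitted_to):
        print(row)
```

Here idea3 is flagged because it closely matches T2 but went to T1, and idea1 is flagged for T2 even though it also matches the thrust it was sent to.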
CallView: System overview
  Clients (web pages, interactive viewer)
  XML over TCP, HTTP
  CallView
    CVServer (VTK + Qt)
    VxOrd (Graph layout)
    STANLEY (Text analysis)
    SQLite (Documents)
CallView: Results
  CallView integrated into the standard LDRD idea submission process
    Server handled about 1000 requests over the 2-week submission period
    Positive reactions from LDRD staff, VPs, and end users
  Development continues on use cases #2 and #3
    Investment area manager tools should be online right now
CallView: Lessons learned
  Neat graphical interface isn’t always the right thing
    Convey the information necessary in the simplest format sufficient to users’ needs
    Sometimes a web page is all you need
  Developing distributed systems is hard!
    Robust network code is hard to write and debug
    Following a specification incorrectly can be worse than not following it at all
  There is no substitute for testing on real data.
What’s next?
  We have enough infrastructure to build general information visualization tools
    Many, many applications: homeland security, intelligence, business planning, patent & publication analysis…
  String support in VTK is general enough to handle data annotation as well as string attributes
Where do I get it?
  String support is currently in a branch of the VTK CVS repository.
    With luck, it will be released with VTK 5.0.
    Without luck, it will be in VTK 5.1.
  If you want to play with it before then…
    http://www.vtk.org/get-software.php#cvs
    CVS branch: VTK-Sandia-InfoViz
    If it breaks, you get to keep both pieces
Thanks to…
  Will Schroeder, Kitware
  Nabeel Rahal, Keith Ortiz, Brian Wylie, Hank Westrich, and Travis Bauer, Sandia
  Desert Sky Software