Infovis using VTK and Qt
  DOECGF 2005
  Andy Wilson
  Sandia National Laboratories
  Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Outline
  Driving Problems
  Why VTK?
  Adapting VTK for Infovis
  Applications: CallView
  Lessons Learned
  What’s next?
Driving Problems
  Customers are asking for tools that are more about infovis than traditional scivis.
    –Map concepts against one another and display as a graph
      Patent database
      LDRD calls and ideas
    –Social and semantic networks
      Who knew whom, when?
  We also want the ability to annotate data inside Paraview.
    –Pass along annotations like any other attribute
Okay, how do we do that?
  Possible options:
    –Adopt an existing toolkit.
    –Write our own.
    –Adapt something we already have.
Option 1: Adopt an existing toolkit
  Plenty of candidates
    –Several toolkits presented at a seminar before Vis last year
  Don’t need to reinvent the wheel
  Java toolkits make portability easier
    –Not a panacea…
  Licensing issues?
    –If proprietary, how much does it cost?
    –If open, which license does it use?
  We may still have to re-invent other components
    –How does the toolkit work with Paraview, VTK, others?
Option 2: Write our own
  Toolkit will do exactly what we want
    –Play nicely with VTK, Paraview, perhaps even Ensight…
  No license problems
  Large initial investment
  Duplication of effort
  Difficult to collaborate with others
Option 3: Modify existing code
  Some initial investment
  Reinvent the wheel but not the whole car
  Toolkit will do most of what we want
  May not be a perfect fit
  Take advantage of existing code
  There is already a skilled developer community
  Portability may come nearly for free
  Contribute back to the community
  Potential license problems
Adapting VTK for Infovis
  Why VTK?
    –We already know how to use it
    –Widely deployed within Sandia via Paraview
    –Good working relationship with Kitware
    –We already have permission to release code as open source within VTK
    –Portability
    –Multiple language bindings
  There may be other toolkits with most of these advantages…
Infrastructure Problems
  String support
    –Add string as a new data type in VTK arrays
    –VTK assumes that array elements are numbers
    –Major problem: Strings are variable length!
  I/O
    –Need a robust way to read and write string data
  Filtering
    –What should happen when a numeric filter encounters string data?
  Backward compatibility is critical!
Problem 1: String data type
  We need to be able to treat a string as a POD type.
    –std::string is close enough
  Bigger problem: vtkDataArray and its subclasses assume that their contents are numeric.
    –Tried subclassing. It didn’t work.
    –Solution: Change the class hierarchy.
      Factor out the numeric assumption
  Backward compatibility is critical!
Old VTK data array hierarchy
  vtkObject
    vtkDataArray
      vtkBitArray
      vtkCharArray
      vtkShortArray
      vtkLongArray
      vtkFloatArray
      vtkIntArray
New VTK data array hierarchy
  vtkObject
    vtkAbstractArray
      vtkDataArray
        vtkBitArray etc…
      vtkStringArray
      Other arrays…
  Backward compatibility is critical!
Changes to VTK I/O
  The elements of a vtkStringArray do not all have the same size.
    –The components of each element do, though!
  Flatten each string array into two separate arrays
    –One long block of characters
    –An array of offsets
  Use standard VTK I/O code to read and write those arrays
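The flattening idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the VTK implementation: the function names `flatten_strings` and `unflatten_strings`, the UTF-8 encoding, and the convention of storing one more offset than there are strings are all assumptions made for the example.

```python
from typing import List, Tuple


def flatten_strings(strings: List[str]) -> Tuple[bytes, List[int]]:
    """Pack variable-length strings into one byte block plus an offsets array.

    Assumed convention: offsets has len(strings) + 1 entries, and string i
    occupies chars[offsets[i]:offsets[i + 1]].
    """
    chunks = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for chunk in chunks:
        # Each offset marks where the next string starts in the block.
        offsets.append(offsets[-1] + len(chunk))
    return b"".join(chunks), offsets


def unflatten_strings(chars: bytes, offsets: List[int]) -> List[str]:
    """Rebuild the original strings from the two fixed-width arrays."""
    return [
        chars[offsets[i]:offsets[i + 1]].decode("utf-8")
        for i in range(len(offsets) - 1)
    ]
```

Both outputs are arrays of fixed-size elements (bytes and integers), which is why ordinary numeric array I/O code could be reused to store them.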
Changes to VTK Filtering
  Add a new IsNumeric() method to vtkAbstractArray
  Filters that only copy and select data remain unchanged
  Filters that work with a specific data type must check for that data type
    –That’s the way it is already!
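A rough Python sketch of how such a check separates the two kinds of filters. The classes below are hypothetical stand-ins that only mirror the shape of the hierarchy (an abstract base with numeric and string subclasses); they are not the VTK classes, and `select_filter` and `scale_filter` are invented for illustration.

```python
class AbstractArray:
    """Hypothetical stand-in for an array base class with no numeric assumption."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def is_numeric(self):
        raise NotImplementedError


class DataArray(AbstractArray):
    """Stand-in for the numeric branch of the hierarchy."""

    def is_numeric(self):
        return True


class StringArray(AbstractArray):
    """Stand-in for the string branch of the hierarchy."""

    def is_numeric(self):
        return False


def select_filter(array, indices):
    # Copy/select filters never inspect the values, so they need no type check.
    return type(array)(array.name, [array.values[i] for i in indices])


def scale_filter(array, factor):
    # A filter that does arithmetic must verify the data type first.
    if not array.is_numeric():
        raise TypeError(f"array '{array.name}' is not numeric")
    return DataArray(array.name, [v * factor for v in array.values])
```

In this sketch a string array passes through `select_filter` untouched, while `scale_filter` rejects it up front rather than failing partway through the computation.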
Okay, now what?
  This relatively minor change lets us pass string data through the VTK pipeline just like any other data type.
  That was the only real obstacle to using VTK for infovis.
  Now, on to what users really care about: the applications.
CallView: Who is the customer?
  Customer: Sandia LDRD (Lab Directed Research & Development) program office
  Proposal submission season every March
    –About 15 broad research categories
    –About 100 specific thrusts (topics of interest)
    –Typical submission season gets ~1200 short proposals
CallView: What is the problem?
  The people submitting ideas for research projects don’t know what area & thrust to submit to.
  The people in the LDRD office want to explore the submissions.
    –How well do the ideas match the areas & thrusts?
    –Are ideas being submitted to the appropriate areas?
  Infovis to the rescue!
CallView: Server function
  Store a set of documents in a local database
    –Initially, thrust text only
    –Ideas as well (after submission season closes)
  Compute similarities between all pairs of documents
    –Do force-directed graph layout using the results
    –Also allow clients to compare arbitrary chunks of text against all documents in a set
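The slides do not say how similarity is computed, so the sketch below swaps in a simple, well-known measure: cosine similarity over word counts. It is meant only to show the two operations named above, scoring all pairs in a document set and comparing an arbitrary chunk of text against every document. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def term_counts(text):
    """Bag-of-words vector: lowercase, whitespace-split word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pairwise_similarities(docs):
    """Score every unordered pair in docs (a dict of name -> text)."""
    vecs = {name: term_counts(text) for name, text in docs.items()}
    return {
        (a, b): cosine(vecs[a], vecs[b]) for a, b in combinations(sorted(vecs), 2)
    }


def most_similar(query, docs, k=5):
    """Compare arbitrary text against all documents; return the top k names."""
    q = term_counts(query)
    scored = [(cosine(q, term_counts(text)), name) for name, text in docs.items()]
    # Highest score first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]
```

The pairwise scores are the kind of edge weights a force-directed layout would consume, and the default `k=5` mirrors the "top 5 most similar thrusts" list in use case #1.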
Use case #1: I have an idea!
  Researcher has an idea for an LDRD proposal.
  Send idea text to CallView as part of submission process.
  Results include a list of the top 5 most similar thrusts.
  Researcher is not forced to submit to those: they are only a suggestion.
Use case #1: Results page
Use case #2: Analyzing all idea submissions
  Investment area managers want to see how the ideas map to the areas and thrusts.
  Construct a document set containing all ideas and thrusts for a given year
  Build a semantic graph using the similarities between all pairs within that set
  Allow users to explore that graph
Use case #2: Front end
Use case #3: Analyzing some submissions
  LDRD area managers want to see the ideas most relevant to their investment area
  Construct a document set with all the ideas
  For each thrust within an investment area, find all the ideas similar to that thrust
  Sort the results by similarity
  Flag those ideas that are very similar to a thrust but were not submitted to it.
    –How can they improve the process so that this doesn’t happen as frequently?
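The steps in this use case can be sketched as a small report function. Everything here is an assumption made for illustration: the `Match` record, the `review_area` name, the precomputed `similarity` table (thrust -> idea -> score), and the two cutoffs `min_similarity` and `flag_threshold`, whose values the slides do not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Match:
    idea: str
    thrust: str
    similarity: float
    misfiled: bool  # very similar to this thrust, but submitted elsewhere


def review_area(
    area_thrusts: List[str],
    similarity: Dict[str, Dict[str, float]],
    submitted_to: Dict[str, str],
    min_similarity: float = 0.1,
    flag_threshold: float = 0.8,
) -> List[Match]:
    """List ideas similar to each thrust in an area, most similar first."""
    matches = []
    for thrust in area_thrusts:
        for idea, score in similarity.get(thrust, {}).items():
            if score < min_similarity:
                continue  # not similar enough to show at all
            misfiled = score >= flag_threshold and submitted_to[idea] != thrust
            matches.append(Match(idea, thrust, score, misfiled))
    # Sort the combined results by similarity, highest first.
    matches.sort(key=lambda m: m.similarity, reverse=True)
    return matches
```

Flagged rows are the ones an area manager would want to look at first, since they point to ideas that may have been submitted to the wrong thrust.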
CallView: System overview
  [Diagram: CallView components]
  CVServer (VTK + Qt)
  STANLEY (Text analysis)
  VxOrd (Graph layout)
  SQLite (Documents)
  Clients (web pages, interactive viewer)
  Connection label: XML over TCP, HTTP
CallView: Results
  CallView integrated into the standard LDRD idea submission process
  Server handled about 1000 requests over the 2-week submission period
  Positive reactions from LDRD staff, VPs, and end users
  Development continues on use cases #2 and #3
    –Investment area manager tools should be online right now
CallView: Lessons learned
  Neat graphical interface isn’t always the right thing
    –Convey the information necessary in the simplest format sufficient to users’ needs
    –Sometimes a web page is all you need
  Developing distributed systems is hard!
    –Robust network code is hard to write and debug
    –Following a specification incorrectly can be worse than not following it at all
  There is no substitute for testing on real data.
What’s next?
  We have enough infrastructure to build general information visualization tools
    –Many, many applications: homeland security, intelligence, business planning, patent & publication analysis…
  String support in VTK is general enough to handle data annotation as well as string attributes
Where do I get it?
  String support is currently in a branch of the VTK CVS repository.
  With luck, it will be released with VTK 5.0.
  Without luck, it will be in VTK 5.1.
  If you want to play with it before then…
    –CVS branch: VTK-Sandia-InfoViz
    –If it breaks, you get to keep both pieces
Thanks to…
  Will Schroeder, Kitware
  Nabeel Rahal, Keith Ortiz, Brian Wylie, Hank Westrich, and Travis Bauer, Sandia
  Desert Sky Software