Recent Advances in ViPER
David Mihalcik, David Doermann, Charles Lin
What is ViPER? A tool for evaluating video understanding algorithms. Includes: – An Annotation Tool, for labeling ground truth and browsing results. – A Comparison Tool, for evaluating result data with respect to ground truth.
What is the Problem? Lots of people, here and elsewhere, are working on video processing algorithms for information extraction and related tasks. Evaluating the performance of these algorithms requires a lot of work, usually with tools developed by the algorithm designer. Is your solution any good? Prove it.
How to evaluate the algorithm? What is the problem? – ViPER focuses on evaluating solutions to detection and tracking problems; these determine whether, and where, some entity or event appears in the video. Evaluation: – Comparison of the result data set against a truth data set. – Truth, metrics, and rules for comparison are task dependent, as the sketch below illustrates.
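As a simplified illustration of what "comparison against a truth data set" can look like for a detection task, the sketch below computes frame-level precision and recall from two sets of frame numbers. The class name and the hard-coded frame sets are hypothetical, and this is not ViPER's actual metric code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: frame-level detection evaluation.
// "Truth" and "result" are just sets of frame numbers in which
// the target entity was (claimed to be) present.
public class FrameDetectionEval {
    public static void main(String[] args) {
        Set<Integer> truth  = Set.of(10, 11, 12, 13, 14, 20, 21);
        Set<Integer> result = Set.of(11, 12, 13, 14, 15, 22);

        // Frames reported by the algorithm that really contain the entity.
        Set<Integer> hits = new HashSet<>(result);
        hits.retainAll(truth);

        double precision = (double) hits.size() / result.size();
        double recall    = (double) hits.size() / truth.size();

        System.out.printf("precision = %.2f, recall = %.2f%n", precision, recall);
    }
}
```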
Goal of the ViPER Project To make evaluation of video algorithms simple, repeatable, and ubiquitous. As ground truth is required for evaluation, annotation must be made simple, as well. – Avoid tedium. – Avoid frustration. – Support expert usage.
Ground Truth Annotation ViPER-GT supports annotation of temporally qualified spatial and nominal data on video files and still images. Annotations range from simple per-frame or per-shot labels to detailed spatial markup. – You can quickly indicate which frames contain people. – Then, you can add how many people appear in each frame. – With a lot of time and money, you can put boxes around them.
ViPER-GT: Video Ground Truth Annotation Tool
For Example: Person Tracking How well does an algorithm find and track humans moving through a video? To evaluate detection: – Truth must indicate which frames contain the person. To evaluate tracking: – Truth must contain spatial information, indicating where a person may be found.
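For the tracking side, a common way to compare a result box against a truth box on the same frame is area overlap. The sketch below is a minimal, hypothetical example using intersection over union; it does not reproduce ViPER's actual spatial metrics.

```java
import java.awt.Rectangle;

// Hypothetical sketch: per-frame spatial comparison for tracking.
// A truth box and a result box on the same frame are scored by area
// overlap (intersection over union); the names are illustrative only.
public class BoxOverlap {
    static double iou(Rectangle truth, Rectangle result) {
        Rectangle inter = truth.intersection(result);
        if (inter.isEmpty()) {
            return 0.0;
        }
        double interArea = (double) inter.width * inter.height;
        double unionArea = (double) truth.width * truth.height
                         + (double) result.width * result.height
                         - interArea;
        return interArea / unionArea;
    }

    public static void main(String[] args) {
        Rectangle truth  = new Rectangle(100, 50, 40, 80);
        Rectangle result = new Rectangle(110, 60, 40, 80);
        System.out.printf("overlap = %.2f%n", iou(truth, result));
    }
}
```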
Example of Annotation: Person Detection
Example of Annotation: Person Tracking
The ViPER Data Model Similar to a relational database: – Tables are Descriptor Definitions. – Columns are Attributes. – Rows are Descriptor Instances. Most descriptors are OBJECT descriptors: – Attributes are temporally qualified. – Static OBJECTS have a frame range, but their attributes are not temporally qualified; this is useful for events, etc.
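The hypothetical classes below sketch the relational analogy described above; they are illustrative only and are not the real ViPER data model API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the relational analogy (not the real ViPER API).
// A DescriptorDefinition plays the role of a table, its attribute names
// are the columns, and each DescriptorInstance is a row whose attribute
// values may vary over the frame range (temporal qualification).
class DescriptorDefinition {
    final String name;                 // e.g. "Person"
    final List<String> attributes;     // e.g. ["Location", "Occluded"]

    DescriptorDefinition(String name, List<String> attributes) {
        this.name = name;
        this.attributes = attributes;
    }
}

class DescriptorInstance {
    final DescriptorDefinition definition;
    final int startFrame, endFrame;
    // attribute name -> (frame -> value); a static attribute would map
    // every frame in the range to the same value.
    final Map<String, Map<Integer, Object>> values = new TreeMap<>();

    DescriptorInstance(DescriptorDefinition def, int start, int end) {
        this.definition = def;
        this.startFrame = start;
        this.endFrame = end;
    }
}
```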
The ViPER File Format and API Uses XML; I won’t go into the details here. There is also a Java API.
Related Work VideoAnnEx OntoLog PhotoStuff Informedia
VideoAnnEx IBM’s MPEG-7 annotation tool. Cool Features: – Cut detection makes it easy to add per-shot markup. – Supports MPEG-7. Annoyances: – Not very good for spatial attributes. – Commercial software. – Not as extensible as ViPER.
OntoLog Jon Heggland’s Tool for Temporal Markup with Ontologies Advantages: – Good support for key bindings and playback. – Data model supports inheritance. Annoyances: – No spatial data support.
PhotoStuff MINDSWAP’s Tool for Adding Semantic Web Markup to Images Cool Features: – Semantic Markup! – Spatial Data! Annoyances: – Buggy and beta. – No support for video.
Informedia CMU’s tool for browsing video libraries Cool Features: – Advanced browsing functionality. But… – Focus on video library, not annotation. – Not available for download, from what I can tell. – Not terrifically extensible. See also: Silver and Malach
Extending the Interface ViPER provides a lot of functionality, but it is very general. It may be appropriate to extend ViPER-GT to better support markup for a particular type of annotation.
Example Extension: Adding Text Zones Adds a toolbar that allows typed bounding boxes. Instead of requiring the user to click Create for each box, a new box is created automatically.
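The sketch below illustrates the "auto-create" idea with a plain Swing mouse listener; the class and callback names are hypothetical and are not taken from the actual viper-gt extension code.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;

// Hypothetical sketch of the "auto-create" idea: dragging on the frame
// canvas creates a new typed box without a separate Create click.
// AutoCreateBoxListener and onBoxCreated(...) are illustrative names,
// not part of the actual ViPER-GT extension API.
class AutoCreateBoxListener extends MouseAdapter {
    private final String zoneType;   // e.g. "TextZone"
    private Point anchor;

    AutoCreateBoxListener(String zoneType) {
        this.zoneType = zoneType;
    }

    @Override
    public void mousePressed(MouseEvent e) {
        anchor = e.getPoint();       // start of the drag
    }

    @Override
    public void mouseReleased(MouseEvent e) {
        if (anchor == null) {
            return;
        }
        Rectangle box = new Rectangle(anchor);
        box.add(e.getPoint());       // grow the box to include the release point
        onBoxCreated(zoneType, box);
        anchor = null;
    }

    // In a real extension this would add a new descriptor instance;
    // here it just reports the box.
    private void onBoxCreated(String type, Rectangle box) {
        System.out.println("created " + type + " box: " + box);
    }
}
```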
Architecture of ViPER-GT Application Launcher – Loads a set of JavaBeans from an RDF model. – Allows modifying menus, i18n, etc. – Handling 'menu change' events (e.g., for the most-recently-used menu) is a bit of a pain. Viper View Mediator – A JavaBean container for the ViPER API. – Adds 'user interaction' methods to keep track of things not in the API (focus, filters).
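The sketch below shows the general JavaBean-mediator idea, assuming a hypothetical "focusedDescriptor" property; it is not the actual Viper View Mediator code.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical sketch of the mediator idea: UI state that the ViPER API
// does not model (current focus, active filters) is held in a JavaBean
// and exposed through property-change events so interface beans can
// stay in sync. The property name here is illustrative only.
public class ViewMediatorSketch {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private Object focusedDescriptor;   // descriptor currently selected in the UI

    public void addPropertyChangeListener(PropertyChangeListener l) {
        changes.addPropertyChangeListener(l);
    }

    public void setFocusedDescriptor(Object descriptor) {
        Object old = this.focusedDescriptor;
        this.focusedDescriptor = descriptor;
        changes.firePropertyChange("focusedDescriptor", old, descriptor);
    }

    public Object getFocusedDescriptor() {
        return focusedDescriptor;
    }
}
```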
Continual Improvement SourceForge web site: – Mail suggestions/comments to: –