Architectural Support for Database Visualization Dennis Groth Indiana University Computer Science May 1, 2002
Talk Structure: Problem Motivation; Overview of Visualization Process; Data Preparation; Definable Maps; Visualizing Databases; Summary and Future Research. 9/19/2018 Dennis Groth
Motivation: What is visualization? “the act or process of interpreting in visual terms or of putting into visible form.” [Webster’s] “Transforming the symbolic into the geometric.” [McCormick et al., 1987] “The binding (or mapping) of data to representations that can be perceived.” [Foley, 1994]
The Goal of Visualization: Gain insight into data; understand the “whole”. Different from presentation graphics, which are used to communicate information to others.
Classic Presentation: Display of 6 variables. [Minard, 1861]
Scientific Visualization: Visual representation of scientific data. Example: rainfall in Peru over a 3-day period [Goldberg et al., 1987]
Information Visualization: Visual representation of abstract data. [Shneiderman et al.]
What’s the Difference? Scientific data: often already numeric; natural mapping to coordinates. Abstract data: may not be numeric (no order, no scale); mappings must be defined or constructed.
Visualization Process (KDD)
Interpret Patterns: Discriminating clusters
Research Mission: To leverage database techniques and technologies in order to enhance visualization activities, contributing to measurable improvements in efficiency, effectiveness and satisfaction of users.
Research Contributions: Architecture; Mapping and Map Specification; Measuring Usability; System Implementation; Application: Visualizing relationships
Architecture [Figure: system architecture diagram. Labels: User, Front End, DB, Query Specification, Map Specification, Query, Map, Data Extraction, Raw Data, Data Domain, Filtered Data, Data Preparation, Pre-Image, Visual Query, Unscaled Data, Scaled Data, Plot, Rendering, Filter, Image Display, Filtered Image.]
Simple Visualization [Figure: Input Data (tuples t1, t2, t3, …, tn) → Output.]
Database Activities [Figure: plot with axes Salary and Age; Employee table with columns Sex, Salary, Age, … and sample row (M, 640001, 57).] Query: Select Salary, Age, Count(*) From Employee Group By Salary, Age
Data Preparation Filtered Map Data Map Join Aggregation Pre-Image Raw Data Canned Algorithms Future Extensions
Map Join [Figure: Input Data tuples t1, t2, t3, …, tn paired with Map values m1, m2, m3, …, mn → Output.]
Database Activities [Figure: plot with axes SalaryRank and Age; Employee table with columns Sex, Salary, Age, … and sample row (M, 640001, 57); remaining labels: Salary, 3318, …, Age.] Query: Select SalaryRank, Age, Count(*) From Employee, SalaryMap Where Employee.Salary = SalaryMap.Salary Group By SalaryRank, Age
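The map join above can be tried out with a minimal sketch using Python's built-in sqlite3 module. Only the query comes from the slide; the SalaryMap contents, the SalaryRank values, and the extra Employee rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Sex TEXT, Salary INTEGER, Age INTEGER);
    CREATE TABLE SalaryMap (Salary INTEGER PRIMARY KEY, SalaryRank INTEGER);
    -- Hypothetical sample data (only the first Employee row appears on the slide).
    INSERT INTO Employee VALUES ('M', 640001, 57), ('F', 52000, 41),
                                ('F', 52000, 41), ('M', 31000, 41);
    -- The map is an ordinary relation: one row per distinct salary.
    INSERT INTO SalaryMap VALUES (31000, 1), (52000, 2), (640001, 3);
""")

# The slide's query: join the base data with the map, then aggregate
# on the mapped value instead of the raw salary.
rows = conn.execute("""
    SELECT SalaryRank, Age, COUNT(*)
    FROM Employee, SalaryMap
    WHERE Employee.Salary = SalaryMap.Salary
    GROUP BY SalaryRank, Age
    ORDER BY SalaryRank, Age
""").fetchall()
print(rows)
```

Because the map lives in its own table, changing SalaryMap changes the plot coordinates without touching Employee.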
Visual Query: Defines the linkages between the data and the display. Pre-defined schema for each visualization: Histogram: {X, Height}, {X, Y, Height}; Scatterplot: {X, Y}, {X, Y, Aggregation}, …; Line, Surface, …
Map: Maps are relations, so: they are applied with relational operators; modifications to maps do not affect base data; multiple maps can be applied to one dataset; one map can be applied to multiple datasets.
Constructing Maps: Standard database operations (Insert statements). Constant values: Insert into MonthMap values (1, 'January'). Calculated values: Insert into MonthMap Select floor(SalesAmount / 1000) as MapValue, * From MonthlySales. Algorithm driven (cluster, binning, etc.). Mapping Language.
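A minimal sketch of building maps with plain Insert statements, run through Python's sqlite3 module. The table layouts, the SalesMap name, and the sample sales figures are assumptions made for illustration. Integer division stands in for floor(SalesAmount / 1000), which gives the same result for the non-negative integers used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas and data, for illustration only.
    CREATE TABLE MonthMap (MapValue INTEGER, Month TEXT);
    CREATE TABLE MonthlySales (Month TEXT, SalesAmount INTEGER);
    CREATE TABLE SalesMap (SalesAmount INTEGER, MapValue INTEGER);
    INSERT INTO MonthlySales VALUES ('January', 1500), ('February', 2999),
                                    ('March', 3000);

    -- Constant values: one Insert per map entry.
    INSERT INTO MonthMap VALUES (1, 'January'), (2, 'February'), (3, 'March');

    -- Calculated values: derive the map from the data itself.
    INSERT INTO SalesMap
        SELECT DISTINCT SalesAmount, SalesAmount / 1000 FROM MonthlySales;
""")

month_map = conn.execute("SELECT * FROM MonthMap ORDER BY MapValue").fetchall()
sales_map = conn.execute("SELECT * FROM SalesMap ORDER BY SalesAmount").fetchall()
print(month_map)
print(sales_map)
```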
Mapping Language: A map program is a sequence of rules, $P = \langle p_1, p_2, \ldots, p_k \rangle$. Based on Datalog, with no recursion and no negation. Each rule is written as $h \leftarrow b$, where the body $b$ is a boolean expression and the head $h$ is an expression that evaluates to a numeric value. Functions are allowed.
Map Program Rules: Rules are defined over sets of attributes, not specific tables. $h(t)$ is the value of the head of the rule when substituting attribute values from tuple $t$; $b(t)$ is the value of the body of the rule when substituting attribute values from tuple $t$. Two special types of rules: Except (excludes tuples) and Else (applies if all preceding rules fail).
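Map Program Interpretations: Given $P = \langle p_1, p_2, \ldots, p_k \rangle$, where rule $p_i$ has head $h_i$ and body $b_i$, and an input instance $s$. Relational interpretation: $I_R(P, s) = \{ \langle t, h_i(t) \rangle : t \in s \wedge 1 \le i \le k \wedge b_i(t) \}$. Functional interpretation: $I_F(P, s) = \{ \langle t, h_i(t) \rangle : t \in s \wedge 1 \le i \le k \wedge b_i(t) \wedge (\forall j)[1 \le j < i \Rightarrow \neg b_j(t)] \}$.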
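The difference between the two interpretations is easiest to see when rule bodies overlap. The sketch below is an illustrative Python model, not the system's implementation; the Person type, the function names, and the two age-bucket rules are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Set, Tuple

class Person(NamedTuple):
    """Hypothetical input tuple type."""
    age: int
    sex: str

# A rule is a (head, body) pair: head yields a number, body a boolean.
Rule = Tuple[Callable[[Person], float], Callable[[Person], bool]]

def relational(program: List[Rule], s: List[Person]) -> Set[tuple]:
    """Every rule whose body holds for t contributes a pair (t, head(t))."""
    return {(t, head(t)) for t in s for head, body in program if body(t)}

def functional(program: List[Rule], s: List[Person]) -> Set[tuple]:
    """Only the first rule whose body holds for t contributes a pair."""
    out = set()
    for t in s:
        for head, body in program:
            if body(t):
                out.add((t, head(t)))
                break  # later rules are ignored for this tuple
    return out

# Hypothetical program with overlapping bodies:
#   1 <- Age < 40
#   2 <- Age < 65
program: List[Rule] = [
    (lambda t: 1, lambda t: t.age < 40),
    (lambda t: 2, lambda t: t.age < 65),
]
s = [Person(30, "F"), Person(50, "M"), Person(70, "F")]

rel = relational(program, s)
fun = functional(program, s)
print(sorted(rel))
print(sorted(fun))
```

Under the relational interpretation the 30-year-old is mapped to both 1 and 2; under the functional interpretation only the first matching rule fires, so each tuple gets at most one map value. The 70-year-old matches no rule and is absent from both results.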
Properties of the Language: Monotonic: equivalent to a subset of RA. Safe: finite output. Closed: takes a relation as input and returns a relation. Allows composition: $I_R(P, I_F(Q, I_F(R, s)))$. One implementation supports both interpretations. Complexity: $O(|input|)$.
SQL Approach: Each rule has an equivalent SQL query: Select $h$ From InputRelation Where $b$, where $h$ is the rule's head and $b$ its body. Each program has an equivalent union query: Select $h_1$ From InputRelation Where $b_1$ UNION … UNION Select $h_k$ From InputRelation Where $b_k$. Complexity: $O(|input| \cdot k)$.
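A minimal sketch of the union-query translation, assuming rules are given as (head, body) pairs of SQL expression strings. The helper name union_query, the MapValue column alias, and the sample data are hypothetical. The sketch selects the input columns alongside the head so that each output row shows which tuple produced which map value.

```python
import sqlite3
from typing import List, Tuple

def union_query(program: List[Tuple[str, str]],
                relation: str = "InputRelation") -> str:
    """Build one SELECT per rule and combine them with UNION.

    Note: expressions are interpolated directly, so this is only
    suitable for trusted rule text.
    """
    parts = [
        f"SELECT *, {head} AS MapValue FROM {relation} WHERE {body}"
        for head, body in program
    ]
    return "\nUNION\n".join(parts)

# The month rules from the example slide: 1 <- Month = 'January', ...
program = [("1", "Month = 'January'"), ("2", "Month = 'February'")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InputRelation (Month TEXT)")
conn.executemany("INSERT INTO InputRelation VALUES (?)",
                 [("January",), ("February",), ("March",)])

sql = union_query(program)
result = sorted(conn.execute(sql).fetchall())
print(sql)
print(result)
```

Each of the k sub-queries scans the input once, which is where the extra factor of k in the stated complexity comes from.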
Example Map Programs: First example: 1 ← Month = 'January'; 2 ← Month = 'February'; … Second example: Age + 100 ← Sex = 'F'; Age ← Else
Map Language Usability: Prior studies: early work (Reisner, Welty); SQL vs. QBE (Yen and Scamell); SQL vs. visual language (Catarci et al.). Goal: quantify the usability of the mapping language.
Experiment Design: Subjects: 27 undergraduate students (low skill level); 28 graduate students (medium skill level); 10 professionals (high skill level). All 3 groups attempted the same tasks and were given the same training materials. Half of each group used SQL; the other half used the mapping language.
Results: Accuracy (within group): undergraduates perform better with rules (p < .10); graduates perform better with rules (p < .10; note: excludes 3 subjects that did not answer any problems); no difference for professionals. Satisfaction (pre to post comparison): undergraduates and professionals satisfied with rules; everyone not satisfied with SQL (p < .10). Preference: professionals preferred the rule language (p < .001).
Results: Interesting trend for satisfaction (5 = best). Rule satisfaction (Pre-Test, Post-Test): Undergrad 2.9, 3.0; Grad 3.2; Professional. SQL satisfaction (Pre-Test, Post-Test): Undergrad 3.2, 2.6; Grad 3.1, 2.5; Professional.
System Implementation [Figure: client-server diagram. Labels: Client, Server, Database, Queries, Map Requests, Map Construction.]
User Interaction: Rotation, translation, scaling. Drill-down queries. Select data points for use in other contexts (like brushing). Combining plots: scaled independently or dependently; overlay, offset, tile.
Visualization of Databases: Proof-of-concept application. Mapping based on entropy. Allows insight into the structure of a relation: functional dependencies (exact); approximate dependencies (almost an FD). Information dependency measures (Dalkilic and Robertson, 2000). Visualizations show every relationship in one plot.
Identifying Relationships Function-Like Functional Dependency
Visualization of Databases: Entropy calculation: $H(A) = \sum_i p_i \log(1/p_i)$, where each $p_i$ is the probability of $a_i$ in the active domain of attribute $A$. Gives the average number of bits needed to transmit an $A$ value. $0 \le H(A) \le \log(|Adom(A)|)$. $H(A \to B) = H(AB) - H(A)$. If $H(A \to B) = 0$, the FD $A \to B$ holds. Provides a measure of approximateness.
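The entropy measures above can be sketched in a few lines of Python. This is an illustrative calculation, not the system's code: it assumes each probability is estimated as a relative frequency in the relation instance, uses base-2 logarithms so that results are in bits, and the function names and the four-row sample relation are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence

Row = Dict[str, object]

def entropy(rows: List[Row], attrs: Sequence[str]) -> float:
    """H(attrs) = sum of p * log2(1/p) over distinct value combinations."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    # p = c / n, so log2(1/p) = log2(n / c)
    return sum((c / n) * log2(n / c) for c in counts.values())

def dependency(rows: List[Row], lhs: List[str], rhs: List[str]) -> float:
    """H(lhs -> rhs) = H(lhs rhs) - H(lhs); zero means the FD holds."""
    return entropy(rows, lhs + rhs) - entropy(rows, lhs)

# Hypothetical relation: B is determined by A, C is independent of A.
rows = [
    {"A": 1, "B": "x", "C": 0},
    {"A": 1, "B": "x", "C": 1},
    {"A": 2, "B": "y", "C": 0},
    {"A": 2, "B": "y", "C": 1},
]

h_a = entropy(rows, ["A"])
h_a_to_b = dependency(rows, ["A"], ["B"])
h_a_to_c = dependency(rows, ["A"], ["C"])
print(h_a, h_a_to_b, h_a_to_c)
```

Here A takes two equally likely values, so H(A) is one bit. Knowing A leaves no uncertainty about B, so the dependency measure is zero and the FD A → B holds; knowing A says nothing about C, so a full bit remains.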
Comparing Datasets
Summary: Architecture supporting visualization; Mapping as a key element; Mapping language; Usability evaluation; System implementation. Mission: To leverage database techniques and technologies in order to enhance visualization activities, contributing to measurable improvements in efficiency, effectiveness and satisfaction of users.
Future Research: Visualization: extensions driven by applications. Data Mining: rule management (information overload). Human Computer Interaction: empirical testing of application manager user interface.
Questions