Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases Ugur YENIER
Introduction Need for data interfaces emerging with Data warehousing Scientific Computation Business Analysis Graphic Representations are more effective Allowing multiple views of the same data Easy discovery on massive data
Introduction Meaning of data Discover Structure Find Patterns Derive Casual Relationships n-Dimensional Data Cubes Cube Dimension = Relational Schema Dimension
Introduction Most popular method : PIVOT TABLE Allow data cube to be rotated, pivoted Dimensions = Rows or Columns Remaining Dimensions are aggregated Cross-tabulations and summaries are provided Further exploit : Graphs Projections of data cubes in Bar Charts Scatter points Parallel Coordinate Displays
Introduction POLARIS Interface for exploring multi-dimensional databases Extends Pivot Table to directly generate rich graphical displays Builds tables using algebraic formalism involving fields of the database Each table contains layers and panes
Overview Support interactive exploration of large multidimensional relational databases A relational database may contain heterogeneous but interrelated tables Field Characteristics Nominal Ordinal Quantitative Interval
Overview Polaris Field Categorization Intervals = Quantitative (Ordered) Nominal = Ordinal Dimension : Product Name Measure : Product Prize, Size Ordinal Fields Dimensions Quantitative Measure
Overview Target Specifications Data-dense displays Multiple display types Exploratory Interface Polaris meets specs. providing rapidly and incrementally generating table-based displays Table = Rows, Columns + LAYERS
Overview Each table axis may contain multiple nested dimensions Each table entry (pane) consists a set of records represented with marks Sample Polaris Interface
Interface Characteristics Multivariate: Multiple dimension of data can be explicitly encoded Comparative : Small-multiple displays to compare, exposing patterns and trends Familiar : Statisticians are accustomed to using tabular display of graphics
Visualization Multiple data sources may be combined in a single visualization Dimensions are displayed in x,y,z shelves Record partitioning and layering Grouping information Graphic Type Field mappings to retinal properties
Visualization Selecting a mark pops up detail window displaying specified tuples It is possible to draw a rubber band around a set of marks to brush (will be discussed later…)
Generating Graphics There are three components Specifications of the different table configurations Type of graphics inside each pane Details of the visual encodings
Table Algebra Formal mechanism to specify table configurations When a field is placed in a shelf, algebra expression is generated x,y axes partition into rows and columns, z partitions to layers
Table Algebra A,B,C representing ordinal fields P,Q,R representing quantitative fields Assignment of sets to symbols reflect difference in how two types of fields will be encoded in the structure of the tables Ordinal fields into rows and columns Quantitative fields into axes within the panes
Table Algebra Valid expression is an ordered sequence of one or more symbols Between each adjacent symbol there are operators Operators (in order of precedence) (X) Cross (/) Nest (+) Concatenation
Concatenation Performs union operations
Cross Performs Cartesian product operations
Nest Similar to cross operator, but only creates set entries for which there exist records with those domain values. Interpretation is “B within A” For example, given the fields quarter and month, the expression quarter/month would be interpreted as months within each quarter
Table Algebra Every expression in the algebra can be reduced to a single set Each entry in the set being an ordered concatenation of zero of more ordinal values with zero or more quantitative field names This set evaluation of an expression is normalized set form
Normalized Set Form Table axis is partitioned into columns (rows or layers) so that there is a one-to-one correspondence between set entries in the set and columns
Normalized Set Form
Types of Graphics Once the table configuration is specified, next step is to specify the type of graphic in each pane Three graphic families ordinal-ordinal ordinal-quantitative quantitative-quantitative Each family contains a number of ways to mark records
Type of Graphics Supported Polaris types Rectangle Circle Glyph Text Gantt bar Line Polygon Image
Types of Graphics Dependent and independent dimensions are interpreted differently By default dimensions are treated as independent dimensions Aggregations affect the type of graphics
Ordinal-Ordinal Axis variables are typically independent of each other Task is focused on understanding patterns and trends in some function ƒ(O x,O y ) R Typical example is studying sales and margin as a function of product type, month and state of items sold by a coffee chain
Ordinal-Quantitative Typically bar chart, possibly clustered or stacked, the dot plot and Gantt Chart Quantitative variable is often dependent on the ordinal variable and the aim is to understand or compare the properties of some function ƒ(O) Q
Ordinal-Quantitative Matrix of bar charts used to study several functions of the independent variables product and month
Ordinal-Quantitative The cardinality of the record set does affect the structure of the graphics When the cardinality of the record is set is one, the graphics are simple bar or dot plots When the cardinality of the record is set to greater than one, the graphic is stacked bar chart
Ordinal-Quantitative Major wars over the past 500 year shown as a Gantt chart Additional layer in figure displays pictures of major scientists plotted as a function of the independent variables country of birth and date of birth
Quantitative- Quantitative Used to understand the distribution of data as a function of one or both quantitative variables and to discover casual relationships between the two quantitative variables
Quantitative-Quantitative Typical Map Flight scheduling varies with the region of the country in which the flight originated Number of flights between major airports has been plotted as a function of latitude and longitude Plotted in two layers, the location plots and the geography of each state as a polygon
Visual Mappings Each record in a pane is mapped to a mark Two Components Type of graphic and mark Encoding of fields of the records into visual or retinal properties of the selected mark Visual properties in Polaris are based on Bertin's retinal variables Shape Size Orientation Color (value and hue) Texture (not supported in the current version of Polaris)
Retinal Properties The different retinal properties that can be used to encode fields of the data and examples of the default mappings that are generated when a given type of data field is encoded in each of the retinal properties
Visual Mappings Retinal properties of the display greatly enhances the data density and the variety of displays that can be generated Analysts should not be required to construct the mappings Instead, they should be able to simply specify that a field be encoded as a visual property System should then generate an effective mapping from the domain of the field to the range of the visual property
DATA TRANSFORMATIONS AND VISUAL QUERIES Rapidly change the table configuration, type of graphic, and visual encodings used to visualize a data set for interactive exploration Resulting display is also manipulable Analyst is able to sort, filter, and transform the data to uncover useful relationships and information Also form ad hoc groupings and partitions that reflect this newly uncovered information
Data Transformations and Visual Queries Polaris supports four features to perform visual queries Deriving additional fields Sorting and filtering Brushing and tool tips Undo and Redo
Deriving Additional Fields The generated fields are aggregates or statistical summaries Polaris currently provides five methods for deriving additional fields Simple aggregation of quantitative measures Counting of distinct values in ordinal dimensions Discrete partitioning of quantitative measures Ad hoc grouping within ordinal dimensions Threshold aggregation
Deriving Additional Fields Simple Aggregation Basic aggregation operations (that are applied to a single quantitative field) Summation Average Minimum Maximum Right-Click and apply, change type Easily extended to provide any statistical aggregate that can be generated from relational data
Deriving Additional Fields Counting of Ordinal Dimensions Counting of distinct values for an ordinal field within the data set Right-Click and apply Applying the count operator changes the field type (to quantitative) and thus change the table configuration and graph type in each pane
Deriving Additional Fields Discrete Partitioning Used to discretize a continuous domain Polaris provides two discretization methods Binning, allows the analyst to specify a regular bin size in which to aggregate the data, useful for creating graphs, such as histograms, in which there are many regularly sized bins Partitioning, allows the user to individually specify the size and name of each bin, useful for encoding additional categorizations into the data Right-Click and apply
Deriving Additional Fields Ad hoc Grouping Ordinal version of quantitative partitioning, where the user can choose to group together different ordinal values Allows the analyst to add own domain knowledge to the analysis and to change the groupings as the exploration uncovers additional patterns Right-Click and apply
Deriving Additional Fields Threshold Aggregation It is derived from two source fields: an ordinal field and a quantitative field If the quantitative field is less than a certain threshold value for any values of an ordinal field, those values are aggregated together to form an "Other" category Allows the user to specify threshold values below which the data is considered uninteresting Right-Click and apply
Sorting and Filtering Filtering allows the user to choose which values to display so that he can focus on and devote more screen space and attention to the areas of interest For ordinal fields, a listbox with all possible values is shown and the user can check or uncheck each value to display it or not For quantitative fields, a dynamic query slider allows the user to choose a new domain Additionally, there are textboxes showing the chosen minimum and maximum values that the user can use to directly enter a new domain.
Sorting and Filtering Sorting allows the user to uncover hidden patterns by changing the order of values within a field's domain or the ordering of tuples in the data The ordering of tuples affects the drawing order of marks within a pane. Polaris provides three ways for a user to sort the domain. User can bring up the filter window and drag-and drop the values within that window to reorder the domain If the field has been used to partition the table into rows or columns, the user can drag-and-drop the table row or column headers to reorder the domain values Polaris provides programmatic sorting, allowing the user to sort one field based on the value in another field
Brushing and Tooltips Analysts want to directly interact with the data, visually querying the data to highlight correlated marks or getting more details on demand Brushing allows the user to choose a set of interesting data points by drawing a rubberband around them Tooltips allow the user to get more details on demand.
Brushing The user selects a single field whose values are then used to identify related marks and tuples All marks corresponding to tuples sharing selected field values with the selected tuples are subsequently highlighted in all other panes or linked Polaris views Allowing correlation between different projections of the same data set or relationships between distinct data sets.
Tooltips If the user hovers over a data point or pane, additional details, such as specific field values for the tuple corresponding to the selected mark, are shown Analysts can use tooltips to understand the relationship between the graphical marks and the underlying data
Undo and Redo Unlimited undo and redo within an analysis sessio Users can use the "Back" and "Forward" buttons on the top toolbar to either return to a previous visual specification or to move forward again.
GENERATING DATABASE QUERIES
Results Throughout the analyses users want to see data and how they want to see it change continually Analysts form hypotheses create new views to perform tests and experiments Certain displays enable an understanding of overall trends, whereas others show causal relationships As the analysts better understand the data, they may want to drill-down in the visible dimensions or display entirely different dimensions Polaris supports this exploratory process through its visual interface By formally categorizing the types of graphics, Polaris is able to provide a simple interface for rapidly generating a wide range of displays This allows analysts to focus on the analysis task rather than the steps needed to retrieve and display the data
Discussions Comparison with similar work is omitted in this presentation Interpretation of visual specifications as database queries Interactivity and performance of Polaris
Interpretation of Visual Specifications as Database Queries Polaris generates the SQL query for each table pane Similar to CUBE operator generating the queries to create the cross-tab and Pivot Table displays However the CUBE operator is not applicable for Polaris because it assumes that the sets of relations partitioned into each table pane do not overlap
Interactivity and Performance of Polaris Polaris at its first implementations focuses on the techniques, semantics and formalism rather then the interactivity It has been experienced that the query response time does not need to be real-time in order to maintain a feeling of exploration (several tens of seconds)
Interactivity and Performance of Polaris Test Data: A subset of a packet trace of a mobile network over a 13 week period, approx. 6 million tuples A subset of the data collected from Sloan Digital Sky Survey (approx. 650MB) Both stored on MS SQL Server 2000 Paper does not provide numeric data on performance but the personal experiences of the testers
Conclusion Polaris extends the well known Pivot Table interface to display relational query results using a rich inexpensive set of graphical displays Succinct visual specification for describing table-based graphical displays of relational data Interpretation of visual specifications as a precise sequence of relational database operations
Future Work Performance evaluation Hierarchical data cubes Correspondence of marks to data tuples (dynamic mark generation) Animation shelf to display sequencing data
Thank You