Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases Ugur YENIER.

Introduction Need for data interfaces emerging with  Data warehousing  Scientific Computation  Business Analysis Graphic Representations are more effective Allowing multiple views of the same data Easy discovery on massive data

Introduction Meaning of data  Discover Structure  Find Patterns  Derive Causal Relationships n-Dimensional Data Cubes  Cube Dimension = Relational Schema Dimension

Introduction Most popular method : PIVOT TABLE  Allow data cube to be rotated, pivoted  Dimensions = Rows or Columns  Remaining Dimensions are aggregated  Cross-tabulations and summaries are provided Further exploit : Graphs  Projections of data cubes in Bar Charts Scatter points Parallel Coordinate Displays

Introduction POLARIS  Interface for exploring multi-dimensional databases  Extends Pivot Table to directly generate rich graphical displays Builds tables using algebraic formalism involving fields of the database Each table contains layers and panes

Overview Support interactive exploration of large multidimensional relational databases A relational database may contain heterogeneous but interrelated tables Field Characteristics  Nominal  Ordinal  Quantitative  Interval

Overview Polaris Field Categorization  Intervals = Quantitative  (Ordered) Nominal = Ordinal Dimension : Product Name Measure : Product Price, Size Ordinal Fields → Dimensions Quantitative Fields → Measures

Overview Target Specifications  Data-dense displays  Multiple display types  Exploratory Interface Polaris meets these specifications by rapidly and incrementally generating table-based displays Table = Rows, Columns + LAYERS

Overview Each table axis may contain multiple nested dimensions Each table entry (pane) consists of a set of records represented with marks Sample Polaris Interface

Interface Characteristics Multivariate: Multiple dimensions of data can be explicitly encoded Comparative : Small-multiple displays allow comparison, exposing patterns and trends Familiar : Statisticians are accustomed to using tabular displays of graphics

Visualization Multiple data sources may be combined in a single visualization Dimensions are displayed in x,y,z shelves Record partitioning and layering Grouping information Graphic Type Field mappings to retinal properties

Visualization Selecting a mark pops up detail window displaying specified tuples It is possible to draw a rubber band around a set of marks to brush (will be discussed later…)

Generating Graphics There are three components  Specifications of the different table configurations  Type of graphics inside each pane  Details of the visual encodings

Table Algebra Formal mechanism to specify table configurations When a field is placed in a shelf, algebra expression is generated x,y axes partition into rows and columns, z partitions to layers

Table Algebra A,B,C representing ordinal fields P,Q,R representing quantitative fields Assignment of sets to symbols reflect difference in how two types of fields will be encoded in the structure of the tables Ordinal fields into rows and columns Quantitative fields into axes within the panes

Table Algebra Valid expression is an ordered sequence of one or more symbols Between each adjacent symbol there are operators Operators (in order of precedence)  (X) Cross  (/) Nest  (+) Concatenation

Concatenation Performs union operations

Cross Performs Cartesian product operations

Nest Similar to cross operator, but only creates set entries for which there exist records with those domain values. Interpretation is “B within A” For example, given the fields quarter and month, the expression quarter/month would be interpreted as months within each quarter
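The three operators can be sketched over ordered sets. This is a minimal illustration, not the Polaris implementation: the helper names (field, concat, cross, nest) and the representation of set entries as tuples are assumptions made for this example.

```python
from itertools import product

# Hypothetical model: an operand is an ordered list of entries,
# and each entry is a tuple of domain values.

def field(values):
    """Turn a field's ordered domain into a list of one-element entries."""
    return [(v,) for v in values]

def concat(a, b):
    """A + B: ordered union, entries of A followed by new entries of B."""
    return a + [entry for entry in b if entry not in a]

def cross(a, b):
    """A x B: Cartesian product, joining each pair of entries."""
    return [x + y for x, y in product(a, b)]

def nest(a, b, records):
    """A / B: like cross, but keep only combinations present in the records."""
    seen = set(records)
    return [x + y for x, y in product(a, b) if x + y in seen]

quarters = field(["Q1", "Q2"])
months = field(["Jan", "Feb", "Apr"])
# Assumed sample records as (quarter, month) pairs.
records = [("Q1", "Jan"), ("Q1", "Feb"), ("Q2", "Apr")]

crossed = cross(quarters, months)          # all 6 quarter-month pairs
nested = nest(quarters, months, records)   # only months within each quarter
```

Here cross yields six entries, including pairs such as (Q2, Jan) that never occur in the data, while nest keeps only the three combinations that have records.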

Table Algebra Every expression in the algebra can be reduced to a single set Each entry in the set is an ordered concatenation of zero or more ordinal values with zero or more quantitative field names This set evaluation of an expression is called the normalized set form

Normalized Set Form Table axis is partitioned into columns (or rows or layers) so that there is a one-to-one correspondence between entries in the set and columns

Normalized Set Form

Types of Graphics Once the table configuration is specified, next step is to specify the type of graphic in each pane Three graphic families  ordinal-ordinal  ordinal-quantitative  quantitative-quantitative Each family contains a number of ways to mark records
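The choice of family depends only on the types of the two axis fields in a pane. A small sketch, assuming field types are given as the strings "ordinal" and "quantitative" (the function name graphic_family is hypothetical):

```python
VALID_TYPES = {"ordinal", "quantitative"}

def graphic_family(x_type, y_type):
    """Name the graphic family for a pane from the types of its two axis fields."""
    if x_type not in VALID_TYPES or y_type not in VALID_TYPES:
        raise ValueError("field types must be 'ordinal' or 'quantitative'")
    # Sorting makes the result independent of which field sits on which axis.
    return "-".join(sorted([x_type, y_type]))

family = graphic_family("quantitative", "ordinal")
```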

Type of Graphics Supported Polaris types  Rectangle  Circle  Glyph  Text  Gantt bar  Line  Polygon  Image

Types of Graphics Dependent and independent dimensions are interpreted differently By default dimensions are treated as independent dimensions Aggregations affect the type of graphics

Ordinal-Ordinal Axis variables are typically independent of each other Task is focused on understanding patterns and trends in some function ƒ(Ox, Oy) → R Typical example is studying sales and margin as a function of product type, month and state of items sold by a coffee chain

Ordinal-Quantitative Typically a bar chart (possibly clustered or stacked), a dot plot or a Gantt chart Quantitative variable is often dependent on the ordinal variable and the aim is to understand or compare the properties of some function ƒ(O) → Q

Ordinal-Quantitative Matrix of bar charts used to study several functions of the independent variables product and month

Ordinal-Quantitative The cardinality of the record set does affect the structure of the graphics When the cardinality of the record set is one, the graphics are simple bar or dot plots When the cardinality of the record set is greater than one, the graphic is a stacked bar chart

Ordinal-Quantitative Major wars over the past 500 years shown as a Gantt chart Additional layer in figure displays pictures of major scientists plotted as a function of the independent variables country of birth and date of birth

Quantitative-Quantitative Used to understand the distribution of data as a function of one or both quantitative variables and to discover causal relationships between the two quantitative variables

Quantitative-Quantitative Typical Map Flight scheduling varies with the region of the country in which the flight originated Number of flights between major airports has been plotted as a function of latitude and longitude Plotted in two layers, the location plots and the geography of each state as a polygon

Visual Mappings Each record in a pane is mapped to a mark Two Components  Type of graphic and mark  Encoding of fields of the records into visual or retinal properties of the selected mark Visual properties in Polaris are based on Bertin's retinal variables  Shape  Size  Orientation  Color (value and hue)  Texture (not supported in the current version of Polaris)

Retinal Properties The different retinal properties that can be used to encode fields of the data and examples of the default mappings that are generated when a given type of data field is encoded in each of the retinal properties

Visual Mappings Encoding fields in the retinal properties of the marks greatly enhances the data density and the variety of displays that can be generated Analysts should not be required to construct the mappings Instead, they should be able to simply specify that a field be encoded as a visual property System should then generate an effective mapping from the domain of the field to the range of the visual property

DATA TRANSFORMATIONS AND VISUAL QUERIES Rapidly change the table configuration, type of graphic, and visual encodings used to visualize a data set for interactive exploration Resulting display is also manipulable Analyst is able to sort, filter, and transform the data to uncover useful relationships and information Also form ad hoc groupings and partitions that reflect this newly uncovered information

Data Transformations and Visual Queries Polaris supports four features to perform visual queries  Deriving additional fields  Sorting and filtering  Brushing and tool tips  Undo and Redo

Deriving Additional Fields The generated fields are aggregates or statistical summaries Polaris currently provides five methods for deriving additional fields  Simple aggregation of quantitative measures  Counting of distinct values in ordinal dimensions  Discrete partitioning of quantitative measures  Ad hoc grouping within ordinal dimensions  Threshold aggregation

Deriving Additional Fields Simple Aggregation Basic aggregation operations (that are applied to a single quantitative field)  Summation  Average  Minimum  Maximum Right-Click and apply, change type Easily extended to provide any statistical aggregate that can be generated from relational data
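Simple aggregation amounts to grouping records by the ordinal fields in use and applying one operation to a quantitative field. A minimal sketch, assuming records are plain dictionaries; the function name aggregate and the sample data are made up for illustration:

```python
from collections import defaultdict
from statistics import mean

# The four basic operations named on the slide.
AGGREGATES = {"sum": sum, "avg": mean, "min": min, "max": max}

def aggregate(records, group_field, measure, how="sum"):
    """Group records by an ordinal field and aggregate one quantitative field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_field]].append(rec[measure])
    return {key: AGGREGATES[how](vals) for key, vals in groups.items()}

# Assumed sample data.
sales = [
    {"product": "coffee", "sales": 10},
    {"product": "tea", "sales": 4},
    {"product": "coffee", "sales": 6},
]
totals = aggregate(sales, "product", "sales", "sum")
```

Adding another statistical aggregate in this sketch only requires one more entry in AGGREGATES, which mirrors the extensibility claim on the slide.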

Deriving Additional Fields Counting of Ordinal Dimensions Counting of distinct values for an ordinal field within the data set Right-Click and apply Applying the count operator changes the field type (to quantitative) and thus changes the table configuration and graph type in each pane

Deriving Additional Fields Discrete Partitioning Used to discretize a continuous domain Polaris provides two discretization methods  Binning, allows the analyst to specify a regular bin size in which to aggregate the data, useful for creating graphs, such as histograms, in which there are many regularly sized bins  Partitioning, allows the user to individually specify the size and name of each bin, useful for encoding additional categorizations into the data  Right-Click and apply
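The two discretization methods differ in who defines the bin edges. A sketch under stated assumptions: bins are labeled by their lower edge, named partitions are half-open intervals, and both function names are hypothetical.

```python
import math
from collections import Counter

def bin_value(value, bin_size):
    """Regular binning: return the lower edge of the bin containing value."""
    return math.floor(value / bin_size) * bin_size

def partition_value(value, partitions):
    """Named partitions: (name, low, high) triples with low <= value < high."""
    for name, low, high in partitions:
        if low <= value < high:
            return name
    return None  # value falls outside every user-defined partition

# Histogram counts from regular bins of width 10 (assumed sample values).
histogram = Counter(bin_value(v, 10) for v in [3, 12, 17, 25])

# User-specified partitions with individual sizes and names.
tiers = [("small", 0, 5), ("medium", 5, 50), ("large", 50, 1000)]
tier = partition_value(12, tiers)
```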

Deriving Additional Fields Ad hoc Grouping Ordinal version of quantitative partitioning, where the user can choose to group together different ordinal values Allows the analyst to add own domain knowledge to the analysis and to change the groupings as the exploration uncovers additional patterns Right-Click and apply
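Ad hoc grouping can be pictured as a user-editable mapping from group names to the ordinal values they absorb. The sketch below assumes that ungrouped values keep their own label; the function name regroup and the example regions are illustrative only.

```python
def regroup(value, groups):
    """Map an ordinal value to its ad hoc group, if the analyst defined one."""
    for name, members in groups.items():
        if value in members:
            return name
    return value  # assumption: ungrouped values keep their own label

# Hypothetical grouping an analyst might build during exploration.
regions = {"West": {"CA", "OR", "WA"}, "Northeast": {"NY", "MA"}}
labels = [regroup(state, regions) for state in ["CA", "NY", "TX"]]
```

Because the grouping is just data, the analyst can change it as the exploration uncovers new patterns and the derived field follows.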

Deriving Additional Fields Threshold Aggregation It is derived from two source fields: an ordinal field and a quantitative field If the quantitative field is less than a certain threshold value for any values of an ordinal field, those values are aggregated together to form an "Other" category Allows the user to specify threshold values below which the data is considered uninteresting Right-Click and apply
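A sketch of threshold aggregation, assuming the comparison is made on the per-value sum of the quantitative field (the slide does not say which aggregate is compared). The function name and sample data are hypothetical.

```python
from collections import defaultdict

def threshold_aggregate(records, ordinal, measure, threshold, other="Other"):
    """Fold ordinal values whose total falls below the threshold into one category."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[ordinal]] += rec[measure]
    result = defaultdict(float)
    for key, total in totals.items():
        # Assumption: values are compared by their summed measure.
        result[key if total >= threshold else other] += total
    return dict(result)

# Assumed sample data.
flights = [
    {"state": "CA", "count": 100},
    {"state": "NV", "count": 3},
    {"state": "UT", "count": 2},
]
summary = threshold_aggregate(flights, "state", "count", threshold=10)
```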

Sorting and Filtering Filtering allows the user to choose which values to display so that he can focus on and devote more screen space and attention to the areas of interest For ordinal fields, a listbox with all possible values is shown and the user can check or uncheck each value to display it or not For quantitative fields, a dynamic query slider allows the user to choose a new domain Additionally, there are textboxes showing the chosen minimum and maximum values that the user can use to directly enter a new domain.
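Both filter controls reduce to a predicate over records: a set of checked values for each ordinal field and a closed range for each quantitative field. A minimal sketch with hypothetical names and sample data:

```python
def apply_filters(records, checked=None, ranges=None):
    """Keep records that pass every ordinal checklist and every quantitative range."""
    checked = checked or {}   # field -> set of values left checked in the listbox
    ranges = ranges or {}     # field -> (minimum, maximum) chosen with the slider
    kept = []
    for rec in records:
        ok_ordinal = all(rec[f] in allowed for f, allowed in checked.items())
        ok_quant = all(lo <= rec[f] <= hi for f, (lo, hi) in ranges.items())
        if ok_ordinal and ok_quant:
            kept.append(rec)
    return kept

# Assumed sample data.
rows = [
    {"product": "coffee", "profit": 12},
    {"product": "tea", "profit": 40},
    {"product": "coffee", "profit": 55},
]
visible = apply_filters(rows, checked={"product": {"coffee"}}, ranges={"profit": (10, 50)})
```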

Sorting and Filtering Sorting allows the user to uncover hidden patterns by changing the order of values within a field's domain or the ordering of tuples in the data The ordering of tuples affects the drawing order of marks within a pane. Polaris provides three ways for a user to sort the domain.  User can bring up the filter window and drag-and-drop the values within that window to reorder the domain  If the field has been used to partition the table into rows or columns, the user can drag-and-drop the table row or column headers to reorder the domain values  Polaris provides programmatic sorting, allowing the user to sort one field based on the value in another field
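Programmatic sorting, the third option, can be sketched as ordering the domain of one field by a value computed from another. The use of a sum as that value is an assumption, as are the function name and sample data.

```python
def sort_domain_by(records, field, by, descending=True):
    """Order the domain of `field` by the summed value of another field `by`."""
    totals = {}
    for rec in records:
        totals[rec[field]] = totals.get(rec[field], 0) + rec[by]
    # Assumption: the sort key is the per-value sum of `by`.
    return sorted(totals, key=totals.get, reverse=descending)

# Assumed sample data.
sales = [
    {"product": "tea", "sales": 4},
    {"product": "coffee", "sales": 10},
    {"product": "cocoa", "sales": 7},
    {"product": "coffee", "sales": 6},
]
order = sort_domain_by(sales, "product", "sales")
```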

Brushing and Tooltips Analysts want to directly interact with the data, visually querying the data to highlight correlated marks or getting more details on demand  Brushing allows the user to choose a set of interesting data points by drawing a rubberband around them  Tooltips allow the user to get more details on demand.

Brushing The user selects a single field whose values are then used to identify related marks and tuples All marks corresponding to tuples sharing selected field values with the selected tuples are subsequently highlighted in all other panes or linked Polaris views This allows correlation between different projections of the same data set or relationships between distinct data sets.
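The highlighting rule can be sketched as a lookup on the chosen field: collect its values from the rubber-banded tuples, then mark every tuple in any linked view that shares one of them. Names and data below are hypothetical.

```python
def brush(selected, all_records, field):
    """Return every record that shares a value of `field` with the selection."""
    values = {rec[field] for rec in selected}
    return [rec for rec in all_records if rec[field] in values]

# Assumed sample data spanning two panes.
records = [
    {"pane": 1, "airline": "AA", "delay": 5},
    {"pane": 1, "airline": "UA", "delay": 9},
    {"pane": 2, "airline": "AA", "delay": 30},
    {"pane": 2, "airline": "DL", "delay": 2},
]
selection = [records[0]]                      # the mark inside the rubber band
highlighted = brush(selection, records, "airline")
```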

Tooltips If the user hovers over a data point or pane, additional details, such as specific field values for the tuple corresponding to the selected mark, are shown Analysts can use tooltips to understand the relationship between the graphical marks and the underlying data

Undo and Redo Unlimited undo and redo within an analysis session Users can use the "Back" and "Forward" buttons on the top toolbar to either return to a previous visual specification or to move forward again.
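One common way to get Back and Forward behavior is a pair of stacks of visual specifications. This is a generic sketch, not a description of how Polaris stores its history; the class name History and the string specifications are illustrative.

```python
class History:
    """Back/forward navigation over a sequence of visual specifications."""

    def __init__(self, initial):
        self._back = []
        self._forward = []
        self.current = initial

    def apply(self, spec):
        """Record a new specification; this discards any redo history."""
        self._back.append(self.current)
        self.current = spec
        self._forward.clear()

    def back(self):
        """Return to the previous specification, if there is one."""
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()
        return self.current

    def forward(self):
        """Move forward again after going back, if possible."""
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()
        return self.current

history = History("empty table")
history.apply("sales by month")
history.apply("sales by month and state")
previous = history.back()
```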

GENERATING DATABASE QUERIES

Results Throughout an analysis, both the data users want to see and how they want to see it change continually Analysts  form hypotheses  create new views to perform tests and experiments Certain displays enable an understanding of overall trends, whereas others show causal relationships As the analysts better understand the data, they may want to drill-down in the visible dimensions or display entirely different dimensions Polaris supports this exploratory process through its visual interface By formally categorizing the types of graphics, Polaris is able to provide a simple interface for rapidly generating a wide range of displays This allows analysts to focus on the analysis task rather than the steps needed to retrieve and display the data

Discussions Comparison with similar work is omitted in this presentation Interpretation of visual specifications as database queries Interactivity and performance of Polaris

Interpretation of Visual Specifications as Database Queries Polaris generates the SQL query for each table pane Similar to CUBE operator generating the queries to create the cross-tab and Pivot Table displays However the CUBE operator is not applicable for Polaris because it assumes that the sets of relations partitioned into each table pane do not overlap
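To make the idea of one query per pane concrete, the sketch below builds a SQL string from a pane's defining values. The query shape, the function name pane_query, and the table and column names are all assumptions for illustration; the slides do not show the SQL that Polaris actually emits. String interpolation is used only for readability and is not safe for untrusted input.

```python
def pane_query(table, pane_filter, group_by, measure, aggregate="SUM"):
    """Build an illustrative SQL query for the records of a single table pane."""
    # pane_filter: the ordinal values that define this pane, e.g. its row and column.
    where = " AND ".join(f"{col} = '{val}'" for col, val in pane_filter.items())
    select = ", ".join(group_by + [f"{aggregate}({measure}) AS {measure}"])
    sql = f"SELECT {select} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql

# Hypothetical pane: state = CA (row), quarter = Q1 (column).
query = pane_query("sales", {"state": "CA", "quarter": "Q1"}, ["product"], "profit")
```

Generating the WHERE clause per pane is what lets panes select overlapping sets of records, which a single CUBE-style partitioning would not allow.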

Interactivity and Performance of Polaris The first implementation of Polaris focuses on the techniques, semantics and formalism rather than on interactivity Experience has shown that query response time does not need to be real-time to maintain a feeling of exploration (several tens of seconds is acceptable)

Interactivity and Performance of Polaris Test Data:  A subset of a packet trace of a mobile network over a 13 week period, approx. 6 million tuples  A subset of the data collected from Sloan Digital Sky Survey (approx. 650MB) Both stored on MS SQL Server 2000 Paper does not provide numeric data on performance but the personal experiences of the testers

Conclusion Polaris extends the well-known Pivot Table interface to display relational query results using a rich, expressive set of graphical displays Succinct visual specification for describing table-based graphical displays of relational data Interpretation of visual specifications as a precise sequence of relational database operations

Future Work Performance evaluation Hierarchical data cubes Correspondence of marks to data tuples (dynamic mark generation) Animation shelf to display sequencing data

Thank You
