Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases Ugur YENIER.


Introduction
- The need for data interfaces is emerging with:
  - Data warehousing
  - Scientific computation
  - Business analysis
- Graphic representations are more effective
- They allow multiple views of the same data
- They make discovery easy in massive data

Introduction
- Meaning of data:
  - Discover structure
  - Find patterns
  - Derive causal relationships
- n-dimensional data cubes:
  - Cube dimension = relational schema dimension

Introduction
- Most popular method: the Pivot Table
  - Allows the data cube to be rotated (pivoted)
  - Dimensions are assigned to rows or columns
  - Remaining dimensions are aggregated
  - Cross-tabulations and summaries are provided
- Further exploitation: graphs
  - Projections of data cubes shown as bar charts, scatter plots, or parallel coordinate displays
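
The pivot operation described on this slide can be sketched in a few lines of Python. This is only an illustration, not code from Polaris: the function name `pivot`, the field names, and the sample records are all hypothetical, and summation is assumed as the aggregate.

```python
from collections import defaultdict

def pivot(records, row_dim, col_dim, measure):
    """Cross-tabulate records: one cell per (row value, column value) pair.

    Every dimension other than row_dim and col_dim is aggregated away
    by summing the measure (summation is an assumption of this sketch).
    """
    cells = defaultdict(float)
    for rec in records:
        cells[(rec[row_dim], rec[col_dim])] += rec[measure]
    return dict(cells)

# Hypothetical sample data: "month" is the remaining dimension that gets aggregated.
sales = [
    {"product": "coffee", "state": "CA", "month": "Jan", "sales": 10.0},
    {"product": "coffee", "state": "CA", "month": "Feb", "sales": 5.0},
    {"product": "tea", "state": "NY", "month": "Jan", "sales": 7.0},
]
table = pivot(sales, "product", "state", "sales")
```

Swapping `row_dim` and `col_dim` corresponds to rotating the table.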

Introduction
- POLARIS
  - Interface for exploring multidimensional databases
  - Extends the Pivot Table to directly generate rich graphical displays
- Builds tables using an algebraic formalism involving the fields of the database
- Each table contains layers and panes

Overview
- Supports interactive exploration of large multidimensional relational databases
- A relational database may contain heterogeneous but interrelated tables
- Field characteristics:
  - Nominal
  - Ordinal
  - Quantitative
  - Interval

Overview
- Polaris field categorization:
  - Interval fields are treated as quantitative
  - Nominal fields, once ordered, are treated as ordinal
- Dimension example: Product Name
- Measure examples: Product Price, Size
- Ordinal fields → dimensions
- Quantitative fields → measures

Overview
- Target specifications:
  - Data-dense displays
  - Multiple display types
  - Exploratory interface
- Polaris meets these specifications by rapidly and incrementally generating table-based displays
- Table = rows, columns + layers

Overview
- Each table axis may contain multiple nested dimensions
- Each table entry (pane) consists of a set of records represented with marks
- [Figure: sample Polaris interface]

Interface Characteristics
- Multivariate: multiple dimensions of data can be explicitly encoded
- Comparative: small-multiple displays support comparison, exposing patterns and trends
- Familiar: statisticians are accustomed to using tabular displays of graphics

Visualization
- Multiple data sources may be combined in a single visualization
- Dimensions are displayed in x, y, z shelves
- Record partitioning and layering
- Grouping information
- Graphic type
- Field mappings to retinal properties

Visualization
- Selecting a mark pops up a detail window displaying the specified tuples
- A rubber band can be drawn around a set of marks to brush them (discussed later)

Generating Graphics
- There are three components:
  - Specification of the different table configurations
  - Type of graphic inside each pane
  - Details of the visual encodings

Table Algebra
- Formal mechanism to specify table configurations
- When a field is placed on a shelf, an algebra expression is generated
- The x and y axes partition the table into rows and columns; z partitions it into layers

Table Algebra
- A, B, C represent ordinal fields
- P, Q, R represent quantitative fields
- The assignment of sets to symbols reflects the difference in how the two types of fields are encoded in the structure of the tables:
  - Ordinal fields partition the table into rows and columns
  - Quantitative fields become axes within the panes

Table Algebra
- A valid expression is an ordered sequence of one or more symbols
- An operator sits between each pair of adjacent symbols
- Operators (in order of precedence):
  - Cross (×)
  - Nest (/)
  - Concatenation (+)

Concatenation
- Performs a union operation on the sets of its operands

Cross
- Performs a Cartesian product of the sets of its operands

Nest
- Similar to the cross operator, but only creates set entries for which records with those domain values exist
- A/B is interpreted as "B within A"
- Example: given the fields quarter and month, the expression quarter/month is interpreted as the months within each quarter
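
The three operators can be sketched over plain Python lists. This is a minimal illustration under stated assumptions, not the Polaris implementation: each set entry is modelled as a tuple, an ordinal field contributes one entry per domain value, a quantitative field contributes a single entry holding its name, and `nest` orders entries by first appearance in the records. All function and field names are hypothetical.

```python
def ordinal(domain):
    """An ordinal field evaluates to one entry per domain value."""
    return [(value,) for value in domain]

def quantitative(name):
    """A quantitative field evaluates to a single entry: its own name."""
    return [(name,)]

def concat(a, b):
    """Concatenation (+): union of the two sets, keeping operand order."""
    return a + b

def cross(a, b):
    """Cross (x): Cartesian product, joining each pair of entries."""
    return [x + y for x in a for y in b]

def nest(records, field_a, field_b):
    """Nest (/): like cross, but only pairs that occur together in some record."""
    entries = []
    for rec in records:
        pair = (rec[field_a], rec[field_b])
        if pair not in entries:  # ordering by first appearance is an assumption
            entries.append(pair)
    return entries

# Hypothetical records for the quarter/month example.
records = [
    {"quarter": "Q1", "month": "Jan"},
    {"quarter": "Q1", "month": "Feb"},
    {"quarter": "Q2", "month": "Apr"},
    {"quarter": "Q1", "month": "Jan"},
]
crossed = cross(ordinal(["Q1", "Q2"]), ordinal(["Jan", "Feb", "Apr"]))
nested = nest(records, "quarter", "month")
mixed = concat(ordinal(["Q1", "Q2"]), quantitative("Profit"))
```

Here `crossed` contains every quarter and month combination, including pairs such as Q2 with Jan that never occur in the data, while `nested` keeps only the months that actually fall within each quarter.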

Table Algebra
- Every expression in the algebra can be reduced to a single set
- Each entry in the set is an ordered concatenation of zero or more ordinal values with zero or more quantitative field names
- This set evaluation of an expression is called the normalized set form

Normalized Set Form
- The table axis is partitioned into columns (or rows, or layers) so that there is a one-to-one correspondence between entries in the set and columns

Normalized Set Form

Types of Graphics
- Once the table configuration is specified, the next step is to specify the type of graphic in each pane
- Three graphic families:
  - Ordinal-ordinal
  - Ordinal-quantitative
  - Quantitative-quantitative
- Each family contains a number of ways to mark records

Types of Graphics
- Mark types supported by Polaris:
  - Rectangle
  - Circle
  - Glyph
  - Text
  - Gantt bar
  - Line
  - Polygon
  - Image

Types of Graphics
- Dependent and independent dimensions are interpreted differently
- By default, dimensions are treated as independent
- Aggregations affect the type of graphic

Ordinal-Ordinal
- Axis variables are typically independent of each other
- The task is focused on understanding patterns and trends in some function ƒ(Ox, Oy) → R
- Typical example: studying sales and margin as a function of product type, month, and state for items sold by a coffee chain

Ordinal-Quantitative
- Typical graphics: the bar chart (possibly clustered or stacked), the dot plot, and the Gantt chart
- The quantitative variable often depends on the ordinal variable; the aim is to understand or compare the properties of some function ƒ(O) → Q

Ordinal-Quantitative
- [Figure: matrix of bar charts used to study several functions of the independent variables product and month]

Ordinal-Quantitative
- The cardinality of the record set affects the structure of the graphic
- When the cardinality of the record set is one, the graphic is a simple bar chart or dot plot
- When the cardinality of the record set is greater than one, the graphic is a stacked bar chart

Ordinal-Quantitative
- [Figure: major wars over the past 500 years shown as a Gantt chart]
- An additional layer in the figure displays pictures of major scientists plotted as a function of the independent variables country of birth and date of birth

Quantitative-Quantitative
- Used to understand the distribution of data as a function of one or both quantitative variables
- Used to discover causal relationships between the two quantitative variables

Quantitative-Quantitative
- Typical example: a map
- Flight scheduling varies with the region of the country in which the flight originated
- The number of flights between major airports is plotted as a function of latitude and longitude
- Plotted in two layers: the location plots and the geography of each state as a polygon

Visual Mappings
- Each record in a pane is mapped to a mark
- Two components:
  - Type of graphic and mark
  - Encoding of fields of the records into visual (retinal) properties of the selected mark
- Visual properties in Polaris are based on Bertin's retinal variables:
  - Shape
  - Size
  - Orientation
  - Color (value and hue)
  - Texture (not supported in the current version of Polaris)

Retinal Properties
- [Figure: the different retinal properties that can be used to encode fields of the data, with examples of the default mappings generated when a given type of data field is encoded in each retinal property]

Visual Mappings
- Encoding fields in the retinal properties of the marks greatly enhances the data density and the variety of displays that can be generated
- Analysts should not be required to construct the mappings
- Instead, they should be able to simply specify that a field be encoded as a visual property
- The system should then generate an effective mapping from the domain of the field to the range of the visual property
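
Generating a mapping from a field's domain to the range of a visual property could look like the following sketch. It assumes a discrete lookup for ordinal fields and linear interpolation for quantitative fields; these are illustrative choices, not the default mappings Polaris actually uses, and all names and palettes are hypothetical.

```python
def ordinal_mapping(domain, palette):
    """Assign each ordinal value one entry of a discrete range (e.g. shapes or hues).

    The palette is reused cyclically if the domain is larger than the palette.
    """
    return {value: palette[i % len(palette)] for i, value in enumerate(domain)}

def quantitative_mapping(lo, hi, out_lo, out_hi):
    """Return a function that linearly rescales [lo, hi] onto [out_lo, out_hi]."""
    span = hi - lo

    def scale(x):
        if span == 0:
            return out_lo  # degenerate domain: every value maps to the same output
        t = (x - lo) / span
        return out_lo + t * (out_hi - out_lo)

    return scale

# Hypothetical usage: region encoded as shape, sales encoded as mark size.
shape_of = ordinal_mapping(["East", "West", "South"], ["circle", "square"])
size_of = quantitative_mapping(0, 100, 2, 12)
```

The analyst only names the field and the visual property; the domain bounds and palette would be supplied by the system.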

DATA TRANSFORMATIONS AND VISUAL QUERIES
- For interactive exploration, the analyst can rapidly change the table configuration, type of graphic, and visual encodings used to visualize a data set
- The resulting display is also manipulable
- The analyst is able to sort, filter, and transform the data to uncover useful relationships and information
- The analyst can also form ad hoc groupings and partitions that reflect this newly uncovered information

Data Transformations and Visual Queries
- Polaris supports four features for performing visual queries:
  - Deriving additional fields
  - Sorting and filtering
  - Brushing and tooltips
  - Undo and redo

Deriving Additional Fields
- The generated fields are aggregates or statistical summaries
- Polaris currently provides five methods for deriving additional fields:
  - Simple aggregation of quantitative measures
  - Counting of distinct values in ordinal dimensions
  - Discrete partitioning of quantitative measures
  - Ad hoc grouping within ordinal dimensions
  - Threshold aggregation

Deriving Additional Fields: Simple Aggregation
- Basic aggregation operations, applied to a single quantitative field:
  - Summation
  - Average
  - Minimum
  - Maximum
- Applied by right-clicking the field and changing the aggregation type
- Easily extended to provide any statistical aggregate that can be generated from relational data

Deriving Additional Fields: Counting of Ordinal Dimensions
- Counts the distinct values of an ordinal field within the data set
- Applied by right-clicking the field
- Applying the count operator changes the field type to quantitative, and thus changes the table configuration and the graphic type in each pane

Deriving Additional Fields: Discrete Partitioning
- Used to discretize a continuous domain
- Polaris provides two discretization methods:
  - Binning lets the analyst specify a regular bin size in which to aggregate the data; useful for creating graphs, such as histograms, that have many regularly sized bins
  - Partitioning lets the user individually specify the size and name of each bin; useful for encoding additional categorizations into the data
- Applied by right-clicking the field
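
The two discretization methods can be contrasted with a small sketch. The function names, bin labels, and sample prices are hypothetical, and half-open bins (lower bound inclusive, upper bound exclusive) are an assumption of this example.

```python
import math
from collections import Counter

def bin_value(x, bin_size):
    """Binning: return the lower edge of the regular bin that contains x."""
    return math.floor(x / bin_size) * bin_size

def partition_value(x, named_bins):
    """Partitioning: named_bins is a list of (name, lower, upper) with lower <= x < upper.

    Returns the name of the first matching bin, or None if x falls outside all bins.
    """
    for name, lower, upper in named_bins:
        if lower <= x < upper:
            return name
    return None

prices = [1.5, 2.0, 4.9, 5.0, 9.99]

# Regular bins of width 5 give histogram-style counts.
histogram = Counter(bin_value(p, 5) for p in prices)

# Individually named bins add a categorization to the data.
tiers = [("cheap", 0, 5), ("premium", 5, 100)]
labels = [partition_value(p, tiers) for p in prices]
```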

Deriving Additional Fields: Ad hoc Grouping
- Ordinal version of quantitative partitioning, where the user can choose to group together different ordinal values
- Allows analysts to add their own domain knowledge to the analysis and to change the groupings as the exploration uncovers additional patterns
- Applied by right-clicking the field

Deriving Additional Fields: Threshold Aggregation
- Derived from two source fields: an ordinal field and a quantitative field
- If the quantitative field is below a certain threshold for some values of the ordinal field, those values are aggregated together to form an "Other" category
- Allows the user to specify threshold values below which the data is considered uninteresting
- Applied by right-clicking the field
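
Threshold aggregation can be sketched as follows. The sketch assumes that the quantity compared against the threshold is the per-value sum of the quantitative field; the slide does not say which aggregate is used, so this is an illustrative choice, and the function and field names are hypothetical.

```python
from collections import defaultdict

def threshold_aggregate(records, ordinal, measure, threshold, other="Other"):
    """Fold ordinal values whose summed measure is below threshold into one category."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[ordinal]] += rec[measure]

    result = defaultdict(float)
    for value, total in totals.items():
        # Values at or above the threshold keep their own category.
        key = value if total >= threshold else other
        result[key] += total
    return dict(result)

# Hypothetical sample data.
orders = [
    {"product": "A", "sales": 100.0},
    {"product": "B", "sales": 3.0},
    {"product": "C", "sales": 2.0},
    {"product": "A", "sales": 50.0},
]
summary = threshold_aggregate(orders, "product", "sales", threshold=10)
```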

Sorting and Filtering
- Filtering allows users to choose which values to display so that they can focus on, and devote more screen space and attention to, the areas of interest
- For ordinal fields, a listbox with all possible values is shown, and the user can check or uncheck each value to display it or not
- For quantitative fields, a dynamic query slider allows the user to choose a new domain
- Additionally, textboxes show the chosen minimum and maximum values, and the user can edit them to enter a new domain directly

Sorting and Filtering
- Sorting allows the user to uncover hidden patterns by changing the order of values within a field's domain or the ordering of tuples in the data
- The ordering of tuples affects the drawing order of marks within a pane
- Polaris provides three ways for a user to sort the domain:
  - The user can bring up the filter window and drag and drop the values within that window to reorder the domain
  - If the field has been used to partition the table into rows or columns, the user can drag and drop the table row or column headers to reorder the domain values
  - Polaris provides programmatic sorting, allowing the user to sort one field based on the value of another field
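
Programmatic sorting, the third option above, can be illustrated with a short sketch that orders the domain of one field by another field. It assumes the ordering key is the sum of the second field per domain value; the function name and data are hypothetical.

```python
def sort_domain_by(records, field, by, reverse=False):
    """Order the domain of `field` by the summed value of `by` for each domain value."""
    totals = {}
    for rec in records:
        totals[rec[field]] = totals.get(rec[field], 0) + rec[by]
    return sorted(totals, key=totals.get, reverse=reverse)

# Hypothetical sample data.
sales = [
    {"state": "CA", "sales": 10},
    {"state": "NY", "sales": 7},
    {"state": "CA", "sales": 5},
    {"state": "TX", "sales": 20},
]
ascending = sort_domain_by(sales, "state", "sales")
descending = sort_domain_by(sales, "state", "sales", reverse=True)
```

The resulting list would determine the order of rows or columns when the field partitions the table.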

Brushing and Tooltips
- Analysts want to interact directly with the data, visually querying it to highlight correlated marks or to get more details on demand
  - Brushing allows the user to choose a set of interesting data points by drawing a rubber band around them
  - Tooltips allow the user to get more details on demand

Brushing
- The user selects a single field whose values are then used to identify related marks and tuples
- All marks corresponding to tuples that share the selected field's values with the selected tuples are then highlighted in all other panes or linked Polaris views
- This allows correlation between different projections of the same data set, or between distinct data sets
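
The brushing rule on this slide reduces to a membership test on the chosen field. The sketch below is an illustration only; the function name `brush` and the sample tuples are hypothetical.

```python
def brush(selected, all_tuples, field):
    """Return every tuple that shares a value of `field` with any selected tuple."""
    values = {t[field] for t in selected}
    return [t for t in all_tuples if t[field] in values]

# Hypothetical tuples drawn from two linked views.
tuples = [
    {"product": "coffee", "state": "CA", "sales": 10},
    {"product": "tea", "state": "CA", "sales": 4},
    {"product": "coffee", "state": "NY", "sales": 7},
]
rubber_band = [tuples[0]]  # the marks enclosed by the rubber band
highlighted = brush(rubber_band, tuples, "product")
```

Brushing on "state" instead of "product" would highlight a different set of marks for the same rubber-band selection.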

Tooltips
- If the user hovers over a data point or pane, additional details are shown, such as specific field values for the tuple corresponding to the selected mark
- Analysts can use tooltips to understand the relationship between the graphical marks and the underlying data

Undo and Redo
- Unlimited undo and redo within an analysis session
- Users can use the "Back" and "Forward" buttons on the top toolbar to return to a previous visual specification or to move forward again
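
Back and Forward navigation over visual specifications is commonly modelled with two stacks. The sketch below shows that general technique, not Polaris's actual implementation; the class name `History` and the placeholder specification strings are hypothetical.

```python
class History:
    """Unbounded undo/redo over a sequence of visual specifications."""

    def __init__(self, initial):
        self._back = []      # specifications reachable with "Back"
        self._forward = []   # specifications reachable with "Forward"
        self.current = initial

    def apply(self, spec):
        """Record a new specification; this discards any redo history."""
        self._back.append(self.current)
        self._forward.clear()
        self.current = spec

    def back(self):
        """Return to the previous specification, if there is one."""
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()
        return self.current

    def forward(self):
        """Move forward again after going back, if possible."""
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()
        return self.current

history = History("spec0")
history.apply("spec1")
history.apply("spec2")
```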

GENERATING DATABASE QUERIES

Results
- Throughout an analysis, both the data users want to see and how they want to see it change continually
- Analysts:
  - Form hypotheses
  - Create new views to perform tests and experiments
- Certain displays enable an understanding of overall trends, whereas others show causal relationships
- As analysts better understand the data, they may want to drill down in the visible dimensions or display entirely different dimensions
- Polaris supports this exploratory process through its visual interface
- By formally categorizing the types of graphics, Polaris is able to provide a simple interface for rapidly generating a wide range of displays
- This allows analysts to focus on the analysis task rather than on the steps needed to retrieve and display the data

Discussions
- Comparison with similar work is omitted in this presentation
- Interpretation of visual specifications as database queries
- Interactivity and performance of Polaris

Interpretation of Visual Specifications as Database Queries
- Polaris generates an SQL query for each table pane
- This is similar to using the CUBE operator to generate the queries behind cross-tab and Pivot Table displays
- However, the CUBE operator is not applicable in Polaris, because it assumes that the sets of relations partitioned into each table pane do not overlap
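
A per-pane query could be built along the following lines. This is a hedged sketch, not the SQL Polaris generates: the helper `pane_query`, the `sales` table, and its columns are hypothetical, and SQLite is used only to make the example runnable.

```python
import sqlite3

def pane_query(table, pane_filters, group_by, measure, agg="SUM"):
    """Build one aggregate query for a single pane.

    pane_filters maps the fields that define the pane's row and column to their values.
    Identifiers are interpolated directly, so they must come from trusted schema
    metadata; only the filter values are passed as bound parameters.
    """
    select_cols = ", ".join(group_by + [f"{agg}({measure}) AS {measure}_{agg.lower()}"])
    sql = f"SELECT {select_cols} FROM {table}"
    if pane_filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in pane_filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql, list(pane_filters.values())

# Hypothetical data: one pane holds the records for state = CA, product = coffee.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (state TEXT, product TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("CA", "coffee", "Jan", 10.0),
        ("CA", "coffee", "Jan", 5.0),
        ("CA", "coffee", "Feb", 3.0),
        ("NY", "coffee", "Jan", 7.0),
    ],
)
sql, params = pane_query("sales", {"state": "CA", "product": "coffee"}, ["month"], "amount")
rows = conn.execute(sql, params).fetchall()
```

Issuing one such query per pane, each with its own filter, does not require the panes' record sets to be disjoint.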

Interactivity and Performance of Polaris
- The first implementation of Polaris focuses on techniques, semantics, and formalism rather than on interactivity
- Experience so far suggests that query response time does not need to be real-time to maintain a feeling of exploration (it can be several tens of seconds)

Interactivity and Performance of Polaris
- Test data:
  - A subset of a packet trace of a mobile network over a 13-week period (approx. 6 million tuples)
  - A subset of the data collected from the Sloan Digital Sky Survey (approx. 650 MB)
- Both were stored on MS SQL Server 2000
- The paper does not provide numeric performance data, only the personal experiences of the testers

Conclusion
- Polaris extends the well-known Pivot Table interface to display relational query results using a rich, expressive set of graphical displays
- Succinct visual specification for describing table-based graphical displays of relational data
- Interpretation of visual specifications as a precise sequence of relational database operations

Future Work
- Performance evaluation
- Hierarchical data cubes
- Correspondence of marks to data tuples (dynamic mark generation)
- Animation shelf to display sequencing data

Thank You