Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris
Chris Stolte, Diane Tang, Pat Hanrahan
July 2002

Motivation
- Large databases have become very common
  - Corporate data warehouses: Amazon, Walmart, …
  - Scientific projects: Human Genome Project, Sloan Digital Sky Survey
- Need tools to extract meaning from these databases
  - Programmatic data mining and statistical analysis
  - Visual exploration and analysis

Hierarchical Structure
- Challenge: these databases are very large
  - Queries cannot visit every record
  - Visualizations cannot display every record
- Analysts have augmented databases with hierarchical structure
  - Provides meaningful levels of abstraction
  - Leveraged by both computer and analyst
  - Derived from semantics or programmatic analysis
- Tools need to take advantage of these hierarchies

Contributions
- Interactive tool for analysis of data warehouses with hierarchical structure
  - Based on Polaris*
    - Rapid construction of table-based visualizations
    - Algebraic formalism
    - Analysis of flat relational databases
  - To support hierarchies, we need to extend:
    - User interface
    - Algebraic formalism
    - Generation of data queries

* C. Stolte, D. Tang, and P. Hanrahan. Polaris: A System for Query, Analysis, and Visualization of Multi-dimensional Relational Databases. IEEE Transactions on Visualization and Computer Graphics, January 2002.

Outline
- Review of Polaris
  - Demo
  - Formalism
- Hierarchies and Data Cubes
- Extensions to Polaris
  - Demo
  - Formalism
- Discussion

Schema: Denormalized Relation
- Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product
- Quantitative fields (metrics): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...
- Hypothetical nation-wide coffee chain data (courtesy Visual Insights)

Demo I: Original Polaris

Polaris Review
- Provides an interface for rapidly and incrementally generating table-based graphical displays
- Users construct visualizations via a drag-and-drop interface
- Queries are automatically generated
- Interface is simple and expressive because it is built upon a formalism

Polaris Formalism
- UI interpreted as a visual specification that defines:
  - table configuration
  - type of graphic in each pane
  - encoding of data as visual properties of marks
  - data transformations
- Specification automatically compiled into the necessary queries and drawing commands
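
As a rough illustration of the four parts of a visual specification listed above, the following Python sketch bundles them into one record. The class name `VisualSpec`, its field names, and the example values are all hypothetical; nothing here is taken from the Polaris implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VisualSpec:
    """Hypothetical container for the parts of a visual specification."""
    rows: str       # table algebra expression on the row shelf
    columns: str    # table algebra expression on the column shelf
    mark: str       # type of graphic drawn in each pane
    encodings: Dict[str, str] = field(default_factory=dict)  # visual property -> field
    transforms: List[str] = field(default_factory=list)      # data transformations


# Example values are invented for illustration only.
spec = VisualSpec(
    rows="ProductType",
    columns="Quarter",
    mark="bar",
    encodings={"color": "Market"},
    transforms=["SUM(Profit)"],
)
print(spec)
```

A compiler in this style would read such a record and emit both the data queries and the drawing commands.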

Specifying Table Configurations
- Interface: define table configuration by dropping fields on shelves
- Formalism: shelf content interpreted as expressions in table algebra

Table Algebra
- Operands are the database fields
  - Each operand is interpreted as a set {…}
  - Quantitative and ordinal fields are interpreted differently
- Three operators: concatenation (+), cross (x), nest (/)

Table Algebra: Operands
- Ordinal fields: interpret domain as a set that partitions table into rows and columns:
  Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}
- Quantitative fields: treat domain as single element set and encode spatially as axes:
  Profit = {(Profit[-410,650])}
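
The two set interpretations above can be sketched in Python. This is a minimal illustration, not the Polaris implementation: the helper names `ordinal_set` and `quantitative_set` and the sample rows are assumptions, with values chosen so the Profit range matches the slide.

```python
# Hypothetical (Quarter, Profit) rows; values are invented for illustration.
rows = [("Qtr1", -410.0), ("Qtr2", 120.0), ("Qtr3", 650.0),
        ("Qtr4", 300.0), ("Qtr1", 80.0)]


def ordinal_set(values):
    """Ordered, de-duplicated domain; each member is a 1-tuple."""
    members = []
    for value in values:
        if (value,) not in members:
            members.append((value,))
    return members


def quantitative_set(name, values):
    """Single-element set naming the field and its observed range."""
    low, high = min(values), max(values)
    return [(f"{name}[{low:g},{high:g}]",)]


quarter = ordinal_set(q for q, _ in rows)
profit = quantitative_set("Profit", [p for _, p in rows])
print(quarter)
print(profit)
```

An ordinal field yields one entry per domain value, while a quantitative field always yields a single entry that becomes an axis.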

Concatenation (+) operator
- Ordered union of set interpretations:
  Quarter + ProductType
    = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} + {(Coffee),(Espresso)}
    = {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)}
  Profit + Sales = {(Profit[-310,620]),(Sales[0,1000])}
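
A minimal Python sketch of concatenation as an ordered union follows. The function name `concat` is hypothetical, and dropping entries that already appear on the left is an assumption about what "union" means here.

```python
def concat(left, right):
    """Ordered union: all of `left`, then entries of `right` not already present."""
    result = list(left)
    for entry in right:
        if entry not in result:  # assumption: duplicates are dropped
            result.append(entry)
    return result


quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Espresso",)]

combined = concat(quarter, product_type)
print(combined)
```

The order of the operands is preserved, which is what lets the result lay out quarters before product types along an axis.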

Cross (x) operator
- Cross-product of set interpretations, e.g. Quarter x ProductType or ProductType x Profit:
  Quarter x ProductType = {(Qtr1,Coffee),(Qtr1,Tea),(Qtr2,Coffee),(Qtr2,Tea),(Qtr3,Coffee),(Qtr3,Tea),(Qtr4,Coffee),(Qtr4,Tea)}
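
The cross operator can be sketched with the standard library's `itertools.product`. The function name `cross` is hypothetical; entries are modeled as tuples so that crossing two sets concatenates their members.

```python
from itertools import product


def cross(left, right):
    """Cross-product: pair every entry of `left` with every entry of `right`."""
    return [a + b for a, b in product(left, right)]


quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]

crossed = cross(quarter, product_type)
print(crossed)
```

With four quarters and two product types the result has eight entries, in the same order as the example on the slide.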

Nest (/) operator
- Quarter x Month would create twelve entries for each quarter, e.g., (Qtr1, December)
- Quarter / Month would create only three entries per quarter
  - Based on tuples in the database, not semantics
  - Can be expensive to compute
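
The difference between cross and nest can be sketched as follows. This is an illustrative Python model, assuming one hypothetical row per (Quarter, Month) pair that actually occurs in the data; the helper `distinct` is an invented name.

```python
from itertools import product

# Hypothetical data: the (Quarter, Month) pairs that occur in the relation.
quarter_months = {
    "Qtr1": ["Jan", "Feb", "Mar"],
    "Qtr2": ["Apr", "May", "Jun"],
    "Qtr3": ["Jul", "Aug", "Sep"],
    "Qtr4": ["Oct", "Nov", "Dec"],
}
rows = [(q, m) for q, months in quarter_months.items() for m in months]


def distinct(values):
    """Order-preserving de-duplication."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


quarters = distinct(q for q, _ in rows)
months = distinct(m for _, m in rows)

crossed = list(product(quarters, months))  # every quarter with every month
nested = distinct(rows)                    # only pairs present in the data

print(len(crossed), len(nested))
```

Nest has to scan the tuples to find which pairs exist, which is why it can be expensive on a large relation.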

Outline
- Review of Polaris
  - Demo
  - Formalism
- Hierarchies and Data Cubes
- Extensions to Polaris
  - Demo
  - Formalism
- Discussion

Data Cubes
- Structure relation as n-dimensional cube
  - Each cell summarizes all measures for those dimension values
  - Each cube dimension corresponds to a dimension in the relation
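
A small Python sketch of the cube idea: each cell is keyed by one value per dimension and holds a summary of a measure. The fact rows are invented, and using a sum as the summary is an assumption; a real cube may store several aggregates per cell.

```python
from collections import defaultdict

# Hypothetical fact rows: (State, Quarter, Product, Profit).
facts = [
    ("CA", "Qtr1", "Coffee", 100.0),
    ("CA", "Qtr1", "Coffee", 50.0),
    ("CA", "Qtr2", "Tea", 30.0),
    ("NY", "Qtr1", "Coffee", 70.0),
]

# One cell per combination of dimension values; the cell sums the measure.
cube = defaultdict(float)
for state, quarter, product_name, profit in facts:
    cube[(state, quarter, product_name)] += profit

for cell, total in sorted(cube.items()):
    print(cell, total)
```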

Hierarchies and Data Cubes
- Each dimension in the cube is structured as a tree
- Each level in the tree corresponds to a level of detail
- Nodes correspond to domain values
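
One way to picture a dimension as a tree is a nested mapping, sketched below in Python. The `time_dim` data and the function `nodes_at_level` are hypothetical; the function returns the root-to-node path of every node at a chosen level of detail.

```python
# Hypothetical Time dimension: Year -> Quarter -> list of Months.
time_dim = {
    "1998": {
        "Qtr1": ["Jan", "Feb", "Mar"],
        "Qtr2": ["Apr", "May", "Jun"],
    }
}


def nodes_at_level(tree, level, path=()):
    """Return root-to-node paths for all nodes at `level` (1 = top level)."""
    if isinstance(tree, dict):
        children = list(tree.items())
    else:
        children = [(leaf, None) for leaf in tree]  # a list holds leaf labels
    paths = []
    for label, subtree in children:
        here = path + (label,)
        if level == 1:
            paths.append(here)
        elif subtree is not None:
            paths.extend(nodes_at_level(subtree, level - 1, here))
    return paths


print(nodes_at_level(time_dim, 2))
print(nodes_at_level(time_dim, 3))
```

Moving from level 2 to level 3 corresponds to drilling down from quarters to months.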

Hierarchies and Data Cubes
- Some hierarchies known a priori
  - Provide semantic meaning
  - Time (day, month, year)
  - Location (city, state, country)
- Can be automatically generated
  - Classification algorithms
  - Clustering
- Enable analyst to reason at high level of abstraction, then drill down
  - Interface must expose underlying hierarchical structure

Hierarchy Model
- Our model assumes that hierarchies:
  - Can be modeled using a star or snowflake schema
  - Have uniform depth
  - Have homogeneous node types
- Other models relax these constraints
- We chose to focus on the model commonly found in commercial data warehouse and data cube products

Outline
- Review of Polaris
  - Demo
  - Formalism
- Hierarchies and Data Cubes
- Extensions to Polaris
  - Demo
  - Formalism
- Discussion

Schema: Star Schema
- Fact table
  - Keys into dimension tables: State, Month, Product
  - Measures: Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...
- Dimension tables
  - Location: Market, State
  - Time: Year, Quarter, Month
  - Products: Product Type, Product Name
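
A star schema of this shape can be tried out with Python's built-in `sqlite3` module. The table names, column names, and rows below are invented for illustration; only the Time dimension is modeled, and the query rolls Profit up from months to quarters by joining the fact table to the dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (month TEXT PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE fact (state TEXT, month TEXT, product TEXT, profit REAL);
INSERT INTO time_dim VALUES
    ('Jan', 'Qtr1', 1998), ('Feb', 'Qtr1', 1998), ('Apr', 'Qtr2', 1998);
INSERT INTO fact VALUES
    ('CA', 'Jan', 'Coffee', 100), ('CA', 'Feb', 'Coffee', 50),
    ('NY', 'Apr', 'Tea', 30);
""")

# Roll up from Month to Quarter through the dimension table.
rollup = conn.execute("""
    SELECT t.quarter, SUM(f.profit)
    FROM fact AS f
    JOIN time_dim AS t ON f.month = t.month
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
conn.close()

print(rollup)
```

The fact table stores only the finest level (Month); coarser levels come from the dimension table at query time.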

Demo II: Revised Polaris

Extending the Formalism
- Redefine operands as dimension levels and measures, not simply database fields
- Need to define set interpretation of a dimension level
  - Domain is not a single ordered list
  - Composed of node values at a particular level in the hierarchy
  - Node values are uniquely defined by the path from the root node
- Possible definitions?

Set Interpretation: Option 1
- Define set interpretation by listing each node value with unique path to root: {1998.Qtr1.Jan, …, 1998.Qtr4.Dec}
- (+) Provides unique set interpretation
- (-) Limits expressiveness
  - Any table including “Months” must include “Year”
  - Not possible to summarize across years (e.g., Total Sales in January for all Years)
  - Such a summary is not a standard projection of the data cube, but it is very useful
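
To see why fully qualified names limit expressiveness, consider this Python sketch over a hypothetical two-year Time dimension. With the path included, January 1998 and January 1999 are different entries, so no single entry stands for "January across all years".

```python
# Hypothetical Time dimension covering two years (only two quarters shown).
time_dim = {
    "1998": {"Qtr1": ["Jan", "Feb", "Mar"], "Qtr4": ["Oct", "Nov", "Dec"]},
    "1999": {"Qtr1": ["Jan", "Feb", "Mar"], "Qtr4": ["Oct", "Nov", "Dec"]},
}

# Option 1: every month is named by its full path from the root.
qualified_months = [
    f"{year}.{quarter}.{month}"
    for year, quarters in time_dim.items()
    for quarter, months in quarters.items()
    for month in months
]

january_entries = [m for m in qualified_months if m.endswith(".Jan")]
print(qualified_months[0], len(qualified_months), january_entries)
```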

Set Interpretation: Option 2
- Define set interpretation by listing each node value without path to root: {Jan, Feb, …, Dec}
- Order by depth-first traversal
- Consolidate non-unique values
- This works, but how do we leverage the known relationship between dimension levels?
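
A Python sketch of the second option: walk the hierarchy depth first, keep only the unqualified labels, and merge repeats. The dimension data and sales figures are invented; the last step shows the cross-year summary that becomes possible once the year is no longer part of the name.

```python
# Hypothetical Time dimension covering two years (only two quarters shown).
time_dim = {
    "1998": {"Qtr1": ["Jan", "Feb", "Mar"], "Qtr4": ["Oct", "Nov", "Dec"]},
    "1999": {"Qtr1": ["Jan", "Feb", "Mar"], "Qtr4": ["Oct", "Nov", "Dec"]},
}

# Depth-first traversal, dropping the path and consolidating repeated labels.
months = []
for quarters in time_dim.values():
    for month_list in quarters.values():
        for month in month_list:
            if month not in months:
                months.append(month)

# Invented sales rows: (Year, Month, Sales).
sales = [("1998", "Jan", 10.0), ("1999", "Jan", 15.0), ("1999", "Feb", 4.0)]
january_total = sum(amount for _, month, amount in sales if month == "Jan")

print(months, january_total)
```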

Dot (.) Operator
- Nest isn’t aware of defined hierarchical relationships:
  - Year / Months might work, if all data is present
  - Inefficient
- New operator: Dot (.)
  - Nest computed using the dimension table rather than the fact table
- Sufficient to provide support for aggregation, drill down, and roll up in the algebra
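
The contrast between nest and dot can be sketched in Python by running the same distinct-pairs computation over two different tables. Both tables and the helper `distinct_pairs` are hypothetical; the fact table is deliberately missing some months to show how nest depends on which data happens to be present.

```python
# Hypothetical dimension table: every (Quarter, Month) pair in the hierarchy.
dimension = [
    ("Qtr1", "Jan"), ("Qtr1", "Feb"), ("Qtr1", "Mar"),
    ("Qtr2", "Apr"), ("Qtr2", "May"), ("Qtr2", "Jun"),
]

# Hypothetical fact table with no rows for Feb, May, or Jun.
facts = [("Qtr1", "Jan", 10.0), ("Qtr1", "Mar", 5.0), ("Qtr2", "Apr", 7.0)]


def distinct_pairs(rows):
    """Order-preserving distinct (first, second) column pairs."""
    pairs = []
    for row in rows:
        pair = (row[0], row[1])
        if pair not in pairs:
            pairs.append(pair)
    return pairs


nest_result = distinct_pairs(facts)     # Quarter / Month: scans the fact table
dot_result = distinct_pairs(dimension)  # Quarter.Month: scans the dimension table

print(nest_result)
print(dot_result)
```

The dimension table is usually far smaller than the fact table, so the dot form is also cheaper to evaluate.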

Generating Queries
- Queries generated from specification
- Panes correspond to either a slice of a projection or an aggregation of a projection
- Multiple queries required if level-of-detail varies
- Algebraic manipulation can be used to determine minimal set of queries
- Interpreter generates SQL, MDX, or Rivet queries
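
As a rough idea of what generating an aggregation query from a specification might look like, here is a Python sketch that builds a SQL string. The function `build_query`, its parameters, and the table and column names are hypothetical, and summing every measure is an assumption; this is not the Polaris interpreter.

```python
from typing import List


def build_query(table: str, levels: List[str], measures: List[str]) -> str:
    """Build one aggregate SQL query: group by the levels, sum the measures."""
    select = levels + [f"SUM({m}) AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if levels:
        sql += f" GROUP BY {', '.join(levels)}"
    return sql


# Hypothetical table and column names.
query = build_query("facts", ["quarter", "product_type"], ["profit"])
print(query)
```

Panes at different levels of detail would call such a builder with different `levels` lists, which is one way to read the slide's point that multiple queries may be required.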

Related Visualization Projects
- Formalisms for Graphics
  - Wilkinson’s Grammar of Graphics
  - Bertin’s Semiology of Graphics
  - Mackinlay’s APT
- Visual Exploration of Databases
  - VQE, DeVise, Visage, DataSplash/Tioga-2, …
- Visualization and Data Mining
  - MineSet, …

Data Mining and Visualization
- Polaris not solely for visual analysis
  - Precursor to algorithmic analysis to identify areas of interest
  - Validate results and establish trust and understanding
  - Incorporate decision trees and classification algorithms into data warehouses as hierarchies

Summary
- Extended Polaris to fully support and expose hierarchical structure of data cubes
- Extended not only interface but underlying algebraic formalism

Future Work
- Use underlying formalism as basis for other visualization tools
  - Interactive pan-and-zoom systems

Future Work
- Visual presentation of metadata
  - Hierarchies are one example of rich, domain-specific metadata
  - As important to analysis as the data itself
  - How to visualize this metadata?

Future Work
- Interactive visualization
- Prefetching and caching