Polaris Query, Analysis, and Visualization of Large Hierarchical Relational Databases Pat Hanrahan With Chris Stolte and Diane Tang Computer Science Department.



Polaris Query, Analysis, and Visualization of Large Hierarchical Relational Databases Pat Hanrahan With Chris Stolte and Diane Tang Computer Science Department Stanford University

Motivation
- Large databases have become very common
  - Corporate data warehouses: Amazon, Walmart, …
  - Scientific projects: Human Genome Project, Sloan Digital Sky Survey
- Need tools to extract meaning from these databases

Related Work
- Formalisms for graphics
  - Bertin’s “Semiology of Graphics”
  - Mackinlay’s APT
  - Roth et al.’s Sage and SageBrush
  - Wilkinson’s “Grammar of Graphics”
- Visual exploration of databases
  - DeVise
  - DataSplash/Tioga-2
- Visualization and data mining
  - SGI’s MineSet
  - IBM’s Diamond

Formalism

Polaris Formalism
UI interpreted as a visual specification that defines:
- Table configuration
- Type of graphic in each pane
- Encoding of data as visual properties of marks
- Data transformations and queries
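To make the four parts of a visual specification concrete, here is a minimal sketch of one as a plain data structure. The class name `VisualSpec`, its field names, and the helper `fields_used` are hypothetical and chosen only for illustration; they are not Polaris's actual representation.

```python
from dataclasses import dataclass, field


@dataclass
class VisualSpec:
    """Hypothetical container for the four parts listed on the slide."""
    rows: list                 # table configuration: fields on the row axis
    columns: list              # table configuration: fields on the column axis
    mark: str                  # type of graphic in each pane, e.g. "bar"
    encodings: dict = field(default_factory=dict)     # visual property -> field
    aggregations: dict = field(default_factory=dict)  # measure -> aggregate


def fields_used(spec):
    """Return every database field the specification refers to, sorted."""
    names = (spec.rows + spec.columns
             + list(spec.encodings.values())
             + list(spec.aggregations))
    return sorted(set(names))


# Example built from field names in the coffee chain schema.
SPEC = VisualSpec(
    rows=["ProductType"],
    columns=["Quarter"],
    mark="bar",
    encodings={"color": "Market"},
    aggregations={"Profit": "SUM"},
)
```

A query generator could use something like `fields_used(SPEC)` to decide which columns to request from the database.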

Schema
Coffee chain data [Visual Insights]
- Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product
- Quantitative fields (measures): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...

Polaris Visual Encodings
Principle of Importance Ordering: encode the most important information in the most effective way [Cleveland & McGill]

The Pivot Table Interface
- Common interface to statistical packages/Excel
- Cross-tabulations
- Simple interface based on drag-and-drop

Data Cubes
- Structure relation as n-dimensional cube
- Each cell aggregates all measures for those dimensions
- Each cube axis corresponds to a dimension in the relation
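The cube structure can be illustrated with a small sketch. The rows, the choice of SUM as the aggregate, and the function names `build_cube` and `project` are assumptions made for this example, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical relation: (quarter, product_type, profit)
ROWS = [
    ("Qtr1", "Coffee", 10),
    ("Qtr1", "Tea", 4),
    ("Qtr1", "Coffee", 6),
    ("Qtr2", "Coffee", 7),
    ("Qtr2", "Tea", 5),
]


def build_cube(rows):
    """Aggregate (here: sum) the profit measure into one cell per
    (quarter, product_type) combination. Each key position is a cube axis."""
    cube = defaultdict(int)
    for quarter, product_type, profit in rows:
        cube[(quarter, product_type)] += profit
    return dict(cube)


def project(cube, axis):
    """Collapse the cube onto a single axis by summing over the other one."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)


CUBE = build_cube(ROWS)
```

Here `project(CUBE, 0)` gives totals per quarter and `project(CUBE, 1)` gives totals per product type.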

Table Algebra: Operands
- Ordinal fields: interpret the domain as a set that partitions the table into rows and columns
  Quarter = {(Qtr1), (Qtr2), (Qtr3), (Qtr4)}
- Quantitative fields: treat the domain as a single-element set and encode it spatially as an axis
  Profit = {(Profit)}

Concatenation (+) Operator
Ordered union of two sets
Quarter + ProductType
  = {(Qtr1), (Qtr2), (Qtr3), (Qtr4)} + {(Coffee), (Espresso)}
  = {(Qtr1), (Qtr2), (Qtr3), (Qtr4), (Coffee), (Espresso)}
Profit + Sales = {(Profit), (Sales)}
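The operands and the concatenation operator can be sketched in a few lines. This is an illustrative reading of the slide, not the Polaris implementation: the function names are hypothetical, ordered sets are modelled as Python lists of tuples, and dropping duplicates is an assumption based on the word "union".

```python
def ordinal(values):
    """An ordinal field's domain becomes an ordered set of 1-tuples."""
    return [(v,) for v in values]


def quantitative(name):
    """A quantitative field becomes a single-element set holding its name."""
    return [(name,)]


def concat(a, b):
    """Ordered union: keep the order of a, then append entries of b
    that are not already present (assumed duplicate handling)."""
    result = list(a)
    for entry in b:
        if entry not in result:
            result.append(entry)
    return result


QUARTER = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
PRODUCT_TYPE = ordinal(["Coffee", "Espresso"])
```

With these definitions, `concat(QUARTER, PRODUCT_TYPE)` yields the six entries of the slide's example in the same order.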

Cross (×) Operator
Direct product of two sets
Quarter × ProductType = {(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}
ProductType × Profit = {(Coffee, Profit), (Tea, Profit)}
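A minimal sketch of the cross operator, again modelling ordered sets as lists of tuples. The function name `cross` and the sample domains are assumptions for illustration; the ordering (left operand varies slowest) follows the slide's example.

```python
def cross(a, b):
    """Direct product of two ordered sets of tuples.
    Each result entry is an entry of a followed by an entry of b."""
    return [x + y for x in a for y in b]


QUARTER = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
PRODUCT_TYPE = [("Coffee",), ("Tea",)]
PROFIT = [("Profit",)]  # a quantitative field is a single-element set

QUARTER_BY_TYPE = cross(QUARTER, PRODUCT_TYPE)
TYPE_BY_PROFIT = cross(PRODUCT_TYPE, PROFIT)
```

Crossing an ordinal field with a quantitative one, as in `TYPE_BY_PROFIT`, produces one entry per ordinal value, each carrying the measure name for its axis.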

SQL Dataflow
[Diagram: Relational Table, Tuples in Panes, Marks in Panes; Sort]
Notes
- Aggregation operators applied after sort
- Only one layer is shown; additional z-sort

Multiscale Visualization

Hierarchical Structure
- Challenge: these databases are very large
  - Queries and visualizations should not require all the records
- Augment database with hierarchical structure
  - Provide meaningful levels of abstraction
  - Derived from domain or clustering
  - Provides metadata (missing data for context)

Hierarchies and Data Cubes
- Each dimension in the cube is structured as a tree
- Each level in the tree corresponds to a level of detail
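Moving up one level of such a tree amounts to re-aggregating cells under their parents. The sketch below assumes a hypothetical Month to Quarter mapping, made-up profit values, and SUM as the aggregate.

```python
# Hypothetical one-level slice of a Time hierarchy: each month's parent quarter.
MONTH_TO_QUARTER = {
    "Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1",
    "Apr": "Qtr2", "May": "Qtr2", "Jun": "Qtr2",
}

# Month-level cells (finest level of detail in this example).
PROFIT_BY_MONTH = {"Jan": 10, "Feb": 4, "Mar": 6, "Apr": 7, "May": 5}


def roll_up(cells, parent):
    """Aggregate child-level cells into their parent level by summing."""
    totals = {}
    for key, value in cells.items():
        totals[parent[key]] = totals.get(parent[key], 0) + value
    return totals


PROFIT_BY_QUARTER = roll_up(PROFIT_BY_MONTH, MONTH_TO_QUARTER)
```

The same `roll_up` call could be repeated with a Quarter to Year mapping to reach a coarser level.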

Schema: Star Schema
- Fact table: State, Month, Product, plus measures (Profit, Sales, Payroll, Marketing, Inventory, Margin, ...)
- Existence tables
  - Location: Market, State
  - Time: Year, Quarter, Month
  - Products: Product Type, Product Name
- Generalizations
  - Snowflake schemas
  - Lattices (DAGs)
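As an illustration of how a star schema is queried, the sketch below joins a fact table to a Time dimension table and aggregates at the Quarter level, using Python's built-in `sqlite3` module. The table names, column names, and data are hypothetical.

```python
import sqlite3


def profit_by_quarter():
    """Build a tiny in-memory star schema and roll profit up to quarters."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE time_dim (month TEXT PRIMARY KEY, quarter TEXT, year INTEGER);
        CREATE TABLE fact (state TEXT, month TEXT, product TEXT, profit INTEGER);
    """)
    conn.executemany(
        "INSERT INTO time_dim VALUES (?, ?, ?)",
        [("Jan", "Qtr1", 2002), ("Feb", "Qtr1", 2002), ("Apr", "Qtr2", 2002)],
    )
    conn.executemany(
        "INSERT INTO fact VALUES (?, ?, ?, ?)",
        [("CA", "Jan", "Latte", 10), ("CA", "Feb", "Latte", 5),
         ("NY", "Apr", "Mint Tea", 7)],
    )
    # The fact table stores only the finest level (month); the dimension
    # table supplies the coarser level (quarter) through the join.
    rows = conn.execute("""
        SELECT t.quarter, SUM(f.profit)
        FROM fact AS f JOIN time_dim AS t ON f.month = t.month
        GROUP BY t.quarter
        ORDER BY t.quarter
    """).fetchall()
    conn.close()
    return rows
```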

Categorical Hierarchies
- Quarter × Month (cross)
  - Direct product of two sets
  - Would create twelve entries for each quarter, e.g. (Qtr1, December)
- Quarter / Month (nest)
  - Based on tuples in the database, not semantics
  - Would only create three entries per quarter
  - Can be expensive to compute
- Quarter . Month (dot)
  - Based on tuples in existence tables (not db)
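The difference between the three operators can be seen on a small example. This sketch uses two quarters instead of four, a hypothetical existence table `TIME_DIM`, and a fact table that deliberately has no rows for some months; the function names are illustrative only.

```python
QUARTERS = ["Qtr1", "Qtr2"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Hypothetical existence (dimension) table: the valid quarter/month pairs.
TIME_DIM = [("Qtr1", "Jan"), ("Qtr1", "Feb"), ("Qtr1", "Mar"),
            ("Qtr2", "Apr"), ("Qtr2", "May"), ("Qtr2", "Jun")]

# Hypothetical fact rows: (quarter, month, profit). Feb, May, Jun are missing.
FACT = [("Qtr1", "Jan", 10), ("Qtr1", "Mar", 4),
        ("Qtr2", "Apr", 7), ("Qtr1", "Jan", 2)]


def cross(a, b):
    """Every combination, including meaningless ones such as (Qtr1, Jun)."""
    return [(x, y) for x in a for y in b]


def nest(fact):
    """Distinct pairs that occur in the fact rows; requires scanning them."""
    seen = []
    for quarter, month, _profit in fact:
        if (quarter, month) not in seen:
            seen.append((quarter, month))
    return seen


def dot(dimension_table):
    """Pairs taken from the existence table, independent of the fact rows."""
    return list(dimension_table)
```

In this data, nest omits (Qtr1, Feb) because no fact row mentions it, while dot still lists it, which is how the existence table can show missing data in context.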

Cartographic Generalization
[Figure: maps of Canterbury and East Kent at scales 1:50,000 and 1:625,000]

Generalization: Techniques
- Selection
- Simplification
- Exaggeration
- Regularization
- Displacement
- Aggregation

Summary
Polaris
- Spreadsheet or table-based displays
- Simple drag-and-drop interface
- Built on a formalism that allows algebraic manipulation of the visual mapping of tuples to marks
- Multiscale visualizations using data and visual abstraction
- Connects to SQL/MDX servers
See

Future Work
- Articulate full set of multiscale design patterns
- Transition between levels of detail
- Develop system infrastructure for browsing VLDB
- Support layers/lenses/linking with tuple flow
- Device independence through graphical encodings
- Extend formalism to 3D
- Couple scientific and information visualization
- …