Architectural Support for Database Visualization

Presentation transcript:

Architectural Support for Database Visualization Dennis Groth Indiana University Computer Science May 1, 2002

Talk Structure: Problem Motivation; Overview of Visualization Process; Data Preparation; Definable Maps; Visualizing Databases; Summary and Future Research. 9/19/2018 Dennis Groth

Motivation: What is visualization? “the act or process of interpreting in visual terms or of putting into visible form.” [Webster’s]; “Transforming the symbolic into the geometric.” [McCormick et al., 1987]; “The binding (or mapping) of data to representations that can be perceived.” [Foley, 1994]

The Goal of Visualization: Gain insight into data. Understand the “whole”. Different from presentation graphics, which are used to communicate information to others.

Classic Presentation: Display of 6 variables. [Minard, 1861]

Scientific Visualization: Visual representation of scientific data. Rainfall in Peru over a 3-day period [Goldberg et al., 1987]

Information Visualization: Visual representation of abstract data. [Shneiderman et al.]

What’s the Difference? Scientific data: often already numeric; natural mapping to coordinates. Abstract data: may not be numeric (no order, no scale); mappings must be defined or constructed.

Visualization Process (KDD)

Interpret Patterns: Discriminating clusters

Research Mission: To leverage database techniques and technologies in order to enhance visualization activities, contributing to measurable improvements in efficiency, effectiveness, and satisfaction of users.

Research Contributions: Architecture; Mapping and Map Specification; Measuring Usability; System Implementation; Application: Visualizing relationships.

Architecture [Diagram of the system architecture. Labels: User, Front End, DB, Query Specification, Map Specification, Filter, Image Display, Filtered Image, Query, Map, Scaled Data, Rendering, Data Domain, Filtered Data, Data Extraction, Data Preparation, Visual Query, Plot, Raw Data, Pre-Image, Unscaled Data]

Simple Visualization [Figure: Input Data (tuples t1, t2, t3, ..., tn) and Output]

Database Activities (figure labels: Salary, Age; table columns: Sex, Salary, Age, …; sample row: M 640001 57)
Query: Select Salary, Age, Count(*) From Employee Group By Salary, Age
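
As a rough illustration of the grouping query on this slide, the sketch below runs it against an in-memory SQLite database from Python. The choice of SQLite, the column types, and the sample rows are all assumptions made for the example; the talk does not name a DBMS or show the full Employee table.

```python
import sqlite3


def salary_age_counts(rows):
    """Run the slide's GROUP BY query over hypothetical Employee rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only the three columns visible on the slide.
    conn.execute("CREATE TABLE Employee (Sex TEXT, Salary INTEGER, Age INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT Salary, Age, COUNT(*) FROM Employee "
        "GROUP BY Salary, Age ORDER BY Salary, Age"
    ).fetchall()
    conn.close()
    return result


# Invented sample data, not taken from the talk.
sample = [("M", 64000, 57), ("F", 64000, 57), ("F", 52000, 41)]
counts = salary_age_counts(sample)
```

Each result row is one plottable point (Salary, Age) with a count that could drive, for example, a histogram height.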

Data Preparation Filtered Map Data Map Join Aggregation Pre-Image Raw Data Canned Algorithms Future Extensions

Map Join [Figure: Input Data tuples t1, t2, t3, ..., tn paired with Map values m1, m2, m3, ..., mn; Output]

Database Activities (figure labels: SalaryRank, Age; table residue: Sex Salary Age … M 640001 57 Salary 3318 … Age)
Query: Select SalaryRank, Age, Count(*) From Employee, SalaryMap Where Employee.Salary = SalaryMap.Salary Group By SalaryRank, Age
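
The map join on this slide can be tried out with a small sketch like the following, again using Python with an in-memory SQLite database. The SalaryMap column layout (Salary, SalaryRank) is inferred from the query, and the sample rows and rank values are hypothetical.

```python
import sqlite3


def rank_age_counts(employees, salary_map):
    """Join Employee with a SalaryMap relation, then group on the mapped value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Sex TEXT, Salary INTEGER, Age INTEGER)")
    # Assumed map schema: one rank per distinct salary value.
    conn.execute("CREATE TABLE SalaryMap (Salary INTEGER, SalaryRank INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", employees)
    conn.executemany("INSERT INTO SalaryMap VALUES (?, ?)", salary_map)
    result = conn.execute(
        "SELECT SalaryRank, Age, COUNT(*) "
        "FROM Employee, SalaryMap "
        "WHERE Employee.Salary = SalaryMap.Salary "
        "GROUP BY SalaryRank, Age ORDER BY SalaryRank, Age"
    ).fetchall()
    conn.close()
    return result


# Invented sample data: salaries are replaced by their rank before plotting.
employees = [("M", 64000, 57), ("F", 64000, 57), ("F", 52000, 41), ("M", 52000, 57)]
salary_map = [(52000, 1), (64000, 2)]
ranked = rank_age_counts(employees, salary_map)
```

Because the map is a separate relation, changing the ranks only means changing SalaryMap; the Employee rows are never modified.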

Visual Query: Defines the linkages between the data and the display. Pre-defined schema for each visualization: Histogram: {X, Height}, {X, Y, Height}; Scatterplot: {X, Y}, {X, Y, Aggregation}, …; Line, Surface, …

Map: Maps are relations, so: they are applied with relational operators; modifications to maps do not affect base data; multiple maps can be applied to one dataset; one map can be applied to multiple datasets.

Constructing Maps
Standard database operations: Insert statements
Constant values: Insert into MonthMap values (1, 'January')
Calculated values: Insert into MonthMap Select floor(SalesAmount / 1000) as MapValue, * From MonthlySales
Algorithm driven (cluster, binning, etc.)
Mapping Language
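
A minimal sketch of the two insert-based ways of building a map, assuming SQLite and hypothetical table layouts. The calculated map is written to a separate, made-up SalesMap table so that the column counts line up, and `CAST(... AS INTEGER)` stands in for `floor` because the availability of `floor()` in SQLite depends on how it was built (the two agree for non-negative amounts).

```python
import sqlite3


def build_maps(monthly_sales):
    """Build a constant-valued map and a calculated map with plain INSERTs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MonthMap (MapValue INTEGER, Month TEXT)")
    conn.execute("CREATE TABLE MonthlySales (Month TEXT, SalesAmount REAL)")
    # Hypothetical target table for the calculated map.
    conn.execute(
        "CREATE TABLE SalesMap (MapValue INTEGER, Month TEXT, SalesAmount REAL)"
    )

    # Constant values: one INSERT per map entry.
    conn.executemany(
        "INSERT INTO MonthMap VALUES (?, ?)", [(1, "January"), (2, "February")]
    )

    # Calculated values: the map is derived from the data itself.
    conn.executemany("INSERT INTO MonthlySales VALUES (?, ?)", monthly_sales)
    conn.execute(
        "INSERT INTO SalesMap "
        "SELECT CAST(SalesAmount / 1000 AS INTEGER) AS MapValue, * "
        "FROM MonthlySales"
    )

    month_map = conn.execute(
        "SELECT MapValue, Month FROM MonthMap ORDER BY MapValue"
    ).fetchall()
    sales_map = conn.execute(
        "SELECT MapValue, Month, SalesAmount FROM SalesMap ORDER BY MapValue"
    ).fetchall()
    conn.close()
    return month_map, sales_map


# Invented sample data.
month_map, sales_map = build_maps([("January", 2500.0), ("February", 12999.0)])
```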

Mapping Language
Map program: $P = \langle p_1, p_2, \ldots, p_k \rangle$
Based on Datalog; no recursion, no negation
Each rule is written as $\gamma \leftarrow \varphi$, where the body $\varphi$ is a boolean expression and the head $\gamma$ is an expression that evaluates to a numeric value
Functions are allowed

Map Program Rules
Rules are defined over sets of attributes, not specific tables
$\gamma(t)$ is the value of the head of the rule when substituting attribute values from tuple $t$
$\varphi(t)$ is the value of the body of the rule when substituting attribute values from tuple $t$
Two special types of rules: Except $\leftarrow \varphi$ (excludes tuples); $\gamma \leftarrow$ Else (if all preceding rules fail)

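Map Program Interpretations
Given $P = \langle p_1, p_2, \ldots, p_k \rangle$, where rule $p_i$ has head $\gamma_i$ and body $\varphi_i$, and an input instance $s$:
Relational Interpretation: $I_R(P,s) = \{\langle t, \gamma_i(t) \rangle : t \in s \wedge 1 \le i \le k \wedge \varphi_i(t)\}$
Functional Interpretation: $I_F(P,s) = \{\langle t, \gamma_i(t) \rangle : t \in s \wedge 1 \le i \le k \wedge \varphi_i(t) \wedge (\forall j)[1 \le j < i \rightarrow \neg\varphi_j(t)]\}$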
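
The difference between the two interpretations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rules are modelled as hypothetical (head, body) pairs of Python functions, tuples as dictionaries, and results as lists rather than sets so the order is easy to read. The always-true body on the last rule stands in for Else, which matches its meaning only under the functional interpretation.

```python
def relational_interpretation(program, instance):
    """Pair each tuple with the head value of every rule whose body holds."""
    return [
        (t, head(t))
        for t in instance
        for head, body in program
        if body(t)
    ]


def functional_interpretation(program, instance):
    """Pair each tuple with the head value of the first rule whose body holds."""
    result = []
    for t in instance:
        for head, body in program:
            if body(t):
                result.append((t, head(t)))
                break  # earlier rules take priority; later ones are ignored
    return result


# The "Age + 100 if Sex = 'F', otherwise Age" example as (head, body) pairs.
age_program = [
    (lambda t: t["Age"] + 100, lambda t: t["Sex"] == "F"),
    (lambda t: t["Age"], lambda t: True),  # stand-in for the Else rule
]
people = [{"Sex": "F", "Age": 30}, {"Sex": "M", "Age": 40}]

functional_values = [v for _, v in functional_interpretation(age_program, people)]
relational_values = [v for _, v in relational_interpretation(age_program, people)]
```

Under the functional interpretation each tuple receives exactly one value (130 and 40 here), whereas the relational interpretation also emits 30 for the first tuple because both bodies hold for it.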

Properties of the Language: Monotonic (equivalent to a subset of RA); Safe (finite output); Closed (takes a relation as input and returns a relation); Allows composition, e.g. $I_R(P, I_F(Q, I_F(R, s)))$; One implementation supports both interpretations; Complexity: $O(|\text{input}|)$

SQL Approach
Each rule, with head $\gamma$ and body $\varphi$, has an equivalent SQL query: Select $\gamma$ From InputRelation Where $\varphi$
Each program has an equivalent union query: Select $\gamma_1$ From InputRelation Where $\varphi_1$ UNION … UNION Select $\gamma_k$ From InputRelation Where $\varphi_k$
Complexity: $O(|\text{input}| \cdot k)$
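
The translation from a map program to a union query can be sketched as below. The helper name `program_to_sql`, the representation of rules as (head, body) SQL fragments, and the sample rows are assumptions for illustration; the talk does not show how its implementation generates SQL.

```python
import sqlite3


def program_to_sql(rules, relation="InputRelation"):
    """Build the union query for a map program given as (head, body) fragments."""
    # Plain string formatting keeps the sketch short; it is not safe for
    # untrusted input.
    selects = [
        f"SELECT {head} FROM {relation} WHERE {body}" for head, body in rules
    ]
    return " UNION ".join(selects)


# The month example: 1 for January, 2 for February.
month_program = [("1", "Month = 'January'"), ("2", "Month = 'February'")]
union_sql = program_to_sql(month_program)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InputRelation (Month TEXT)")
conn.executemany(
    "INSERT INTO InputRelation VALUES (?)",
    [("January",), ("February",), ("March",), ("January",)],
)
map_values = sorted(conn.execute(union_sql).fetchall())
conn.close()
```

Each of the k branches scans the input once, which is where the extra factor of k in the complexity comes from. Note that UNION removes duplicates, so selecting only the head value, as on the slide, yields the distinct map values rather than one row per input tuple.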

Example Map Programs
First example: $1 \leftarrow$ Month = 'January'; $2 \leftarrow$ Month = 'February'; …
Second example: Age + 100 $\leftarrow$ Sex = 'F'; Age $\leftarrow$ Else

Map Language Usability: Prior studies: early work (Reisner, Welty); SQL vs QBE (Yen and Scamell); SQL vs Visual Language (Catarci et al.). Goal: quantify the usability of the mapping language.

Experiment Design: Subjects: 27 undergraduate students (low skill level), 28 graduate students (medium skill level), 10 professionals (high skill level). All 3 groups attempted the same tasks and were given the same training materials. Half of each group used SQL; the other half used the mapping language.

Results
Accuracy (Within Group): Undergraduates perform better with rules (p < .10). Graduates perform better with rules (p < .10). Note: excludes 3 subjects that did not answer any problems. No difference for professionals.
Satisfaction (Pre to Post Comparison): Undergraduates and professionals satisfied with rules. Everyone not satisfied with SQL (p < .10).
Preference: Professionals preferred the rule language (p < .001).

Results: Interesting trend for satisfaction (5 = best)
Rule Satisfaction (Pre-Test, Post-Test): Undergrad 2.9, 3.0; Grad 3.2; Professional
SQL Satisfaction (Pre-Test, Post-Test): Undergrad 3.2, 2.6; Grad 3.1, 2.5; Professional

System Implementation Queries Database Client Server Map Requests Map Construction

User Interaction: Rotation, Translation, Scaling. Drill-down queries. Select data points for use in other contexts (like brushing). Combining plots: scaled independently or dependently; Overlay, Offset, Tile.

Visualization of Databases: Proof-of-concept application. Mapping based on entropy. Allows insight into the structure of a relation: Functional Dependencies (exact); Approximate Dependencies (almost an FD). Information dependency measures (Dalkilic and Robertson, 2000). Visualizations show every relationship in one plot.

Identifying Relationships Function-Like Functional Dependency

Visualization of Databases
Entropy calculation: $H(A) = \sum_i p_i \log_2(1/p_i)$, where each $p_i$ is the probability of $a_i$ in the active domain of attribute $A$
Gives the average number of bits needed to transmit an $A$ value
$0 \le H(A) \le \log_2(|\mathrm{Adom}(A)|)$
$H(A \rightarrow B) = H(AB) - H(A)$
If $H(A \rightarrow B) = 0$, the FD $A \rightarrow B$ holds
Provides a measure of approximateness
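
The entropy measures on this slide are straightforward to compute from value counts. The sketch below is one possible reading in Python, with probabilities estimated as relative frequencies in the relation; the function names and the sample relation are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H = sum of p_i * log2(1 / p_i) over the distinct values seen."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * log2(n / c) for c in counts.values())


def dependency_entropy(rows, a, b):
    """H(A -> B) = H(AB) - H(A), for column positions a and b."""
    h_ab = entropy([(row[a], row[b]) for row in rows])
    h_a = entropy([row[a] for row in rows])
    return h_ab - h_a


# Invented relation: column 0 determines column 1, but not the other way round.
relation = [("x", 1), ("x", 1), ("y", 2), ("z", 2)]
h_a_to_b = dependency_entropy(relation, 0, 1)
h_b_to_a = dependency_entropy(relation, 1, 0)
```

Here `h_a_to_b` is 0, so the functional dependency from the first column to the second holds, while `h_b_to_a` is 0.5 bits because the value 2 maps to both y and z. The size of that non-zero value is what gives a measure of how far a relationship is from being an exact dependency.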

Comparing Datasets

Summary: Architecture supporting visualization; Mapping as a key element; Mapping language; Usability evaluation; System Implementation. Mission: To leverage database techniques and technologies in order to enhance visualization activities, contributing to measurable improvements in efficiency, effectiveness, and satisfaction of users.

Future Research: Visualization: extensions driven by applications. Data Mining: rule management (information overload). Human Computer Interaction: empirical testing of application manager user interface.

Questions