1 i247: Information Visualization and Presentation Marti Hearst Multidimensional Graphing
2 Today Discuss found visualizations Discuss Polaris paper Introducing the EDA assignment In-class practice with EDA
3 The Polaris Framework Goal: support interactive exploration of multidimensional relational databases –Nice overview of how to combine different standard visualizations into interactive systems. Data types: –Only ordinal and quantitative! Treats intervals as quantitative Assigns an order to all nominal fields (alphabetical) –Ordinal = dimensions = independent variables –Quantitative = measures = dependent variables Supports design principles: –Small simultaneous multiples for comparison –Data-dense display –Allows proper use of “retinal properties” (Bertin) –Cleveland’s idea regarding mapping independent and dependent variables
4 Polaris Paper Two nice examples of exploratory data analysis –Analysts form hypotheses –Create views to confirm or refute If refuted, follow leads to find new hypotheses –Look for different things Trends Anomalies
5 Specifying Table Configurations Operands are the database fields –each operand interpreted as a set {…} –quantitative and ordinal fields interpreted differently Three operators: –concatenation (+), cross (x), nest (/)
6 Table Algebra: Operands Ordinal fields: interpret domain as a set that partitions table into rows and columns: Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} Quantitative fields: treat domain as single element set and encode spatially as axes: Profit = {(Profit[-410,650])}
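The two operand interpretations can be sketched in a few lines of Python. This is only an illustration: the helper names `ordinal` and `quantitative` are made up here and are not part of Polaris or Tableau, and a Python list stands in for the ordered set on the slide.

```python
# Hypothetical helpers for illustration only; not Polaris code.

def ordinal(values):
    """Ordinal field: one single-field tuple per domain value, in order."""
    return [(v,) for v in values]

def quantitative(name, lo, hi):
    """Quantitative field: a single-element set holding the field name
    tagged with its value range (it becomes one axis, not many panes)."""
    return [(f"{name}[{lo},{hi}]",)]

quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
profit = quantitative("Profit", -410, 650)

print(quarter)  # four entries, so four rows or columns
print(profit)   # one entry, so one axis
```

The ordinal field yields one entry per value, while the quantitative field always yields exactly one entry regardless of how many distinct values it has.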
7 Concatenation (+) operator Ordered union of set interpretations: Quarter + ProductType = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} + {(Coffee), (Espresso)} = {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)} Profit + Sales = {(Profit[-310,620]),(Sales[0,1000])}
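A minimal Python sketch of concatenation, assuming each set interpretation is stored as an ordered list of tuples. The function name `concat` is hypothetical, and the slide does not say how duplicate entries are handled, so this sketch simply keeps everything in order.

```python
# Hypothetical sketch; lists stand in for the ordered sets on the slide.

def concat(a, b):
    """Concatenation (+): all entries of a, followed by all entries of b."""
    return list(a) + list(b)

quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Espresso",)]

combined = concat(quarter, product_type)
print(combined)

# Two quantitative fields concatenate into two axes.
measures = concat([("Profit[-310,620]",)], [("Sales[0,1000]",)])
print(measures)
```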
8 Cross (x) operator Cross-product of set interpretations: Quarter x ProductType = {(Qtr1,Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4,Tea)} ProductType x Profit =
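The cross operator can be sketched with `itertools.product`, again assuming set interpretations are ordered lists of tuples. The function name `cross` is made up for this example.

```python
from itertools import product

# Hypothetical sketch; lists stand in for the ordered sets on the slide.

def cross(a, b):
    """Cross (x): pair every entry of a with every entry of b.
    The left operand varies slowest, matching the order on the slide."""
    return [x + y for x, y in product(a, b)]

quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]

crossed = cross(quarter, product_type)
for entry in crossed:
    print(entry)
```

With four quarters and two product types the result has eight entries, one per table pane.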
9 Nest (/) operator Quarter x Month would create twelve entries for each quarter, including meaningless ones such as (Qtr1, December) Quarter / Month would only create three entries per quarter –based on the tuples actually present in the database, not on semantics –can be expensive to compute
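The difference between cross and nest can be sketched as follows. The records are made up for illustration, and `nest` is a hypothetical helper that keeps pairs in first-seen order, which is a simplification. The point is that nest has to scan the data, which is why it can be expensive.

```python
from itertools import product

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Made-up records: one row per month, tagged with its quarter.
rows = [{"Quarter": f"Qtr{i // 3 + 1}", "Month": m}
        for i, m in enumerate(MONTHS)]

def nest(rows, outer, inner):
    """Nest (/): only the (outer, inner) pairs that co-occur in the data."""
    pairs = ((row[outer], row[inner]) for row in rows)
    return list(dict.fromkeys(pairs))  # de-duplicate, keep first-seen order

quarters = ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]
crossed = list(product(quarters, MONTHS))    # every combination
nested = nest(rows, "Quarter", "Month")      # only combinations in the data

print(len(crossed), len(nested))
print(("Qtr1", "December") in crossed, ("Qtr1", "December") in nested)
```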
10 Combining the Data Types Ordinal - Ordinal
11 Combining the Data Types Quant - Quant
12 Combining the Data Types Ordinal - Quantitative
13 Data Transformations Deriving Additional Fields: –Aggregation Sums Averages / Variances / Std. Deviations Min/Max LOTS of other functions (arctan …) –Counting of Ordinal Dimensions CNT(field) –Discrete Partitioning Binning (fixed-sized groups, for creating histograms) Partitioning (ad hoc sizes, good for encoding data) –Ad hoc Grouping The ordinal version of partitioning Choose meaningful groups
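A rough Python sketch of three of these derived-field transformations: aggregation, counting, and fixed-width binning. The records, the field names, and the helper `bin_floor` are invented for the example. In Tableau these are menu operations, not code.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration.
rows = [
    {"ProductType": "Coffee", "Profit": 120},
    {"ProductType": "Coffee", "Profit": -30},
    {"ProductType": "Tea", "Profit": 45},
    {"ProductType": "Tea", "Profit": 75},
    {"ProductType": "Tea", "Profit": 210},
]

# Aggregation and counting: group a measure by an ordinal dimension.
groups = defaultdict(list)
for row in rows:
    groups[row["ProductType"]].append(row["Profit"])

summary = {
    key: {"SUM": sum(vals), "AVG": mean(vals),
          "MIN": min(vals), "MAX": max(vals), "CNT": len(vals)}
    for key, vals in groups.items()
}

# Binning: map each value to the lower edge of its fixed-width bin,
# then count per bin to get histogram heights.
def bin_floor(value, width):
    return (value // width) * width

histogram = defaultdict(int)
for row in rows:
    histogram[bin_floor(row["Profit"], 100)] += 1

print(summary)
print(dict(histogram))
```

Partitioning and ad hoc grouping differ only in that the bin edges or group memberships are chosen by hand instead of computed from a fixed width.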
14 Data Transformations (cont) Sorting and Filtering –Filtering allows for choosing which values to focus on –Sorting helps find trends and anomalies Brushing and Tooltips –Brushing allows for selecting and highlighting interesting points; can then create a new dataset with them. –Tableau/Polaris is missing linking, which usually goes with brushing (it’s high on the to-do list). Linking shows the items brushed in one view as highlighted in the other views Undo and Redo –A key interface capability which is well-supported here.
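Filtering, sorting, brushing, and linking can each be sketched in a line or two of Python. The records and the `brush` helper are made up for the example, and the rectangular brush is just one common selection shape.

```python
# Made-up records for illustration.
rows = [
    {"id": 1, "State": "CA", "Sales": 900, "Profit": 300},
    {"id": 2, "State": "NY", "Sales": 700, "Profit": -50},
    {"id": 3, "State": "TX", "Sales": 150, "Profit": 20},
    {"id": 4, "State": "WA", "Sales": 400, "Profit": 120},
]

# Filtering: keep only the values to focus on.
filtered = [r for r in rows if r["Sales"] >= 200]

# Sorting: order by a measure so trends and outliers stand out.
ranked = sorted(filtered, key=lambda r: r["Profit"])

# Brushing: select the points inside a rectangle in a Sales/Profit view.
def brush(rows, x, y, x_range, y_range):
    return [r for r in rows
            if x_range[0] <= r[x] <= x_range[1]
            and y_range[0] <= r[y] <= y_range[1]]

brushed = brush(rows, "Sales", "Profit", (300, 1000), (100, 400))
brushed_ids = {r["id"] for r in brushed}

# Linking: a second view marks the same records as highlighted.
highlight_in_other_view = [r["id"] in brushed_ids for r in rows]

print([r["State"] for r in ranked])
print(sorted(brushed_ids))
print(highlight_in_other_view)
```

The brushed list is itself a new dataset, and linking amounts to sharing the selected record ids between views.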
15 Querying the Database
16 Assignment Exploratory Data Analysis –Choose a dataset –Formulate hypotheses –Test these hypotheses and also explore the dataset using visualization tool(s) Tableau and optionally others of your choosing –We’ll supply some datasets or you can use your own –You can work in pairs but not in larger groups Due Monday February 25 (2.5 weeks)
17 EDA Practice Data from UCB Career Center What jobs do graduates get, grouped by major area?