Published by Janis Preston. Modified over 9 years ago.
1
LECTURE 03: DATA COLLECTION AND MODELS
February 4, 2015
COMP 150-04: Topics in Visual Analytics
Note: slide deck adapted from R. Chang, Fall 2010
2
Announcements
- Course location has moved: Halligan 102
- Assignment 1 is posted on the course website
- If you haven’t yet installed RStudio: http://www.rstudio.com/products/rstudio/download/
- To download the materials for today’s demo: http://www.cs.tufts.edu/comp/150VAN/demos/Stats-with-R.Rmd
3
Outline
- Reminder: post on “Illuminating the Path”
- Recap: Keim’s VA Model
- Data Foundations
  - Basic Data Types
  - Dimensionality
- Metadata: “data about data”
- Structure vs. Value
  - Value
  - Derived Value
  - Derived Structure
  - Structure
4
Reminder: thoughts on “Illuminating the Path”
- What did you think?
- Who is the intended audience? (…is it us?)
- Do the goals make sense to you?
- Is anything missing?
- From what you see in the world, how far along this agenda have we come since 2006?
5
Recap: Keim’s Visual Analytics Model
[Figure: Keim’s visual analytics model. Recoverable labels: input, pre-process, interactions; data types, dimensionality, metadata, structure vs. value, statistical models in R.]
Image source: Keim, Daniel, et al. Visual analytics: Definition, process, and challenges. Springer Berlin Heidelberg, 2008.
6
Data: a definition
A typical dataset in visualization consists of n records: (r_1, r_2, r_3, …, r_n)
Each record r_i consists of m >= 1 observations or variables: (v_1, v_2, v_3, …, v_m)
A variable may be either independent or dependent:
- An independent variable (iv) is not controlled or affected by another variable (e.g., time in a time-series dataset)
- A dependent variable (dv) is affected by a variation in one or more associated independent variables (e.g., temperature in a region)
Formal definition:
- r_i = (iv_1, iv_2, iv_3, …, iv_{m_i}, dv_1, dv_2, dv_3, …, dv_{m_d})
- where m = m_i + m_d
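The formal definition above can be sketched in Python. This is only an illustration: the `Record` class, its field names, and the hour/temperature values are assumptions made for the example, not part of the lecture.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One record r_i: independent variables first, then dependent ones."""
    iv: tuple  # independent variables (iv_1, ..., iv_{m_i})
    dv: tuple  # dependent variables (dv_1, ..., dv_{m_d})

    @property
    def m(self) -> int:
        # Total number of variables in the record: m = m_i + m_d
        return len(self.iv) + len(self.dv)


# Hypothetical time series: hour of day (independent) and
# temperature in degrees Celsius (dependent).
dataset = [
    Record(iv=(0,), dv=(11.5,)),
    Record(iv=(6,), dv=(9.8,)),
    Record(iv=(12,), dv=(17.2,)),
]

n = len(dataset)  # number of records
```

Here `n` is 3 and every record has `m` equal to 2 (one independent plus one dependent variable).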
7
Basic Data Types: Nominal
Types: Nominal; Ordinal; Scale / Quantitative (Ratio, Interval)
Nominal: an unordered set of non-numeric values
Examples:
- Categorical (finite) data
  - {apple, orange, pear}
  - {red, green, blue}
- Arbitrary (infinite) data
  - {“12 Main St. Boston MA”, “45 Wall St. New York NY”, …}
  - {“John Smith”, “Jane Doe”, …}
8
Basic Data Types: Ordinal
Types: Nominal; Ordinal; Scale / Quantitative (Ratio, Interval)
Ordinal: an ordered set (also known as a tuple)
Examples: Numeric: Binary: Non-numeric:
9
Basic Data Types: Scale / Quantitative
Types: Nominal; Ordinal; Scale / Quantitative (Ratio, Interval)
Scale / Quantitative: a numeric range
Ratios
- Distance from “absolute zero”
- Can be compared mathematically using division
- For example: height, weight
Intervals
- Ordered numeric elements that can be mathematically manipulated, but cannot be compared as ratios
- E.g.: date, current time
10
Basic Data Types (Formal)
- Nominal (N): {…}
- Ordinal (O): <…>
- Scale / Quantitative (Q): […]
Conversions:
- Q → O: [0, 100] → <…>
- O → N: <…> → {C, B, F, D, A}
- N → O (??): {John, Mike, Bob} → <…>, {red, green, blue} → <…> ??
- O → Q (??): Hashing? Bob + John = ??
Source: Readings in Information Visualization: Using Vision to Think. Card, Mackinlay, Shneiderman, 1999
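The Q → O → N chain on this slide can be sketched in Python. The grade cut-offs, the sample scores, and the names `score_to_grade` and `GRADE_ORDER` are assumptions chosen for illustration; the slides do not specify them.

```python
# Assumed ordering of letter grades, lowest to highest.
GRADE_ORDER = ("F", "D", "C", "B", "A")


def score_to_grade(score: float) -> str:
    """Q -> O: bin a score in [0, 100] into an ordered letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    # Cut-offs are illustrative assumptions, not taken from the lecture.
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"


scores = [95, 72, 58, 88, 64]  # quantitative values (Q)

# Q -> O: distinct grades, kept in their meaningful order.
grades_ordinal = sorted({score_to_grade(s) for s in scores},
                        key=GRADE_ORDER.index)

# O -> N: a plain set forgets the order, leaving only labels.
grades_nominal = set(grades_ordinal)

# N -> O (??): sorting names imposes *an* order (alphabetical),
# but that order carries no meaning in the data itself.
names_ordered = sorted({"John", "Mike", "Bob"})
```

Each step toward N discards information (magnitude, then order), which is why those directions are straightforward. Going the other way requires inventing information, as the arbitrary alphabetical ordering of names shows.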
11
Operations on Basic Data Types
What are the operations that we can perform on these data types?
- Nominal (N): = and ≠
- Ordinal (O): >, <, ≥, ≤
- Scale / Quantitative (Q): everything else (+, -, *, /, etc.)
Consider a distance function
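One way to make these permitted operations concrete is to encode each data type so that only its operations work. The sketch below is an illustration under assumptions: the `Fruit` and `Grade` enums and both distance functions are hypothetical examples, and a distance for ordinal data is deliberately left out, since the slide poses it as an open question.

```python
from enum import Enum
from functools import total_ordering


class Fruit(Enum):
    """Nominal: only equality and inequality are meaningful."""
    APPLE = "apple"
    ORANGE = "orange"
    PEAR = "pear"


@total_ordering
class Grade(Enum):
    """Ordinal: comparisons are meaningful, arithmetic is not."""
    F = 0
    D = 1
    C = 2
    B = 3
    A = 4

    def __lt__(self, other):
        if not isinstance(other, Grade):
            return NotImplemented
        return self.value < other.value


def nominal_distance(a, b) -> int:
    """With only = and !=, the natural distance is 0 (same) or 1 (different)."""
    return 0 if a == b else 1


def quantitative_distance(a: float, b: float) -> float:
    """Quantitative values support arithmetic, so |a - b| is well defined."""
    return abs(a - b)


same_fruit = Fruit.APPLE == Fruit.APPLE      # allowed on nominal data
better = Grade.B > Grade.C                   # allowed on ordinal data
gap = quantitative_distance(1.5, 4.0)        # allowed on quantitative data
```

Comparing two `Fruit` members with `<` raises a `TypeError`, and adding two `Grade` members does too, which mirrors the restriction on each type.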
12
Dimensionality
- Scalar: a single value (0D array)
- Vector: a collection of scalars (1D array)
- Matrix: a collection of vectors (2D array)
- Tensor: a collection of matrices (3+D array)
Think of a cube:
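As a small illustration, the four levels can be written as nested Python lists. The helper `ndim` is a hypothetical name, and it assumes non-empty, rectangular nesting.

```python
def ndim(x) -> int:
    """Count nesting depth: 0 for a scalar, 1 for a vector, and so on."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        x = x[0]  # assumes non-empty, rectangular nesting
    return depth


scalar = 7                          # 0D
vector = [1, 2, 3]                  # 1D: a collection of scalars
matrix = [[1, 2, 3], [4, 5, 6]]     # 2D: a collection of vectors
tensor = [matrix, matrix]           # 3D: a collection of matrices (2 x 2 x 3)

dims = [ndim(v) for v in (scalar, vector, matrix, tensor)]
```

Each level simply wraps a collection of the level below it, so `dims` counts up from 0 to 3.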
13
Operations on Multidimensional Data
- Slice
  - Selects a subset of the original nD cube
  - Result set could be of any dimensionality
- Roll up (consolidate)
  - Creates a hierarchy based on the data
  - Same as clustering
- Drill down
  - Expand a cluster
- Pivot
  - Changes the orientation of the cube
Combine with the 4 basic SQL commands: SELECT, UPDATE, INSERT, DELETE
Adapted from Wikipedia: OLAP Cube
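The four cube operations can be sketched on a tiny in-memory cube. Everything here is an assumption for illustration: the sales data, the dictionary representation keyed by (year, region, product), and the function names. Roll up is shown in its simplest form, summing away one dimension, rather than building a full hierarchy.

```python
from collections import defaultdict

# Hypothetical sales cube: (year, region, product) -> units sold
cube = {
    (2014, "East", "apple"): 10,
    (2014, "East", "pear"): 4,
    (2014, "West", "apple"): 7,
    (2015, "East", "apple"): 12,
    (2015, "West", "pear"): 5,
}


def slice_cube(cube, axis, value):
    """Slice: fix one dimension to a value and keep the remaining ones."""
    return {k[:axis] + k[axis + 1:]: v
            for k, v in cube.items() if k[axis] == value}


def roll_up(cube, axis):
    """Roll up: consolidate by summing over one dimension."""
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[k[:axis] + k[axis + 1:]] += v
    return dict(totals)


def drill_down(cube, axis, rolled_key):
    """Drill down: recover the detailed cells behind one rolled-up cell."""
    return {k: v for k, v in cube.items()
            if k[:axis] + k[axis + 1:] == rolled_key}


def pivot(cube, order):
    """Pivot: reorder the dimensions of every key."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}


only_2014 = slice_cube(cube, axis=0, value=2014)      # 2D result
by_year_region = roll_up(cube, axis=2)                # product summed away
east_2014_detail = drill_down(cube, 2, (2014, "East"))
product_first = pivot(cube, order=(2, 0, 1))          # (product, year, region)
```

Slicing and rolling up both reduce the cube from three dimensions to two, while pivoting keeps every cell and only changes how it is addressed.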
14
Examples – Roll up and Drill down
15
Metadata
- Defined as “data about data”
- Introduced by Lisa Tweedie at CHI 1997 (“Characterizing Interactive Externalizations”)
- Extends Bertin’s original concept of data values and data structures:
  - Values (low level): variables relevant to a problem
  - Structures (high level): relations that characterize the data as a whole (e.g., links, equations, constraints)
16
Metadata – 4 Relationships
1. Values → Derived Values
2. Values → Derived Structure
3. Structure → Derived Values
4. Structure → Derived Structure
Derived Values example: average
Derived Structure example: sorting a list of variables
17
Values → Derived Values → Derived Structure
- Values: a (text) document corpus
- Derived Values: compute the similarities between the documents
- Derived Structure: apply multi-dimensional scaling to plot the documents in a spatial view
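A minimal sketch of this pipeline follows, under stated assumptions: the three toy documents are invented, similarity is taken to be cosine similarity over word counts (the slide does not name a measure), and a nearest-neighbour lookup stands in for multi-dimensional scaling as a much simpler derived structure.

```python
import math
from collections import Counter
from itertools import combinations

# Values: a tiny, made-up document corpus.
docs = {
    "d1": "visual analytics combines visualization and analysis",
    "d2": "interactive visualization supports visual analysis",
    "d3": "the weather in boston is cold in february",
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


# Derived values: pairwise similarities between documents.
bags = {name: Counter(text.split()) for name, text in docs.items()}
similarity = {
    frozenset(pair): cosine(bags[pair[0]], bags[pair[1]])
    for pair in combinations(docs, 2)
}


def nearest(name: str) -> str:
    """Return the document most similar to `name`."""
    others = [o for o in docs if o != name]
    return max(others, key=lambda o: similarity[frozenset((name, o))])


# Derived structure (stand-in for MDS): each document's nearest neighbour.
neighbours = {name: nearest(name) for name in docs}
```

A real spatial view would feed the full similarity matrix into a layout algorithm such as multi-dimensional scaling; the neighbour map only shows that structure can be derived from the computed values.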
18
Values → Derived Values → Derived Structure IN-SPIRE by PNNL
19
Structure → Derived Structure → Derived Values
- Structure: a tabular layout of individuals’ relationships with each other
- Derived Structure: convert the tabular structure to a graph
- Derived Values: compute centrality to identify the importance of each individual in this social network
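These three steps can be sketched as follows. The relationship table and names are invented, and degree centrality is assumed as the centrality measure, since the slide does not say which one is meant.

```python
from collections import defaultdict

# Structure: hypothetical table of relationships, one row per pair.
table = [
    ("Ann", "Bob"),
    ("Ann", "Cat"),
    ("Ann", "Dan"),
    ("Bob", "Cat"),
]

# Derived structure: an undirected graph stored as adjacency sets.
graph = defaultdict(set)
for a, b in table:
    graph[a].add(b)
    graph[b].add(a)

# Derived values: degree centrality = degree / (n - 1),
# i.e. the fraction of other people each person is linked to.
n = len(graph)
centrality = {person: len(nbrs) / (n - 1) for person, nbrs in graph.items()}

most_central = max(centrality, key=centrality.get)
```

In this toy network Ann is connected to everyone else, so she has the highest centrality (1.0).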
20
Structure → Derived Structure → Derived Values Image taken from: http://beth.typepad.com/beths_blog/2009/12
21
Questions / Comments?
22
Guest speaker Maja Milosavljevic “Statistical Analysis with R”
23
For next week
- Assignment 1 due before class on Monday
- Wednesday: several VIPs coming in to pitch datasets for final projects
- Start thinking about a topic you might like to explore!
- Need help? Talk to Jordan