LECTURE 03: DATA COLLECTION AND MODELS February 4, 2015 COMP 150-04 Topics in Visual Analytics Note: slide deck adapted from R. Chang, Fall 2010.



Announcements
- Course location has moved: Halligan 102
- Assignment 1 posted on course website
- If you haven’t yet installed RStudio:
- To download the materials for today’s demo:

Outline
- Reminder: post on “Illuminating the Path”
- Recap: Keim’s VA Model
- Data Foundations
  - Basic Data Types
  - Dimensionality
- Metadata: “data about data”
- Structure vs. Value
  - Value
  - Derived Value
  - Derived Structure
  - Structure

Reminder: thoughts on “Illuminating the Path”
- What did you think?
- Who is the intended audience? (…is it us?)
- Do the goals make sense to you?
- Is anything missing?
- From what you see in the world, how far along this agenda have we come since 2006?

Recap: Keim’s Visual Analytics Model
[Figure: Keim’s visual analytics model; recoverable labels: input, pre-process, interactions]
Image source: Keim, Daniel, et al. Visual analytics: Definition, process, and challenges. Springer Berlin Heidelberg.
Topics placed on the model: Data types, Dimensionality, Metadata, Structure vs. Value, Statistical Models in R

Data: a definition
- A typical dataset in visualization consists of n records: $(r_1, r_2, r_3, \ldots, r_n)$
- Each record $r_i$ consists of $m \geq 1$ observations or variables: $(v_1, v_2, v_3, \ldots, v_m)$
- A variable may be either independent or dependent:
  - An independent variable (iv) is not controlled or affected by another variable (e.g., time in a time-series dataset)
  - A dependent variable (dv) is affected by a variation in one or more associated independent variables (e.g., temperature in a region)
- Formal definition:
  - $r_i = (iv_1, iv_2, iv_3, \ldots, iv_{m_i}, dv_1, dv_2, dv_3, \ldots, dv_{m_d})$
  - where $m = m_i + m_d$
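A minimal Python sketch of the record definition above may help make it concrete. The `Record` class, the field names, and the hourly temperature readings are all illustrative assumptions, not part of the lecture materials.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Record:
    """One record r_i: independent variables followed by dependent variables."""
    iv: Tuple[float, ...]  # independent variables, length m_i
    dv: Tuple[float, ...]  # dependent variables, length m_d

    @property
    def m(self) -> int:
        # Total number of variables in the record: m = m_i + m_d
        return len(self.iv) + len(self.dv)


# Hypothetical time series: hour of day (iv) and temperature in deg C (dv).
dataset = [
    Record(iv=(0.0,), dv=(11.5,)),
    Record(iv=(6.0,), dv=(9.8,)),
    Record(iv=(12.0,), dv=(17.2,)),
]

n = len(dataset)  # number of records
print(n, dataset[0].m)
```

Here time is the independent variable and temperature the dependent one, matching the two examples given on the slide.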

Basic Data Types: Nominal
(Taxonomy: Nominal | Ordinal | Scale / Quantitative: Ratio, Interval)
An unordered set of non-numeric values
Examples:
- Categorical (finite) data
  - {apple, orange, pear}
  - {red, green, blue}
- Arbitrary (infinite) data
  - {“12 Main St. Boston MA”, “45 Wall St. New York NY”, …}
  - {“John Smith”, “Jane Doe”, …}

Basic Data Types: Ordinal
(Taxonomy: Nominal | Ordinal | Scale / Quantitative: Ratio, Interval)
An ordered set (also known as a tuple)
Examples:
- Numeric:
- Binary:
- Non-numeric:

Basic Data Types: Scale / Quantitative
(Taxonomy: Nominal | Ordinal | Scale / Quantitative: Ratio, Interval)
A numeric range
- Ratios
  - Distance from “absolute zero”
  - Can be compared mathematically using division
  - For example: height, weight
- Intervals
  - Ordered numeric elements that can be mathematically manipulated, but cannot be compared as ratios
  - E.g.: date, current time
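The ratio versus interval distinction can be illustrated with Python's standard `datetime` module, which happens to enforce it: two dates can be subtracted, but dividing one date by another is not defined. The heights and dates below are made-up values for illustration.

```python
from datetime import date

# Ratio data: a true zero exists, so division is meaningful.
height_a_cm, height_b_cm = 180.0, 90.0
height_ratio = height_a_cm / height_b_cm  # "a is twice as tall as b"

# Interval data: differences are meaningful...
d1, d2 = date(2015, 2, 4), date(2015, 1, 28)
gap = d1 - d2  # a timedelta

# ...but ratios are not; Python refuses to divide one date by another.
try:
    d1 / d2
    date_ratio_defined = True
except TypeError:
    date_ratio_defined = False

print(height_ratio, gap.days, date_ratio_defined)
```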

Basic Data Types (Formal)
- Nominal (N): {…}
- Ordinal (O):
- Scale / Quantitative (Q): […]
Conversions between types:
- Q → O: [0, 100] →
- O → N: → {C, B, F, D, A}
- N → O (??): {John, Mike, Bob} → ; {red, green, blue} → ??
- O → Q (??): Hashing? Bob + John = ??
Source: Readings in Information Visualization: Using Vision To Think. Card, Mackinlay, Shneiderman, 1999
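As a rough sketch of the Q → O → N direction, the Python below bins a numeric score in [0, 100] into a letter grade and then drops the ordering. The grade order and the cut-offs (60, 70, 80, 90) are assumptions chosen for illustration; the slide does not specify them.

```python
# Assumed ordering of grades, lowest to highest (not given on the slide).
GRADE_ORDER = ("F", "D", "C", "B", "A")
# Assumed lower bounds for D, C, B, A respectively.
CUTOFFS = (60, 70, 80, 90)


def score_to_grade(score: float) -> str:
    """Q -> O: map a score in [0, 100] onto an ordered grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    rank = sum(score >= cutoff for cutoff in CUTOFFS)
    return GRADE_ORDER[rank]


scores = [95, 72, 58, 88, 61]
grades = [score_to_grade(s) for s in scores]

# Ordinal view: distinct grades, kept in their agreed order.
ordered = sorted(set(grades), key=GRADE_ORDER.index)

# O -> N: a plain set forgets the order; only = and != remain meaningful.
nominal = set(grades)

print(grades, ordered, nominal)
```

Going the other way (N → O or O → Q) has no such mechanical recipe, which is the point of the question marks on the slide: some ordering or numeric scale has to be imposed from outside the data.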

Operations on Basic Data Types
What are the operations that we can perform on these data types?
- Nominal (N): = and ≠
- Ordinal (O): >, <, ≥, ≤
- Scale / Quantitative (Q): everything else (+, -, *, /, etc.)
Consider a distance function
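One way to think about a distance function for each type is sketched below in Python. The function names are hypothetical. The nominal and quantitative versions use only the operations listed above; the ordinal version measures the gap in rank, which is an added assumption, since comparison operators alone do not say how far apart two ordered values are.

```python
from typing import Hashable, Sequence


def nominal_distance(a: Hashable, b: Hashable) -> int:
    """Only equality is available: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1


def ordinal_distance(a: Hashable, b: Hashable, order: Sequence[Hashable]) -> int:
    """Assumption: use the difference in rank within an agreed ordering."""
    return abs(order.index(a) - order.index(b))


def quantitative_distance(a: float, b: float) -> float:
    """Arithmetic is available, so use the absolute difference."""
    return abs(a - b)


sizes = ["small", "medium", "large"]
print(nominal_distance("apple", "pear"))
print(ordinal_distance("small", "large", sizes))
print(quantitative_distance(3.5, 1.0))
```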

Dimensionality
- Scalar: a single value (0D array)
- Vector: a collection of scalars (1D array)
- Matrix: a collection of vectors (2D array)
- Tensor: a collection of matrices (3+D array)
Think of a cube:
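The nesting described above can be sketched with plain Python lists. The helper `ndim` is a hypothetical name; it assumes regular, non-empty nesting and simply counts how many list levels wrap a value.

```python
def ndim(x) -> int:
    """Count list nesting depth: 0 for a scalar, 1 for a vector, and so on.

    Assumes every level is non-empty and regularly shaped.
    """
    depth = 0
    while isinstance(x, list):
        depth += 1
        x = x[0]
    return depth


scalar = 5.0
vector = [1, 2, 3]                              # collection of scalars
matrix = [[1, 2], [3, 4]]                       # collection of vectors
tensor = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # collection of matrices (a 2x2x2 cube)

print(ndim(scalar), ndim(vector), ndim(matrix), ndim(tensor))
```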

Operations on Multidimensional Data
- Slice
  - Selects a subset of the original nD cube
  - Result set could be of any dimensionality
- Roll up (consolidate)
  - Creates a hierarchy based on the data
  - Same as clustering
- Drill down
  - Expand a cluster
- Pivot
  - Changes the orientation of the cube
- Combine with the 4 basic SQL commands: SELECT, UPDATE, INSERT, DELETE
Adapted from Wikipedia: OLAP Cube
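The four cube operations can be sketched in plain Python over a tiny, made-up sales cube. The dimension names, the data, and the helper functions are all hypothetical, and roll up is modeled here as summation over the dropped dimensions, which is only one of several possible consolidation rules.

```python
from collections import defaultdict

DIMS = ("region", "product", "quarter")

# Hypothetical 3D cube: (region, product, quarter) -> units sold.
cube = {
    ("East", "apple", "Q1"): 10,
    ("East", "apple", "Q2"): 12,
    ("East", "pear", "Q1"): 4,
    ("West", "apple", "Q1"): 7,
    ("West", "pear", "Q2"): 9,
}


def slice_cube(cube, dim, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    i = DIMS.index(dim)
    return {key: v for key, v in cube.items() if key[i] == value}


def roll_up(cube, keep):
    """Roll up: sum away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[i] for i in idx)] += v
    return dict(totals)


def pivot(cube, order):
    """Pivot: reorder the dimensions without changing any value."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(key[i] for i in idx): v for key, v in cube.items()}


east = slice_cube(cube, "region", "East")
by_region = roll_up(cube, ["region"])
# Drill down: expand the regional totals by one more dimension.
by_region_product = roll_up(cube, ["region", "product"])
flipped = pivot(cube, ["quarter", "product", "region"])

print(by_region)
print(by_region_product)
```

Drill down is expressed as a roll up that keeps one more dimension, which reflects the idea that the two operations move in opposite directions along the same hierarchy.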

Examples – Roll up and Drill down

Metadata
- Defined as “data about data”
- Introduced by Lisa Tweedie at CHI 1997 (“Characterizing Interactive Externalizations”)
- Extends Bertin’s original concept of data values and data structures:
  - Values (low level): variables relevant to a problem
  - Structures (high level): relations that characterize the data as a whole (e.g., links, equations, constraints)

Metadata – 4 Relationships
1. Values → Derived Values
2. Values → Derived Structure
3. Structure → Derived Values
4. Structure → Derived Structure
Derived Values Example: average
Derived Structure Example: sorting a list of variables
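The two examples named on this slide can be sketched in a few lines of Python. The input numbers are made up; the derived structure is represented here as the ordering of indices, which is one possible way to record "a sorted list" without altering the original values.

```python
from statistics import mean

values = [4.0, 1.5, 3.0, 2.5]  # hypothetical raw values

# Values -> Derived Values: a summary number computed from the raw values.
derived_value = mean(values)

# Values -> Derived Structure: an ordering imposed on the raw values,
# stored as the indices of the values from smallest to largest.
derived_structure = sorted(range(len(values)), key=values.__getitem__)

print(derived_value, derived_structure)
```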

Values → Derived Values → Derived Structure
- Values: a (text) document corpus
- Derived Values: compute the similarities between the documents
- Derived Structure: apply multidimensional scaling to plot the documents in a spatial view
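A rough Python sketch of the derived-values step follows. It assumes cosine similarity over raw word counts as the similarity measure, which the slide does not specify, and uses a three-document toy corpus. The layout step (multidimensional scaling) is deliberately left out; the similarity matrix computed here is the kind of input such a layout would consume.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two documents."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Values: a hypothetical toy corpus.
corpus = [
    "visual analytics of data",
    "visual analytics of text",
    "stock market report",
]

# Derived values: pairwise similarities between documents.
similarity = [[cosine_similarity(a, b) for b in corpus] for a in corpus]

for row in similarity:
    print([round(s, 2) for s in row])
```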

Values → Derived Values → Derived Structure IN-SPIRE by PNNL

Structure → Derived Structure → Derived Values
- Structure: a tabular layout of individuals’ relationships with each other
- Derived Structure: convert the tabular structure to a graph
- Derived Values: compute centrality to identify the importance of the individual in this social network
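This pipeline can be sketched in Python with a small, made-up relationship table. The names and links are hypothetical, and degree centrality (the share of other people a person is directly linked to) is assumed as the centrality measure, since the slide does not say which one is meant.

```python
people = ["Ann", "Bo", "Cy", "Di"]

# Structure: a symmetric table, 1 where two people know each other.
table = [
    [0, 1, 1, 1],  # Ann
    [1, 0, 1, 0],  # Bo
    [1, 1, 0, 0],  # Cy
    [1, 0, 0, 0],  # Di
]

# Derived structure: the same relationships as a graph (adjacency lists).
graph = {
    person: [people[j] for j, linked in enumerate(row) if linked]
    for person, row in zip(people, table)
}

# Derived values: degree centrality, normalized by the n - 1 possible links.
centrality = {
    person: len(neighbors) / (len(people) - 1)
    for person, neighbors in graph.items()
}

most_central = max(centrality, key=centrality.get)
print(graph)
print(most_central, centrality[most_central])
```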

Structure → Derived Structure → Derived Values Image taken from:

Questions / Comments?

Guest speaker Maja Milosavljevic “Statistical Analysis with R”

For next week
- Assignment 1 due before class on Monday
- Wednesday: Several VIPs coming in to pitch datasets for final projects
- Start thinking about a topic you might like to explore!
- Need help? Talk to Jordan