Chapter 2 Getting to Know Your Data Yubao (Robert) Wu Georgia State University.

Slides:



Advertisements
Similar presentations
Describing Quantitative Variables
Advertisements

DESCRIBING DISTRIBUTION NUMERICALLY
Lesson Describing Distributions with Numbers parts from Mr. Molesky’s Statmonkey website.
Measures of Dispersion boxplots. RANGE difference between highest and lowest value; gives us some idea of how much variation there is in the categories.
Unit 1.1 Investigating Data 1. Frequency and Histograms CCSS: S.ID.1 Represent data with plots on the real number line (dot plots, histograms, and box.
Statistics It is the science of planning studies and experiments, obtaining sample data, and then organizing, summarizing, analyzing, interpreting data,
Exploratory Data Analysis (Descriptive Statistics)
Sullivan – Statistics: Informed Decisions Using Data – 2 nd Edition – Chapter 3 Introduction – Slide 1 of 3 Topic 16 Numerically Summarizing Data- Averages.
Measures of Central Tendency MARE 250 Dr. Jason Turner.
Numerically Summarizing Data
Statistics Lecture 2. Last class began Chapter 1 (Section 1.1) Introduced main types of data: Quantitative and Qualitative (or Categorical) Discussed.
Data Mining: Concepts and Techniques — Chapter 2 —
Basic Practice of Statistics - 3rd Edition
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Chapter 2 Describing Data with Numerical Measurements
Agresti/Franklin Statistics, 1 of 63 Chapter 2 Exploring Data with Graphs and Numerical Summaries Learn …. The Different Types of Data The Use of Graphs.
Summarizing and Displaying Data Chapter 2. Goals for Chapter 2 n To illustrate: – – A summary of numerical data is more easily comprehended than the list.
Statistics for Linguistics Students Michaelmas 2004 Week 1 Bettina Braun.
Describing distributions with numbers
Chapter 1 Descriptive Analysis. Statistics – Making sense out of data. Gives verifiable evidence to support the answer to a question. 4 Major Parts 1.Collecting.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics.
COMP5331: Knowledge Discovery and Data Minig
Description and measurement
1 Laugh, and the world laughs with you. Weep and you weep alone.~Shakespeare~
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining Basics: Data Remark: Discusses “basics concerning data sets (first half of Chapter.
Chapter 2 Describing Data.
Skewness & Kurtosis: Reference
Data Preprocessing Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010.
1 Univariate Descriptive Statistics Heibatollah Baghi, and Mastee Badii George Mason University.
INVESTIGATION 1.
1 Further Maths Chapter 2 Summarising Numerical Data.
Statistics Chapter 1: Exploring Data. 1.1 Displaying Distributions with Graphs Individuals Objects that are described by a set of data Variables Any characteristic.
Measurement Variables Describing Distributions © 2014 Project Lead The Way, Inc. Computer Science and Software Engineering.
Chapter 2: Getting to Know Your Data
Chapter 5: Boxplots  Objective: To find the five-number summaries of data and create and analyze boxplots CHS Statistics.
BPS - 5th Ed. Chapter 21 Describing Distributions with Numbers.
Notes Unit 1 Chapters 2-5 Univariate Data. Statistics is the science of data. A set of data includes information about individuals. This information is.
Statistics topics from both Math 1 and Math 2, both featured on the GHSGT.
LIS 570 Summarising and presenting data - Univariate analysis.
Numerical descriptions of distributions
Statistics with TI-Nspire™ Technology Module E Lesson 1: Elementary concepts.
Tallahassee, Florida, 2016 CIS4930 Introduction to Data Mining Getting to Know Your Data Peixiang Zhao.
ALL ABOUT THAT DATA UNIT 6 DATA. LAST PAGE OF BOOK: MEAN MEDIAN MODE RANGE FOLDABLE Mean.
1 By maintaining a good heart at every moment, every day is a good day. If we always have good thoughts, then any time, any thing or any location is auspicious.
Descriptive Statistics
Data Preliminaries CSC 600: Data Mining Class 1.
Chapter 2: Getting to Know Your Data
Data Mining: Concepts and Techniques Data Understanding
Lecture Notes for Chapter 2 Introduction to Data Mining
Data Mining: Concepts and Techniques
COP 6726: New Directions in Database Systems
Description of Data (Summary and Variability measures)
Common Core Math I Unit 6 One-Variable Statistics Introduction
Understanding Data Characteristics
Understanding Basic Characteristics of Data
Common Core Math I Unit 6 One-Variable Statistics Introduction
Data Mining: Concepts and Techniques Data Understanding
Basic Statistical Terms
Common Core Math I Unit 6 One-Variable Statistics Introduction
Shape of Distributions
Data Mining: Concepts and Techniques — Chapter 2 —
Welcome!.
Measures of Center and Spread
Descriptive Statistics
Lesson – Teacher Notes Standard:
Data Preliminaries CSC 576: Data Mining.
Ten things about Descriptive Statistics
Describing Data Coordinate Algebra.
Lesson Plan Day 1 Lesson Plan Day 2 Lesson Plan Day 3
Presentation transcript:

Chapter 2 Getting to Know Your Data Yubao (Robert) Wu Georgia State University

Chapter 2 Getting to Know Your Data Data Objects and Attribute Types Basic Statistical Descriptions of Data Data Visualization Measuring Data Similarity and Dissimilarity

Types of Data —— Records Term-frequency vectors (text documents) gamescoreteamlostseason Doc Doc Doc ⋯⋯⋯⋯⋯⋯ Transaction ID Items 1 Bread, Coke, Milk 2 Beer, Bread 3 Beer, Coke, Diaper, Milk 4 Beer, Bread, Diaper, Milk 5 Coke, Diaper, Milk ⋯ ⋯ Transaction data

Types of Data —— Graphs TwitterFacebookWorld Wide Web [1] google.com/insidesearch/howsearchworks/thestory/ [2] [3] Billion60 Trillion320 Million Active Users Individual Pages

Types of Data —— Ordered Time-series Data Time {Computer, Keyboard, Mouse} {Printer} {Toner Cartridges} {MS Xbox} Transaction Sequence

Types of Data —— Spatial Seismic SensorsLocation – Smart Phone Spatial-Temporal Data

Types of Data —— Multimedia Image Segmentation Sunspot Image Brain CT Scan Images

Types of Data —— Data Objects Data objects are described by attributes. Attribute represents a feature of a data object (Dimension, Feature, Variable) Database Data Object (Person) Attribute 1 (Name) Attribute 2 (Gender) Attribute 3 (Grade) Attribute 4 (Score) ⋯ 1DavidMaleA+98 ⋯ 2MikeMaleA95 ⋯ 3SusanFemaleB+87 ⋯ 4LisaFemaleA-91 ⋯ 5RobMaleA93 ⋯ ⋯⋯⋯⋯⋯⋯

Types of Data —— Attributes Database Data Object (Person) Attribute 1 (Name) Attribute 2 (Gender) Attribute 3 (Grade) Attribute 4 (Score) ⋯ 1DavidMaleA+98 ⋯ 2MikeMaleA95 ⋯ 3SusanFemaleB+87 ⋯ 4LisaFemaleA-91 ⋯ 5RobMaleA93 ⋯ ⋯⋯⋯⋯⋯⋯ Nominal: categories, states, or “names of things” Binary: Nominal attribute with only 2 states (0 and 1)

Ordinal: Values have a meaningful order (ranking) but magnitude between successive values is not known. Numeric: Quantity (integer or real-valued) Types of Data —— Attributes Database Data Object (Person) Attribute 1 (Name) Attribute 2 (Gender) Attribute 3 (Grade) Attribute 4 (Score) ⋯ 1DavidMaleA+98 ⋯ 2MikeMaleA95 ⋯ 3SusanFemaleB+87 ⋯ 4LisaFemaleA-91 ⋯ 5RobMaleA93 ⋯ ⋯⋯⋯⋯⋯⋯

Discrete vs. Continuous Attributes Discrete Attribute: Database Data Object (Person) Attribute 1 (City) Attribute 2 (Zip Code) Attribute 3 (Tempara.) ⋯ 1Atlanta ⋯ 2Cleveland ⋯ 3NYC ⋯ 4LA ⋯ 5Seattle ⋯ ⋯⋯⋯⋯⋯ Continuous Attribute:  Has only a finite or countably infinite set of values  Can be represented as integers  Has real numbers as attribute values  typically represented as floating-point variables Zip codes, profession, or set of words in a document temperature, height, or weight

Basic Statistical Descriptions of Data To better understand the data To summarize the data Data Object (Person) Attribute 1 (Name) Attribute 2 (Gender) Attribute 3 (Grade) Attribute 4 (Score) ⋯ 1DavidMaleA+98 ⋯ 2MikeMaleA95 ⋯ 3SusanFemaleB+87 ⋯ 4LisaFemaleA-91 ⋯ 5RobMaleA93 ⋯ ⋯⋯⋯⋯⋯⋯ Min, Max, Mean, Median, Quantiles, Variance Motivation:Dispersion Characteristics:

Basic Statistical Descriptions of Data The “center” of a set of data Data Object (Person) Attribute 1 (Name) Attribute 4 (Score) 1David98 2Mike95 3Susan87 4Lisa91 5Rob93 ⋯⋯⋯ {87, 91, 93, 95, 98} Mean: Mode: Median: The middle value The value that occurs most frequently {87, 91, 93, 95, 98, 91, 82, 95, 91, 96} {87, 91, 93, 95, 98, 99}

Symmetric vs. Skewed Data Mean, Median, and Mode Symmetric Data DistributionSkewed Data Distribution Values Frequency Histogram Probability distribution

Measuring the Dispersion of Data Quartile sort

Quartile Normal (or Gaussian) Distribution

Interquartile Range (IQR) Interquartile Range (IQR): Normal Distribution

Five-Number Summary Max Min

Boxplots Max Min box whisker

Boxplots with Outliers Max Min Outlier, 64

Boxplots Example