Exploring Data Descriptive Data

Content: Types of Variables; Describing Data Using Graphical Summaries; Describing the Centre of Quantitative Data; Describing the Spread of Quantitative Data; How Measures of Position Describe Spread; How Can Graphical Summaries Be Misused?

Variable: A variable is any characteristic that is recorded for the subjects in a study. Examples: marital status, height, weight, IQ. A variable can be classified as either categorical or quantitative; a quantitative variable is either discrete or continuous.

Categorical Variable: A variable is categorical if each observation belongs to one of a set of categories. Examples: gender (male or female); religion (Catholic, Jewish, …); type of residence (apartment, house, …); belief in life after death (yes or no).

Quantitative Variable: A variable is called quantitative if observations take numerical values for different magnitudes of the variable. Examples: age, number of siblings, annual income.

Quantitative vs. Categorical (Math 1530 - Chapter 2): For quantitative variables, key features are the center (a representative value) and the spread (variability). For categorical variables, a key feature is the percentage of observations in each of the categories.

Types of Data: There are two basic types of data, qualitative and quantitative. Qualitative data cannot be measured numerically; quantitative data can. Most statistical analysis is based on quantitative data, using appropriate measurement of the variables. Quantitative variables are further classified into two types: discrete and continuous.

Types of Data: A discrete variable can take only certain distinct or isolated values in a given range, for example, the number of siblings: 0, 1, 2, …, 10. A continuous variable can take any value in a given range, for example, age from 0 to 100 years. To take another example, if one would like to know what factors are associated with a sales representative’s performance, a number of measures might be used to indicate success.

Discrete Quantitative Variable: A quantitative variable is discrete if its possible values form a set of separate numbers: 0, 1, 2, 3, …. Examples: number of pets in a household; number of children in a family; number of foreign languages spoken by an individual.

Continuous Quantitative Variable: A quantitative variable is continuous if its possible values form an interval. Continuous variables typically arise from measurements. Examples: height or weight, age, blood pressure.

Measuring Using Data: As measures of a salesperson’s success, dollar or unit sales volume, or share of accounts lost, could be utilised. For ease of understanding, quantitative variables are usually measured on scales. A scale may be defined as a measuring tool for the appropriate quantification of variables; in other words, a scale is a continuous spectrum or a series of categories. As in other research, four types of scales are used in business research: nominal, ordinal, interval and ratio scales.

Nominal Scale: A nominal scale is the simplest type of scale. The numbers or letters assigned to objects serve as labels for identification or classification. For example, name and gender are categorical variables; one can use the label ‘M’ for male and ‘F’ for female, or ‘1’ for male and ‘2’ for female, or ‘1’ for female and ‘2’ for male. Other examples include marital status, religion, race, colour and employment status.

Ordinal Scale: When the categories of a nominal scale follow an order, it becomes an ordinal scale. In other words, an ordinal scale arranges objects or categorical variables according to an ordered relationship, so being able to rank the categories is the essential criterion. A typical ordinal scale in business research asks respondents to rate career opportunities and company brands as ‘excellent’, ‘good’, ‘fair’ or ‘poor’. Other examples are (i) examination results: first, second and third class, and fail; (ii) quality of products; and (iii) social class.

Interval Scale: The interval scale indicates the distance or difference in units between two events. In other words, such scales not only indicate order, they also measure the distance between values in units of equal intervals. It is important to note that the location of the zero point is arbitrary. For example, in a price index the value for the base year is usually set to 100. Another classic example of an interval scale is temperature in degrees Celsius or Fahrenheit, where the zero point is arbitrary.

Ratio Scale: Ratio scales have absolute rather than relative quantities. In other words, if an interval scale has an absolute zero, it can be classified as a ratio scale. The absolute zero represents a point on the scale where the given attribute is absent. For example, age, money and weight are measured on ratio scales because they possess an absolute zero and interval properties.

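Proportion & Percentage (Relative Frequencies): The proportion of observations in a category is the count in that category divided by the total number of observations; the percentage is the proportion multiplied by 100. Proportions and percentages are also called relative frequencies.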

Frequency Table: A frequency table is a listing of the possible values for a variable, together with the number of observations (or the relative frequency) for each value.
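
As a minimal sketch of how a frequency table with relative frequencies could be built, the Python snippet below counts a categorical variable and derives the proportion and percentage for each category. The `frequency_table` helper and the residence responses are hypothetical, made up purely for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (count, proportion, percentage)}."""
    counts = Counter(observations)
    n = len(observations)
    return {cat: (c, c / n, 100 * c / n) for cat, c in counts.items()}

# Hypothetical responses for "type of residence" (not data from the slides).
residence = ["House", "Apartment", "House", "House",
             "Other", "Apartment", "House", "House"]

table = frequency_table(residence)
for category, (count, proportion, percentage) in table.items():
    print(f"{category:<10} {count:>3} {proportion:.3f} {percentage:.1f}%")
```

The proportions in such a table should sum to 1 and the percentages to 100, which is a quick check that no observation was dropped.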

Describing data using graphical summaries

Graphs for Categorical Variables: Use pie charts and bar graphs to summarize categorical variables. Pie chart: a circle having a “slice of pie” for each category. Bar graph: a graph that displays a vertical bar for each category.

Pie Charts: Summarize a categorical variable. Drawn as a circle where each category is a slice. The size of each slice is proportional to the percentage in that category.

Bar Graphs: Summarize a categorical variable with a vertical bar for each category. The height of each bar represents either a count or a percentage. It is easier to compare categories with a bar graph than with a pie chart. A bar graph is called a Pareto chart when the bars are ordered from tallest to shortest.

Graphs for Quantitative Data: Dot plot: shows a dot for each observation, placed above its value on a number line. Stem-and-leaf plot: portrays the individual observations. Histogram: uses bars to portray the data.

Which Graph? Dot plot and stem-and-leaf plot: more useful for small data sets; data values are retained. Histogram: more useful for large data sets; most compact display; more flexibility in defining intervals.

Dot Plots: To construct a dot plot, (1) draw and label a horizontal line, (2) mark regular values on it, and (3) place a dot above each value on the number line. Example: Sodium in Cereals.

Stem-and-Leaf Plots: Summarize quantitative variables. Separate each observation into a stem (the first part of the number) and a leaf (the last digit). Write each leaf to the right of its stem; order the leaves if desired. Example: Sodium in Cereals.
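
The stem-and-leaf construction can be sketched in a few lines of Python. This is an illustrative sketch only: the `stem_and_leaf` helper is a hypothetical name, it assumes non-negative integer observations, and the data are made up rather than taken from the cereal example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last)."""
    plot = defaultdict(list)
    for value in sorted(values):        # sorting orders the leaves
        stem, leaf = divmod(value, 10)  # leaf is the last digit
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical observations, not the cereal data from the slides.
data = [24, 12, 38, 21, 45, 15, 30, 24, 37]

plot = stem_and_leaf(data)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

Note that this sketch omits stems that have no leaves; a hand-drawn plot would normally list them so that gaps in the data stay visible.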

Histograms: A histogram is a graph that uses bars to portray the frequencies or relative frequencies of the possible outcomes for a quantitative variable.

Constructing a Histogram (Sodium in Cereals): Divide the range of the data into intervals of equal width. Count the number of observations in each interval.

Constructing a Histogram (Sodium in Cereals): Label the endpoints of the intervals on the horizontal axis. Draw a bar over each value or interval, with height equal to its frequency (or percentage). Label and title the graph.
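
The counting step behind a histogram can be sketched as follows. This is a rough illustration under stated assumptions: the `histogram_counts` helper, its parameters, and the data are hypothetical, and each interval is taken to include its left endpoint and exclude its right endpoint.

```python
def histogram_counts(values, start, width, n_bins):
    """Count observations in n_bins equal-width intervals beginning at start.

    Interval i runs from start + i*width (included)
    to start + (i+1)*width (excluded).
    """
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Hypothetical observations, not the cereal data from the slides.
data = [5, 12, 18, 22, 25, 31, 38, 39, 44, 58]

counts = histogram_counts(data, start=0, width=20, n_bins=3)
for i, count in enumerate(counts):
    low, high = i * 20, (i + 1) * 20
    print(f"{low:>3} to {high:<3} | {'*' * count} ({count})")
```

Changing `width` changes the look of the histogram, which is the flexibility in defining intervals that histograms offer over dot plots and stem-and-leaf plots.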

Interpreting Histograms: Assess where a distribution is centered by finding the median. Assess the spread of the distribution. Describe the shape of the distribution: roughly symmetric (the left and right sides are mirror images), skewed to the right, or skewed to the left.

Examples of Skewness

Shape and Skewness: Consider a data set containing IQ scores for the general public. What shape? Symmetric, skewed to the left, skewed to the right, or bimodal?

Shape and Skewness: Consider a data set of the scores of students on an easy exam in which most score very well but a few score poorly. What shape? Symmetric, skewed to the left, skewed to the right, or bimodal?

Shape: Type of Mound

Outlier: An outlier falls far from the rest of the data.

Time Plots: Display a time series, that is, data collected over time. Plot each observation on the vertical axis against time on the horizontal axis. Points are usually connected. Common patterns should be noted. Example: time plot from 1995 to 2001 of the number of people worldwide who use the Internet.

Describing the Centre of Quantitative Data

Mean: The mean is the sum of the observations divided by the number of observations. It is the center of mass of the data.

Median: The midpoint of the observations when ordered from least to greatest. Order the observations. If the number of observations is odd, the median is the middle observation; if even, the median is the average of the two middle observations.
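
The odd/even rule for the median can be written out directly. The following is a small sketch with a hypothetical `median` helper and made-up inputs; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle observation (odd n) or average of the two middle ones (even n)."""
    ordered = sorted(values)  # step 1: order the observations
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # odd number of observations
print(median([7, 1, 3, 5]))  # even number of observations
```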

Comparing the Mean and Median: The mean and median of a symmetric distribution are close; the mean is often preferred because it uses all the observations. In a skewed distribution, the mean is farther out in the skewed tail than is the median; the median is preferred because it is more representative of a typical observation.

Resistant Measures: A measure is resistant if extreme observations (outliers) have little, if any, influence on its value. The median is resistant to outliers. The mean is not resistant to outliers.
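
A quick way to see resistance in action is to add one extreme value to a small data set and recompute both measures. The numbers below are hypothetical and chosen only to make the effect obvious; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

typical = [2, 3, 4, 5, 6]       # hypothetical observations
with_outlier = typical + [100]  # same data plus one extreme value

print("without outlier:", mean(typical), median(typical))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

In this example the mean jumps from 4 to 20, while the median moves only from 4 to 4.5.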

Mode: The value that occurs most often. It corresponds to the highest bar in the histogram. The mode is most often used with categorical data.

Describing the Spread of Quantitative Data

Range: Range = max - min. The range is strongly affected by outliers.

Standard Deviation: Each data value has an associated deviation from the mean, equal to the observation minus the mean. A deviation is positive if the value falls above the mean and negative if it falls below the mean. The sum of the deviations is always zero.

Standard Deviation: The standard deviation gives a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations: (1) find the mean; (2) find each deviation; (3) square the deviations; (4) sum the squared deviations; (5) divide the sum by n - 1; (6) take the square root.
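
The six steps translate almost line for line into code. The sketch below uses hypothetical helper names (`deviations_from_mean`, `sample_std`) and made-up data, not the metabolic-rate example; `statistics.stdev` in the standard library computes the same quantity.

```python
import math

def deviations_from_mean(values):
    """Steps 1 and 2: find the mean, then each observation minus the mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

def sample_std(values):
    """Steps 3 to 6: square, sum, divide by n - 1, take the square root."""
    squared = [d * d for d in deviations_from_mean(values)]
    return math.sqrt(sum(squared) / (len(values) - 1))

data = [3, 3, 5, 7, 7]  # hypothetical observations with mean 5

deviations = deviations_from_mean(data)
s = sample_std(data)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("s:", s)
```

For this data the deviations are -2, -2, 0, 2, 2, which sum to zero; the squared deviations sum to 16, and dividing by n - 1 = 4 and taking the square root gives s = 2.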

Standard Deviation: Example with the metabolic rates of 7 men (calories/24 hours).

Properties of the Sample Standard Deviation: s measures the spread of the data. s is zero only when all observations are the same; otherwise, s > 0. As the spread increases, s gets larger. s has the same units as the observations. s is not resistant: strong skewness or outliers greatly increase s.

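Empirical Rule: Magnitude of s. For a bell-shaped distribution, about 68% of the observations fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and nearly all within 3 standard deviations.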

How Measures of Position Describe Spread

Percentile: The pth percentile is a value such that p percent of the observations fall below or at that value.
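
One way to check the percentile definition is to compute, for a candidate value, the percentage of observations at or below it. The helper name `percent_at_or_below` and the scores are hypothetical, used here only as an illustration.

```python
def percent_at_or_below(values, x):
    """Percentage of observations that fall at or below x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

# Hypothetical exam scores.
scores = [55, 60, 62, 68, 70, 74, 79, 81, 88, 95]

print(percent_at_or_below(scores, 79))
```

Seven of the ten scores are at or below 79, so 79 serves as a 70th percentile of this data set.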

Finding Quartiles: Quartiles split the data into four parts. Arrange the data in order. The median is the second quartile, Q2. Q1 is the median of the lower half of the observations. Q3 is the median of the upper half of the observations.
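
The "median of each half" procedure might be sketched as below. The `quartiles` helper is a hypothetical name, and one assumption is worth stating: when the number of observations is odd, this sketch leaves the median out of both halves. Textbooks and software differ on that convention, so other tools may report slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)  # arrange the data in order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle observation belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # odd number of observations
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))     # even number of observations
```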

Measure of Spread: Quartiles. Quartiles divide a ranked data set into four equal parts: 25% of the data are at or below Q1 and 75% above; 50% of the observations are above the median and 50% are below; 75% of the data are at or below Q3 and 25% above. In the example shown: Q1 = first quartile = 2.2, M = median = 3.4, Q3 = third quartile = 4.35. We are going to start out with a very general way to describe the spread that works whether the distribution is symmetric or not: quartiles. Just as the word suggests, quartiles, like quarters or quartets, involve dividing the distribution into four parts. To get the median, we divided the data into two parts. To get the quartiles, we do exactly the same thing to the two halves, using the same rules as for the median depending on whether the number of observations is even or odd. Now, what can we do with these that helps us understand the biology of these diseases?

Calculating Interquartile Range: The interquartile range is the distance between the third and first quartiles, giving the spread of the middle 50% of the data: IQR = Q3 - Q1.

Criteria for Identifying an Outlier: An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile.
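
The 1.5 x IQR criterion can be sketched as a small function that takes the quartiles as inputs. The name `potential_outliers` and the data are hypothetical; the quartile values passed in are assumed to have been computed beforehand by whichever quartile convention is in use.

```python
def potential_outliers(values, q1, q3):
    """Return observations more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]

# Hypothetical observations. With the median excluded from both halves,
# Q1 = 2.5 (median of 1, 2, 3, 4) and Q3 = 7.5 (median of 6, 7, 8, 30).
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

flagged = potential_outliers(data, q1=2.5, q3=7.5)
print(flagged)
```

Here IQR = 5, so the limits are 2.5 - 7.5 = -5 and 7.5 + 7.5 = 15, and only the observation 30 is flagged.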

5 Number Summary: The five-number summary of a dataset consists of the minimum value, first quartile, median, third quartile and maximum value.

Boxplot: The box goes from Q1 to Q3. A line is drawn inside the box at the median. A line goes from the lower end of the box to the smallest observation that is not a potential outlier, and from the upper end of the box to the largest observation that is not a potential outlier. Potential outliers are shown separately, often with * or +.

Comparing Distributions: Boxplots do not display the shape of the distribution as clearly as histograms, but are useful for making graphical comparisons of two or more distributions.

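Z-Score: The z-score of an observation is the number of standard deviations it falls from the mean: z = (observation - mean) / standard deviation. An observation from a bell-shaped distribution is a potential outlier if its z-score is less than -3 or greater than +3.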
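
A z-score check might look like the sketch below, which uses the standard definition z = (observation - mean) / standard deviation. The function names and the mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, not figures from the slides.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x falls from the mean."""
    return (x - mean) / sd

def is_potential_outlier(x, mean, sd):
    """Apply the rule z < -3 or z > +3 (intended for bell-shaped data)."""
    z = z_score(x, mean, sd)
    return z < -3 or z > 3

# Hypothetical bell-shaped variable with mean 100 and standard deviation 15.
for value in (120, 160):
    print(value, z_score(value, 100, 15), is_potential_outlier(value, 100, 15))
```

With these assumed values, 160 lies 4 standard deviations above the mean and is flagged, while 120 is not.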

How can Graphical Summaries be Misused

Misleading Data Displays

Guidelines for Constructing Effective Graphs: Label the axes and give proper headings. The vertical axis should start at zero. Use bars, lines, or points. Consider using separate graphs or ratios when variable values differ greatly.