Statistics and Quantitative Analysis U4320 Segment 2: Descriptive Statistics Prof. Sharyn O’Halloran.

Types of Variables
- Nominal categories: names or types with no inherent ordering (religious affiliation, marital status, race).
- Ordinal categories: variables with a rank or order (university rankings, academic position, level of education).

Types of Variables (cont'd)
- Interval/ratio categories: variables that are ordered and have equal spacing between values.
  - Salary measured in dollars, age of an individual, number of children.
  - Thermometer scores, which measure intensity on issues: views on abortion, whether someone likes China, feelings about US-Japan trade issues.
  - Any dichotomous variable: a variable that takes on the value 0 or 1, e.g., a person's gender (M=0, F=1).

Note on thermometer scores
- They make assumptions about the relation between different categories.
- For example, on the liberal-to-conservative political spectrum: Strong Dem, Dem, Weak Dem, IND, Weak Rep, Rep, Strong Rep.
- The distance between adjacent groups is assumed to be the same for each category.

Measuring Data
- Discrete data: takes on only integer values (whole numbers, no decimals). E.g., number of people in the room.
- Continuous data: takes on any value (all numbers, including decimals). E.g., rate of population increase.

Discrete Data Example: Absenteeism for 50 employees

Discrete Data (cont'd): Absenteeism for 50 employees
- Frequency: the number of times we observe an event. In our example, the number of employees who were absent for a particular number of days. Three employees were absent no days (0) throughout the year.
- Relative frequency: the number of times a particular event takes place in relation to the total. For example, about a quarter of the people were absent exactly 6 times in the year. Why do you think this was true?
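
Frequency and relative frequency can be tabulated in a few lines of Python. This is a minimal sketch: the ten `days_absent` values are made up for illustration and are not the lecture's 50-employee data.

```python
from collections import Counter

# Hypothetical days-absent values for ten employees (NOT the lecture's data).
days_absent = [0, 2, 6, 6, 3, 6, 1, 4, 6, 2]

# Frequency: how many employees were absent a given number of days.
frequency = Counter(days_absent)

# Relative frequency: each count as a share of all employees.
total = len(days_absent)
relative_frequency = {days: count / total for days, count in frequency.items()}

for days in sorted(frequency):
    print(f"{days} days: {frequency[days]} employees ({relative_frequency[days]:.0%})")
```

The relative frequencies always sum to 1, which is a quick sanity check on any such table.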

Discrete Data: Chart
Another way that we can represent these data is by a chart. Again we see that more people were absent 6 days than any other single value.

Continuous Data Example: Men's Heights

Continuous Data (cont'd)
- Note: you must decide how to divide the data into intervals. The intervals you define change how the data appear.
- The finer the divisions, the less bias, but more ranges make the data difficult to display and interpret.
- Chart: data divided into 3" intervals. The information can also be represented graphically.
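
The effect of interval width can be seen by grouping the same values two ways. The sketch below uses made-up heights and a hypothetical helper, `bin_counts`; the starting point of 60 inches and the 6-inch comparison width are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical heights in inches (NOT the lecture's data).
heights = [61.5, 64.0, 66.2, 67.8, 68.1, 69.0, 69.4, 70.3, 72.9, 75.0]

def bin_counts(values, start, width):
    """Count values per half-open interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // width)   # index of the interval containing v
        lower = start + k * width
        counts[(lower, lower + width)] += 1
    return dict(sorted(counts.items()))

coarse = bin_counts(heights, start=60, width=6)  # three wide intervals
fine = bin_counts(heights, start=60, width=3)    # 3-inch intervals, as on the slide

print("6-inch intervals:", coarse)
print("3-inch intervals:", fine)
```

The finer grouping shows more of the shape of the data, at the cost of a longer table.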

Continuous Data: Percentiles
- The n-th percentile: the point such that n percent of the population lies below it and (100 - n) percent lies above it.
- Height example: what interval is the 50th percentile?

Continuous Data: Income example What interval is the 50th percentile?
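
For grouped data, the interval containing a given percentile can be found by accumulating counts until the running share reaches that percentile. The table below is hypothetical (it is not the lecture's height or income table), and `percentile_interval` is an illustrative helper name.

```python
# Hypothetical grouped data as (interval label, count); NOT the lecture's table.
grouped = [
    ("60-62", 4), ("63-65", 12), ("66-68", 30),
    ("69-71", 34), ("72-74", 16), ("75-77", 4),
]

def percentile_interval(groups, pct):
    """Return the first interval whose cumulative share reaches pct percent."""
    total = sum(count for _, count in groups)
    cumulative = 0
    for label, count in groups:
        cumulative += count
        if 100 * cumulative >= pct * total:
            return label
    raise ValueError("pct must be between 0 and 100")

print("50th percentile falls in:", percentile_interval(grouped, 50))
```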

Measures of Central Tendency
- Mode: the category occurring most often.
- Median: the middle observation, or 50th percentile.
- Mean: the average of the observations.

Central Tendency: Mode
The category that occurs most often.
- Mode of the height example? 69
- Mode of the income example? $29-$34
- Mode of the absentee example? 6

Central Tendency: Median
The middle observation, or 50th percentile.
- Median of the height example?
- Median of the absentee example?

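Central Tendency: Mean
The average of the observations. Defined as: the sum of all the observations divided by the number of observations. This is denoted as: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$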

Central Tendency: Mean (cont'd)
Notation:
- The $X_i$'s are the different observations. Let $X_i$ stand for each individual's height: $X_1$ (Sam), $X_2$ (Oliver), $X_3$ (Buddy), or observation 1, 2, 3, ...
- $X$ stands for height in general; $X_1$ stands for the height of the first person.
- When we write $\sum$ (the sum) of $X_i$, we are adding up all the observations.

Central Tendency: Mean (cont'd)
Example: Absenteeism. What is the average number of days any given worker is absent in the year?
- First, add up the fifty individuals' total days absent (243).
- Then, divide the sum (243) by the total number of individuals (50).
- We get 243/50, which equals 4.86.

Central Tendency: Mean (cont'd)
Alternatively, use the frequencies: add each event times its number of occurrences (0*3 + 1*...), then divide by the 50 employees, which again gives 4.86. (Spreadsheet: lect2.xls)
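
The two routes to the mean (summing every observation, or weighting each value by its frequency) give the same answer. This sketch uses a small made-up frequency table, not the lecture's absentee table, to show the equivalence.

```python
# Hypothetical frequency table: days absent -> number of employees.
# (Illustrative only; NOT the lecture's 50-employee table.)
frequency = {0: 1, 1: 2, 2: 3, 3: 2, 6: 2}

# Route 1: weight each value by how often it occurs.
n = sum(frequency.values())
total_days = sum(days * count for days, count in frequency.items())
mean_from_frequencies = total_days / n

# Route 2: expand the table back into individual observations and average them.
observations = [days for days, count in frequency.items() for _ in range(count)]
mean_from_observations = sum(observations) / len(observations)

print(n, total_days, mean_from_frequencies, mean_from_observations)
```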

Relative Position of Measures: Symmetric Distributions
If your population distribution is symmetric and unimodal (i.e., with a hump in the middle), then all three measures coincide.
Mode = 69, Median = 69, Mean = 483/7 = 69
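
A quick check with Python's standard `statistics` module illustrates the coincidence of the three measures. The seven heights below are an assumption: they were chosen only so that they are symmetric and sum to 483, matching the slide's arithmetic, and are not necessarily the lecture's values.

```python
import statistics

# Hypothetical symmetric sample of seven heights that sums to 483.
heights = [66, 68, 69, 69, 69, 70, 72]

mode = statistics.mode(heights)      # most frequent value
median = statistics.median(heights)  # middle observation
mean = statistics.mean(heights)      # sum divided by count

print(sum(heights), mode, median, mean)
```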

Relative Position of Measures: Asymmetric Distributions
If the data are skewed, however, the measures of central tendency will not necessarily line up.
Mode = 6, Median = 4.5, Mean = 4.86

Which measures for which variables?
- Nominal variables: which measure(s) of central tendency are appropriate? Only the mode.
- Ordinal variables: which measure(s) of central tendency are appropriate? Mode or median.
- Interval-ratio variables: which measure(s) of central tendency are appropriate? All three measures.

Measures of Deviation
- We need measures of the deviation or distribution of the data.
- [Figure: a unimodal and a bimodal distribution] Very different distributions, even though they have the same mean and median.
- We need a measure of dispersion or spread.

Measures of Deviation (cont'd)
Range: the difference between the largest and smallest observation. A limited measure of dispersion.
[Figure: unimodal and bimodal distributions]

Measures of Deviation (cont'd)
Inter-Quartile Range: the difference between the 75th percentile and the 25th percentile in the data.
- Better, because it divides the data more finely.
- But it still uses only two observations.
- We would like to incorporate all the data, if possible.
[Figure: unimodal and bimodal distributions with the 25th and 75th percentiles marked]
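
The limits of the range, and the improvement offered by the inter-quartile range, show up when two made-up data sets with the same mean and median are compared. The data and helper names here are illustrative assumptions. `statistics.quantiles` uses its default "exclusive" method, and other quartile conventions give slightly different cut points.

```python
import statistics

# Two hypothetical data sets with the same mean (5) and median (5).
unimodal = [1, 4, 5, 5, 5, 5, 6, 9]   # values cluster around the middle
bimodal = [1, 1, 2, 2, 8, 8, 9, 9]    # values cluster at the two ends

def value_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

def interquartile_range(data):
    """75th percentile minus 25th percentile (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

for name, data in (("unimodal", unimodal), ("bimodal", bimodal)):
    print(name, value_range(data), interquartile_range(data))
```

The range is identical for both sets, while the inter-quartile range separates them.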

Measures of Deviation (cont'd)
Mean Absolute Deviation: take the absolute distance between each data point and the mean, then take the average.
[Figure: unimodal distribution (small absolute distance from mean) and bimodal distribution (large absolute distance from mean)]
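
Unlike the range, the mean absolute deviation uses every observation. A minimal sketch, with two made-up data sets that share the same mean:

```python
# Two hypothetical data sets, both with mean 5 (illustrative only).
unimodal = [1, 4, 5, 5, 5, 5, 6, 9]
bimodal = [1, 1, 2, 2, 8, 8, 9, 9]

def mean_absolute_deviation(data):
    """Average absolute distance between each observation and the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

print(mean_absolute_deviation(unimodal))  # small: values sit near the mean
print(mean_absolute_deviation(bimodal))   # large: values sit far from the mean
```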

Measures of Deviation (cont'd)
Mean Squared Deviation: the average of the squared differences between each observation and the mean. This is the variance of a distribution: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$, where $\mu$ is the mean.

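Measures of Deviation (cont'd)
Variance of a Distribution
- Note: since we are using the square, the greatest deviations are weighted most heavily.
- If we use a frequency table, as in the absenteeism case, we would write this as: $\sigma^2 = \frac{1}{N}\sum_{j} f_j (X_j - \mu)^2$, where $f_j$ is the number of times the value $X_j$ occurs, $\mu$ is the mean, and $N = \sum_j f_j$.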

Measures of Deviation (cont'd)
Standard deviation: the square root of the variance, $\sigma = \sqrt{\sigma^2}$. It expresses deviation from the mean in the same units as the data.

Absentee Example: Variances
From above, the mean = 4.86. To calculate the variance, take the difference between each observation and the mean. Column 5 then squares that difference and multiplies it by the number of times the event occurred.
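
The column-by-column calculation described above can be sketched in Python. The frequency table here is made up (it is not the lecture's absentee table), and the result is cross-checked against `statistics.pvariance`, the standard-library population variance.

```python
import math
import statistics

# Hypothetical frequency table: days absent -> number of employees.
frequency = {0: 1, 1: 2, 2: 3, 3: 2, 6: 2}

n = sum(frequency.values())
mean = sum(days * count for days, count in frequency.items()) / n

# Squared difference from the mean, times the number of occurrences,
# summed over the table and divided by the number of observations.
variance = sum(count * (days - mean) ** 2 for days, count in frequency.items()) / n
std_dev = math.sqrt(variance)

# Cross-check against the library function on the expanded observations.
observations = [days for days, count in frequency.items() for _ in range(count)]
library_variance = statistics.pvariance(observations)

print(mean, variance, std_dev, library_variance)
```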

Descriptive Statistics for Samples
Unbiased estimator of the variance, and the corresponding standard deviation: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$, $s = \sqrt{s^2}$, where $\bar{X}$ is the sample mean.
Why n-1? We use n-1 pieces of data to calculate the variance because we already used 1 piece of information to calculate the mean. So to make the estimate unbiased, subtract one from the denominator.
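
Python's standard library exposes both denominators: `statistics.pvariance` divides by n and `statistics.variance` divides by n-1. The eight-value sample below is made up for illustration.

```python
import statistics

# Hypothetical sample of eight observations (illustrative only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n
sum_squared_deviations = sum((x - mean) ** 2 for x in sample)

population_formula = sum_squared_deviations / n      # divide by n
sample_formula = sum_squared_deviations / (n - 1)    # divide by n - 1

print(population_formula, statistics.pvariance(sample))
print(sample_formula, statistics.variance(sample))
```

The n-1 version is always the larger of the two, since the same sum is divided by a smaller number.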

Descriptive Statistics for Samples (cont'd)
Absentee Example: if sampling from a population, we would calculate:
Variance =
Standard Deviation =

Descriptive Statistics for Samples (cont'd)
Sample estimators approximate population values.
- Notice from the formulas that, as n gets large, the difference between dividing by n and dividing by n-1 becomes negligible.
- As the sample grows, the sample estimates also converge toward the values of the underlying population. This is known as the law of large numbers. We will discuss its relevance in more detail in later lectures.
- We are mostly dealing with samples, so use the (n-1) formula unless told otherwise.
- Don't worry, the computer does this automatically!
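
How quickly the choice of denominator stops mattering can be seen from the ratio of the two variance estimates, which is n/(n-1) for any data set. The helper name and the sample sizes printed below are arbitrary choices for illustration.

```python
def denominator_ratio(n):
    """Ratio of the (n-1)-denominator variance to the n-denominator variance."""
    return n / (n - 1)

for n in (5, 50, 500, 5000):
    print(n, round(denominator_ratio(n), 4))
```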

Examples of data presentation: use it, don't abuse it
- Start axes at zero.
- Misleading comparisons:
  - Government spending not taking inflation into account (nominal vs. real).
  - Selecting a particular base year. For example, a university presenting a budget breakdown showed that since 1986 the number of staff increased only slightly, but it failed to mention the huge increase during the 5 years before.

HOMEWORK
Lab
- Define one variable of each type: nominal, ordinal, and interval.
- Print out summary statistics on each variable.
- There will be four statistics that you are not expected to know: Kurtosis, SE Kurtosis, Skewness, SE Skewness. The others, though, should be familiar.
Excel