Statistics [0,I/2] The Essential Mathematics



Two Forms of Statistics
Descriptive Statistics: What is physically happening within the data?
Inferential Statistics: What can I glean from a sample that is pertinent to the population?


Descriptive Statistics
Measures of Center: mean, median, mode
Measures of Spread: variance, standard deviation, range, IQR, outliers
Measures of Shape: kurtosis, skewness
Exploratory Analysis

Measures of Center
The value we expect to see when an observation is drawn at random in a given situation.
Traditionally, we take that to be the mean, but it can also be the median or the mode in certain contexts.

Situation
You are interested in the body mass of full-grown adults from one gender. If you were to find one person from that gender at random, what would you expect that person to weigh?

Mean
Four types of means:
Arithmetic mean (typical interpretation)
Geometric mean
Harmonic mean (most conservative)
Quadratic mean (pooling operation)
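A minimal sketch of the four means, using only Python's standard library. The sample values and the helper name `quadratic_mean` are illustrative assumptions, not part of the slides. For positive data the results always come out in the order harmonic <= geometric <= arithmetic <= quadratic, which is one way to read "most conservative" for the harmonic mean.

```python
import math
import statistics


def quadratic_mean(values):
    """Root mean square: square each value, average, take the square root."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Illustrative positive values (made up for this example).
data = [2.0, 4.0, 8.0]

arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)
harmonic = statistics.harmonic_mean(data)
quadratic = quadratic_mean(data)

print(f"harmonic   = {harmonic:.4f}")
print(f"geometric  = {geometric:.4f}")
print(f"arithmetic = {arithmetic:.4f}")
print(f"quadratic  = {quadratic:.4f}")
```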

Arithmetic Mean
The sample mean is an unbiased estimator of the population mean.
When should I be concerned with the mean?
The data should be symmetric: I am as likely to see something relatively large as something relatively small.
Typically, it is the first thing to look at.

Arithmetic Mean
Add the observations up, then divide by the number of observations.

Symmetric without a picture?
Line the data up from worst to first (maximum to minimum).
Find the one in the middle.
Subtract the minimum from the middle, and subtract the middle from the maximum.
Are those two values equal? If so, the data are roughly symmetric.
Skewness (we'll see that later)
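The check described on this slide can be sketched in a few lines. The function name `tail_gaps` and both sample lists are assumptions made for illustration. This is a rough screen, not a formal test: it looks only at the minimum, the middle, and the maximum.

```python
import statistics


def tail_gaps(values):
    """Return (middle - minimum, maximum - middle) for a list of numbers."""
    ordered = sorted(values)
    middle = statistics.median(ordered)
    lower_gap = middle - ordered[0]
    upper_gap = ordered[-1] - middle
    return lower_gap, upper_gap


# Illustrative samples (made up for this example).
symmetric_sample = [1, 2, 3, 4, 5]
right_skewed_sample = [1, 2, 3, 4, 20]

print(tail_gaps(symmetric_sample))     # equal gaps suggest symmetry
print(tail_gaps(right_skewed_sample))  # a much larger upper gap suggests a right tail
```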

Situation
You are interested in the economic conditions of a country (say, the United States). If you were to select a household at random from the United States, how much money would you expect that household to make?

Median
The exact middle observation of a set of data.
It equals the mean when a set is symmetric.
When a set is asymmetric, the two are different.
It is not responsive to questionable influences such as extreme values.
The stoic of statistics.

Median or Mean?
Find the mean and the median.
How close are they?
If they are "close", use the mean.
If they are not close, typically use the median (this indicates skew).
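One possible way to turn this rule of thumb into code is sketched below. The slides do not define "close", so the threshold here (a gap of at most 0.1 sample standard deviations) is purely an assumption, as are the function name `choose_center` and the sample incomes.

```python
import statistics


def choose_center(values, tolerance=0.1):
    """Pick the mean if mean and median are close, otherwise the median.

    "Close" means the gap is at most `tolerance` sample standard deviations.
    That cutoff is an assumption for illustration, not a standard rule.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.stdev(values)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "mean", mean
    return "median", median


# Illustrative household incomes in thousands (made up): one very large value.
incomes = [30, 35, 40, 45, 500]
balanced = [1, 2, 3, 4, 5]

print(choose_center(incomes))
print(choose_center(balanced))
```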

Situation
You are an artificial intelligence programmer and are interested in how to model random occurrences in a football game that result in scores. What score should you expect on a scoring play?

Weird Scenario...
Football has a few ways of scoring, but we know what the set is going to be composed of:
Touchdown (typical): 7
Touchdown (2 pt. conversion): 8
Touchdown (failed conversion): 6
Field Goal: 3
Safety: 2
The "average score" on a play in football is probably somewhere between 4.5 and 5.
We should, however, expect the score to be either 3 or 7.

Mode
The mode is the most common observation in a dataset.
It is sparingly used, but can be important.
If observations recur, why is that happening?
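A small sketch of the football example: the point values come from the slide, but the play counts below are invented purely to illustrate the idea. The mean lands on a value that can never actually be scored, while the mode reports the outcomes that really recur.

```python
import statistics

# Point values from the slide; the frequencies are made up for illustration.
scoring_plays = [7] * 5 + [3] * 5 + [2, 2, 6]

average_score = statistics.mean(scoring_plays)
most_common_scores = statistics.multimode(scoring_plays)

print(f"mean score = {average_score:.2f}")
print(f"modes      = {sorted(most_common_scores)}")
```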

Questions:
Which of the three makes sense based on my understanding of what should happen?
Should this data be inherently symmetric?
Should this data be pulled one way or the other?
Should this data be predisposed to particular values?
Answer these questions before you see the data!

Measures of Spread
What is the variation found within my data?
There are many different ways of looking at this, based on your choice of mean or median:
Standard deviation/variance for the mean
Range/IQR for the median

Variance
Otherwise known as "residual error".
Find the mean.
Take each observation and subtract the mean from it (this difference is the residual).
Square each residual.
Add the squared residuals up.
Divide by n-1.
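The steps on this slide translate almost line for line into code. The function name `sample_variance` and the sample data are assumptions for illustration; the result is compared against `statistics.variance`, which also divides by n-1.

```python
import statistics


def sample_variance(values):
    """Sample variance following the slide's steps, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n                    # find the mean
    residuals = [v - mean for v in values]    # subtract the mean from each observation
    squared = [r * r for r in residuals]      # square each residual
    return sum(squared) / (n - 1)             # add them up, divide by n - 1


# Illustrative data (made up for this example).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data))
print(statistics.variance(data))
```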

Variance
If a set is "tight" to its mean, its variance will be low (we will call this leptokurtic later).
If a set is "broad" to its mean, its variance will be higher (we will call this platykurtic later).
Remember: the larger a residual, the higher the impact of squaring it.
5^2 = 25; 10^2 = 100: the squared value grew by a factor of 4 when the residual doubled.

Why square it?
If we didn't, the sum of the residuals would always be 0, rendering the statistic meaningless!
Why? Residuals above the mean and residuals below the mean cancel exactly.
Squaring lets variance show spread by making negative values positive and by adding more weight to observations that are more distant (both effects of squaring).
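A quick numerical illustration of the cancellation, using made-up data: the raw residuals sum to zero no matter how spread out the data are, while the squared residuals do not.

```python
# Illustrative data sets (made up): same mean, very different spread.
tight = [4, 5, 5, 5, 6]
broad = [-15, 0, 5, 10, 25]

for name, data in (("tight", tight), ("broad", broad)):
    mean = sum(data) / len(data)
    residuals = [v - mean for v in data]
    raw_total = sum(residuals)                      # cancels to zero
    squared_total = sum(r * r for r in residuals)   # grows with spread
    print(f"{name}: raw sum = {raw_total}, squared sum = {squared_total}")

tight_mean = sum(tight) / len(tight)
broad_mean = sum(broad) / len(broad)
tight_raw = sum(v - tight_mean for v in tight)
broad_raw = sum(v - broad_mean for v in broad)
tight_squared = sum((v - tight_mean) ** 2 for v in tight)
broad_squared = sum((v - broad_mean) ** 2 for v in broad)
```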

Why n-1?
Degrees of freedom.
It makes us more conservative.
Dividing by a larger number shrinks the result; dividing by a smaller number gives a wider estimate of spread.
A sample doesn't contain everything, so we lean toward the conservative (wider) estimate.
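The effect of the divisor can be seen directly with the standard library: `statistics.pvariance` divides by n and `statistics.variance` divides by n-1. The sample values are an assumption for illustration.

```python
import statistics

# Illustrative sample (made up for this example); sum of squared residuals is 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

divide_by_n = statistics.pvariance(sample)         # 32 / 8
divide_by_n_minus_1 = statistics.variance(sample)  # 32 / 7

print(f"divide by n     : {float(divide_by_n):.4f}")
print(f"divide by n - 1 : {float(divide_by_n_minus_1):.4f}")
```

The n-1 version is always the larger of the two, and the gap shrinks as the sample grows.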

Standard Deviation
Undoes the squaring procedure (it is the square root of the variance).
Gives us the "average" distance between an observation and the mean.
If variance is high, standard deviation will be high; if variance is low, standard deviation will be low.
Great metric for "how far" questions, as it normalizes observations.
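A short sketch of both ideas: the standard deviation as the square root of the variance, and a "how far" answer expressed in standard deviations (commonly called a z-score). The data and the helper name `how_far` are illustrative assumptions.

```python
import math
import statistics

# Illustrative sample (made up for this example).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)
std_dev = math.sqrt(statistics.variance(sample))  # undo the squaring


def how_far(observation):
    """Distance from the mean, measured in standard deviations."""
    return (observation - mean) / std_dev


print(f"standard deviation = {std_dev:.4f}")
print(f"9 sits {how_far(9):.2f} standard deviations above the mean")
```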

Range and IQR
In the case of the median, percentile observations are the focus:
Minimum and maximum
25th and 75th percentiles
Range = maximum - minimum
IQR = 75th percentile - 25th percentile
IQR defines outliers
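The slide does not say how the IQR defines outliers. The sketch below assumes the common 1.5 x IQR convention (points more than 1.5 IQRs beyond the quartiles are flagged), and it assumes the "inclusive" quartile method of `statistics.quantiles`; other quartile definitions give slightly different cut points. The data are made up.

```python
import statistics

# Illustrative data (made up): one value far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

data_range = max(data) - min(data)

# Quartiles: 25th percentile, median, 75th percentile.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: fences at 1.5 IQRs beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(f"range = {data_range}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```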

Skewness
Is a distribution symmetric or biased?
The sign of skewness reflects the relationship between the mean and the median:
Mean > median --> positive skew
Mean < median --> negative skew
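A minimal sketch of a moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), shown next to the mean and median comparison. The function name and samples are assumptions for illustration. The mean versus median comparison is a rule of thumb that usually, but not always, agrees with the sign of this coefficient.

```python
import statistics


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using divide-by-n central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Illustrative samples (made up for this example).
right_skewed = [1, 2, 3, 4, 20]
left_skewed = [-20, -4, -3, -2, -1]
symmetric = [1, 2, 3, 4, 5]

for name, data in (("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)):
    print(name, statistics.mean(data), statistics.median(data), round(skewness(data), 3))
```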

Reasons for left skew
A test or task was too easy.
Ever taken an exam where nearly everyone got a great grade, but someone struggled? That's left skew...

Reasons for right skew
A variable naturally has a left bound
Time-based data
Economics

Right tail transform
Right tail skews are typically transformed using logarithms or square roots.
Why?
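One way to see why: a logarithm compresses large values far more than small ones, so a long right tail is pulled in toward the bulk of the data. The values below are made up and deliberately extreme so the effect is easy to read.

```python
import math
import statistics

# Illustrative right-tailed data (made up): each value is 10x the previous one.
right_tailed = [1, 10, 100, 1000, 10000]
logged = [math.log10(v) for v in right_tailed]

raw_mean = statistics.mean(right_tailed)
raw_median = statistics.median(right_tailed)
log_mean = statistics.mean(logged)
log_median = statistics.median(logged)

print(f"raw:    mean = {raw_mean}, median = {raw_median}")
print(f"logged: mean = {log_mean}, median = {log_median}")
```

Before the transform the mean sits far above the median; afterwards the two agree, which is what symmetry looks like.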

Kurtosis
Is the data predisposed to a particular central occurrence?
Kurtosis can't be less than 1 (excess kurtosis can't be less than -2).
Positive values of excess kurtosis reflect high peaks (predisposition).
Negative values of excess kurtosis reflect flatter peaks.
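A minimal sketch of moment-based kurtosis (fourth central moment divided by the squared second moment) and its "excess" form, which subtracts 3 so that a normal distribution scores 0. The function names and samples are illustrative assumptions. The two-point sample shows the lower bound mentioned on the slide.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2, using divide-by-n central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Kurtosis relative to a normal distribution, whose kurtosis is 3."""
    return kurtosis(values) - 3


# Illustrative samples (made up for this example).
two_point = [-1, 1, -1, 1]              # all mass at two values: the lower bound
peaked = [0, 0, 0, 0, 0, 0, -5, 5]      # most values at the center, a few far away

print(kurtosis(two_point), excess_kurtosis(two_point))
print(kurtosis(peaked), excess_kurtosis(peaked))
```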

Assignment
You will be provided a dataset that comes from a questionnaire about ecological values (New Ecological Paradigm).
You will be shown all of the values mentioned in this slide set and a bar graph of the responses.
Determine the appropriate measure of central tendency.
Determine whether or not you feel there are effects such as biasing or predisposition occurring.
Remember: gut instincts... do not do any tests!

NEP
For your reference:
High values on odd questions favor human endeavors.
High values on even questions favor the environment.