STAT 13 - Lecture 2: Standardization, Normal distribution, Stem-leaf, histogram

Presentation transcript:

STAT 13 - Lecture 2: Standardization, Normal distribution, Stem-leaf, histogram
Standardization is a re-scaling technique, useful for conveying information about the relative standing of any number of interest with respect to the whole distribution.
Normal distribution: the ideal bell-shaped curve.
Stem-leaf plot and histogram: empirical.

STAT 13 - Lecture 2: Measures of dispersion
Range = maximum - minimum.
Average distance from the average (mean).
Average distance from the median.
Interquartile range = third quartile - first quartile.
Standard deviation = square root of the 'average' squared distance from the mean (NOTE: the 'average' divides by n-1).
The most popular one is the standard deviation (SD).
Why is the range not popular?
1. Only two numbers are involved, regardless of what happens in between.
2. It tends to get bigger and bigger as more data arrive.
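The measures listed on this slide can be computed side by side. The following is a minimal sketch, assuming a made-up list of five scores (not data from the lecture) and Python's standard `statistics` module. Quartile conventions differ between textbooks and software, so the interquartile range printed here is only one reasonable value.

```python
import statistics

# Hypothetical scores, chosen only for illustration (not from the lecture).
scores = [60, 70, 75, 80, 90]

mean = statistics.mean(scores)
median = statistics.median(scores)

# Range: only the two extreme values matter.
data_range = max(scores) - min(scores)

# Average distance from the mean and from the median.
avg_dist_from_mean = sum(abs(x - mean) for x in scores) / len(scores)
avg_dist_from_median = sum(abs(x - median) for x in scores) / len(scores)

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# statistics.stdev divides the summed squared distances by n - 1.
sd = statistics.stdev(scores)

print("range:", data_range)
print("average distance from mean:", avg_dist_from_mean)
print("average distance from median:", avg_dist_from_median)
print("interquartile range:", iqr)
print("standard deviation:", round(sd, 2))
```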

STAT 13 - Lecture 2
Center point = C.
Average distance from median = ( )/5 = ( )/5 = 5/5.
Why not use the average distance from the mean? Answer: the center point C that minimizes the average distance is not the mean. What is it? Answer: the median.
Average distance from mean = ( **)/5, where ** = length of 1.0. Mean =
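The claim that the median, not the mean, is the center point C with the smallest average distance can be checked numerically. This is a rough sketch using five made-up numbers (the slide's own numbers did not survive in this transcript) and a simple search over candidate centers in steps of 0.1.

```python
import statistics

def average_distance(center, data):
    """Average absolute distance of the data points from a center point."""
    return sum(abs(x - center) for x in data) / len(data)

# Hypothetical data with one large value, so that mean and median differ.
data = [1, 2, 3, 4, 10]

# Try every candidate center 0.0, 0.1, ..., 10.0 and keep the best one.
candidates = [c / 10 for c in range(0, 101)]
best_center = min(candidates, key=lambda c: average_distance(c, data))

median = statistics.median(data)
mean = statistics.mean(data)

print("best center:", best_center)
print("median:", median, "average distance:", average_distance(median, data))
print("mean:", mean, "average distance:", average_distance(mean, data))
```

In this example the search settles on the median, and the average distance from the mean is strictly larger.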

STAT 13 - Lecture 2: Mean or median
The median is insensitive to outliers. Why not use the median all the time? It is hard to manipulate mathematically.
The median price of gas this week is $1.80; last week it was $2.00. What is the median price for the last 14 days? Hard!
How about if last week's median is $1.80? Still hard.
The answer: anything is possible! Give examples.
The median minimizes the average of absolute distances.
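One way to "give examples" for the gas-price question is to build several sets of daily prices that share the same weekly medians. The sketch below assumes seven daily prices per week; all prices are invented for illustration. In every scenario this week's median is 1.80 and last week's is 2.00, yet the 14-day median differs.

```python
import statistics

def pooled_median(this_week, last_week):
    """Median of all 14 daily prices taken together."""
    return statistics.median(this_week + last_week)

# Three hypothetical scenarios. In each one this week's median is 1.80
# and last week's median is 2.00.
scenarios = {
    "low": ([1.80] * 7,
            [1.80, 1.80, 1.80, 2.00, 2.00, 2.00, 2.00]),
    "middle": ([1.70, 1.70, 1.70, 1.80, 1.90, 1.90, 1.90],
               [1.90, 1.90, 1.90, 2.00, 2.10, 2.10, 2.10]),
    "high": ([1.80, 1.80, 1.80, 1.80, 2.00, 2.00, 2.00],
             [2.00] * 7),
}

for name, (this_week, last_week) in scenarios.items():
    print(name,
          statistics.median(this_week),
          statistics.median(last_week),
          pooled_median(this_week, last_week))
```

The two weekly medians alone therefore do not pin down the 14-day median. By contrast, two weekly means (with known counts) would determine the 14-day mean exactly, which is part of why the mean is easier to manipulate.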

STAT 13 - Lecture 2
The mean is still the more popular measure for the location of the "center" of the data points. What does it minimize? It minimizes the average of squared distances.
The average squared distance from the mean is called the variance. The square root of the variance is called the standard deviation.
Is the "n-1" (instead of n, when averaging the squared distances) a big deal? Why?
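Why the mean minimizes the average squared distance can be seen from a standard identity: the average squared distance from any center c equals the average squared distance from the mean plus (mean - c) squared. The sketch below checks that identity on made-up data and then computes the variance both ways (dividing by n and by n-1).

```python
import math
import statistics

def average_squared_distance(center, data):
    """Average of squared distances from a center point (divides by n)."""
    return sum((x - center) ** 2 for x in data) / len(data)

# Hypothetical data, for illustration only.
data = [1, 2, 3, 4, 10]
mean = statistics.mean(data)

# Dividing by n: the smallest possible average squared distance.
variance_n = average_squared_distance(mean, data)

# Moving the center away from the mean adds exactly (mean - center)^2.
for center in [0, 2, 3, 4, 5, 8]:
    lhs = average_squared_distance(center, data)
    rhs = variance_n + (mean - center) ** 2
    print(center, lhs, rhs)

# Dividing by n - 1 instead, as the lecture's SD definition does.
variance_n_minus_1 = statistics.variance(data)
sd = math.sqrt(variance_n_minus_1)
print("variance (n):", variance_n)
print("variance (n-1):", variance_n_minus_1)
print("SD:", round(sd, 3))
```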

STAT 13 - Lecture 2
Yes, at least at the conceptual level.
Population: the collection of all data that you imagine to have (it can really be there, but most often it is just an ideal world).
Sample: the data you have now.
ALL vs. AML example (well-trained statistician).
Use sample estimates to make inferences about population parameters; this needs a sample-size adjustment (more on this later).
If n is large, it does not matter whether you use n or n-1.
Sample mean = sum divided by ??? n or n-1?
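The sample-size adjustment can be illustrated without any randomness. The sketch below assumes a tiny made-up population of four values and lists every possible sample of size 2 drawn with replacement. Averaged over all those samples, the n-1 version of the variance matches the population variance, while the n version comes out too small. (The sample mean itself still divides by n.)

```python
import itertools
import statistics

# Hypothetical population, small enough to enumerate every sample.
population = [2, 4, 6, 8]
n = 2

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=n))

# pvariance divides by n; variance divides by n - 1.
avg_var_div_n = statistics.mean(statistics.pvariance(s) for s in samples)
avg_var_div_n_minus_1 = statistics.mean(statistics.variance(s) for s in samples)

# The target: variance of the whole population.
population_variance = statistics.pvariance(population)

print("population variance:", population_variance)
print("average sample variance, dividing by n:", avg_var_div_n)
print("average sample variance, dividing by n-1:", avg_var_div_n_minus_1)
```

With n = 2 the gap is large; the ratio between the two versions is (n-1)/n, which approaches 1 as n grows, consistent with the remark that the choice hardly matters for large n.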

STAT 13 - Lecture 2
The interval within one standard deviation of the mean covers about 68 percent of the data points.
The interval within two standard deviations of the mean covers about 95 percent of the data points.
The rule is derived under the "normal curve".
Examples of how to use the normal table: course scores.
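The 68 and 95 percent figures can be recovered from the normal curve directly. This is a small sketch that uses the error function from Python's `math` module in place of a printed normal table; the function name `normal_coverage` is made up for this example.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

within_one_sd = normal_coverage(1)   # about 0.68
within_two_sd = normal_coverage(2)   # about 0.95

print(f"within 1 SD: {within_one_sd:.4f}")
print(f"within 2 SD: {within_two_sd:.4f}")
```

For real data the percentages only come close to these values when the histogram is roughly bell-shaped.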

STAT 13 - Lecture 2
A long list of values from an ideal population.
The density curve represents the distribution in such a way that a high value = dense and a low value = sparse.
1. Find the mean and set the mean to 0; apply the formula to find the height of the curve.
2. Find the SD and set one SD above the mean to 1; set one SD below the mean to -1.
[Figure: density curve over the grade axis, with grade 75 at 0; SD = ]

STAT 13 - Lecture 2: Normal distribution
How to draw the curve?
Step 1: standardization. Change from the original scaling to standard-deviation scaling using the formula z = (x - mean) / SD.
Step 2: the curve has the mathematical form $\frac{1}{\sqrt{2\pi}} e^{-z^2/2}$.
When does it make sense? When the distribution is symmetric with one mode.
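The two steps translate into two short functions. This is a minimal sketch; the function names are invented here, and the mean of 75 and SD of 15 are borrowed from the lecture's course-score example purely as sample inputs.

```python
import math

def standardize(x, mean, sd):
    """Step 1: express x as a number of SDs above or below the mean."""
    return (x - mean) / sd

def standard_normal_height(z):
    """Step 2: height of the standard normal curve at z."""
    return math.exp(-(z ** 2) / 2) / math.sqrt(2 * math.pi)

# Sample inputs: mean 75 and SD 15, as in the course-score example.
for score in [45, 60, 75, 90, 105]:
    z = standardize(score, mean=75, sd=15)
    print(score, z, round(standard_normal_height(z), 4))
```

The heights are largest at z = 0 and equal at z and -z, which matches the "symmetric, one mode" condition on the slide.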

STAT 13 - Lecture 2: Use the normal table
For negative z, page ; for positive z, page .
Q: Suppose your score is 85. What percentage of students score lower than you?
Step 1: standardization (ask how many SDs above or below the mean your score is). Answer: z = (85 - 75)/15 = 0.667.
Look up z = 0.66 and z = 0.67; any reasonable value between the two is fine.
(to be continued)
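The table lookup can be cross-checked in code. The sketch below assumes the scores follow a normal curve with mean 75 and SD 15 (the values used on the slide) and computes the area to the left of z with the error function rather than a printed table; `fraction_below` is a name made up for this example.

```python
import math

def fraction_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Step 1: standardize the score of 85.
z = (85 - 75) / 15

# Step 2: area to the left of z, plus the two neighboring table entries.
share_below = fraction_below(z)
print(f"z = {z:.3f}")
print(f"table entry for z = 0.66: {fraction_below(0.66):.4f}")
print(f"table entry for z = 0.67: {fraction_below(0.67):.4f}")
print(f"share scoring lower: {share_below:.4f}")
```

The result falls between the two table entries, at roughly 75 percent.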

STAT 13 - Lecture 2
Step-by-step illustration of finding the median through a stem-leaf plot (bring final scores for in-class demo).
Find the interquartile range.
Guess the mean and SD.
From stem-leaf to histogram.
Three types of histograms (equal intervals recommended).
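A stem-leaf plot is easy to build by hand, and just as easy in code. The sketch below uses eleven made-up scores (not the class's final scores), with tens digits as stems and units digits as leaves. Because the leaves are sorted, the median can be found by counting in from either end. The quartiles come from `statistics.quantiles`, whose convention may differ slightly from the one used in class.

```python
import statistics
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} with tens digits as stems, units as leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

# Hypothetical scores, for illustration only.
scores = [58, 62, 67, 71, 73, 75, 78, 80, 84, 85, 91]

plot = stem_and_leaf(scores)
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")

# With 11 sorted values, the median is the 6th leaf counted from either end.
median = statistics.median(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)
print("median:", median)
print("interquartile range:", q3 - q1)
```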

STAT 13 - Lecture 2
Homework 1 assigned (due Wed. of the 2nd week).
Reading the mean and median from a histogram.
Symmetric versus asymmetric plots.
Normal distribution.

STAT 13 - Lecture 2: From stem-leaf to histogram
Using drug response data.
NOT all bar charts are histograms!!! (NCBI's COMPARE)
Histograms have to do with "frequencies".
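Turning a stem-leaf plot into a histogram amounts to counting how many values fall in each interval. The sketch below uses made-up scores rather than the drug response data, with equal-width intervals of 10; the function name and its parameters are invented for this example.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values fall in each of `bins` equal-width intervals."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:   # values outside the covered range are skipped
            counts[index] += 1
    return counts

# Hypothetical scores, for illustration only.
scores = [58, 62, 67, 71, 73, 75, 78, 80, 84, 85, 91]

counts = histogram_counts(scores, start=50, width=10, bins=5)
for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: {'*' * count}")
```

Each bar's length is a frequency, which is what separates a histogram from a bar chart that simply plots one value per item.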