Stats Lab #3.

Frequency Distribution Tables
Create a set of data. Include at least 20 scores, and make sure there are many repetitions.
    X = c(..., ..., ...)
First, look at the values in the sample and the frequencies of those values.

The function factor(X) tells R to treat X as a nominal variable. If we pass the result to summary(), we get a count for every value, because summary() reports simple counts for a nominal variable instead of statistics like the median and mean.
    f = summary(factor(X))
    f

This is a frequency distribution table. It shows you the frequency f(x) for every value x. Notice that the total of the frequencies equals the size of the sample.
    n = length(X)
    n
    sum(f)

f is very similar to a histogram. In fact, we can make a bar plot of f to get a histogram.
    barplot(f)
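As a concrete illustration of the steps above, here is a small sketch using a made-up sample. The 20 scores in X are hypothetical and chosen only so that several values repeat; any data set of your own works the same way.

```r
# Hypothetical sample of 20 scores with many repeated values
X = c(3, 5, 4, 3, 2, 5, 4, 4, 3, 1,
      5, 4, 3, 4, 2, 3, 4, 5, 4, 3)

# Frequency table: one count per distinct value, ordered by value
f = summary(factor(X))
f

# The frequencies add up to the sample size
n = length(X)
n
sum(f)

# Bar plot of the frequencies (axis labels are optional additions)
barplot(f, xlab = "Score", ylab = "Frequency")
```

Note that barplot(f) only draws bars for values that actually occur in X, so a value with zero frequency leaves no gap in the plot.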

Cumulative Distributions
Pick any value in your sample.
    x = ___
To find the value of the cumulative distribution at x, we need to know how many scores are less than or equal to x. There are two ways to do this.

First, we can count the raw scores. Recall that this command:
    X <= x
will give a TRUE/FALSE vector showing which entries in X are less than or equal to x. If we take the sum of this vector, sum() will add up all the TRUEs, which tells us the total number of scores that are less than or equal to x.
    count = sum(X <= x)

The second solution is to use the frequency table and add up the frequencies for all values less than or equal to x. To do this, you need to find which entry of the table corresponds to x. If it were the third entry (meaning x is the third SMALLEST of the values that appear in the sample), then you would enter
    count = sum(f[1:3])

These two uses of sum() work in different ways. The first counts how many times X is less than or equal to x, whereas the second adds up the frequencies in f that correspond to values less than or equal to x. You should be able to see why both methods give the same result.
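To see the two counting methods agree, here is a minimal sketch on a hypothetical sample. The data in X, the choice x = 3, and the helper variable pos (which looks up the position of x in the table instead of typing it by hand) are all assumptions made for illustration.

```r
# Hypothetical sample and a value chosen from it
X = c(3, 5, 4, 3, 2, 5, 4, 4, 3, 1,
      5, 4, 3, 4, 2, 3, 4, 5, 4, 3)
x = 3

# Method 1: count the raw scores that are <= x
count1 = sum(X <= x)

# Method 2: add up the frequencies for all values <= x
f = summary(factor(X))
pos = which(names(f) == as.character(x))  # position of x in the table
count2 = sum(f[1:pos])

count1
count2
```

Both counts come out the same because each entry of f is itself the number of raw scores equal to that value.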

Either way you do it, you can then get the cumulative distribution by dividing the count by the sample size.
    count/n
To get the whole cumulative distribution function, use ecdf(), which stands for empirical cumulative distribution function.
    F = ecdf(X)

ecdf() is an unusual function because the result it gives, F, is a function itself. You can use the function F to evaluate the cumulative distribution at any value. Start with x to verify you get the same result as above. Then try some other values, including values inside, below, and above the range of the raw scores.
    F(x)

To see the whole function at once, plot it. Notice how it starts at zero, jumps up at every value in the sample, and ends at one.
    plot(F)

Percentiles
To find what percentile a score is at, count the number of scores less than or equal to it, divide by n to get the proportion of the total sample, and multiply by 100 to get a percentage.
    count = sum(X <= x)
    proportion = count/n
    percentile = proportion * 100
Notice that the middle step, proportion, is the same as the cumulative distribution, F(x). You can also use F directly to get percentiles.
    F(x) * 100
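The link between the hand count, the function returned by ecdf(), and the percentile can be sketched as follows. The sample X and the value x = 3 are hypothetical.

```r
# Hypothetical sample and a value chosen from it
X = c(3, 5, 4, 3, 2, 5, 4, 4, 3, 1,
      5, 4, 3, 4, 2, 3, 4, 5, 4, 3)
x = 3
n = length(X)

# Cumulative proportion by hand
proportion = sum(X <= x)/n
proportion

# Same quantity from the empirical cumulative distribution function
F = ecdf(X)
F(x)

# Percentile of x
F(x) * 100

# Below the smallest score and above the largest score
F(min(X) - 1)
F(max(X) + 1)
```

Assigning to the name F follows the lab's own notation; be aware that it hides R's built-in shorthand F for FALSE for the rest of the session.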

Normal Distributions
Here's a custom function that plots (theoretical) Normal distributions:
    normal = function(mu, sigma) {
      z = (-4000:4000)/1000                        # z-scores from -4 to 4
      x = mu + sigma*z                             # matching raw scores
      Density = 1/(sigma*sqrt(2*pi))*exp(-z^2/2)   # Normal density at each x
      plot(x, Density, type="l")
    }

If you’re interested, the important line is the one that defines Density. This is the mathematical equation that defines a Normal distribution. Go to this address and paste the code for the normal() function into the R window.
    http://matt.colorado.edu/teaching/stats/fall9/data/lab3-1.txt
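One way to convince yourself that the Density formula really is the Normal density is to compare it with R's built-in dnorm() function. This is only a sketch: the values mu = 100 and sigma = 15 are arbitrary choices for illustration, and the grid of z-scores from -4 to 4 is assumed to match the one used inside normal().

```r
# Arbitrary illustration values
mu = 100
sigma = 15

# Grid of z-scores and the matching raw scores
z = (-4000:4000)/1000
x = mu + sigma*z

# Density from the formula
Density = 1/(sigma*sqrt(2*pi))*exp(-z^2/2)

# Largest difference from R's built-in Normal density
max(abs(Density - dnorm(x, mean = mu, sd = sigma)))
```

The difference should be zero apart from tiny rounding error, since z is just (x - mu)/sigma.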

Here’s how to use the normal() function. mu and sigma are numbers you supply or variables you define.
    normal(mu, sigma)
Start by plotting the Standard Normal, which has a mean of zero and a standard deviation of one (this is the standardized distribution, or distribution of z-scores, for any other Normal distribution).
    normal(0,1)

Plot a few more Normal distributions using other values of mu and sigma. Notice how the values you use affect the position and width of the distribution, although the shape always stays the same.

Z-scores
To find a z-score for any value x, all you need is the mean and standard deviation of the distribution. You should know how to compute these statistics already.
    x = ___
    z = (x-mu)/sigma
Compare your choice of x to the distribution of X to verify that the result is a sensible z-score. Is x above or below the mean? Is it close to the center or far away?

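    mu = mean(X)                     # mean of the whole sample X, not of the single value x
    sigma = sqrt(mean((X-mu)^2))     # standard deviation of X (divides by n; sd(X) divides by n-1)
    z = (x-mu)/sigma
    z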

If you have a z-score, you can convert back to the raw score. Recall that z tells you how many standard deviations the raw score is above (or below) the mean. So, to get the raw score, we start at the mean and add z standard deviations.
    z = ___
    x = mu + z*sigma

Again, compare your result to the distribution of X to verify it is sensible based on the z-score you started with. We can compute all the z-scores at once using vector arithmetic:
    Z = (X-mu)/sigma
    Z
Z is now the standardized distribution.
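A quick check that Z really is standardized can be sketched like this. The sample X is hypothetical, and the standard deviation is computed by dividing by n, which is an assumption; if you use sd(X) instead, the standard deviation of Z comes out as 1 under sd() rather than under the formula shown here.

```r
# Hypothetical sample
X = c(3, 5, 4, 3, 2, 5, 4, 4, 3, 1,
      5, 4, 3, 4, 2, 3, 4, 5, 4, 3)

# Mean and standard deviation (dividing by n, an assumption)
mu = mean(X)
sigma = sqrt(mean((X - mu)^2))

# All z-scores at once
Z = (X - mu)/sigma

# The standardized distribution has mean 0 and standard deviation 1
mean(Z)
sqrt(mean(Z^2))

# Converting back recovers the raw scores
X_back = mu + Z*sigma
all.equal(X_back, X)
```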

Plot the cumulative distribution functions for both the raw scores and the standardized scores, and notice how they are the same and how they are different. Before creating these plots, enter the following command, which tells R that you want to look at multiple plots at once (in this case, in a 1×2 arrangement: one row, two columns).
    par(mfrow=c(1,2))
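The side-by-side comparison might look like the sketch below. The sample X is hypothetical, the plot titles are optional additions, and saving the old settings in a variable named old so they can be restored afterwards is a convenience that the lab itself does not ask for.

```r
# Hypothetical sample and its standardized version
X = c(3, 5, 4, 3, 2, 5, 4, 4, 3, 1,
      5, 4, 3, 4, 2, 3, 4, 5, 4, 3)
mu = mean(X)
sigma = sqrt(mean((X - mu)^2))
Z = (X - mu)/sigma

# Two plots in one row; keep the previous settings to restore later
old = par(mfrow = c(1, 2))
plot(ecdf(X), main = "Raw scores")
plot(ecdf(Z), main = "Standardized scores")
par(old)
```

The two step functions have the same heights in the same order; only the horizontal axis is relabeled from raw scores to z-scores.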

Probabilities of z-scores
A useful function for inferential statistics is pnorm(), which is the cumulative distribution function for the standard Normal. Therefore pnorm(z) gives the probability that an observation will have a z-score less than or equal to z, assuming that the distribution is Normal. Try a few values, including 0, positive numbers, negative numbers, large numbers, and numbers close to 0. Compare your results to the plot of the standardized Normal from above (you will have to redraw it).
    pnorm(___)

If you want the probability of a z-score greater than or equal to z, just subtract from 1:
    1 - pnorm(___)
To get a percentile, multiply by 100:
    pnorm(___) * 100
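Here is a short sketch of these three uses of pnorm(). The z-score 1.5 is an arbitrary value picked for illustration.

```r
# Arbitrary z-score for illustration
z = 1.5

# Probability of a z-score less than or equal to z
pnorm(z)

# Probability of a z-score greater than or equal to z
upper = 1 - pnorm(z)
upper

# Because the Normal is symmetric, this matches the lower tail at -z
pnorm(-z)

# Percentile of z
pnorm(z) * 100
```

The symmetry check is a handy sanity test: the area above z and the area below -z are mirror images of each other.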

The inverse of pnorm() is qnorm(). qnorm() tells you what z-score is at a particular quantile. For example, the z-score for the first quartile is:
    qnorm(.25)

Try finding the z-scores for the median and the 62nd percentile. Think about what the result should be before you do each calculation. Also notice that you can’t give qnorm() inputs less than 0 or greater than 1 (try it), and test what happens if you input 0 or 1. Notice that the probability of being less than any quantile is the same as the number that defined that quantile.
    pnorm(qnorm(___))
Also, the quantile for any probability is the same as the original z-score:
    qnorm(pnorm(___))
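The round trip between pnorm() and qnorm() can be sketched as follows. The probability 0.9 and the z-score 1.2 are arbitrary values chosen for illustration, and the names p, z0, and back are hypothetical.

```r
# Arbitrary probability for illustration
p = 0.9

# z-score at that quantile, then back to the probability
z = qnorm(p)
pnorm(z)

# Arbitrary z-score: to a probability and back again
z0 = 1.2
back = qnorm(pnorm(z0))
back

# The first and third quartiles are mirror images around zero
qnorm(.25)
qnorm(.75)
```

In both directions you get back the number you started with, up to rounding error.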

Assignment #3
Imagine you are conducting a memory experiment, in which each subject is asked to recall 20 items. The possible scores are thus 0:20. Start by generating fake data for 200 subjects using this command:
    X = round(runif(200)*21 - .5)
The runif() function creates random numbers that are uniformly distributed between 0 and 1. The rest of this command converts those numbers into integers from 0 to 20.
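If you want to see why the command produces integers from 0 to 20, the steps can be taken apart as in this sketch. Multiplying by 21 and subtracting .5 stretches the random numbers to lie between -0.5 and 20.5, and round() then maps each unit-wide interval to one integer. The call to set.seed() and the intermediate variables u and stretched are additions for illustration only; the seed just makes the run repeatable and is not part of the assignment.

```r
# Fixed seed so the example is repeatable (an addition, not required)
set.seed(1)

# Uniform random numbers between 0 and 1
u = runif(200)

# Stretch to the interval from -0.5 to 20.5
stretched = u*21 - .5
range(stretched)

# Round to the nearest integer
X = round(stretched)
range(X)

# Every score is a whole number between 0 and 20
all(X >= 0 & X <= 20)
all(X == round(X))
```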

1. Create a frequency distribution table for the data. Use the table to make a histogram of the distribution (include the commands, but not the plot, in your submission). Write a brief comment on the shape of the distribution: how does it compare to a Normal?

2. Pick a value x in your distribution (use a value between 13 and 18) and use the cumulative distribution function to find its percentile.
3. Find the z-score for x.
4. Use the pnorm() function to predict the proportion of scores less than or equal to x. Convert the result to a percentile.
5. Notice the answers from questions 2 and 4 are different. Write two reasons why.