Psychology 202a Advanced Psychological Statistics September 1, 2015
Overview of today’s class
Introductory comments on the reading
Start discussion of variables and distributions (first using R)
  – Working definition of a variable
  – Working definition of a distribution
  – Graphical and numerical methods for understanding distributions
Introduce SAS
What is a variable?
We are going to be using R and SAS as tools to help us understand the behavior of variables.
We need a working definition of “variable.”
For our purposes, a variable consists of
  – numbers
  – that convey information
  – about some well-defined entity.
Is this a variable?
  – Numbers that convey information
  – about a well-defined entity
Numbers that convey information…
Peabody Picture Vocabulary scores
The Peabody is intended to be a test of verbal ability.
Items consist of a set of pictures and a word; the task is to choose the picture that matches the word.
Example from YouTube.
…about some well-defined entity.
These scores are a sample from the “Child Health and Development Study.”
  – 10-year-old children
  – Members of the Kaiser-Permanente health plan in Oakland, CA
  – Data were collected at the beginning of the 1970s.
Is this a variable?
  – numbers that convey information
  – about a well-defined entity
Yes, it’s a variable.
But it’s not easy to say much about the variable
  – data are not organized
  – difficult to see structure
R can help us see the structure.
  Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
               92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
               81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
               86, 85, 95, 93, 83, 76, 84, 90, 95, 67)
Read aloud: “Peabody gets cee of…”
How can R help us see the structure?
How many scores are there?
  length(Peabody)
What’s a big score or a small score?
  sort(Peabody)
Note that there are lots of scores in the 80s and 90s, fewer in the 60s and 70s, and very few in the 50s or 100s.
What is a distribution?
We’ve been looking at what values of Peabody occur, and how often they occur.
That’s what a distribution is:
  – the values that a variable takes on, together with…
  – …the frequencies (or relative frequencies) of those values.
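
The definition above can be sketched directly in R: `table()` pairs each value that occurs with how often it occurs. This is only an illustration using the Peabody scores from the earlier slide; the object name `freq` is an assumption, not part of the lecture.

```r
# The Peabody scores from the earlier slide.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
             92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
             81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
             86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# Each distinct value, together with the number of times it occurs.
freq <- table(Peabody)
print(freq)
```

The names of `freq` are the values the variable takes on, and the entries are their frequencies.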
Understanding distributions…
We could interpret that idea very literally.
  – There is one score of 57.
  – There is one score of 61.
  – There is one score of 64.
  – There are two scores of 65.
This would rapidly become tedious…
…and would not be very useful.
…by ignoring detail.
The problem with that approach is that there is too much information.
Simplify, ignore detail to see structure.
Ways to do that:
  – group the data
  – use pictures
  – use summary numbers (descriptive statistics)
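
As a rough illustration of the third option, summary numbers, the sketch below reduces the 40 scores to a handful of values. Which statistics to report is an assumption here (the slide does not name any); `summary()`, `median()`, and `range()` are standard base R functions.

```r
# The Peabody scores from the earlier slide.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
             92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
             81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
             86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# A few descriptive statistics that ignore most of the detail.
middle <- median(Peabody)   # the middle of the sorted scores
extent <- range(Peabody)    # smallest and largest score
print(summary(Peabody))     # minimum, quartiles, mean, maximum
```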
Grouping data
Looking at our sorted data, we can see that there is (or are)
  – one number in the 50s
  – seven numbers in the 60s
  – six numbers in the 70s
  – fourteen numbers in the 80s
  – eleven numbers in the 90s
  – one number in the 100s.
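
The counts above can be reproduced in R by reducing each score to its decade with integer division and then tabulating. This is one possible sketch; the names `decade` and `decade_counts` are illustrative.

```r
# The Peabody scores from the earlier slide.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
             92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
             81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
             86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# Drop the ones digit: 69 becomes 60, 100 stays 100.
decade <- (Peabody %/% 10) * 10

# Count how many scores fall in each decade.
decade_counts <- table(decade)
print(decade_counts)
```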
Grouping data
We’ve gone too far: that’s not enough information.
Here’s a general principle: try to group so that there are between seven and fifteen categories.
(As with any rule of thumb, there will be exceptions.)
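
One way to apply the rule of thumb is to count how many categories a candidate interval width would produce. The helper `n_categories` below is hypothetical (it is not from the lecture), and it assumes that intervals start at a multiple of the width.

```r
# The Peabody scores from the earlier slide.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
             92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
             81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
             86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# Hypothetical helper: number of equal-width categories needed to cover x
# when every interval begins at a multiple of `width`.
n_categories <- function(x, width) {
  lower <- floor(min(x) / width) * width        # start of the first interval
  upper <- (floor(max(x) / width) + 1) * width  # end of the last interval
  (upper - lower) / width
}

# Compare three candidate widths.
widths <- c(10, 5, 2)
counts <- sapply(widths, function(w) n_categories(Peabody, w))
print(data.frame(width = widths, categories = counts))
```

Under these assumptions a width of 10 gives too few categories, a width of 2 gives too many, and a width of 5 lands inside the suggested range.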
Peabody Distribution
Values        Frequency
 55 –  59         1
 60 –  64         2
 65 –  69         5
 70 –  74         2
 75 –  79         4
 80 –  84         8
 85 –  89         6
 90 –  94         8
 95 –  99         3
100 – 104         1
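
A grouped frequency distribution like this one can be sketched in R with `cut()`. The sketch assumes ten five-point intervals running from 55 to 104; the names `breaks`, `labels`, and `grouped_freq` are illustrative.

```r
# The Peabody scores from the earlier slide.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
             92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
             81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
             86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# Interval boundaries: 55, 60, ..., 105 (assumed width of 5).
breaks <- seq(55, 105, by = 5)

# Labels such as "55-59" for the interval that starts at 55.
labels <- paste(head(breaks, -1), head(breaks, -1) + 4, sep = "-")

# right = FALSE makes each interval include its lower boundary,
# so a score of 60 lands in "60-64", not "55-59".
grouped <- cut(Peabody, breaks = breaks, right = FALSE, labels = labels)
grouped_freq <- table(grouped)
print(grouped_freq)
```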
Some details about grouping
  – continuous and discrete variables
  – real limits
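Lower and upper real limits
What we say     What we mean
 55 –  59        54.5 –  59.5
 60 –  64        59.5 –  64.5
 65 –  69        64.5 –  69.5
 70 –  74        69.5 –  74.5
 75 –  79        74.5 –  79.5
 80 –  84        79.5 –  84.5
 85 –  89        84.5 –  89.5
 90 –  94        89.5 –  94.5
 95 –  99        94.5 –  99.5
100 – 104        99.5 – 104.5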
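
The relationship between stated limits and real limits can be worked out in a few lines of R. The sketch assumes whole-number scores, so each real limit sits half a unit beyond the stated one; the names `stated_lower`, `stated_upper`, `real_lower`, and `real_upper` are illustrative.

```r
# Stated limits of the ten intervals ("what we say").
stated_lower <- seq(55, 100, by = 5)
stated_upper <- seq(59, 104, by = 5)

# Real limits ("what we mean"): extend each stated limit by half a unit,
# assuming scores are recorded as whole numbers.
real_lower <- stated_lower - 0.5
real_upper <- stated_upper + 0.5

print(data.frame(stated_lower, stated_upper, real_lower, real_upper))
```

With real limits, neighbouring intervals share a boundary (the upper real limit of one is the lower real limit of the next), and every interval is exactly 5 units wide.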
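Relative frequency distribution
Peabody
Values        Relative Frequency
 55 –  59         0.025
 60 –  64         0.050
 65 –  69         0.125
 70 –  74         0.050
 75 –  79         0.100
 80 –  84         0.200
 85 –  89         0.150
 90 –  94         0.200
 95 –  99         0.075
100 – 104         0.025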
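
Relative frequencies are the grouped counts divided by the total number of scores. A minimal sketch, again assuming five-point intervals from 55 to 104; `rel_freq` is an illustrative name, and `prop.table()` is the base R function that does the division.

```r
# The Peabody scores from the earlier slide.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
             92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
             81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
             86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# Group into intervals [55, 60), [60, 65), ..., [100, 105).
grouped <- cut(Peabody, breaks = seq(55, 105, by = 5), right = FALSE)

# Divide each count by the total number of scores (40).
rel_freq <- prop.table(table(grouped))
print(round(rel_freq, 3))
```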
What can we say about the distribution?
There is variation in the scores.
Peabody scores are most frequent in the 80s and 90s.
Scores at the extremes of the distribution are much less frequent than scores at the center.
But it’s still a little hard to see all this.
A simple graphical technique
Stem-and-leaf plot
  – Divide the numbers into fine-grained and coarse-grained information.
  – coarse = “stem”
  – fine = “leaf”
Manual demonstration
  stem(Peabody)
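
The manual procedure can be imitated in R by splitting each score into its tens part (the stem) and its ones digit (the leaf). This is a hand-rolled sketch with illustrative names (`leaves_by_stem`, `rows`), not what `stem()` does internally, and the layout that `stem(Peabody)` prints may differ.

```r
# The Peabody scores from the earlier slide.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
             92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
             81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
             86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# Coarse information (stem) = tens; fine information (leaf) = ones digit.
leaves_by_stem <- split(Peabody %% 10, Peabody %/% 10)

# Sort the leaves within each stem and paste them into one string.
rows <- vapply(leaves_by_stem,
               function(l) paste(sort(l), collapse = ""),
               character(1))

# Print one line per stem: the stem, a bar, then its leaves.
cat(sprintf("%3s | %s\n", names(rows), rows), sep = "")
```

Each printed row is one stem followed by its leaves, so the length of a row shows how many scores share that stem.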