HUDM4122 Probability and Statistical Inference January 26, 2015
ASSISTments Did everyone get an account for the ASSISTments system? Did anyone have difficulties setting up an account? First homework is due in a week
Today Ch. 1 in Mendenhall, Beaver, & Beaver Variables and Variable Types Graphing Data Basic Exploratory Data Analysis
Variables What is a variable?
Variables What is a variable? “A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration.” – MBB p. 8
Which of these are examples of variables?
GPA
Shoe size
Age
Number of correct answers in ASSISTments
Number of times gamed the system in ASSISTments
Favorite vegetable
Favorite type of pie
Pi
What is a measurement?
A measurement is the result of measuring a variable on a single experimental unit – A person, if you are studying people – A class, if you are studying classes – A pizza, if you are studying pizzas
A measurement Person furthest towards my left in the front row, what is your name?
Now I have a measurement
A measurement Person furthest towards my right in the second row, what is your name?
Now I have data A set of measurements
Now I have data A set of measurements Note that in stats class or education journals, the word “data” is plural
Now I have data A set of measurements Note that in stats class or education journals, the word “data” is plural I only know one exception
Everyone repeat after me
“My data are in this Excel file.”
Everyone repeat after me “My data are in this Excel file.” “Your data aren’t evidence for that conclusion.”
Everyone repeat after me “My data are in this Excel file.” “Your data aren’t evidence for that conclusion.” “His data were hard to collect.”
However…
I do not recommend insisting that data is plural in bars, on first dates, or at Thanksgiving dinner
Any questions or concerns?
Univariate Data A single variable is collected Height 5’11” 5’10” 5’6”
Bivariate Data Two variables are collected (for the same data point)
Height   Drum-Playing Skill
5’11”    1
5’11”    2
5’10”    4
5’6”     8
Multivariate Data 3+ variables are collected
Name              Height   Drum-Playing Skill
John Lennon       5’11”    1
Paul McCartney    5’11”    2
George Harrison   5’10”    4
Ringo Starr       5’6”     8
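As a rough sketch of how the same data set can be viewed with one, two, or three variables, the table above can be stored as one record per person. The field names (`name`, `height`, `drum_skill`) and the `select` helper are assumptions made for this illustration; heights are kept as text exactly as they appear on the slide.

```python
# Each dict is one experimental unit (a person); each key is a variable.
# Values are copied from the slide's table; field names are hypothetical.
beatles = [
    {"name": "John Lennon", "height": "5'11\"", "drum_skill": 1},
    {"name": "Paul McCartney", "height": "5'11\"", "drum_skill": 2},
    {"name": "George Harrison", "height": "5'10\"", "drum_skill": 4},
    {"name": "Ringo Starr", "height": "5'6\"", "drum_skill": 8},
]


def select(rows, *variables):
    """Return one tuple of measurements per row for the chosen variables."""
    return [tuple(row[v] for v in variables) for row in rows]


# One variable per person: univariate data.
univariate = select(beatles, "height")
# Two variables measured on the same person: bivariate data.
bivariate = select(beatles, "height", "drum_skill")
# Three or more variables per person: multivariate data.
multivariate = select(beatles, "name", "height", "drum_skill")

print(univariate)
print(bivariate)
print(multivariate)
```

The number of rows never changes; only the number of variables recorded for each row does.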
Any questions or concerns?
Types of Variables
Quantitative/Numerical Data Data that can be expressed as numbers
What are some examples of numerical data?
Ordinal Data Refers to data where there is a known order, but either – The data clearly aren’t numbers – The space between values is not guaranteed to be equal
Examples of Ordinal Data Months of the year: January, February, March, April, … Agreement level: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree Quality of university: Highly selective, selective, somewhat selective, non-selective
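One way to see what "known order, but no guaranteed spacing" means is to sort some agreement-level responses. This is only a sketch: the `responses` list is made up for illustration, and the `rank` helper is a hypothetical name, not something from the book.

```python
# The agreement levels from the slide, listed from lowest to highest.
AGREEMENT_LEVELS = [
    "Strongly Disagree",
    "Disagree",
    "Neutral",
    "Agree",
    "Strongly Agree",
]


def rank(level):
    """Position of a level in the known order (0 = lowest)."""
    return AGREEMENT_LEVELS.index(level)


# Hypothetical survey responses, invented for this example.
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree", "Agree"]

# Ordering is meaningful for ordinal data, so sorting makes sense.
ordered = sorted(responses, key=rank)
print(ordered)

# Comparisons are meaningful too: "Agree" is higher than "Neutral".
print(rank("Agree") > rank("Neutral"))

# The rank numbers are only labels for position. The step from "Neutral"
# to "Agree" is not guaranteed to be the same size as the step from
# "Agree" to "Strongly Agree", so subtracting ranks is not meaningful.
```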
Other examples of ordinal data?
Nominal data Values have no order or spacing Name State of Residence – New Jersey is not greater or less than New York
Nominal data Values have no order or spacing Name State of Residence – New Jersey is not greater or less than New York – Although my brother might disagree
Other Examples of Nominal Data?
Another name Nominal data is often also called categorical data
Another name Nominal data is often also called categorical data Technically ordinal data is also categorical, but no one ever uses the term that way
Any questions or concerns?
Exploratory Data Analysis “Analyzing data sets to summarize their main characteristics” “Seeing what the data can tell us beyond the formal modeling or hypothesis testing task”
Goal Generate hypotheses Understand your data better
Often (but not always) done with graphs
Which of these is your favorite type of graph?
Pie chart
Bar graph
Frequency histogram
Line graph
Scatterplot
Stem-and-leaf plot
Box plot
Other
Pie Chart Take a set of categories that add to 100% Show the proportion each category has
Pie Chart: Example
Interpret This Graph Please
Never Ever Do This: Completely Visually Misleading Fair use; critique
Let’s make a pie chart Using the “your favorite graph” data
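The arithmetic behind a pie chart can be sketched in a few lines: count each category, then convert the counts to shares of the whole. The `votes` below are invented for illustration and are not the class's actual answers.

```python
from collections import Counter

# Hypothetical "favorite graph" votes, made up for this sketch.
votes = [
    "Pie chart", "Bar graph", "Bar graph", "Scatterplot",
    "Box plot", "Bar graph", "Pie chart", "Scatterplot",
]

counts = Counter(votes)
total = sum(counts.values())

# Each slice's share of the whole, as a percentage.
shares = {category: 100 * n / total for category, n in counts.items()}

# Each slice's angle; a full circle is 360 degrees.
angles = {category: 360 * n / total for category, n in counts.items()}

for category in counts:
    print(f"{category}: {shares[category]:.1f}% ({angles[category]:.0f} degrees)")

# The shares cover the whole, which is what makes a pie chart appropriate.
print(sum(shares.values()))
```

Because every vote falls into exactly one category, the shares add to 100%.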
Any questions?
Alternative: Bar Graphs
Interpret this graph please
What are the advantages/disadvantages relative to pie chart?
By the way: X and Y axes X axis Y axis
Strengths of bar graphs Categories don’t have to add to 100% Easier to see small differences between categories You can compare variables too
Two-group bar graph
Let’s make a bar graph Using the “your favorite graph” data
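A bar graph needs only the count per category, so a plain-text version is easy to sketch. The `votes` are again invented for illustration, and `text_bar_graph` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical "favorite graph" votes, made up for this sketch.
votes = [
    "Pie chart", "Bar graph", "Bar graph", "Scatterplot",
    "Box plot", "Bar graph", "Pie chart", "Scatterplot",
]

counts = Counter(votes)


def text_bar_graph(counts):
    """Return one text line per category: a label, then a bar of # marks."""
    width = max(len(category) for category in counts)
    lines = []
    # most_common() lists categories from the largest count to the smallest.
    for category, n in counts.most_common():
        lines.append(f"{category.ljust(width)} | {'#' * n} {n}")
    return lines


lines = text_bar_graph(counts)
for line in lines:
    print(line)
```

Each bar's length depends only on its own count, so the categories do not need to add to 100%.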
Any questions?
Some suggest always using bar graphs instead of pie charts
“The only thing worse than a pie chart is several of them.” – Edward Tufte “Save the pies for dessert.” – Stephen Few
But they’re wrong
Pie charts are good for representing part-whole relationships in ways that are really easy to see Pie charts are good at representing overall proportions
Nice example (Gabrielle, 2013)
Any questions?
Frequency Histogram A type of bar graph – But usually when people say “bar graph”, they do not mean “frequency histogram” – Also: by convention, no space between bars X axis shows values or ranges of a quantitative variable Y axis shows how many data points have that value or range for the quantitative variable
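The counting behind a frequency histogram can be sketched as follows: assign each value to a range, then count how many data points land in each range. The exam `scores` and the bin width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical exam scores, invented for this sketch.
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 90, 94, 97]

# Width of each range on the X axis (an assumed choice).
BIN_WIDTH = 10


def bin_start(value, width=BIN_WIDTH):
    """Lower edge of the range that a value falls into, e.g. 78 -> 70."""
    return (value // width) * width


# Y axis: how many data points fall in each range.
frequencies = Counter(bin_start(score) for score in scores)

# Print the ranges in order, including any empty ranges between the
# lowest and highest ones (a Counter returns 0 for a missing key).
for start in range(min(frequencies), max(frequencies) + BIN_WIDTH, BIN_WIDTH):
    bar = "#" * frequencies[start]
    print(f"{start}-{start + BIN_WIDTH - 1}: {bar}")
```

Changing `BIN_WIDTH` changes the shape of the histogram, so it is worth trying more than one width on the same data.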
Example from the book Visits to Starbucks
Another Example
Was this an easy exam or a hard exam?
Would you rather be in the blue class or the orange class?
By the way: outliers OUTLIER
If there’s time, let’s make a frequency histogram Everybody: What’s your height in feet-inches? (Example: I’m 5’9”)
Any questions?
Line Graph Shows trends from left-to-right The trend is usually over time But it doesn’t have to be…
Example Line Graph Used under Creative Commons License
Example Line Graph (VanLehn, 2011) (This graph shows perceptions, not data on effectiveness.)
Any questions?
Not going to discuss today Stem-and-leaf plot Very, very rare to see in actual use Quite poor for any sizable data set If you want to learn about them, see the book
Future Classes Scatterplot Box plot
Upcoming Classes
1/28 Describing Data with Numerical Measures – Ch. 2
2/2 Describing Bivariate Data (Asgn. 1 due) – Ch. 3
2/4 Introduction to Probability – Ch. 4
Questions? Comments?