Preliminary Chapter – What is Statistics?
Statistics: Science of learning from data. This includes collecting and interpreting!
4 main themes:
I. Exploring Data (compare graphs and numbers)
II. Sampling & Experimentation (collect data)
III. Anticipating Patterns (probability and simulation)
IV. Statistical Inference (make conclusions)
Population: Entire group of interest
Sample: Subset of subjects chosen to represent the population
Where do you get good data?
Available Data: Data that were produced in the past
Available data can be good or bad!
Survey: Questionnaire designed to gauge public opinion
STEPS:
1. Select a sample to represent a larger population
2. Ask questions to the sample and record their responses
3. Use results to draw conclusions about the population
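The three survey steps can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 students, the 600 "yes" answers, and the function name estimate_proportion are all made up for the example.

```python
import random

def estimate_proportion(population, sample_size, seed=0):
    """Estimate the share of 'yes' answers in a population from a random sample."""
    rng = random.Random(seed)
    # Step 1: select a sample to represent the larger population.
    sample = rng.sample(population, sample_size)
    # Step 2: "ask" each sampled individual and record the responses.
    yes_count = sum(1 for answer in sample if answer)
    # Step 3: use the sample result to draw a conclusion about the population.
    return yes_count / sample_size

# Hypothetical population: 1,000 students, 600 of whom would answer "yes".
population = [True] * 600 + [False] * 400
estimate = estimate_proportion(population, sample_size=100)
print(f"Estimated proportion answering yes: {estimate:.2f}")
```

The estimate will usually land near 0.60 but not exactly on it. Setting sample_size to the full 1,000 turns the survey into a census, and the result is then exactly 0.60.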
Census: A survey of everyone in the population of interest
Example #1: I want to know if Steele Canyon students like school. How can I find out?
Observational Study: Don’t interfere with the individuals, no treatment applied. Includes surveys. Ex. How many students have cell phones?
Experiment: Do something to the individuals, apply a treatment (doesn’t have to be a drug). Ex. Can sleeping for 8 hours increase your GPA?
Cause and Effect: Only experiments can show cause and effect, because a treatment is applied in a controlled environment.
Example #2: Do people wash their hands every time after going to the bathroom? Design an observation and an experiment. Which one provides a cause-and-effect relationship?
Observe: Survey people about whether they wash their hands, or watch to see if they do.
Experiment: See whether having someone else present in the bathroom influences how often people wash their hands.
Cause-and-Effect: Experiment
Example #3: In adults, moderate use of alcohol is associated with better health. Some studies suggest that drinking wine rather than beer or spirits yields added health benefits.
a. How would you determine if wine caused better health than beer?
Experiment. Assign people to a wine, beer, hard liquor, or no-alcohol group.
b. What else could influence your results? What other factors that affect health should be considered?
Whether they exercise, prior health concerns, how much they drink
Day 2: Making Sense of Data
Data Analysis: Organizing, displaying, and summarizing data, and asking questions.
Individuals: Objects described by a set of data. Can be people, animals, or things.
Variable: Characteristic of an individual.
Categorical Variable: Places an individual into one of several groups or categories (favorite color, ethnicity)
Quantitative Variable: Takes numerical values for which arithmetic operations (like adding) make sense (age, height, weight)
When given data, what should you ask yourself? (W5HW)
Who: Individuals
What: Variables
Why: Purpose
When: How old is the data?
Where: Where did the data come from?
How: How did they gather the data?
By Whom: Who gathered the data?
Example #4
Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2004 model motor vehicles:
a. Answer the key questions (who, what, why, when, where, how, and by whom).
Who: 2004 vehicles
What: Make and model, vehicle type, transmission type, # of cylinders, city MPG, highway MPG
Why: Compare MPG in different cars
When: ?
Where: ?
How: ?
By whom: ?
b. Which variables are categorical? Quantitative?
Categorical: Make and model, vehicle type, transmission type
Quantitative: # of cylinders, city MPG, highway MPG
c. Can you assume that manual transmissions always have better mpg? Explain.
No, not enough data collected. Only one manual transmission in the list!
Example #5
Gallup News Service conducted a survey of 1012 adults aged 18 years or older, August 29-September 5, 2000. The respondents were asked, “Has anyone in your household been the victim of a crime in the past 12 months?” Of the 1012 adults surveyed, 24% said they or someone in the household had experienced some type of crime during the preceding year. Gallup News Service concluded that the percentage of victimizations had risen from past records. They claimed that 20% of all households had been victimized by crime during 1998-1999.
For this survey, describe the following:
Individuals: 1012 adults aged 18 years or older
Variable measured: Whether anyone in the household had been the victim of a crime in the past 12 months
Categorical or quantitative? Categorical
Example #6: For each of the following variables, state whether it is categorical or quantitative:
Whether a penny lands on heads or tails – categorical
The color of a Reese’s Pieces candy – categorical
Number of calories in a fast food meal – quantitative
The life expectancy of a nation – quantitative
Amount of college fees – quantitative
The weight of an automobile – quantitative
Who people voted for in the election – categorical
Distribution: The values a variable takes and how often it takes each value
Mean: Average value. Add up the numbers and divide by the # of values
Mode: Most frequent value
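A short Python sketch of these three ideas, using a made-up data set (the sibling counts below are hypothetical and are not from any example in these notes):

```python
from collections import Counter
from statistics import multimode

# Hypothetical data: number of siblings reported by ten students.
values = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

# Distribution: each value paired with how often it occurs.
distribution = dict(sorted(Counter(values).items()))

# Mean: add up the numbers and divide by the number of values.
average = sum(values) / len(values)

# Mode: the most frequent value (multimode lists every value tied for most frequent).
modes = multimode(values)

print("Distribution:", distribution)
print("Mean:", average)
print("Mode(s):", modes)
```

Here the values add up to 23, so the mean is 23 / 10 = 2.3, and the mode is 2 because it appears three times, more often than any other value.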
Bar graph: Displays categorical variables
How to construct a bar graph:
Step 1: Label your axes and title your graph
Step 2: Scale your axes
Step 3: Leave spaces between bars
Side-by-side bar graph: Compares a categorical variable across two groups, with the bars for each category drawn next to each other
Dotplot: Dots are used to keep count of the frequency of each value
How to construct a dotplot:
Step 1: Label your axis and title your graph.
Step 2: Mark a dot above the corresponding value for each observation.
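As a rough illustration of the dotplot steps, the sketch below prints a text dotplot. The function name text_dotplot and the data are hypothetical, and the layout assumes single-digit, non-negative whole numbers so the columns line up.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot: stacked '*' marks above an axis of values."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    tallest = max(counts.values())
    rows = []
    # Build from the top row down so the marks stack upward from the axis.
    for level in range(tallest, 0, -1):
        marks = ["*" if counts[v] >= level else " " for v in axis]
        rows.append(" ".join(marks).rstrip())
    # The last row is the labeled axis.
    rows.append(" ".join(str(v) for v in axis))
    return "\n".join(rows)

# Hypothetical data (not the playoff table from these notes).
data = [0, 0, 0, 1, 1, 2, 3, 3, 5]
print(text_dotplot(data))
```

Each observation contributes one mark above its value, so the height of a stack is the frequency of that value. Values with no observations, such as 4 here, still get a spot on the axis so that gaps in the data stay visible.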
Example #7 The following table gives information about the color preferences of vehicle purchasers in 1998.
a. Make a side-by-side bar graph to compare color choice for full-sized or intermediate-sized cars vs. light trucks or vans.
b. What do you notice about the graph? White is the overall favorite truck color. Car color is fairly evenly distributed.
Example #8 The number of goals scored by each team in the first round of the California Southern Section Division V high school soccer playoffs is shown in the following table.
a. Make a dotplot of the data.
b. Describe what you see in a few sentences. Many teams didn’t score. The highest-scoring team scored 7 goals.
Probability: The study of chance behavior, which is unpredictable in the short run but predictable in the long run
Example #9 You are taking the AP Stats exam. A multiple choice question is provided with answers a-e. You have no idea what the answer is! What is the probability you guess the correct answer?
1/5 = 0.2 = 20%
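One way to see the "long run" idea is to simulate many random guesses. The sketch below is only an illustration: the function name simulate_guessing, the choice of "c" as the correct answer, and the number of trials are all assumptions made for the example.

```python
import random

def simulate_guessing(trials, seed=0):
    """Return the fraction of random guesses among a-e that are correct."""
    rng = random.Random(seed)
    choices = "abcde"
    correct = "c"  # arbitrary: any one of the five letters gives the same probability
    hits = sum(1 for _ in range(trials) if rng.choice(choices) == correct)
    return hits / trials

for trials in (10, 1_000, 100_000):
    print(trials, "guesses:", simulate_guessing(trials))
```

With only 10 guesses the fraction correct can be far from 0.2, but as the number of guesses grows it should settle close to 1/5.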
Statistical Inference: Drawing conclusions about a population based on data from samples of that population.
Example #10 When you opened your bag of chips you were disappointed to see how empty the bag already was. The bag said it weighed 1.5 oz. You weighed your bag and found that it weighed 1.45 oz. Can you say that the company should fix their machine?
No! Sample size is too small!