The Role of Statistics and the Data Analysis Process
AP Stats Chapter 1
The Data Analysis Process
Step 1: Acknowledging Variability – Collecting Data Sensibly
Step 2: Describing Variability in the Data – Descriptive Statistics
Step 3: Drawing Conclusions in a Way That Recognizes Variability in the Data – Probability Supports the Conclusion
Statistics (1)
The scientific discipline that provides methods to help make sense of data.
Suspicion: Extreme skeptics, usually speaking out of ignorance, characterize this discipline as a subcategory of lying.
Used properly, statistical methods offer a set of POWERFUL tools for gaining insight into the world around us.
Statistics is used in business, medicine, agriculture, the social sciences, the natural sciences, and the applied sciences.
Statistics (2)
“…teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation.”
1.1 Three Reasons to Study Statistics
Be Informed
To understand news reports that make data-based claims.
Extract information from tables and graphs.
Follow numerical arguments.
Understand the basics of valid research design.
Understand Issues and Make Sound Decisions Based on Data
Is the existing information adequate, or do we need more?
Collect information in a reasonable and thoughtful manner.
Summarize data in a useful and informative way.
Analyze the available data.
Draw conclusions, make decisions, and assess the risk of an incorrect decision.
Evaluate Decisions That Affect Your Life
Other people use statistical methods to make decisions that affect your life: companies that screen for drugs, medical researchers, university financial aid offices, insurance companies, etc.
Are the decisions made by these groups reached in a reasonable way?
1.2 The Nature and Role of Variability
Statistics focuses on collecting, analyzing, and drawing conclusions from data. If all measurements were identical for every individual, this task would be easy. But populations without variability are virtually non-existent. In fact, variability is universal. We need to understand variability to be able to collect, analyze, and draw conclusions from data in a sensible way. The branch called descriptive statistics helps to increase our understanding of the nature of variability in a population.
Figure 1.1 Histogram of heights (in inches) of female athletes: (a) basketball players; (b) gymnasts. Sample size is 100 for both groups (N = 100).
Example 1.1 If the Shoe Fits
Is the variation in heights similar for the two groups?
Suppose a 5’11” woman is looking for her sister, who is practicing with her team in the gym. Where would you direct her? Why?
What if you found a pair of size 6 shoes left in the locker room? Where would you try to return them?
You informally used statistical reasoning that combined your knowledge of the relationship between the heights of siblings, and between height and shoe size, with the information about height distributions in Fig. 1.1.
Figure 1.2 Frequency of contaminant concentration (in ppm) in well water. Based on the average of five measurements per day for 200 days (N = 200).
Example 1.2 Monitoring Water Quality
Suppose a chemical spill occurred at a manufacturing plant 1 mile from the well. One month after the spill, the average contamination is 15.5 ppm. Would this be convincing evidence that the well was affected by the spill? What if the average was 17.4 ppm? 22.0 ppm?
In both Examples 1.1 and 1.2, reaching a conclusion required an understanding of variability. Variability allows us to distinguish between usual and unusual values.
1.3 Statistics and the Data Analysis Process
Conclusions based on data appear regularly in the popular media and in professional and academic publications. In business, industry, and government, decisions are data driven.
Descriptive statistics – methods for organizing and summarizing data. This is the next step in the data analysis process once a data set has been collected or an appropriate source has been identified.
Inferential statistics – methods for generalizing from a sample to the population, which requires an understanding of the variation in the population (i.e., descriptive statistics). Inferential statistics also includes assessing the reliability of such generalizations. Because these conclusions are based on incomplete information, there is a need to quantify the chance of an incorrect conclusion.
Population – the entire collection of individuals or objects about which information is desired.
Sample – a subset of the population, selected for study in some prescribed manner.
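To make “selected in some prescribed manner” concrete, here is a minimal Python sketch of one such manner, simple random sampling, in which every subset of size n is equally likely to be chosen. The population of ID numbers 1 to 500, the sample size of 20, the seed, and the helper name `simple_random_sample` are hypothetical choices for illustration only; `random.Random.sample` is from the Python standard library.

```python
import random

# Hypothetical population: ID numbers 1-500, one per individual of interest
population = list(range(1, 501))

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(units, n)

sample = simple_random_sample(population, 20, seed=1)
print(sorted(sample))
```

The sample is a subset of the population: no ID appears twice, and every ID comes from the population list.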
The Data Analysis Process (1)
Raw data without analysis are of little value; likewise, even a sophisticated analysis cannot provide meaningful information from data that were not collected in a sensible way. Data collection and analysis allow researchers to answer questions about the way systems work.
Steps in the data analysis process:
1. Understand the nature of the problem.
2. Decide what to measure and how to measure it.
3. Collect the data.
4. Summarize the data and perform a preliminary analysis.
5. Perform a formal data analysis.
6. Interpret the results.
Example 1.3 A Proposed New Treatment for Alzheimer’s Disease
In 2002, eleven patients had shunts implanted into their brains. A comparison group received the standard care for Alzheimer’s. Quarterly tests of memory function for both groups showed a steady decline in the control group, while the surgically treated (experimental) group did not decline. The study was too small to produce conclusive statistical evidence, but the preliminary results justified a larger study that would include 256 patients at 25 medical centers across the country.
The Data Analysis Process (2)
Evaluating a Research Study (in the popular and technical press): The six data analysis steps can be used as a guide.
What were the researchers trying to learn? What question motivated their research?
Was relevant information collected? Were the right things measured?
Were the data collected in a sensible way?
Were the data summarized in an appropriate way?
Was an appropriate method of analysis used, given the type of data and how the data were collected?
Are the conclusions drawn by the researchers supported by the data analysis?
Example 1.4 Spray Away the Flu
A newspaper article reported the results of a study in which a vaccine was administered by nasal spray. The results generally look promising, but newspaper articles typically do not report the details of a study.
How were the data collected?
How were the children selected for the study?
How was it determined which children received the vaccine and which received the placebo?
How was a subsequent diagnosis of flu made?
Not mentioned by the authors of the text: how does this compare with a vaccine given by injection?
1.4 Types of Data and Some Simple Graphical Displays
Describing Data
Variable – any characteristic whose value may change from one individual or object to another.
Data – results from making observations either on a single variable or simultaneously on two or more variables.
Univariate data set – a data set consisting of observations on a single attribute. It may be:
Categorical (or qualitative) – the individual observations are categorical responses.
Numerical (or quantitative) – the observations are numbers.
Example 1.5 Airline Safety Violations
The FAA monitors airlines. In March, USA Today reported on violations that could lead to fines from the FAA. Violation categories: Security (S), Maintenance (M), Flight Operations (F), Hazardous Materials (H), or Other (O).
Subset of the data for 20 administrative actions:
S S M H M O S M S S F S O M S M S M S M
What kind of data set is this?
Describing data (continued)
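Bivariate data set – when a data set consists of two attributes recorded simultaneously for each individual.
Multivariate data set – when a data set consists of more than two attributes recorded simultaneously for each individual.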
Example 1.6 Revisiting Airline Safety Violations
Columns: Number of Violations; Average Fine per Violation (in US dollars)
Alaska 258
America West 257
American 1745
Continental 973
Delta 1280
Northwest 1097
Southwest 535
TWA 642
United 1110
US Airways 891
What type of data set is this?
Two Types of Numerical Data
Discrete – a numerical variable in which the possible values of the variable correspond to isolated points on the number line.
Continuous – a numerical variable in which the possible values of the variable form an entire interval on the number line.
Example 1.7 Calls to Drug Abuse Hotline
The number of telephone calls per day to a drug abuse hotline is recorded for 12 days. The possible values are isolated points on the number line, so this is a discrete numerical data set.
Example 1.6 had both types of numerical data: number of violations (discrete) and average fine per violation (continuous).
In general, data are continuous when observations involve making measurements.
Frequency Distributions and Bar Charts for Categorical Data
A tabular or graphical display can effectively communicate information. A common way to present categorical data is in a table called a frequency distribution.
Frequency distribution for categorical data – a table that displays the possible categories along with the associated frequencies and/or relative frequencies.
Frequency – for a particular category, the number of times the category appears in the data set.
Relative frequency – for a particular category, the fraction or proportion of the observations resulting in the category.
If a table includes relative frequencies, it is sometimes referred to as a relative frequency distribution.
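As a sketch of building a frequency distribution from raw categorical data, the following Python code tallies the 20 violation codes from Example 1.5 and divides each frequency by the number of observations to get the relative frequency. The helper name `frequency_distribution` is hypothetical; `collections.Counter` is from the Python standard library.

```python
from collections import Counter

# Subset of 20 FAA administrative actions from Example 1.5:
# S = Security, M = Maintenance, F = Flight Operations,
# H = Hazardous Materials, O = Other
violations = "S S M H M O S M S S F S O M S M S M S M".split()

def frequency_distribution(observations):
    """Return (category, frequency, relative frequency) rows, most frequent first."""
    counts = Counter(observations)
    n = len(observations)
    return [(cat, freq, freq / n) for cat, freq in counts.most_common()]

print("Category  Frequency  Relative Frequency")
for category, freq, rel in frequency_distribution(violations):
    print(f"{category:<8}  {freq:>9}  {rel:>18.2f}")
```

For these 20 actions the tallies are S 9, M 7, O 2, H 1, and F 1, so Security accounts for 9/20 = 0.45 of the observations and the relative frequencies sum to 1.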
Example 1.8 Motorcycle Helmets: Can You See Those Ears?
Table 1.1 Frequency distribution of helmet use.
Helmet Use Category    Frequency   Relative Frequency
No Helmet                    731   0.430 (731/1700)
Noncompliant Helmet          153   0.090 (153/1700)
Compliant Helmet             816   0.480 (816/1700)
TOTAL                       1700   1.000
The TOTAL frequency, 1700, is the total number of observations. The relative frequencies should sum to 1, but the sum may be slightly off due to rounding.
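The relative frequencies in Table 1.1 come from dividing each frequency by the total number of observations. Written out as a worked calculation:

```latex
\text{relative frequency} = \frac{\text{frequency}}{\text{total number of observations}}

\frac{731}{1700} = 0.430, \qquad \frac{153}{1700} = 0.090, \qquad \frac{816}{1700} = 0.480

0.430 + 0.090 + 0.480 = 1.000
```

Here each division happens to be exact, so no rounding error appears and the relative frequencies sum to exactly 1.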
Bar Charts – a graph of the frequency distribution of categorical data.
When to use: Categorical data.
How to construct:
Draw a horizontal line, and write the category names below the line at regularly spaced intervals.
Draw a vertical line, and label the scale using either frequency or relative frequency.
Place a rectangular bar above each category label. All bars should have the same width, and the height of each bar is determined by the category’s frequency (or relative frequency).
What to look for: Frequently and infrequently occurring categories.
Example 1.9 Revisiting Motorcycle Helmets
Figure 1.5 Bar chart of helmet use.
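Figure 1.5 is an image; as a rough text stand-in, this Python sketch draws a vertical bar chart of the Table 1.1 frequencies following the bar chart construction steps: categories spaced along a horizontal line, a frequency scale on the left, and bars of equal width whose heights track the frequencies. The helper name `text_bar_chart`, the 10-row resolution, and the shortened category labels are assumptions made for a text display, and bar heights are rounded to the nearest row, so the chart is only approximate.

```python
# Frequencies from Table 1.1 (helmet use); labels shortened to fit a text display
categories = ["No Helmet", "Noncompliant", "Compliant"]
frequencies = [731, 153, 816]

def text_bar_chart(labels, counts, rows=10, width=14):
    """Return the lines of a vertical text bar chart.

    The tallest bar spans `rows` rows, so each row represents max(counts) / rows
    observations; bar heights are rounded to the nearest whole row.
    """
    unit = max(counts) / rows
    heights = [round(c / unit) for c in counts]
    lines = []
    for level in range(rows, 0, -1):
        # Every bar has the same width (4 characters); only its height differs.
        cells = ("####" if h >= level else "    " for h in heights)
        bars = "".join(cell.center(width) for cell in cells)
        lines.append(f"{level * unit:5.0f} |" + bars)  # frequency scale on the left
    lines.append(" " * 6 + "+" + "-" * (width * len(labels)))  # horizontal axis
    lines.append(" " * 7 + "".join(label.center(width) for label in labels))
    return lines

print("\n".join(text_bar_chart(categories, frequencies)))
```

The chart makes the frequently occurring categories (no helmet, compliant helmet) and the infrequent one (noncompliant helmet) easy to spot.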
Dotplots for Numerical Data
A dotplot is a simple way to display numerical data when the data set is reasonably small.
When to use: Small numerical data sets.
How to construct: Draw a horizontal line and mark it with an appropriate measurement scale. Locate each value in the data set along the measurement scale and represent it by a dot. If there are two or more observations with the same value, stack the dots vertically.
What to look for: Dotplots convey information about:
A representative or typical value in the data set.
The extent of the spread of the data.
The nature of the distribution of values along the number line.
The presence of unusual values in the data set.
Example 1.10 Graduation Rates for NCAA Division I Schools in CA and TX
From The Chronicle of Higher Education, Aug. 31, 2001.
Reported graduation rates as percentages of full-time freshmen in fall 1993 who earned bachelor’s degrees by Aug
California Texas
64 41 44 31 37 67 21 32 88 35 73 72 68 71 39 63 81 90 82 74 79 12 46 28 66 70 65 25 24 22
Figure 1.6 Minitab dotplot of graduation rates.
Figure 1.7 Minitab dotplot of graduation rates for California and Texas.
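Figures 1.6 and 1.7 are Minitab images. As a rough text stand-in for Figure 1.6, the Python sketch below follows the dotplot construction steps for all 30 graduation rates listed in Example 1.10, pooled together without separating California from Texas. The helper name `text_dotplot` and the tick spacing of 10 are assumptions; using one character column per percentage point only suits integer-valued data.

```python
from collections import Counter

# All 30 graduation rates (%) listed in Example 1.10, pooled as in Figure 1.6
rates = [64, 41, 44, 31, 37, 67, 21, 32, 88, 35, 73, 72, 68, 71, 39,
         63, 81, 90, 82, 74, 79, 12, 46, 28, 66, 70, 65, 25, 24, 22]

def text_dotplot(values, tick_every=10):
    """Return the lines of a text dotplot for integer-valued data.

    Each unit on the measurement scale gets one character column; repeated
    values are stacked vertically, one '*' per observation.
    """
    counts = Counter(values)
    lo = (min(values) // tick_every) * tick_every  # start the scale at a tick mark
    hi = max(values)
    lines = []
    for level in range(max(counts.values()), 0, -1):
        row = "".join("*" if counts[x] >= level else " " for x in range(lo, hi + 1))
        lines.append(row.rstrip())
    # Measurement scale: '+' at each tick mark, '-' elsewhere
    lines.append("".join("+" if x % tick_every == 0 else "-" for x in range(lo, hi + 1)))
    lines.append("".join(str(x).ljust(tick_every) for x in range(lo, hi + 1, tick_every)).rstrip())
    return lines

print("\n".join(text_dotplot(rates)))
```

Because the 30 rates are all different, no dots stack here; a call such as `text_dotplot([1, 1, 2])` shows two stacked dots above the value 1. The plot still shows the spread (12 to 90) and a gap between the mid-40s and low 60s.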