1.2: Statistical Thinking

1 1.2: Statistical Thinking
Basic Practice of Statistics - 3rd Edition Math Fall 2013 Math 1231: Spring 2015 Chapter 1, Part One Course Introduction 1.2: Statistical Thinking 1.3: Types of Data Chapter 1 Chapter 1, Part 1 1

2 Recent Real-World Statistics
74% of “online adults” use a social networking site of some kind. Among year olds, this rises to 89% [Pew Internet Project, Jan. 2014].
U.S. adults living with at least one child under age 6 spent an average of 2 hrs/day providing primary care and 5.4 hrs/day on secondary care [Bureau of Labor Statistics, June 2014].
In 2014, 43% of American adults self-identified as political independents [Gallup, Jan. 2015].

3 Links for Previous Examples
BLS American Time Use Survey:
Pew Social Media User Demographics:
Gallup Political Identification:

4 Some Interesting Questions…
Gallup’s political poll was based on phone interviews conducted during 2014. Were you contacted during this time?
Gallup interviewed “only” 16,479 American adults, out of more than 230 million total!
Question: Is Gallup justified in making such a claim about all American adults, using information from such a small percentage?

5 A Typical Problem
The previous scenarios illustrate a central theme in real-world statistics:
We have a VERY LARGE set of individuals (all American adults, for example).
It is EFFECTIVELY IMPOSSIBLE to gather information from every single individual.
Using information from a relatively small number of individuals, we can draw some conclusions about the VERY LARGE set.

6 Statistical Thinking, Section 1.2

7 *** Basic Terminology ***
Population: The complete set of individuals that we intend to study (Gallup Poll: All American adults, more than 230 million individuals!).
Sample: The set of individuals for which we have obtained actual data. The Sample is a subset of the Population. (Gallup: 16,479 adults, contacted by telephone during 2014).
Data: Specific information about each individual (Gallup: Each individual’s political affiliation). Assume we have data only from those in the sample.

8 Population vs. Sample (not to scale!!)

9 Descriptive / Summary Statistics
Goal: Take a set of data and organize or summarize it in a useful way.
Example: Your current GPA summarizes your academic performance, without needing to see your entire transcript.
Is this a fair/accurate summary? It doesn’t matter; many people will use it anyway.
We will look at several different graphical and numerical summaries over the next few classes.
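As a rough illustration of how a GPA condenses a transcript into one number, here is a minimal sketch. The transcript entries and the 4-point scale without plus/minus grades are assumptions made up for this example, not data from the course.

```python
# Hypothetical transcript: (course, credit hours, letter grade).
transcript = [
    ("Calculus I", 4, "B"),
    ("English Composition", 3, "A"),
    ("Chemistry", 4, "C"),
    ("History", 3, "A"),
]

# Assumed 4-point scale with no plus/minus grades.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def gpa(courses):
    """Return the credit-weighted mean of grade points."""
    total_credits = sum(credits for _, credits, _ in courses)
    total_points = sum(credits * GRADE_POINTS[grade] for _, credits, grade in courses)
    return total_points / total_credits


print(round(gpa(transcript), 2))
```

The single number hides which courses went well and which did not, which is the trade-off every summary statistic makes.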

10 Statistical Inference
Goal: Use (known) data from a sample in order to draw conclusions about the entire population.
Since we don’t have data for the entire population, these conclusions ALWAYS have some degree of uncertainty.
More carefully stated, Gallup concluded this: Based on sample data from 16,479 individuals, we claim that the percentage of American adults who self-identified as “independent” in 2014 was between 42% and 44%. There is a 95% chance this claim is correct.
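To see roughly where a range like “between 42% and 44%” can come from, the sketch below computes a textbook 95% interval for a proportion. It assumes a simple random sample and the normal approximation; the survey's actual methodology is not described on the slide and may differ.

```python
import math

n = 16_479      # sample size from the slide
p_hat = 0.43    # sample proportion identifying as independent
z = 1.96        # approximate critical value for 95% confidence

# Standard error of a sample proportion under simple random sampling (an assumption).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

low, high = p_hat - margin, p_hat + margin
print(f"{low:.1%} to {high:.1%}")
```

Under these assumptions the interval comes out a little narrower than the 42% to 44% quoted on the slide, which is consistent with a published margin rounded to a whole percentage point.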

11 Statistical Inference
Gallup’s conclusion is based on data from a very small percentage, less than 0.01%, of the intended population.
It may surprise you that such a small amount of data can be used to make a conclusion about a large population.
We’ll discuss the underlying methods (and how to properly interpret these results) later in the course.
For now: It is MORE IMPORTANT that the sample data are collected in an “appropriate” way; otherwise our methods will give potentially inaccurate results.
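A small simulation can make the “tiny fraction, useful estimate” point more concrete. This is only a sketch: the true share of 43% is a hypothetical value chosen for the demonstration, and each respondent is modeled as an independent random draw, which is a reasonable stand-in for random sampling from a population this large.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 16_479
population_size = 230_000_000
true_share = 0.43   # hypothetical population value, assumed for the demo

# Fraction of the population that was actually interviewed.
print(f"{sample_size / population_size:.4%}")


def sample_proportion(n, p):
    """Simulate n random respondents and return the share who answer 'independent'."""
    return sum(random.random() < p for _ in range(n)) / n


# Repeat the survey many times to see how much the estimate moves around.
estimates = [sample_proportion(sample_size, true_share) for _ in range(100)]
print(min(estimates), max(estimates))
```

The simulated estimates stay close to the assumed true share even though each one uses well under 0.01% of the population. The simulation says nothing about bias from a poorly chosen sample, which is why the collection method matters more.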

12 How to think about statistical data and results

13 Data: Important Considerations
Context: What do the data represent?
The same numbers can have completely different meanings/interpretations:
    Grade             A  B  C  D  F  W
    No. of Students   3  7  6  5  2  4
    Day               M  T  W  R  F  S
    Pieces of Mail    3  7  6  5  2  4
Which context is more interesting/relevant?

14 Data: Important Considerations
Source: Where do the data come from?
Who gathered the data?
Who summarized or analyzed the data?
Who sponsored or funded the research?
Are those responsible for collecting/analyzing the data reliable?
Is there any incentive to distort results and/or favor a particular type of result?
Ask for examples where these might affect the results.

15 Data: Important Considerations
Sampling Method: What process was used to choose the sample and collect data?
Was sample selection limited to individuals who volunteered to provide data?
Was sample selection limited to individuals who were convenient?
Was data collection based on subjective judgment or ambiguous terminology? Example: Do you spend a lot of time studying?
How/why might these affect the results?

16 Important Considerations
Conclusions: What are the results of the statistical analysis/inference?
What is the intended population? Are the results valid for the entire population?
Can you restate results in a way that can be understood by someone with little or no knowledge of statistical terminology?
Is there a cause-and-effect relationship, or merely a statistical relationship? (“Correlation does not imply causality”; see Chapter 10.)
Most of these issues will be discussed later in the course.

17 Some Other Considerations
Practical Implications: Are the conclusions useful or relevant in a real-world context?
A “Statistically Significant” claim comes from analyzing the data using numerical methods, without any context (see the next slide).
“Practically Significant” means useful or relevant to the real world.
These are not necessarily the same thing!

18 *** Statistical Significance ***
When doing statistical inference, there is always some degree of randomness in how we gather the sample data.
If we wind up with results that are unlikely to occur by random chance, we say the results are statistically significant.
Simple Example: How likely is it that a fair coin would come up heads in 95/100 flips?
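The coin question can be answered by direct counting. The sketch below assumes the question means “at least 95 heads in 100 flips,” since significance is usually judged by how likely a result that extreme or more extreme would be.

```python
from math import comb

flips = 100

# Number of head/tail sequences with 95 or more heads,
# out of 2**100 equally likely sequences for a fair coin.
favorable = sum(comb(flips, k) for k in range(95, flips + 1))
probability = favorable / 2**flips

print(favorable)
print(probability)
```

The probability is on the order of 10^-23, so seeing this result would be strong evidence that the coin is not fair.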

19 Example: Class Attendance
In analyzing grade data among Math 1231 students from previous semesters, the average course grade for students with “many” absences was 15 points (out of 100) less than the average for students with “few or no” absences.
My Claim: Students with “many” absences tend to have lower course grades than students with “few or no” absences.
Is this statistically and/or practically significant?
Is frequent absence the cause of lower grades?

20 Types of Data, Section 1.3

21 *** Parameters vs. Statistics ***
The goal of statistical inference is to use sample data to draw conclusions about some VERY LARGE population.
A parameter is a numerical value describing some aspect of the population.
A statistic is a numerical value describing some aspect of a sample.
The value of a statistic (computed from sample data) can be used to estimate the value of a parameter (almost always unknown).

22 Parameters vs. Statistics
Example: I want to estimate the average height of all students currently in class. I choose four students “at random” and compute the average height for those four.
Average class height: This is a parameter; its value is unknown to us (the population is the entire class).
Average height for the group of four: This is a statistic (the sample consists of these four students).
Question for later: Is it reasonable to claim that the sample average is “close” to the population average, based on our sample?
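The height example can be mimicked in a few lines. This is a sketch with made-up data: the class size of 30 and the simulated heights (in inches) are assumptions for illustration only.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: heights in inches for a class of 30 students.
class_heights = [round(random.gauss(67, 4), 1) for _ in range(30)]

# Parameter: describes the whole population (normally unknown to us).
parameter = statistics.mean(class_heights)

# Statistic: describes a random sample of four students.
sample = random.sample(class_heights, 4)
statistic = statistics.mean(sample)

print(f"parameter (class mean): {parameter:.1f}")
print(f"statistic (sample mean): {statistic:.1f}")
```

Changing the seed changes the sample and therefore the statistic, while the parameter for a given class stays fixed.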

23 Quantitative vs. Categorical
Quantitative data consist of numbers that represent counts or measurements.
All quantitative data is numerical, but not all numerical data is quantitative.
Data with a unit of measurement (seconds, feet, pounds, dollars, etc.) is quantitative.
Numerical data used as a label or range of values (Student ID Number, years) is not quantitative.

24 Examples: Quantitative Data
The University keeps the following quantitative data about each student:
Grade Point Average
Number of Credit Hours Completed
Age
Amount of money owed for tuition
Other examples?

25 Categorical Data
Data that are not quantitative are called categorical.
Non-numerical data must be categorical.
Numerical data that serves to label or identify individuals are categorical (Example: Social Security Number).
A useful guide: Would it make sense to consider an average value? If not, treat the data as categorical.

26 Examples: Categorical Data
The University keeps the following categorical data about each student:
Name
Laker ID Number
Date of Birth
Gender
Residency (“in-state” or “out-of-state”)
Other?

27 Discrete vs. Continuous
Quantitative (number) data can be classified as:
Discrete: Finitely many possible values, or infinitely many values with clearly-defined “next” and “previous” values. Discrete values can be put into a list.
Continuous: Infinitely many values anywhere in a given range/interval, with no holes or gaps.
A useful guide: Is it theoretically possible to make your measurements more accurate/precise? If so, then you probably have continuous data.

28 Examples: Discrete or Continuous?
Number of siblings.
Amount of time it takes to run one mile.
Resting pulse rate (beats per minute).
Distance you live from this building.
Grade point average.
Credit card balance (in dollars/cents).
Note: The answers may depend on how the data are measured and/or used.

29 Levels of Measurement
An alternate way to classify data, based on what can be done to summarize/analyze it.
There is some debate on how many levels are needed; these four are commonly used:
Nominal (qualitative)
Ordinal (ordering is meaningful)
Interval (differences are meaningful)
Ratio (ratios are meaningful)

30 Nominal Level
Consists of names, labels, or well-defined categories. There is no meaningful way to order values (alphabetical is often used).
Colors (Red, Green, Yellow, etc.)
Gender (Female, Male)
Party Affiliation (Democrat, Republican, Other)
State of Residence
Nominal data is always categorical.

31 Ordinal Level
Data can be arranged in some meaningful order, but differences between values cannot be computed or are useless.
Course Grades (A, B, C, D, F)
Competitive Rankings (Gold > Silver > Bronze, but “Gold minus Silver” is useless, even if we represent these as numbers 1, 2, 3).
Ordinal data is often categorical (notable exceptions are IQ and Body Mass Index).

32 Interval Level
Numerical values that can be put in order, and the difference between two values has some useful meaning. However, there is no “natural zero” level and ratios do not have any practical meaning.
Examples:
Temperature (Fahrenheit or Celsius): 15 is colder than 30, but zero degrees does not mean an absence of temperature (unless you use Kelvin).
Calendar Data: Aug. 7th < Aug. 21st, with a difference of 14 days (but the 21st is not “three times” the 7th).
Interval data is the least common of the four levels.
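One way to see why ratios are not meaningful for interval data is to change units and watch the ratio change. The sketch below uses the 15 and 30 degree values from the temperature example, taken here as Celsius (an assumption), and the standard Celsius-to-Fahrenheit conversion.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


cold_c, warm_c = 15.0, 30.0
cold_f, warm_f = c_to_f(cold_c), c_to_f(warm_c)

# Ratios depend on the arbitrary zero point, so they disagree between scales.
ratio_c = warm_c / cold_c
ratio_f = warm_f / cold_f

# Differences only pick up the fixed scale factor 9/5, so they stay comparable.
diff_c = warm_c - cold_c
diff_f = warm_f - cold_f

print(ratio_c, ratio_f)
print(diff_c, diff_f)
```

“Twice as warm” in Celsius is not “twice as warm” in Fahrenheit, while a 15-degree Celsius gap is always a 27-degree Fahrenheit gap.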

33 Ratio Level
Numerical values that can be put in order, the difference between two values has meaning, and there is a natural, non-arbitrary “zero level.”
Ratio data measures “amount of stuff.” The zero level means that “no stuff” is present.
Distance, amount of time, mass/weight, many other physical quantities.
Price, checking account balance, many other monetary quantities.
If “twice as much” or “half as much” make sense, then you have ratio data.
Ratio data is always quantitative.

34 Examples
Classify the following data (about students):
Age
Year of birth
Academic major
Weight
Transfer student? (yes/no)
Currently seated in which row?
SAT score

35 Examples
Answers to the previous slide:
Age: Quantitative, Discrete(?), Ratio
Year of birth: Quantitative(?), Discrete, Interval
Academic major: Categorical, Nominal
Weight: Quantitative, Continuous, Ratio
Transfer student?: Categorical, Nominal
Current row?: Categorical, Ordinal
SAT score: Quantitative, Discrete, Ordinal(?)

