1
What makes a first course in statistics?
Peter Holmes RSS Centre for Statistical Education
2
Statistical Literacy
Growing literature
Many terms: numeracy, statistical numeracy, statistical literacy, etc.
The ability to use numbers in practice, particularly in the context of statistics
Should reflect how statistics is used in practice
3
An early definition from UK
Statistical numeracy requires a feel for numbers, an appreciation of levels of accuracy, the making of sensible estimates, a common sense approach to data in supporting an argument, the awareness of the variety of interpretation of figures and a judicious understanding of widely used concepts such as mean and percentages. All these are part of everyday living. This is basic statistical literacy for all, aimed initially at school rather than university courses.
4
What is Statistics? What statisticians do!!
Statistics changes numbers into information. Statistics is the art & science of deciding what are the appropriate data to collect, deciding how to collect them efficiently and then using them to give information, answer questions, draw inferences and make decisions.
5
What is Statistics?
Statistics uses the language & ideas of probability to describe inferences and risk.
Statistics uses samples to get insight into different populations.
Statistics is making decisions when there is uncertainty.
6
On to a first course
After basic statistical literacy, different types of people have different needs (the businessman differs from the experimental scientist).
The strength of statistics is based on:
Well-designed experiments - chance
Small observational studies - bias
Large observational studies - confounding
Being statistically literate means being able to interpret data as is done in practice.
The growth of collecting all the data means that sampling variability becomes less important in practice. Because these are essentially population data, this is not just descriptive statistics but a new and growing form of inferential statistics; hence the problem of confounding becomes important. Milo’s identification of Simpson’s paradox as a form of confounding is an important insight. (See the Chance article on presentation.)
The aim of making everyone statistically literate means that courses will have a lot in common but will also have different emphases for different types of student. In particular, it will colour the statistics courses taken by different students, especially if they are going to have only one or at most two courses. For example, it is clear that business, economics and social science majors ought to have a substantial section on reading and interpreting tables. To be statistically literate, these people will also need to be able to design good, unambiguous tables as well as read them. Experience from 40 years ago, confirmed since, is that a sensible first course for business majors is not a sensible course for medical students (for example); the two may have little in common.
7
The Statistical Process
The nature of the statistical process
Consider the diagram of the real world and the model/theoretical world that I have been using for some time. The essence of this diagram is that the main statistical question is: what does the sample say about the population? This includes estimation of parameters, hypothesis testing, significance tests, etc., but also extends to the effectiveness of treatments. The two ways of sampling are experiments and surveys (observational studies in Milo’s terms). But observational studies include other things, which point to another set of statistical questions along the lines of ‘given the population figures, what causes them to be as they are?’ This can still be a question when the population figures are estimated from samples. In some ways it links with the difference between a statistically significant result and an important result.
8
What does the Sample tell us about the Population?
Well-designed experiments: avoid confounding and bias, use chance
Small surveys: try to match the ‘gold standard’ to eliminate bias; typically estimate proportions and probabilities
Large surveys or observational studies: essentially we know the population
9
Large observational studies
Are growing in number, often connected with legislation or business practice
May be both sample and population, e.g. patients attending a hospital are a (poor) sample of the neighbouring population, the total population of those who attend, and a sample of people who attend hospitals in general, so the data are used for league tables
There are examples where the same set of items can be seen as a sample, as a population, and then again as a different sample. Consider, for example, the records that hospitals keep of all patients admitted. These are a sample of those who could be admitted to the hospital (not well defined and, in important ways, not representative). They are the actual population of the patients so admitted, and statistics on these are things like time series and quality control data. They are also a sample of all people admitted to all hospitals, so they are used for comparisons and league tables and are open to confounding and paradoxes such as Simpson’s paradox.
10
The big ideas in Statistics
bias (in sample, questionnaire, response), causality, census, chance, conditional probability, confidence interval, confounding, correlation, cross-section study, dependent, distribution (population, sample, probability)
11
The big ideas in Statistics
error, estimate, experiment, explanatory variable, fit, forecast, hypothesis test, independent, index number, inference, interaction, longitudinal study, mean, measuring scale, model (modelling), moving average, multivariate, observational data, outlier, percentage
12
The big ideas in Statistics
population, power of test, predict, probability, random, rate, representative, response variable, risk, sample, seasonality, significance, spatial statistics, standard deviation, standard error, standardise, survey, time series, trend, type 1 error, type 2 error, utility, variability, variable, variance
13
Implications for the first course
Too much in the previous slides for one course
What do they know already? From general background; with an AP Statistics background
What do they need to know? For general education purposes; for their particular subjects
14
Implications for the first course
The balance between using and doing statistics
The place of general examples such as those in Statistics: A Guide to the Unknown
Some specific things for specific majors:
Economics: time series, index numbers
Industry: statistical process control
Business: reading tables; understanding confounding, surveys
Psychology: experimental design, significance
Government: spatial and time series
15
An example: Relationships & causes
‘Correlation is not causation’ is essentially a negative message; we can go beyond that.
Just identifying possible confounding factors is again essentially negative for decision making.
Being able to allow for the confounding effect makes it possible to make stronger claims about cause and effect.
Simpson’s paradox is just an extreme case of the effect of a confounding variable.
16
The Hospitals
Note that the City hospital has the better record for good patients and for poor patients, but overall the Rural hospital has the better record. The comparison is confounded by the proportion of poor patients.
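The reversal described on this slide can be reproduced with a little arithmetic. The sketch below uses hypothetical patient and death counts (the slide's own figures are not in this transcript), chosen only so that City treats 90% poor patients and Rural 30%, matching the proportions quoted later in the talk. The names `counts`, `stratum_rates` and `overall_rate` are illustrative.

```python
# Hypothetical counts, NOT the presenter's data: (patients, deaths)
# for each hospital and each patient group ("good" / "poor").
counts = {
    "City":  {"good": (100, 1),  "poor": (900, 54)},
    "Rural": {"good": (700, 14), "poor": (300, 21)},
}


def stratum_rates(hospital):
    """Death rate within each patient group for one hospital."""
    return {
        group: deaths / patients
        for group, (patients, deaths) in counts[hospital].items()
    }


def overall_rate(hospital):
    """Crude death rate: all deaths divided by all patients."""
    patients = sum(p for p, _ in counts[hospital].values())
    deaths = sum(d for _, d in counts[hospital].values())
    return deaths / patients


if __name__ == "__main__":
    for hospital in counts:
        by_group = ", ".join(
            f"{group} {rate:.1%}"
            for group, rate in stratum_rates(hospital).items()
        )
        print(f"{hospital}: {by_group}, overall {overall_rate(hospital):.1%}")
```

With these assumed counts City has the lower death rate in both groups, yet the higher crude rate, because nine in ten of its patients fall in the high-risk group.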
17
The Hospitals
18
The Hospitals
At every proportion of poor patients, the Rural Hospital had a 1% higher death rate than the City Hospital.
The raw totals compare 90% poor in City with 30% poor in Rural.
Overall there were 60% poor, so standardize to that proportion and compare there.
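The standardization step can be sketched as a weighted average. The group death rates below are hypothetical (the slide's figures are not in this transcript) and assume that "1% higher" means one percentage point; only the 90%, 30% and 60% shares of poor patients come from the slide. The names `rates`, `poor_share` and `mixed_rate` are illustrative.

```python
# Hypothetical death rates per patient group (fractions), NOT the
# presenter's data. Rural is one percentage point higher in each group.
rates = {
    "City":  {"good": 0.01, "poor": 0.06},
    "Rural": {"good": 0.02, "poor": 0.07},
}

# Shares of poor patients quoted on the slide.
poor_share = {"City": 0.90, "Rural": 0.30}
STANDARD_POOR_SHARE = 0.60  # overall share, used as the common standard


def mixed_rate(hospital, share_poor):
    """Death rate the hospital would show with the given share of poor patients."""
    r = rates[hospital]
    return (1 - share_poor) * r["good"] + share_poor * r["poor"]


# Crude rates use each hospital's own patient mix.
crude = {h: mixed_rate(h, poor_share[h]) for h in rates}

# Standardized rates apply the same 60% mix to both hospitals.
standardized = {h: mixed_rate(h, STANDARD_POOR_SHARE) for h in rates}

if __name__ == "__main__":
    for h in rates:
        print(f"{h}: crude {crude[h]:.1%}, standardized {standardized[h]:.1%}")
```

Under these assumptions the crude rates favour Rural (3.5% against 5.5%), while at the common 60% mix City comes out one percentage point lower (4.0% against 5.0%), which is the like-for-like comparison the slide asks for.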
19
The End
21
Questions about the Population
With good sampling or large observational studies, it is not the variability of the estimates that is of prime interest.
What causes the data to be as they are?
What are the underlying mechanisms?
The descriptive/inferential divide in statistics is not particularly helpful in this context. What was always true, but now has growing importance, is that we start to look at what mechanism might underlie the process that leads to the data being as they are. Variability in estimates may not be particularly important. We start looking for causes. Chance may come in here, but in a different way from the classical uses. Much data collection is essentially population data.
22
Two Major areas for Statistical Literacy
Interpreting the data
Identifying relationships and causes
A third major area is being able to make decisions based on data.
23
Interpreting the data
Large studies typically give outputs in tables, so there is a need to read tables.
The variety of types of table is wide.
A key to reading them is the ability to recognise part/whole relationships.
Other aspects of tables include identifying trends, the effect of changing definitions, and making them readable in the first place.
Can refer to own small survey of school leavers from 20 years ago: using and reading tables were among the statistical things most often done.
Ehrenberg’s principles from 20 years ago are also still relevant.
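One way to make the part/whole point concrete is to compute two different percentages from the same cell of a two-way table. The counts below are hypothetical, and the names `table`, `row_percent` and `col_percent` are illustrative; the sketch only shows that the answer depends on which total is taken as the whole.

```python
# Hypothetical two-way table of patient counts, NOT data from the talk:
# rows are hospitals, columns are patient groups.
table = {
    "City":  {"good": 100, "poor": 900},
    "Rural": {"good": 700, "poor": 300},
}


def row_percent(row, col):
    """Percentage of the row total that falls in this cell."""
    whole = sum(table[row].values())
    return 100 * table[row][col] / whole


def col_percent(row, col):
    """Percentage of the column total that falls in this cell."""
    whole = sum(cells[col] for cells in table.values())
    return 100 * table[row][col] / whole


if __name__ == "__main__":
    # Same cell (City, poor), two different wholes.
    print(f"{row_percent('City', 'poor'):.0f}% of City patients are poor")
    print(f"{col_percent('City', 'poor'):.0f}% of poor patients are at City")
```

Here the same 900 patients are 90% of City's patients but 75% of all poor patients, so a reader has to identify the whole before interpreting the percentage.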
24
Identifying Relationships and Causes
‘Correlation is not causation’ is essentially a negative message; we can go beyond that.
New approaches to the definition of cause are based on conditional probabilities. These only harden up what has been done previously, e.g. with ‘smoking causes lung cancer’ and the principles used by Cornfield.
25
Confounding Variables
A major problem with all observational studies.
Just identifying possible confounding factors is again essentially negative for decision making.
Being able to allow for the confounding effect makes it possible to make stronger claims about cause and effect.
Simpson’s paradox is just an extreme case of the effect of a confounding variable.
26
Conclusions (1)
The Augsburg course has a different emphasis from many other courses aiming to establish statistical literacy.
It better reflects the balance in the amount of data that comes as part of everyday life from the three types of statistical investigation: experiments, small observational studies and large observational studies.
Think about what you might say about the ASA recommendations for a first course in statistics, or a statistics minor.
27
Conclusions (2)
It includes important material based on what is used, and contains genuine statistical insight.
At first sight it seems to include little on chance. In fact, a lot of the proportional reasoning and a lot of the inference is essentially probabilistic reasoning in different words.
28
Conclusions (3)
In its approach and in what it puts together it is unique: it draws on existing work in different places and adds some insights of its own.
Because of this, it should be continually refined in the light of experience.
Its emphasis is much more in line with the statistical literacy needed to read the news, by those in business, commerce or management, and by policy makers.
29
The End