Introduction: Why statistics? Petter Mostad 2005.08.29.

What best describes your attitude towards statistics? Statistics is… –a way to summarize and describe information: not very interesting in itself –an important tool for research in my field, and something I look forward to learning more about –an important tool for research in my field, but I only learn what I must learn about this –boring

How much do you already know? Definition of mean value, median, standard deviation? Bayes formula? t-tests? p-values? Computing the probability of getting dealt a flush in a game of poker?

Why a course in statistics?

What is research? A distinguishing feature of scientific research is that its conclusions are reproducible by other scientists. Thus, research must –contain information about exactly what has been done –somehow convince the reader that if she repeats what has been done, she will reach the same conclusions

A goal of science: To study causality Ultimately, much of science is concerned with establishing statements like ”If A happens, then B will follow” In other words, one wants to show that B is reproduced every time A happens.

Example: Studying causality through intervention Retrospective studies can show covariation between variables, but not causality. Intervention can be used to argue that changing a certain variable causes another variable to change. To study the effect of an intervention, a control group is needed.

Example: Reproducibility through randomization Assume an experiment is done, with two groups, receiving different ”treatment”: Differences in the result could be caused by differences in the treatments, or by differences between the groups from the start. Randomising the division into groups makes it unlikely that the groups are systematically different from the start
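A minimal Python sketch of randomising the division into two groups. The function name, the participant IDs, and the fixed seed are illustrative assumptions, not part of the lecture.

```python
import random

def randomize_groups(participants, seed=None):
    """Shuffle the participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)      # a seed makes the assignment itself repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical study with 20 participants, identified by the numbers 1..20.
treatment, control = randomize_groups(range(1, 21), seed=1)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who ends up in which group, no property of the participants can systematically influence the division.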

Example: blind, or double-blind studies Differences between the two groups could be caused by people’s knowledge that they are in one group or the other. Differences could also be caused by the experimentalists’ (doctors’) knowledge of who is in which group. Removing the first knowledge gives a blind study; removing the second as well gives a double-blind study.

Quantitative and qualitative research Quantitative: Focus on things that can be measured or counted. Qualitative: Focus on descriptions and examples. Two different scientific traditions. Health economics and administration has elements from both. Both have advantages and disadvantages (which?)

Quantitative research For quantitative research, we have many good tools to ensure reproducibility of conclusions. Statistics is a very important such tool. Statistics used in this way can be called inferential statistics.

Example: Reproducibility through statistics If you repeat a quantitative investigation (a questionnaire, an observation of a social phenomenon, a measurement) you are unlikely to get exactly the same numbers. Statistics can help you to estimate how different results are likely to be. This can tell you which conclusions are likely to be reproducible in a potential repetition of the investigation.
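A small simulation sketch of this idea in Python: the same questionnaire is "repeated" many times and the results vary. The true share of 40% "yes" answers, the 200 respondents, and the 1000 repetitions are assumptions made up for illustration.

```python
import random
import statistics

def simulate_survey(true_share, n_respondents, rng):
    """Return the observed share of 'yes' answers in one simulated survey."""
    yes = sum(rng.random() < true_share for _ in range(n_respondents))
    return yes / n_respondents

rng = random.Random(2005)  # fixed seed so the simulation can be rerun exactly
estimates = [simulate_survey(0.4, 200, rng) for _ in range(1000)]

center = statistics.mean(estimates)
spread = statistics.stdev(estimates)
print(f"average observed share: {center:.3f}")
print(f"typical deviation between repetitions: {spread:.3f}")
```

The spread between repetitions is what statistics lets you estimate from a single investigation, without actually repeating it.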

Descriptive vs. inferential statistics Descriptive statistics: To sum up, present, and visualize data. Inferential statistics: A tool to handle, and to draw (”infer”) reproducible conclusions on the basis of, uncertain information.

Descriptive statistics Goal: To reduce the amount of data, while extracting the ”most important information”. Can be done with single numbers (”summary statistics”), tables, or graphical figures. My next lecture will look at descriptive statistics.

Can descriptive statistics be ”objective”? A person makes choices about: –What to measure –How to measure (for example what questions to ask or what scale to use) –How to present the result Thus: A presentation or publication should always contain information about exactly how results have been obtained

Inferential statistics: Hypothesis test example You throw a die ten times, and get a 1 in seven of these ten throws. You conclude that this is not a fair die. Is the conclusion reproducible? You need to compute what observations are to be expected if the die is a fair one.
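A minimal Python sketch of the computation this slide asks for. It assumes the observation is read as "a 1 in seven of the ten throws", and that "at least as extreme" means seven or more ones; the function name is illustrative.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """P(X >= k_min) when X is binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Probability of seven or more ones in ten throws of a fair die.
p_value = prob_at_least(7, 10, 1 / 6)
print(f"{p_value:.6f}")
```

Under a fair die this probability is roughly 0.0003, so such an observation would very rarely be reproduced if the die were fair.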

Example: probability calculations The disease X has a 1% prevalence in the population. There is a test for X, and –If you are sick, the test is positive in 90% of cases. –If you are not sick, the test is positive in 10% of cases. You have a positive test: What is the probability that you are sick?
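The question can be answered with Bayes formula. A short Python sketch using the numbers from the slide; the function and parameter names are illustrative.

```python
def posterior_sick(prevalence, p_pos_if_sick, p_pos_if_healthy):
    """P(sick | positive test), computed with Bayes formula."""
    # Total probability of a positive test, over sick and healthy people.
    p_positive = p_pos_if_sick * prevalence + p_pos_if_healthy * (1 - prevalence)
    return p_pos_if_sick * prevalence / p_positive

p_sick = posterior_sick(prevalence=0.01, p_pos_if_sick=0.90, p_pos_if_healthy=0.10)
print(f"{p_sick:.3f}")
```

With these numbers, 0.009 / (0.009 + 0.099) = 1/12, so only about 8% of those who test positive are sick: the many healthy people produce most of the positive tests.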

Example: decisions based on uncertain information An oil company wants to produce the maximum amount from an oil field. Available information: –Measurements (seismics) describing approximately the geometry of the rock layers –Information from a couple of test drills –Information from geologists Where should they place the wells, and how should they produce?

The concept of a MODEL What separates inferential statistics from descriptive statistics is the use of a model. A model is a (mathematical) description of the connections between the variables you are interested in. It is a simplification of reality, and so never ”correct” or ”wrong”, but it can be more or less useful.

Statistical (or stochastic) models In statistical models, the variables are predicted with some variation or uncertainty: –The model for force moving a mass: F=ma, is exact. –The model for what the eyes of a fair die will show contains probabilities. We can use the observed data to choose between possible models. The word ”stochastic” is often used when we are focusing more on the model than on the data.

Example Assume a certain proportion of the population carries a specific gene, and you want to know how large it is. The model is simply the unknown proportion p. You select and measure a number of individuals, and use the information to select the right model, i.e., the right p.
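A sketch in Python of how data might be used to pick p. The counts (23 carriers among 200 sampled individuals) are made up, and the approximate 95% interval based on the normal approximation is one common choice, not necessarily the method used later in the course.

```python
from math import sqrt

def estimate_proportion(n_carriers, n_sampled, z=1.96):
    """Point estimate of p with an approximate 95% interval (normal approximation)."""
    p_hat = n_carriers / n_sampled
    std_err = sqrt(p_hat * (1 - p_hat) / n_sampled)
    return p_hat, p_hat - z * std_err, p_hat + z * std_err

# Hypothetical data: 23 carriers among 200 randomly sampled individuals.
p_hat, low, high = estimate_proportion(23, 200)
print(f"estimated p = {p_hat:.3f}, plausible range {low:.3f} to {high:.3f}")
```

The interval expresses that the data do not single out one p, but make some values of p much more plausible than others.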

Example You want to know the height distribution among 30-year-old Norwegian women. You assume, using experience, that a good model is a normal distribution with some expectation and some variance. You use data from a number of women to select a model (i.e. an expectation and a variance), or a range of likely such expectations and variances.
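A minimal Python sketch of selecting a normal model from data, using the sample mean and sample variance as estimates. The eight heights (in cm) are invented for illustration, and `NormalDist` from the standard `statistics` module requires Python 3.8 or later.

```python
import statistics
from statistics import NormalDist

# Hypothetical measured heights in cm (invented data, not real measurements).
heights = [162.0, 168.5, 171.2, 165.4, 159.8, 174.1, 167.3, 170.0]

mean_hat = statistics.mean(heights)      # estimate of the expectation
var_hat = statistics.variance(heights)   # sample variance (divides by n - 1)

# The selected model: a normal distribution with the estimated parameters.
model = NormalDist(mu=mean_hat, sigma=var_hat ** 0.5)
share_above_175 = 1 - model.cdf(175.0)

print(f"expectation ~ {mean_hat:.1f} cm, variance ~ {var_hat:.1f} cm^2")
print(f"share taller than 175 cm under the model: {share_above_175:.3f}")
```

Once selected, the model can answer questions the raw data cannot answer directly, such as the share of women taller than a given height.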

Sampling Often, the model can be a simplifying description of the population we want to study. We investigate the model by sampling from the population. When each individual is selected independently and randomly from the population, we call it (simple) random sampling. Simple random sampling makes it easier to compute what we can conclude about the model from the data.
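A short Python sketch of drawing a simple random sample. The population of 10,000 numbered individuals and the sample size of 50 are assumptions for illustration.

```python
import random

# Hypothetical population: individuals identified by the numbers 0..9999.
population = list(range(10_000))

rng = random.Random(29)  # fixed seed so the draw can be repeated
# random.sample draws without replacement: every individual has the same
# chance of being selected, and nobody is selected twice.
sample = rng.sample(population, k=50)

print(sorted(sample)[:10])
```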

Using the results Selecting some models over others means that you increase your understanding of each variable, and of the relationships between variables. Once a model has been selected, it can be used to forecast or predict the future. Being able to predict the likely results of different decisions can be used to improve decision making.

The goals of this course To enable you to understand, use, and criticise research results produced by others, and in particular to understand and view critically the statistical arguments. To enable you to produce your own valid research results, using statistical tools.

Overview of statistics topics we will look at –Descriptive statistics –Probability theory –Sampling and estimation –Regression –Non-parametrics –Analysis of variance –Decision theory –Some more advanced topics Much information is and will be available at the course web page
