Organization of statistical investigation. Medical Statistics Commonly the word statistics means the arranging of data into charts, tables, and graphs.


Organization of statistical investigation

Medical Statistics
Commonly the word statistics means the arranging of data into charts, tables, and graphs, along with the computation of various descriptive numbers about the data. This is a part of statistics, called descriptive statistics, but it is not the most important part.

Statistical Analysis in a Simple Experiment
1. Define the population of interest.
2. Randomly select a sample of subjects to study.
3. Half the subjects receive one treatment and the other half another treatment (usually placebo).
4. Measure baseline variables in each group.
5. Use statistical techniques to make inferences about the distribution of the variables in the general population and about the effect of the treatment.

The most important part
The most important part is concerned with reasoning in an environment where one doesn’t know, or can’t know, all of the facts needed to reach conclusions with complete certainty. One deals with judgments and decisions in situations of incomplete information. In this introduction we will give an overview of statistics along with an outline of the various topics in this course.

The stages of statistical investigation
1st stage: drawing up the program and plan of the investigation
2nd stage: collection of material
3rd stage: processing of material
4th stage: analysis of material, conclusions, proposals
5th stage: putting the results into practice

Survival Analysis
- Kaplan-Meier analysis measures the ratio of surviving subjects (or those without an event) divided by the total number of subjects at risk for the event. Every time a subject has an event, the ratio is recalculated. These ratios are then used to generate a curve to graphically depict the probability of survival.
- Cox proportional hazards analysis is similar to the logistic regression method described above, with the added advantage that it accounts for time to a binary event in the outcome variable. Thus, one can account for variation in follow-up time among subjects.
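The recalculation of the survival ratio at each event can be sketched in a few lines of Python. This is a minimal illustration of the product-limit idea under simple assumptions, not a substitute for a statistics package; the function name `kaplan_meier` and the five follow-up records are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    events[i] is 1 if subject i had the event at times[i], and 0 if the
    subject was censored (left follow-up without the event) at that time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)   # subjects still under observation
    survival = 1.0         # running product of the per-event ratios
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        had_event = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            had_event += pairs[i][1]
            leaving += 1
            i += 1
        if had_event > 0:
            # Ratio of survivors to subjects at risk, recalculated at each event.
            survival *= (at_risk - had_event) / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t}: estimated survival = {s:.2f}")
```

Censored subjects reduce the number at risk without lowering the curve, which is how the method accommodates unequal follow-up times.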

Kaplan-Meier Survival Curves

Why Use Statistics?

Percentage of Specimens Testing Positive for RSV (respiratory syncytial virus)

Descriptive Statistics

Distribution of Course Grades

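The Normal Distribution
- Mean = median = mode
- Skew is zero
- About 68% of values fall within 1 SD of the mean
- About 95% of values fall within 2 SDs of the mean
[Figure: normal curve with the mean, median, and mode at the center and 1 SD and 2 SD intervals marked]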
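The 68% and 95% figures can be checked numerically. The sketch below uses the standard relationship between the normal distribution and the error function; the helper name `fraction_within` is hypothetical.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


print(f"within 1 SD: {fraction_within(1):.4f}")  # about 0.6827
print(f"within 2 SD: {fraction_within(2):.4f}")  # about 0.9545
```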

SAMPLING AND ESTIMATION
One of the questions asked was “Do you try hard to avoid too much fat in your diet?” They reported that 57% of the people responded YES to this question, which was a 2% increase from a similar survey conducted earlier. The article stated that the margin of error of the study was plus or minus 3%.

Measures of Association

Measures Of Diagnostic Test Accuracy
- Sensitivity is defined as the ability of the test to identify correctly those who have the disease.
- Specificity is defined as the ability of the test to identify correctly those who do not have the disease.
- Predictive values are important for assessing how useful a test will be in the clinical setting at the individual patient level. The positive predictive value is the probability of disease in a patient with a positive test. Conversely, the negative predictive value is the probability that a patient with a negative test result does not have the disease.
- The likelihood ratio indicates how much a given diagnostic test result will raise or lower the odds of having a disease relative to the prior probability of disease.
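These measures all follow from the four cells of a 2×2 table of test result against disease status. The sketch below is illustrative only: the function name `diagnostic_measures` and the counts are hypothetical, and it assumes none of the denominators is zero.

```python
def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy measures from true/false positives and negatives."""
    sensitivity = tp / (tp + fn)   # diseased subjects correctly identified
    specificity = tn / (tn + fp)   # healthy subjects correctly identified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # P(disease | positive test)
        "npv": tn / (tn + fn),     # P(no disease | negative test)
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical study: 100 diseased and 900 healthy subjects.
m = diagnostic_measures(tp=90, fp=30, fn=10, tn=870)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# Likelihood ratios act on odds: post-test odds = pre-test odds x LR.
prevalence = 100 / 1000
pre_test_odds = prevalence / (1 - prevalence)
post_test_odds = pre_test_odds * m["lr_positive"]
post_test_probability = post_test_odds / (1 + post_test_odds)
print(f"post-test probability after a positive test: {post_test_probability:.3f}")
```

When the pre-test probability equals the prevalence in the same table, the post-test probability after a positive result matches the positive predictive value.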

Measures Of Diagnostic Test Accuracy

Expressions Used When Making Inferences About Data
- Confidence intervals: The results of any study sample are an estimate of the true value in the entire population. The true value may actually be greater or less than what is observed. A confidence interval gives a range of values that is likely to contain the true value.
- Type I error (alpha) is the probability of incorrectly concluding there is a statistically significant difference in the population when none exists.
- Type II error (beta) is the probability of incorrectly concluding that there is no statistically significant difference in a population when one exists.
- Power is a measure of the ability of a study to detect a true difference. It equals 1 − beta.

This is an example of an inference made from incomplete information. The group under study in this survey is the collection of adult Americans, which consists of more than 200 million people. This is called the population.

Group properties of a statistical totality (population):
- Distribution of the characteristic (criterion: relative values)
- Average level of the index (criteria: Mo, mode; Me, median; M, arithmetic mean)
- Variability of the characteristic (criteria: lim, limits; am, amplitude; σ, standard deviation)
- Representativeness (criteria: m_M, error of the mean; m_%, error of relative values)
- Correlation between characteristics (criterion: r_xy, correlation coefficient)
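Most of these criteria can be computed with Python's standard `statistics` module. The sketch below is a rough illustration: the data values are hypothetical, σ is taken as the sample standard deviation (an assumption, since the slide does not say which form is meant), and the errors m_M and m_% use the usual formulas σ/√n and √(p(100 − p)/n).

```python
import math
import statistics

# Hypothetical measurements for one characteristic.
values = [4, 5, 5, 6, 7, 5, 8]
n = len(values)

mode = statistics.mode(values)        # Mo
median = statistics.median(values)    # Me
mean = statistics.mean(values)        # M, arithmetic mean
limits = (min(values), max(values))   # lim
amplitude = limits[1] - limits[0]     # am
sigma = statistics.stdev(values)      # sample standard deviation (assumed form)
error_of_mean = sigma / math.sqrt(n)  # m_M


def error_of_percentage(p: float, n: int) -> float:
    """m_%: error of a relative value p, with p given in percent."""
    return math.sqrt(p * (100 - p) / n)


print(f"Mo = {mode}, Me = {median}, M = {mean:.2f}")
print(f"lim = {limits}, am = {amplitude}, sigma = {sigma:.2f}, m_M = {error_of_mean:.2f}")
print(f"m_% for p = 60%, n = 25: {error_of_percentage(60, 25):.1f}%")
```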

If every individual of this group were to be queried, the survey would be called a census. Yet of the millions in the population, the Harris survey examined only 1,256 people. Such a subset of the population is called a sample.

We shall see that, if done carefully, 1,256 people are sufficient to make reasonable estimates of the opinion of all adult Americans. Samuel Johnson was aware that there is useful information in a sample. He said that you don’t have to eat the whole ox to know that the meat is tough.

The people or things in a population are called units. If the units are people, they are sometimes called subjects. A characteristic of a unit (such as a person’s weight, eye color, or the response to a Harris Poll question) is called a variable.

If a variable has only two possible values (such as a response to a YES or NO question, or a person’s sex) it is called a dichotomous variable. If a variable assigns one of several categories to each individual (such as a person’s blood type or hair color) it is called a categorical variable. And if a variable assigns a number to each individual (such as a person’s age, family size, or weight), it is called a quantitative variable.

A number derived from a sample is called a statistic, whereas a number derived from the population is called a parameter.

Parameters are usually denoted by Greek letters, such as π, for the population percentage of a dichotomous variable, or μ, for the population mean of a quantitative variable. For the Harris study the sample percentage p = 57% is a statistic. It is not the (unknown) population percentage π, which is the percentage that we would obtain if it were possible to ask the same question of the entire population.

Inferences we make about a population based on facts derived from a sample are uncertain. The statistic p is not the same as the parameter π. In fact, if the study had been repeated, even if it had been done at about the same time and in the same way, it most likely would have produced a different value of p, whereas π would still be the same. The Harris study acknowledges this variability by mentioning a margin of error of ±3%.
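A rough check shows that a margin of about ±3% is plausible for a sample of 1,256 people. The sketch below assumes a simple random sample and a 95% confidence level (multiplier 1.96); neither assumption is stated in the text, and the function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion p (as a fraction).

    z = 1.96 corresponds to a 95% confidence level (an assumption here).
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(p=0.57, n=1256)
print(f"margin of error: about {moe:.1%}")  # a little under 3%
```

The result is a little under 3 percentage points, in line with the reported ±3%.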

SIMULATION
Consider a box containing chips or cards, each of which is numbered either 0 or 1. We want to take a sample from this box in order to estimate the percentage of the cards that are numbered with a 1. The population in this case is the box of cards, which we will call the population box. The percentage of cards in the box that are numbered with a 1 is the parameter π.

In the Harris study the parameter π is unknown. Here, however, in order to see how samples behave, we will make our model with a known percentage of cards numbered with a 1, say π = 60%. At the same time we will estimate π, pretending that we don’t know its value, by examining 25 cards in the box.

We take a simple random sample with replacement of 25 cards from the box as follows. Mix the box of cards; choose one at random; record it; replace it; and then repeat the procedure until we have recorded the numbers on 25 cards. Although survey samples are not generally drawn with replacement, our simulation simplifies the analysis because the box remains unchanged between draws; so, after examining each card, the chance of drawing a card numbered 1 on the following draw is the same as it was for the previous draw, in this case a 60% chance.
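The drawing procedure can be mimicked with Python's `random` module. This is only a sketch: the box of 100 cards (60 ones and 40 zeros), the seed, and the function name `draw_sample` are assumptions chosen for illustration, and the printed numbers are one possible outcome, not the results from the original slides.

```python
import random


def draw_sample(box: list, n: int, rng: random.Random) -> list:
    """Draw n cards with replacement: every draw sees the full, unchanged box."""
    return [rng.choice(box) for _ in range(n)]


# Hypothetical population box with pi = 60% of cards numbered 1.
box = [1] * 60 + [0] * 40
rng = random.Random(2024)  # fixed seed so the run is repeatable

sample = draw_sample(box, 25, rng)
p = sum(sample) / len(sample)

# Show the 25 draws in 5 rows of 5, then the sample percentage.
for row in range(5):
    print(sample[5 * row: 5 * row + 5])
print(f"sample percentage p = {p:.0%}")
```

Because `rng.choice` never removes a card, the chance of drawing a 1 stays at 60% on every draw, matching the with-replacement scheme.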

Let’s say that after drawing the 25 cards this way, we obtain the following results, recorded in 5 rows of 5 numbers: [table of the 25 drawn numbers not reproduced in the transcript]

ERROR ANALYSIS
An experiment is a procedure which results in a measurement or observation. The Harris poll is an experiment which resulted in the measurement (statistic) of 57%. An experiment whose outcome depends upon chance is called a random experiment.

On repetition of such an experiment one will typically obtain a different measurement or observation. So, if the Harris poll were to be repeated, the new statistic would very likely differ slightly from 57%. Each repetition is called an execution or trial of the experiment.

Suppose we made three more series of draws, and the results were + 16%, + 0%, and + 12%. The random sampling errors of the four simulations would then average out to: [value not reproduced in the transcript]

Note that the cancellation of the positive and negative random errors results in a small average. In fact, with more trials, the average of the random sampling errors tends to zero.

So in order to measure a “typical size” of a random sampling error, we have to ignore the signs. We could just take the mean of the absolute values (MA) of the random sampling errors. For the four random sampling errors above, the MA turns out to be: [value not reproduced in the transcript]

The MA is difficult to deal with theoretically because the absolute value function is not differentiable at 0. So in statistics, and error analysis in general, the root mean square (RMS) of the random sampling errors is generally used. For the four random sampling errors above, the RMS is: [value not reproduced in the transcript]

The RMS is a more conservative measure of the typical size of the random sampling errors in the sense that MA ≤ RMS.
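The three summaries of the errors (plain average, MA, and RMS) can be compared side by side. In the sketch below the four error values are hypothetical, chosen only for illustration, and the function names are likewise assumptions.

```python
import math


def mean_error(errors: list) -> float:
    """Plain average: positive and negative errors cancel."""
    return sum(errors) / len(errors)


def mean_absolute(errors: list) -> float:
    """MA: average of the absolute values, so signs are ignored."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square(errors: list) -> float:
    """RMS: square each error, average the squares, take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical random sampling errors, in percentage points.
errors = [4.0, -16.0, 0.0, 12.0]

print(f"average = {mean_error(errors):.1f}")        # 0.0, the signs cancel
print(f"MA      = {mean_absolute(errors):.1f}")     # 8.0
print(f"RMS     = {root_mean_square(errors):.1f}")  # about 10.2
```

Squaring gives large errors more weight than small ones, which is why the RMS is never smaller than the MA.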

For a given experiment the RMS of all possible random sampling errors is called the standard error (SE). For example, whenever we use a random sample of size n and its percentage p to estimate the population percentage π, we have SE = √(π(100% − π)/n).
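For a sample percentage, the standard error is given by the standard formula SE = √(π(1 − π)/n), with π written as a fraction. The sketch below compares that formula with the RMS of simulated sampling errors for the box model with π = 60% and n = 25. The number of trials, the seed, and the function names are assumptions made for illustration.

```python
import math
import random


def sample_fraction(pi: float, n: int, rng: random.Random) -> float:
    """Fraction of 1s in n draws with replacement from a box with fraction pi of 1s."""
    return sum(1 for _ in range(n) if rng.random() < pi) / n


def rms_sampling_error(pi: float, n: int, trials: int, rng: random.Random) -> float:
    """RMS of the random sampling errors (p - pi) over many repeated trials."""
    total = 0.0
    for _ in range(trials):
        error = sample_fraction(pi, n, rng) - pi
        total += error * error
    return math.sqrt(total / trials)


def standard_error(pi: float, n: int) -> float:
    """Theoretical standard error of a sample fraction."""
    return math.sqrt(pi * (1 - pi) / n)


rng = random.Random(1)  # fixed seed so the run is repeatable
simulated = rms_sampling_error(0.60, 25, 20000, rng)
theoretical = standard_error(0.60, 25)
print(f"simulated RMS error: {simulated:.3f}")
print(f"theoretical SE:      {theoretical:.3f}")  # about 0.098, i.e. 9.8%
```

With many trials the simulated RMS should settle close to the theoretical value, which illustrates the definition of the SE as the RMS of all possible sampling errors.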