Estimation of authenticity of results of statistical research (part II)

Biostatistics
Commonly, the word "statistics" means arranging data into charts, tables, and graphs, along with computing various descriptive numbers about the data. This is a part of statistics, called descriptive statistics, but it is not the most important part.

Why Do Statistics?
- To extrapolate from the data collected to general conclusions about the larger population from which the data sample was derived.
- To allow general conclusions to be made from limited amounts of data.
- To do this, we must assume that the data are a random sample from an infinitely large population, then analyse this sample and use the results to make inferences about the population.

The most important part
The most important part of statistics is concerned with reasoning in an environment where one does not know, or cannot know, all of the facts needed to reach conclusions with complete certainty. It deals with judgments and decisions in situations of incomplete information. In this introduction we give an overview of statistics along with an outline of the various topics in this course.

Basic criteria of authenticity (representativeness):
- the error of representativeness (m);
- confidence limits;
- the coefficient of authenticity (Student's criterion, t), which assesses the significance of the difference between mean or relative values.

Error of representativeness (m): a measure of the reliability of a mean or relative value. It shows how much the results of a sample study differ from the results that could be obtained from a complete (continuous) study of the general population.
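The error of representativeness is usually computed with the standard textbook formulas m = σ/√n for a mean and m = √(p·q/n), with q = 100 − p, for a relative value in percent. These formulas are not spelled out on the slide, so treat the following as a minimal sketch under that assumption; the function names and data are hypothetical.

```python
import math
import statistics

def error_of_mean(values):
    """Error of representativeness of a mean: m = sigma / sqrt(n).
    sigma is the sample standard deviation (n - 1 in its denominator)."""
    n = len(values)
    sigma = statistics.stdev(values)
    return sigma / math.sqrt(n)

def error_of_relative_value(p_percent, n):
    """Error of representativeness of a relative value in percent:
    m = sqrt(p * q / n), where q = 100 - p."""
    q = 100.0 - p_percent
    return math.sqrt(p_percent * q / n)

# Hypothetical data, for illustration only
heights = [170, 172, 168, 175, 171, 169, 174, 173]  # cm
print(round(error_of_mean(heights), 3))
print(error_of_relative_value(20.0, 400))  # sqrt(20 * 80 / 400) = 2.0 percentage points
```

Both errors shrink as 1/√n, so a sample four times larger halves the error.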

Confidence limits: when the properties of a sample are extended to the general population, the confidence limits show the probable range of fluctuation of the index in the general population, i.e. the minimum and maximum values within which the population value can lie with a given probability.
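A common way to compute confidence limits is estimate ± t·m, where m is the error of representativeness and t is a multiplier chosen for the desired probability (about 2 for 95% in large samples; the exact normal value is 1.96). The sketch below assumes that convention; the blood-pressure numbers are hypothetical.

```python
def confidence_limits(estimate, m, t=2.0):
    """Confidence limits estimate +/- t * m.
    t = 2 corresponds roughly to 95% probability for large samples,
    t = 3 to about 99.7% (normal approximation)."""
    return estimate - t * m, estimate + t * m

# Hypothetical sample: mean systolic pressure 120 mmHg with error m = 1.5 mmHg
low, high = confidence_limits(120.0, 1.5)
print(low, high)  # 117.0 123.0
```

The same function works for a relative value: pass the percentage and its error m.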

Coefficient of authenticity (Student's criterion, t): it assesses the significance of the difference between mean or relative values. Student's criterion evaluates the difference between corresponding indexes in two independent samples.
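For two independent samples, the textbook form of the criterion is usually t = |M1 − M2| / √(m1² + m2²), where M is a mean (or relative value) and m its error. That formula is assumed here rather than taken from the slide; the haemoglobin values are hypothetical.

```python
import math

def t_difference(value1, err1, value2, err2):
    """Student's criterion for two independent samples:
    t = |M1 - M2| / sqrt(m1^2 + m2^2)."""
    return abs(value1 - value2) / math.sqrt(err1 ** 2 + err2 ** 2)

# Hypothetical: mean haemoglobin 130 g/L (m = 2.0) vs 122 g/L (m = 1.5)
t = t_difference(130.0, 2.0, 122.0, 1.5)
print(round(t, 2))  # 8 / 2.5 = 3.2
```

For large samples, t ≥ 2 is commonly read as a significant difference (P < 0.05); for small samples the critical value should be taken from a t table with the appropriate degrees of freedom.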

Diagram: Measuring the Occurrence of Disease (stages: Counting, Comparisons, Inference, Action; topics: cases and populations, measurement, risk, methods (descriptive, analytic), association and causality, generalisability, clinical/health policy, further research)

- Descriptive statistics: concerned with summarising or describing a sample, e.g. the mean or median.
- Inferential statistics: concerned with generalising from a sample to make estimates and inferences about a wider population, e.g. the t-test or chi-square test.

Meaning of P
- P value: the probability of observing a result as extreme as, or more extreme than, the one actually observed, by chance alone (that is, if the null hypothesis were true).
- It lets us decide whether or not to reject the null hypothesis.

P value              Interpretation
P > 0.05             Not significant
P = 0.01 to 0.05     Significant
P = 0.001 to 0.01    Very significant
P < 0.001            Extremely significant
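To make the definition concrete, the sketch below computes a two-sided P value for a standard normal test statistic z, using P(|Z| ≥ |z|) = erfc(|z|/√2), and maps it onto the verbal categories listed above. Treating P = 0.01 as "significant" rather than "very significant" is an assumption, since the listed ranges overlap at their boundaries.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided P value for a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

def describe_p(p):
    """Verbal labels for P values, following the categories listed above."""
    if p > 0.05:
        return "not significant"
    if p >= 0.01:
        return "significant"
    if p >= 0.001:
        return "very significant"
    return "extremely significant"

for z in (1.0, 1.96, 3.0, 4.0):
    p = two_sided_p_from_z(z)
    print(z, round(p, 4), describe_p(p))
```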

Epidemiological Measurements
- Rates, ratios, and proportions
- Incidence rates
- Prevalence rates
- Mortality rates
- Fatality rates
- Infection rates

Ratios
A ratio expresses the relationship between two numbers in the form x:y or x/y.

Ratios
1. The ratio of male to female births in the United States in 1979 was 1,791,000 : 1,703,000, or 1.052:1.
2. Sex ratio:

$$\text{Sex ratio} = \frac{\text{number of live-born males}}{\text{number of live-born females}}$$

Proportions
A proportion is a specific type of ratio in which the numerator is included in the denominator; the resulting value is often expressed as a percentage. For example, the proportion of all births that were male is:

$$\frac{\text{male births}}{\text{male births} + \text{female births}} = \frac{1{,}791{,}000}{1{,}791{,}000 + 1{,}703{,}000} = 51.3\%$$
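The difference between a ratio and a proportion is only whether the numerator is part of the denominator. This short check reproduces both figures quoted above from the 1979 U.S. birth counts.

```python
male_births = 1_791_000
female_births = 1_703_000

# Ratio: the numerator is NOT part of the denominator
sex_ratio = male_births / female_births
# Proportion: the numerator IS part of the denominator
proportion_male = male_births / (male_births + female_births)

print(f"{sex_ratio:.3f}:1")      # 1.052:1
print(f"{proportion_male:.1%}")  # 51.3%
```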

The proportion of male students in the current class is ___ %.

Proportion of Overweight in children from year old, Urumqi, 2003

Rates
A rate measures the occurrence of a particular event in a population during a given time period. The event may be, for example, the development of disease or the occurrence of death.

Rates are defined as follows:

$$\text{Rate} = \frac{\text{number of events in a specified period}}{\text{population at risk of these events in the specified period}} \times K$$

where K = 100 (%), 1000 (‰), …
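The rate formula translates directly into a small helper. The function name and the example numbers below are hypothetical; the multiplier K is passed explicitly so the same count can be reported per 1000 or per 100,000.

```python
def rate(events, population_at_risk, k=1000):
    """Rate = events / population at risk * K (K = 100, 1000, 100000, ...)."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    # Multiply before dividing to keep round numbers exact in floating point
    return events * k / population_at_risk

# Hypothetical: 45 deaths during one year in a population of 30,000
print(rate(45, 30_000))             # 1.5 per 1000
print(rate(45, 30_000, k=100_000))  # 150.0 per 100,000
```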

Five components of a rate
Numerator: the number of people or episodes.

Rate
- The rate is the measure that most clearly expresses the probability or risk of disease in a defined population over a specified period of time.
- In a rate, the numerator is part of the denominator.

What does a rate tell us?
Rates tell us how fast a disease is occurring in a population. A proportion tells us what fraction of the population is affected.

For example, the death rate from cancer in the United States in 1980 was … per 100,000 population, calculated as:

$$\frac{\text{deaths from cancer among U.S. residents in 1980}}{\text{U.S. population in 1980}} \times 100{,}000$$

Incidence Rates
Incidence is defined as the number of new cases of a disease that occur during a specified period of time in a population at risk for developing the disease.

1. Time of onset and the numerator

3. Specification of the denominator
The denominator is the population at risk, usually the average population. We can get this number in two ways:
- (population at the end of last year + population at the end of this year) / 2
- the midyear population
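As a sketch, the incidence rate can be computed with the average population as the denominator. The function names and all numbers below are hypothetical; the midyear population could be passed in place of the computed average.

```python
def average_population(end_of_last_year, end_of_this_year):
    """Average population = (population at end of last year + at end of this year) / 2."""
    return (end_of_last_year + end_of_this_year) / 2

def incidence_rate(new_cases, population_at_risk, k=100_000):
    """New cases during the period per K persons at risk."""
    return new_cases * k / population_at_risk

# Hypothetical figures for one year
avg = average_population(98_000, 102_000)  # 100000.0
print(incidence_rate(250, avg))            # 250.0 new cases per 100,000 per year
```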

Prevalence Rates
Prevalence measures the number of people in a population who have a disease at a given time. There are two types: point prevalence and period prevalence.

Formula:

$$\text{Prevalence} = \frac{\text{number of existing cases of a disease at a point in time}}{\text{total population}} \times K$$

Five points
1. Numerator: existing cases, i.e. everyone currently affected, including both new and old cases. No matter when a person got the disease, if he or she has it at the time of the study, that person is counted in the numerator.

2. Denominator: the total population, not the population at risk.

3. A point in time: in a prevalence survey the survey period should be very short, generally no more than 1 month (e.g. 1 or 2 weeks) for point prevalence.
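The numerator rule ("no matter when the person got the disease") is easiest to see with individual records. This sketch assumes a hypothetical record format of (onset date, recovery date or None) and counts everyone who is ill on the survey day, divided by the total population.

```python
from datetime import date

# Hypothetical records: (onset date, recovery date or None if still ill)
cases = [
    (date(2019, 3, 1), None),                # old case, still ill on survey day
    (date(2023, 5, 10), date(2023, 5, 20)),  # recovered before survey day
    (date(2023, 6, 2), None),                # new case, still ill on survey day
]

def point_prevalence(cases, survey_day, total_population, k=1000):
    """Existing cases on survey_day (old and new alike) per K of the total population."""
    existing = sum(
        1
        for onset, recovery in cases
        if onset <= survey_day and (recovery is None or recovery > survey_day)
    )
    return existing * k / total_population

print(point_prevalence(cases, date(2023, 6, 15), 2000))  # 2 cases -> 1.0 per 1000
```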

The coefficient of variation is a relative measure of variability: it is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage.
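Because it is unit-free, the coefficient of variation lets us compare variability across characteristics measured in different units. A minimal sketch using Python's standard `statistics` module (sample standard deviation) and hypothetical body weights:

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / arithmetic mean * 100%."""
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean * 100

weights = [60, 65, 70, 75, 80]  # hypothetical body weights, kg
print(round(coefficient_of_variation(weights), 1))  # 11.3 (%)
```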

Terms Used to Describe the Quality of Measurements
- Reliability is the variability between subjects divided by the sum of the between-subject variability and the measurement error.
- Validity refers to the extent to which a test or surrogate measures what we think it is measuring.

Measures of Diagnostic Test Accuracy
- Sensitivity is the ability of the test to correctly identify those who have the disease.
- Specificity is the ability of the test to correctly identify those who do not have the disease.
- Predictive values are important for assessing how useful a test will be in the clinical setting at the level of the individual patient. The positive predictive value is the probability of disease in a patient with a positive test. Conversely, the negative predictive value is the probability that the patient does not have the disease if he or she has a negative test result.
- The likelihood ratio indicates how much a given diagnostic test result will raise or lower the odds of having a disease relative to the pre-test odds of disease.
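All of these measures come from the same 2×2 table of test result against true disease status. The sketch below uses hypothetical counts; the last lines check that pre-test odds × LR+ gives the same post-test probability as the positive predictive value when both refer to the same population.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table.
    tp/fn: diseased with positive/negative test; fp/tn: healthy with positive/negative test."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # P(disease | positive test)
        "npv": tn / (tn + fn),  # P(no disease | negative test)
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

m = diagnostic_measures(tp=90, fp=20, fn=10, tn=180)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# Post-test odds = pre-test odds * likelihood ratio
pre_odds = (90 + 10) / (20 + 180)  # 100 diseased : 200 healthy
post_odds = pre_odds * m["lr_positive"]
print(round(post_odds / (1 + post_odds), 3))  # same as the PPV in this population
```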


Expressions Used When Making Inferences About Data
- Confidence intervals: the results of any study sample are an estimate of the true value in the entire population. The true value may actually be greater or less than what is observed.
- Type I error (alpha) is the probability of incorrectly concluding that there is a statistically significant difference in the population when none exists.
- Type II error (beta) is the probability of incorrectly concluding that there is no statistically significant difference in the population when one exists.
- Power (1 − beta) is a measure of the ability of a study to detect a true difference.
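Power depends on alpha, the size of the true difference, the spread of the data, and the sample size. This sketch uses the normal approximation for a two-sided comparison of two means, assuming a known common standard deviation and equal group sizes; the numbers are hypothetical.

```python
from statistics import NormalDist

def approx_power(difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.
    Assumes a known, common SD and n subjects per group; the tiny chance of
    rejecting in the opposite direction is ignored."""
    std_normal = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5          # SE of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return std_normal.cdf(abs(difference) / se - z_crit)

# Hypothetical: true difference of 5 units, SD of 15 units
for n in (20, 50, 100, 200):
    print(n, round(approx_power(5, 15, n), 2))
```

Power rises with sample size and falls with larger spread, which is why a small difference needs a big study.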

Multivariable Regression Methods
- Multiple linear regression is used when the outcome is a continuous variable such as weight. For example, one could estimate the effect of a diet on weight after adjusting for the effect of confounders such as smoking status.
- Logistic regression is used when the outcome is binary, such as cure or no cure. It can be used to estimate the effect of an exposure on a binary outcome after adjusting for confounders.

Survival Analysis
- Kaplan-Meier analysis calculates, at each time a subject has an event, the ratio of subjects surviving (or remaining event-free) to the number of subjects at risk at that time. The ratio is recalculated every time a subject has an event, and the cumulative probability of survival is the product of these ratios up to that time. These probabilities are then plotted as a curve that depicts the probability of survival over time.
- Cox proportional hazards analysis is similar to the logistic regression method described above, with the added advantage that it accounts for the time to a binary event in the outcome variable. Thus, one can account for variation in follow-up time among subjects.
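The Kaplan-Meier calculation can be sketched in a few lines of plain Python. This is an illustrative implementation, not a substitute for a tested survival library; it follows the usual convention that subjects censored at an event time are still counted as at risk at that time. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time of each subject
    events -- 1 if the subject had the event at that time, 0 if censored
    Returns a list of (time, cumulative survival) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Ratio surviving at t, multiplied into the running product
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set
        at_risk -= removed
    return curve

# Hypothetical follow-up data in months; 0 marks a censored subject
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```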

Kaplan-Meier Survival Curves

Why Use Statistics?

Descriptive Statistics
- Identify patterns in the data
- Identify outliers
- Guide the choice of statistical test

Percentage of Specimens Testing Positive for RSV (respiratory syncytial virus)

Descriptive Statistics

Distribution of Course Grades

SAMPLING AND ESTIMATION
Let us take Louis Harris and Associates as an example. It conducts polls on various topics, either face-to-face, by telephone, or over the internet. In one survey on health trends of adult Americans, conducted in 1991, they contacted 1,256 randomly selected adults by phone and asked them questions about diet, stress management, seat-belt use, etc.

One of the questions asked was "Do you try hard to avoid too much fat in your diet?" They reported that 57% of the people responded YES to this question, which was a 2% increase from a similar survey conducted in … . The article stated that the margin of error of the study was plus or minus 3%.

This is an example of an inference made from incomplete information. The group under study in this survey is the collection of adult Americans, which consists of more than 200 million people. This is called the population.

The people or things in a population are called units. If the units are people, they are sometimes called subjects. A characteristic of a unit (such as a person's weight, eye color, or response to a Harris Poll question) is called a variable.

If a variable has only two possible values (such as a response to a YES or NO question, or a person's sex), it is called a dichotomous variable. If a variable assigns one of several categories to each individual (such as a person's blood type or hair color), it is called a categorical variable. And if a variable assigns a number to each individual (such as a person's age, family size, or weight), it is called a quantitative variable.

A number derived from a sample is called a statistic, whereas a number derived from the population is called a parameter.

Parameters are usually denoted by Greek letters, such as π for the population percentage of a dichotomous variable, or μ for the population mean of a quantitative variable. For the Harris study, the sample percentage p = 57% is a statistic. It is not the (unknown) population percentage π, which is the percentage that we would obtain if it were possible to ask the same question of the entire population.

Inferences we make about a population based on facts derived from a sample are uncertain. The statistic p is not the same as the parameter π. In fact, if the study had been repeated, even at about the same time and in the same way, it most likely would have produced a different value of p, whereas π would still be the same. The Harris study acknowledges this variability by mentioning a margin of error of ± 3%.

SIMULATION
Consider a box containing chips or cards, each of which is numbered either 0 or 1. We want to take a sample from this box in order to estimate the percentage of the cards that are numbered with a 1. The population in this case is the box of cards, which we will call the population box. The percentage of cards in the box that are numbered with a 1 is the parameter π.

In the Harris study the parameter π is unknown. Here, however, in order to see how samples behave, we will make our model with a known percentage of cards numbered with a 1, say π = 60%. At the same time we will estimate π, pretending that we do not know its value, by examining 25 cards from the box.

We take a simple random sample with replacement of 25 cards from the box as follows: mix the box of cards; choose one at random; record it; replace it; and then repeat the procedure until we have recorded the numbers on 25 cards. Although survey samples are not generally drawn with replacement, our simulation simplifies the analysis because the box remains unchanged between draws; so, after examining each card, the chance of drawing a card numbered 1 on the following draw is the same as it was for the previous draw, in this case 60%.
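The box-model procedure can be simulated directly. The sketch below assumes a hypothetical box of 10 cards, 6 of them numbered 1 (so π = 60%), draws 25 cards with replacement, and then repeats the whole experiment many times to show that individual sample percentages vary while their average settles near π.

```python
import random

BOX = [1] * 6 + [0] * 4  # hypothetical population box: 60% of cards numbered 1

def draw_sample(box, n, rng):
    """Simple random sample WITH replacement: every draw picks from the full box."""
    return [rng.choice(box) for _ in range(n)]

rng = random.Random(2024)  # fixed seed so the run is reproducible
sample = draw_sample(BOX, 25, rng)
p = 100 * sum(sample) / len(sample)
print(f"sample percentage p = {p:.0f}% (true pi = 60%)")

# Repeat the experiment many times: each repetition gives a different p
many = [100 * sum(draw_sample(BOX, 25, rng)) / 25 for _ in range(2000)]
print(min(many), max(many), sum(many) / len(many))
```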

Reducing Sample Size
- Same results, but using a much smaller sample size (one tenth):

Group        Alive          Dead          Total         % dead
Placebo      58 (69.2%)     26 (30.8%)    84 (100%)     30.8
Treatment    64 (75.3%)     21 (24.7%)    85 (100%)     24.7
Total        122 (72.2%)    47 (27.8%)    169 (100%)

- Reduction in death rate = 6.1% (still the same).
- A chi-square test gives P = …, i.e. … in 100 times this difference in mortality could have happened by chance; therefore the result is not significant.
- Again, the power of a study to find a difference depends heavily on sample size, for binary data as well as for continuous data.
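The chi-square test for a 2×2 table can be computed from the counts alone. This sketch uses Pearson's statistic without a continuity correction (an assumption; the slide does not say which variant was used) and gets the P value for 1 degree of freedom from erfc(√(χ²/2)). With the counts in the table above, P comes out far above 0.05.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
        a  b
        c  d
    and its P value with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df equals P(|Z| >= sqrt(chi2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: placebo, treatment; columns: alive, dead
chi2, p = chi_square_2x2(58, 26, 64, 21)
print(round(chi2, 2), round(p, 2))
```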

ERROR ANALYSIS
On repetition of such an experiment, one will typically obtain a different measurement or observation. So, if the Harris poll were repeated, the new statistic would very likely differ slightly from 57%. Each repetition is called an execution or trial of the experiment.

The root mean square (RMS) of the random sampling errors is a more conservative measure of their typical size than the mean absolute error (MA), in the sense that MA ≤ RMS.
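A quick numeric check of MA ≤ RMS with a hypothetical list of sampling errors (p − π, in percentage points). The inequality holds for any list, because RMS² − MA² equals the variance of the absolute errors, which is never negative.

```python
import math

def mean_absolute(errors):
    """MA: average size of the errors, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_square(errors):
    """RMS: square root of the average squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

errors = [-8, 4, 0, 12, -4]  # hypothetical sampling errors p - pi
print(mean_absolute(errors), round(root_mean_square(errors), 2))  # 5.6 6.93
```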

For a given experiment, the RMS of all possible random sampling errors is called the standard error (SE). For example, whenever we use a random sample of size n and its percentage p to estimate the population percentage π, we have

$$\text{SE} = \sqrt{\frac{\pi\,(100\% - \pi)}{n}}$$
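The standard error of a sample percentage drawn with replacement is usually written SE = √(π(100 − π)/n), in percentage points. The sketch applies it to the box model (π = 60%, n = 25) and to the Harris poll, where π is unknown and the sample value 57% is plugged in instead. Reading the reported ±3% as roughly two standard errors is an assumption; the article's method is not given.

```python
import math

def se_percentage(pi_percent, n):
    """SE of a sample percentage: sqrt(pi * (100 - pi) / n), in percentage points."""
    return math.sqrt(pi_percent * (100 - pi_percent) / n)

# Box model: pi = 60%, n = 25 cards
print(round(se_percentage(60, 25), 1))  # 9.8

# Harris poll: plug in the sample value p = 57% for the unknown pi, n = 1256
se = se_percentage(57, 1256)
print(round(se, 2), round(2 * se, 1))  # about 1.4 and 2.8, close to the reported 3%
```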

Summary
- Size matters: BIGGER IS BETTER
- Spread matters: SMALLER IS BETTER
- Bigger difference: EASIER TO FIND
- Smaller difference: MORE DIFFICULT TO FIND
- To find a small difference you need a big study