1
Statistics
2
Definitions
Statistics: the science of collecting, summarizing, and analyzing numerical data; equivalently, the mathematics of the collection, organization, and interpretation of numerical data. Statistics makes it possible to estimate the likelihood of events.
Descriptive statistics describe the basic features of the data in a study. They provide simple summaries of the sample and the measures and, together with simple graphical analysis, form the basis of virtually every quantitative analysis of data. Typical summaries are the distribution of the data (frequency), central tendency (mean, median, mode), and dispersion (standard deviation, variance).
Inferential statistics are used to reach conclusions that extend beyond the immediate data. They use patterns in the sample data to draw inferences about the population the sample represents, accounting for randomness.
Thus, we use inferential statistics to generalize from our data to more general conditions (and, in some sense, to predict); we use descriptive statistics simply to describe what is going on in our data. Inference is a vital element of scientific advance, since it provides a data-based prediction of where a theory logically leads. These predictions are then tested in turn, as part of the scientific method, to further support the guiding theory.
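The descriptive summaries named above (mean, median, mode, variance, standard deviation) can be sketched with Python's standard `statistics` module. The data values below are hypothetical and chosen only for illustration; the sample (n − 1) versions of variance and standard deviation are used on the assumption that the data are a sample rather than a whole population.

```python
import statistics

# Hypothetical sample of eight measurements (not from the presentation).
hypothetical_sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(hypothetical_sample)
median = statistics.median(hypothetical_sample)
mode = statistics.mode(hypothetical_sample)

# Dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(hypothetical_sample)
std_dev = statistics.stdev(hypothetical_sample)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("sample variance:", variance)
print("sample standard deviation:", std_dev)
```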
3
- Descriptive statistics: identify patterns; lead to hypothesis generation
- Inferential statistics: distinguish true differences from random variation; allow hypothesis testing
4
Steps in a statistical study
I. Identifying the question
* What is the question? (What are my hypotheses?)
* Is it possible to answer the question with statistics?
* Is the data obtainable? (birth weight, socioeconomic status, drugs, alcohol)
* Is it ethical to obtain such data?
* If not, is there a reasonable substitute?
II. Designing a study
* Identify the population of interest
* Survey
* Observational studies
* Designing an experiment
* EDA (Exploratory Data Analysis): trends, relationships, differences
* Pilot study
III. Collecting data
* Identify variables
* Identify types of variables
* Identify limits of measurement or observation
IV. Analyzing the data
* Use proper procedures and techniques.
* Check the assumptions behind the procedures and techniques.
V. Making conclusions and discussing limitations
* What are the answers to the original hypotheses?
* What are the limitations of the study?
* What conclusions does the study not make?
* What new questions arise from this study?
5
Variables
A variable is a characteristic that can be observed and that varies among the individuals of a given population.
Classes of variables:
a) Qualitative = cannot be expressed with a number (e.g. nationality). Qualitative variables express a qualitative attribute such as hair color, eye color, or religion. They are sometimes referred to as categorical variables. Values of qualitative variables do not imply order; they are simply categories.
b) Quantitative = can be expressed with a number. Quantitative variables are measured in terms of numbers. Examples are height, weight, and shoe size.
Discrete = takes integer values
Continuous = the scale is continuous and not made up of discrete steps
6
Frequency table
A frequency table is a way of summarizing a set of data. It records how often each value (or set of values) of the variable in question occurs. It may be enhanced by adding the percentage of observations that fall into each category.
Absolute frequency = number of individuals or observations in each category.
Relative frequency = proportion or percentage of the observations in each category (the absolute frequency divided by the total number of observations).
A frequency table is used to summarize categorical (nominal and ordinal) data. It may also be used to summarize continuous data once the data set has been divided into sensible groups.
When the data set has more than one categorical variable, the frequency table is sometimes called a contingency table, because the figures found in the rows are contingent upon (dependent upon) those found in the columns.
Example: Suppose a marksman takes thirty shots at a target and records the score of each shot. The frequencies of the different scores can be summarised in a table with the columns:
Score | Frequency (absolute) | Frequency (relative, %)
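As a minimal sketch of how such a table is built, the following Python counts absolute and relative frequencies. The thirty scores are hypothetical stand-ins, since the marksman's actual scores are not reproduced here; `frequency_table` is an illustrative helper name.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, absolute frequency, relative frequency in %)."""
    counts = Counter(values)
    total = len(values)
    return [(v, counts[v], 100.0 * counts[v] / total) for v in sorted(counts)]

# Hypothetical scores for thirty shots (illustration only).
hypothetical_scores = [
    5, 3, 2, 3, 4, 1, 3, 0, 4, 3,
    2, 3, 4, 2, 1, 3, 5, 4, 2, 3,
    1, 3, 4, 2, 0, 3, 5, 4, 2, 1,
]

table = frequency_table(hypothetical_scores)
print("Score  Absolute  Relative (%)")
for score, absolute, relative in table:
    print(f"{score:>5}  {absolute:>8}  {relative:>11.1f}")
```

The absolute frequencies sum to the number of observations and the relative frequencies sum to 100%, which is a quick check on any frequency table.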
7
Contingency table
A contingency table (also referred to as a cross tabulation or cross tab) is often used to record and analyze the relation between two or more categorical variables. It displays the (multivariate) frequency distribution of the variables in a matrix format.
Example layout:
        | Right-handed | Left-handed | TOTALS
Males   |              |             |
Females |              |             |
TOTALS  |              |             |
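A contingency table of this shape can be tallied from individual records, as in the sketch below. The counts are hypothetical (the cell values of the slide's table are not available), and `build_contingency` is an illustrative name.

```python
from collections import Counter

def build_contingency(records, row_labels, col_labels):
    """Cross-tabulate (row, column) records and append row/column totals."""
    counts = Counter(records)
    table = []
    for r in row_labels:
        row = [counts[(r, c)] for c in col_labels]
        table.append(row + [sum(row)])            # row total
    column_totals = [sum(row[j] for row in table) for j in range(len(col_labels) + 1)]
    table.append(column_totals)                   # totals row
    return table

# Hypothetical records: one (sex, handedness) pair per person.
hypothetical_records = (
    [("Males", "Right-handed")] * 43 + [("Males", "Left-handed")] * 9
    + [("Females", "Right-handed")] * 44 + [("Females", "Left-handed")] * 4
)

table = build_contingency(
    hypothetical_records, ["Males", "Females"], ["Right-handed", "Left-handed"]
)
for label, row in zip(["Males", "Females", "TOTALS"], table):
    print(label, row)
```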
8
Hypothesis
A hypothesis is a proposed explanation for an observable phenomenon. For a hypothesis to be put forward as a scientific hypothesis, the scientific method requires that it be testable. Scientists generally base scientific hypotheses on previous observations that cannot be satisfactorily explained with the available scientific theories.
The null hypothesis (H0) typically proposes a general or default position, such as that there is no relationship between two measured phenomena, or that a potential treatment has no effect. It is typically paired with a second hypothesis, the alternative hypothesis (H1), which asserts a particular relationship between the phenomena.
Hypothesis testing works by collecting data and measuring how probable the data are, assuming the null hypothesis is true. If data at least as extreme as those observed would be very improbable under H0 (usually defined as occurring less than 5% of the time), the experimenter rejects the null hypothesis. If the data do not contradict the null hypothesis, no conclusion is made: the null hypothesis could be true or false, and the data give insufficient evidence to decide.
You must choose the level of significance α before the test, usually 0.05 (5%); it determines the critical value of the test statistic.
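The logic of "how probable are the data if H0 is true" can be illustrated with a test that is simpler than the chi-square test covered later: an exact binomial test of a fair coin. This is a stand-in chosen for illustration, not a method from the presentation; the coin-flip counts are hypothetical, and `binomial_two_sided_p` is an illustrative name.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Probability, under H0 (success probability p0), of any outcome
    that is no more likely than the observed count k out of n."""
    probs = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so symmetric outcomes count as ties.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

ALPHA = 0.05  # level of significance chosen before looking at the data

# Hypothetical data: 60 heads in 100 flips; H0: the coin is fair.
p_value = binomial_two_sided_p(60, 100)
decision = "reject H0" if p_value < ALPHA else "do not reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

Note that "do not reject H0" is not the same as "H0 is true": as stated above, the data may simply give insufficient evidence.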
9
Chi-square or χ²-distribution
In probability theory and statistics, the chi-square distribution (also chi-squared or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics, e.g. in hypothesis testing, or in construction of confidence intervals. The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit.
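The definition can be checked by simulation. The sketch below draws sums of k squared standard normal values and compares the sample mean with k, the known mean of a chi-square distribution with k degrees of freedom. The seed, k = 3, and the number of draws are arbitrary choices made for illustration.

```python
import random

def chi_square_draw(k, rng):
    """One draw: the sum of the squares of k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)   # fixed seed so the run is repeatable
k = 3                    # degrees of freedom
draws = [chi_square_draw(k, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
# 7.815 is the tabulated 0.05 critical value for 3 degrees of freedom,
# so roughly 5% of the draws should exceed it.
fraction_above = sum(d > 7.815 for d in draws) / len(draws)

print("sample mean (expected to be close to k):", round(sample_mean, 3))
print("fraction above 7.815 (expected to be close to 0.05):", round(fraction_above, 3))
```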
10
Chi-square test
- A statistical test to determine the probability that an observed deviation from the expected outcome occurs solely by chance.
- It is used to check whether the data from the sampled population agree with the theoretical distribution we suspect to be true, and whether the differences between observed and expected values are real or due to random variation.
χ² = Σ (O − E)² / E, where O = observed, E = expected
Degrees of freedom = n − 1, where n = number of classes. The degrees of freedom are the number of values in the final calculation of a statistic that are free to vary.
Pearson's chi-square (χ²) test is the best-known of several chi-square tests.
Yates correction: Pearson's test approximates the discrete distribution of the observed (counted) frequencies by the continuous chi-square distribution. To reduce the error in this approximation, Frank Yates, an English statistician, suggested a correction for continuity that subtracts 0.5 from the absolute difference between each observed value and its expected value. The effect is to prevent overestimation of statistical significance for small data sets. The corrected formula is used when at least one cell of the table has an expected count smaller than 5.
χ² = Σ (|O − E| − 0.5)² / E, where O = observed, E = expected
Degrees of freedom = n − 1, where n = number of classes
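A minimal sketch of Pearson's statistic with an optional Yates correction follows. The observed and expected counts are hypothetical, `chi_square` is an illustrative name, and clamping the corrected difference at zero is a design choice of this sketch (it keeps the correction from increasing a term when |O − E| is below 0.5), not something stated in the slides.

```python
def chi_square(observed, expected, yates=False):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all classes.
    With yates=True, 0.5 is subtracted from each |O - E| before squaring."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)  # assumption: never correct below zero
        total += diff ** 2 / e
    return total

# Hypothetical two-class example (degrees of freedom = 2 - 1 = 1).
observed = [12, 8]
expected = [10, 10]
plain = chi_square(observed, expected)
corrected = chi_square(observed, expected, yates=True)
print("chi-square:", plain)
print("chi-square with Yates correction:", corrected)
```

The corrected value is never larger than the uncorrected one, which is how the correction guards against overstating significance.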
11
Probability: table of χ² value and P value
[Table of χ² values against P values. Do not reject H0 when P is greater than 0.05; reject H0 when P is 0.05 or smaller.]
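In place of the printed table, the decision rule can be sketched with the standard tabulated critical values of χ² at α = 0.05 for 1 to 5 degrees of freedom. Only this small excerpt of the table is included; `decide` is an illustrative name and the example statistics are hypothetical.

```python
# Tabulated critical values of chi-square at alpha = 0.05, keyed by
# degrees of freedom (excerpt of a standard table).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def decide(chi2, df):
    """Compare a chi-square statistic with the 0.05 critical value."""
    critical = CRITICAL_VALUES_05[df]
    # A statistic above the critical value corresponds to P < 0.05.
    return "reject H0" if chi2 > critical else "do not reject H0"

print(decide(4.32, 1))   # hypothetical statistic with 1 degree of freedom
print(decide(4.32, 3))   # the same statistic with 3 degrees of freedom
```

The same statistic can lead to different decisions depending on the degrees of freedom, which is why the table is indexed by both.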
12
Chi-square test
Data used in a chi-square analysis have to satisfy the following conditions:
1. they are randomly drawn from the population,
2. they are reported as raw frequency counts,
3. the measured variables are independent,
4. the observed frequencies are not too small, and
5. the values of the independent and dependent variables are mutually exclusive.
There are two types of chi-square test that we will use:
* The chi-square test for goodness of fit compares the expected and observed values to determine how well an experimenter's predictions fit the data. It is a non-parametric test used to find out whether the observed values of a given phenomenon differ significantly from the expected values. The term "goodness of fit" refers to comparing the observed sample distribution with the expected probability distribution: the test determines how well a theoretical distribution (such as normal, binomial, or Poisson) fits the empirical distribution.
* The chi-square test of contingency is used to test the hypothesis that the frequencies of occurrence in the various categories of one variable are independent of the frequencies in the second variable.
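The test of contingency can be sketched as follows. The slides do not give the formula for the expected counts, so this sketch uses the standard one (row total × column total / grand total) with (rows − 1) × (columns − 1) degrees of freedom; the 2 × 2 counts are hypothetical and `contingency_chi_square` is an illustrative name.

```python
def contingency_chi_square(table):
    """Chi-square test of independence for a table of raw counts.
    Returns (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Hypothetical counts: rows = males, females; columns = right-, left-handed.
hypothetical_table = [[43, 9], [44, 4]]
expected, chi2, df = contingency_chi_square(hypothetical_table)
print("expected counts:", expected)
print("chi-square:", round(chi2, 3), "with", df, "degree(s) of freedom")
```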
13
Example of Chi square goodness of fit
Calculation of the chi-square goodness of fit of data consisting of the colors of 100 flowers to a hypothesized color ratio of 3:1 (Yy x Yy).
χ² = Σ (O − E)² / E
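A worked version of this calculation might look like the sketch below. The observed counts (84 and 16) are hypothetical, since the slide's counts are not reproduced here; only the total of 100 flowers and the 3:1 ratio come from the example. `goodness_of_fit` is an illustrative name.

```python
def goodness_of_fit(observed, ratio):
    """Chi-square goodness of fit of observed counts to a hypothesized ratio.
    Returns (expected counts, chi-square statistic, degrees of freedom)."""
    n = sum(observed)
    parts = sum(ratio)
    expected = [n * r / parts for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1   # number of classes minus one
    return expected, chi2, df

# Hypothetical observed colors among 100 flowers, tested against 3:1.
hypothetical_observed = [84, 16]
expected, chi2, df = goodness_of_fit(hypothetical_observed, [3, 1])
print("expected:", expected)   # 75 and 25 for a 3:1 ratio of 100
print("chi-square:", chi2, "df:", df)
```

With these hypothetical counts the statistic is (84 − 75)²/75 + (16 − 25)²/25 = 1.08 + 3.24 = 4.32 with 1 degree of freedom, which exceeds the tabulated 0.05 critical value of 3.841.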
14
(YyRr x YyRr)
17
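Expected frequencies for a 9:3:3:1 ratio with a total of 250:
(250/16) × 9 = 140.625
(250/16) × 3 = 46.875
(250/16) × 3 = 46.875
(250/16) × 1 = 15.625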
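The expected frequencies for the dihybrid cross can be reproduced and then compared with observed counts, as sketched below. The total of 250 and the 9:3:3:1 ratio come from the slide; the observed counts and the genotype-class labels are hypothetical and used only to complete the calculation.

```python
TOTAL = 250
RATIO = [9, 3, 3, 1]
CLASSES = ["Y_R_", "Y_rr", "yyR_", "yyrr"]   # illustrative class labels

# Expected count per class: (250 / 16) * ratio part
expected = [TOTAL / sum(RATIO) * part for part in RATIO]

# Hypothetical observed counts (they must sum to 250).
hypothetical_observed = [152, 39, 53, 6]

contributions = [
    (o - e) ** 2 / e for o, e in zip(hypothetical_observed, expected)
]
chi2 = sum(contributions)
df = len(RATIO) - 1

for name, o, e, c in zip(CLASSES, hypothetical_observed, expected, contributions):
    print(f"{name}: observed={o}, expected={e}, contribution={c:.3f}")
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
```

Listing the per-class contributions shows which class drives the statistic; with these hypothetical counts it is the smallest class.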