Doc.RNDr.Iveta Bedáňová, Ph.D.


Biostatistics
www.vfu.cz/statistics
Doc. RNDr. Iveta Bedáňová, Ph.D., bedanovai@vfu.cz

Biostatistics = statistics applied to biological problems. Every individual is unique, so the data obtained can be highly variable (e.g. because of genetic variability) and need specific (i.e. statistical) methods for their evaluation. Statistical methods take the great variability of biological data into account, evaluate the data, and support correct inferences about the biological objects studied. Use: in research, to design experiments and evaluate their results.

Types of Biological Data (Variables)
- Data on a Nominal Scale (Categorical Data): classified by some quality, e.g. with 2 possibilities, present or not present (disease, anomaly, death, vaccination, ...).
- Data on an Ordinal Scale (Rank Data): measurements arranged in order on a subjective scale (classification grades, points in competitions).
- Data on a Numerical Scale: exact numeric values obtained by objective measurement with a device (body temperature, weight, length, volume, etc.).

Formal viewpoint:
- Continuous Data: variables that can take any conceivable value within an observed range (height, length, weight, temperature).
- Discrete (discontinuous) Data: variables that can take only certain values, typically integers (number of animals, patients, eggs, cells, etc.).
Numerical- and ordinal-scale data may be continuous or discrete. Nominal-scale data are discrete by their nature.

Statistical Sets (groups of individuals: animals, plants, cells, etc.)
Population (Universe), N = ∞ ("endless" number of members):
- "all items" that could show the studied variable
- is often very large (e.g. cattle in Europe, dogs in the Czech Republic)
Sample (Subset), n (number of members):
- a definite number of individuals from the population (inaccurate in comparison with the whole population)
- a "representative" subset of the population (to reach the most valid conclusions about the population):
  • random sample (no subjective choice)
  • appropriate sample size
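A random sample with no subjective choice can be sketched in Python as follows. This is only an illustration: the population of 1000 animal ID numbers, the sample size of 20, and the fixed seed are assumptions, not values from the slides.

```python
import random

# Hypothetical population: ID numbers of 1000 animals (not from the slides)
population = list(range(1, 1001))

n = 20  # assumed sample size

# Fixed seed only so that the sketch is reproducible
rng = random.Random(42)

# Draw n distinct individuals; each one has the same chance of being selected
sample = rng.sample(population, n)

print(sorted(sample))
```

Sampling without replacement, as here, guarantees that no individual appears in the sample twice.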

Random Variable - Frequency Distribution (Discrete Data: Bar Graph)
Discrete data, number of puppies in a litter: 2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8
[Figure: bar graph; x-axis: number of pups (0 to 8), y-axis: frequency (1 to 3)]
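The frequency distribution of the litter sizes listed on this slide can be tabulated with a short Python sketch. The text bar graph is just a stand-in for the plotted figure.

```python
from collections import Counter

# Litter sizes from the slide (number of puppies per litter)
litter_sizes = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# Count how often each value occurs
frequency = Counter(litter_sizes)

# Print a simple text bar graph for x = 0..8
for value in range(0, 9):
    print(f"{value}: {'#' * frequency[value]}")
```

The longest bar belongs to 5 puppies, which occurs three times.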

Frequency Distribution - Continuous Data (Histogram)
We create classes = intervals of equal width. All data in an interval get the same value = the midpoint of the class. The number of items (individuals) in the interval = the frequency of the class.
[Figure: histogram with frequency polygon (specific to 1 sample); x-axis: weight, y-axis: frequency; class midpoints marked]
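Grouping continuous data into classes might look like the sketch below. The weights, the class width of 5 g, and the lower bound of 30 g are all hypothetical values chosen for illustration.

```python
# Hypothetical body weights in grams (not from the slides)
weights = [31.2, 33.5, 34.1, 36.8, 37.0, 38.4, 39.9, 41.3, 42.7, 44.6]

class_width = 5   # assumed width of each class
lower_bound = 30  # assumed lower limit of the first class

# Map each class midpoint to the number of values falling in that class
histogram = {}
for w in weights:
    index = int((w - lower_bound) // class_width)
    midpoint = lower_bound + index * class_width + class_width / 2
    histogram[midpoint] = histogram.get(midpoint, 0) + 1

for midpoint in sorted(histogram):
    print(f"midpoint {midpoint} g: frequency {histogram[midpoint]}")
```

Every value in a class is represented by the class midpoint, so some detail is lost in exchange for a readable picture of the distribution.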

Frequency (Probability) Distribution
P(x) = probability (proportion of cases).
[Figure: empirical curves (samples) and theoretical curve (population); x-axis: weight, y-axis: P(x)]
Empirical curves for different samples from one population lie along a single continuous theoretical curve, which describes the probability distribution of the variable in the population.

Shapes of Probability Distributions
a) Normal (Gaussian): symmetric, bell-shaped
b) Nonnormal ("unknown"): asymmetric, extreme, irregular

Quantiles, Proportions of Distribution
For every distribution, we can define measures (quantiles) that divide a set of ordered data into 2 parts (portions):
- values smaller than the quantile
- values larger than the quantile
The 50% quantile $x_{0.5}$ (Median) divides the set into 2 halves (50% below $x_{0.5}$, 50% above it).
Quartiles (4 equal parts), Deciles (10 parts), Percentiles (100 parts).
Quantiles are used as critical values in statistical hypothesis testing.
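As a rough illustration, quartiles and deciles can be computed with Python's standard `statistics` module. The data are the litter sizes from the earlier slide; note that several conventions for interpolating quantiles exist, so other software may return slightly different cut points.

```python
from statistics import median, quantiles

# Litter sizes from the earlier slide
litter_sizes = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# n=4 returns the 3 cut points that divide the data into 4 parts
quartiles = quantiles(litter_sizes, n=4)

# n=10 returns the 9 cut points that divide the data into 10 parts
deciles = quantiles(litter_sizes, n=10)

print("quartiles:", quartiles)
print("median:", median(litter_sizes))
print("deciles:", deciles)
```

The middle quartile coincides with the median, the 50% quantile.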

Descriptive Characteristics of Statistical Sets

Parameters: describe characteristic features of populations. They are exact, but we cannot calculate them for an endless number of individuals in the population; we can only estimate them from sample data. Represented by Greek letters (e.g. $\mu$, $\sigma$).
Statistics: describe characteristic features of samples. We calculate them from the sample data, and they serve as estimates of the exact population parameters. Represented by Latin letters (e.g. $\bar{x}$, $s$).

Descriptive Characteristics
A) Measures of Central Tendency - describe the middle of the range of values in a sample or population
B) Measures of Dispersion and Variability - describe the dispersion of values around the middle in a sample or population

Measures of Central Tendency (describe where the majority of measurements occur)
1) The Arithmetic Mean (Average, AVG): $\mu$ (population), $\bar{x}$ (sample)
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Properties:
- is affected by extreme values, so it should be used only in homogeneous, regular (Gaussian) distributions to describe the middle of the population correctly
- has the same units of measurement as the individual observations
- the sum of all deviations from the mean is always 0

2) The Median (defined for both a population and a sample) = the middle value in an ordered set of data (there are just as many values larger than the median as there are smaller).
- if the sample size n is odd, there is only 1 middle value in the ordered sample data, and it is the median
- if n is even, there are two middle values, and the median is the midpoint (mean) between them
Rank of the median: $(n + 1)/2$

The Median - Properties:
- is not affected by extreme values
- is the 50% quantile (divides the distribution curve into 2 halves of 50% each)
- may be used in irregular (asymmetric) distributions, where it is a better characteristic of the middle of the set than the average

Example: Body weights in two varieties of laboratory mice:

Variety A    Variety B
x_i (g)      x_i (g)
34           34
36           36
37           37
39           39
40           40
41           41
42           42
43           43
79           44
             45
n = 9        n = 10
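The example can be worked through in Python to compare the mean and the median of the two varieties. The weights are the ones listed in the example; the variable names are illustrative.

```python
from statistics import mean, median

# Body weights (g) from the example
variety_a = [34, 36, 37, 39, 40, 41, 42, 43, 79]
variety_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]

for name, weights in (("A", variety_a), ("B", variety_b)):
    print(
        f"Variety {name}: n = {len(weights)}, "
        f"mean = {mean(weights):.1f} g, median = {median(weights)} g"
    )
```

In variety A the single extreme value (79 g) pulls the mean up to about 43.4 g, while the median stays at 40 g. In variety B, with no extreme value, the mean (40.1 g) and the median (40.5 g) nearly agree.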

3) The Mode (defined for both a population and a sample) = the most frequently occurring measurement in a data set (the top of the distribution curve).
Properties:
- is not affected by extremes
- is not a very exact measure of the middle of the set (not often used for biological and medical data)
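A small Python sketch of the mode, reusing the litter sizes from the frequency-distribution slide as example data:

```python
from statistics import mode, multimode

# Litter sizes from the frequency-distribution slide
litter_sizes = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# The single most frequent value
most_frequent = mode(litter_sizes)

# All values tied for the highest frequency (a list, in case of ties)
all_modes = multimode(litter_sizes)

print(most_frequent, all_modes)
```

`multimode` is useful when a data set has several equally frequent values, in which case a single mode is not well defined.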

B) Measures of Variability - describe the dispersion (scattering) of measurements around the center of a distribution
1) The Range: $R = x_{max} - x_{min}$
- depends on only the 2 extreme values of the data
- is a relatively imprecise measure of variability: it does not take into account any measurements between the highest and lowest values
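To see how strongly the range depends on the extremes, it can be computed for the two mouse varieties from the earlier example. The helper name `data_range` is illustrative.

```python
# Body weights (g) from the earlier example
variety_a = [34, 36, 37, 39, 40, 41, 42, 43, 79]
variety_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]

def data_range(values):
    """Range R = largest value minus smallest value."""
    return max(values) - min(values)

print("R (variety A):", data_range(variety_a), "g")
print("R (variety B):", data_range(variety_b), "g")
```

The two data sets share their first eight values, yet the range is 45 g for variety A and 11 g for variety B, entirely because of the single value of 79 g.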

Variability expressed in terms of deviations from the mean: Because the sum of all deviations from the mean is always equal to 0, summing them would be useless as a measure of variability. To eliminate the signs of the deviations from the mean, we square them. Then we can define the sum of squares:
$$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
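A short numerical check of this reasoning, using the variety B weights from the earlier example (variable names are illustrative):

```python
# Body weights (g) of variety B from the earlier example
variety_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]

mean_b = sum(variety_b) / len(variety_b)

# Deviation of each measurement from the mean
deviations = [x - mean_b for x in variety_b]

# Positive and negative deviations cancel (up to floating-point rounding)
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the sum no longer cancels
sum_of_squares = sum(d ** 2 for d in deviations)

print(f"sum of deviations: {round(sum_of_deviations, 9)}")
print(f"sum of squares:    {sum_of_squares:.1f}")
```

The deviations add up to zero, whereas the sum of squares is about 116.9 g².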

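2) The Variance: $\sigma^2$ (population), $s^2$ (sample) = the mean sum of squares about the mean
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
"Estimated variance" (sample):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Variance is expressed in the squared units of the original measurements.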
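Python's standard `statistics` module offers both forms of the variance. The sketch below applies them to the variety B weights from the earlier example; treating those 10 mice as a whole population in the `pvariance` call is done only to show the difference between the divisors.

```python
from statistics import pvariance, variance

# Body weights (g) of variety B from the earlier example
variety_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]

# Sample ("estimated") variance: sum of squares divided by n - 1
s2 = variance(variety_b)

# Population formula: sum of squares divided by the number of values
sigma2 = pvariance(variety_b)

print(f"s^2 (divide by n - 1): {s2:.2f} g^2")
print(f"divide by n:           {sigma2:.2f} g^2")
```

Because the data are a sample, the `variance` result (about 12.99 g²) is the appropriate estimate here.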

Degrees of freedom (DF): $DF = n - 1$ (n reduced by the number of known statistics in the sample). DF reflects the sampling error of the sample in comparison with the population: the population variance $\sigma^2$ is estimated by the sample variance $s^2$.
- When n is large (small error), the calculated $s^2$ differs only a little from the exact $\sigma^2$.
- When n is small (large error), the calculated $s^2$ may differ considerably from the exact $\sigma^2$.
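One aspect of this can be shown numerically: dividing the sum of squares by n - 1 instead of n inflates the result by the factor n / (n - 1), which matters for small samples and becomes negligible for large ones. The sketch below is an illustration of that factor only, not a simulation of sampling error; the function names are illustrative and the data are the variety B weights from the earlier example.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def variance_dividing_by_n(values):
    return sum_of_squares(values) / len(values)

def variance_dividing_by_df(values):
    # DF = n - 1
    return sum_of_squares(values) / (len(values) - 1)

# Body weights (g) of variety B from the earlier example
variety_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]

print(f"divide by n:  {variance_dividing_by_n(variety_b):.2f}")
print(f"divide by DF: {variance_dividing_by_df(variety_b):.2f}")

# The correction factor n / (n - 1) approaches 1 as n grows
for n in (5, 50, 500):
    print(f"n = {n:3d}: n / (n - 1) = {n / (n - 1):.4f}")
```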

3) The Standard Deviation (SD): $\sigma$ (population), $s$ (sample) = the square root of the variance (it has the same units as the original measurements)
$$s = \sqrt{s^2}$$
4) The Coefficient of Variability ("relative standard deviation"): $V$, a relative measure that does not depend on the units of measurement
Population: $V = \sigma / \mu$; "estimated V" (sample): $V = s / \bar{x}$ (often multiplied by 100 and given in %)
Used to compare variability in data sets whose values differ in magnitude (e.g. weight in mice and cows).
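A minimal sketch of both measures in Python. The mouse weights are variety B from the earlier example; the cow weights are hypothetical values invented for this illustration, and the function name `coefficient_of_variation` is likewise illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Estimated V = s / x-bar, returned in percent."""
    return stdev(values) / mean(values) * 100

# Variety B mouse weights (g) from the earlier example
mice_g = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]

# Hypothetical cow weights (kg), not from the slides
cows_kg = [540, 565, 580, 600, 610, 625, 640, 655, 670, 690]

for label, data, unit in (("mice", mice_g, "g"), ("cows", cows_kg, "kg")):
    print(
        f"{label}: SD = {stdev(data):.2f} {unit}, "
        f"V = {coefficient_of_variation(data):.1f} %"
    )
```

The standard deviations are in different units and cannot be compared directly, whereas the two V values are unitless percentages and can.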