Presentation transcript:

Samples & Population
Population: A population is an entire group, collection or space of objects that we want to characterize. For example, if we want to study the bad effects of smoking on UJ students, the population is all students at the University of Jordan. When the population is too large to study, we need to take a representative sample.
Sample: A sample is a collection of observations on which we measure one or more characteristics. Frequently, we use (small) samples of (large) populations to characterize the properties and affinities within the space of objects in the population of interest. For example, if we want to characterize the US population, we can take a sample (poll or survey), and the summaries that we obtain from the sample (e.g., mean age, race, income, body weight) may be used to study the properties of the population in general.
Descriptive measures that describe a POPULATION are called PARAMETERS. Descriptive measures that describe a SAMPLE are called STATISTICS.

Samples & Population
In statistics, we reach a conclusion about a population on the basis of information contained in a sample that has been drawn from that population. There are many kinds of samples that may be drawn from a population. The simplest type of scientific sample that may be used for analysis is the simple random sample.

Is the sample taken randomly? Assume that the distribution shows the performance of pharmacy students at UJ. Describe possible reasons for the sample distribution.

Simple Random Sample
Let N designate the size of a finite population and n the size of a finite sample. If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample. Each individual in the sample is chosen randomly and entirely by chance.
[Figure: a sample of n = 48 drawn from a population of N = 400]
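A minimal Python sketch of drawing a simple random sample, using the N = 400 and n = 48 from the slide. The function name, the numbered population and the seed are assumptions made only for illustration; the slides do not prescribe any software.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed is optional; it only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 400 individuals labelled 1..400
population = list(range(1, 401))
sample = simple_random_sample(population, 48, seed=2012)
print(len(sample), sorted(sample)[:5])
```

Sampling without replacement is what keeps each individual from being selected twice.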

Measurement and Measurement Scale
A measurement may be defined as the assignment of numbers to objects or events according to a set of rules. The various measurement scales result from the fact that measurements may be carried out under different sets of rules:
The nominal scale (male, female)
The ordinal scale (Good, Very Good, Excellent)
The interval scale (20, 30, 40, 50 °C)
The ratio scale (20, 30, 40, 50 kg)

Nominal Scale
A nominal scale consists of "naming" or "classifying" observations into various mutually exclusive and collectively exhaustive categories, e.g. [male : female], [well : sick], [<65 yrs : ≥65 yrs].
Example: Which of the following pharmaceutical processes have you mastered?
coating
compression
granulation
mixing
packaging

Ordinal Scale
An ordinal scale deals with observations that are not only different from category to category but can also be ranked according to some criterion. Examples: rank in college (freshman, sophomore, junior, senior), size of soda (small, medium, large), the intelligence of children (above average, average, below average). In each of these examples, the members of any one category are considered equal, but the members of one category are considered lower, better or smaller than those in another category. The function of numbers assigned to ordinal data is to order or rank the observations from lowest to highest. The categories only order or rank the observations: the difference between "Hot" and "Hotter" is not necessarily the same as the difference between "Hotter" and "Hottest".

Ordinal Scale
Another example of an ordinal scale: you might ask patients to express the amount of pain they are feeling on a scale of 1 to 10. A score of 7 means more pain than a score of 5, and that is more than a score of 3. But the difference between the 7 and the 5 may not be the same as that between the 5 and the 3. The values express an order only.

Interval Scale
The interval scale is a scale where the difference between two values is meaningful (unlike the ordinal scale). In the interval scale, it is not only possible to order measurements, but the distance between any two measurements is also known. For example, we know that the difference between T = 40 °C and T = 60 °C is equal to the difference between T = 60 °C and T = 80 °C. The ability to do this implies the use of a unit distance and a zero point. The selected zero point is not necessarily a true zero, in that it does not have to indicate a total absence of the quantity being measured. Good examples of interval scales are the Fahrenheit and Celsius temperature scales. A temperature of "zero" does not mean that there is no temperature; it is just an arbitrary zero point.

Ratio Scale
A ratio variable has all the properties of an interval variable and also has a clear definition of a zero point. When the variable equals 0.0, there is none of that variable. Variables like height, tablet weight and enzyme activity are ratio variables. Temperature expressed in °F or °C is not a ratio variable: a temperature of 0.0 on either of those scales does not mean 'no temperature'. However, temperature in kelvin is a ratio variable, as 0.0 K really does mean 'no temperature'. Another counterexample is pH. It is not a ratio variable, as pH = 0 just means 1 molar of H+; a pH of 0.0 does not mean 'no acidity'.
When working with ratio variables, but not interval variables, you can look at the ratio of two measurements. A weight of 4 grams is twice a weight of 2 grams, because weight is a ratio variable. A temperature of 100 °C is not twice as hot as 50 °C, because temperature in °C is not a ratio variable. A pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable.
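The point about ratios can be checked with a little arithmetic. The sketch below is illustrative only; the helper names are assumptions, and it uses the standard relations K = °C + 273.15 and [H+] = 10^(-pH).

```python
def celsius_to_kelvin(t_c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return t_c + 273.15

def hydrogen_ion_molar(ph):
    """Hydrogen ion concentration in mol/L for a given pH."""
    return 10 ** (-ph)

# The naive ratio on the Celsius scale suggests "twice as hot"...
ratio_celsius = 100 / 50
# ...but on the kelvin scale, which has a true zero, the ratio is far smaller.
ratio_kelvin = celsius_to_kelvin(100) / celsius_to_kelvin(50)

# pH 3 versus pH 6: the ratio of pH values is 0.5, yet the H+ concentrations
# differ by a factor of about 1000.
ratio_h = hydrogen_ion_molar(3) / hydrogen_ion_molar(6)

print(ratio_celsius, round(ratio_kelvin, 3), round(ratio_h))
```

Only the kelvin and concentration ratios have a physical meaning, because only those scales have a true zero.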

Summary of Measurement and Measurement Scale
Scale Type   Properties
Nominal      Named categories
Ordinal      Same as nominal + ordered categories
Interval     Same as ordinal + equal intervals
Ratio        Same as interval + meaningful zero

Biostatistics Lecture 2: Graphical Presentation of Data

Data Organization
Measurements that have not been organized, summarized or otherwise manipulated are called raw data. Unless the number of observations is extremely small, it is unlikely that these raw data will impart much information until they have been put into some kind of order. It is always easier to analyze organized data.

The ordered array
The preparation of the ordered array is the first step in organizing data. An ordered array is a listing of the values of a collection (either population or sample) from the smallest value to the largest value. The ordered array enables one to determine quickly the value of the smallest measurement, the value of the largest measurement and the general trends in the data.
Raw data: 13, 3, 17, 9, 5, 7, 15, 11
Organized: 3, 5, 7, 9, 11, 13, 15, 17
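As a small illustration, the ordered array for the raw data on this slide can be produced with Python's built-in sort. The variable names are assumptions for the example.

```python
raw_data = [13, 3, 17, 9, 5, 7, 15, 11]

# sorted() returns a new list ordered from the smallest to the largest value
ordered_array = sorted(raw_data)

# The smallest and largest measurements are now at the two ends of the array
smallest, largest = ordered_array[0], ordered_array[-1]

print(ordered_array)
print(smallest, largest)
```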

Grouped Data: The frequency distribution
Although a set of observations can be made more comprehensible and meaningful by means of an ordered array, further useful summarization may be achieved by grouping the data. To group a set of observations, we select a set of non-overlapping intervals such that each value in the data set can be placed in one, and only one, interval. These intervals are usually referred to as class intervals. Class intervals are usually ordered from smallest to largest.

Grouped data: The frequency distribution
How many intervals should we use? Consider ages from 0 to 100 years.
Too few intervals are undesirable because of the resulting loss of information (e.g. two intervals: 0-50, 51-100).
Too many intervals, on the other hand, will not meet the objective of summarization (e.g. 0-1, 2-3, 4-5, ..., 99-100).
A commonly used rule is that there should be no fewer than six intervals and no more than 15.
Sturges' rule: k = 1 + 3.322 log10(n), where k is the number of class intervals and n is the number of values in the data set under consideration (k is rounded to the nearest integer).
The size of the class interval is often selected as 5, 10, 15, 20, etc.

Grouped data: The frequency distribution
The width of class intervals: class intervals should generally be of the same width. The width may be obtained by dividing the range R by k, the number of class intervals.
Example: tablet hardness values range between 50 and 120 N. Calculate the recommended number of intervals and the interval width for a data set of 60 tablet hardness values.
n = 60
Range R = largest - smallest = 120 - 50 = 70
Number of intervals k = 1 + 3.322 log10(n) = 1 + 3.322 log10(60) = 6.9, rounded to 7
Interval width = R / k = 70 / 7 = 10
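The tablet hardness calculation can be reproduced with a short sketch. The function names are assumptions; the constant 3.322 is the usual base-10 form of Sturges' rule (1 + log2 n).

```python
import math

def sturges_intervals(n):
    """Number of class intervals k = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))

def interval_width(smallest, largest, k):
    """Class interval width: the range divided by the number of intervals."""
    return (largest - smallest) / k

k = sturges_intervals(60)            # 60 tablet hardness values
width = interval_width(50, 120, k)   # hardness between 50 and 120 N

print(k, width)
```

In practice the computed width is usually rounded to a convenient value such as 5 or 10.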

Grouped data: The frequency distribution
Frequency distribution of ages of 169 subjects:
Class Interval   Frequency
10-19              4
20-29             66
30-39             47
40-49             36
50-59             12
60-69              4
Total            169
The class intervals are non-overlapping. How many subjects are there in each class interval?
Range of the variable = 69 - 10 + 1 = 60
Interval width = 60 / 6 = 10

Grouped data: The relative frequency distribution
It may sometimes be useful to know the proportion, rather than the number, of values falling within a particular class interval. We obtain this information by dividing the number of values in the particular class interval by the total number of values. We refer to the proportion of values falling within a class interval as the relative frequency of values in that interval. We may sum (cumulate) the frequencies and relative frequencies to facilitate obtaining information regarding the frequency or relative frequency of values within two or more contiguous class intervals.
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
10-19              4         0.0237                 4                    0.0237
20-29             66         0.3905                70                    0.4142
30-39             47         0.2781               117                    0.6923
40-49             36         0.2130               153                    0.9053
50-59             12         0.0710               165                    0.9763
60-69              4         0.0237               169                    1.0000
Total            169         1.0000
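The columns of the table can be computed from the frequencies alone. The sketch below is one possible way to do it; note that the frequency of 4 for the 60-69 interval is inferred from the stated total of 169 and the cumulative frequencies, and the variable names are assumptions.

```python
from itertools import accumulate

intervals = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69"]
frequencies = [4, 66, 47, 36, 12, 4]   # last value inferred from the total of 169

total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of values
relative = [f / total for f in frequencies]

# Cumulative frequency: running sum of the class frequencies
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running sum expressed as a proportion
cumulative_relative = [c / total for c in cumulative]

for label, f, r, c, cr in zip(intervals, frequencies, relative,
                              cumulative, cumulative_relative):
    print(f"{label:<8}{f:>5}{r:>9.4f}{c:>6}{cr:>9.4f}")
```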

Grouped data: The relative frequency distribution
We use true limits to fill the gaps between intervals for a continuous variable. True limits are essential for calculating statistics (range, median, etc.) of grouped data.
Upper true limit = upper class value + 0.5
Lower true limit = lower class value - 0.5
Intervals   True limits   Frequency
10-19        9.5-19.5       4
20-29       19.5-29.5      66
30-39       29.5-39.5      47
40-49       39.5-49.5      36
50-59       49.5-59.5      12
60-69       59.5-69.5       4

Histogram
We may display a frequency distribution (or a relative frequency distribution) graphically in the form of a histogram, which is a special type of bar graph. A histogram consists of adjacent columns that represent a continuous variable such as weight, height or age. When we construct a histogram, the values of the variable under consideration are represented on the horizontal (x) axis, while the frequency (or relative frequency) of occurrence is represented on the vertical (y) axis.
[Figure: histogram; x-axis: age interval, yrs (variable); y-axis: frequency]
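For a quick look at the shape of a frequency distribution without plotting software, a text histogram can be printed. This is only a rough sketch: the bars are drawn sideways, the function name is an assumption, and the 60-69 frequency of 4 is inferred from the total of 169 subjects.

```python
def text_histogram(labels, frequencies, symbol="#"):
    """Return one line per class interval, with a bar whose length equals the frequency."""
    return [f"{label:>6} | {symbol * freq} {freq}"
            for label, freq in zip(labels, frequencies)]

age_intervals = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69"]
age_frequencies = [4, 66, 47, 36, 12, 4]

lines = text_histogram(age_intervals, age_frequencies)
print("\n".join(lines))
```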

Class Problem I
Everyone: choose a color from the list below and write it in your notebook: Green, Blue, Red, Yellow, Black.
Let us select a random sample from this population (all of you!).
Let us count the frequency of each selected color:
Color    Frequency
Green
Blue
Red
Yellow
Black
Draw a representative histogram of the frequency of the listed colors (use Excel to draw it, HW1-B).
Dr. Alkilany 2012

Class Problem II
A school nurse weighed 30 students in Year 10. Their weights (in kg) were recorded as follows:
50 52 53 54 55 65 60 70 48 63 74 40 46 59 68 44 47 56 49 58 63 66 68 61 57 58 62 52 56 58
1. Use the data above to construct a frequency table.
Range = 74 - 40 = 34
Let the width of the class interval = 5
Number of intervals = 34 / 5 = 6.8, rounded up to 7
There are 7 class intervals. This is reasonable for the given data. The frequency table is as follows:
2. Complete the table to calculate the cumulative frequency, relative frequencies and cumulative relative frequencies (HW1-C).
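One way to tally the weights into 7 class intervals of width 5 is sketched below. The class limits (40-44, 45-49, ..., 70-74) are an assumption, since the slide's own table is not reproduced here; the function name is also hypothetical.

```python
weights = [50, 52, 53, 54, 55, 65, 60, 70, 48, 63,
           74, 40, 46, 59, 68, 44, 47, 56, 49, 58,
           63, 66, 68, 61, 57, 58, 62, 52, 56, 58]

def frequency_table(values, start, width, n_classes):
    """Count how many values fall in each non-overlapping class interval."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1          # e.g. 40-44 for width 5 and whole-kg data
        count = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, count))
    return table

# Assumed limits: first class starts at the smallest value, 40 kg
table = frequency_table(weights, start=40, width=5, n_classes=7)
counts = [count for _, _, count in table]

for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")
```

Choosing a different starting point for the first class would give a different, equally valid table.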

Biostatistics Lecture 3: Descriptive Statistics

Descriptive Statistics
With interval scale (continuous measurement) data, there are two aspects of the figures that we should be trying to describe:
How large are they? ('indicator of central tendency')
How variable are they? ('indicator of dispersion')
Example: fasting blood glucose (FBG) for two sets of patients:
Set A: 84, 85, 89, 89, 93, 94
Set B: 72, 82, 89, 89, 96, 106
Which is larger? Which is more variable?
An 'indicator of central tendency' is any statistic that is used to indicate an average value around which the data are clustered. Three indicators of central tendency are in common use: the mean, the median and the mode.
[Figure: a distribution annotated with its central tendency and dispersion]
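The two questions can be answered numerically for Set A and Set B. The sketch below uses the mean for central tendency and the range (largest minus smallest) as a simple indicator of dispersion; that choice of dispersion measure is an assumption made for illustration.

```python
from statistics import mean

set_a = [84, 85, 89, 89, 93, 94]
set_b = [72, 82, 89, 89, 96, 106]

def value_range(values):
    """Simple indicator of dispersion: largest value minus smallest value."""
    return max(values) - min(values)

mean_a, mean_b = mean(set_a), mean(set_b)
range_a, range_b = value_range(set_a), value_range(set_b)

print("Set A: mean", mean_a, "range", range_a)
print("Set B: mean", mean_b, "range", range_b)
```

Both sets have the same mean, so neither is "larger" on average, but Set B is spread over a much wider range.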

Mean
The usual approach to showing the central tendency of a set of data is to quote the average or 'mean'.
Example: potency data of different vaccine batches. Each batch is intended to be of equal potency, but some manufacturing variability is unavoidable. A series of 10 batches has been analyzed, with the following results:
Sum = 991.5
n = 10
Mean = 991.5 / 10 = 99.15

Types of Mean
Arithmetic mean
Geometric mean
Harmonic mean

Arithmetic Mean
The arithmetic mean (x̄) represents the balance point of the distribution.
[Figure: position of the mean in a symmetrical distribution, a distribution with a tail to the right and a distribution with a tail to the left]
The arithmetic mean has the following properties:
Uniqueness: for a given set of data there is one and only one mean.
Simplicity: it is easy to understand and compute.
Not robust to extreme values: it is affected by each value in the data, e.g. 5, 10, 15: mean = 10, whereas 5, 10, 150: mean = 55.

Geometric Mean
The geometric mean is the antilog of the average of the logarithms of the observations.
Example: for the values 50, 100, 200:
Geometric mean = antilog[(log 50 + log 100 + log 200) / 3] = 100, while the arithmetic mean is 116.67.
It is meaningful for data with logarithmic relationships, as in the current procedure in bioequivalence studies, where log-transformed parameters (log(AUC), log(Cmax)) are compared.
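The definition translates directly into code. A minimal sketch follows, using base-10 logarithms (any base gives the same result); the function name is an assumption.

```python
import math

def geometric_mean(values):
    """Antilog of the average of the logarithms of the observations."""
    logs = [math.log10(v) for v in values]
    return 10 ** (sum(logs) / len(logs))

values = [50, 100, 200]
gm = geometric_mean(values)
am = sum(values) / len(values)   # arithmetic mean, for comparison

print(round(gm, 2), round(am, 2))
```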

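Harmonic Mean
The harmonic mean is the appropriate mean following a reciprocal transformation: it is the reciprocal of the average of the reciprocals of the observations.
Example: the half-lives of a certain drug in 3 subjects were 2, 4 and 8 hrs. Determine the harmonic mean half-life for this drug.
Harmonic mean = 3 / (1/2 + 1/4 + 1/8) = 3 / 0.875 = 3.43 hrs, while the arithmetic mean is 4.667 hrs.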
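A short sketch of the half-life example, assuming the standard definition of the harmonic mean as n divided by the sum of the reciprocals; the function name is an assumption.

```python
def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

half_lives = [2, 4, 8]   # hours
hm = harmonic_mean(half_lives)
am = sum(half_lives) / len(half_lives)   # arithmetic mean, for comparison

print(round(hm, 3), round(am, 3))
```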

Median
The median is the point that divides the distribution into two equal parts, or the point between the upper and lower halves of the distribution. Accordingly, if we have a finite number of values, then the median is the value that divides those values into two parts such that the number of values equal to or greater than the median is equal to the number of values equal to or less than the median.
[Figure: 7 values below the median, 7 values above the median]

Median (example)
Fifteen patients were provided with their drugs in a child-proof container of a design that they had not previously experienced. The time it took each patient to open the container was measured.
The mean is 7.09 s. Is this the most representative/descriptive figure? A few outliers shifted the mean, so the median describes the data better in this case.
[Figure: times taken to open the container; most values cluster at the low end]

Median
Most patients have got the idea more or less straight away and have taken only 2–5 s to open the container. However, four seem to have got the wrong end of the stick and have ended up taking anything up to 25 s. These four have contributed a disproportionate amount of time (65.6 s) to the overall total. This has then increased the mean to 7.09 s. We would not consider a patient who took 7.09 s to be remotely typical. In fact, they would be distinctly slow.

Median (other example)
This problem of mean values being disproportionately affected by a minority of outliers arises quite frequently in biological and medical research. A useful approach in such a case is to use the median.
Example: blood glucose level (mg/dl): 80, 81, 82, 83, 84, 84, 86, 86, 180
Mean: 94
Median: 84
The outlier 180 shifted the mean to a higher value, which is not descriptive of the data set in this case.
[Figure: values cluster around the median; the outlier pulls the mean to the right]
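The effect of the outlier can be checked with Python's standard library. For these nine values the sum is 846, so the mean works out to 846 / 9 = 94, while the median is the fifth ordered value. The variable names are assumptions for the example.

```python
from statistics import mean, median

glucose = [80, 81, 82, 83, 84, 84, 86, 86, 180]   # mg/dl

mean_all = mean(glucose)
median_all = median(glucose)

# Dropping the outlier (180) shows how strongly it pulls the mean
# and how little it moves the median.
without_outlier = glucose[:-1]
mean_trimmed = mean(without_outlier)
median_trimmed = median(without_outlier)

print(mean_all, median_all)
print(mean_trimmed, median_trimmed)
```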

Median (how to determine it in an ordered array?)
When n is an odd number, the median is value number (n+1)/2 in the ordered array.
Example: what is the median of the following data set: 10, 15, 12, 25, 20?
Rank the data: 10, 12, 15, 20, 25.
The median is value number (n+1)/2 = (5+1)/2 = 3rd, so the median is 15.
When n is an even number, the median is the mean of the two middle values, the (n/2)th and the ((n/2)+1)th, in the ordered array.
Example: 10, 15, 20, 25, 30, 5
Rank the data: 5, 10, 15, 20, 25, 30.
The median is the average of the (n/2)th and the ((n/2)+1)th values, i.e. the 3rd and 4th: (15+20)/2 = 17.5.
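The odd and even cases can be written as a small function. This is a sketch of the rule described on the slide, with a hypothetical function name; positions on the slide count from 1, while Python lists count from 0.

```python
def median_from_ordered_array(values):
    """Median via the ordered array: the middle value for odd n, the mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # value number (n+1)/2, converted to a zero-based index
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]   # the (n/2)th value
    upper = ordered[n // 2]       # the ((n/2)+1)th value
    return (lower + upper) / 2

odd_example = median_from_ordered_array([10, 15, 12, 25, 20])
even_example = median_from_ordered_array([10, 15, 20, 25, 30, 5])

print(odd_example, even_example)
```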

Mean vs. Median
Properties                     Mean   Median
Uniqueness                     Yes    Yes
Simplicity                     Yes    Yes
Robustness to extreme values   No     Yes
The median is robust to extreme outliers. The term 'robust' is used to indicate that a statistic or a procedure will continue to give a reasonable outcome even if some of the data are aberrant.
Example: 2, 4, 6, 8, 10: median = 6; 2, 4, 6, 8, 1000: median = 6.
If the last value is increased to 1000 instead of 10, the median stays the same (6), while the mean is hugely inflated (from 6 to 204).

Mode
The mode is the value that occurs most frequently. If all values are different there is no mode, and a set of values may have more than one mode. It is used for quick estimation and for identifying the most common observation.
Properties:
Not unique
Simple
Not robust: less stable than the median and the mean
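A minimal sketch of finding the mode(s), following the convention stated on the slide that a data set with all values different has no mode. The function name is an assumption; the first data set is the blood glucose example from the median slide.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values; an empty list if every value occurs only once."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # all values are different: no mode
    return sorted(v for v, c in counts.items() if c == highest)

glucose_modes = modes([80, 81, 82, 83, 84, 84, 86, 86, 180])
no_mode = modes([10, 12, 15, 20, 25])

print(glucose_modes, no_mode)
```

The glucose data illustrate the "not unique" property: two values tie for most frequent.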

Mode
The condition of sixty patients with arthritis is recorded using a global assessment variable. A positive score indicates an improvement and a negative one a deterioration in the patient's condition after treatment. The mean is 0.77. Do you think the mean is the best descriptive measure for these data?

Mode
A histogram of the above data shows that there are two distinct sub-populations. Slightly under half the patients have improved quality of life, but for the remainder, their lives are actually made considerably worse.

Mode
Neither the mean nor the median remotely describes the situation. The mean is particularly unhelpful, as it indicates a value that is very atypical: very few patients show changes close to zero. We need to describe the fact that, in this case, there are two distinct groups, with the data clustered around two central points.

Mode
A data distribution can be 'unimodal' or, when there are several clusters, 'polymodal'. If we want to be more precise, we use terms such as bimodal or trimodal to describe the exact number of clusters.

How are the mean, median, and mode related?
For symmetric distributions, the mean and median are equal.
For skewed distributions with a single mode, the three measures differ.

How are the mean, median, and mode related?
For skewed distributions with a single mode, the three measures differ:
mean > median > mode (positively skewed distributions)
mean < median < mode (negatively skewed distributions)