1
PCT 202 STATISTICS
DR. C. P. AZUBUIKE
DEPT. OF PHARMACEUTICS & PHARM. TECH, FACULTY OF PHARMACY, UNIVERSITY OF LAGOS
2
OBJECTIVES
Understand the fundamental concepts of statistics.
Analyze data from scientific experiments in a statistically acceptable manner.
3
TOPIC DESCRIPTION
Compilation and presentation of data
Analysis of data
Distribution of data
Measurement of variation: SD, SE, limits of error
Comparison of data: tests for significance, coefficient of variation, etc.
Introduction to statistical quality control
4
TEXTBOOKS
Pharmaceutical Statistics by David Jones
Understanding Statistics by O. A. Adedayo
Biostatistics….. A Practical Approach to Research and Data Handling by Anthony E. Ogbeibu
5
WHAT IS STATISTICS?
Statistics is the science and art of collecting, organizing, analyzing and interpreting numerical data affected by a multiplicity of causes, offering an objective evaluation of the reliability of the conclusions based on the data. It is a theory of decision making in the face of uncertainty.
6
TYPES OF STATISTICS
Statistics may be divided into two sub-categories: descriptive statistics and inferential statistics.
Descriptive or deductive statistics provides general information about the fundamental statistical properties of data, e.g. mean, median, mode, variance, standard deviation etc.
7
TYPES OF STATISTICS
Inferential (analytical or inductive) statistics enables one to draw inferences or conclusions based on information derived from experimental procedures, e.g. that the antidiabetic effect of formulation A is greater than that of formulation B.
8
APPLICATIONS OF STATISTICS
Collection of data Numerical description of data Formulation of hypothesis concerning the nature of the data Understanding the relevance of data Design of Experiments to test the hypotheses, or indeed, to further consolidate or reject the hypothesis
9
DEFINITION OF COMMON TERMS USED IN STATISTICS
Population: the total number of cases in our focus of interest.
Census: the complete enumeration of a target population, together with the collection of some important information on every element of the population.
Sample: a part of the population, consisting of any subgroup drawn from the target population.
10
DEFINITION OF COMMON TERMS USED IN STATISTICS
Parameter: a numerical quantity summarizing a population (mean, SD, variance).
Statistic: a numerical quantity that summarizes a characteristic of a sample, e.g. the sample mean.
Variable: an occurrence which can assume any value from a prescribed set of values. It can be discrete or continuous.
11
VARIATION IN SCIENTIFIC DATA
A variable is a property with respect to which individuals in a sample differ in some ascertainable way. Examples of variables include:
The height of men in a particular region
The weights of tablets derived from the same batch
The concentration of cholesterol in the plasma of female subjects
12
TYPES OF VARIABLES
Measurement variables may be described in a numerically ordered fashion. They may be continuous or discrete.
Continuous variables can assume an infinite number of numerical values between the lowest and highest points on a scale.
Discrete variables have a fixed number of values and always have integer values (whole numbers).
13
TYPES OF VARIABLES
Ranked variables: ranking scales are examples of continuous variables. Although they do not represent physical measurement, such scales represent numerically ordered systems.
Attributes or nominal variables cannot be measured because of their qualitative nature. Unlike ranked variables, they are not associated with numerical values.
Note: Whenever attributes are combined with frequencies, they are referred to as enumeration data.
14
PRESENTATION OF DATA
Data can be presented in tabular forms or in diagrammatic forms.
In order to organize the data well, we need to classify the data before presenting them in either tabular or diagrammatic form.
Classification deals with grouping data which have some identified common characteristics.
15
TABULATION OF DATA
Tabulation deals with the presentation of classified data in tabular form.
A table is an array of data in rows and columns.
It enhances condensation of a large mass of data.
It enables easy comparison among classes of data.
It takes up less space than data presented in narrative form.
16
CONTENTS OF A GOOD TABLE
Title: written at the top of the table; it describes the contents of the table.
Caption: the column heading.
Stubs: the row headings.
Footnote: brief explanatory information about the table which is not self-evident.
Units of measurement: should be clearly specified.
17
TYPES OF TABLES Simple table: Complex table Further complex
18
SAMPLE OF A TABLE
19
DIAGRAMMATICAL/GRAPHICAL DATA PRESENTATION
Pictograms Pie Chart Bar Charts Histogram Cumulative frequency curve or Ogive
20
PICTOGRAM
A pictogram contains a pictorial symbol that represents the data of interest. The number of symbols drawn is usually proportional to the given data. A key is usually given to state the value of each pictorial symbol.
21
PICTOGRAM
22
PIE CHART
A pie chart consists of a circle that has been divided into sectors which are proportional to the data.
Most of the data presented in a pie chart are categorical data.
Generally, pie charts do not give information on absolute magnitude unless figures are assigned to each sector.
23
PIE CHART
24
BAR CHARTS
A simple bar chart consists of separated rectangular bars drawn in such a way that the height of each bar is equivalent to the frequency.
It can be drawn vertically or horizontally, but vertical bar charts are the most popular.
Compared with a pie chart, it is easier to compare the heights of bars than the sizes of sectors.
Simple bar charts, multiple bar charts and component bar charts are examples of types of bar chart.
25
BAR CHARTS
26
HISTOGRAM
A histogram is similar to the simple bar chart except that the bars are not separated.
The area of each rectangular bar is proportional to its frequency.
The line joining the midpoints of the tops of successive bars is known as a frequency polygon.
27
HISTOGRAM
28
CUMULATIVE FREQUENCY CURVE (OGIVE)
When the cumulative totals of successive frequencies of a distribution are plotted against the corresponding class boundaries, we have an ogive.
The last cumulative frequency is the total of the frequencies in the distribution.
It can only be plotted for a frequency distribution.
For an ungrouped frequency table, the values on the x-axis are the individual values of X.
For a grouped frequency table, we need to compute the class boundaries.
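The points of an ogive can be sketched in a few lines of Python. The class limits and frequencies below are hypothetical values chosen only for illustration (they are not from the lecture), and the 0.5 offset assumes data recorded to the nearest whole number.

```python
from itertools import accumulate

# Hypothetical grouped frequency table (illustrative values only).
class_limits = [(10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [3, 7, 8, 2]

# Upper class boundary = upper class limit + 0.5 (assumes whole-number data).
upper_boundaries = [upper + 0.5 for _, upper in class_limits]

# Running totals of the frequencies.
cumulative = list(accumulate(frequencies))

# Each (boundary, cumulative frequency) pair is one point on the ogive.
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

The last cumulative value equals the total frequency, as stated above.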
29
CUMULATIVE FREQUENCY CURVE (OGIVE)
30
MEASUREMENT OF CENTRAL TENDENCY
A single value which is the central point of a distribution is known as a measure of central tendency (CT) or location.
Measures of CT are typical and representative of a data set.
The values in the distribution cluster around the measure of location.
The arithmetic mean, median, mode, harmonic mean and geometric mean are examples of measures of CT.
31
ARITHMETIC MEAN
The mean is the most popular method for describing the central nature of data.
It refers to the centre of a distribution of data.
Its use is most appropriate whenever the data are symmetrically distributed around the mean.
Mathematically, the arithmetic mean (µ for a population, $\bar{X}$ for a sample) is described as follows:
32
ARITHMETIC MEAN
$\bar{X} = \frac{\sum_{j=1}^{N} X_j}{N}$
Where ∑ is the sigma notation, which is used for summing up numbers ('the sum of'), $X_j$ refers to all data from j = 1 to j = N, and N is the number of data contributing to the calculation.
33
ARITHMETIC MEAN
Question: The reduction in BP (mmHg) in 6 patients 4 hours after administration of a standard dose of a novel antihypertensive agent is shown in the table below. Calculate the mean reduction in BP in the 6 patients.
Patient number   Reduction in blood pressure (mmHg)
1                20
2                25
3                21
4                34
5                41
6                37
34
ARITHMETIC MEAN
Solution: Substituting the figures from the table into the equation for the arithmetic mean, we obtain:
$\bar{X} = \frac{\sum_{j=1}^{N} X_j}{N} = \frac{20 + 25 + 21 + 34 + 41 + 37}{6} = \frac{178}{6} = 29.7$ mmHg
The term arithmetic mean in current usage can be abbreviated to mean.
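The calculation can be checked with a few lines of Python. This is a minimal sketch using the six values from the table in the question; the variable names are illustrative.

```python
# Reductions in blood pressure (mmHg) for the six patients in the question.
bp_reductions = [20, 25, 21, 34, 41, 37]

total = sum(bp_reductions)   # sum of X_j for j = 1..N
n = len(bp_reductions)       # N, the number of observations
mean_reduction = total / n

print(f"sum = {total}, N = {n}, mean = {mean_reduction:.1f} mmHg")
# sum = 178, N = 6, mean = 29.7 mmHg
```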
35
WEIGHTED MEAN
It is a special case of the mean in which the data points in the distribution do not contribute equally to the overall calculation of the mean.
The weighted mean is commonly employed whenever the data are divided into groups, each of which carries a different weighting.
36
WEIGHTED MEAN
Question: The effect of a defined dose of a commercially available analgesic to suppress pain following a painful stimulus was evaluated in 20 volunteers using a visual analogue scale. The results are presented in the table below:
Number of volunteers   Pain assessment by volunteers
2                      3 (extreme pain)
12                     2 (moderate pain)
6                      1 (slight pain)
37
WEIGHTED MEAN
Solution: In this example, the three sub-groups describe different clinical effects and are therefore not of equal magnitude (weighting). The weighted mean is calculated as follows:
$\bar{X}_w = \frac{\sum_{j=1}^{k} w_j X_j}{N}$
where $w_j$ is the weighting (frequency) of each of the k groups and $N = \sum_{j=1}^{k} w_j$ is the total number of observations.
$\bar{X}_w = \frac{(2 \times 3) + (12 \times 2) + (6 \times 1)}{20} = \frac{36}{20} = 1.8$
Note: The calculated mean does not have any dimensions, as a direct result of the analogue scale used to assess pain.
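The same weighted mean can be reproduced in Python. This is a minimal sketch that takes the number of volunteers in each group as the weight, as in the question; the names are illustrative.

```python
# (number of volunteers, pain score) for each group in the question.
groups = [(2, 3), (12, 2), (6, 1)]

total_weight = sum(w for w, _ in groups)       # N = 20 volunteers
weighted_sum = sum(w * x for w, x in groups)   # 6 + 24 + 6 = 36
weighted_mean = weighted_sum / total_weight

print(f"weighted mean = {weighted_mean:.1f}")  # weighted mean = 1.8
```

An unweighted mean of the three scores (3, 2 and 1) would give 2.0, which ignores the fact that most volunteers reported moderate or slight pain.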
38
MEDIAN
It is an alternative means of describing the central nature of data and is relatively unaffected by the spread of the data. It is the central value.
For data that are distributed in a normal fashion, the numerical values of the mean and median should be identical.
Both the mean and the median may be used to describe the central properties of moderately skewed data.
For data that are positively skewed (i.e. distributed towards the y-axis), the mean is greater than the median.
39
MEDIAN Question: The weights of 11 tablets removed from a batch for quality control purposes are presented in the table below. Calculate the mean and median values of the tablet weights.
40
MEDIAN
Individual weights of 11 tablets removed from a production batch
Tablet number   Tablet weight (mg)
1               251
2               255
3               250
4               245
5               265
6               260
7               231
8               225
9               250
10              275
11              300
41
MEDIAN
Step 1: Calculation of the mean
$\bar{X} = \frac{225 + 231 + 245 + 250 + 250 + 251 + 255 + 260 + 265 + 275 + 300}{11} = \frac{2807}{11} = 255.2$ mg
Step 2: Calculation of the median
First, arrange the data in order of magnitude: 225, 231, 245, 250, 250, 251, 255, 260, 265, 275 and 300.
The median is defined as the central value, i.e. the value in position 6, which is 251 mg.
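Both steps can be checked with Python's standard `statistics` module. This is a minimal sketch using the eleven ordered weights listed in the solution.

```python
import statistics

# Tablet weights (mg), in order of magnitude as listed in the solution.
weights = [225, 231, 245, 250, 250, 251, 255, 260, 265, 275, 300]

mean_weight = statistics.mean(weights)       # 2807 / 11
median_weight = statistics.median(weights)   # middle (6th) of 11 values

print(f"mean = {mean_weight:.1f} mg, median = {median_weight} mg")
# mean = 255.2 mg, median = 251 mg
```

The mean is slightly higher than the median here because the single heavy tablet (300 mg) pulls the mean upwards.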
42
MODE Mode is defined as most commonly occurring measurement in a set of data. Question: The concentrations of therapeutic agent (mg/ml) in 10 vials of a commercially available parenteral product have been determined using a chromatographic method. Calculate the mode of the observed concentrations.
43
MODE
Concentration of a therapeutic agent in 10 vials of a commercially available product
Vial number   Conc. of therapeutic agent (mg/ml)
1             200
2             205
3             205
4             201
5             199
6             195
7             202
8             205
9             205
10            207
44
MODE
The most frequently occurring value in the above set of data is 205 mg/ml (four occurrences). This value (205 mg/ml) is the mode.
Data may contain more than one mode. If two modes are present in a data set, the data are said to be bimodal.
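The mode can be found with the standard `statistics` module. In this sketch the concentrations for vials 3, 8 and 9 are taken to be 205 mg/ml, which is an assumption implied by the four occurrences of 205 stated in the solution. The second list is a hypothetical data set added only to illustrate the bimodal case.

```python
import statistics

# Concentrations (mg/ml) for vials 1-10; vials 3, 8 and 9 assumed to be 205.
concentrations = [200, 205, 205, 201, 199, 195, 202, 205, 205, 207]

mode_value = statistics.mode(concentrations)
print(mode_value)  # 205

# Hypothetical bimodal data: multimode (Python 3.8+) returns every mode.
bimodal_example = [198, 200, 200, 203, 205, 205]
modes = statistics.multimode(bimodal_example)
print(modes)  # [200, 205]
```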
45
MEASUREMENT OF THE VARIATION
It deals with the spread, variability or dispersion of the data in the distribution, that is, the way the values in the distribution are scattered.
As an illustration, suppose the monthly salaries of 5 workers from each of two companies are as shown in the table below. The mean salary is the same (N5,000) but the distributions are not.
Company A   N4,000   N4,500   N5,000   N5,500   N6,000
Company B   N1,500   N1,750   N2,250   N4,500   N15,000
46
MEASUREMENT OF THE VARIATION
The concept of variability is very useful in inferential statistics The higher the variability of a distribution the less accurate is the estimate to be obtained about the measures of location In statistics, the variability of a distribution is better described in terms of a summary data rather than being described as low or high The measures include range, mean absolute deviation, variance, quartile deviation and standard deviation
47
RANGE
It is simply defined as the difference between the highest value and the lowest value.
Range = Highest value − Lowest value
$\text{Coefficient of range} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Highest value} + \text{Lowest value}}$
The use of the range to describe data variation accurately is limited, because it depends only on the two extreme values and does not truly describe the variation of the entire data set.
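As a short illustration, the range and coefficient of range can be computed for the eleven tablet weights used in the median example. This is a sketch only; the choice of data set is ours.

```python
# Tablet weights (mg) from the median example.
weights = [225, 231, 245, 250, 250, 251, 255, 260, 265, 275, 300]

highest, lowest = max(weights), min(weights)
data_range = highest - lowest                            # 300 - 225
coefficient_of_range = data_range / (highest + lowest)   # 75 / 525

print(f"range = {data_range} mg, coefficient of range = {coefficient_of_range:.3f}")
# range = 75 mg, coefficient of range = 0.143
```

Only the two extreme tablets enter the calculation, which is why the range says little about how the other nine weights are spread.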
48
RANGE The ranges of data sets A and B in the table below are as follows: Data set A: range = 20 Data set B: range = 2 Data Set A Data Set B 10 28 20 29 30 Mean = 30 Mean = 30
49
MEAN DEVIATION
It is a measure of data variation that is calculated as the average deviation from the mean. In mathematical terms, the mean deviation is described as follows:
$MD = \frac{\sum_{j=1}^{N} |X_j - \bar{X}|}{N}$
where $|X_j - \bar{X}|$ is the absolute value of the deviation of each value in the data set from the mean of the data set and N is the number of observations in the data set.
50
MEAN DEVIATION
Question: The weights of six tablets of folic acid taken from a batch that was prepared by the wet granulation method were 100.6, 98.3, 98.9, 95.1, 104.5 and 105.5 mg. Calculate the mean deviation.
Tablet No   Weight of tablet (mg)
1           100.6
2           98.3
3           98.9
4           95.1
5           104.5
6           105.5
51
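MEAN DEVIATION
Solution:
Step 1: Calculate the mean
$\bar{X} = \frac{\sum_{j=1}^{N} X_j}{N} = \frac{100.6 + 98.3 + 98.9 + 95.1 + 104.5 + 105.5}{6} = \frac{602.9}{6} = 100.5$ mg
Step 2: Calculate the mean deviation
$MD = \frac{\sum |X_j - \bar{X}|}{N} = \frac{|100.6 - 100.5| + |98.3 - 100.5| + |98.9 - 100.5| + |95.1 - 100.5| + |104.5 - 100.5| + |105.5 - 100.5|}{6} = \frac{18.3}{6} = 3.1$ mg
Note: No reference is made to the algebraic sign of each deviation.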
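The two steps translate directly into Python. This is a minimal sketch using the six folic acid tablet weights from the question, with the results in mg because the data are tablet weights.

```python
# Weights (mg) of the six folic acid tablets from the question.
weights = [100.6, 98.3, 98.9, 95.1, 104.5, 105.5]

n = len(weights)
mean_weight = sum(weights) / n                                   # step 1
mean_deviation = sum(abs(x - mean_weight) for x in weights) / n  # step 2

print(f"mean = {mean_weight:.1f} mg, mean deviation = {mean_deviation:.2f} mg")
# mean = 100.5 mg, mean deviation = 3.05 mg
```

The value 3.05 mg rounds to the 3.1 mg quoted in the solution.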
52
VARIANCE
Variance is the sum of the squared deviations from the mean divided by the number of observations (N).
The sum of the squared deviations from the mean is called the sum of squares (SS):
$SS = \sum_{j=1}^{N} (X_j - \bar{X})^2$
Variance is thus the mean sum of squares:
$\sigma^2 = \frac{\sum_{j=1}^{N} (X_j - \mu)^2}{N}$
where $\sigma^2$ (sigma squared) is the variance of a population and µ is the population mean.
53
VARIANCE
$s^2 = \frac{\sum_{j=1}^{N} (X_j - \bar{X})^2}{N - 1}$
where $s^2$ is the variance of a sample, $\bar{X}$ is the sample mean and N − 1 is called the degrees of freedom.
The primary reason for the difference between the two equations relates to the relative inaccuracy of the estimation of the population variance from the sample variance.
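The difference between dividing by N and by N − 1 can be seen in a short Python sketch. The blood pressure data from the arithmetic mean example are reused here purely as an illustration, and the standard `statistics` functions are used as a cross-check.

```python
import statistics

# Blood pressure reductions (mmHg) from the arithmetic mean example.
data = [20, 25, 21, 34, 41, 37]

n = len(data)
mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)   # SS

population_variance = sum_of_squares / n        # divide by N
sample_variance = sum_of_squares / (n - 1)      # divide by N - 1

print(f"SS = {sum_of_squares:.1f}")
print(f"population variance = {population_variance:.1f}")
print(f"sample variance = {sample_variance:.1f}")

# Cross-check against the standard library.
assert abs(population_variance - statistics.pvariance(data)) < 1e-9
assert abs(sample_variance - statistics.variance(data)) < 1e-9
```

The sample variance is always the larger of the two, and the gap shrinks as N grows.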
54
STANDARD DEVIATION
SD is a commonly used measure of the dispersion of data. It is defined as the positive square root of the variance and may be written mathematically as follows:
SD of a population: $\sigma = \sqrt{\frac{\sum_{j=1}^{N} (X_j - \mu)^2}{N}}$
SD of a sample: $s = \sqrt{\frac{\sum_{j=1}^{N} (X_j - \bar{X})^2}{N - 1}}$
Most calculators and computers can calculate SDs rapidly.
55
STANDARD DEVIATION (ERROR) OF THE MEAN
The standard deviation of the mean is sometimes referred to as the standard error of the mean (SEM).
SD describes the variability (dispersion) of a set of data around a value, and an estimate of the variability of the data in a population may be derived from it.
SEM is a measure of the variability of a set of mean values, calculated from individual groups of measurements (samples) that have been derived from a population.
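A short sketch of both quantities in Python follows. The slides do not give a formula for the SEM; the sketch assumes the usual estimate from a single sample, SEM = s / √N, and reuses the blood pressure data from the arithmetic mean example for illustration.

```python
import math
import statistics

# Blood pressure reductions (mmHg) from the arithmetic mean example.
data = [20, 25, 21, 34, 41, 37]

sample_sd = statistics.stdev(data)        # uses N - 1 in the denominator
population_sd = statistics.pstdev(data)   # uses N in the denominator

# Assumed formula (not stated in the slides): SEM = s / sqrt(N).
sem = sample_sd / math.sqrt(len(data))

print(f"s = {sample_sd:.2f}, sigma = {population_sd:.2f}, SEM = {sem:.2f}")
```

The SEM is smaller than the SD because a mean of several measurements varies less than the individual measurements do.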
56
COEFFICIENT OF VARIATION
CV is a statistical term that expresses the variability of a set of data.
It is defined as the ratio of the standard deviation (s) to the mean of the data set ($\bar{X}$):
$CV\,(\%) = \frac{s}{\bar{X}} \times 100$
It allows the variation of data sets of differing magnitude to be compared directly.
E.g. if the mean ± SD of two data sets are (A) 2500 ± 125 and (B) 50 ± 35, at first glance one may be deceived into believing that the variation of data B is less than that of A. The CVs show the opposite: 5% for A and 70% for B.
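The comparison of data sets A and B can be worked through in Python. This is a minimal sketch; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=125, mean=2500)
cv_b = coefficient_of_variation(sd=35, mean=50)

print(f"CV of A = {cv_a:.1f}%, CV of B = {cv_b:.1f}%")
# CV of A = 5.0%, CV of B = 70.0%
```

Although B has the smaller SD, its variation relative to its mean is fourteen times that of A.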
57
ACCURACY
Accuracy is defined as the closeness of a measured value to the true value (the value that would be expected in the absence of error).
In pharmaceutical analysis, it is commonplace to describe the accuracy of an analytical method as the closeness of the observed (analysed) and expected values.
Absolute error and relative error may be used to describe the difference between observed and expected values.
58
ABSOLUTE ERROR
It can be calculated using the formula:
$error_{abs} = O - E$
where $error_{abs}$ is the absolute error, O is the observed value (or, alternatively, the observed mean of a set of values) and E is the expected (true) value.
Question: A solution of quinine sulphate has been analysed using three analytical methods, and the results are shown in the table below. Calculate the absolute error for each method.
59
ABSOLUTE ERROR
Concentration of quinine sulphate in a solution, as determined using three analytical methods
Analytical method           Observed value (mg/ml)   Expected value (mg/ml)   Absolute error (mg/ml)
HPLC with UV detection      2.51                     2.50                     +0.01
UV spectroscopy             3.53                     2.50                     +1.03
Fluorescence spectroscopy   2.19                     2.50                     -0.31
60
ABSOLUTE ERROR
Solution:
The most accurate method may be defined as that which possesses the lowest absolute error, and the least accurate method as that which possesses the largest absolute error (in both cases judged by magnitude).
HPLC with UV detection ($error_{abs}$ = 0.01 mg/ml) is the most accurate, while UV spectroscopy ($error_{abs}$ = 1.03 mg/ml) is the least accurate.
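The ranking can be reproduced with a short Python sketch. It assumes the expected concentration of 2.50 mg/ml applies to all three methods, which is what the tabulated absolute errors imply; the names are illustrative.

```python
EXPECTED = 2.50  # mg/ml; assumed to be the same for all three methods

observed = {
    "HPLC with UV detection": 2.51,
    "UV spectroscopy": 3.53,
    "Fluorescence spectroscopy": 2.19,
}

# Absolute error keeps its sign: error_abs = O - E.
abs_errors = {method: value - EXPECTED for method, value in observed.items()}

# Accuracy is ranked on the magnitude of the error, ignoring the sign.
most_accurate = min(abs_errors, key=lambda m: abs(abs_errors[m]))
least_accurate = max(abs_errors, key=lambda m: abs(abs_errors[m]))

for method, error in abs_errors.items():
    print(f"{method}: {error:+.2f} mg/ml")
print("most accurate:", most_accurate)
print("least accurate:", least_accurate)
```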
61
RELATIVE ERROR
The term was developed to overcome the problem with absolute error, which takes no account of the size of the expected value.
It can be calculated using the formula:
$error_{rel} = \frac{|error_{abs}|}{E} = \frac{|O - E|}{E}$ (multiplied by 100 when expressed as a percentage)
In the calculation, the sign of the difference (positive or negative) is ignored.
Greater numerical values of relative error are indicative of decreased accuracy.
The advantage of using relative error can be seen in the next table.
62
RELATIVE ERROR
Analytical method           Observed value (mg/ml)   Expected value (mg/ml)   Absolute error (mg/ml)   Relative error (%)
HPLC with UV detection      2.51                     2.50                     +0.01                    0.40
UV spectroscopy             3.53                     2.50                     +1.03                    41.20
Fluorescence spectroscopy   2.19                     2.50                     -0.31                    12.40
                            0.19                     0.50                     -0.31                    62.00
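The relative errors in the table can be recomputed with a small Python function. This is a sketch only; the expected value of 2.50 mg/ml for the first three rows is inferred from the tabulated absolute errors, and the function name is illustrative.

```python
def relative_error_percent(observed, expected):
    """Relative error as a percentage; the sign of the difference is ignored."""
    return abs(observed - expected) / expected * 100

# (observed, expected) pairs in mg/ml, in the order of the table rows.
rows = [(2.51, 2.50), (3.53, 2.50), (2.19, 2.50), (0.19, 0.50)]

rel_errors = [relative_error_percent(o, e) for o, e in rows]

for (o, e), rel in zip(rows, rel_errors):
    print(f"O = {o:.2f}, E = {e:.2f}: absolute {o - e:+.2f}, relative {rel:.2f}%")
```

The last two rows share the same absolute error (-0.31 mg/ml) but have very different relative errors (12.40% and 62.00%), because the same discrepancy matters far more when the expected value is small.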
63
PRECISION
It is a statistical term that describes the dispersion of a set of measurements.
Unlike accuracy, it provides no indication of the closeness of an observation to a particular expected quantity.
High precision is associated with low dispersion of values around a central value, i.e. a low SD.
64
PRECISION Question In a quality control laboratory the fill volumes of three samples of an antacid suspension have been measured and recorded as shown in the table below. Comment on the accuracy and precision of the volumes of the three samples
65
PRECISION
Fill volumes of selected samples of an antacid formulation
                    Sample A (ml)   Sample B (ml)   Sample C (ml)
                    47              29              28
                    48              39              26
                    49              49              27
                    50              59              30
                    51              69              30
Mean                49.0            49.0            28.2
s                   1.6             15.8            1.8
Error_rel of mean   2.0%            2.0%            43.6%
66
PRECISION
From the table:
The relative errors associated with samples A and B are identical, and these samples are therefore considered to be equally accurate measurements of the true (expected) fill volume.
Conversely, the accuracy of the mean fill volume of sample C is poor (43.6% relative error), and this is considered to be a poor representation of the true fill volume.
67
PRECISION
Considering the SDs associated with the data, we can evaluate the precision of the measurements.
Sample A has a low SD (1.6) and low CV (3.3%), hence low dispersion of the data set around the mean: precise.
Sample C has a low SD (1.8) and low CV (6.4%), hence low dispersion of the data set around the mean: precise.
Sample B has a high SD (15.8) and high CV (32.2%), hence high dispersion of the data set around the mean: imprecise.
A exhibits high accuracy and high precision, B high accuracy and low precision, and C low accuracy and high precision.
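The accuracy and precision figures for the three samples can be recomputed in Python. This is a hedged sketch: the expected fill volume of 50 ml is an assumption inferred from the quoted relative errors (2.0% and 43.6%), and the values missing from the flattened table (49 ml in sample B, the second 30 ml in sample C) are inferred from the stated means and SDs.

```python
import statistics

EXPECTED_FILL_ML = 50.0  # assumption: implied by the quoted relative errors

samples = {
    "A": [47, 48, 49, 50, 51],
    "B": [29, 39, 49, 59, 69],   # 49 inferred
    "C": [28, 26, 27, 30, 30],   # second 30 inferred
}

summary = {}
for name, volumes in samples.items():
    mean = statistics.mean(volumes)
    sd = statistics.stdev(volumes)                         # sample SD (N - 1)
    cv = 100 * sd / mean                                   # precision
    rel_error = 100 * abs(mean - EXPECTED_FILL_ML) / EXPECTED_FILL_ML  # accuracy
    summary[name] = (mean, sd, cv, rel_error)
    print(f"{name}: mean = {mean:.1f} ml, s = {sd:.1f}, "
          f"CV = {cv:.1f}%, relative error = {rel_error:.1f}%")
```

The CVs computed from unrounded SDs differ slightly from those quoted above, which appear to be based on the SDs rounded to one decimal place.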