Basic Statistic for Research Dr. Subash Gopinath School of Bioprocess Engineering, UniMAP.

1 Basic Statistic for Research Dr. Subash Gopinath School of Bioprocess Engineering, UniMAP

2 Statistics: the science which deals with the development and application of the most appropriate methods for: the collection of data; the presentation of the collected data; the analysis and interpretation of the results; making decisions on the basis of such analysis. Other definitions of “statistics”: the term is frequently used to refer to recorded data; it also denotes characteristics calculated for a set of data, e.g. the sample mean.

3 Role of statisticians: to guide the design of an experiment or survey prior to data collection; to analyze data using proper statistical procedures and techniques; to present and interpret the results to researchers and other decision makers.

4 Descriptive Statistics. Types of descriptive statistics: organize data (tables, graphs); summarize data (central tendency, variation). Replicates: duplicates, triplicates.

5 Methods of presentation of data: numerical presentation; graphical presentation; mathematical presentation. Common tools: Excel, Origin, DOE.

6 Common calculations: Design of Experiments (DOE); R²; t-value; half maximal inhibitory concentration (IC50); dissociation constant (KD); mean, median, mode; standard deviation (SD); minimal inhibitory concentration (MIC).

7 Types of data: constants and variables

8 Types of variables: quantitative variables (quantitative continuous, quantitative discrete); qualitative variables (qualitative nominal, qualitative ordinal)

9 Distribution of 50 patients at the surgical department of hospital according to their ABO blood groups
Blood group   Frequency   %
A             12          24
B             18          36
AB            5           10
O             15          30
Total         50          100
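The percentage column of a simple frequency table is each frequency divided by the total, times 100. A minimal sketch using the blood-group counts from slide 9 (the variable names are illustrative, not from the slides):

```python
# Frequencies taken from the ABO blood-group table on slide 9.
frequencies = {"A": 12, "B": 18, "AB": 5, "O": 15}

total = sum(frequencies.values())

# Percentage of each group = frequency / total * 100.
percentages = {group: 100 * count / total for group, count in frequencies.items()}

for group, pct in percentages.items():
    print(f"{group}: {frequencies[group]} patients, {pct:.0f}%")
print(f"Total: {total} patients, {sum(percentages.values()):.0f}%")
```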

10 Complex frequency distribution table. Distribution of 20 lung cancer patients at the chest department of hospital and 40 controls according to smoking
Smoking      Cases No.   Cases %   Controls No.   Controls %   Total No.   Total %
Smoker       15          75        8              20           23          38.33
Non smoker   5           25        32             80           37          61.67
Total        20          100       40             100          60          100

11 Visual Data Summaries. Some visual ways to summarize data (one variable at a time): tables; graphs (line graph, frequency polygons, histograms, bar charts, pie chart, box plots, scatter plot)

12 Line Graph

13 Frequency polygon Frequency polygons are a graphical device for understanding the shapes of distributions

14 Histogram Distribution of 100 cholera patients at (place), in (time) by age

15 Bar chart

16 Pie chart

17 Scatter plot

18 Box plot

19 Graphical Summaries. Bar graphs: nominal data, no order to horizontal axis. Histograms: continuous or ordinal data on horizontal axis. Box plots: continuous data.

20 Mathematical presentation. Measures of location: 1- measures of central tendency; 2- measures of non-central locations (quartiles, percentiles). Measures of dispersion.

21 Measures of central tendency (averages). Midrange: $(\text{smallest observation} + \text{largest observation}) / 2$. Mode: the value which occurs with the greatest frequency, i.e. the most common value.

22 Measures of central tendency (cont.). Median: the observation which lies in the middle of the ordered observations. Arithmetic mean (mean): $\text{sum of all observations} / \text{number of observations}$.

23 Standard deviation (SD)
7, 7, 7, 7, 7, 7: Mean = 7, SD = 0
7, 8, 7, 7, 7, 6: Mean = 7, SD = 0.63
3, 2, 7, 8, 13, 9: Mean = 7, SD = 4.05
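The three examples on slide 23 can be checked with Python's standard library. This sketch assumes the eighteen numbers in the transcript form three samples of six values each, in the order listed, and uses the sample SD (dividing by n - 1):

```python
import statistics

# Assumed grouping of the slide's numbers into three samples of six.
datasets = [
    [7, 7, 7, 7, 7, 7],
    [7, 8, 7, 7, 7, 6],
    [3, 2, 7, 8, 13, 9],
]

results = []
for values in datasets:
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    results.append((mean, sd))
    print(f"{values}: mean = {mean}, SD = {sd:.2f}")
```

All three samples share the same mean, so the SD is what distinguishes how widely the observations are scattered.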

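24 Standard error of mean (SE): a measure of variability among means of samples selected from a certain population. $\mathrm{SE}(\text{mean}) = S / \sqrt{n}$, where $S$ is the sample standard deviation and $n$ the sample size.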
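As a rough illustration of the standard error, the sketch below divides the sample SD by the square root of the sample size. The function name and the six-value sample are illustrative assumptions, not part of the slides:

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

sample = [3, 2, 7, 8, 13, 9]  # hypothetical sample of six replicates
se = standard_error(sample)
print(f"SD = {statistics.stdev(sample):.2f}, SE = {se:.2f}")
```

Because of the division by sqrt(n), quadrupling the number of replicates halves the standard error for the same SD.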

25 P-value: the output of the statistics. The probability, if the null hypothesis is true, of obtaining a result at least as extreme as the one observed; loosely, the chance of rejecting the null hypothesis by coincidence. For gene expression analysis we can say: the chance that a gene is categorized as differentially expressed by coincidence. The term "null hypothesis" usually refers to a general statement that there is no relationship between two measured phenomena.

26 The t-test: assumptions. 1. The observations in the two categories must be independent. 2. The observations should be normally distributed. 3. The sample size must be ‘large’ (>30 replicates).

27 Multi-testing? In a typical microarray analysis we test thousands of genes. If we use a significance level of 0.05 and test 1000 genes, we expect 50 genes to be significant by chance even if none is truly differentially expressed: 1000 x 0.05 = 50
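The arithmetic on slide 27 can be illustrated with a small simulation. It assumes that no gene is truly differentially expressed, in which case p-values are uniformly distributed between 0 and 1. The seed and variable names are arbitrary choices for this sketch:

```python
import random

n_genes = 1000
alpha = 0.05

# Expected number of genes called significant purely by chance.
expected_false_positives = n_genes * alpha
print(f"Expected by chance: {expected_false_positives:.0f}")

# Simulation under the null hypothesis: p-values are uniform on [0, 1].
random.seed(0)  # arbitrary seed so the run is repeatable
p_values = [random.random() for _ in range(n_genes)]
significant = sum(p < alpha for p in p_values)
print(f"Significant in this simulated experiment: {significant}")
```

The simulated count varies from run to run around the expected value, which is why a list of "significant" genes from an uncorrected screen needs careful interpretation.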

28 What's inside the black box ‘statistics’? t-test or ANOVA

29 The t-test: calculate T; look up T in a table

30 The t-test II. The t-test tests for a difference in means (μ). [Figure: density against intensity of gene x, with one distribution for wt (mean μ_wt) and one for mutant (mean μ_mut)]

31 The t-test III. The t statistic is based on the sample mean and variance. The term "null hypothesis" usually refers to a general statement that there is no relationship between two measured phenomena.
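The formula shown on slide 31 did not survive in the transcript. As one common possibility, the sketch below computes the two-sample t statistic with pooled variance, which assumes both groups have equal variance. The function name and the intensity values are hypothetical:

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error

wt = [1, 2, 3, 4, 5]       # hypothetical intensities of gene x in wt
mutant = [2, 3, 4, 5, 6]   # hypothetical intensities of gene x in mutant

t = pooled_t_statistic(wt, mutant)
df = len(wt) + len(mutant) - 2  # degrees of freedom for the table lookup
print(f"t = {t:.2f}, df = {df}")
```

The resulting t and degrees of freedom are what one would look up in a t table to obtain the p-value.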

32 ANOVA (ANalysis Of VAriance). Very similar to the t-test, but can test multiple categories. Ex: is gene x differentially expressed between wt, mutant 1 and mutant 2? Advantage: it has more ‘power’ than the t-test.

33 ANOVA II. [Figure: density against intensity, showing variance between groups and variance within groups]
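One-way ANOVA compares the variance between groups with the variance within groups through the F ratio. A minimal sketch, with a hypothetical function name and made-up intensities for three groups:

```python
import statistics

def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical intensities of gene x in wt, mutant 1 and mutant 2.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_anova_f(groups)
print(f"F = {f_value:.2f}")
```

A large F means the group means differ by more than the scatter within each group would explain.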

34 Example: batch-to-batch variation. Within-batch variation is lower than between-batch variation.

35 Mean: most commonly called the “average.” Add up the values for each case and divide by the total number of cases. Y-bar: $\bar{Y} = (Y_1 + Y_2 + \dots + Y_n) / n = \sum Y_i / n$

36 Mean
Class A--IQs of 13 Students: 102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110
$\sum Y_i = 1437$; Y-bar A: $\bar{Y}_A = \sum Y_i / n = 1437 / 13 = 110.54$
Class B--IQs of 13 Students: 127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109
$\sum Y_i = 1433$; Y-bar B: $\bar{Y}_B = \sum Y_i / n = 1433 / 13 = 110.23$
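The two class means on slide 36 can be reproduced directly. The sketch reads the paired IQ columns from the transcript row by row (the order does not affect the result):

```python
# IQ scores from slide 36.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

def mean(values):
    """Arithmetic mean: sum of all observations / number of observations."""
    return sum(values) / len(values)

mean_a = mean(class_a)
mean_b = mean(class_b)
print(f"Class A: sum = {sum(class_a)}, n = {len(class_a)}, mean = {mean_a:.2f}")
print(f"Class B: sum = {sum(class_b)}, n = {len(class_b)}, mean = {mean_b:.2f}")
```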

37 Mean: the mean is the “balance point.” Each person’s score is like 1 pound placed at the score’s position on a see-saw. On a 200 cm see-saw with 1 lb weights at 93 cm, 106 cm and 131 cm, the mean equals 110, the place on the see-saw where a fulcrum finds balance: 93 cm is 17 units below the mean, 106 cm is 4 units below, and 131 cm is 21 units above. The scale is balanced because 17 + 4 on the left = 21 on the right.
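The balance-point idea amounts to saying that deviations from the mean sum to zero. A short sketch with the three positions from slide 37 (variable names are illustrative):

```python
positions = [93, 106, 131]  # where the three 1 lb weights sit, in cm

fulcrum = sum(positions) / len(positions)

# Signed distance of each weight from the fulcrum.
deviations = [p - fulcrum for p in positions]

left = sum(-d for d in deviations if d < 0)   # total distance below the mean
right = sum(d for d in deviations if d > 0)   # total distance above the mean
print(f"fulcrum = {fulcrum}, left = {left}, right = {right}")
```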

38 Mean 1. Means can be badly affected by outliers (data points with extreme values unlike the rest). 2. Outliers can make the mean a bad measure of central tendency or common experience. [Figure: income in the U.S., with labels “All of Us”, “Bill Gates” (outlier) and “Mean”]

39 Median: the middle value when a variable’s values are ranked in order; the point that divides a distribution into two equal halves. When data are listed in order, the median is the point at which 50% of the cases are above and 50% below it. The 50th percentile.

40 Median
Class A--IQs of 13 Students (ranked): 89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140
Median = 109 (six cases above, six below)

41 Median: if the first student were to drop out of Class A, there would be a new median.
Remaining 12 scores (89 removed): 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140
Median = (109 + 110) / 2 = 219 / 2 = 109.5 (six cases above, six below)
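Both medians can be checked with the standard library, which takes the middle value for an odd number of cases and averages the two middle values for an even number. The sketch uses the ranked Class A scores:

```python
import statistics

class_a = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

# Odd number of cases: the median is the single middle value.
median_all = statistics.median(class_a)

# Drop the first student (score 89): even number of cases,
# so the median is the average of the two middle values.
median_after_dropout = statistics.median(class_a[1:])

print(median_all, median_after_dropout)
```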

42 Median: the median is largely unaffected by outliers, making it a better measure of central tendency than the mean when data are skewed, because it better describes the “typical person.” [Figure: “All of Us” and “Bill Gates” (outlier)]

43 Median: if the recorded values for a variable form a symmetric distribution, the median and mean are identical. In skewed data, the mean lies further toward the skew than the median. [Figure: mean and median marked on a symmetric distribution and on a skewed distribution]

44 Median: the middle score or measurement in a set of ranked scores or measurements; the point that divides a distribution into two equal halves. When data are listed in order, the median is the point at which 50% of the cases are above and 50% below. The 50th percentile.

45 Mode: the most common data point is called the mode.
The combined IQ scores for Classes A & B: 80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109, 109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162
The mode is 109, which occurs three times. It is possible to have more than one mode!
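The mode of the combined scores can be found with `statistics.multimode`, which returns every value tied for the highest count. The second, two-mode list is a made-up example added only to show the "more than one mode" case:

```python
import statistics

combined = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
            109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

modes = statistics.multimode(combined)
print(modes)

# Hypothetical data with two equally common values.
two_modes = statistics.multimode([1, 1, 2, 2, 3])
print(two_modes)
```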

46 Mode: it may not be at the center of a distribution. The data distribution on the right is “bimodal” (even statistics can be open-minded)

47 Mode 1. It may give you the most likely experience rather than the “typical” or “central” experience. 2. In symmetric distributions, the mean, median, and mode are the same. 3. In skewed data, the mean and median lie further toward the skew than the mode. [Figure: mean, median and mode marked on a symmetric distribution and on a skewed distribution]

48 Descriptive Statistics. Summarizing data: central tendency (or groups’ “middle values”): mean, median, mode; variation (or summary of differences within groups): range, interquartile range, variance, standard deviation.

49 Range: the spread, or the distance, between the lowest and highest values of a variable. To get the range for a variable, you subtract its lowest value from its highest value.
Class A--IQs of 13 Students: 102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110
Class A Range = 140 - 89 = 51
Class B--IQs of 13 Students: 127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109
Class B Range = 162 - 80 = 82
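The two ranges follow directly from the minimum and maximum of each class. A minimal sketch (the helper name `value_range` is illustrative):

```python
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

range_a = value_range(class_a)
range_b = value_range(class_b)
print(f"Class A range = {range_a}, Class B range = {range_b}")
```

The two classes have nearly the same mean, yet Class B has a much wider range, which is why a measure of variation is reported alongside the average.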

50 Interquartile Range: a quartile is a value that marks one of the divisions that break a series of values into four equal parts. The median is a quartile and divides the cases in half. The 25th percentile is a quartile that divides the first ¼ of cases from the latter ¾. The 75th percentile is a quartile that divides the first ¾ of cases from the latter ¼. The interquartile range is the distance or range between the 25th percentile and the 75th percentile. Below, what is the interquartile range? [Figure: axis marked 0, 250, 500, 750, 1000, with 25% of cases in each segment]
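As an illustration, the quartiles of the Class A IQ scores can be computed with `statistics.quantiles`. Several conventions exist for interpolating quartiles in small samples; this sketch uses the function's default ("exclusive") method, so other software may report slightly different values:

```python
import statistics

class_a = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(class_a, n=4)

iqr = q3 - q1  # distance between the 25th and 75th percentiles
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```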

51 Variance: a measure of the spread of the recorded values on a variable; a measure of dispersion. The larger the variance, the further the individual cases are from the mean. The smaller the variance, the closer the individual scores are to the mean.

52 Standard Deviation: to convert variance into something of meaning, let’s create standard deviation. The square root of the variance gives the typical deviation of the observations from the mean. $\text{s.d.} = \sqrt{\dfrac{\sum (Y_i - \bar{Y})^2}{n - 1}}$, where $\bar{Y}$ is Y-bar, the mean.
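The variance and standard deviation can be written out step by step. This sketch applies the n - 1 formula to the Class A IQ scores; the function names are illustrative:

```python
import math

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

variance_a = sample_variance(class_a)
sd_a = sample_sd(class_a)
print(f"variance = {variance_a:.2f}, s.d. = {sd_a:.2f}")
```

Unlike the variance, the standard deviation is expressed in the same units as the original scores, which makes it easier to interpret next to the mean.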

53 R² value

54 Equilibrium constant

55 IC50 or EC50: half maximal effective concentration (EC50); half maximal inhibitory concentration (IC50)

56 Other considerations: symbols; molar concentrations; physical measurements; size of molecules; units

