Published by Cornelius Lamb. Modified over 8 years ago.
1
Basic Statistic for Research Dr. Subash Gopinath School of Bioprocess Engineering, UniMAP
2
Statistics
The science that deals with the development and application of the most appropriate methods for:
- Collection of data
- Presentation of the collected data
- Analysis and interpretation of the results
- Making decisions on the basis of such analysis
Other definitions of "statistics":
- Frequently used to refer to recorded data
- Denotes characteristics calculated for a set of data, e.g. the sample mean
3
Role of statisticians
- To guide the design of an experiment or survey prior to data collection
- To analyze data using proper statistical procedures and techniques
- To present and interpret the results to researchers and other decision makers
4
Descriptive Statistics
Types of descriptive statistics:
- Organize data: tables, graphs
- Summarize data: central tendency, variation
- Duplicates, triplicates, replicates
5
Methods of presentation of data
- Numerical presentation
- Graphical presentation
- Mathematical presentation
Common tools: Excel, Origin, DOE
6
Common calculations
- Design of Experiments (DOE)
- R²
- t-value
- Half maximal inhibitory concentration (IC50)
- Dissociation constant (KD)
- Mean, median, mode
- Standard deviation (SD)
- MIC (minimum inhibitory concentration)
7
Types of data Constant Variables
8
Types of variables
- Quantitative variables: continuous, discrete
- Qualitative variables: nominal, ordinal
9
Distribution of 50 patients at the surgical department of hospital according to their ABO blood groups

Blood group   Frequency   %
A             12          24
B             18          36
AB             5          10
O             15          30
Total         50         100
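The percentage column of a simple frequency table is each frequency divided by the total. A minimal Python sketch, using the blood-group frequencies from this slide (the variable names are illustrative, not from the presentation):

```python
# Frequencies taken from the blood-group slide.
frequencies = {"A": 12, "B": 18, "AB": 5, "O": 15}

total = sum(frequencies.values())

# Relative frequency of each group, expressed as a percentage of the total.
percentages = {group: 100 * count / total for group, count in frequencies.items()}

for group, count in frequencies.items():
    print(f"{group:<6}{count:>4}{percentages[group]:>6.0f}%")
print(f"{'Total':<6}{total:>4}{sum(percentages.values()):>6.0f}%")
```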
10
Complex frequency distribution table
Distribution of 20 lung cancer patients at the chest department of hospital and 40 controls according to smoking

Smoking       Lung cancer cases    Controls        Total
              No.    %             No.    %        No.    %
Smoker        15     75             8     20       23     38.33
Non-smoker     5     25            32     80       37     61.67
Total         20    100            40    100       60    100
11
Visual Data Summaries
Some visual ways to summarize data (one variable at a time):
- Tables
- Graphs: line graph, frequency polygons, histograms, bar charts, pie chart, box plots, scatter plot
12
Line Graph
13
Frequency polygon Frequency polygons are a graphical device for understanding the shapes of distributions
14
Histogram Distribution of 100 cholera patients at (place), in (time) by age
15
Bar chart
16
Pie chart
17
Scatter plot
18
Box plot
19
Graphical Summaries
- Bar graphs: nominal data; no order to horizontal axis
- Histograms: continuous or ordinal data on horizontal axis
- Box plots: continuous data
20
Mathematical presentation
Measures of location:
1- Measures of central tendency
2- Measures of non-central locations (quartiles, percentiles)
Measures of dispersion
21
Measures of central tendency (averages)
Midrange = (smallest observation + largest observation) / 2
Mode: the value which occurs with the greatest frequency, i.e. the most common value
22
Measures of central tendency (cont.)
Median: the observation which lies in the middle of the ordered observations
Arithmetic mean (mean) = sum of all observations / number of observations
23
Standard deviation (SD)
7 7 7 7 7 7: Mean = 7, SD = 0
7 8 7 7 7 6: Mean = 7, SD = 0.63
3 2 7 8 13 9: Mean = 7, SD = 4.05
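The three data sets on this slide share a mean of 7 but differ in spread. The sketch below recomputes them with Python's standard `statistics` module, assuming the slide uses the sample standard deviation (denominator n - 1), which is what `statistics.stdev` computes. The last value is about 4.0497 before rounding.

```python
import statistics

# The three six-value data sets shown on the slide; each has mean 7.
datasets = [
    [7, 7, 7, 7, 7, 7],
    [7, 8, 7, 7, 7, 6],
    [3, 2, 7, 8, 13, 9],
]

for values in datasets:
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD: divides by n - 1
    print(f"mean = {mean}, SD = {sd:.2f}")
```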
24
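Standard error of the mean (SE)
A measure of variability among the means of samples selected from a certain population.
$\mathrm{SE}(\text{Mean}) = \dfrac{S}{\sqrt{n}}$, where $S$ is the sample standard deviation and $n$ is the sample size.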
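As a small illustration, the standard error can be computed by dividing the sample SD by the square root of the sample size. The function name `standard_error` and the choice of data (one of the six-value sets from the SD slide) are assumptions made for this sketch.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Illustrative data reused from the standard deviation slide.
sample = [3, 2, 7, 8, 13, 9]
se = standard_error(sample)
print(f"SE = {se:.2f}")
```

Because n sits under a square root, quadrupling the number of observations only halves the standard error.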
25
P-value
The output of the statistics. Loosely, the chance of rejecting the null hypothesis by coincidence; more precisely, the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true.
For gene expression analysis we can say: the chance that a gene is categorized as differentially expressed by coincidence.
The term "null hypothesis" usually refers to a general statement that there is no relationship between two measured phenomena.
26
The t-test
Assumptions:
1. The observations in the two categories must be independent
2. The observations should be normally distributed
3. The sample size must be 'large' (>30 replicates)
27
Multi-testing?
In a typical microarray analysis we test thousands of genes. If we use a significance level of 0.05 and test 1000 genes, we expect 50 genes to be significant by chance alone: 1000 × 0.05 = 50
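The arithmetic on this slide can be checked with a short simulation. The sketch assumes that every null hypothesis is true, so each p-value behaves like a uniform random number between 0 and 1. The function names and the fixed seed are illustrative choices, not part of the presentation.

```python
import random

def expected_false_positives(n_tests, alpha):
    """Expected number of tests significant by chance when every null is true."""
    return n_tests * alpha

def simulate_false_positives(n_tests, alpha, seed=0):
    """Count chance 'hits', drawing each p-value uniformly from [0, 1)."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))

expected = expected_false_positives(1000, 0.05)
simulated = simulate_false_positives(1000, 0.05)
print(f"expected: {expected:.0f}, simulated: {simulated}")
```

The simulated count varies from run to run with a different seed, but it stays in the neighbourhood of 50.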
28
What's inside the black box ‘statistics’ t-test or ANOVA
29
The t-test Calculate T Lookup T in a table
30
The t-test II
The t-test tests for a difference in means.
[Figure: density curves of the intensity of gene x for wt and mutant]
31
The t-test III
The t statistic is based on the sample mean and variance.
The term "null hypothesis" usually refers to a general statement that there is no relationship between two measured phenomena.
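The slides do not show the formula for t, so the following is only a sketch of one common variant: Student's two-sample t statistic with pooled variance, which assumes both groups have equal variance. The function name `two_sample_t` and the intensity values are made up for illustration.

```python
import math
import statistics

def two_sample_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, n_a + n_b - 2  # statistic and degrees of freedom

# Hypothetical intensities of gene x (made-up numbers, not from the slides).
wt = [5.1, 4.9, 5.3, 5.0, 5.2]
mutant = [6.0, 6.2, 5.8, 6.1, 5.9]
t_stat, dof = two_sample_t(wt, mutant)
print(f"t = {t_stat:.2f}, df = {dof}")
```

The resulting t and degrees of freedom are what would then be looked up in a t table.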
32
ANOVA (ANalysis Of VAriance)
Very similar to the t-test, but can test multiple categories.
Example: is gene x differentially expressed between wt, mutant 1 and mutant 2?
Advantage: it has more 'power' than the t-test.
33
ANOVA II
[Figure: density curves of intensity for each group, marking the variance between groups and the variance within groups]
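One way to make the comparison of between-group and within-group variance concrete is the one-way ANOVA F statistic, the ratio of the two mean squares. The slides do not give this formula, so the sketch below is the textbook version, with a hypothetical function name and made-up intensities.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of individual values around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical intensities of gene x (made-up numbers, not from the slides).
wt = [5.0, 5.2, 4.8]
mutant_1 = [6.0, 6.1, 5.9]
mutant_2 = [5.1, 4.9, 5.0]
f_stat = one_way_anova_f([wt, mutant_1, mutant_2])
print(f"F = {f_stat:.1f}")
```

A large F means the groups differ from each other by more than the scatter inside each group would explain.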
34
Example: Batch to batch variation Within batch variation is lower than the between batch variation
35
Mean
Most commonly called the "average." Add up the values for each case and divide by the total number of cases.
$\bar{Y} = \dfrac{Y_1 + Y_2 + \dots + Y_n}{n} = \dfrac{\sum Y_i}{n}$
36
Mean
Class A: IQs of 13 students
102 115
128 109
131  89
 98 106
140 119
 93  97
110
$\sum Y_i = 1437$, so $\bar{Y}_A = \dfrac{\sum Y_i}{n} = \dfrac{1437}{13} = 110.54$

Class B: IQs of 13 students
127 162
131 103
 96 111
 80 109
 93  87
120 105
109
$\sum Y_i = 1433$, so $\bar{Y}_B = \dfrac{\sum Y_i}{n} = \dfrac{1433}{13} = 110.23$
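The two class means can be reproduced in a few lines of Python. The list names `class_a` and `class_b` are illustrative; the values are the IQ scores from this slide.

```python
# IQ scores of the 13 students in each class, as listed on the slide.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

for name, scores in (("A", class_a), ("B", class_b)):
    print(f"Class {name}: sum = {sum(scores)}, mean = {mean(scores):.2f}")
```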
37
Mean
The mean is the "balance point." Each person's score is like 1 pound placed at the score's position on a see-saw. Below, on a 200 cm see-saw, the mean equals 110, the place on the see-saw where a fulcrum finds balance.
[Figure: see-saw with 1 lb at 93 cm (17 units below the mean), 1 lb at 106 cm (4 units below), and 1 lb at 131 cm (21 units above); fulcrum at 110 cm]
The scale is balanced because 17 + 4 on the left = 21 on the right.
38
Mean
1. Means can be badly affected by outliers (data points with extreme values unlike the rest).
2. Outliers can make the mean a bad measure of central tendency or common experience.
[Figure: income in the U.S., with "All of Us" clustered together, Bill Gates as the outlier, and the mean pulled toward the outlier]
39
Median
The middle value when a variable's values are ranked in order; the point that divides a distribution into two equal halves. When data are listed in order, the median is the point at which 50% of the cases are above and 50% below it. The 50th percentile.
40
Median
Class A: IQs of 13 students, in order
89 93 97 98 102 106 109 110 115 119 128 131 140
Median = 109 (six cases above, six below)
41
Median
If the first student (89) were to drop out of Class A, there would be a new median:
93 97 98 102 106 109 110 115 119 128 131 140
Median = (109 + 110) / 2 = 219 / 2 = 109.5 (six cases above, six below)
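Both cases, an odd and an even number of observations, can be handled by one small function. This is a sketch with an illustrative function name; Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Class A IQ scores from the slides.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
median_all = median(class_a)

# Same class after the student with the lowest score (89) drops out.
class_a_without_lowest = [score for score in class_a if score != 89]
median_after_dropout = median(class_a_without_lowest)

print(median_all, median_after_dropout)
```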
42
Median
The median is largely unaffected by outliers, which makes it a better measure of central tendency than the mean when data are skewed: it better describes the "typical person."
[Figure: "All of Us" clustered together, with Bill Gates as the outlier]
43
Median
If the recorded values for a variable form a symmetric distribution, the median and mean are identical. In skewed data, the mean lies further toward the skew than the median.
[Figure: positions of the mean and median in a symmetric distribution and in a skewed distribution]
44
Median
The middle score or measurement in a set of ranked scores or measurements; the point that divides a distribution into two equal halves. When data are listed in order, the median is the point at which 50% of the cases are above and 50% below. The 50th percentile.
45
Mode
The most common data point is called the mode.
The combined IQ scores for Classes A & B:
80 87 89 93 93 96 97 98 102 103 105 106 109 109 109 110 111 115 119 120 127 128 131 131 140 162
Here the mode is 109, which occurs three times. It is possible to have more than one mode!
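A short check of the mode with Python's standard library. `statistics.multimode` (Python 3.8 or later) returns every value tied for the highest count, which also covers data with more than one mode. The second list is a made-up example, not from the slides.

```python
import statistics

# Combined IQ scores for Classes A and B, as listed on the slide.
combined = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
            109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

modes = statistics.multimode(combined)
print(modes)

# Hypothetical data with two modes, to illustrate a tie.
two_modes = statistics.multimode([1, 1, 2, 3, 3])
print(two_modes)
```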
46
Mode
It may not be at the center of a distribution. The data distribution on the right is "bimodal" (even statistics can be open-minded).
47
Mode
1. It may give you the most likely experience rather than the "typical" or "central" experience.
2. In symmetric distributions with a single peak, the mean, median, and mode are the same.
3. In skewed data, the mean and median lie further toward the skew than the mode.
[Figure: positions of the mean, median, and mode in a symmetric distribution and in a skewed distribution]
48
Descriptive Statistics
Summarizing data:
- Central tendency (or groups' "middle values"): mean, median, mode
- Variation (or summary of differences within groups): range, interquartile range, variance, standard deviation
49
Range
The spread, or the distance, between the lowest and highest values of a variable. To get the range for a variable, you subtract its lowest value from its highest value.

Class A: IQs of 13 students
102 115
128 109
131  89
 98 106
140 119
 93  97
110
Class A range = 140 - 89 = 51

Class B: IQs of 13 students
127 162
131 103
 96 111
 80 109
 93  87
120 105
109
Class B range = 162 - 80 = 82
50
Interquartile Range
A quartile is the value that marks one of the divisions that breaks a series of values into four equal parts.
The median is a quartile and divides the cases in half.
The 25th percentile is a quartile that divides the first ¼ of cases from the latter ¾.
The 75th percentile is a quartile that divides the first ¾ of cases from the latter ¼.
The interquartile range is the distance or range between the 25th percentile and the 75th percentile. Below, what is the interquartile range?
[Figure: axis from 0 to 1000 marked at 250, 500, and 750, with 25% of cases in each segment]
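As a sketch, the range and interquartile range of the Class A IQ scores can be computed with the standard library. Several conventions exist for interpolating quartiles in small samples; this example assumes the default "exclusive" method of `statistics.quantiles` (Python 3.8 or later), so other software may report slightly different quartiles.

```python
import statistics

# Class A IQ scores from the slides.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

# Range: highest value minus lowest value.
value_range = max(class_a) - min(class_a)

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(class_a, n=4)
iqr = q3 - q1

print(f"range = {value_range}")
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Unlike the range, the interquartile range ignores the lowest and highest quarters of the data, so a single extreme score has little effect on it.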
51
Variance
A measure of the spread of the recorded values on a variable. A measure of dispersion.
The larger the variance, the further the individual cases are from the mean.
The smaller the variance, the closer the individual scores are to the mean.
52
Standard Deviation
To convert variance into something of meaning, let's create standard deviation. The square root of the variance gives, roughly, the typical deviation of the observations from the mean, in the same units as the data.
$\text{s.d.} = \sqrt{\dfrac{\sum (Y_i - \bar{Y})^2}{n - 1}}$
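The formula can be followed step by step in code: take each deviation from the mean, square it, sum the squares, divide by n - 1, then take the square root. The function names below are illustrative; the data are the two classes' IQ scores from the earlier slides.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

for name, scores in (("A", class_a), ("B", class_b)):
    print(f"Class {name}: variance = {sample_variance(scores):.2f}, "
          f"s.d. = {sample_sd(scores):.2f}")
```

The two classes have nearly the same mean, yet Class B's scores are spread more widely, which is the kind of difference the standard deviation captures.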
53
R² value
54
Equilibrium constant
55
IC50 or EC50
- Half maximal effective concentration (EC50)
- Half maximal inhibitory concentration (IC50)
56
Other considerations
- Symbol
- Molar concentrations
- Physical measurements
- Size of molecules
- Units