Analysis and Empirical Results: Frequency Distribution, Basic Statistical Tools, Software Used for Analysis
What is Statistics? Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. A statistic is a single measure (number) used to summarize a sample data set. For example, the average height of students in this class. 5/13/2018 DR. MADHUKAR DALVI
Overview of Statistics: Describing Data (Visual Displays, Numerical Summaries); Making Inferences from Samples (Estimating Parameters, Testing Hypotheses).
Internet Usage Data

| Respondent Number | Sex | Familiarity | Internet Usage | Attitude Toward Internet | Attitude Toward Technology | Usage of Internet: Shopping | Usage of Internet: Banking |
|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 7.00 | 14.00 | 7.00 | 6.00 | 1.00 | 1.00 |
| 2 | 2.00 | 2.00 | 2.00 | 3.00 | 3.00 | 2.00 | 2.00 |
| 3 | 2.00 | 3.00 | 3.00 | 4.00 | 3.00 | 1.00 | 2.00 |
| 4 | 2.00 | 3.00 | 3.00 | 7.00 | 5.00 | 1.00 | 2.00 |
| 5 | 1.00 | 7.00 | 13.00 | 7.00 | 7.00 | 1.00 | 1.00 |
| 6 | 2.00 | 4.00 | 6.00 | 5.00 | 4.00 | 1.00 | 2.00 |
| 7 | 2.00 | 2.00 | 2.00 | 4.00 | 5.00 | 2.00 | 2.00 |
| 8 | 2.00 | 3.00 | 6.00 | 5.00 | 4.00 | 2.00 | 2.00 |
| 9 | 2.00 | 3.00 | 6.00 | 6.00 | 4.00 | 1.00 | 2.00 |
| 10 | 1.00 | 9.00 | 15.00 | 7.00 | 6.00 | 1.00 | 2.00 |
| 11 | 2.00 | 4.00 | 3.00 | 4.00 | 3.00 | 2.00 | 2.00 |
| 12 | 2.00 | 5.00 | 4.00 | 6.00 | 4.00 | 2.00 | 2.00 |
| 13 | 1.00 | 6.00 | 9.00 | 6.00 | 5.00 | 2.00 | 1.00 |
| 14 | 1.00 | 6.00 | 8.00 | 3.00 | 2.00 | 2.00 | 2.00 |
| 15 | 1.00 | 6.00 | 5.00 | 5.00 | 4.00 | 1.00 | 2.00 |
| 16 | 2.00 | 4.00 | 3.00 | 4.00 | 3.00 | 2.00 | 2.00 |
| 17 | 1.00 | 6.00 | 9.00 | 5.00 | 3.00 | 1.00 | 1.00 |
| 18 | 1.00 | 4.00 | 4.00 | 5.00 | 4.00 | 1.00 | 2.00 |
| 19 | 1.00 | 7.00 | 14.00 | 6.00 | 6.00 | 1.00 | 1.00 |
| 20 | 2.00 | 6.00 | 6.00 | 6.00 | 4.00 | 2.00 | 2.00 |
| 21 | 1.00 | 6.00 | 9.00 | 4.00 | 2.00 | 2.00 | 2.00 |
| 22 | 1.00 | 5.00 | 5.00 | 5.00 | 4.00 | 2.00 | 1.00 |
| 23 | 2.00 | 3.00 | 2.00 | 4.00 | 2.00 | 2.00 | 2.00 |
| 24 | 1.00 | 7.00 | 15.00 | 6.00 | 6.00 | 1.00 | 1.00 |
| 25 | 2.00 | 6.00 | 6.00 | 5.00 | 3.00 | 1.00 | 2.00 |
| 26 | 1.00 | 6.00 | 13.00 | 6.00 | 6.00 | 1.00 | 1.00 |
| 27 | 2.00 | 5.00 | 4.00 | 5.00 | 5.00 | 1.00 | 1.00 |
| 28 | 2.00 | 4.00 | 2.00 | 3.00 | 2.00 | 2.00 | 2.00 |
| 29 | 1.00 | 4.00 | 4.00 | 5.00 | 3.00 | 1.00 | 2.00 |
| 30 | 1.00 | 3.00 | 3.00 | 7.00 | 5.00 | 1.00 | 2.00 |
Frequency Distribution: In a frequency distribution, one variable is considered at a time. A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.
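The procedure just described (counts, percentages, cumulative percentages for one variable) can be sketched in a few lines of Python. This is a minimal illustration, not the software used in these slides: `FAMILIARITY` is copied from the Familiarity column of the Internet Usage Data table, and `frequency_distribution` is a hypothetical helper. Respondent 10's rating of 9 lies outside the 1 to 7 range of the other ratings and may be a missing-value code; this sketch simply keeps it as an ordinary value.

```python
from collections import Counter

# Familiarity ratings of respondents 1-30, copied from the Internet Usage Data table.
FAMILIARITY = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]


def frequency_distribution(values):
    """Return (value, count, percent, cumulative percent) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows


if __name__ == "__main__":
    print(f"{'Value':>5} {'Freq':>5} {'Pct':>7} {'Cum%':>7}")
    for value, count, pct, cum in frequency_distribution(FAMILIARITY):
        print(f"{value:>5} {count:>5} {pct:>7.1f} {cum:>7.1f}")
```

The cumulative column ends at 100 (up to floating-point rounding), which is a quick check that every observation was counted.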
Frequency Distribution of Familiarity with the Internet (Table 15.2)
Frequency Histogram (figure: Frequency on the vertical axis against Familiarity, 2 to 7, on the horizontal axis)
Obtaining a histogram in Excel
Frequency Distributions and Histograms: Excel Histograms. Specify a range of cells containing the bin limits, or accept Excel’s default.
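Bin limits in Excel are upper limits: as Microsoft documents its FREQUENCY function and Histogram tool, a value is counted in a bin if it is greater than the previous limit and less than or equal to the current one, and values above the last limit go into a "More" bin. The Python sketch below reproduces that rule as we understand it; `excel_style_histogram` is a hypothetical helper, not an Excel API, and `FAMILIARITY` is copied from the data table.

```python
import bisect

# Familiarity ratings of respondents 1-30, copied from the Internet Usage Data table.
FAMILIARITY = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]


def excel_style_histogram(values, bin_limits):
    """Count values into bins defined by upper limits.

    A value x lands in bin i when bin_limits[i-1] < x <= bin_limits[i];
    values above the last limit go into a trailing "More" bin.
    """
    limits = sorted(bin_limits)
    counts = [0] * (len(limits) + 1)
    for x in values:
        # bisect_left returns the index of the first limit >= x, i.e. x's bin.
        counts[bisect.bisect_left(limits, x)] += 1
    labels = [str(limit) for limit in limits] + ["More"]
    return list(zip(labels, counts))


if __name__ == "__main__":
    for label, count in excel_style_histogram(FAMILIARITY, [1, 2, 3, 4, 5, 6, 7]):
        print(f"{label:>5} {'#' * count}")
```

With bin limits 1 to 7, respondent 10's rating of 9 falls into the "More" bin.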
Using MegaStat: In MegaStat, you can specify the interval width and the lower limit of the first interval, or accept the defaults (INTERNETA USAGE DATA.xls).
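Specifying an interval width and a lower limit amounts to building equal-width intervals. A minimal Python sketch, under a stated assumption: the boundary convention MegaStat uses is not given here, so the code assumes each interval includes its lower limit and excludes its upper limit. `equal_width_intervals` is a hypothetical helper, and `USAGE` is copied from the Internet Usage column of the data table.

```python
# Internet Usage values of respondents 1-30, copied from the Internet Usage Data table.
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def equal_width_intervals(values, lower_limit, width):
    """Count values in consecutive intervals of equal width starting at lower_limit.

    ASSUMPTION: each interval is [lower, lower + width), i.e. it includes its
    lower limit and excludes its upper limit.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if min(values) < lower_limit:
        raise ValueError("lower_limit must not exceed the smallest value")
    indices = [int((x - lower_limit) // width) for x in values]
    counts = [0] * (max(indices) + 1)
    for k in indices:
        counts[k] += 1
    return [(lower_limit + k * width, lower_limit + (k + 1) * width, count)
            for k, count in enumerate(counts)]


if __name__ == "__main__":
    for low, high, count in equal_width_intervals(USAGE, 0, 5):
        print(f"{low:>3} to <{high:<3} {count:>3}")
```

Changing `width` or `lower_limit` regroups the same data, which is why the histogram's shape can depend on these settings.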
Statistics Associated with Frequency Distribution: Measures of Location. The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by $\bar{X} = \sum_{i=1}^{n} X_i / n$, where $X_i$ = observed values of the variable $X$ and $n$ = number of observations (sample size). The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.
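A short Python sketch of the mean formula and of the mode, using columns copied from the data table. `sample_mean` is a hypothetical helper written to mirror $\bar{X} = \sum X_i / n$; `statistics.multimode` (Python 3.8 and later) is from the standard library and returns every most-frequent value, which shows that a distribution can have more than one mode.

```python
from statistics import multimode

# Columns of the Internet Usage Data table (respondents 1-30).
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]
FAMILIARITY = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]


def sample_mean(xs):
    """X-bar = (sum of X_i for i = 1..n) / n."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)


if __name__ == "__main__":
    print(sample_mean(USAGE))      # 6.6
    print(multimode(FAMILIARITY))  # [6]
    print(multimode(USAGE))        # [3, 6]: two values tie for most frequent
```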
Statistics Associated with Frequency Distribution: Measures of Location. The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is usually taken as the midpoint between the two middle values, found by adding the two middle values and dividing their sum by 2. The median is the 50th percentile.
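The median rule above, including the even-count case, as a minimal Python sketch; `median` here is a hypothetical helper (the standard library's `statistics.median` applies the same rule) and `USAGE` is copied from the data table.

```python
# Internet Usage values of respondents 1-30, copied from the Internet Usage Data table.
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def median(xs):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    if not xs:
        raise ValueError("need at least one observation")
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


if __name__ == "__main__":
    # n = 30 is even, so the 15th and 16th sorted values (5 and 6) are averaged.
    print(median(USAGE))  # 5.5
```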
Statistics Associated with Frequency Distribution: Measures of Variability. The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample: Range = $X_{\text{largest}} - X_{\text{smallest}}$. The interquartile range is the difference between the 75th and 25th percentiles. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p% of the data points below it and (100 - p)% above it.
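Range and interquartile range in Python, as a sketch. Several conventions exist for computing percentiles from a sample; this code assumes linear interpolation between order statistics, with the minimum treated as the 0th and the maximum as the 100th percentile, which is what `statistics.quantiles(..., method="inclusive")` computes. Other software may use a different convention and report slightly different quartiles. `value_range` and `interquartile_range` are hypothetical helpers.

```python
from statistics import quantiles

# Internet Usage values of respondents 1-30, copied from the Internet Usage Data table.
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def value_range(xs):
    """Range = X_largest - X_smallest."""
    return max(xs) - min(xs)


def interquartile_range(xs):
    """75th percentile minus 25th percentile.

    ASSUMPTION: percentiles by linear interpolation (method="inclusive").
    """
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    print(value_range(USAGE))                               # 13
    print(quantiles(USAGE, n=4, method="inclusive"))        # [3.0, 5.5, 9.0]
    print(interquartile_range(USAGE))                       # 6.0
```

The middle quartile equals the median (5.5), consistent with the median being the 50th percentile.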
Statistics Associated with Frequency Distribution: Measures of Variability. The variance is the mean squared deviation from the mean; for a sample, the sum of squared deviations is divided by $n - 1$ rather than $n$. The variance can never be negative. The standard deviation is the square root of the variance: $s_x = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)}$. The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage, $CV = 100\, s_x / \bar{X}$, and is a unitless measure of relative variability.
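The sample variance (with the $n - 1$ divisor), standard deviation, and coefficient of variation, written out as a minimal Python sketch. The helper names are hypothetical; `USAGE` is copied from the data table.

```python
import math

# Internet Usage values of respondents 1-30, copied from the Internet Usage Data table.
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """s_x: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


def coefficient_of_variation(xs):
    """CV = 100 * s_x / X-bar, in percent (undefined when the mean is 0)."""
    x_bar = sum(xs) / len(xs)
    if x_bar == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * sample_std(xs) / x_bar


if __name__ == "__main__":
    print(f"variance = {sample_variance(USAGE):.4f}")
    print(f"std dev  = {sample_std(USAGE):.4f}")
    print(f"CV       = {coefficient_of_variation(USAGE):.2f}%")
```

Because every squared deviation is non-negative, the variance can never be negative, matching the statement above.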
Descriptive Statistics in Excel: Go to Tools | Data Analysis and select Descriptive Statistics.
Highlight the data range, specify a cell for the upper-left corner of the output range, check Summary Statistics and click OK.
Here is the resulting analysis.
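Excel's Summary Statistics output includes, among other rows, the mean, standard error, median, mode, standard deviation, sample variance, range, minimum, maximum, sum, and count. A minimal Python sketch that computes those rows with the standard `statistics` module (Python 3.8 or later), useful for cross-checking the spreadsheet. `summary_statistics` is a hypothetical helper; kurtosis and skewness are left out to keep it short. When several values tie for most frequent, `statistics.mode` returns the one that appears first in the data; whether that always matches Excel's choice is an assumption.

```python
import math
import statistics

# Internet Usage values of respondents 1-30, copied from the Internet Usage Data table.
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def summary_statistics(xs):
    """Return a dict of summary rows similar to Excel's Summary Statistics."""
    n = len(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
    return {
        "Mean": statistics.fmean(xs),
        "Standard Error": sd / math.sqrt(n),  # standard error of the mean
        "Median": statistics.median(xs),
        "Mode": statistics.mode(xs),  # first most-frequent value encountered
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(xs),
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }


if __name__ == "__main__":
    for label, value in summary_statistics(USAGE).items():
        shown = f"{value:.4f}" if isinstance(value, float) else str(value)
        print(f"{label:<20}{shown:>12}")
```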
Descriptive Statistics in MegaStat
Here is the resulting MegaStat analysis.
Statistics Associated with Frequency Distribution: Measures of Shape. Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other. Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. Measured as excess kurtosis (kurtosis minus 3), the kurtosis of a normal distribution is zero. If the kurtosis is positive, the distribution is more peaked than a normal distribution; a negative value means that the distribution is flatter than a normal distribution.
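Skewness and kurtosis can be computed from standardized deviations $z_i = (X_i - \bar{X}) / s_x$. The sketch below uses the sample-adjusted formulas documented for Excel's SKEW and KURT worksheet functions; KURT reports excess kurtosis, so values near zero indicate normal-like tails. Other packages may use unadjusted formulas and report slightly different values. The function names are hypothetical.

```python
import math

# Internet Usage values of respondents 1-30, copied from the Internet Usage Data table.
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def _standardized(xs):
    """z_i = (X_i - X-bar) / s_x, using the sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if s == 0:
        raise ValueError("all values are equal; shape measures are undefined")
    return [(x - mean) / s for x in xs]


def sample_skewness(xs):
    """n / ((n-1)(n-2)) * sum(z^3); requires n >= 3."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    z = _standardized(xs)
    return n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)


def sample_excess_kurtosis(xs):
    """n(n+1)/((n-1)(n-2)(n-3)) * sum(z^4) - 3(n-1)^2/((n-2)(n-3)); requires n >= 4."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four observations")
    z = _standardized(xs)
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
    return term - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


if __name__ == "__main__":
    print(f"skewness = {sample_skewness(USAGE):.4f}")
    print(f"kurtosis = {sample_excess_kurtosis(USAGE):.4f}")
```

For the Internet Usage column the skewness is positive: a few heavy users form a long right tail, and the mean (6.6) exceeds the median (5.5).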
Skewness of a Distribution (Figure 15.2): (a) Symmetric distribution, where the mean, median, and mode coincide; (b) Skewed distribution, where the mean, median, and mode differ.
Line Charts: Simple Line Charts. Two-scale line chart: used to compare variables that differ in magnitude or are measured in different units (CellPhones.xls).
Scatter Plots: A scatter plot shows n pairs of observations as dots (or some other symbol) on an XY graph. A starting point for bivariate data analysis. Allows observations about the relationship between two variables. Answers the question: Is there an association between the two variables and if so, what kind of association?
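To make the idea of "n pairs as dots on an XY graph" concrete without a charting package, here is a minimal text-mode scatter plot in Python. It is an illustration only, assuming integer-valued data such as the Familiarity and Internet Usage columns copied from the table; `ascii_scatter` is a hypothetical helper, and identical pairs overlap in a single cell, so the plot may show fewer than 30 marks.

```python
# Columns of the Internet Usage Data table (respondents 1-30).
FAMILIARITY = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def ascii_scatter(xs, ys, marker="*"):
    """Return text rows of a scatter plot for integer data; y increases upward."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (x_max - x_min + 1) for _ in range(y_max - y_min + 1)]
    for x, y in zip(xs, ys):
        grid[y_max - y][x - x_min] = marker  # top row holds y_max
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    rows = ascii_scatter(FAMILIARITY, USAGE)
    top = max(USAGE)
    for offset, row in enumerate(rows):
        print(f"{top - offset:>3} |{row}")
    print("    +" + "-" * len(rows[0]))
    print(f"     Familiarity {min(FAMILIARITY)} to {max(FAMILIARITY)} (left to right)")
```

The marks rise from lower left to upper right, suggesting a positive association between familiarity and usage.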
In Excel, highlight the two data columns, then click on the Chart Wizard icon on the toolbar and select the XY (Scatter) option.
Scatter Plots: Making a Scatter Plot in Excel (BirthRates1.xls). Click Next and then click the Series tab. Excel assumes that the first column contains X-axis values and the second column contains Y-axis values. Alternatively, you can specify the data range explicitly for each variable.
Effective Excel Charts: Chart Wizard. Click on the Chart Wizard icon on the toolbar to open a sequence of pop-up menus that guide you through the steps of creating a chart. Step 1: Select the chart type and then click Next.
Regression Terminology: Fitting a Regression on a Scatter Plot in Excel
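A linear trendline fitted to a scatter plot is an ordinary least-squares line $\hat{Y} = a + bX$, with $b = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $a = \bar{Y} - b\bar{X}$. A minimal Python sketch of that fit, also returning $R^2$; `least_squares_line` is a hypothetical helper. In the usage lines, respondent 10 (Familiarity = 9, outside the 1 to 7 range of the other ratings) is dropped on the assumption that 9 is a missing-value code.

```python
import math

# Columns of the Internet Usage Data table (respondents 1-30).
FAMILIARITY = [7, 2, 3, 3, 7, 4, 2, 3, 3, 9, 4, 5, 6, 6, 6,
               4, 6, 4, 7, 6, 6, 5, 3, 7, 6, 6, 5, 4, 4, 3]
USAGE = [14, 2, 3, 3, 13, 6, 2, 6, 6, 15, 3, 4, 9, 8, 5,
         3, 9, 4, 14, 6, 9, 5, 2, 15, 6, 13, 4, 2, 4, 3]


def least_squares_line(xs, ys):
    """Return (intercept a, slope b, R^2) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    a = y_bar - b * x_bar
    r_squared = sxy ** 2 / (sxx * syy) if syy else math.nan  # undefined if all y equal
    return a, b, r_squared


if __name__ == "__main__":
    # ASSUMPTION: Familiarity = 9 is a missing-value code, so that pair is excluded.
    pairs = [(x, y) for x, y in zip(FAMILIARITY, USAGE) if x != 9]
    xs, ys = zip(*pairs)
    a, b, r2 = least_squares_line(list(xs), list(ys))
    print(f"Usage = {a:.3f} + {b:.3f} * Familiarity   (R^2 = {r2:.3f})")
```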
MegaStat