Exploring Microsoft® Excel® 2016 (Exploring Microsoft Office 2016). Series Editor: Mary Anne Poatsy. Mulbery | Davidson. Series created by Dr. Robert T. Grauer.
Copyright © 2017 Pearson Education, Inc.
Chapter 8: Statistical Functions (Analyzing Statistics)
In Chapter 8, you will learn how to employ statistical functions to analyze data for decision making.
Objectives
The objectives for this chapter are: (1) Use Conditional Math and Statistical Functions; (2) Calculate Relative Standing with Statistical Functions; (3) Measure Central Tendency; (4) Load the Analysis ToolPak; (5) Perform Analysis Using the Analysis ToolPak; (6) Create a Forecast Sheet.
Objective 1: Use Conditional Math and Statistical Functions
In this section, the skills include: Use the SUMIF, AVERAGEIF, and COUNTIF Functions; Use the SUMIFS, AVERAGEIFS, and COUNTIFS Functions; Enter Math and Statistical Functions.
Use Conditional Math and Statistical Functions
When you use the SUM, AVERAGE, and COUNT functions, Excel calculates the total, the arithmetic mean, and the number of values for every value in the range specified in the function’s arguments. The math and statistical function categories contain related functions (SUMIF, AVERAGEIF, COUNTIF, SUMIFS, AVERAGEIFS, and COUNTIFS) that perform similar calculations, but only on cells that meet a condition.
SUMIF: a math function similar to SUM, except that it calculates the total of a range of values when a specified condition is met. Syntax: =SUMIF(range,criteria,[sum_range])
AVERAGEIF: a statistical function that calculates the average, or arithmetic mean, of the cells in a range when a specified condition is met. Syntax: =AVERAGEIF(range,criteria,[average_range])
COUNTIF: a statistical function similar to COUNT, except that it counts the number of cells in a range when a specified condition is met. Syntax: =COUNTIF(range,criteria)
The multiple-criteria versions of these functions are discussed on the next slide.
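To make the idea of a conditional calculation concrete, here is a minimal Python sketch of what SUMIF, COUNTIF, and AVERAGEIF compute. It is not how Excel implements them; the department and salary data and the function names sum_if, count_if, and average_if are made up for illustration.

```python
# Hypothetical (department, salary) pairs standing in for two worksheet columns.
rows = [
    ("Sales", 42000),
    ("IT", 58000),
    ("Sales", 47000),
    ("HR", 39000),
    ("IT", 61000),
]


def sum_if(rows, wanted):
    """Total the salaries whose department equals `wanted` (the idea behind SUMIF)."""
    return sum(salary for dept, salary in rows if dept == wanted)


def count_if(rows, wanted):
    """Count the rows whose department equals `wanted` (the idea behind COUNTIF)."""
    return sum(1 for dept, _ in rows if dept == wanted)


def average_if(rows, wanted):
    """Average the matching salaries (the idea behind AVERAGEIF).

    Raises ZeroDivisionError when nothing matches; Excel reports #DIV/0! instead.
    """
    return sum_if(rows, wanted) / count_if(rows, wanted)


print(sum_if(rows, "Sales"))      # 89000
print(count_if(rows, "Sales"))    # 2
print(average_if(rows, "Sales"))  # 44500.0
```

In a worksheet, the department column would play the role of the range argument, "Sales" the criteria, and the salary column the optional sum_range or average_range.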
Use Conditional Math and Statistical Functions
The three functions on this slide are similar to their counterparts on the previous slide, except that multiple criteria can be used.
SUMIFS: calculates the total value of the cells in a range that meet multiple criteria. Syntax: =SUMIFS(sum_range,criteria_range1,criteria1,criteria_range2,criteria2…)
AVERAGEIFS: calculates the average value of the cells in a range that meet multiple criteria. Syntax: =AVERAGEIFS(average_range,criteria_range1,criteria1,criteria_range2,criteria2…)
COUNTIFS: counts the number of cells in a range that meet multiple criteria. Syntax: =COUNTIFS(criteria_range1,criteria1,criteria_range2,criteria2…)
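The multiple-criteria functions only include a row when every criterion is met. The following Python sketch shows that logic under assumed data; the employee records and the helper names (matching, is_sales, has_five_years) are hypothetical and do not come from the chapter files.

```python
# Hypothetical employee records; each key stands in for a worksheet column.
employees = [
    {"dept": "Sales", "years": 3, "salary": 42000},
    {"dept": "Sales", "years": 8, "salary": 47000},
    {"dept": "IT", "years": 6, "salary": 58000},
    {"dept": "Sales", "years": 10, "salary": 53000},
]


def is_sales(row):
    """First criterion: the department is Sales."""
    return row["dept"] == "Sales"


def has_five_years(row):
    """Second criterion: at least five years of service."""
    return row["years"] >= 5


def matching(rows, *criteria):
    """Keep only the rows that satisfy every criterion (a logical AND)."""
    return [row for row in rows if all(test(row) for test in criteria)]


hits = matching(employees, is_sales, has_five_years)
salaries = [row["salary"] for row in hits]

count_ifs = len(salaries)          # what COUNTIFS would count
sum_ifs = sum(salaries)            # what SUMIFS would total
average_ifs = sum_ifs / count_ifs  # what AVERAGEIFS would average

print(count_ifs, sum_ifs, average_ifs)  # 2 100000 50000.0
```

Each criterion corresponds to one criteria_range/criteria pair in the Excel syntax; adding a pair narrows the set of rows further.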
Use Conditional Math and Statistical Functions
This slide shows the application of several of the math and statistical functions discussed in the two previous slides. Take time to look at the functions to see how they are written and to understand their results. Excel organizes the conditional functions in the math and statistical function categories. The SUMIF and SUMIFS functions are math functions and are located under Math & Trig in the Function Library group. The AVERAGEIF, AVERAGEIFS, COUNTIF, and COUNTIFS functions are statistical functions. To locate them, click More Functions in the Function Library group, and then click Statistical.
Objective 2: Calculate Relative Standing with Statistical Functions
In this section, the skills include: Use the RANK and PERCENTRANK Functions; Use the QUARTILE and PERCENTILE Functions.
Calculate Relative Standing with Statistical Functions
Excel contains several ranking functions:
RANK.EQ: identifies a value’s rank within a list of values, omitting the next rank when tie values exist. Syntax: =RANK.EQ(number,ref,[order])
RANK.AVG: identifies the rank of a value but assigns an average rank when identical values exist. Syntax: =RANK.AVG(number,ref,[order])
The arguments for these two functions are: The number argument specifies the cell containing the value you want to rank. The ref argument specifies the range of values to rank against. The optional order argument specifies how to rank the values: the implied default is 0 for descending order, and any nonzero value ranks in ascending order. The difference between these two functions is shown on the next slide.
Calculate Relative Standing with Statistical Functions
Figure: RANK.AVG and RANK.EQ results, with a callout on the rows that share the same salary.
Both rank functions use Column D, which has the salaries ranked in ascending order.
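The tie-handling difference between the two rank functions can be sketched in a few lines of Python. This is an illustration of the ranking rules described above, not Excel's code; the salary list is hypothetical, and the sketch assumes the value being ranked appears in the list.

```python
# Hypothetical salaries; two employees share 62000.
salaries = [50000, 62000, 62000, 48000, 75000]


def rank_eq(number, ref, order=0):
    """Rank like RANK.EQ: tied values share the best rank, and the next rank is skipped.

    order=0 ranks in descending order (largest value is rank 1);
    any nonzero order ranks in ascending order.
    """
    if order == 0:
        ahead = sum(1 for value in ref if value > number)
    else:
        ahead = sum(1 for value in ref if value < number)
    return ahead + 1


def rank_avg(number, ref, order=0):
    """Rank like RANK.AVG: tied values receive the average of the ranks they occupy."""
    ties = ref.count(number)  # assumes `number` occurs in `ref` at least once
    return rank_eq(number, ref, order) + (ties - 1) / 2


print(rank_eq(62000, salaries))   # 2  (both tied salaries get rank 2)
print(rank_avg(62000, salaries))  # 2.5 (average of ranks 2 and 3)
print(rank_eq(50000, salaries))   # 4  (rank 3 is omitted)
```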
Calculate Relative Standing with Statistical Functions
Additional ranking functions:
PERCENTRANK.INC: displays a value’s rank as a percentile of the range of data in the dataset. The values range from 0 to 1, where 0 is the lowest percent rank and 1 is the highest. The INC indicates that 0 and 1 are included. Syntax: =PERCENTRANK.INC(array,x,[significance])
PERCENTRANK.EXC: similar to PERCENTRANK.INC in that it returns a value’s rank as a percent, but it excludes 0 and 1, hence the EXC. Syntax: =PERCENTRANK.EXC(array,x,[significance])
The arguments for these two functions are: The array argument specifies the range that contains the values to compare. The x argument specifies an individual cell. The optional significance argument designates the number of significant digits for precision. The difference between these two functions is shown on the next slide.
Calculate Relative Standing with Statistical Functions
Figure: PERCENTRANK.INC and PERCENTRANK.EXC results.
Both PERCENTRANK functions use Column D, which has the salaries ranked in ascending order. Notice the differences, especially at the top and bottom, where 0 and 1 are included by PERCENTRANK.INC and excluded by PERCENTRANK.EXC.
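The inclusive and exclusive percent ranks can be sketched with two short formulas. The Python below is a simplified illustration under stated assumptions: every value in the hypothetical salary list is distinct, x is one of the values in the list, and the result is not rounded to a number of significant digits the way the optional significance argument would.

```python
# Hypothetical salaries, all distinct.
salaries = [30000, 40000, 50000, 60000, 70000]


def percentrank_inc(array, x):
    """Inclusive percent rank: the lowest value maps to 0 and the highest to 1."""
    smaller = sum(1 for value in array if value < x)
    return smaller / (len(array) - 1)


def percentrank_exc(array, x):
    """Exclusive percent rank: results stay strictly between 0 and 1."""
    smaller = sum(1 for value in array if value < x)
    return (smaller + 1) / (len(array) + 1)


for salary in salaries:
    print(salary, percentrank_inc(salaries, salary), percentrank_exc(salaries, salary))
```

With these five salaries, the inclusive version runs from 0 for 30000 to 1 for 70000, while the exclusive version runs from 1/6 to 5/6, so neither endpoint is reached.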
Calculate Relative Standing with Statistical Functions
The next two ranking functions use quartiles, which divide the data into four groups.
QUARTILE.INC: identifies the value at a specific quartile for a dataset, including quartile 0 for the lowest value and quartile 4 for the highest value in the dataset. Syntax: =QUARTILE.INC(array,quart)
QUARTILE.EXC: identifies the value at a specific quartile for a dataset, excluding quartile 0 for the lowest value and quartile 4 for the highest value in the dataset. Syntax: =QUARTILE.EXC(array,quart)
The arguments for these two functions are: The array argument specifies the range of values. The quart argument specifies the quartile from 0 to 4, where 0 and 4 are only allowed with the INC version. The difference between these two functions is shown on the next slide.
Calculate Relative Standing with Statistical Functions
Figure: QUARTILE.INC and QUARTILE.EXC results.
Both QUARTILE functions use Column D, which has the salaries ranked in ascending order. Notice the differences, especially at the top and bottom, where quartiles 0 and 4 are included by QUARTILE.INC and excluded by QUARTILE.EXC.
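For readers who want to check quartile values outside Excel, Python's standard statistics module offers an inclusive and an exclusive method. As far as we can tell, these use the same interpolation rules as QUARTILE.INC and QUARTILE.EXC for quartiles 1 through 3, but treat that correspondence as an assumption to verify against your own worksheet. The salary list is hypothetical.

```python
import statistics

# Hypothetical salaries standing in for Column D.
salaries = [30000, 40000, 50000, 60000, 70000, 80000, 90000]

# Each call returns the three cut points for quartiles 1, 2, and 3.
inclusive = statistics.quantiles(salaries, n=4, method="inclusive")
exclusive = statistics.quantiles(salaries, n=4, method="exclusive")

print(inclusive)  # [45000.0, 60000.0, 75000.0]
print(exclusive)  # [40000.0, 60000.0, 80000.0]

# Quartiles 0 and 4 exist only in the inclusive version: the lowest and highest values.
print(min(salaries), max(salaries))  # 30000 90000
```

The median (quartile 2) agrees in both methods; the first and third quartiles differ because the exclusive method places them further toward the ends of the data.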
Calculate Relative Standing with Statistical Functions
The final two ranking functions use percentiles, which divide the data into 100 groups.
PERCENTILE.INC: identifies the value at a specific percentile for a dataset, including the 0th percentile (k = 0) for the lowest value and the 100th percentile (k = 1) for the highest value in the dataset. Syntax: =PERCENTILE.INC(array,k)
PERCENTILE.EXC: identifies the value at a specific percentile for a dataset, excluding the 0th percentile (k = 0) and the 100th percentile (k = 1). Syntax: =PERCENTILE.EXC(array,k)
The arguments for these two functions are: The array argument specifies the range of values. The k argument specifies the percentile as a decimal from 0 to 1, where 0 and 1 are only allowed with the INC version. The difference between these two functions is shown on the next slide.
Calculate Relative Standing with Statistical Functions
Figure: PERCENTILE.INC and PERCENTILE.EXC results.
Both PERCENTILE functions use Column D, which has the salaries ranked in ascending order. Notice the differences, especially at the top, where 0 is included by PERCENTILE.INC and excluded by PERCENTILE.EXC.
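The reason k = 0 and k = 1 only work with the INC version becomes clearer when the interpolation is written out. The Python sketch below follows the commonly documented position rules (k × (n − 1) for the inclusive version and k × (n + 1) for the exclusive version); the helper names and the score list are hypothetical, and the ValueError stands in for the #NUM! error Excel would show.

```python
def _value_at(data, position):
    """Interpolate the value at a 0-based fractional position in sorted data."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return data[lower]
    return data[lower] + fraction * (data[lower + 1] - data[lower])


def percentile_inc(array, k):
    """Inclusive percentile: k may run from 0 (lowest value) to 1 (highest value)."""
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1 inclusive")
    data = sorted(array)
    return _value_at(data, k * (len(data) - 1))


def percentile_exc(array, k):
    """Exclusive percentile: k must fall between 1/(n+1) and n/(n+1)."""
    data = sorted(array)
    n = len(data)
    position = k * (n + 1)  # 1-based position in the sorted data
    if position < 1 or position > n:
        raise ValueError("k is outside the range the exclusive method can evaluate")
    return _value_at(data, position - 1)


scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # hypothetical data

print(percentile_inc(scores, 0))     # 10
print(percentile_inc(scores, 0.25))  # 30
print(percentile_exc(scores, 0.25))  # 25.0
```

Calling percentile_exc(scores, 0) raises the error, which mirrors why the EXC version cannot return the lowest or highest value.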
Objective 3: Measure Central Tendency
In this section, the skills include: Use the Standard Deviation Function; Use the Variance Function; Use the CORREL Function; Use the FREQUENCY Function.
Measure Central Tendency
When calculating central tendencies, there are two groups to consider: A population is a dataset that contains all the data you would like to evaluate. A sample is a smaller, more manageable portion of the population. In the example given in the text, all educators in the state of Pennsylvania constitute a population, and a survey of 10% of the educators in each city in Pennsylvania is a sample. When dealing with statistics, determining which 10% to survey is the difficult part.
Two common measures describe how the data are spread around the central value: Variance is a measure of a dataset’s dispersion. Standard deviation is a measure of how far the data sample is spread around the mean (average).
Measure Central Tendency
Excel provides four versions of the functions that calculate each of these two measures. We will focus on two versions of each.
Standard deviation functions:
STDEV.P: calculates the standard deviation based on the population. Syntax: =STDEV.P(number1,[number2],…)
STDEV.S: calculates the standard deviation based on a sample of the population. Syntax: =STDEV.S(number1,[number2],…)
The arguments for these two functions are: The first argument specifies the range of values used to calculate the standard deviation. The second and later arguments are optional and not commonly used, because the first argument can be an entire range.
Measure Central Tendency
Variance functions:
VAR.P: calculates the variance based on the population. Syntax: =VAR.P(number1,[number2],…)
VAR.S: calculates the variance based on a sample of the population. Syntax: =VAR.S(number1,[number2],…)
Measure Central Tendency
Figure: STDEV.S and VAR.S results.
Both functions use Column C, which contains the test scores. Notice that the standard deviation function uses only one argument, which is the range of test scores. This is a sample of 50 students, so the “S” versions of STDEV and VAR are used.
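The population and sample versions differ only in the divisor: the population versions divide the squared deviations by n, and the sample versions divide by n − 1. Python's standard statistics module has a function for each case, which makes the difference easy to see. The five test scores below are hypothetical, not the 50 scores from the slide.

```python
import statistics

# Hypothetical test scores; the mean is 80.
scores = [70, 75, 80, 85, 90]

var_p = statistics.pvariance(scores)  # population variance, like VAR.P
var_s = statistics.variance(scores)   # sample variance, like VAR.S
stdev_p = statistics.pstdev(scores)   # population standard deviation, like STDEV.P
stdev_s = statistics.stdev(scores)    # sample standard deviation, like STDEV.S

# The squared deviations from the mean total 250:
# 250 / 5 = 50 for the population, 250 / 4 = 62.5 for the sample.
print(var_p, var_s)
print(round(stdev_p, 2), round(stdev_s, 2))
```

The sample results are always a little larger than the population results for the same data, which compensates for estimating the mean from a sample.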
Measure Central Tendency
CORREL: determines the strength of a relationship between two variables. Syntax: =CORREL(array1,array2)
You have heard many times about a relationship (correlation) between two variables, for example, the correlation between smoking and cancer. The CORREL function, short for “correlation coefficient,” measures the correlation between two variables. When used to compare datasets, the function returns a value between -1 and 1. The stronger the relationship, the closer the coefficient of correlation is to -1 or 1. You might expect the coefficient of correlation between the hours spent studying and the grades on a test to have a value close to 1. On the other hand, the coefficient of correlation between the speed at which cars are driven and the average miles per gallon would have a value close to -1. What would you expect the coefficient of correlation between a person’s height and their income to be?
As seen in the Formula Bar, we are comparing Columns A and B, which represent Salary and Credit Score. As you can see, there is a strong correlation between salaries and credit scores.
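For those curious about what is behind the coefficient, here is a short Python sketch of the standard Pearson correlation formula, which is the statistic CORREL reports. The study-hours and grade values are invented for illustration, and the function name correl is ours.

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations, and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours_studied = [1, 2, 3, 4, 5]     # hypothetical data
test_grades = [52, 60, 71, 78, 90]  # hypothetical data

print(round(correl(hours_studied, test_grades), 3))  # 0.997
```

A value this close to 1 indicates a strong positive relationship; data that fall exactly on a downward-sloping line would return -1.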
Measure Central Tendency
FREQUENCY: determines the frequency distribution of a dataset. Syntax: =FREQUENCY(data_array,bins_array)
The frequency distribution is a meaningful descriptive tool because it determines how often a set of numbers appears within a dataset. The FREQUENCY function is a descriptive statistical function in Excel. The two arguments for the function are: The data_array argument is the range of cells that contain the values being evaluated for frequency of occurrence. The bins_array argument is the range of numbers that specify the bins in which the data should be counted.
As seen in the Formula Bar, we are using Column C, which contains salaries, and creating three bins based on quartiles and the corresponding salaries in Column H. We see that there are 12 people in the first quartile and 13 in the second and third quartiles.
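The binning rule behind a frequency distribution can be sketched in Python as follows. The sketch assumes each bin value is an upper limit (a value is counted in the first bin whose limit it does not exceed) and that one extra count is returned for values above the highest bin. The salaries and bin limits are hypothetical.

```python
from bisect import bisect_left


def frequency(data_array, bins_array):
    """Count how many values fall at or below each bin limit.

    Returns one count per bin plus a final count for values above the last bin.
    """
    bins = sorted(bins_array)
    counts = [0] * (len(bins) + 1)
    for value in data_array:
        # bisect_left finds the first bin limit that is >= value.
        counts[bisect_left(bins, value)] += 1
    return counts


salaries = [28000, 35000, 41000, 41000, 52000, 60000, 75000, 90000]  # hypothetical
bins = [40000, 60000, 80000]                                         # hypothetical

print(frequency(salaries, bins))  # [2, 4, 1, 1]
```

Note that 60000 lands in the second bin because it equals that bin's upper limit, and 90000 is counted in the extra overflow slot.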
Objective 4: Load the Analysis ToolPak
In this section, the skills include: Load the Analysis ToolPak.
Load the Analysis ToolPak
Figure: the Add-ins dialog box with the Analysis ToolPak add-in selected, and the Data Analysis command added to the Ribbon.
As you learned in Chapter 6, add-ins are programs that can be added to Excel to provide enhanced functionality. The Analysis ToolPak is an add-in program that provides statistical analysis tools, specifically for performing an analysis of variance (ANOVA), calculating covariance, and creating a histogram. To enable the Analysis ToolPak add-in: (1) Click the File tab and click Options. (2) Select Add-ins on the left side. (3) Ensure that Excel Add-ins is selected in the Manage box and click Go. (4) Click the Analysis ToolPak check box to select it and click OK.
Objective 5: Perform Analysis Using the Analysis ToolPak
In this section, the skills include: Perform Analysis of Variance (ANOVA); Calculate COVARIANCE; Create a Histogram.
Perform Analysis Using the Analysis ToolPak
ANOVA is a statistical hypothesis test that helps determine whether samples of data were taken from the same population. In practical use, it can be used to accept or reject a hypothesis. There is no single function to calculate ANOVA in Excel; however, we will create an ANOVA report using the Analysis ToolPak. Of the three types of ANOVA, we will use a single-factor ANOVA to create a report about a school system’s high school students.
To use the Analysis ToolPak to create a single-factor ANOVA report: (1) On the Data tab in the Analysis group, click Data Analysis to display the Data Analysis dialog box. (2) Select Anova: Single Factor and click OK. (3) Click the Input Range selection box and select the range of data you want to analyze. (4) Select either Grouped By Columns or Grouped By Rows based on your data layout. (5) Keep the default Alpha of 0.05 (meaning you accept a 5% chance of rejecting the null hypothesis when it is actually true). (6) Select an output option and click OK.
This slide shows the result of an ANOVA test, and the table on the right shows how to interpret the results. Don’t worry, this will not be on the final!
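To show where the F value in a single-factor ANOVA report comes from, here is a minimal Python sketch of the textbook calculation. It is not the ToolPak's code, it does not compute the P-value or F crit (those need the F distribution), and the three groups of scores are hypothetical.

```python
def anova_single_factor(groups):
    """Return the sums of squares, degrees of freedom, and F statistic for the groups."""
    values = [value for group in groups for value in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
    }


# Hypothetical test scores from three high schools.
school_a = [80, 85, 90]
school_b = [70, 75, 80]
school_c = [90, 95, 100]

report = anova_single_factor([school_a, school_b, school_c])
print(report["F"])  # 12.0
```

In the ToolPak report, an F value larger than F crit (or a P-value smaller than Alpha) suggests the samples did not all come from the same population.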
Perform Analysis Using the Analysis ToolPak
Covariance is similar to correlation. It is a measure of how two sets of data vary simultaneously. In Excel, the COVARIANCE.P and COVARIANCE.S functions calculate covariance, and there is also a covariance reporting feature included in the Analysis ToolPak.
To create a covariance report: (1) On the Data tab in the Analysis group, click Data Analysis to display the Data Analysis dialog box. (2) Select Covariance and click OK. (3) Click the Input Range selection box and select the range of the data you want to analyze. (4) Select Grouped By Columns or Grouped By Rows depending on the organization of the dataset. (5) If the first row contains labels, click the Labels in First Row check box to select it. (6) Select an output option and click OK.
In this example, we are hypothesizing that the more days of school students miss, the lower their SAT scores. We would expect a negative relationship, which is indicated by the negative result shown in the slide, -1484.57.
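The sign of the covariance is what tells us the direction of the relationship. The Python sketch below computes both the population and the sample covariance from their standard definitions; the days-missed and SAT values are hypothetical and are not the data behind the slide's result.

```python
def covariance_p(xs, ys):
    """Population covariance (divide by n), the statistic COVARIANCE.P reports."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covariance_s(xs, ys):
    """Sample covariance (divide by n - 1), the statistic COVARIANCE.S reports."""
    n = len(xs)
    return covariance_p(xs, ys) * n / (n - 1)


days_missed = [0, 2, 4, 6, 8]                # hypothetical data
sat_scores = [1500, 1400, 1350, 1200, 1100]  # hypothetical data

print(covariance_p(days_missed, sat_scores))  # -400.0
print(covariance_s(days_missed, sat_scores))  # -500.0
```

Both results are negative, meaning SAT scores tend to fall as days missed rise. Unlike correlation, the size of a covariance depends on the units of the data, so it is the sign rather than the magnitude that is easy to interpret.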
Perform Analysis Using the Analysis ToolPak
A histogram is a visual display of tabulated frequencies. We will use the Analysis ToolPak to create a histogram. Creating a histogram is somewhat similar to using the FREQUENCY function in that it requires bins to tabulate the data and returns a frequency distribution table.
To create a histogram: (1) On the Data tab in the Analysis group, click Data Analysis to display the Data Analysis dialog box. (2) Select Histogram and click OK. (3) Enter the Input Range in the Input Range box. (4) Enter the Bin Range in the Bin Range box. (5) If the ranges include labels, click the Labels check box. (6) Select the output options of your choice and click OK.
Shown in this slide is the frequency distribution table from which the histogram is created.
Objective 6: Create a Forecast Sheet
In this section, the skills include: Create a Forecast Sheet.
Create a Forecast Sheet
Excel 2016 offers a new feature that creates a forecast worksheet to detail trends based on historical data. In this example, we are going to track average SAT scores over the years as they compare to average teacher salaries. The Forecast Sheet feature generates a chart and a corresponding table to provide a future forecast based on the given data.
To create a forecast worksheet: (1) Sort the data in chronological order. (2) Select the data range for the desired forecast. (3) On the Data tab in the Forecast group, click Forecast Sheet. (4) Set the desired Forecast End. (5) Click Create.
Looking at the slide, the blue line represents the historical data based on the shaded table, and the orange lines represent the forecasted results.
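To give a feel for what "forecasting from historical data" means, here is a deliberately simplified Python sketch that fits a straight-line trend and extends it forward. This is a stand-in technique, not what the Forecast Sheet does: Excel's feature is based on exponential smoothing, which also handles seasonality and confidence bounds. The years and average SAT scores are hypothetical.

```python
def linear_trend(xs, ys):
    """Fit a least-squares straight line and return (slope, predict function)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )

    def predict(x):
        """Value of the fitted line at x."""
        return mean_y + slope * (x - mean_x)

    return slope, predict


years = [2012, 2013, 2014, 2015, 2016]      # hypothetical, in chronological order
avg_scores = [1500, 1504, 1511, 1514, 1521]  # hypothetical

slope, predict = linear_trend(years, avg_scores)
print(round(slope, 1))          # 5.2 points per year
print(round(predict(2017), 1))  # 1525.6
print(round(predict(2018), 1))  # 1530.8
```

As with the Forecast Sheet, the historical values must be in chronological order and evenly spaced for the projected values to be meaningful.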
Summary
In this chapter, we explored the application of additional Excel functions in several areas:
Math and statistical related functions, which enabled us to calculate statistics based on specified criteria: SUMIF, AVERAGEIF, COUNTIF, SUMIFS, AVERAGEIFS, and COUNTIFS.
Relative standing functions, which formed various groups: RANK, PERCENTRANK, QUARTILE, and PERCENTILE.
Descriptive statistical functions, which enabled us to examine central tendencies: STDEV, VAR, CORREL, and FREQUENCY.
Inferential statistical functions, which enabled us to make inferences based on samples and populations: ANOVA, COVARIANCE, Histograms, and Forecast sheets.
Questions?
It is important to understand how to use statistical functions to analyze data. Are there any questions?