Data Science and Analytics

Presentation transcript:

Data Science and Analytics Introduction to Data Science and Analytics Stephan Sorger www.StephanSorger.com Unit 3. Excel Tools Disclaimer: All images such as logos, photos, etc. used in this presentation are the property of their respective copyright owners and are used here for educational purposes only Some material adapted from: Sorger, “Marketing Analytics: Strategic Models and Metrics” Welcome to Module 3 of Introduction to Data Science and Analytics. This module is important because it covers essential tools and techniques we can use in Microsoft Excel. © Stephan Sorger 2016; www.StephanSorger.com; Data Science: Excel Tools; 1

Outline / Learning Objectives
Topic: Description
Basic Statistics: Mean, Median, Variance, Standard deviation, RMS
Pivot Tables: Extract significance from large data sets
Solver: Maximize/Minimize criteria subject to constraints
ToolPak: Analysis add-in functionality of Excel
In this module, we will cover several learning objectives: how to calculate basic statistics such as mean, median, and so forth; how to extract significance from large data sets using pivot tables; how to maximize or minimize objectives based on constraints; and how to implement and use the analysis add-in functionality built into Microsoft Excel.

Basic Statistics We start our discussion with basic statistics.

Basic Statistics: Overview
A data set can be summarized by basic statistics:
Mean (average)
Median (half-way point)
RMS (Root Mean Square)
Standard Deviation (degree of variability)
In this section, we show how to calculate four common statistics: mean, which is the average of a data set; median, which is the mid-point of a data set; RMS, or root mean square, which we use for other calculations; and standard deviation, which indicates the degree of variability of the data.

Basic Statistics: Example
Single-season home run records: Barry Bonds, Mark McGwire, and Sammy Sosa. Each player wanted to break the record held by Roger Maris.
Home run counts per season, Barry Bonds (stemplot):
1 | 6 9
2 | 4
2 | 5 5
3 | 3 3 4 4
3 | 7 7
4 | 0 2
4 | 6 9
5 |
6 |
7 | 3
Stemplot:
-Separate each observation into a “stem” (left digits) and “leaf” (right digit). So, 16 would be 1 | 6, with “1” as the stem and “6” as the leaf
-Write stems vertically in increasing order from top to bottom
-Draw a vertical line from top to bottom
-“Split” the stems for greater clarity by entering two “2”, “3”, “4”, etc.
-Interpret the stemplot: study the distribution. Outlier at 73?
Here we have a dataset from San Francisco Giants baseball player Barry Bonds. Big hitters like Bonds, McGwire, and Sosa wanted to break the home run record set by Roger Maris. As such, we consider the performance by Bonds. The table on top shows the home run counts per season. We can view the distribution of the data quickly by drawing a stemplot. The stemplot separates each observation into a “stem” (the digits on the left) and a “leaf” (the digit on the right). We record the stems vertically in increasing order from top to bottom. For greater visual clarity, we can add a vertical line. Even from this simple plot, we can see a cluster of data in the 30s and a potential outlier at 73.
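The stemplot steps above can be sketched in a few lines of Python. This is only an illustration: the list `HOME_RUNS` holds the values as read off the slide's stemplot (sorted, not in season order), and the function name `split_stemplot` is made up for this example. Unlike the slide, the sketch prints both halves of every stem, including empty ones.

```python
from collections import defaultdict

# Values as read off the slide's stemplot (sorted, not in season order).
HOME_RUNS = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]


def split_stemplot(values):
    """Return stemplot rows with each stem split in two (leaves 0-4, then 5-9)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)          # 16 -> stem 1, leaf 6
        leaves[(stem, leaf >= 5)].append(leaf)
    lowest, highest = min(values) // 10, max(values) // 10
    rows = []
    for stem in range(lowest, highest + 1):  # stems in increasing order
        for upper_half in (False, True):     # "split" each stem
            digits = " ".join(str(d) for d in leaves[(stem, upper_half)])
            rows.append(f"{stem} | {digits}".rstrip())
    return rows


for row in split_stemplot(HOME_RUNS):
    print(row)
```

The row for stem 7 contains the single leaf 3, which is the potential outlier at 73.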

Basic Statistics: Mean
Mean = (Sum of all observation values) / (Number of observations)
Xbar = Mean = (16 + 25 + 24 + … + 73) / 16 = 35.4375
What if we did not count the outlier in 2001? Mean = (16 + 25 + 24 + … + 49) / 15 = 32.93. One good season increased his average by about 2.5!
In statistics, we say that the mean is not a resistant measure of center, because it cannot resist the influence of one extreme observation.
The mean of a data set is defined as the sum of all the observation values divided by the number of observations; it is the arithmetic average. In our case, we add all the home run observations, 16, 25, 24, and so forth, and divide that sum by the number of observations, or 16, to get 35.4375. In our stemplot, we noticed a potential outlier at 73. What if we did not include it? If we remove that observation from the dataset, we arrive at a mean of 32.93. Therefore, one good season increased his overall average by about 2.5 home runs. We use the term resistance to describe how well a measure withstands the influence of one extreme observation. Because we see such a large change from removing one data point, we can say that the mean is not a resistant measure of center.
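As a quick check of the two means above, here is a minimal Python sketch using the standard library's `statistics.mean`. The list `HOME_RUNS` is taken from the stemplot values and is listed in sorted order, which does not affect the mean.

```python
from statistics import mean

# Values as read off the slide's stemplot (sorted, not in season order).
HOME_RUNS = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

mean_all = mean(HOME_RUNS)                        # all 16 seasons
without_73 = [x for x in HOME_RUNS if x != 73]    # drop the 2001 outlier
mean_without_73 = mean(without_73)                # remaining 15 seasons

print(mean_all)                                   # 35.4375
print(round(mean_without_73, 2))                  # 32.93
print(round(mean_all - mean_without_73, 2))       # 2.5
```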

Basic Statistics: Median
M = Median = Center (middle) of the set of observations
To find the median, we re-arrange the observations from smallest to largest: 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73
For an odd number of observations, the process is easy: just pick the middle one. But we have 16 observations, which is an even number, so we pick the “center pair” of observations #8 and #9 (both of these are 34) and average them.
What if we remove the extreme observation of 73? The median is still 34.
Next, we compute the median, or center of the set of observations. To do so, we sort the data in ascending order to find the middle point. With an even number of observations, we select the middle pair and take their average. In our case, both observations in the center pair are 34, so the median is 34. If we removed the extreme observation of 73, our median would remain at 34. Therefore, we say that the median is a resistant measure of center.
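The resistance of the median can be checked with a short Python sketch. It uses `statistics.median`, which averages the center pair for an even number of observations; `HOME_RUNS` again holds the values read off the stemplot.

```python
from statistics import median

# Values as read off the slide's stemplot, already in ascending order.
HOME_RUNS = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

center_pair = (HOME_RUNS[7], HOME_RUNS[8])        # observations #8 and #9
median_all = median(HOME_RUNS)                    # average of the center pair
median_without_73 = median(HOME_RUNS[:-1])        # 15 values: the middle one

print(center_pair, median_all, median_without_73)
```

Both medians come out as 34, in contrast to the mean, which moved by about 2.5 when the same observation was removed.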

Basic Statistics: RMS
RMS = Root Mean Square
A kind of average used in statistics and engineering
Used as a component of the calculation of the standard deviation
To compute, square all the numbers in the set, find the mean of the squares, and take the square root:
RMS = SQRT( ( (a1)^2 + (a2)^2 + (a3)^2 + … ) / n )
where a1, a2, a3, … = observations and n = number of observations
Similar in size to the average (average was 35.4375)
We move on to calculating the root mean square, or RMS, of the data. RMS represents a kind of average, and is used as a component of the calculation of the standard deviation, and for other purposes. To compute it, we square each observation, find the mean of those squares, and take the square root. In our case, we arrive at an RMS of 37.8, which is roughly comparable to our average of 35.4.
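A minimal Python sketch of the RMS recipe (square, average, square root) follows. The helper name `rms` is chosen for this example only, and `HOME_RUNS` holds the values read off the stemplot.

```python
import math

# Values as read off the slide's stemplot (sorted, not in season order).
HOME_RUNS = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]


def rms(values):
    """Root mean square: square each value, take the mean, then the square root."""
    mean_of_squares = sum(x * x for x in values) / len(values)
    return math.sqrt(mean_of_squares)


print(round(rms(HOME_RUNS), 1))  # 37.8
```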

Basic Statistics: Standard deviation
Measures the spread by examining how far the observations are from their mean
To compute, first calculate the variance:
Variance = s^2 = [ (x1 – xbar)^2 + (x2 – xbar)^2 + … + (xn – xbar)^2 ] / (n – 1)
s = SQRT(Variance)
For our previous baseball example, recall that the mean (xbar) = 35.4375; the resulting standard deviation is s = 13.6.
We end our discussion of basic statistics with the calculation of s, or standard deviation. Standard deviation measures the spread of data by examining the distance of the observations from their mean. To compute the standard deviation, we subtract the mean from each observation, square that amount, repeat the process for each observation, and add up the results. We then divide the sum by n – 1, where n represents the number of observations, to get the variance, and take the square root of the variance. In our case, the standard deviation is 13.6, which shows that the observations typically fall about 13.6 home runs from the mean of 35.4. We could compare this value to standard deviations from other hitters to compare the consistency of their hitting.
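The variance and standard deviation formulas above translate directly into Python. The sketch below computes them step by step and then cross-checks against `statistics.stdev`, which also divides by n – 1. The function name `sample_std` is an assumption made for this illustration.

```python
import math
from statistics import stdev

# Values as read off the slide's stemplot (sorted, not in season order).
HOME_RUNS = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]


def sample_std(values):
    """Sample standard deviation: sqrt of squared deviations summed over n - 1."""
    n = len(values)
    xbar = sum(values) / n
    squared_deviations = [(x - xbar) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)


s = sample_std(HOME_RUNS)
print(round(s, 1))                  # 13.6
print(round(stdev(HOME_RUNS), 1))   # same value from the standard library
```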

Pivot Tables Pivot tables are an elegant way to analyze and communicate data. Much of the same functionality can be done by sorting, but pivot tables make the process much easier and faster.

Pivot Tables
Extract significant data points from a large data set
Original Data Set:
Name            Sales    Date of Sale   Product     Channel
Alex Alpha      $1,100   January        Product A   Store
Betty Beta      $100     February       Product B   Internet
Debbie Delta    $300     February       Product B   Store
Edie Epsilon    $200     January        Product B   Internet
Gary Gamma      $1,300   January        Product A   Store
We start by examining a typical customer data set. Each row represents one customer. For example, we can see that Alex Alpha accounted for a total of $1,100 in sales, placed their order in January, selected product A, and used the retail store distribution channel. Suppose we want to see total sales for each product, or product sales by distribution channel, or some other view. We could sort the data to obtain that view, but we will see how developing pivot tables is a much better alternative.

Pivot Tables
Launching Pivot Table in Excel
Ribbon: Home, Insert, …; on the Insert tab, select Pivot Table
Create Pivot Table dialog box:
Choose the data set to analyze
  Select a table or range: Sheet1!$A$1:$E$6
  Use an external data source
Choose where you want the Pivot Table report
  New Worksheet
  Existing Worksheet
OK
To build pivot tables in Microsoft Excel, click on the Insert tab and select Pivot Table. The right side of the slide shows the Create Pivot Table dialog box that pops up. Click on the radio button marked Select a table or range, and then enter the cell range of the dataset. Most of the time, Excel can sense the location and will pre-enter it for you. Check to make sure Excel didn't assume incorrectly. Next, click on the radio button marked New Worksheet. Building the pivot table on a new worksheet is much cleaner than placing it on the existing worksheet. Click OK on the lower right.

Pivot Tables
Excel’s Pivot Table Field List, based on the original input data set. Select “Sales” and “Product” to get a basic table of sales by product.
Pivot Table Field List
Choose fields to add to report: Customer, Sales, Date, Product, Channel
Drag fields between areas below: Report Filter, Column Labels, Row Labels, Values
Different versions of Excel look different (PC vs. Mac).
Excel will build a pivot table field list and present it to you as a dialog box. It lists all the fields of our dataset, which consist of the labels at the top of each column, such as Customer and Sales. Because we want to see sales by product, select the Sales and Product fields.

Pivot Tables: Basic Report: Sales by Product. Select “Date” to see how sales vary over time.
Row Labels     Sum of Sales
Product A      2400
Product B      600
Grand Total    3000
Pivot Table Field List: fields selected: Sales, Product. Row Labels area: Product. Values area: Sum of Sales.
This slide shows the result if we select the Sales and Product fields. Note that the table shows Product A and Product B as rows. We see that Product A delivered $2,400 in sales, and product B delivered $600, for a total of $3,000. If we want to see how those product sales varied over time, we would select the Date field.
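To make clear what the basic report computes, here is a small Python sketch that reproduces "Sum of Sales by Product" from the five-customer data set using only the standard library. It is not how Excel implements pivot tables; the names `SALES` and `pivot_sum` and the lowercase field keys are assumptions made for this illustration.

```python
from collections import defaultdict

FIELDS = ("customer", "sales", "date", "product", "channel")
# Rows copied from the slide's original data set.
RAW = [
    ("Alex Alpha", 1100, "January", "Product A", "Store"),
    ("Betty Beta", 100, "February", "Product B", "Internet"),
    ("Debbie Delta", 300, "February", "Product B", "Store"),
    ("Edie Epsilon", 200, "January", "Product B", "Internet"),
    ("Gary Gamma", 1300, "January", "Product A", "Store"),
]
SALES = [dict(zip(FIELDS, row)) for row in RAW]


def pivot_sum(rows, key, value="sales"):
    """Group rows by `key` and sum `value`, like Row Labels plus Sum of Sales."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


by_product = pivot_sum(SALES, "product")
grand_total = sum(by_product.values())
print(by_product, grand_total)
```

Calling `pivot_sum(SALES, "channel")` or `pivot_sum(SALES, "date")` gives the other one-field views mentioned in the narration.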

Pivot Tables: Sales by Product and Date. Select “Channel” to see how sales vary with type of channel (store).
Row Labels     Sum of Sales
Product A      2400
  January      2400
Product B      600
  January      200
  February     400
Grand Total    3000
Pivot Table Field List: fields selected: Sales, Date, Product. Row Labels area: Product, Date. Values area: Sum of Sales.
This slide shows the result when we select the Sales, then Product, and then Date fields. The table still shows total product sales for product A and product B, but now the sales are broken down by date. If we want to see how our product sales by date varied over distribution channel, we would select the Channel field.

Pivot Tables: Sales by Product, Date, and Channel (added Date, and then Channel). What if we had added Channel, and then Date?
Row Labels          Sum of Sales
Product A           2400
  January           2400
    Retail Store    2400
Product B           600
  January           200
    Internet        200
  February          400
    Internet        100
    Retail Store    300
Grand Total         3000
Pivot Table Field List: fields selected: Sales, Date, Product, Channel. Row Labels area: Product, Date. Values area: Sum of Sales.
This slide shows the result when we select the Sales field, then the Product field, then the Date field, and then the Channel field. The table shows sales for product A and B, broken down by date, and further broken down by channel. For example, we see product B total sales as $600, with February sales of $400. $100 of February sales came to us using the Internet channel, and $300 were sold through the retail store. Suppose we wanted to visualize the data in a different format, such as emphasizing channel over date. That is, show the total sales through each channel, and how those sales varied over time. To do this, we would select channel first, and then select date.

Pivot Tables: Sales by Product, Date, and Channel (added Channel, and then Date)
Row Labels          Sum of Sales
Product A           2400
  Retail Store      2400
    January         2400
Product B           600
  Internet          300
    January         200
    February        100
  Retail Store      300
    February        300
Grand Total         3000
Pivot Table Field List: fields selected: Sales, Date, Product, Channel. Row Labels area: Product, Date. Values area: Sum of Sales.
This slide shows what would have happened had we selected the channel field first, and then the date field. Note that the table emphasizes channel over date. For example, Product B sold a total of $600, with $300 sold through our Internet distribution channel. Of that $300, $200 was sold in January, and $100 in February.
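The effect of field order can be illustrated with a short Python sketch that nests the sums in whatever order the fields are given. This is a standard-library illustration, not Excel's mechanism; `SALES`, `nested_pivot`, and the lowercase field keys are names made up for the example. The channel values are the ones in the original data set ("Store", "Internet").

```python
FIELDS = ("customer", "sales", "date", "product", "channel")
# Rows copied from the slide's original data set.
RAW = [
    ("Alex Alpha", 1100, "January", "Product A", "Store"),
    ("Betty Beta", 100, "February", "Product B", "Internet"),
    ("Debbie Delta", 300, "February", "Product B", "Store"),
    ("Edie Epsilon", 200, "January", "Product B", "Internet"),
    ("Gary Gamma", 1300, "January", "Product A", "Store"),
]
SALES = [dict(zip(FIELDS, row)) for row in RAW]


def nested_pivot(rows, fields, value="sales"):
    """Sum `value` under nested keys taken from `fields`, in the given order."""
    tree = {}
    for row in rows:
        node = tree
        for field in fields[:-1]:            # walk down the outer levels
            node = node.setdefault(row[field], {})
        leaf = row[fields[-1]]               # innermost level holds the sums
        node[leaf] = node.get(leaf, 0) + row[value]
    return tree


date_then_channel = nested_pivot(SALES, ["product", "date", "channel"])
channel_then_date = nested_pivot(SALES, ["product", "channel", "date"])

print(date_then_channel["Product B"])   # grouped by month first
print(channel_then_date["Product B"])   # grouped by channel first
```

The same five rows produce both views; only the nesting order of the keys changes.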

Pivot Tables: Adding Field to Report Filter
Row Labels     Sum of Sales
Product A      2400
  January      2400
Product B      600
  January      200
  February     400
Grand Total    3000
Pivot Table Field List: fields selected: Sales, Date, Product. Right-click Channel and choose Add to Report Filter. Row Labels area: Product, Date. Values area: Sum of Sales.
We can also view information by adding a report filter. For example, the table currently shows sales by product and date. If we wanted to see channel as well, we could right-click on channel and select "Add to Report Filter."

Pivot Tables: Selecting Reports using Report Filter
Report filter drop-down for Channel: (All), Internet, Retail Store; OK
Pivot Table Field List: fields selected: Sales, Date, Product, Channel. Report Filter area: Channel. Row Labels area: Product, Date. Values area: Sum of Sales.
This slide shows what happens when we invoke the report filter for channel. The dialog box shows the different values for channel, in this case Internet and Retail Store, as well as an All option. We click OK to see the resulting table.

Pivot Tables
Sum of Sales   Column Labels
Row Labels     Internet   Retail Store   Grand Total
Product A                 2400           2400
  January                 2400           2400
Product B      300        300            600
  January      200                       200
  February     100        300            400
Grand Total    300        2700           3000
Pivot Table Field List: fields selected: Sales, Date, Product, Channel. Channel area: Channel. Row Labels area: Product, Date. Values area: Sum of Sales.
This slide shows the resulting table with the report filter. Note that the table still shows product sales by product and date, and further breaks out sales by channel, with separate columns for the Internet and Retail Store sales channels, as well as a Grand Total column. We have shown only a fraction of the views possible with pivot tables. We encourage you to build your own dataset, or start with an existing dataset from your organization, and experiment with building different views of the data using pivot tables. You will be amazed by all the different views you can achieve, and will quickly find pivot tables to be an indispensable tool for analytics success.
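The product-by-channel layout with row and column totals can be sketched as a cross-tabulation in Python. As before, this is a standard-library illustration under assumed names (`SALES`, `crosstab`), using the channel values from the original data set ("Store" corresponds to the "Retail Store" column on the slide).

```python
from collections import defaultdict

FIELDS = ("customer", "sales", "date", "product", "channel")
# Rows copied from the slide's original data set.
RAW = [
    ("Alex Alpha", 1100, "January", "Product A", "Store"),
    ("Betty Beta", 100, "February", "Product B", "Internet"),
    ("Debbie Delta", 300, "February", "Product B", "Store"),
    ("Edie Epsilon", 200, "January", "Product B", "Internet"),
    ("Gary Gamma", 1300, "January", "Product A", "Store"),
]
SALES = [dict(zip(FIELDS, row)) for row in RAW]


def crosstab(rows, row_key, col_key, value="sales"):
    """Sum `value` into a table keyed by row label, then column label."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value]
    return {label: dict(cols) for label, cols in table.items()}


table = crosstab(SALES, "product", "channel")
row_totals = {label: sum(cols.values()) for label, cols in table.items()}
column_totals = defaultdict(int)
for cols in table.values():
    for channel, amount in cols.items():
        column_totals[channel] += amount

print(table)
print(row_totals, dict(column_totals))
```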

Microsoft Excel Solver In this section, we discuss the Solver functionality within Microsoft Excel. We can use Microsoft Excel Solver to maximize or minimize objectives of interest subject to constraints.

Excel Solver: Maximization under Constraints
Common terms for the process: Linear Optimization; Linear Programming; Maximization/Minimization
INPUTS: Objective Function, Constraints
Linear Optimization Model
OUTPUTS: Maximized Objective or Minimized Objective
The promotion allocation model uses linear optimization to calculate the optimum allocation of budget across promotion vehicles. It is an example of a normative model, as we discussed in chapter 1. The model takes an objective function and constraint equations as inputs, and delivers the maximized objective and other information as outputs. We cover the model in this lecture.

Solver: Maximization under Constraints: Process
Diagram labels: Promotion Data; Budget Allocation
Vehicle Contribution: Determine effectiveness of campaigns, based on historical data
Promotion Objective: Declare promotion objective in equation form
Promotion Constraints: Specify promotion constraints in equation form
Optimization Model: Execute model
The linear optimization process entails four steps. In the first step, we calculate the vehicle contribution, which is the amount of results generated per use of the vehicle. For example, we could estimate the number of impressions, or views, delivered by each ad. In the next two steps, we express our promotion objective and constraints in equation form. We cover how to do that in the next slides. We then execute the model.

Excel Solver: Constraints
Typical Promotion Constraints:
Financial Constraints: Budget NTE (not to exceed) $XXX/yr
Contractual Constraints: Contracts with outside agencies
Legal Constraints: Must follow legal regulations
Company Policy Constraints: Observe company policies
We start by discussing promotion constraints. The slide shows a few typical constraints. First is budget. Almost every organization faces a limited budget, such as a certain amount of money that can be spent each month on promotion. Next, we face legal constraints, such as legal regulations that cover that jurisdiction. Third, we often face contractual constraints, such as specific limits placed on us due to performance contracts with external agencies. And fourth, we must observe company policies.

Excel Solver
Promotion Vehicle      Audience/Ad     Cost/Ad   Maximum Quantity
D: Direct Marketing    30 Viewers/Ad   $30/Ad    30
P: Pay Per Click       30 Viewers/Ad   $40/Ad    20
S: Social Media        40 Viewers/Ad   $60/Ad    10
Direct Marketing: Emails sent directly to individuals within target market
Pay Per Click: Campaigns displaying ads during relevant Internet searches
Social Media: Paid advertisements on social media platforms
We will demonstrate the process using an example. In the example, we promote our goods and services using three different promotion vehicles: direct marketing, pay per click, and social media. In direct marketing, we send emails directly to individuals we think will be interested in our offerings. In pay per click, we work with Google and other search engines to display ads for our offerings when users search for relevant topics. In social media, we insert paid ads on various social media platforms, such as Facebook and LinkedIn. We examine the performance of previous campaigns and discover that our direct marketing campaigns result in 30 viewers per ad, and that they cost $30 to develop and send. Similarly, pay per click delivers 30 viewers per ad and costs $40, and social media results in 40 viewers per ad and costs $60. Our current personnel count and contracts with outside advertising agencies give us the capacity to create 30 direct marketing campaigns, as well as 20 pay per click and 10 social media campaigns per month.

Excel Solver
Linear Optimization Element             Equation
Objective Function                      Z = 30*D + 30*P + 40*S
Constraint #1: Budget                   B = 30*D + 40*P + 60*S <= $2,000
Constraint #2: Maximum campaigns/month: D    D <= 30
Constraint #3: Maximum campaigns/month: P    P <= 20
Constraint #4: Maximum campaigns/month: S    S <= 10
Objective Function: Z = 30 * D + 30 * P + 40 * S
In this slide, we show how to create the objective function. We state that we want to maximize the number of impressions from all promotion vehicles. The variable Z represents the objective quantity, in this case, the number of impressions. Impressions come from direct marketing, pay per click, and social media campaigns, represented by the variables D, P, and S, respectively. The coefficients for each of the campaign variables are equal to the number of viewers per campaign, as we discussed in the previous slide. For example, we found that direct marketing campaigns resulted in 30 viewers per ad, so we place a 30 before the D. Therefore, our objective function is stated as Z equals 30 multiplied by D plus 30 multiplied by P plus 40 multiplied by S.

Excel Solver
Promotion Constraints: Budget
B = 30 * D + 40 * P + 60 * S ≤ $2,000
Similarly, we write our budget equation, again using D, P, and S as the variables for direct marketing, pay per click, and social media, respectively. For the budget equation, the coefficients for each variable are the cost for each campaign. For example, the direct marketing campaign costs us $30 per campaign, so we place a 30 before the D. Let's say that our company limits us to $2,000 of promotion budget per month. Then the budget equation states that what we spend, which is 30 multiplied by D plus 40 multiplied by P plus 60 multiplied by S, must be less than or equal to our budget limit of $2,000.

Excel Solver
Promotion Constraints
D ≤ 30: Cannot exceed 30 direct marketing campaigns per month
P ≤ 20: Cannot exceed 20 pay per click campaigns per month
S ≤ 10: Cannot exceed 10 social media campaigns per month
We had stated we had the capacity to produce only a finite number of campaigns. In our case, we can create 30 direct marketing, 20 pay per click, and 10 social media campaigns per month. We would need to hire more people, or contract for more outside services, in order for us to create more. We express this constraint by simply stating that the campaign variable is less than or equal to the limit. For example, D is less than or equal to 30, meaning we cannot create more than 30 direct marketing campaigns per month.
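With the objective function and all four constraints written down, the problem is small enough to check by brute force. The Python sketch below simply tries every whole-number combination of D, P, and S within the campaign limits and keeps the one with the most impressions that stays within budget. This is an illustration only and is not the algorithm Excel Solver uses; the names `VIEWERS`, `COST`, `MAX_ADS`, `BUDGET`, and `best_allocation` are made up for the example.

```python
from itertools import product

# Coefficients from the promotion vehicle table.
VIEWERS = {"D": 30, "P": 30, "S": 40}   # viewers per ad (objective coefficients)
COST = {"D": 30, "P": 40, "S": 60}      # dollars per ad (budget coefficients)
MAX_ADS = {"D": 30, "P": 20, "S": 10}   # maximum campaigns per month
BUDGET = 2000                           # dollars per month


def best_allocation():
    """Enumerate all integer (D, P, S) within limits; return the best feasible one."""
    best = None
    ranges = (range(MAX_ADS[k] + 1) for k in ("D", "P", "S"))
    for d, p, s in product(*ranges):
        spend = COST["D"] * d + COST["P"] * p + COST["S"] * s
        if spend > BUDGET:               # violates Constraint #1
            continue
        z = VIEWERS["D"] * d + VIEWERS["P"] * p + VIEWERS["S"] * s
        if best is None or z > best["Z"]:
            best = {"Z": z, "D": d, "P": p, "S": s, "spend": spend}
    return best


result = best_allocation()
print(result)  # {'Z': 1700, 'D': 30, 'P': 20, 'S': 5, 'spend': 2000}
```

In this result the budget and the limits on D and P are fully used, while social media stays below its limit of 10, so the budget, D, and P constraints are the limiting factors.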

Excel Solver: Execution
Optimization Model: Setup: Specific format
Optimization Model: Execution: Excel Solver function
Optimization Model: Interpretation: Identify limiting factors
To run the optimization model, we follow a three-step process. We set up the model in a specific format, we execute the model using Microsoft Excel's Solver function, and then we interpret the results. We will cover each step.

Excel Solver: Execution: Setup
     A                        B    C    D    E    F
1                             D    P    S
2    Changing Cells           a    b    c
3
4    Target Cell              30   30   40   d
5
6    Constraint #1: Budget    30   40   60   e    f
7    Constraint #2: D ≤ 30    1              g    h
8    Constraint #3: P ≤ 20         1         i    j
9    Constraint #4: S ≤ 10              1    k    l
Columns B to D: columns for D, P, and S parameters
a, b, c: changing cells for D, P, and S
d: Target Cell (contains objective equation)
e, g, i, k: Constraint, Left Side (contains constraint equation)
f, h, j, l: Constraint, Right Side (contains constraint value)
This slide shows the format I recommend. In the first column, we show the elements of the optimization model, including the rows for the changing cells, target cell, and constraints. In the middle set of columns, we show the information for the D, P, and S parameters, where D, P, and S stand for direct marketing, pay per click, and social media campaigns. In the right-side set of columns, we see the values for the target cell and constraints. Microsoft Excel refers to variables as changing cells. In our case, the changing cells are D, P, and S. Excel refers to the cell holding the objective function as the Target Cell. In our case, we would enter the objective equation in typical Excel equation form into the target cell. In the constraint area, place the equations for constraints in the left column cells, and enter the values for the constraint limits on the right side. I recommend placing all of the numeric values, such as coefficients, in separate cells, rather than hard-coding them into equations. This way, if you find that the effectiveness scores or costs change, they are easy to change. I also recommend placing ones in the cells corresponding to the D, P, and S variables, forming a diagonal. That way you can easily see which variables the constraints refer to. In the next slide, we run the model by using Excel's Solver function.

Excel Solver: Execution: Launch
Ribbon path: Data tab, then Solver
To run Microsoft Excel's Solver function, click on the Data tab, and then click on Solver. For Macintosh systems using Excel before 2011, go to Solver.com and download the free app that adds the Solver functionality to Excel. Macintosh machines with Excel 2011 and beyond have Solver built in.

Excel Execution: Add Constraints
Solver Parameters dialog box:
-Set Target Cell: $E$4
-Equal To: Max
-By Changing Cells: $B$2:$D$2
-Subject to the Constraints: $E$6 <= $F$6; $E$7 <= $F$7; $E$8 <= $F$8; $E$9 <= $F$9
-Buttons: Solve, Options, Add
Add Constraint dialog box:
-Cell Reference: $E$6
-Sign: <=
-Constraint: =$F$6
-Button: OK
Positive integer constraint: to ensure our answers are positive integers:
-Select box “Make variables non-negative”
-Add constraint: changing cells, then select “int” from the pull-down menu
Solver will open a dialog box called Solver Parameters. Enter the location of the target cell in the Set Target Cell box. Enter the location of the changing cells in the By Changing Cells box. We wish to maximize revenue, so click on the Max button. For the constraints box, you will need to enter the constraints one at a time.
To enter constraints, click on the Add button. The Add Constraint dialog box includes three elements: cell reference, sign, and constraint value. For the cell reference, enter the location of the constraint equation. Recall that we consolidated all constraint equations in the left column of the constraint section of the spreadsheet. For the constraint box, enter the location of the constraint value. Recall that we consolidated all constraint values in the right column of the constraint section of the spreadsheet. The middle box is a pull-down that lets you select from options such as less than or equal to, greater than or equal to, and so forth. Because we face maximum limits, select the less than or equal to sign. Then click OK. Repeat the process until all the constraints are added, and then click on the Solve button in the Solver Parameters dialog box.
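To make the Solver settings concrete, here is a minimal Python sketch of the same problem: maximize the target cell by changing D, P, and S, subject to the four "less than or equal to" constraints, with non-negative integer values. It uses plain exhaustive enumeration, which is not how Solver works internally; it is only practical here because the limits are small. The function name `solve` and the variable names are illustrative assumptions.

```python
from itertools import product

# Coefficients taken from the spreadsheet rows; names are illustrative.
objective = (30, 30, 40)   # row 4: objective coefficients for D, P, S
cost = (30, 40, 60)        # row 6: cost per campaign for D, P, S
budget = 2000              # right side of Constraint #1
upper = (30, 20, 10)       # right sides of Constraints #2 to #4

def solve():
    """Try every non-negative integer plan within the limits; keep the best."""
    best_value, best_plan = -1, None
    for plan in product(*(range(u + 1) for u in upper)):
        spend = sum(c * x for c, x in zip(cost, plan))
        if spend > budget:
            continue  # violates the budget constraint
        value = sum(o * x for o, x in zip(objective, plan))
        if value > best_value:
            best_value, best_plan = value, plan
    return best_plan, best_value

plan, value = solve()
print("D, P, S =", plan, "target cell =", value)
```

Looping over `range(u + 1)` plays the role of both the "Make variables non-negative" box and the integer constraint, since only whole numbers from zero up to each limit are tried.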

Excel Solver: Execution
     A                        B    C    D    E      F
1                             D    P    S
2    Changing Cells           30   20   5
3
4    Target Cell              30   30   40   1700
5
6    Constraint #1: Budget    30   40   60   2000   2000
7    Constraint #2: D ≤ 30    1              30     30
8    Constraint #3: P ≤ 20         1         20     20
9    Constraint #4: S ≤ 10              1    5      10
Solver will populate the spreadsheet with the values it finds. It shows the number of D, P, and S campaigns needed to maximize impressions. It also shows the value of the target cell, which equals the total number of impressions. On the lower right, the spreadsheet also shows that we ran against the limits for D and P, but not for S. We discuss this topic further in the next slide.

Excel Solver: Interpretation
Solver Results: Summary
Promotion Vehicle      Solver Result    Cost/Ad    Total Cost per Vehicle
D: Direct Marketing    30 (30 max.)     $30/Ad     $900
P: Pay Per Click       20 (20 max.)     $40/Ad     $800
S: Social Media        5 (10 max.)      $60/Ad     $300
Total Spending                                     $2,000
This slide shows the results produced by Solver. The table shows the summary, indicating the total cost per vehicle.
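The arithmetic behind the summary table is a quantity-times-cost product per vehicle, followed by a sum. A small Python sketch, with illustrative dictionary names, reproduces it:

```python
# Solver results and cost per ad, as listed in the summary table.
solver_result = {"D: Direct Marketing": 30, "P: Pay Per Click": 20, "S: Social Media": 5}
cost_per_ad = {"D: Direct Marketing": 30, "P: Pay Per Click": 40, "S: Social Media": 60}

# Total cost per vehicle = number of campaigns x cost per ad.
total_cost = {name: qty * cost_per_ad[name] for name, qty in solver_result.items()}
total_spending = sum(total_cost.values())

for name, amount in total_cost.items():
    print(f"{name}: ${amount:,}")
print(f"Total Spending: ${total_spending:,}")
```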

Excel Solver: Interpretation
Solver Results: Constraints
Promotion Vehicle      Solver Result    Max. Allowable    Status
D: Direct Marketing    30               30                Binding
P: Pay Per Click       20               20                Binding
S: Social Media        5                10                Not Binding
Budget                 $2,000           $2,000            Binding
The table shows the recommended number of campaigns to execute, relative to the capacity maximums. Note that we ran against the limits for D and P, but not for S. Solver refers to constrained situations as Binding and non-constrained situations as Not Binding.
We can interpret this result by stating that we have more capacity in social media than we need. For example, if you had a staff of 10 social media personnel executing one campaign each, you would have a capacity of 10 social media campaigns. By stating that the recommended social media amount is 5, with a maximum allowable of 10, we are in essence saying that 5 of the 10 people are doing nothing all day. We should re-deploy those excess people to do something else. For example, we might train them to execute direct marketing or pay per click campaigns. If we re-deployed those assets over those two areas, we could then re-run the model with higher constraint values for direct marketing and pay per click, and generate more revenue as a result.
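The Binding and Not Binding labels follow from comparing each constraint's left side with its limit. The Python sketch below reproduces the status column and also reports the unused capacity (slack); the helper name `status` and the `report` structure are illustrative assumptions, not Solver output.

```python
# (constraint, value at the Solver solution, maximum allowable)
rows = [
    ("D: Direct Marketing", 30, 30),
    ("P: Pay Per Click", 20, 20),
    ("S: Social Media", 5, 10),
    ("Budget", 2000, 2000),
]

def status(used, limit):
    """A constraint is binding when the solution sits exactly on its limit."""
    return "Binding" if used == limit else "Not Binding"

# Map each constraint to its status and slack (limit minus amount used).
report = {name: (status(used, limit), limit - used) for name, used, limit in rows}

for name, (label, slack) in report.items():
    print(f"{name}: {label}, slack = {slack}")
```

The slack of 5 on social media corresponds to the five idle staff members in the example.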

Microsoft Excel Analysis ToolPak
In this section, we discuss the Analysis ToolPak add-in available for certain versions of Microsoft Excel. It provides analysis functionality relevant to data science, as we shall soon see.

Analysis ToolPak
Purpose: Provides data tools for financial, statistical, and engineering data analysis
PC Installation:
1. Click on upper left:
-Called “Office Button” in Excel 2007
-Called “File Tab” in Excel 2010 and later
2. Excel Options
The Analysis ToolPak add-in for Microsoft Excel provides regression analysis and other data tools for financial, statistical, and engineering data analysis. The add-in is available for Microsoft Windows-based computers. For computers with Apple Macintosh operating systems, we will discuss an alternative in a few slides. For Microsoft Windows-based computers, we need to enable the add-ins preinstalled in Excel. To do so, select the upper left button. Microsoft refers to the upper left button as the Office Button in Excel 2007 and as the File Tab in Excel 2010 and later. After clicking on the upper left, select Excel Options.

Analysis ToolPak
3. Click “Add-Ins”
Next, we select Add-Ins to bring up the add-ins screen.

Analysis ToolPak
4. Click “Add-Ins” again
We select Excel Add-Ins from the pull-down menu and click Go.

Analysis ToolPak
5. Select: “Analysis ToolPak” and “Solver Add-in”; click OK
We select the Analysis ToolPak checkbox to get the Analysis ToolPak and the Solver Add-in checkbox to get Solver, and then we click OK.

Analysis ToolPak
6. Access Add-ins through Data tab
Apple Macintosh users: ToolPak not available. Instead, download StatPlus (free) add-in. Go to StephanSorger.com and follow instructions
Once we enable the add-ins, we can access them through the Data tab. For computers with Apple Macintosh operating systems, the Analysis ToolPak is not available. Instead, we recommend using commercially available analytics tools such as StatPlus. You can get links to the tools by going to the StephanSorger.com website and following the instructions on the site.

Analysis ToolPak
Several analysis tools available. Typically use the following:
-Descriptive statistics
-Exponential smoothing
-Moving average
-Regression
-t-test; z-test
The Analysis ToolPak includes several tools useful for data analysis. A few examples include descriptive statistics, for basic statistics summarizing a data set; exponential smoothing, to more easily identify long-term trends in data; moving average, often used for data set smoothing; regression analysis, to identify the relationship between a dependent variable and one or more independent variables; and hypothesis tests, such as t-tests and z-tests.
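For readers without access to the ToolPak, several of these tools can be approximated with a few lines of code. The Python sketch below is illustrative only: the `sales` series is made-up sample data, the function names are assumptions, and the output will not match the ToolPak's report layout. It covers descriptive statistics, a moving average, simple exponential smoothing, and a one-variable least-squares regression, and leaves out the t-test and z-test.

```python
import statistics

# Made-up illustrative series (assumption); not data from the slides.
periods = [1, 2, 3, 4, 5, 6]
sales = [12.0, 15.0, 14.0, 18.0, 21.0, 20.0]

# Descriptive statistics (sample variance and sample standard deviation).
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "variance": statistics.variance(sales),
    "stdev": statistics.stdev(sales),
}

def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[t] = alpha*x[t] + (1 - alpha)*s[t-1]."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

ma3 = moving_average(sales, 3)
ses = exponential_smoothing(sales, 0.5)
slope, intercept = linear_regression(periods, sales)

print(summary)
print("3-period moving average:", ma3)
print("Exponential smoothing (alpha = 0.5):", ses)
print("Regression: slope =", slope, "intercept =", intercept)
```

The smoothing weight `alpha` here is applied to the newest observation; tools that ask for a damping factor instead are typically using one minus this value.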

Outline/Learning Objectives
Topic               Description
Basic Statistics    Mean, median, variance, standard deviation, RMS
Pivot Tables        Extract significance from large data sets
Solver              Maximize/minimize criteria subject to constraints
ToolPak             Analysis add-in functionality of Excel
In this module, we covered several essential Excel tools. We discussed basic statistics to summarize data sets, such as mean, median, variance, standard deviation, and RMS. We also covered various Excel functions: pivot tables, to extract significance from data sets; Solver, to maximize objectives subject to constraints; and the Microsoft Excel Analysis ToolPak add-in, to obtain analytics functionality within Excel.