Experimental Design: Setting up tests to determine the impact of AP technology
Real Science. Real Results.
Background
Need to convince your boss, Operations, or friends about the impact of new AP measures? Design the test so that the results are:
- Accurate (not distorted by other things going on in the store)
- Applicable across comparable stores (the stores the AP measure is intended for)
- Testable statistically
Project Background
We want to test the impact of these interventions, but how do we avoid biases in the data?
- What if the test stores experience relatively higher shrink than the control stores?
- If we test over the holiday season, how can we separate the effect of increased seasonal shoplifting from the effect of the treatment?
- How can we find relatively similar stores and assign them to test and control groups?
Experimental Design
Some things to consider before testing:
- How many stores can be involved in this test? (affects sample size)
- What is the timeframe for testing? (affects the type of experimental design)
- How quickly, and at what volume, are you losing products? (affects the power of the study)
- How large a decrease in shrink % do we expect? (affects the type of experimental design and our analysis)
- Do some of our stores have more in common with each other than with others? (assists with blocking, helping us remove confounding variables)
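To see how store count, loss volume, and expected decrease feed into power, here is a minimal Python sketch of the standard normal-approximation sample-size formula for comparing two group means. The standard deviation and minimum effect below are hypothetical inputs you would estimate from historical shrink data, and `stores_per_group` is an illustrative name, not part of any project tooling.

```python
from math import ceil
from statistics import NormalDist


def stores_per_group(sd, min_effect, alpha=0.05, power=0.80):
    """Approximate stores needed per arm for a two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / effect^2
    sd: assumed between-store standard deviation of shrink (hypothetical)
    min_effect: smallest drop in shrink worth detecting, in the same units as sd
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)  # quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / min_effect ** 2
    return ceil(n)  # round up: you cannot test a fraction of a store


# Hypothetical inputs: SD of 10 units per store, detect a 5-unit drop.
print(stores_per_group(sd=10, min_effect=5))  # prints 63
```

Halving the detectable effect roughly quadruples the required stores. A t-based calculation would add a store or two, and designs where each store serves as its own control usually need fewer stores, because within-store changes tend to vary less than between-store levels.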
Experimental Design
Common experimental designs:
- Parallel
- Crossover
- Parallel/crossover hybrid
- Balanced crossover
- Block designs
Methods of Testing: Parallel studies
Compares Group A to Group B over the same period of time. This is helpful if you expect seasonality to affect the test results: for example, if shrink increases over the holidays in both groups, the treatment stores may increase less than the control stores.
Pros: easiest to analyze (two-sample t-test); less time required than other designs.
Cons: lower power; possible bias based on store locations.
Methods of Testing: Parallel studies
Group     | Time Period 1   | Time Period 2
Control   | No intervention | No intervention
Treatment | Intervention    | Intervention
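For the parallel layout above, the comparison is between two independent groups of stores over the same period. A minimal sketch of Welch's two-sample t statistic (which does not assume equal variances) using only Python's standard library; the per-store shrink figures are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(treatment, control):
    """Welch's two-sample t statistic and degrees of freedom.

    treatment, control: per-store shrink measured over the same time period.
    """
    n1, n2 = len(treatment), len(control)
    v1 = variance(treatment) / n1  # squared standard error, treatment mean
    v2 = variance(control) / n2  # squared standard error, control mean
    t = (mean(treatment) - mean(control)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical per-store shrink over the same weeks
t, df = welch_t([12, 15, 9, 14, 10], [18, 16, 20, 17, 19])
print(round(t, 2), round(df, 1))
```

If SciPy is available, `scipy.stats.ttest_ind(treatment, control, equal_var=False)` computes the same statistic along with a p-value.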
Methods of Testing: Crossover study
Compares two time periods: before and after the treatment is put in place. All stores in the study collect shrink data without the treatment for a predetermined time, the treatment is implemented, and then shrink data are collected for another predetermined time.
Pros: higher power than a parallel design, even with smaller samples, because each store is its own control.
Cons: longer testing period required; possible temporal confounding.
Methods of Testing: Crossover studies
Group     | Time Period 1   | Installation        | Time Period 2
Treatment | No intervention | Installation period | Intervention
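In the pre/post crossover layout above, every store is measured before and after installation, so the analysis reduces to each store's own change in shrink. A minimal sketch of the paired t statistic using Python's standard library, assuming per-store totals for the two periods are listed in the same store order (the numbers are made up):

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Mean post-minus-pre change, paired t statistic, and degrees of freedom."""
    if len(pre) != len(post):
        raise ValueError("pre and post must list the same stores in the same order")
    diffs = [b - a for a, b in zip(pre, post)]  # each store is its own control
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return mean(diffs), t, n - 1


# Hypothetical shrink per store before and after installation
change, t, df = paired_t(pre=[20, 25, 18, 30, 22], post=[15, 21, 16, 24, 20])
print(change, round(t, 2), df)
```

For a p-value, the statistic is compared against a t distribution with n - 1 degrees of freedom; `scipy.stats.ttest_rel(post, pre)` computes the same t if SciPy is available.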
Methods of Testing: Parallel/crossover hybrid
Compares two time periods, before and after the treatment is put in place, and ALSO compares groups of stores.
Efficient when sample sizes are moderate.
Tested via paired t-test, regression models, or mixed ANOVA.
Pros: higher power than a parallel design; each treatment store is its own control; can also account for temporal effects.
Cons: longer testing period required than a crossover study; larger sample size required.
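One straightforward analysis for the hybrid design compares each store's pre-to-post change between treatment and control stores (a difference-in-differences). With two groups and two periods, a pooled-variance t-test on these change scores is equivalent to testing the group-by-time interaction in a mixed ANOVA. A sketch under those assumptions, with made-up data and a hypothetical function name:

```python
from math import sqrt
from statistics import mean, variance


def change_score_test(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences on per-store change scores (pooled-variance t).

    Returns (effect, t, df), where effect is the treatment stores' mean change
    minus the control stores' mean change.
    """
    d_t = [b - a for a, b in zip(treat_pre, treat_post)]  # treatment changes
    d_c = [b - a for a, b in zip(ctrl_pre, ctrl_post)]  # control changes
    n1, n2 = len(d_t), len(d_c)
    effect = mean(d_t) - mean(d_c)
    pooled = ((n1 - 1) * variance(d_t) + (n2 - 1) * variance(d_c)) / (n1 + n2 - 2)
    t = effect / sqrt(pooled * (1 / n1 + 1 / n2))
    return effect, t, n1 + n2 - 2


# Hypothetical shrink per store, pre and post, for each group
effect, t, df = change_score_test(
    treat_pre=[20, 25, 18, 30, 22], treat_post=[15, 21, 16, 24, 20],
    ctrl_pre=[19, 24, 21, 26, 20], ctrl_post=[20, 25, 21, 28, 20],
)
print(round(effect, 2), round(t, 2), df)
```

A regression with group, period, and group-by-period terms would give the same point estimate and is easier to extend with blocking variables such as retailer.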
Methods of Testing: Balanced crossover
Compares two time periods, before and after the treatment changes: one group of stores has the intervention added while the other has it removed.
Efficient when sample sizes are relatively small.
Tested via paired t-test, regression models, or mixed ANOVA.
Pros: higher power than a parallel design; each store is its own control; temporal effects can be taken into consideration; shorter timespan required.
Cons: treatment effects may carry over into the control time period.
Methods of Testing: Balanced crossover
Group     | Time Period 1   | Installation        | Time Period 2
Control   | No intervention | Add intervention    | Intervention
Treatment | Intervention    | Remove intervention | No intervention
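For a balanced layout like the one above (one group adds the intervention, the other removes it), a common way to separate the treatment effect from a period (seasonal) effect is to use each store's period 1 minus period 2 difference. This sketch assumes per-store shrink for both periods, calls the row that starts with the intervention sequence AB and the row that starts without it sequence BA, and ignores carryover, which a gap such as the installation period may help reduce. The function name and data are hypothetical.

```python
from statistics import mean


def crossover_effects(seq_ab, seq_ba):
    """Treatment and period effects from a 2x2 balanced crossover.

    seq_ab: (period1, period2) shrink per store that had the intervention first,
            then had it removed.
    seq_ba: (period1, period2) shrink per store that started without the
            intervention, then had it added.
    """
    # For AB stores: period1 - period2 = (intervention - none) + period change
    d_ab = mean([p1 - p2 for p1, p2 in seq_ab])
    # For BA stores: period1 - period2 = (none - intervention) + period change
    d_ba = mean([p1 - p2 for p1, p2 in seq_ba])
    treatment_effect = (d_ab - d_ba) / 2  # intervention minus no intervention
    period_effect = (d_ab + d_ba) / 2  # period 1 minus period 2
    return treatment_effect, period_effect


# Hypothetical shrink per store: (period 1, period 2)
te, pe = crossover_effects(seq_ab=[(12, 20), (10, 17), (14, 21)],
                           seq_ba=[(19, 13), (22, 15), (18, 12)])
print(round(te, 2), round(pe, 2))
```

A negative treatment effect means lower shrink with the intervention; the period effect captures any shift (such as seasonality) shared by both groups.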
Blocking
Blocking: arranging experimental units (stores) into groups that are similar to one another.
All of these designs benefit from blocking, and blocking is imperative to ensure you are comparing apples to apples. For instance, suppose you had a sample of 10 stores (5 high-shrink stores and 5 low-shrink stores) in a parallel study design. If the assignment didn't take shrink levels into consideration, you might end up with all of the high-shrink stores in the treatment group and all of the low-shrink stores in the control group.
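The 10-store example above can be handled with blocked randomization: shuffle the stores within each shrink block and split each block between the arms. A minimal Python sketch, assuming block labels (such as "high"/"low" shrink) are already known; when a block has an odd number of stores, this version gives the extra store to control.

```python
import random
from collections import defaultdict


def blocked_assignment(stores, seed=0):
    """Randomly split stores into treatment/control within each block.

    stores: dict mapping store id -> block label (e.g. "high" or "low" shrink).
    Returns a dict mapping store id -> "treatment" or "control".
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    blocks = defaultdict(list)
    for store, block in stores.items():
        blocks[block].append(store)
    assignment = {}
    for block in sorted(blocks):
        members = sorted(blocks[block])
        rng.shuffle(members)  # random order within the block only
        half = len(members) // 2
        for store in members[:half]:
            assignment[store] = "treatment"
        for store in members[half:]:
            assignment[store] = "control"
    return assignment


# 10 hypothetical stores: S0-S4 high shrink, S5-S9 low shrink
stores = {f"S{i}": ("high" if i < 5 else "low") for i in range(10)}
print(blocked_assignment(stores))
```

Because treatment and control are balanced within each block, high-shrink stores can no longer all land in one group.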
Blocking
Blocking also allows for post hoc testing:
- Are the AP measures more effective in high-risk stores?
- In certain geographical locations?
- In certain types of building layouts?
Word of Caution
WARNING: Even if you use the most sophisticated methods to assign stores to treatment and control groups, a project is only as good as its data. Make routine quality checks during the testing period to identify any problems:
- Positive shrink values
- Compliance with the required frequency of measurement
- Stores that were supposed to get interventions get them at the correct time
- Control stores are not changing their displays
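The routine checks listed above can be automated against a weekly shrink log. A minimal sketch, assuming a hypothetical record layout (`store`, `week`, `shrink`, `intervention`) and a planned intervention start week per store (None for controls). It also assumes shrink is logged as a negative inventory adjustment, so positive values are flagged; flip `expect_negative_shrink` if your system records losses as positive counts. The installation window is not modeled separately.

```python
from collections import defaultdict


def quality_flags(records, expected_weeks, plan, expect_negative_shrink=True):
    """Return (store, issue) tuples for a weekly shrink log.

    records: list of dicts with hypothetical keys
        "store", "week", "shrink", "intervention" (bool: installed that week)
    expected_weeks: set of week numbers every store should report
    plan: dict store -> week the intervention should start (None for controls)
    """
    flags = []
    weeks_seen = defaultdict(set)
    for r in records:
        store, week = r["store"], r["week"]
        weeks_seen[store].add(week)
        # Shrink recorded with the sign the convention does not allow
        wrong_sign = r["shrink"] > 0 if expect_negative_shrink else r["shrink"] < 0
        if wrong_sign:
            flags.append((store, f"week {week}: unexpected shrink sign"))
        # Intervention present when it should not be, or missing when it should be
        start = plan.get(store)
        should_have = start is not None and week >= start
        if r["intervention"] != should_have:
            flags.append((store, f"week {week}: intervention status differs from plan"))
    # Compliance with measurement frequency
    for store in plan:
        missing = expected_weeks - weeks_seen[store]
        if missing:
            flags.append((store, f"missing weeks {sorted(missing)}"))
    return flags
```

Running this weekly makes problems such as a control store installing fixtures visible while there is still time to act, rather than after the test ends.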
Word of Caution
WARNING: Because you are keeping track of products, some study participants will worry about how their losses look, and managers will want to make sure they are losing less during the test period.
Hawthorne effect: people behave differently when they know they are in a study.
Current Project Goals
The Consumer Product Research Project was designed to test the effects of protective interventions on shrink and sales of blade and razor products in operating stores. In this project, conducted in 2009, the LPRC wanted to evaluate the impact of two AP measures on shrink: protective fixtures and ePVMs.
Current Project Goals
The LPRC was given a 4-month period (April to July) to evaluate the impact of protective fixtures and ePVMs on razor blade loss.
- Razor blade loss was happening every week, with dozens of products lost.
- 3 retailers were involved in the project.
- Each retailer was able to provide 20 stores for testing.
Project Goals/Hypotheses
- Do interventions reduce the level of shrink/loss in test stores vs. control time periods?
- Do interventions increase the number of sales in test stores vs. control time periods?
Project Design
Because stores from the same retailer are likely to be more similar to each other, retailer was used as a blocking variable. To achieve balanced sample sizes, stores were divided evenly among 3 groups: 20 stores received fixtures, 20 received ePVMs, and 20 were used as controls. Based on the number of stores each retailer could provide, treatment and control assignments were made so that every retailer was represented in each group. A parallel/crossover study design was used to measure the impact of the ePVMs and fixtures on shrink and sales, compared with the control group.
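The retailer-blocked, three-arm assignment described above can be sketched as a shuffled round-robin within each retailer. This is an illustrative approach, not necessarily how the LPRC made its assignments: the arm labels, seed, and function name are assumptions. Rotating the starting arm from one retailer to the next means that 3 retailers with 20 stores each yield exactly 20 stores per arm, with every retailer in every arm.

```python
import random

ARMS = ("fixture", "ePVM", "control")  # hypothetical arm labels


def assign_by_retailer(stores_by_retailer, seed=0):
    """Deal each retailer's shuffled stores across the arms in turn.

    stores_by_retailer: dict mapping retailer -> list of store ids.
    The starting arm rotates per retailer so leftover stores are spread
    across different arms instead of always landing in the first one.
    """
    rng = random.Random(seed)  # reproducible assignment
    assignment = {}
    for offset, retailer in enumerate(sorted(stores_by_retailer)):
        stores = sorted(stores_by_retailer[retailer])
        rng.shuffle(stores)  # randomize within the retailer block
        for i, store in enumerate(stores):
            assignment[store] = ARMS[(i + offset) % len(ARMS)]
    return assignment


# Hypothetical: 3 retailers x 20 stores
stores = {r: [f"{r}-{i}" for i in range(20)] for r in ("R1", "R2", "R3")}
groups = assign_by_retailer(stores)
print({arm: list(groups.values()).count(arm) for arm in ARMS})
```

With 20 stores per retailer, each retailer contributes 7, 7, and 6 stores to the three arms, and the rotation balances those counts across retailers.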
Testing Period
Pre-test (6 weeks): baseline measurements
Intervention (12 days)
Post-test: post measurements

Analyzed n | Status  | Treatment
15 stores  | Test    | CCTV ePVMs
14 stores  | Test    | Display fixtures
19 stores  | Control | No treatment (open display)
Hold on a second…
We initially had 20 stores assigned to each group, and now we see that's not the case. Problems with data collection:
- Stores installed interventions during the pre-test period
- Control stores installed fixtures
A lot can go wrong over the course of a project. Data integrity is incredibly important: it is better to lose a few stores from a project than to keep them and potentially skew your results.
Results: Shrink
Solution | Pre-test shrink | Post-test shrink
Fixture  | M = 45.43       | M = 24.79
ePVM     | M = 18.00       | M = 14.13
Control  | M = 16.37       | M = 18.00
Results: Sales
Solution | Pre-test sales | Post-test sales
Fixture  | M = 69.39      | M = 58.21
ePVM     | M = 62.13      | M = 62.73
Control  | M = 49.26      | M = 48.84
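The results tables report only group means, but a first look at the effect is simple arithmetic: each group's pre-to-post change, its percent change, and its change relative to the control group (a difference-in-differences on the means). The sketch below just recomputes those figures from the reported values. Without per-store data it gives point estimates only, with no measure of uncertainty.

```python
# Group means as reported on the results slides: (pre-test, post-test)
SHRINK = {"Fixture": (45.43, 24.79), "ePVM": (18.00, 14.13), "Control": (16.37, 18.00)}
SALES = {"Fixture": (69.39, 58.21), "ePVM": (62.13, 62.73), "Control": (49.26, 48.84)}


def summarize(means, control="Control"):
    """Pre-to-post change per group, percent change, and change vs. control."""
    ctrl_change = means[control][1] - means[control][0]
    out = {}
    for group, (pre, post) in means.items():
        change = post - pre
        out[group] = {
            "change": round(change, 2),
            "pct_change": round(100 * change / pre, 1),
            "diff_vs_control": round(change - ctrl_change, 2),
        }
    return out


for name, table in (("Shrink", SHRINK), ("Sales", SALES)):
    print(name, summarize(table))
```

On these means, fixture-store shrink fell by 20.64 (about 45%) while control shrink rose by 1.63, a difference of about 22.27 relative to control; the ePVM difference is about 5.50, and fixture-store sales fell about 10.76 relative to control. Because the fixture stores started from a much higher baseline, part of their drop may reflect regression toward the mean rather than the intervention.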
Suggestions
- Notice the much higher pre-test shrink, and higher pre-test sales, in the group assigned fixtures.
- Preliminary knowledge of sales and shrink for the target products should be used as a blocking variable: assignments could be made by store type and then, within each block, based on sales/shrink.
- All results were aggregated; a store-by-store breakdown of the results might have revealed some interesting findings.
- Business fluctuations were not very dramatic over the test period, so a crossover design could have provided more information on the impact of the AP measures than the design used.
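One way to act on the blocking suggestion is to block by store type and then, within each type, rank stores by baseline shrink and randomize the three arms within consecutive sets of three similar stores. A minimal sketch, assuming a hypothetical (store_id, store_type, baseline_shrink) input and the same three hypothetical arm labels; if a type's store count is not a multiple of three, the leftover stores get a random subset of the arms.

```python
import random
from collections import defaultdict

ARMS = ("fixture", "ePVM", "control")  # hypothetical arm labels


def matched_assignment(stores, seed=0):
    """Assign arms within sets of stores that have similar baseline shrink.

    stores: list of (store_id, store_type, baseline_shrink) tuples.
    Within each store type, stores are ranked by baseline shrink and taken
    in consecutive sets of len(ARMS); each set gets the arms in random order.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for store_id, store_type, baseline in stores:
        by_type[store_type].append((baseline, store_id))
    assignment = {}
    for store_type in sorted(by_type):
        ranked = sorted(by_type[store_type])  # lowest baseline shrink first
        for start in range(0, len(ranked), len(ARMS)):
            matched_set = ranked[start:start + len(ARMS)]
            arms = list(ARMS)
            rng.shuffle(arms)  # random arm order within each matched set
            for (_, store_id), arm in zip(matched_set, arms):
                assignment[store_id] = arm
    return assignment


# Hypothetical stores of one type with their pre-test shrink
stores = [("S1", "A", 5), ("S2", "A", 50), ("S3", "A", 10),
          ("S4", "A", 45), ("S5", "A", 12), ("S6", "A", 40)]
print(matched_assignment(stores))
```

Because each arm receives one store from every matched set, a group like the fixture stores could not start with much higher baseline shrink than the others.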