Kevin Savage Toilet Stats

Measuring usage
This talk is about measuring usage: what we measure, how we use R to predict future usage, specific code examples, and things that didn't work for us.

Background
Mendeley is an academic reference management system: desktop client, automatic metadata extraction, crowdsourced library of documents, website, iOS, Android, APIs... The aim is to make it easy to manage and reference academic literature.

Background
Mendeley was bought by the publisher Elsevier in 2013. Up until then we used burn rate as our measure; after that, our targets were changed to growth, which is measured in quite a complicated way.

Burn rate
As a start-up you have funding from investors and income from customers. You have to make a profit before you "burn" through all your investment money; if you don't, the business fails. So you measure your "burn rate".

Core Users
Core Users = Measured Core Users (MCU) + Estimated Core Users (ECU)
A user is an MCU if they had more than X sessions in the past 24 weeks.
A user is an ECU if they had more than Y sessions in their second week and are less than 24 weeks old.
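The two rules can be written as a small counting function. This is a minimal sketch only. The column names (`sessions_past_24w`, `sessions_week2`, `age_weeks`), the toy data and the values passed as `x` and `y` are hypothetical stand-ins for the unstated thresholds X and Y. Not counting a user as both MCU and ECU is also an assumption.

```r
# Hypothetical per-user data and thresholds; not Mendeley's real schema or values.
count_core_users <- function(users, x, y) {
  # Measured Core User: more than x sessions in the past 24 weeks
  mcu <- users$sessions_past_24w > x
  # Estimated Core User: more than y sessions in the second week and
  # younger than 24 weeks. Assumption: users already counted as MCU are
  # not counted again as ECU.
  ecu <- !mcu & users$sessions_week2 > y & users$age_weeks < 24
  c(measured = sum(mcu), estimated = sum(ecu), core = sum(mcu) + sum(ecu))
}

users <- data.frame(
  sessions_past_24w = c(30, 5, 2, 0),
  sessions_week2    = c(4, 6, 1, 3),
  age_weeks         = c(52, 3, 10, 2)
)
count_core_users(users, x = 10, y = 2)
```

With these made-up rows, one user is measured and two young users are estimated, for three core users in total.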

Targets
[Graph showing model]

Targets
This seems like a reasonable idea, and this seems like a reasonable way of measuring it. Ideally we would like to know whether we are increasing core users when we make software changes. Can we do some analysis?

Properties of Core Users
The metric is long term, so it is hard to see cause and effect. We have an issue with delayed event capture. Estimated Core Users is not a very good estimate. Our usage is very seasonal.

Long term

Messages arriving after the event

Classifiers
Estimated Core Users is a prediction of who will become a Measured Core User. How good a prediction is it?

Measuring classifiers
Precision: if we predict someone will be a Measured Core User, how likely is it that they will be?
Recall: if someone becomes a Measured Core User, how likely is it that we predicted this?
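Both measures can be computed from two logical vectors, one for the predictions and one for what actually happened. The function name and the eight data points below are made up for illustration.

```r
# predicted: TRUE where we flagged the user as an Estimated Core User
# actual:    TRUE where the user later became a Measured Core User
precision_recall <- function(predicted, actual) {
  tp <- sum(predicted & actual)   # predicted core, became core
  fp <- sum(predicted & !actual)  # predicted core, did not become core
  fn <- sum(!predicted & actual)  # became core, but we did not predict it
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# Made-up example
predicted <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
actual    <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)
precision_recall(predicted, actual)
```

Here two of the four predictions are right (precision 0.5) and two of the four actual core users were predicted (recall 0.5).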

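Performance of our classifier
Precision ~ 0.5, recall ~ 0.5: roughly half the users we predict will become Measured Core Users actually do, and we predict only about half of those who do.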

Improving the classifier
Logistic regression, k-means, PCA/segmentation, GLA, random forests, decision trees
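As a minimal sketch of the first approach on that list, logistic regression models the probability that a user becomes a Measured Core User from early-usage features. The single feature (`sessions_week2`) and the ten data points are made up; this is not the model Mendeley fitted.

```r
# Toy data (made up): sessions in a user's second week, and whether the user
# later became a Measured Core User (1) or not (0)
toy <- data.frame(
  sessions_week2 = 0:9,
  became_mcu     = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
)

# Logistic regression: probability of becoming an MCU given early usage
fit <- glm(became_mcu ~ sessions_week2, family = binomial, data = toy)

# Predicted probabilities, turned into a yes/no prediction with a 0.5 cut-off
p <- predict(fit, type = "response")
predicted <- p > 0.5
table(predicted = predicted, actual = toy$became_mcu == 1)
```

The 0.5 cut-off is an arbitrary choice here; moving it trades precision against recall.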

Seasonality of the data
[Graph showing model]

STL
Seasonal Decomposition of Time Series by Loess (Cleveland et al., 1990) decomposes a series into seasonal, trend and remainder components. Cleveland et al. show that it is fast even for long data series.

Loess smoothing
LOcal regrESSion: at each point, fit a polynomial of degree d to the nearest q points, weighted by their distance. As q → ∞ this becomes an ordinary least-squares polynomial fit; with d = 0 it is a weighted moving average. We use d = 1 (locally linear) and q = the season length (a year).
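The loess step can be sketched for a single point in base R. `local_fit` is a hypothetical helper written for illustration, not the routine used inside `stl()` or `loess()`. It follows the description above: tricube weights on the q nearest points, a weighted degree-d polynomial fit, and the fitted value at x0. It requires q to be no larger than the data, so it does not cover the q → ∞ limit.

```r
# Hypothetical helper: a loess-style local polynomial fit at one point x0.
# Tricube weights on the q nearest points, scaled by the distance to the
# q-th nearest point (which therefore gets weight 0).
local_fit <- function(x, y, x0, q, d = 1) {
  stopifnot(q <= length(x))
  dist <- abs(x - x0)
  nearest <- order(dist)[seq_len(q)]          # indices of the q nearest points
  lambda <- max(dist[nearest])                # distance to the q-th nearest point
  w <- (1 - (dist[nearest] / lambda)^3)^3     # tricube weights
  X <- outer(x[nearest] - x0, 0:d, `^`)       # columns 1, (x - x0), ..., (x - x0)^d
  fit <- lm.wfit(X, y[nearest], w)            # weighted least squares
  unname(fit$coefficients[1])                 # intercept = fitted value at x0
}

x <- 1:20
y <- 2 * x + 1                                # a straight line
local_fit(x, y, x0 = 10, q = 7, d = 1)        # interior, locally linear
local_fit(x, y, x0 = 1,  q = 7, d = 1)        # edge, locally linear
local_fit(x, y, x0 = 1,  q = 7, d = 0)        # edge, weighted moving average
```

On a straight line the locally linear fit reproduces the line (21 at x0 = 10, 3 at x0 = 1) even at the edge. The d = 0 weighted moving average at x0 = 1 is pulled upwards because all its neighbours lie on one side. This is one reason to prefer d = 1.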

STL
We want to decompose the series as Y = S + T + R (seasonal + trend + remainder). STL works iteratively, starting from an initial value T = 0. We only use the 'inner loop' and only iterate twice, but you can also use an 'outer loop', which adds robustness weights against outliers. STL then post-smooths the seasonal component.

For the parameters we use, this is:
1. Calculate Y - T.
2. Break the result into cycle-subseries (for weekly data, one subseries per week of the year).
3. Smooth each cycle-subseries with loess to give C.
4. Low-pass filter C to give L.
5. Detrend C by calculating C - L to give the new S.
6. Smooth Y - S to give the new T.
7. Repeat steps 1 to 6.
Finally, smooth S and calculate R = Y - S - T.
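The steps can be sketched in base R for the periodic case used by the code on the next slide (`s.window = "periodic"`). This is a simplified illustration, not R's `stl()`. It makes several assumptions. Smoothing each cycle-subseries is replaced by its mean, which is roughly what a periodic seasonal window amounts to. The low-pass filter of an exactly periodic C reduces to its mean over one period. There is no outer loop and no final post-smoothing of S. The default `t_window = 1.5 * np` is roughly what `stl()` picks for a periodic seasonal. `stl_periodic_sketch` is a hypothetical name.

```r
# Simplified sketch of the STL inner loop for a periodic seasonal component.
stl_periodic_sketch <- function(y, np, iterations = 2, t_window = 1.5 * np) {
  n <- length(y)
  stopifnot(n >= np)                             # need at least one full cycle
  t_idx <- seq_len(n)
  phase <- (t_idx - 1) %% np + 1                 # cycle-subseries of each point
  trend <- rep(0, n)                             # initial value T = 0
  for (k in seq_len(iterations)) {
    detrended <- y - trend                       # step 1: Y - T
    sub_means <- tapply(detrended, phase, mean)  # steps 2-3: "smooth" each subseries
    cyc <- as.numeric(sub_means[phase])          # C, one value per time point
    low <- mean(sub_means)                       # step 4: low-pass of a periodic C
    seasonal <- cyc - low                        # step 5: S = C - L
    deseasonalised <- y - seasonal
    fit <- loess(deseasonalised ~ t_idx, degree = 1,
                 span = min(1, t_window / n))    # step 6: smooth Y - S to get T
    trend <- as.numeric(fitted(fit))
  }
  list(seasonal = seasonal, trend = trend, remainder = y - seasonal - trend)
}

# Synthetic weekly data (not Mendeley data): linear trend plus a yearly cycle
weeks <- 1:156
y <- 0.05 * weeks + 2 * sin(2 * pi * weeks / 52)
parts <- stl_periodic_sketch(y, np = 52)
```

By construction the seasonal component repeats every 52 weeks and sums to zero over a year, and the three parts add back up to Y.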

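Code
```r
# Weekly usage counts: second column of a semicolon-separated file with no header
series <- read.table(file = "data.csv", header = FALSE, sep = ";")$V2
# Weekly time series starting in week 43 of 2011
myts <- ts(data = series, start = c(2011, 43), frequency = 52)
# Periodic seasonal window; the defaults (robust = FALSE) give 2 inner iterations and no outer loop
mystl <- stl(myts, s.window = "periodic")
plot(mystl)
```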

Result of decomposition
[Graph of decomposition]

Prediction
Take the trend, use a linear model to predict the future trend, add the seasonal component back in, and add some error estimates based on the fit. There is an R package for this on CRAN (forecast).
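The recipe can be sketched in base R without the forecast package. `stl_linear_forecast` is a hypothetical helper. The intervals come from `predict(..., interval = "prediction")` on the regression of the STL trend on time only. That is a simplifying assumption: they ignore the remainder component, so they understate the real uncertainty.

```r
stl_linear_forecast <- function(y, h, level = 0.95) {
  dec <- stl(y, s.window = "periodic")
  trend <- as.numeric(dec$time.series[, "trend"])
  seasonal <- as.numeric(dec$time.series[, "seasonal"])
  n <- length(y)
  week_idx <- seq_len(n)
  fit <- lm(trend ~ week_idx)                     # linear model for the trend
  future <- data.frame(week_idx = n + seq_len(h))
  pred <- predict(fit, newdata = future, interval = "prediction", level = level)
  # A periodic seasonal component repeats every frequency(y) observations,
  # so step n + k reuses the value at the same position in the first cycle
  np <- frequency(y)
  future_seasonal <- seasonal[(n + seq_len(h) - 1) %% np + 1]
  pred + future_seasonal                          # columns: fit, lwr, upr
}

# Synthetic weekly series (not Mendeley data): linear growth plus a yearly cycle
weeks <- 1:156
y <- ts(0.5 * weeks + 10 * sin(2 * pi * weeks / 52), frequency = 52)
fc <- stl_linear_forecast(y, h = 10)
head(fc)
```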

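Code
```r
library(forecast)
# stlf() forecasts the STL seasonally adjusted series (ETS by default) and re-adds the seasonal part
mystlf <- stlf(myts, s.window = "periodic")
plot(mystlf)
```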

Prediction

Looking good...

Looking ok...

Looking...

Oh…

What happened?

Prediction vs Reality
[Graph of prediction]

Assuming we had the data

Why are we underestimating?
Possibilities:
– The underlying trend is not linear.
– We're doing better than expected, possibly because of:
  – the iOS app
  – integration with other products
  – improvements to existing products
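To illustrate the first possibility, the sketch below uses made-up data with an accelerating trend. A straight line fitted to the first 80 weeks underestimates every one of the next 20 weeks, while a quadratic trend does not. This only shows the mechanism; it does not establish whether the real trend is non-linear.

```r
# Synthetic, accelerating (convex) trend: not Mendeley data
week <- 1:100
usage <- 0.01 * week^2
history <- data.frame(week = week[1:80], usage = usage[1:80])
future  <- data.frame(week = week[81:100])

linear    <- lm(usage ~ week, data = history)
quadratic <- lm(usage ~ poly(week, 2, raw = TRUE), data = history)

# Forecast errors (actual minus predicted) on the 20 held-out weeks
err_linear    <- usage[81:100] - predict(linear, newdata = future)
err_quadratic <- usage[81:100] - predict(quadratic, newdata = future)
summary(err_linear)      # all positive: the linear model underestimates
summary(err_quadratic)   # essentially zero
```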

Where are we now?
We might already have hit the target: once the late-arriving messages come in, we should exceed it. According to our model there is a very high chance of success, but it could still go wrong...

Next steps
– Can we improve our results by assuming a non-linear trend?
– Can we use seasonality to make better predictions?
– Can we feed core user estimates back into our design process?
– Does the metric need changing?

Communication
Toilet stats
[Picture of toilet]

Questions?