Published by Warren Ray. Modified over 9 years ago.
1
Kevin Savage Toilet Stats
2
Measuring usage
This talk is about measuring usage
- What we measure
- How we use R to predict future usage
- Specific code examples
- Things that didn't work for us
3
Background
Mendeley is an academic reference management system
- Desktop client
- Automatic metadata extraction
- Crowdsourced library of documents
- Website, iOS, Android, APIs...
Make it easy to manage and reference academic literature
4
Background
- Mendeley was bought in 2013 by the publisher Elsevier
- Up until then we used burn rate as our measure
- Our targets were changed to growth
- Growth is measured in quite a complicated way
5
Burn rate
- As a start-up you have funding from investors and income from customers
- You have to make a profit before you "burn" all your investment money
- If you don't, the business fails
- You measure your "burn rate"
6
Core Users
Core Users = Measured Core Users + Estimated Core Users
7
Core Users
Core Users = Measured Core Users + Estimated Core Users
- MCU if > X sessions in the past 24 weeks
8
Core Users
Core Users = Measured Core Users + Estimated Core Users
- MCU if > X sessions in the past 24 weeks
- ECU if > Y sessions in the second week and they are < 24 weeks old
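A minimal sketch of the counting rule on this slide, in R. The thresholds X and Y are not given in the talk, so `x_threshold` and `y_threshold` are hypothetical placeholders, and the user table and its column names are invented for illustration. It also assumes that a user who already counts as measured is not counted again as estimated, which the slide does not state.

```r
# Hypothetical thresholds: the slides leave X and Y unspecified.
x_threshold <- 10  # stands in for X (sessions in the past 24 weeks)
y_threshold <- 3   # stands in for Y (sessions in the second week)

# Invented toy data, for illustration only.
users <- data.frame(
  id                = c("a", "b", "c", "d"),
  age_weeks         = c(60, 30, 5, 12),
  sessions_24_weeks = c(25, 4, 6, 2),
  sessions_week_two = c(1, 0, 5, 1)
)

# Measured Core User: more than X sessions in the past 24 weeks.
users$measured <- users$sessions_24_weeks > x_threshold

# Estimated Core User: younger than 24 weeks with more than Y sessions in
# week two. Assumption: users already counted as measured are excluded.
users$estimated <- !users$measured &
  users$age_weeks < 24 &
  users$sessions_week_two > y_threshold

# Core Users = Measured Core Users + Estimated Core Users
core_users <- sum(users$measured) + sum(users$estimated)
print(users[, c("id", "measured", "estimated")])
print(core_users)
```

With this toy table, user "a" is measured and user "c" is estimated, so the total is two core users.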
9
Targets
[Graph showing model]
10
Targets
- This seems like a reasonable idea
- This seems like a reasonable way of measuring it
- Ideally we would like to know we are increasing core users when we make software changes
- Can we do some analysis?
11
Properties of Core Users
- Long term, hard to see cause and effect
- We have an issue with delayed event capture
- Estimated Core Users is not a very good estimate
- Our usage is very seasonal
12
Long term
13
Messages arriving after the event
14
Classifiers
- Estimated Core Users is a prediction of becoming a Measured Core User
- How good a prediction is it?
15
Measuring classifiers
- Precision: if we predict someone will be a Measured Core User, how likely is it they will be?
- Recall: if someone becomes a Measured Core User, how likely is it that we predicted this?
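As a small illustration of these two definitions, the sketch below computes precision and recall in R from a pair of logical vectors. The data is invented (it is not Mendeley data) and was chosen so that both values come out at 0.5.

```r
# predicted: was the user flagged as an Estimated Core User?
# actual:    did the user later become a Measured Core User?
# Invented toy data, for illustration only.
predicted <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
actual    <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)

true_pos  <- sum(predicted & actual)    # predicted and became core
false_pos <- sum(predicted & !actual)   # predicted but did not become core
false_neg <- sum(!predicted & actual)   # became core but was not predicted

# Precision: of those we predicted, what fraction became core users?
precision <- true_pos / (true_pos + false_pos)

# Recall: of those who became core users, what fraction did we predict?
recall <- true_pos / (true_pos + false_neg)

print(c(precision = precision, recall = recall))
```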
16
Performance of our classifier
- Precision ~ 0.5
- Recall ~ 0.5
17
Improving the classifier
- Logistic regression
- K-means
- PCA/segmentation
- GLA
- Random forests
- Decision trees
18
Seasonality of the data
[Graph showing model]
19
STL
- Seasonal Decomposition of Time Series by Loess (Cleveland et al., 1990)
- Decomposes a series into seasonal, trend and remainder components
- Cleveland shows that it is performant even for long data series
20
Loess smoothing
- LOcal regrESSion
- Fit a local polynomial of degree d to the nearest q points, weighted by distance
- As q -> ∞ this becomes ordinary least squares
- With d = 0 this is a (weighted) moving average
- We use d = 1 (locally linear) and q = season length (a year)
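The effect of q can be sketched with base R's `loess()`, which takes the neighbourhood as a fraction of the data (`span`) rather than as a point count. The series below is synthetic, and the choice of 26 points for the local fit is arbitrary. The second fit uses a very large span so that all points get nearly equal weight, which should bring it close to the ordinary least squares line from `lm()`.

```r
set.seed(1)

# Synthetic weekly series: linear growth plus a yearly cycle plus noise.
x <- 1:104
y <- 0.5 * x + 10 * sin(2 * pi * x / 52) + rnorm(length(x))

# Locally linear fit (d = 1) using roughly the nearest q = 26 points.
q <- 26
local_fit <- loess(y ~ x, degree = 1, span = q / length(x),
                   control = loess.control(surface = "direct"))

# Very wide neighbourhood: every point is used with almost equal weight.
wide_fit <- loess(y ~ x, degree = 1, span = 100,
                  control = loess.control(surface = "direct"))

# Ordinary least squares for comparison.
ols_fit <- lm(y ~ x)

# Largest difference between the wide loess fit and the OLS line.
max_gap <- max(abs(predict(wide_fit) - fitted(ols_fit)))
print(max_gap)

# The local fit follows the yearly cycle, so its residuals are smaller.
print(c(local = sum(residuals(local_fit)^2), ols = sum(residuals(ols_fit)^2)))
```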
21
STL
- We want to calculate Y = S + T + R
- STL works iteratively
- We start from an initial trend of T = 0
- We only use the ‘inner loop’ and only iterate twice, but you can also use an ‘outer loop’
- STL then post-smooths the seasonal component
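The settings on this slide can be written out explicitly when calling base R's `stl()`. The sketch below uses a synthetic weekly series, since the original data is not available, and passes `inner = 2`, `outer = 0` and `t.degree = 1`, which match `stl()`'s defaults for a non-robust fit. It then checks that the three components add back up to the original series.

```r
set.seed(42)

# Synthetic weekly series covering four years (invented data).
n <- 52 * 4
week <- seq_len(n)
y <- 100 + 0.8 * week + 15 * sin(2 * pi * week / 52) + rnorm(n, sd = 3)
usage <- ts(y, start = c(2011, 43), frequency = 52)

# Two passes of the inner loop, no outer (robustness) iterations,
# locally linear trend smoothing, and a periodic seasonal component.
fit <- stl(usage, s.window = "periodic", t.degree = 1, inner = 2, outer = 0)

parts     <- fit$time.series
seasonal  <- as.numeric(parts[, "seasonal"])
trend     <- as.numeric(parts[, "trend"])
remainder <- as.numeric(parts[, "remainder"])

# Y = S + T + R should hold up to floating point error.
recon_error <- max(abs(seasonal + trend + remainder - as.numeric(usage)))
print(recon_error)
```

With `s.window = "periodic"` the seasonal component is the same in every year, so any year-to-year change ends up in the trend or the remainder.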
22
For the parameters we use this is...
- Calculate Y - T
- Break into cycle-subseries
- Smooth the above with loess to give C
- Low-pass filter C to give L
- Detrend C by calculating C - L to give new S
- Smooth Y - S to give new T
- Repeat
- Smooth S and calculate R as Y - S - T
23
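Code
# Read the series from the second column of a semicolon-separated file
series <- read.table(file="data.csv", header=FALSE, sep=";")$V2
# Weekly time series starting in week 43 of 2011
myts <- ts(data=series, start=c(2011, 43), frequency=52)
# Decompose into seasonal, trend and remainder with a periodic seasonal
mystl <- stl(myts, s.window="periodic")
plot(mystl)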
24
Result of decomposition [Graph of decomp]
25
Prediction
- Take the trend
- Use a linear model to predict future trend
- Add back in the seasonal
- Add in some error estimates based on fit
- There is an R package for this on CRAN
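The steps on this slide can be sketched by hand in base R. This is an illustration on synthetic data under simplifying assumptions, not necessarily what the CRAN package does internally: the trend is extrapolated with `lm()`, the periodic seasonal value for the matching week is added back, and the interval comes only from the linear fit to the trend, so it is likely to understate the real uncertainty.

```r
set.seed(7)

# Synthetic weekly series covering three years (invented data).
n <- 52 * 3
week <- seq_len(n)
usage <- ts(200 + 1.5 * week + 20 * sin(2 * pi * week / 52) + rnorm(n, sd = 4),
            start = c(2011, 43), frequency = 52)

# Step 1: take the trend (and seasonal) from an STL decomposition.
fit <- stl(usage, s.window = "periodic")
trend    <- as.numeric(fit$time.series[, "trend"])
seasonal <- as.numeric(fit$time.series[, "seasonal"])

# Step 2: fit a linear model to the trend and extrapolate 26 weeks ahead.
trend_model <- lm(trend ~ week)
horizon <- 26
future_week <- n + seq_len(horizon)
trend_pred <- predict(trend_model, newdata = data.frame(week = future_week),
                      interval = "prediction", level = 0.95)

# Step 3: add back the seasonal value for the same position in the year.
season_index <- ((future_week - 1) %% 52) + 1
seasonal_future <- seasonal[season_index]

# Step 4: point forecast with error bounds based on the trend fit only.
forecast_tbl <- data.frame(
  week  = future_week,
  point = trend_pred[, "fit"] + seasonal_future,
  lower = trend_pred[, "lwr"] + seasonal_future,
  upper = trend_pred[, "upr"] + seasonal_future
)
print(head(forecast_tbl))
```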
26
Code
library(forecast)
# Forecast from the STL decomposition of the weekly series myts
mystlf <- stlf(myts, s.window="periodic")
plot(mystlf)
27
Prediction
28
Looking good...
29
Looking ok...
30
Looking...
31
Oh…
32
What happened? ?
33
Prediction vs Reality [Graph of prediction]
34
Assuming we had the data
35
Why are we underestimating?
Possibilities:
- The underlying trend is not linear
- We’re doing better than expected:
  - iOS app
  - integration with other products
  - improvements to existing products
36
Where are we now?
- We might already have hit the target
- Once new messages come in we should exceed the target
- Very high chance of success (according to our model)
- It could still go wrong...
37
Next steps
- Can we improve our results by assuming a non-linear trend?
- Can we use seasonality to make better predictions?
- Can we feed core user estimates back into our design process?
- Does the metric need changing?
38
Communication Toilet stats [picture of toilet]
39
Questions ?