Popular Technology Blog. Has articles on: tech industry trends, start-ups, new technologies, and expert opinions.

1

2  Popular technology blog. Has articles on: tech industry trends, start-ups, new technologies, and expert opinions.

3  The client wishes to: enhance traffic; increase readership; identify “core” visitors and visitor groups.

4  Classification and Regression Trees (CART): often performs better than linear models when the relationships in the data are non-linear; easy to implement; easy to interpret.
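
To make the "easy to interpret" point concrete, here is a minimal sketch of how a CART-style classifier could pick a single split using Gini impurity. It is written in Python for illustration (the project itself used R), the visit_count and returned values are made-up toy data, and the helper names are hypothetical. Real CART implementations usually place thresholds midway between observed values, which gives the same partition.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    Returns (None, impurity of the unsplit data) if no split improves on it.
    """
    n = len(values)
    best = (None, gini(labels))
    # The largest value is skipped: splitting there leaves the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: visit counts and whether the visitor came back (1) or not (0).
visit_count = [1, 1, 2, 3, 5, 8, 12, 15]
returned = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(visit_count, returned)
print(f"split on visitCount <= {threshold}, weighted Gini = {impurity}")
```

A full tree repeats this search on each side of the split, which is why the result reads as a chain of simple "field <= value" rules.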

5  R: version 3.0.1, free download from http://www.r-project.org/. Google Analytics: www.google.com/analytics. For names and descriptions of the data fields: http://tinyurl.com/dyzst66

6  Data obtained from Google Analytics. Contains the following fields: visitorID, daySinceLastVisit, visitCount, pageDepth. The aim of the study is to “predict if the visitor will visit again”.

7  Dataset creation: An API connection was set up to pull data from Google Analytics into R, using OAuth 2.0. A random sample of 400 observations was chosen. This sample was split into a “training” dataset and a “test” dataset on a 60:40 basis (arbitrarily chosen; 65:35 or 70:30 is fine too). The visitorID field was “derived” from other fields. Visitor information has been kept anonymous.
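
The sampling and 60:40 split described above can be sketched as follows. This is an illustrative Python version, not the project's R code. The function name, the seed, and the stand-in records list are assumptions; only the sample size of 400 and the 60:40 ratio come from the slide.

```python
import random


def sample_and_split(records, sample_size=400, train_fraction=0.6, seed=42):
    """Draw a random sample, then split it into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    # rng.sample returns the chosen rows in random order,
    # so slicing the result gives a random split.
    sample = rng.sample(records, sample_size)
    cut = int(round(sample_size * train_fraction))
    return sample[:cut], sample[cut:]


# Hypothetical stand-in for rows pulled from Google Analytics.
records = list(range(5000))

train, test = sample_and_split(records)
print(len(train), len(test))
```

Changing train_fraction to 0.65 or 0.7 gives the alternative splits the slide mentions.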

8  Prediction accuracy of 95–98% obtained. Client feedback: Can additional variables be added to this model? Can page details (i.e. landing page, exit page, etc.) be included? (Yes, landing page and exit page can be included in the model. Any other relevant qualitative (i.e. non-numeric) field can also be included.)
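
For reference, prediction accuracy is the share of test-set visitors whose outcome the model predicted correctly. A minimal Python sketch follows. The labels below are invented for illustration and are not the project's results.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual outcomes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical test-set labels: 1 = visitor returned, 0 = did not return.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

acc = accuracy(actual, predicted)
print(f"accuracy = {acc:.0%}")
```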

9  Client feedback: Can this analysis be run by a person who does not know the R language? (Yes. An R script can be created and scheduled to run at the required intervals. Results can be stored at the desired locations on the network.)

10  Client feedback: Is this model applicable when the values of the variables are outside the range seen in the current data set, for example if visitCount values are greater than 20? (Yes. An advantage of the CART model is that it adapts to the data it is fitted on. If the values change drastically, the “tree structure” may change when the model is refitted, but the approach remains valid and the results reliable. Other analysis techniques may also be applicable, but the use of CART is still justified.)
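
One reason a tree copes with out-of-range inputs is that each split is only a threshold comparison, so a value far above anything seen in training simply follows the "greater than" branch. The Python sketch below illustrates this. The tree, its thresholds, and the visitor records are hypothetical and are not the fitted model from the project.

```python
# Hypothetical fitted tree: internal nodes test "field <= threshold",
# leaves carry the prediction (1 = will visit again, 0 = will not).
tree = {
    "field": "visitCount",
    "threshold": 3,
    "left": {"prediction": 0},
    "right": {
        "field": "daySinceLastVisit",
        "threshold": 7,
        "left": {"prediction": 1},
        "right": {"prediction": 0},
    },
}


def predict(node, visitor):
    """Walk from the root to a leaf and return that leaf's prediction."""
    while "prediction" not in node:
        go_left = visitor[node["field"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["prediction"]


in_range = {"visitCount": 12, "daySinceLastVisit": 2}
out_of_range = {"visitCount": 250, "daySinceLastVisit": 2}  # far beyond 20

print(predict(tree, in_range), predict(tree, out_of_range))
```

Both visitors land in the same leaf. Whether that leaf is still the right answer for such extreme values is something only refitting on newer data can confirm.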

11  Ideas for further analysis: As the client suggested, landing page and exit page data can be included in the analysis. Google Analytics makes a large number of data fields available, and the analysis could be expanded to include some of them. Visitors could be divided into subgroups based on visit frequency, content viewed, etc., and a separate model built for each subgroup. This could help make more accurate predictions.
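
The subgroup idea could start with something as simple as the Python sketch below, which buckets visitors by visit frequency so that a separate model can later be fitted per bucket. The group names, the cutoffs of 1 and 5 visits, and the sample visitors are all assumptions made for illustration.

```python
from collections import defaultdict


def frequency_group(visit_count):
    """Assign a visitor to a frequency bucket (cutoffs are hypothetical)."""
    if visit_count <= 1:
        return "new"
    if visit_count <= 5:
        return "occasional"
    return "frequent"


def split_into_subgroups(visitors):
    """Group visitor records by frequency bucket."""
    groups = defaultdict(list)
    for visitor in visitors:
        groups[frequency_group(visitor["visitCount"])].append(visitor)
    return dict(groups)


# Hypothetical visitor records.
visitors = [
    {"visitorID": "a", "visitCount": 1},
    {"visitorID": "b", "visitCount": 4},
    {"visitorID": "c", "visitCount": 18},
    {"visitorID": "d", "visitCount": 2},
]

subgroups = split_into_subgroups(visitors)
for name, members in subgroups.items():
    print(name, [m["visitorID"] for m in members])
```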

12  The project was spread over a period of 5 months. Data was collected at different points in time during this period to get as random a dataset as possible. 80% of the project time was spent on data gathering, cleaning and authentication, which is typical of most predictive analysis projects.

13  The exact dollar amounts are confidential. The project needed about 80 hours of analyst time and about 13 hours of manager time (on the client side). Software license costs were zero, since R is a free download.

