Wikipedia Traffic Forecasting

Slides:



Advertisements
Similar presentations
DSCI 5340: Predictive Modeling and Business Forecasting Spring 2013 – Dr. Nick Evangelopoulos Exam 1 review: Quizzes 1-6.
Advertisements

Exercise 7.5 (p. 343) Consider the hotel occupancy data in Table 6.4 of Chapter 6 (p. 297)
Experiments and Variables
4-1 Operations Management Forecasting Chapter 4 - Part 2.
Introduction to Data Mining with XLMiner
Forecasting Purpose is to forecast, not to explain the historical pattern Models for forecasting may not make sense as a description for ”physical” behaviour.
T T18-03 Exponential Smoothing Forecast Purpose Allows the analyst to create and analyze the "Exponential Smoothing Average" forecast. The MAD.
Assignment week 38 Exponential smoothing of monthly observations of the General Index of the Stockholm Stock Exchange. A. Graphical illustration of data.
Analyzing and Forecasting Time Series Data
Chapter 12 - Forecasting Forecasting is important in the business decision-making process in which a current choice or decision has future implications:
Chapter 5 Time Series Analysis
Data Sources The most sophisticated forecasting model will fail if it is applied to unreliable data Data should be reliable and accurate Data should be.
Clementine Server Clementine Server A data mining software for business solution.
Forecasting & Time Series Minggu 6. Learning Objectives Understand the three categories of forecasting techniques available. Become aware of the four.
Quantitative Business Forecasting Introduction to Business Statistics, 5e Kvanli/Guynes/Pavur (c)2000 South-Western College Publishing.
Statistical Forecasting Models
T T18-06 Seasonal Relatives Purpose Allows the analyst to create and analyze the "Seasonal Relatives" for a time series. A graphical display of.
Chapter 12 Section 1 Inference for Linear Regression.
Business Forecasting Chapter 5 Forecasting with Smoothing Techniques.
Slides 13b: Time-Series Models; Measuring Forecast Error
Applied Business Forecasting and Planning
Writing Lab Report 3 Understanding the Mechanisms That Control the Rates of Enzymatic Reactions.
The Forecast Process Dr. Mohammed Alahmed
Copyright © 2012 Pearson Education, Inc. All rights reserved. Chapter 10 Introduction to Time Series Modeling and Forecasting.
Effect Sizes for Meta-analysis of Single-Subject Designs S. Natasha Beretvas University of Texas at Austin.
Analyzing and Interpreting Quantitative Data
Holt’s exponential smoothing
DSc 3120 Generalized Modeling Techniques with Applications Part II. Forecasting.
© 2010 AT&T Intellectual Property. All rights reserved. AT&T, the AT&T logo and all other AT&T marks contained herein are trademarks of AT&T Intellectual.
Data MINING Data mining is the process of extracting previously unknown, valid and actionable information from large data and then using the information.
Week 11 Introduction A time series is an ordered sequence of observations. The ordering of the observations is usually through time, but may also be taken.
Slide 1 DSCI 5340: Predictive Modeling and Business Forecasting Spring 2013 – Dr. Nick Evangelopoulos Lecture 5: Exponential Smoothing (Ch. 8) Material.
Time series Model assessment. Tourist arrivals to NZ Period is quarterly.
1 1 Slide © 2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Introducing ITSM 2000 By: Amir Heshmaty Far. S EVERAL FUNCTION IN ITSM to analyze and display the properties of time series data to compute and display.
Wobbles, humps and sudden jumps1 Transitions in time: what to look for and how to describe them …
FREQUANCY DISTRIBUTION 8, 24, 18, 5, 6, 12, 4, 3, 3, 2, 3, 23, 9, 18, 16, 1, 2, 3, 5, 11, 13, 15, 9, 11, 11, 7, 10, 6, 5, 16, 20, 4, 3, 3, 3, 10, 3, 2,
4-1 Operations Management Forecasting Chapter 4 - Part 2.
Big Data at Home Depot KSU – Big Data Survey Course Steve Einbender Advanced Analytics Architect.
MANAGEMENT SCIENCE AN INTRODUCTION TO
Chapter 6: Analyzing and Interpreting Quantitative Data
The Box-Jenkins (ARIMA) Methodology
QM Spring 2002 Business Statistics Analysis of Time Series Data: an Introduction.
1 1 Chapter 6 Forecasting n Quantitative Approaches to Forecasting n The Components of a Time Series n Measures of Forecast Accuracy n Using Smoothing.
Forecasting is the art and science of predicting future events.
13 – 1 Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall. Forecasting 13 For Operations Management, 9e by Krajewski/Ritzman/Malhotra.
1 A latent information function to extend domain attributes to improve the accuracy of small-data-set forecasting Reporter : Zhao-Wei Luo Che-Jung Chang,Der-Chiang.
Forecast 2 Linear trend Forecast error Seasonal demand.
Chapter 20 Time Series Analysis and Forecasting. Introduction Any variable that is measured over time in sequential order is called a time series. We.
Tables & Graphing Laboratory Skills. Basic Tables Tables, or charts, are used to organize information Tables, or charts, are used to organize information.
Forecasting. Model with indicator variables The choice of a forecasting technique depends on the components identified in the time series. The techniques.
Time Series Forecasting Trends and Seasons and Time Series Models PBS Chapters 13.1 and 13.2 © 2009 W.H. Freeman and Company.
Chapter 8 Part I Answers The explanatory variable (x) is initial drop, measured in feet, and the response variable (y) is duration, measured in seconds.
Lecture 9 Forecasting. Introduction to Forecasting * * * * * * * * o o o o o o o o Model 1Model 2 Which model performs better? There are many forecasting.
Analysis of Time Series
Energy Consumption Forecast Using JMP® Pro 11 Time Series Analysis
Collage Score Card & Software defect prediction
A Smart Tool to Predict Salary Trends of H1-B Holders
Just the basics: Learning about the essential steps to do some simple things in SPSS Larkin Lamarche.
Chapter 4: Seasonal Series: Forecasting and Decomposition
Analyzing and Interpreting Quantitative Data
Income Forecasting.
LINDSEY BREWER CSSCR (CENTER FOR SOCIAL SCIENCE COMPUTATION AND RESEARCH) UNIVERSITY OF WASHINGTON September 17, 2009 Introduction to SPSS (Version 16)
Investigating Relationships
Lecture 12: Data Wrangling
CHAPTER 12 More About Regression
Chapter 14 Inference for Regression
Statistics Time Series
Forecasting - Introduction
Exponential Smoothing
Presentation transcript:

Wikipedia Traffic Forecasting Compiled by- Divya Lingwal Karishma Patil Shivani Mathur Monalisa Singh

Introduction It is important to predict the websites future traffic volume, in order to provide a quality of service to the users. We are implementing the process of Data analysis for Wikipedia Traffic using the time-series approach. Predicting the future behaviour of time series for Wikipedia articles.

Problem Statement Wikipedia has large number of potential users viewing approximately 145,000 Wikipedia articles. It is essential to deal with the problem of overload, for that we need to predict future web traffic for thousands of Wikipedia articles.

Project Description AIM: To help predict future views of Wikipedia articles Using Time series analysis which encapsulates problems like analysis, inference, classification and forecast. We are using R programming language. Used both Time series analysis and forecasting techniques. Analyzed all 4 components of Time series. Datasets used: User view data, each column is a date and each row is an article Mapping between article names and a unique ID column

Brief Introduction of Wikimedia and Mediawiki

Procedure Data Cleaning Data transformation Time Series extraction Parameter extraction Forecasted Methods

Data Cleaning Finding the missing values-8% missing values Few missing values. These seem to be important so we did not remove them

Data Transformation Split data into 2 parts: ‘Page’ and ‘Dates’

Data Transformation cont’d Divided the article info from *wikipedia*, *wikimedia*, and *mediawiki*

Data Transformation cont’d Tokenized the pages into different columns Access: Desktop, Mobile-web. All-access Agent: All-agents, Spider Locale: wikmed, medwik, codes(zh,en,ja,es,fr) for wikipedia

Experimental Evaluations Un-Smoothened Graph Need to smooth it to check for better patterns and trends

Smoothing

Normalized Time series

Scale changed to visualize better

Curve according to specifications

Data Overview Comparison of articles in 7 languages: German, Chinese, English, Spanish, French, Japanese, Russian. Higher frequency for english articles Slightly higher number of mobile viewers than Desktop viewers

Calculating Time Series parameters To analyze the time series data we need to find the following parameters: Mean Standard Deviation Amplitude Slope Linear model for slope: Views~ pages+dates

Individual observations with extreme parameters Finding out the values with highest views

Time-Series Visualization of Top 4 Articles

Plotting time series for shorter duration Plotting time series for 2-month duration

Forecasting Approach In our project we forecast 2 months data and measure the prediction accuracy by keeping a holdout sample of 60 days from our forecast data. We have done this by using autoregressive integrated moving average(ARIMA) model. This consists of three parts( parameterized by indices p, d, q ) as ARIMA(p, d, q): Auto-regressive/p: p indicates the range of lags. Integrated/d: d is a differencing parameter, which gives us the number of times we are subtracting the current and the previous values of a time series. Moving average/q: q gives us the number of previous error terms to include in the regression error of the model.

Forecasts

Future Work Seasonality component Holt- Winters method

THANK YOU