Introduction The large amount of traffic nowadays in Internet comes from social video streams. Internet Service Providers can significantly enhance local.

Slides:



Advertisements
Similar presentations
1 SEARCH ENGINE OPTIMIZATION AT Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a search engine's.
Advertisements

TAILS: COBWEB 1 [1] Online Digital Learning Environment for Conceptual Clustering This material is based upon work supported by the National Science Foundation.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Information Retrieval in Practice
Search Engines and Information Retrieval
On Appropriate Assumptions to Mine Data Streams: Analyses and Solutions Jing Gao† Wei Fan‡ Jiawei Han† †University of Illinois at Urbana-Champaign ‡IBM.
Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.
Presented by Zeehasham Rasheed
1 I256: Applied Natural Language Processing Marti Hearst Nov 8, 2006.
Automatic Blog Monitoring and Summarization Ka Cheung “Richard” Sia PhD Prospectus.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Overview of Search Engines
Introduction to machine learning
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
Section 2.1 Compare the Internet and the Web Identify Web browser components Compare Web sites and Web pages Describe types of Web sites Section 2.2 Identify.
1 Data Mining over the Deep Web Tantan Liu, Gagan Agrawal Ohio State University April 12, 2011.
SaariStory: A framework to represent the medieval history of Saarland Michael Barz, Jonas Hempel, Cornelius Leidinger, Mainack Mondal Course supervisor:
MediaEval Workshop 2011 Pisa, Italy 1-2 September 2011.
Search Engines and Information Retrieval Chapter 1.
1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.
A Framework for Examning Topical Locality in Object- Oriented Software 2012 IEEE International Conference on Computer Software and Applications p
Division of IT Convergence Engineering Towards Unified Management A Common Approach for Telecommunication and Enterprise Usage Sung-Su Kim, Jae Yoon Chung,
Search Engine Optimization ext 304 media-connection.com The process affecting the visibility of a website across various search engines to.
Popularity-Aware Topic Model for Social Graphs Junghoo “John” Cho UCLA.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
 Text Representation & Text Classification for Intelligent Information Retrieval Ning Yu School of Library and Information Science Indiana University.
Hao Wu Nov Outline Introduction Related Work Experiment Methods Results Conclusions & Next Steps.
Pete Bohman Adam Kunk. Real-Time Search  Definition: A search mechanism capable of finding information in an online fashion as it is produced. Technology.
НИУ ВШЭ – НИЖНИЙ НОВГОРОД EDUARD BABKIN NIKOLAY KARPOV TATIANA BABKINA NATIONAL RESEARCH UNIVERSITY HIGHER SCHOOL OF ECONOMICS A method of ontology-aided.
1 Linmei HU 1, Juanzi LI 1, Zhihui LI 2, Chao SHAO 1, and Zhixing LI 1 1 Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua.
ITGS Databases.
How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.
21/11/20151Gianluca Demartini Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover,
Unconstrained Endpoint Profiling Googling the Internet Ionut Trestian, Supranamaya Ranjan, Alekandar Kuzmanovic, Antonio Nucci Reviewed by Lee Young Soo.
 Goal recap  Implementation  Experimental Results  Conclusion  Questions & Answers.
Search Engine using Web Mining COMS E Web Enhanced Information Mgmt Prof. Gail Kaiser Presented By: Rupal Shah (UNI: rrs2146)
UWMS Data Mining Workshop Content Analysis: Automated Summarizing Prof. Marti Hearst SIMS 202, Lecture 16.
ApproxHadoop Bringing Approximations to MapReduce Frameworks
2016/2/4Course Introduction1 COMP 4332, RMBI 4330 Advanced Data Mining (Spring 2012) Qiang Yang Hong Kong University of Science and Technology
A Latent Social Approach to YouTube Popularity Prediction Amandianeze Nwana Prof. Salman Avestimehr Prof. Tsuhan Chen.
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
A Nonparametric Method for Early Detection of Trending Topics Zhang Advisor: Prof. Aravind Srinivasan.
Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)
Text-classification using Latent Dirichlet Allocation - intro graphical model Lei Li
Enhancing Tor’s Performance using Real- time Traffic Classification By Hugo Bateman.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation Yee W. Teh, David Newman and Max Welling Published on NIPS 2006 Discussion.
Implementation of Classifier Tool in Twister Magesh khanna Vadivelu Shivaraman Janakiraman.
Information Overload on the Internet: The Web Mining Techniques Approach UNIVERSITI UTARA MALAYSIA COLLEGE OF ARTS AND SCIENCES RESEARCH METHODOLOGY (SZRZ6014)
Introduction to Machine Learning, its potential usage in network area,
Information Retrieval in Practice
Topic Modeling for Short Texts with Auxiliary Word Embeddings
Mining Data Streams with Periodically changing Distributions Yingying Tao, Tamer Ozsu CIKM’09 Supervisor Dr Koh Speaker Nonhlanhla Shongwe April 26,
Evaluation Anisio Lacerda.
Sentiment analysis algorithms and applications: A survey
QianZhu, Liang Chen and Gagan Agrawal
Yi-Chia Wang LTI 2nd year Master student
Text Classification CS5604 Information Retrieval and Storage – Spring 2016 Virginia Polytechnic Institute and State University Blacksburg, VA Professor:
Data Analyzing Artificial Intelligence (AI)
A Unifying View on Instance Selection
Finding Trends with Visualizations
Introduction to Stream Computing and Reservoir Sampling
A Suite to Compile and Analyze an LSP Corpus
Towards Unified Management
Building Topic/Trend Detection System based on Slow Intelligence
Presentation transcript:

Introduction The large amount of traffic nowadays in Internet comes from social video streams. Internet Service Providers can significantly enhance local traffic if they apply proactive caching methods, by predicting future popular videos. The main slogan of the media “give people what they want” gives us the assumption mainstream news will always generate articles related to popular topics in society. Given such trend it is interesting to observe the relation between mainstream media and user behavior online. Under the influence of news, users browse videos related to the popular topics. The purpose of our study was to identify popular topics in the news articles, and pre-cache related videos at the strategic nodes to reduce the overall traffic. Methods Topic modeling has been an active research area for the past few years. There are a number of tools available online for classifying and clustering topics from document set. We chose to use latent Dirichlet allocation (LDA) for the purpose of evaluating topic popularity. In order to form topic titles we select only article titles classified as part of a certain topic, and apply frequent pattern mining algorithm (Apriori) to detect frequent-2 / frequent-3 itemset. Network Traffic Enhancement Through Proactive Caching By Mining Mainstream Media Implementation Several topic classification tools can be found on the web, we have downloaded Online LDA tool implemented in Python programming language. The original tool would download random Wikipedia articles and classify the topics. We have modified the original source code to download articles from specified sources. The list of sources is composed from the major news agencies. Each article is parsed to be handed to Online LDA tool. Final output consists of 100 topics with 53 words per topic. Words are sorted based on their appearance in the news articles. Finally topics are sorted by their popularity, and we query videos related to the topics. Implementation task consisted of two parts Identifying popular topics, Evaluating the performance of our system. Subsequent implementation  Identify popular topics in articles  Select document titles from document per topic distribution  Select words from frequent itemset for YouTube query  Sort videos from YouTube result page  Monitor the view-count statistics of selected videos Experimental Results Online LDA alone accurately choses the most popular topic around 57% of the times using 1k articles. With 100k articles it is around 91% accurate. The blue line represents the accuracy using Online LDA combined with frequent pattern mining. With 1k articles the accuracy is around 92%. Using 100k articles the accuracy is close to 100%. When using only Online LDA there is only around a 60% chance the selected video will be relevant to the actual topic when using 10k articles. When using 100k articles the probability rises to about 87%. When using frequent pattern mining and Online LDA there is around a 94% chance the video selected is relevant using 10k articles. With 100k the probability is 100%. (X axis) # of feeds VS (Y axis) Video relevance to the topic Framework to select query keywords from popular topics (X axis) # of feeds VS (Y axis) Accuracy of selecting video with most traffic Conclusion The objective of our project is to predict network traffic by mining news articles. The main slogan of the media “give people what they want” gives us assumption articles will always reflect the most popular topics in the society at a given time. From these results we conclude that using LDA combined with frequent pattern mining will predict which videos will generate most traffic. Experimental results show our proactive caching method can achieve much better performance in terms of reducing the delay compared to other conventional methods.