Elizeu Santos-Neto, Tatiana Pontes, Jussara Almeida, Matei Ripeanu


1 On the Choice of Data Sources to Improve Content Discoverability via Textual Feature Optimization
Elizeu Santos-Neto, Tatiana Pontes, Jussara Almeida, Matei Ripeanu
University of British Columbia - Vancouver, Canada
Universidade Federal de Minas Gerais - Belo Horizonte, Brazil
Santiago, Chile - September 4th, 2014

2 Introduction
100 hours of video per minute.
Specialized content management companies are responsible for publishing, monitoring, and promoting the owner's content. Ad revenue is shared between the manager and the video's owner.
As revenue is directly related to the number of ad impressions, managers are motivated to boost content popularity.
Start by showing that: 1. the amount of video produced today is enormous; 2. there are companies responsible for managing this content in order to make money through ads associated with the videos; 3. since revenue is related to video popularity, this is a *relevant concern*.

3 Introduction
Viewers reach content items via different leads.
[Figure: leads into a video (video repository with keyword/tag-based search, website URL, promotion campaign) and the video's textual features (title, description, tags, comments).]
Here, you need to emphasize that a large portion of viewers reach the content via keyword/tag-based search.

4 Introduction
[Figure: textual features of a video: title, description, tags, comments.]
Textual features have a major impact on the view count of a video, and consequently on the advertisement-generated revenues.
This work focuses on automated tag selection to boost video popularity.
Here, we show how textual features are related to view count and, consequently, to the ad-generated revenues.

5 Context
Assumption: annotating a video with the terms users would use to search for it increases the chance that users view the video.
Textual sources that are related to the video and whose content can be retrieved automatically can serve as inputs for recommenders that suggest tags.
- Recommendation Pipeline -

6 Context
[Figure: recommendation pipeline, with data sources grouped into Experts and Peers.]
On this slide, present and explain each data source, then mention the filters applied before reaching the 'candidate keywords' box. The next slides continue the pipeline.

7 Our Goal
Q1. Is there room for improvement in the current YouTube video tags?
[1] Santos-Neto, E., Pontes, T., Almeida, J., and Ripeanu, M. Towards Boosting Video Popularity via Tag Selection. Workshop on Social Multimedia and Storytelling, (2014)
Q2. How do peer- and expert-produced input data sources compare regarding the quality of the recommended tags?
Q3. Is the quality of the recommendation affected by the number of contributing peers?
Here, we point out our *general focus* and research questions. Explain that the first research question was answered in our previous paper and motivated the current study.

8 Building the Ground Truth
Amazon Mechanical Turk (AMT)
Our task: watch a movie trailer video and answer the following question: What query terms would you use to search for this video?
For each video, associate a minimum of 3 and a maximum of 10 keywords.
Talk about the ideal ground truth and the challenges of building it. Then, present our alternative solution (remember to talk about the pilot survey, if you think it is necessary).

9 AMT Task Properties
Property                Value
Number of Videos        382
Number of Turkers       33
Number of Evaluations   1,146
Payment per Task        $0.30
Payment per Hour        $3.00
Total Cost              $345.00
Quality Control - each evaluation was inspected before approval.
I think it's interesting to mention that it took about 24 hours to collect all the evaluations we needed with the budget we made available on AMT. Also, emphasize the *quality control* as a strategy to avoid spam. You can point out that we rejected just one evaluation and say why.

10 Experimental Evaluation
By improvement we mean extending or modifying the tags of YouTube videos to better match the ground truth.
- Success Metric: F3-measure
Performance of the original YouTube tagset vs. performance of the recommended tagset produced by different data sources.
I'm not sure if this slide is good. Here, we need to say what we want to evaluate/compare.
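As an illustration of the success metric, here is a minimal sketch of an F3-measure computed between a tagset and the ground-truth keywords. It assumes the standard F-beta definition with beta = 3 (recall weighted more heavily than precision) and exact, case-insensitive matching; the slides do not state how tags are matched, so both the matching rule and the function name `f_beta` are assumptions, and the sample tags are made up.

```python
def f_beta(recommended, ground_truth, beta=3.0):
    """F-beta between a recommended tagset and ground-truth keywords.

    Assumption: tags match only when equal after lowercasing.
    """
    rec = {t.lower() for t in recommended}
    truth = {t.lower() for t in ground_truth}
    hits = len(rec & truth)
    if hits == 0:
        return 0.0  # also covers empty inputs
    precision = hits / len(rec)
    recall = hits / len(truth)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: 2 of 4 recommended tags appear among 5 ground-truth keywords.
example_score = f_beta(
    ["trailer", "batman", "action", "2012"],
    ["batman", "trailer", "nolan", "dark knight", "bane"],
)
```

With beta = 3, a tagset that recovers more of the ground-truth keywords scores higher even if it also contains some extra tags, which fits the goal of matching the terms users would search for.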

11 Experimental Results
Q1. Is there room for improvement in the current YouTube video tags?

12 Experimental Results
Q1. Is there room for improvement in the current YouTube video tags?
Rotten Tomatoes is the best source of candidate keywords, followed by MovieLens.
Rotten Tomatoes incorporates a schema for the information provided (named entities).
The difference in the recommenders' performance for each pair of data sources is statistically significant.

13 Experimental Results
Q2. How do peer- and expert-produced input data sources compare regarding the quality of the recommended tags?

14 Experimental Results
Q2. How do peer- and expert-produced input data sources compare regarding the quality of the recommended tags?
Using the peer-produced data source is significantly better than using the expert-produced data.
For the frequency recommender, the performance of the peer-produced source is comparable to that of Rotten Tomatoes.
The peer-produced source provides an improvement relative to the tags currently assigned to YouTube videos (statistically significant for frequency).

15 Experimental Results
Q3. Is the quality of the recommendation affected by the number of contributing peers? How many peers is an expert worth?
Spearman's rank correlation between the number of users and the F3-measure for MovieLens: 0.31 (mild positive correlation).
The number of contributors partially explains the value added by MovieLens to the recommenders' performance.
One potential reason is that the choice of keywords to tag a movie and the choice of keywords to search for it may be driven by different purposes. E.g., a user may tag a movie with the keyword 'boring' to indicate an opinion about the video. However, this tag may not be a good choice for another user who wants to search for the same video.
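For readers unfamiliar with the statistic, the following is a minimal sketch of Spearman's rank correlation: rank both variables (ties share the average rank), then take the Pearson correlation of the ranks. The function names are illustrative, and the slides do not say which tool the authors used to obtain the 0.31 figure.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

In the setting of this slide, `x` would hold the number of contributing users per movie and `y` the F3-measure of the tags recommended for that movie.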

16 Conclusions
Rotten Tomatoes has a schema of named entities that makes the keywords obtained from it significantly more promising than the ones already assigned to YouTube videos;
Peer-produced data sources provide better candidate keywords for our automated tag recommendation than expert-produced sources;
The number of contributing peers partially explains why peer-produced data sources outperform expert ones.

18 Thank you!

19 Context
Frequency - based on tag frequency patterns. Scores candidate keywords based on how often they appear in the data provided by a data source.
Random Walk - based on tag co-occurrence patterns. Builds a graph where each keyword is a node and an edge connects two keywords if they co-occur. The final node scores are used to rank the candidate keywords.
* Our goal is not to design a novel and more efficient tag recommendation algorithm.
Highlight that the algorithms are used here as a backdrop for our original problem. We do NOT aim to propose a new and revolutionary recommendation algorithm.
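The two recommenders described above can be sketched as follows. This is an illustration only, not the authors' implementation: the slides give no details of the random walk, so the PageRank-style update, the `damping` and `iterations` parameters, the edge weighting by co-occurrence count, and the function names are all assumptions. Each item in `documents` stands for the list of candidate keywords extracted from one piece of data-source content.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequency_scores(documents):
    """Score each keyword by how often it appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc)
    return counts


def random_walk_scores(documents, damping=0.85, iterations=50):
    """PageRank-style scores on the keyword co-occurrence graph (assumed variant)."""
    # Edge weight = number of documents in which two keywords co-occur.
    weights = defaultdict(Counter)
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[a][b] += 1
            weights[b][a] += 1
    nodes = sorted({kw for doc in documents for kw in doc})
    if not nodes:
        return {}
    score = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            out = sum(weights[n].values())
            if out == 0:
                # Isolated keyword: spread its score uniformly over all nodes.
                for m in nodes:
                    nxt[m] += damping * score[n] / len(nodes)
            else:
                for m, w in weights[n].items():
                    nxt[m] += damping * score[n] * w / out
        score = nxt
    return score


# Hypothetical input: keyword lists from three pieces of data-source content.
docs = [["batman", "nolan"], ["batman", "bane"], ["batman", "trailer"]]
freq = frequency_scores(docs)
walk = random_walk_scores(docs)
```

In both cases the candidate keywords would then be sorted by score and the top ones recommended as tags.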

20 Context
Analyse how automated tag selection can optimize the tags associated with YouTube videos to attract search traffic.
Evaluate how the choice of the data source used as input for a recommendation pipeline impacts the quality of the recommended tagset.
This slide summarizes our general goal with the recommendation pipeline. After explaining the main components, summarize *our goal* so that the audience knows what specifically we want to do. The next slide contains our research questions.

