1
Commentary-based Video Categorization and Concept Discovery By Janice Leung
2
Agenda Introduction to Video Sharing Sites Current Problem Previous Works Commentary-based Video Clustering Conclusion Future Works
3
Video Sharing Sites Allow users to upload videos Share videos worldwide Examples: Dailymotion, YouTube, MySpace
4
De Facto YouTube More than 65,000 new videos every day 100 million video views daily 20 million unique visitors per month
5
Immense amount of videos Incredible growth of videos How to search for desired video? YouTube: Tags + simple Categorization
6
YouTube Predefined categories Videos: Title, Description, Tags, Category (provided by the one who uploads the video); Comments (provided by many users)
9
Related Works Classify videos: Video features: color, grayscale histogram, pixel information Keywords from description Tags Find user interests: Object fetching information Tags
10
Problems Video features: Cannot tell exactly what the video is about; No user interest is considered Keywords from description: Description is provided by the one who uploaded the video; Not sufficient information
11
Problems (Cont.) Tags: Not sufficient information; May reflect users' feelings on videos but too brief to represent the complex idea of the videos Object fetching information: Reflects users' interests but gives no information about the videos at all
12
Video Categorization and Concept Discovery Site: YouTube Videos: involving Hong Kong singers
13
Comment vs Tag Comments: Given by many users; Can be large in number; Express users' opinions; Rich words describe fine-grained ideas Tags: Given by only one person (the one who uploaded the video); Few tags; Describe the video very briefly (singer name, song name)
15
Comments Include: Video content Music styles Music ages Singer description Appearance Style News etc.
16
Commentary-based Video Categorization Objective: Categorize videos based on user interests and discover the concepts of videos Cluster videos by using comments Group videos based on user interests Find video concepts Clustering algorithm: multi-assignment NMF
17
Video clustering Bi-clustering: videos and words Clusters videos and words into k groups by matrix factorization Video-word matrix X as input Video-word matrix X is derived by tf-idf
18
Tf-idf Term frequency (tf): $\mathrm{tf}_{i,j} = f_{i,j} / \sum_{k=1}^{t} f_{k,j}$, where document j contains t distinct terms and $f_{i,j}$ is the number of occurrences of term i in document j
19
Tf-idf (Cont.) Inverse document frequency: $\mathrm{idf}_i = \log(N / n_i)$, where N is the total number of documents in the dataset and $n_i$ is the number of documents containing term i
20
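Tf-idf (Cont.) Importance weight of term i to document j: $\mathrm{tf}_{i,j} \times \mathrm{idf}_i$ Matrix X as input to NMF is defined as $X = [x_{j,i}]$ with $x_{j,i} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$ (one row per video, one column per term)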
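The tf-idf weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each video's comments are taken to be already tokenized into a list of words, and the function name `tfidf_matrix` and the toy comments are hypothetical, not taken from the presentation.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, X), where X[j][i] is the tf-idf weight of term i in document j."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # n_i: number of documents that contain term i
    doc_freq = {term: sum(1 for c in counts if term in c) for term in vocab}
    X = []
    for c in counts:
        total = sum(c.values())  # total term occurrences in this document
        row = []
        for term in vocab:
            tf = c[term] / total if total else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        X.append(row)
    return vocab, X

# Hypothetical tokenized comments for three videos
docs = [
    ["great", "song", "great", "voice"],
    ["old", "song"],
    ["great", "dance"],
]
vocab, X = tfidf_matrix(docs)
```

Each row of `X` corresponds to one video and each column to one term in `vocab`, which matches the video-word layout used as NMF input.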
21
Video Clustering (Cont.) Decompose X into non-negative matrices W and H by minimizing $J = \frac{1}{2} \lVert X - WH \rVert_F^2$, where $W \ge 0$ and $H \ge 0$ elementwise Ref.: Document Clustering Based on Non-negative Matrix Factorization (Xu et al., SIGIR'03)
22
Video Clustering (Cont.) NMF decomposition for video clustering
23
Video Clustering (Cont.) Suppose Number of videos: N Number of distinct terms: M Threshold: β W in size N x K $w_{n,k}$: coefficient indicating how strongly video n belongs to cluster k
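A minimal sketch of the decomposition, assuming the standard multiplicative update rules for the squared-error (Frobenius) objective. The slides do not state which update scheme was used, so the solver, the function name `nmf`, the iteration count, and the toy matrix are all assumptions for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative N x M matrix X into W (N x k) and H (k x M)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical 4-video x 4-term matrix with two obvious groups
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
W, H = nmf(X, k=2)
print(np.linalg.norm(X - W @ H))  # reconstruction error
```

Row n of `W` holds the cluster coefficients of video n, and row k of `H` holds the term weights of cluster k.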
24
Video-cluster assignment Videos can belong to multiple groups Multi-cluster assignment Video n belongs to cluster k if $w_{n,k} \ge \beta$ Set of clusters that video n belongs to: $C_n = \{ k \in K : w_{n,k} \ge \beta \}$, where K is the set of all clusters
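The multi-cluster assignment can be illustrated as follows. This sketch assumes a single threshold β and a "greater than or equal" comparison; the helper name `clusters_of` and the coefficient values are hypothetical.

```python
def clusters_of(W, beta):
    """For each video (row of W), return the set of cluster indices whose coefficient reaches beta."""
    return [{k for k, w in enumerate(row) if w >= beta} for row in W]

# Hypothetical coefficients: 2 videos, 3 clusters
W = [
    [0.70, 0.30, 0.00],
    [0.45, 0.50, 0.05],
]
memberships = clusters_of(W, beta=0.4)
print(memberships)  # [{0}, {0, 1}]
```

The second video passes the threshold for two clusters, so it is assigned to both rather than forced into one.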
25
Video-cluster assignment (Cont.) Threshold, β Many irrelevant videos for each cluster Coefficient distribution varies for different clusters Threshold should depend on the coefficient distribution Different for different clusters
26
Concept Discovery Matrix H in size of K x M $h_{k,m}$: how likely term m belongs to cluster k Terms belonging to a cluster describe the videos in that cluster Concept words of cluster k videos: Top 10 words of cluster k
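Picking the concept words amounts to sorting each row of H and keeping the highest-weighted terms. The sketch below assumes H is available as an array and that `vocab` lists the terms in column order; the function name `concept_words` and the toy values are hypothetical.

```python
import numpy as np

def concept_words(H, vocab, top=10):
    """Return, for each cluster (row of H), its `top` highest-weighted terms."""
    H = np.asarray(H)
    order = np.argsort(-H, axis=1)[:, :top]  # column indices, largest weight first
    return [[vocab[m] for m in row] for row in order]

# Hypothetical 2-cluster x 4-term matrix
vocab = ["dance", "voice", "concert", "lyrics"]
H = [
    [0.1, 0.9, 0.0, 0.5],
    [0.7, 0.0, 0.8, 0.2],
]
print(concept_words(H, vocab, top=2))
# [['voice', 'lyrics'], ['concert', 'dance']]
```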
27
Experiment 19,305 videos 102 Hong Kong singers 7,271 users Number of clusters, k: 20
28
Experiment (Cont.) Threshold, β Coefficient-distribution dependent Threshold for cluster i is defined as the mean coefficient of that cluster: $\beta_i = \frac{1}{N} \sum_{n=1}^{N} w_{n,i}$
29
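Experiment (Cont.) Video coefficients may be distributed extremely unevenly, causing poor results To compensate, threshold can be set as the mean plus one standard deviation: $\beta_i = \mu_i + \sigma_i$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the coefficients $w_{1,i}, \dots, w_{N,i}$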
30
Experiment (Cont.) Example video-cluster coefficients (one row per video, one column per cluster)
0.70 0.30 0.00
0.31 0.64 0.05
0.13 0.22 0.65
0.01 0.64 0.35
0.12 0.23 0.65
Mean Coef. 0.33 0.406 0.254
Mean + SD Coef. 0.631 0.622 0.526
C1 C2 C3 V1 V2 V4 V3 V5
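The "Mean + SD Coef." row suggests a per-cluster threshold of mean plus one standard deviation. A rough sketch of that rule on the coefficient values from this slide follows. It assumes columns are clusters and rows are videos, and it uses NumPy's default population standard deviation; the slide does not say which variant of the standard deviation was used, so the computed thresholds need not equal the figures shown on the slide.

```python
import numpy as np

# Coefficients as listed on the slide (assumed layout: rows = videos, columns = clusters)
W = np.array([
    [0.70, 0.30, 0.00],
    [0.31, 0.64, 0.05],
    [0.13, 0.22, 0.65],
    [0.01, 0.64, 0.35],
    [0.12, 0.23, 0.65],
])

# One threshold per cluster: column mean plus column standard deviation
beta = W.mean(axis=0) + W.std(axis=0)

# A video joins every cluster whose threshold its coefficient reaches
member = W >= beta
assigned = {k: np.flatnonzero(member[:, k]).tolist() for k in range(W.shape[1])}
print(assigned)  # {0: [0], 1: [1, 3], 2: [2, 4]}
```

Keys are zero-based column indices and values are zero-based row indices, so no mapping to the slide's V and C labels is assumed.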
31
Experiment (Cont.)
34
Concept Words vs Tags
35
Percentage of videos with tags covering concept words across groups
36
Singer Relationship Discovery Comments on videos may talk about singers: singer styles, appearance, news Singer clustering using comments Reveals relationships between singers Discovers hidden phenomena
37
Singer Relationship Discovery (Cont.)
38
Conclusion Captures user interests more accurately and fairly than human-predefined categories Categories can change dynamically as user interests change over time Obtains clusters with fine-grained ideas Eases the task of video search by categorizing videos and refining the index
39
Future Works Extend to user clustering Obtain relationships among videos, singers and users of the entire social network Study the social culture Ease the job of advertising to target customers Connect people who share the same interests
40
Q & A Questions?