1
Video Summarization Using Mutual Reinforcement Principle and Shot Arrangement Patterns Lu Shi Oct. 4, 2004
2
Outline
- Background
- Video semantics and annotation
- Mutual reinforcement
- Shot arrangement analysis
- Video skim selection
- Preliminary experiments
3
Background
- Why video summarization
  - Help the user to quickly grasp the content of a video
- Video summary target:
  - Conciseness
  - Content coverage
  - Coherency
- Type
  - Static and dynamic
4
Background
- Two kinds of video summarization
  - Unconstrained
    - Generates a preview that tries to cover all the content of the video, constrained only by the time limit L
    - Can be helped by the mutual reinforcement result
  - Constrained
    - The user may have a preference for specific content, e.g. a specific time range or a specific person
5
Background
- 4-level hierarchical video structure
6
System overview
7
Video semantics
- Low-level features and high-level concepts: the semantic gap
- A summary based on low-level features alone cannot ensure perceived quality
- Solution: obtain video semantic information by manual or semi-automatic annotation
- Usage:
  - Retrieval
  - Summary
8
Video semantics
- Semantic content template for a video shot
  - Who
  - Where
  - What action
  - What other
  - When
  - Dialog script
- Concept term and video shot description (user editable)
9
Video semantics
- Concept term and video shot description
  - Term: denotes an entity, e.g. “Joe”, “talking”, “in the bank”
  - Context: “who”, “what action”, …
  - Shot description: the set of all concept terms that are related to the shot
  - Obtained by manual or semi-automatic video annotation
10
Video shot annotation
- Annotation interface
11
Video Edit Process
- Shoot a set of video shot groups with similar semantic content (takes)
- Select video shots from the takes, then arrange shots from different video shot groups to depict the story scene
12
Video summarization
- Recover the semantic video shot groups
- Video summarization can be viewed as an “inversion” of video editing: recover the shot groups, then select the important parts
13
Mutual Reinforcement
- Given the annotated video shots
- How to measure the priority for a set of concept terms and a set of descriptions? Who is the most important person? Which shot is the most important one?
- A more important description contains more important terms
- A more important term is contained in more important descriptions
- Mutual reinforcement principle [1]
14
Mutual Reinforcement
- Let W be the weight matrix that describes the relationship between a set of terms and a set of shot descriptions (W can be defined in various ways, e.g. the number of occurrences of a term in a description)
- Let U and V be the vectors of importance values for the video shot descriptions and the concept terms, respectively
- We have [equation not captured in this transcript]
- U and V can be calculated from the SVD of W
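The equation after "We have" did not survive in this transcript. A common way to write the mutual reinforcement principle is sketched below; it may differ from the slide's own notation. It assumes, purely for illustration, that the rows of W index shot descriptions and the columns index concept terms, and that \sigma is a positive scaling constant.

```latex
\begin{aligned}
U &= \frac{1}{\sigma}\, W V, \\
V &= \frac{1}{\sigma}\, W^{\mathsf{T}} U
\end{aligned}
\qquad\Longrightarrow\qquad
W W^{\mathsf{T}} U = \sigma^{2} U, \quad
W^{\mathsf{T}} W V = \sigma^{2} V .
```

Under these assumptions U and V are a pair of left and right singular vectors of W with singular value \sigma, which is consistent with the statement that they can be calculated from the SVD of W.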
15
Mutual Reinforcement
- For each semantic context:
  - We choose the singular vectors corresponding to W’s largest singular value as the importance vectors for concept terms and shot descriptions
  - Since W is non-negative, these leading singular vectors can be taken to be non-negative
  - The importance score vector can be used to group semantically similar video shots
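A minimal sketch of how the leading singular vectors might be obtained, assuming W is stored with one row per shot description and one column per concept term. It uses plain alternating (power) iteration instead of a library SVD routine; the function names and the toy matrix are hypothetical and not taken from the slides.

```python
import math


def _normalize(x):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / norm for a in x] if norm else x


def mutual_reinforcement(W, iterations=100):
    """Approximate the leading singular vectors of a non-negative matrix W.

    W[i][j] is the weight of term j in description i (assumed orientation).
    Returns (u, v): description importance and term importance.
    """
    n_desc, n_terms = len(W), len(W[0])
    v = [1.0] * n_terms          # start from a positive vector
    u = [0.0] * n_desc
    for _ in range(iterations):
        # A description is important if it contains important terms.
        u = _normalize([sum(W[i][j] * v[j] for j in range(n_terms))
                        for i in range(n_desc)])
        # A term is important if it appears in important descriptions.
        v = _normalize([sum(W[i][j] * u[i] for i in range(n_desc))
                        for j in range(n_terms)])
    return u, v


# Toy "who" context: columns are Joe, Terry, background people.
W = [
    [1, 0, 0],   # shot 0: Joe
    [1, 1, 0],   # shot 1: Joe and Terry
    [0, 1, 0],   # shot 2: Terry
    [0, 0, 1],   # shot 3: background people
]
u, v = mutual_reinforcement(W)
print("description scores:", [round(x, 3) for x in u])
print("term scores:", [round(x, 3) for x in v])
```

Because every multiplication involves only non-negative numbers and the start vector is positive, the scores stay non-negative throughout, which matches the non-negativity remark on the slide.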
16
Experiments
- Priority calculation on one video scene
- Based on context “who”
17
Experiments
- Shot groups
  - Joe
  - Joe and Terry
  - Terry
  - Background people
18
Experiments
- Priority calculation
- Based on context “what action”
19
Experiments
- Shot groups
  - Fight
  - Quarrel
  - Background
20
Shot arrangement patterns
- The way the director arranges the video shots conveys his intention
  - Minimal content redundancy and visual coherence
- Semantic video shot group labels form a string
- K-Non-Repetitive Strings (k-nrs)
- String coverage
  - {3124} covers {312, 124, 31, 12, 24, 3, 1, 2, 4}
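The coverage example can be reproduced with a short sketch. It assumes that each shot group label is a single character, that "non-repetitive" means no label occurs twice in the string, and that a string covers all of its proper contiguous substrings; the slide states only the example, so these definitions and the function names are assumptions.

```python
def is_non_repetitive(labels):
    """True if no shot group label occurs more than once (assumed definition)."""
    return len(set(labels)) == len(labels)


def covered_substrings(labels):
    """All proper contiguous substrings of a label string."""
    n = len(labels)
    substrings = {labels[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    substrings.discard(labels)  # a string is not listed as covering itself
    return substrings


print(is_non_repetitive("3124"))           # no label repeats
print(sorted(covered_substrings("3124")))  # the nine strings from the slide
```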
21
Shot arrangement patterns
- Several detected nrs strings
22
Video skim selection
- Do
  - Select the most important k-nrs string into the skim shot set
  - Remove those nrs strings covered by the selected string
- Until the target skim length is reached
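The selection loop can be sketched as a greedy procedure. The sketch assumes that each candidate nrs string already has an importance score, that coverage means "is a contiguous substring of", and that skim length is counted in shots (one per label); the slides do not specify any of these, so the scores, the length measure, and the function name are hypothetical.

```python
def select_skim(candidates, target_length):
    """Greedily pick nrs strings until the skim holds target_length shots.

    candidates: dict mapping an nrs string to its importance score
                (hypothetical scores; the slides do not define them).
    Returns the selected strings in selection order.
    """
    remaining = dict(candidates)
    skim = []
    total = 0
    while remaining and total < target_length:
        # Select the most important remaining string.
        best = max(remaining, key=remaining.get)
        skim.append(best)
        total += len(best)
        # Drop the selected string and every string it covers.
        remaining = {s: w for s, w in remaining.items() if s not in best}
    return skim


candidates = {"3124": 0.9, "312": 0.7, "24": 0.4, "56": 0.5, "65": 0.3}
print(select_skim(candidates, target_length=6))
```

Because whole strings are added at a time, the last selection may overshoot the target length slightly; a real system would need a rule for that case.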
23
Experiments
- We conducted a subjective test
- Compared with the previous graph-based algorithm
- Achieves better coherency
24
Future work
- More efficient way to annotate video shots
- Augment the semantic template
- Personalized video summary
25
Q & A
- Thank you!!