Semantic Video Classification Based on Subtitles and Domain Terminologies
Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades


1 Semantic Video Classification Based on Subtitles and Domain Terminologies
Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades
Pervasive Computing Research Group, Communication Networks Laboratory
Department of Informatics and Telecommunications, University of Athens, Greece
KAMC '07 @ Genoa, Italy
polina@di.uoa.gr, b.tsetsos@di.uoa.gr, shadj@di.uoa.gr

2 Outline
- The Polysema Platform
- Introduction - Motivation
- Video Categorization Method
- Experimental Evaluation
- Conclusions - Future Work

3 Polysema platform
- Develops an end-to-end platform for interactive TV (iTV) services
- Semantics-related research focuses on the development of:
  - semantics extraction techniques for automatic annotation of audiovisual content
  - a personalization framework for iTV services with Semantic Web (SW) technologies
  - a tool with a GUI for video annotation and MPEG-7 metadata creation
- http://polysema.di.uoa.gr

4 Introduction - Motivation
- Multimedia databases are becoming popular
- Most video classification methods are based on visual/audio signal processing
- Text processing is more lightweight than visual/audio processing
- High-level semantics are more closely related to human language than to visual features
- Subtitles capture the semantics of the corresponding video

5 Step 1: Text Preprocessing
- Subtitles are segmented into sentences
- A part-of-speech tagger is applied to each sentence
- Stop words (e.g., “to”, “him”) are removed based on a stop-word list
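
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regex-based sentence splitter, the tiny `STOP_WORDS` set, and the function names are all assumptions, and part-of-speech tagging is omitted because it would need an external tagger.

```python
import re

# Tiny illustrative stop-word list (an assumption; the slides do not give the list used).
STOP_WORDS = {"a", "an", "and", "the", "to", "him", "of", "in", "is", "are", "it"}

def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def remove_stop_words(sentence):
    """Lowercase the sentence, tokenize on letters/apostrophes, and drop stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def preprocess(subtitle_text):
    """Return one list of content-word tokens per sentence."""
    return [remove_stop_words(s) for s in split_sentences(subtitle_text)]
```

For example, `preprocess("The lion hunts in the savanna. Give it to him!")` keeps only the content words of each sentence.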

6 Step 2: Keyword extraction
- We used the TextRank algorithm to extract keywords
- TextRank:
  - represents the text as a graph
  - applies a ranking algorithm based on Google’s PageRank to the vertices
  - sorts the vertices in decreasing rank order
  - extracts the top-ranked vertices for further processing
- Reference: Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, July 2004
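
A compact sketch of the TextRank idea may help here. It assumes an undirected, unweighted co-occurrence graph and a fixed number of PageRank iterations; the window size, damping factor, and function name are illustrative choices, not values taken from the slides.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words with a PageRank-style iteration over a co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Each vertex shares its score equally among its neighbors on every pass.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for w in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores

    # Sort by decreasing score; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Words that co-occur with many other words end up with the highest scores, which is why they are treated as keyword candidates.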

7 Step 3: Word Sense Disambiguation
- Words have many possible meanings, called senses
- A Word Sense Disambiguation (WSD) algorithm is applied to determine the correct sense of each word
- The WSD algorithm:
  - is based on the lexical database WordNet
  - is a variation of Lesk’s WSD algorithm
- Reference: Banerjee, S., Pedersen, T.: An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In: Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLING-02), Mexico City, Mexico (2002)
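
To illustrate the gloss-overlap idea behind Lesk-style disambiguation, here is a simplified sketch. It is the basic Lesk overlap, not the adapted variant cited on the slide (which also uses glosses of related synsets), and the `SENSES` dictionary is a hypothetical stand-in for WordNet.

```python
# Hypothetical mini sense inventory standing in for WordNet glosses.
SENSES = {
    "bank": {
        "bank#1": "sloping land beside a body of water such as a river",
        "bank#2": "financial institution that accepts deposits and lends money",
    },
}

def simplified_lesk(word, context_words, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.get(word, {}).items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense  # None when the word is not in the inventory
```

With the context words "fish", "river", "water" the first sense wins, while "money" and "deposits" select the second.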

8 Step 4: WordNet Domains Extraction (1/2)
- WordNet Domains augments WordNet with domain labels
- It provides a taxonomy of ~200 domain labels
- Synsets have been annotated with at least one domain label
- WordNet Domains: http://wndomains.itc.it/wordnetdomains.html

9 Step 4: WordNet Domains Extraction (2/2)
For each video:
- Extract the WordNet domains for each keyword’s sense
- Calculate the occurrence frequency of each domain label
- Sort the domain labels in decreasing order of occurrence frequency
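
The per-video domain ranking can be sketched as a simple frequency count. The `SENSE_DOMAINS` mapping below is a hypothetical excerpt standing in for the WordNet Domains annotation, and the sense identifiers are invented for illustration.

```python
from collections import Counter

# Hypothetical sense-to-domain mapping standing in for WordNet Domains.
SENSE_DOMAINS = {
    "lion#1": ["animals", "biology"],
    "beetle#1": ["animals", "entomology"],
    "cell#2": ["biology"],
}

def rank_domains(keyword_senses, sense_domains=SENSE_DOMAINS):
    """Count domain labels over all keyword senses and sort by decreasing frequency."""
    counts = Counter()
    for sense in keyword_senses:
        counts.update(sense_domains.get(sense, []))
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

The result is an ordered list of (domain, frequency) pairs that characterizes the video.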

10 Step 5: Correspondences between categories & WN domains
For each category label:
- Look up in WordNet the senses related to it (including senses related through hypernym & hyponym relations)
- Obtain the corresponding WordNet domains
- Calculate the occurrence score for each domain
- Sort the domains in decreasing occurrence order
Example:
  Category   WordNet domains
  animals    animals, biology, entomology
  war        military, history
  science    medicine, biology, mathematics
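
A rough sketch of how a category label might be mapped to an ordered domain list follows. All three dictionaries are hypothetical toy data (the real lookups would go through WordNet and WordNet Domains), and the expansion is limited to direct hypernyms and hyponyms as an assumption.

```python
from collections import Counter

# Hypothetical toy data standing in for WordNet and WordNet Domains lookups.
CATEGORY_SENSES = {"war": ["war#1"]}
RELATED = {
    "war#1": {
        "hypernyms": ["military_action#1"],
        "hyponyms": ["world_war#1", "civil_war#1"],
    },
}
SENSE_DOMAINS = {
    "war#1": ["military"],
    "military_action#1": ["military"],
    "world_war#1": ["history", "military"],
    "civil_war#1": ["history"],
}

def category_domains(category):
    """Return the category's domains, ordered by decreasing occurrence count."""
    senses = list(CATEGORY_SENSES.get(category, []))
    # Add senses reachable through direct hypernym and hyponym relations.
    for sense in list(senses):
        related = RELATED.get(sense, {})
        senses += related.get("hypernyms", []) + related.get("hyponyms", [])
    counts = Counter()
    for sense in senses:
        counts.update(SENSE_DOMAINS.get(sense, []))
    return [d for d, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

With this toy data, the category "war" yields the ordering military, history, which happens to match the slide's example row.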

11 Step 6: Category label assignment
- Compare the ordered list of WN domains of each video with the ordered list of WN domains of each category
Example:
  Category   WordNet domains
  animals    animals, biology, entomology
  war        military, history
  science    medicine, biology, mathematics

  WN domains of a video            Assigned category
  animals, entomology, biology     animals
  biology, mathematics, physics    science
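
The slide does not spell out how the two ordered lists are compared, so the following is only one plausible sketch. The scoring rule in `rank_similarity` (shared domains count more when their ranks are close) is an assumption made for illustration, not the authors' measure; the category lists are copied from the example table.

```python
# Category rankings copied from the slide's example table.
CATEGORY_DOMAINS = {
    "animals": ["animals", "biology", "entomology"],
    "war": ["military", "history"],
    "science": ["medicine", "biology", "mathematics"],
}

def rank_similarity(video_domains, category_domains):
    """Hypothetical score: each shared domain adds 1 / (1 + rank difference)."""
    cat_rank = {d: i for i, d in enumerate(category_domains)}
    return sum(
        1.0 / (1 + abs(i - cat_rank[d]))
        for i, d in enumerate(video_domains)
        if d in cat_rank
    )

def assign_category(video_domains, categories=CATEGORY_DOMAINS):
    """Return the category whose domain ranking is most similar to the video's."""
    return max(categories, key=lambda c: rank_similarity(video_domains, categories[c]))
```

Under this assumed score, the two example videos from the slide are assigned to "animals" and "science" respectively.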

12 Experimental Evaluation (1/2)
- 36 subtitle files of documentaries
- Statistical information of the files (average values):
  duration (min:sec)   # of words   # of non-stop words   # of keywords   # of WN domains
  41:48                4442         2000                  350             53
- Classify under the categories: geography, animals, history, war, technology, science, accidents, music, transportation, people, religious, politics, arts

13 Experimental Evaluation (2/2)
Classifiers:
- Proposed method
- Proposed method in which Step 6 has been replaced with Spearman’s footrule distance
- J4.8 decision tree classifier (supervised approach)
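
Spearman's footrule distance sums the absolute differences between the positions an item holds in two rankings. The sketch below assumes one common convention for lists that do not contain the same items: a missing item is placed just past the end of the list. The slides do not say which convention was used.

```python
def footrule_distance(ranking_a, ranking_b):
    """Sum of absolute rank differences over all items in either ranking."""
    pos_a = {d: i for i, d in enumerate(ranking_a)}
    pos_b = {d: i for i, d in enumerate(ranking_b)}
    items = set(ranking_a) | set(ranking_b)
    # Assumption: an item absent from a ranking gets position len(ranking).
    return sum(
        abs(pos_a.get(d, len(ranking_a)) - pos_b.get(d, len(ranking_b)))
        for d in items
    )
```

A smaller distance means more similar rankings, so a video would be assigned to the category with the minimum distance.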

14 Conclusions - Future Work
Conclusions:
- A novel approach that is based only on text and uses natural language processing techniques
- No training phase is required (unsupervised approach)
Future Work:
- Application of the method on a per-video-segment basis
- Definition of domain knowledge closer to movie classification
- Performance comparison with other unsupervised approaches

15 Thank you! Questions??? http://p-comp.di.uoa.gr

