Semantic Video Classification Based on Subtitles and Domain Terminologies
Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades

Presentation transcript:

Semantic Video Classification Based on Subtitles and Domain Terminologies
Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades
Pervasive Computing Research Group, Communication Networks Laboratory
Department of Informatics and Telecommunications, University of Athens, Greece
KAMC Genoa, Italy

Outline
- The Polysema Platform
- Introduction - Motivation
- Video Categorization Method
- Experimental Evaluation
- Conclusions - Future Work

Polysema platform
- Develops an end-to-end platform for iTV services
- Semantics-related research focuses on the development of:
  - semantics extraction techniques for automatic annotation of audiovisual content
  - a personalization framework for iTV services based on Semantic Web (SW) technologies
  - a GUI tool for video annotation and MPEG-7 metadata creation

Introduction - Motivation
- Multimedia databases are becoming popular
- Most video classification methods are based on visual/audio signal processing
- Text processing is more lightweight than visual/audio processing
- High-level semantics are more closely related to human language than to visual features
- Subtitles capture the semantics of the corresponding video

Step 1: Text Preprocessing
- Subtitles are segmented into sentences
- A part-of-speech tagger is applied to each sentence
- Stop words (e.g., “to”, “him”) are removed based on a stop words list
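The preprocessing step above can be sketched roughly as follows. This is a minimal illustration only: the regex-based sentence splitter and the tiny `STOP_WORDS` set are assumptions (the slides do not name the tools or the stop words list used), and part-of-speech tagging is left out because it needs an external tagger.

```python
import re

# Hypothetical, tiny stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "the", "to", "him", "her", "it",
              "is", "are", "of", "and", "in", "on"}

def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def remove_stop_words(sentence):
    """Lowercase the sentence, tokenize it, and drop stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def preprocess(subtitle_text):
    """Return one list of non-stop-word tokens per sentence."""
    return [remove_stop_words(s) for s in split_sentences(subtitle_text)]
```

For example, `preprocess("The lion hunts. It gave the prey to him!")` yields one token list per sentence with the stop words removed.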

Step 2: Keyword extraction
- We used the TextRank algorithm to extract keywords
- TextRank:
  - represents the text as a graph
  - applies to the vertices a ranking algorithm based on Google’s PageRank
  - sorts vertices in decreasing rank order
  - extracts the top-ranked vertices for further processing
Reference: Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, July 2004
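A minimal sketch of the TextRank idea described above, under stated assumptions: the graph is an undirected co-occurrence graph over the preprocessed tokens, and the `window`, `damping`, `iterations` and `top_n` values are illustrative defaults rather than the settings used in the presented work.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words with a PageRank-style score over a co-occurrence graph."""
    # Connect each word to the distinct words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Iterate the PageRank update a fixed number of times.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for w in neighbors:
            rank_sum = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            new_scores[w] = (1 - damping) + damping * rank_sum
        scores = new_scores

    # Sort vertices in decreasing rank order and keep the top ones.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

Words that co-occur with many other words (high-degree vertices) end up with the highest scores, which is why they surface as keywords.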

Step 3: Word Sense Disambiguation
- Words have many possible meanings, called senses
- A Word Sense Disambiguation (WSD) algorithm is applied to determine the correct sense of each word
- WSD:
  - is based on the lexical database WordNet
  - is a variation of Lesk’s WSD algorithm
Reference: Banerjee, S., Pedersen, T.: An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLING-02), Mexico City, Mexico (2002)
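To illustrate the gloss-overlap idea behind Lesk-style disambiguation, here is a simplified sketch. It is not the adapted Lesk algorithm cited above, which also compares glosses of related synsets; the `SENSES` inventory, its glosses and the `STOP` set are hypothetical stand-ins for WordNet data.

```python
# Hypothetical toy sense inventory: word -> {sense_id: gloss}.
# A real system would read senses and glosses from WordNet.
SENSES = {
    "bank": {
        "bank#1": "sloping land beside a body of water such as a river",
        "bank#2": "financial institution that accepts deposits and lends money",
    },
}

# Hypothetical stop words ignored when counting overlaps.
STOP = {"a", "an", "the", "of", "and", "that", "such", "as",
        "to", "in", "on", "at", "we", "along"}

def content_words(text):
    """Return the set of lowercase non-stop words in the text."""
    return {w for w in text.lower().split() if w not in STOP}

def lesk(word, context):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in sorted(SENSES.get(word, {}).items()):
        overlap = len(content_words(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense
```

With this toy inventory, a context mentioning "river" and "water" selects `bank#1`, while a context mentioning "deposits" and "money" selects `bank#2`.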

Step 4: WordNet Domains Extraction (1/2)
WordNet domains (WN domains):
- augment WordNet with domain labels
- a taxonomy of ~200 domain labels
- synsets have been annotated with at least one domain label

Step 4: WordNet Domains Extraction (2/2)
For each video:
- Extract the WordNet domains for each keyword’s sense
- Calculate the occurrence frequency of each domain label
- Sort domain labels in decreasing order of occurrence frequency
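The per-video domain ranking above amounts to a frequency count followed by a sort. In the sketch below, `SENSE_DOMAINS` is a hypothetical mapping standing in for the WordNet Domains annotations, and alphabetical tie-breaking is an assumption added to keep the output deterministic.

```python
from collections import Counter

# Hypothetical sense -> domain labels mapping; a real system would
# look these up in the WordNet Domains resource.
SENSE_DOMAINS = {
    "lion#1": ["animals", "biology"],
    "beetle#1": ["animals", "entomology"],
    "cell#2": ["biology"],
}

def rank_domains(keyword_senses):
    """Count domain labels over all keyword senses and sort by frequency."""
    counts = Counter()
    for sense in keyword_senses:
        counts.update(SENSE_DOMAINS.get(sense, []))
    # Decreasing frequency; ties broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rank_domains(["lion#1", "beetle#1", "cell#2"])` returns the domain labels paired with their counts, most frequent first.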

Step 5: Correspondences between categories & WN domains
For each category label:
- Look up in WordNet the senses related to it (include senses related through hypernym & hyponym relations)
- Obtain the corresponding WordNet domains
- Calculate the occurrence score for each domain
- Sort domains in decreasing occurrence order
Example:
Category    WordNet domains
animals     animals, biology, entomology
war         military, history
science     medicine, biology, mathematics

Step 6: Category label assignment
- Compare the ordered list of WN domains of each video with the ordered list of WN domains of each category
Example:
Category    WordNet domains
animals     animals, biology, entomology
war         military, history
science     medicine, biology, mathematics

WN domains of a video            Assigned category
animals, entomology, biology     animals
biology, mathematics, physics    science
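The slide does not spell out how the two ordered lists are compared, so the following is only one possible sketch. The position-weighted overlap score in `match_score` is an assumption made for illustration, not the authors' measure; the category table is the example from the slide.

```python
# Example category -> ordered WN domains table taken from the slide.
CATEGORY_DOMAINS = {
    "animals": ["animals", "biology", "entomology"],
    "war": ["military", "history"],
    "science": ["medicine", "biology", "mathematics"],
}

def match_score(video_domains, category_domains):
    """Assumed score: shared domains count more when ranked high in both lists."""
    score = 0.0
    for i, domain in enumerate(video_domains):
        if domain in category_domains:
            j = category_domains.index(domain)
            score += 1.0 / ((i + 1) * (j + 1))
    return score

def assign_category(video_domains, categories=CATEGORY_DOMAINS):
    """Return the best-scoring category, or None if no domain is shared."""
    scored = {name: match_score(video_domains, doms)
              for name, doms in categories.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None
```

Under this assumed score, the two example videos from the slide are assigned to animals and science respectively.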

Experimental Evaluation (1/2)
- 36 subtitle files of documentaries
- Statistical information of files (average values):
  duration (min:sec) | # of words | # of non stop words | # of keywords | # of WN domains
  41:
- Classify under the categories: geography, animals, history, war, technology, science, accidents, music, transportation, people, religious, politics, arts

Experimental Evaluation (2/2)
Classifiers:
- Proposed method
- Proposed method in which Step 6 has been replaced with Spearman’s footrule distance
- J4.8 decision tree classifier (supervised approach)
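For the second classifier, Spearman's footrule distance sums the absolute differences between the positions an item holds in two rankings. The sketch below makes one assumption that the slides do not specify: a domain missing from a list is treated as if it sat just past the end of that list. The category with the smallest distance would then be chosen.

```python
def footrule_distance(list_a, list_b):
    """Spearman's footrule between two ranked lists.

    Assumption: an item absent from a list gets position len(list),
    i.e. one place past that list's last element.
    """
    pos_a = {item: i for i, item in enumerate(list_a)}
    pos_b = {item: i for i, item in enumerate(list_b)}
    items = set(list_a) | set(list_b)
    return sum(abs(pos_a.get(x, len(list_a)) - pos_b.get(x, len(list_b)))
               for x in items)

def closest_category(video_domains, categories):
    """Return the category whose domain ranking is nearest to the video's."""
    return min(sorted(categories),
               key=lambda name: footrule_distance(video_domains, categories[name]))
```

Applied to the example table from Step 5, the video ranked "animals, entomology, biology" is closest to the animals category, and "biology, mathematics, physics" is closest to science.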

Conclusions – Future Work
Conclusions:
- A novel approach that is based only on text and uses natural language processing techniques
- No training phase is required (unsupervised approach)
Future Work:
- Application of the method on a per-video-segment basis
- Definition of domain knowledge closer to movie classification
- Performance comparison with other unsupervised approaches

Thank you! Questions???