Annotation Framework & ImageCLEF 2014 JAN BOTOREK, PETRA BUDÍKOVÁ 10. 3. 2014.

Annotation Tool
[Diagram labels: Annotation tool + MUFIN; House, Nature, Forest, Tree, …; WordNet; Visual Concept Ontology; MUFIN visual object search; Profimedia]

What is ImageCLEF?
- Competition in cross-language annotation and retrieval of images
- Different areas of interest:
  - Robot vision (object recognition)
  - Plant identification
  - Medical image identification
  - Concept Image Annotation
    - Goal: from a given set of concepts, select those that are relevant for an input image

Concept Image Annotation in 2014
- Focus on scalability: no manually labeled training data
  - Only noisy training data downloaded from the internet are available
  - Development data: images with ground-truth concepts
- Participants may not use any manually labeled training data that was created directly for machine learning
  - Profiset is OK, since it is a by-product of another activity (image selling)

CLEF Annotation Task vs. our annotation tools
- Our long-term research objective: general image annotation without well-labeled training data
  - "Big Data approach"
  - Primarily focused on describing noun objects
  - Difficult to evaluate
- CLEF: annotation contest with ground truth
  - Considers nouns, adjectives and verbs
  - Participants typically use model-based approaches

Competition’s benefits for us
- Evaluation of our tools on a well-known ground truth
  - The CLEF collection includes images of numerous types and situations as training and development material
- Compare our ideas and solutions with those of other teams
- Apply our current approach to image annotation in practice
  - With feedback from the organizers
- Long-term objective: journal paper about search-based annotation, with CLEF results as part of the evaluation

What do we plan to utilize?
- WordNet-based relations
  - Must be adapted to the given word types (not just nouns)
- WordNet-based word similarity metrics
- Visual Concept Ontology
- Similar-image search, powered by MUFIN
- Co-occurrence relations among words within the Profimedia dataset
  - Constructed by the Institute of Formal and Applied Linguistics (MFF UK)

WordNet vs. co-occurrence
- WordNet: fundamental technology
  - Meanings, relations, multiple word types
  - Hypernymy, antonymy, part-whole, gloss…
  - "language" point of view
- Co-occurrence table of related words
  - Constructed from very large text corpora (linguists from MFF UK)
  - For each word that occurs in Profiset descriptions, we have the 100 most frequently co-occurring words
  - No word types attached
  - "human/database" point of view
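
To make the idea of a co-occurrence table concrete, here is a minimal Python sketch. The slides do not say how the MFF UK linguists actually built their table, so the counting scheme below (two words co-occur when they appear in the same text), the function name `top_cooccurrences`, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def top_cooccurrences(documents, vocabulary, k=100):
    """For each vocabulary word, list the k words it most often shares a text with.

    Hypothetical helper: the real table was built from very large corpora
    by a method the slides do not describe.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        # Count each word at most once per text (assumed co-occurrence window).
        tokens = set(doc.lower().split())
        for a, b in permutations(tokens, 2):
            if a in vocabulary:
                counts[a][b] += 1
    return {w: [other for other, _ in counts[w].most_common(k)] for w in vocabulary}


docs = ["green tree in forest", "tree and forest path", "red car"]
table = top_cooccurrences(docs, {"tree", "car"}, k=1)
```

As on the slide, the resulting table carries no word types: it only records which words tend to appear together.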

Our approach I. Overall view

Our approach II. Network example
- Distribute probabilities, PageRank style

Our approach III. The probability-transfer network
- The probability-transfer coefficients of links between individual nodes are defined for different types of relations: hypernymy, synonymy, meronymy, word co-occurrence, …
- E.g. meronyms (whole -> parts): (1-l)/n
  - l: calibration constant
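
A small Python sketch of the meronym example may help. It assumes that n is the number of parts of the whole, which the slide does not state explicitly, and the function name `meronym_coefficients` and the default value of l are purely illustrative.

```python
def meronym_coefficients(parts, l=0.5):
    """Return the whole -> part transfer coefficient (1 - l) / n for each part.

    Assumptions: n is the number of parts; l is the calibration constant
    from the slide, taken here to lie between 0 and 1.
    """
    if not 0.0 <= l <= 1.0:
        raise ValueError("l is assumed to lie in [0, 1]")
    n = len(parts)
    if n == 0:
        return {}
    return {part: (1.0 - l) / n for part in parts}


# Example: a "tree" synset with four parts and l = 0.2.
coeffs = meronym_coefficients(["trunk", "branch", "leaf", "root"], l=0.2)
```

With l = 0.2 and four parts, each link gets (1 - 0.2) / 4 = 0.2, so the links together carry 0.8 of the whole's probability. What happens to the remaining share l is not specified on the slide, and the sketch leaves it unassigned.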

Our approach IV. Algorithm steps
1) Assign probability values to initial nodes
2) Build the network
   - Extend initial nodes by related synsets AND co-occurring words
   - Assign "probability-transfer coefficients" to links between nodes (determined by the type of relationship)
3) "Page-ranking" process
   - Run a process in which synsets mutually boost one another’s probability values
4) Select the most probable synsets
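
The four steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the slides list the details of the probability-transfer algorithm as unresolved, so the damping factor, the fixed iteration count, the per-iteration normalization, and the names `propagate` and `select_top` are all borrowed from ordinary PageRank practice or invented here.

```python
def propagate(initial, links, iterations=20, damping=0.85):
    """PageRank-style propagation of probabilities over a synset network.

    initial: dict node -> initial probability (step 1)
    links:   dict (source, target) -> probability-transfer coefficient (step 2)
    damping and iterations are assumed values, not taken from the slides.
    """
    total = sum(initial.values())
    if total <= 0:
        raise ValueError("initial probabilities must sum to a positive value")
    nodes = set(initial)
    for src, dst in links:
        nodes.update((src, dst))
    base = {n: initial.get(n, 0.0) / total for n in nodes}
    scores = dict(base)
    for _ in range(iterations):
        # Step 3: every node passes a share of its score along its links.
        incoming = {n: 0.0 for n in nodes}
        for (src, dst), coeff in links.items():
            incoming[dst] += scores[src] * coeff
        scores = {n: (1 - damping) * base[n] + damping * incoming[n] for n in nodes}
        norm = sum(scores.values())
        scores = {n: s / norm for n, s in scores.items()}
    return scores


def select_top(scores, k):
    """Step 4: return the k nodes with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


initial = {"tree": 0.6, "car": 0.4}
links = {("tree", "forest"): 0.5, ("tree", "plant"): 0.3, ("forest", "tree"): 0.4}
scores = propagate(initial, links)
best = select_top(scores, 2)
```

In this toy network "forest" and "plant" receive probability only through their links from "tree", which is the mutual-boosting effect described in step 3.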

Our approach V. Network example

Our approach VI. Unresolved issues
- Calibration of probability-transfer coefficients
  - What constants should be used?
- Initial step: assigning initial probabilities to particular annotation words
- Details of the probability-transfer algorithm
- Final step: selecting the most probable concepts

Summary
- Problem
  - Select descriptive words for a given image from a predefined set of concept words
- Our approach
  - Construction of a network of synsets; a node (synset) influences the probabilities of other nodes through their mutual relations
  - Inspired by the page-ranking algorithm
- Main research objectives
  - Design and construct a model of the synset network
  - Define and calibrate relations (links) among nodes in the network (synsets)