Slide Image Retrieval: A Preliminary Study
Guo Min Liew and Min-Yen Kan
National University of Singapore, Web IR / NLP Group (WING)

Presentation transcript:

Min-Yen Kan, Digital Libraries. JCDL: Beyond Text, 19 June 2008.

Beyond Text in Digital Libraries
Papers don’t store all of the information about a discovery:
– Dataset
– Tools
– Implementation details / conditions
They also don’t help a person learn the research:
– Textbooks
– Slides (we’ll focus on this)

What is Slide Image Retrieval?
Hypothesis: image and presentation features can enhance text-based features.

Slidir Architecture
1. Pre-processing
2. Feature extraction
3. Machine learning model (training, testing)
4. Image ranking
Other diagram labels: Presentation Slides; Text + Image + Presentation Features; Relevance Judgments; Queries

Preprocessing
– Generally consistent background → perform collective separation
– Top-down approach using projection profile cuts, with thresholds from the PPT template
– Simple preprocessing, not meant to be state-of-the-art
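To make the projection profile cut idea concrete, here is a minimal sketch of a top-down (recursive XY-cut style) segmenter. It is not the SLIDIR implementation: the binary-image input, the names `runs_above` and `xy_cut`, and the `threshold` and `min_gap` parameters are assumptions for illustration. On the slide, the thresholds come from the PPT template; here they are plain arguments.

```python
def runs_above(profile, threshold=0, min_gap=1):
    """Return (start, end) index ranges where profile > threshold.

    Runs separated by fewer than min_gap low entries are merged.
    """
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def xy_cut(image, top=0, left=0, bottom=None, right=None,
           threshold=0, min_gap=1):
    """Split a binary image (rows of 0/1, 1 = foreground) into boxes.

    Returns a list of (top, left, bottom, right) boxes, end-exclusive.
    threshold and min_gap are hypothetical stand-ins for template values.
    """
    if bottom is None:
        bottom = len(image)
    if right is None:
        right = len(image[0]) if image else 0
    # Horizontal projection profile: foreground pixels per row.
    row_profile = [sum(image[r][left:right]) for r in range(top, bottom)]
    row_runs = runs_above(row_profile, threshold, min_gap)
    boxes = []
    for r0, r1 in row_runs:
        r0, r1 = top + r0, top + r1
        # Vertical projection profile inside this row band.
        col_profile = [sum(image[r][c] for r in range(r0, r1))
                       for c in range(left, right)]
        col_runs = runs_above(col_profile, threshold, min_gap)
        for c0, c1 in col_runs:
            c0, c1 = left + c0, left + c1
            if len(row_runs) == 1 and len(col_runs) == 1:
                boxes.append((r0, c0, r1, c1))  # no further cut possible
            else:
                boxes.extend(xy_cut(image, r0, c0, r1, c1,
                                    threshold, min_gap))
    return boxes


image = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0],
]
boxes = xy_cut(image)
```

Each recursive call works on a strictly smaller region, so the recursion stops once a region can no longer be cut along either axis.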

Sample Extraction Results

Feature Extraction
Utilizes textual, image and presentation features.
Textual Features
– Main component of feature set
– Most images identified by surrounding and embedded text
Image Features
– Supplement to textual features
– Useful in ranking images with similar textual feature scores
Presentation Features
– Features unique to presentation slides

Sample Features
Image
1. Image size
2. Image colors
3. NPIC image type (CIVR, 06)
Presentation
1. Slide order
2. Image position on slide
3. Number of images on slide
Textual
1. All text
2. Slide text
3. Next slide’s text
4. Previous slide’s text
5. Slide title
6. Presentation title
7. Slide image text
…
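As an illustration of what "fielded" textual features could look like, the sketch below scores a query against each text field separately. The slides do not say how the textual features were scored, so the overlap measure, the field keys in `FIELDS`, and the function names are all assumptions.

```python
import re

# Hypothetical field keys mirroring the textual features listed on the slide.
FIELDS = [
    "all_text",
    "slide_text",
    "next_slide_text",
    "previous_slide_text",
    "slide_title",
    "presentation_title",
    "slide_image_text",
]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_overlap(query, field_text):
    """Fraction of distinct query terms that appear in the field (assumed measure)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    field_terms = set(tokenize(field_text))
    return len(query_terms & field_terms) / len(query_terms)


def fielded_features(query, image_record):
    """One score per field, in FIELDS order; missing fields score 0.0."""
    return [field_overlap(query, image_record.get(name, "")) for name in FIELDS]


record = {
    "slide_title": "Neural Networks",
    "slide_text": "A neural network has layers",
}
features = fielded_features("neural networks", record)
```

Keeping one score per field, rather than a single score over all text, lets a learner weight a title match differently from a match in a neighbouring slide.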

Machine Learning
Regression Model
– Relevance judgments and feature vectors input into machine learner
– Linear regression used to produce continuous output value for image ranking
Human Relevance Judgments
– 9 participants
– Four presentation sets of 10 queries each
– Participants rank up to five most relevant images to the query
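The regression step can be sketched as ordinary least squares followed by sorting candidates on the predicted score. This is a standard-library illustration under assumptions, not the learner used in the study: the function names, the toy feature vectors, and the way relevance judgments become numeric targets are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_regression(features, targets):
    """Least-squares fit via the normal equations; returns [intercept, w1, ...]."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_vector):
    """Continuous relevance score for one image."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_vector))


def rank_images(weights, candidates):
    """candidates maps image id -> feature vector; best score first."""
    return sorted(candidates, key=lambda k: predict(weights, candidates[k]),
                  reverse=True)


# Toy data: targets follow 1 + 2*x1 + 3*x2 exactly.
train_x = [[0, 0], [1, 0], [0, 1], [1, 1]]
train_y = [1, 3, 4, 6]
weights = fit_linear_regression(train_x, train_y)
ranking = rank_images(weights, {"a": [0, 0], "b": [1, 1], "c": [0, 1]})
```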

Regression Analysis
– 10-fold cross validation
– System with fielded text features able to replicate human upper bound
– Image and presentation features further improve correlation
Table: Comparison of mean absolute error and correlation (values not preserved)
– Columns: All text baseline; Fielded text; With presentation and image features; Inter-annotator
– Rows: Mean absolute error (lower is better); Correlation (higher is better)
– Callouts: "Still a gap to improve"; "Looks pretty good"
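For reference, the two measures in the comparison can be computed as below. The slide does not name the correlation coefficient, so Pearson's r is an assumption here, and `k_fold_indices` is a hypothetical helper showing one simple way to form 10 folds.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference; lower is better."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(xs, ys):
    """Pearson's r (assumed measure); higher is better."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def k_fold_indices(n_items, k=10):
    """Assign item i to fold i % k; each fold serves once as the test set."""
    return [list(range(i, n_items, k)) for i in range(k)]


mae = mean_absolute_error([1, 2, 3], [1, 2, 5])
r = pearson_correlation([1, 2, 3], [2, 4, 6])
folds = k_fold_indices(25, k=10)
```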

Binary Classification
– Human participants’ judgments converted to binary
– Evaluated using SVM and J48 decision tree classifier
– General accuracy fairly good
– Decision tree algorithm slightly more precise → sequential testing adequate
Table: cross-validated performance over 8062 instances (fold count not preserved)
– Accuracy: SVM (SMO) 90.9%; J48 Decision Tree 92.5%
– Kappa: values not preserved
– Relevant class precision, recall, F1: only a recall of .58 is preserved (classifier not recoverable)
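The metrics named in the table follow from a 2×2 confusion matrix. The sketch below uses the standard definitions of accuracy, precision, recall, F1, and Cohen's kappa. The counts in the example are invented for illustration and are not the study's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics for the relevant (positive) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: agreement beyond what chance alone would give.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Illustrative counts only (not taken from the slides).
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

When the relevant class is rare, accuracy can be high while recall on that class stays modest, which is one reason to report kappa and per-class figures alongside accuracy.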

Feature Analysis
– Human subject study with 20 volunteers
– Subjects select most and least relevant images
Figure: Sample human subject survey

Textual features most important; image and presentation features still play a significant role.
SLIDIR features in rank order (regression coefficients not preserved, apart from a trailing value of 31.70):
1. Slide Title
2. Slide Image Type
3. Slide Text
4. Slide Image Text
5. Next Slide’s Text
6. Presentation Title
7. Number of Images on Slide
8. Slide Order
9. Image Size
10. Slide Image Position
11. Number of Colors
12. Previous Slide’s Text

Image Ranking Evaluation
– Online survey with 15 volunteers
– 8 given queries and sampled dataset of 100 images
– Volunteers indicate relevant images to query
– 53% of top 10 images are relevant (max of 15 relevant)
– Average precision of 0.74
Table: Precision at n and Recall at n for four top-n cutoffs (values not preserved); Average Precision .74
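The ranking measures in this evaluation can be sketched as follows, using the common textbook definitions. The slides do not spell out the exact average precision variant, so treat this as one reasonable reading. The image ids are made up.

```python
def precision_at(ranked, relevant, n):
    """Fraction of the top n results that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n


def recall_at(ranked, relevant, n):
    """Fraction of all relevant items found in the top n results."""
    return sum(1 for item in ranked[:n] if item in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


ranked = ["a", "b", "c", "d", "e"]   # system output, best first
relevant = {"a", "c", "e"}           # judged relevant by volunteers
p2 = precision_at(ranked, relevant, 2)
r2 = recall_at(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
```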

Future Work
Limitations
– Very small data set: 100 images from an AI course
Preprocessing
– Improve segmentation and collective extraction
Field Implementation (ongoing work)
– Incorporate with previous work on SlideSeer (JCDL, 2007) that aligns presentation and document pairs
More information: Liew Guo Min (2008). SLIDIR: Slide Image Retrieval. Undergraduate Thesis, School of Computing, National University of Singapore.

See you in Singapore! Next month or next year!
Questions?