Object Recognition Part 2 Authors: Kobus Barnard, Pinar Duygulu, Nando de Freitas, and David Forsyth Slides by Rong Zhang CSE 595 – Words and Pictures Presentation.



What is Object Recognition? The process of attaching words to images and to image regions (objects). Essentially, it translates visual representations such as pictures or illustrations into language.

Why Use This Method? This method is concrete and testable. There is no need to specify in advance which objects or scene semantics are to be considered. It can exploit large datasets such as: – Indexed image databases (online photo collections). – Web images with captions or other associated text. – Or even video data along with speech recognition. It can be used to develop vision tools that are applicable to the general recognition problem.

How Do We Do This? Use clustering methods (see the last paper presentation) to “learn” the joint statistics of image words and segments. – This method has been applied to traditional image database applications. – It is suggested that these methods can produce words associated with images outside the training set. – The produced words are indicative of the scene semantics, showing a strong connection between the predicted words and the scene.

Scene Based Word Prediction The previous presentation showed how we can generate words for images. – The tree structure: we can generate this tree top-down (general to specific) or bottom-up (specific to general). A path from leaf to root defines a cluster. D = set of observations, d = a specific document, c = clusters, i = indexed items, l = indexed levels. The model relies on documents specific to the training set, so images outside the training set require more complex treatment.
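Using the notation on this slide, the likelihood of a document under this kind of hierarchical model can be sketched as follows. This is a reconstruction from the definitions above (D, d, c, i, l), not the slide's original equation image, so treat the exact form as an assumption:

```latex
P(D \mid d) \;=\; \sum_{c} P(c) \prod_{i \in D} \sum_{l} P(i \mid l, c)\, P(l \mid c, d)
```

Each cluster c corresponds to one leaf-to-root path, each observed item i (a word or a segment) is emitted from some level l on that path, and the vertical mixing weights P(l | c, d) are document-specific, which is exactly why documents outside the training set need extra work.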

What if we want to consider a document not in the training set? We need to estimate the vertical mixing weights. We could marginalize out the training data (not good for large datasets), use a cluster-specific average computed during training, or re-fit the model with the document under consideration. For some applications the cluster average works well, which indicates that the selection of vertical nodes is the most important factor and that further modeling of the distribution matters less.

Word Prediction To predict words from images, we assume we have a new document with a set of observed image segments, S. Here we do not take the document index, d, into account in P(l), because we are interested in applying the model to documents outside the training set.
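The prediction on this slide can be sketched as below, with the document-specific vertical weights replaced by cluster-level weights P(l | c) as the slide describes. This is a hedged reconstruction consistent with the model's notation, not the slide's original equation image:

```latex
P(w \mid S) \;\propto\; \sum_{c} \Big[ \sum_{l} P(w \mid l, c)\, P(l \mid c) \Big] P(c \mid S),
\qquad
P(c \mid S) \;\propto\; P(c) \prod_{s \in S} \sum_{l} P(s \mid l, c)\, P(l \mid c)
```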

Region Based Word Prediction (Correspondence) The previous models implicitly learn some correspondence through co-occurrence, because there was a fitting advantage to having “topics” collected at the nodes. – For example, tigers and orange stripy areas. The following word prediction method can be used to find how well the nodes bind words and regions.
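For a single region s, the binding of words to regions through shared nodes can be sketched as the probability that w and s are emitted from the same node. Again this is a reconstruction under the same notation as the earlier slides, not the slide's original equation:

```latex
P(w \mid s) \;\propto\; \sum_{c} P(c) \sum_{l} P(w \mid l, c)\, P(s \mid l, c)\, P(l \mid c)
```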

Integrating Correspondence Integrating the correspondence method into the model and learning it during training is a much better way to do it. We now assume that observed words and regions (image segments) are emitted in pairs, so that D = {(w, s)_i}.

Getting Better Previous methods would produce a “likely” cluster from a query on a segment, so the predicted words had more freedom. – Not exactly what we’re looking for. The new model brings stronger ties between the emitted words and regions. – But this increases the complexity of the learning algorithm (as expected).

Learning During Training We sample likely correspondences using estimates of the probabilities that each word is emitted with each segment. This is done before the E step of the model fitting process. – The assignment (word to segment) is then used in estimating the expectation of the indicator variables. We assume that the probability that a word and a segment are tied together can be estimated by the probability that they are emitted from the same node.
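A minimal sketch of this sampling step. Here `pair_prob` stands in for the estimated probability that a word and a segment are emitted from the same node; the function and variable names are illustrative, not the authors' code:

```python
import random

def sample_correspondences(pair_prob, words, segments, rng=None):
    """Sample one likely word for each segment, with probability
    proportional to the estimate that the word and the segment are
    emitted from the same node.

    pair_prob: dict mapping (word, segment) -> probability estimate.
    """
    rng = rng or random.Random(0)
    assignment = {}
    for s in segments:
        weights = [pair_prob[(w, s)] for w in words]
        # Sample a word for this segment proportional to the same-node estimate.
        assignment[s] = rng.choices(words, weights=weights, k=1)[0]
    return assignment
```

In the full algorithm, this sampled assignment then feeds the expectations of the indicator variables in the E step.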

The Final Probability We consider each segment in turn and choose the most probable words, restricting ourselves to words not yet accounted for; once all words have been paired, the restriction is dropped.
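The selection rule described here can be sketched as a greedy loop. `score(w, s)` stands in for the per-pair word probability from the model; the names are illustrative:

```python
def predict_words(segments, words, score):
    """For each segment in turn, pick the highest-scoring word, restricted
    to words not yet used; once every word has been paired, the
    restriction is dropped and words may be reused."""
    unused = set(words)
    predictions = {}
    for s in segments:
        pool = unused if unused else set(words)
        best = max(pool, key=lambda w: score(w, s))
        predictions[s] = best
        unused.discard(best)
    return predictions
```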

Measuring Performance We want to be able to identify results that are far below human abilities (the really bad ones). We can observe word predictions on held-out data. – The held-out data already has associated words or text, which we can use for comparison.

The Difficulties… Our vocabulary is fairly large. How do we tell how close, or accurate, one word is to another? – Is there a scale for measuring how “close” one word is to another? A general-purpose loss function should reflect the observation that certain errors are not as bad or critical as others. – For example, predicting “cat” for “tiger” is nowhere near as critical an error as “car” for “vegetable”. – Still, the issue remains: the closeness of one term to another is very subjective.

Solution? We can count the words. Even then, the task is hard because of the vocabulary size. – The number of incorrect terms far outnumbers the number of appropriate ones. We also can’t simply subtract the bad predictions from the good ones; the incorrect terms would dominate, so a normalized measure is needed.
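One reasonable normalization along these lines rewards correct words relative to the number of actual keywords and penalizes incorrect ones relative to the rest of the large vocabulary. This is a sketch of the idea, not necessarily the exact measure the authors used:

```python
def normalized_score(predicted, actual, vocab_size):
    """Reward correct predictions relative to the number of actual words,
    and penalize incorrect ones relative to the remaining vocabulary, so
    that errors are not subtracted one-for-one from correct answers."""
    predicted, actual = set(predicted), set(actual)
    right = len(predicted & actual)   # correctly predicted words
    wrong = len(predicted - actual)   # incorrectly predicted words
    n = len(actual)
    return right / n - wrong / (vocab_size - n)
```

With a large vocabulary, each wrong word costs only a small fraction, matching the intuition that incorrect terms vastly outnumber appropriate ones.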

~ Solution

Measuring Performance of Region + Word Prediction Measuring region/segment oriented word prediction performance is much harder than straight annotation, because we don’t have the correspondence information. But we can still use the annotation task as a proxy, since the performance is correlated. – We can report results with image-word based prediction methods such as ave-vert and doc-vert. – We can also sum over the words emitted by the regions (the pair-cluster and pair-only methods).

Humans Can Get Involved Too We can further complement the previously mentioned methods with human judgment. We can count the number of times the word with the highest probability for a region was acceptable as an index term, had some tie to the region, or had some sort of visual connection to it. But things get a bit fuzzy if the region crosses multiple natural boundaries. – We tend to count the most prevalent one, >50%.

Experiment Used 160 CDs from the Corel image dataset. – Each disc had a relatively specific topic, such as “aircraft”. – 80 were used as the sample. Of the sample, 75% was used for training and 25% for testing. Results indicated that learning correspondence was helpful for the annotation task. – We can apply this to tasks that don’t require correspondence, such as auto-annotation. Training with correspondence improved performance from 14.2% to 17.3%.

The top row is fairly good. As you can see, the tiger is having slight problems, and “race car” isn’t all that great either.

As you can see, using the held-out data drops performance, but when correspondence is added, performance improves.

This shows that learning correspondence is generally helpful in the word prediction method for held-out data.