Estimation of the Number of Relevant Images in Infinite Databases Presented by: Xiaoling Wang Supervisor: Prof. Clement Leung.



Introduction
With the growing importance of the Internet, image search engines are becoming increasingly widely used. However, it is difficult for users to decide which image search engine to select. The more effective a system is, the more satisfaction it offers the user. Retrieval effectiveness is therefore one of the most important measures of the performance of an image retrieval system.

Measures: Precision, Recall
Significant Challenge: the total number of relevant images is not directly observable in such a potentially infinite database.
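The two measures can be written down as a minimal sketch, using their standard definitions. The function and parameter names are illustrative assumptions, not taken from the slides. The point to notice is that `recall` needs the total number of relevant images, which for a web-scale database can only be an estimate.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Fraction of the retrieved images that are relevant."""
    if retrieved <= 0:
        raise ValueError("retrieved must be positive")
    return relevant_retrieved / retrieved


def recall(relevant_retrieved: int, total_relevant: float) -> float:
    """Fraction of all relevant images that were retrieved.

    total_relevant is not directly observable for a potentially infinite
    database, so the caller has to supply an estimate (assumption).
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return relevant_retrieved / total_relevant
```

Precision can be computed from the inspected pages alone, whereas recall depends entirely on how the denominator is estimated.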

Objective
To investigate the probabilistic behavior of the distribution of relevant images among the results returned by image search engines, using two models: a) Independent Distribution; b) Markov Chain Distribution. From such models, we shall determine algorithms for the meaningful estimation of recall.

Independent Model
Let $p_k$ denote the probability associated with the cumulative relevance of all the images in page $k$. For search engines it is normally true that the first pages provide a larger probability, so that $p_1 \ge p_2 \ge \dots \ge p_k \ge p_{k+1} \ge \dots$. Since the relevant outcomes of different ranked images are not mutually exclusive events, and since the search results do not feasibly terminate, we have in general [formula not preserved in the transcript] and that, as [formula not preserved in the transcript].

Independent Model
Record the number of relevant images per page as a stochastic process $X_{i1}, X_{i2}, \dots, X_{ik}$, where $i = 1, 2, \dots, 69$ and $k = 1, 2, \dots$. Investigate the quadratic formula $P_k = a_1 k^2 + a_2 k + c$, where $k = 1, 2, 3, \dots$ (the original coefficient symbols were lost in extraction; $a_1$, $a_2$ and $c$ are stand-ins). Determine the parameters using the least squares method. Calculate the percentage for the cumulative relevance of all the images in page $k$. Obtain the mean number of relevant images for each page: $\bar{X}_k = \frac{1}{69} \sum_{i=1}^{69} X_{ik}$, $k = 1, 2, \dots$
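The fitting step can be sketched as follows, assuming NumPy is available. The function names, the coefficient names `a1`, `a2`, `c`, and the small data matrix are hypothetical and for illustration only. Whether the slides fit raw page means or percentages is not stated, so this sketch fits the page means directly.

```python
import numpy as np


def mean_relevant_per_page(X):
    """X[i][k] is the relevant-image count for sequence i on page k+1.

    Returns the per-page mean over all recorded sequences.
    """
    return np.asarray(X, dtype=float).mean(axis=0)


def fit_quadratic(page_values):
    """Least-squares fit of y_k = a1*k**2 + a2*k + c for pages k = 1, 2, ..."""
    y = np.asarray(page_values, dtype=float)
    k = np.arange(1, len(y) + 1)
    # np.polyfit returns coefficients from the highest degree down.
    a1, a2, c = np.polyfit(k, y, deg=2)
    return a1, a2, c


# Hypothetical counts for 3 sequences over 5 pages (not data from the slides).
X = [[18, 15, 12, 10, 9],
     [20, 17, 13, 12, 8],
     [16, 13, 11, 8, 7]]
means = mean_relevant_per_page(X)
a1, a2, c = fit_quadratic(means)
```

Dividing `means` by the page size before fitting would give a proportion per page instead of a count. Which of the two the original study used is an open assumption here.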

Markov Chain Model
Since Internet image search returns results in units of pages, we focus on the integer-valued stochastic process $X_1, X_2, \dots$, where $X_J$ represents the aggregate relevance of all the images in page $J$. The sequence $X = \{X_1, X_2, \dots\}$ is modeled as a Markov chain. The transition probability is taken to be the conditional probability of the number of relevant images in page $J$ given the number of relevant images in page $J-1$: $p_{(J-1),J} = P\{X_J = x_J \mid X_{J-1} = x_{J-1}\}$.

Markov Chain Model
From this, we construct the transition probability matrix $P$ over the states $0, 1, \dots, n$ (the possible numbers of relevant images in a page), where $n$ is the number of images contained in a page.
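The slides do not say how the matrix entries are obtained. One common approach, shown here as an assumption rather than as the authors' procedure, is to count the observed page-to-page transitions in the recorded sequences and normalise each row. The function name is hypothetical and NumPy is assumed.

```python
import numpy as np


def estimate_transition_matrix(sequences, n):
    """Estimate an (n+1) x (n+1) transition matrix from observed sequences.

    Each sequence lists the number of relevant images (0..n) on successive
    pages. Entry [a, b] is the observed fraction of transitions from a
    relevant images on one page to b on the next.
    """
    counts = np.zeros((n + 1, n + 1))
    for seq in sequences:
        for prev, cur in zip(seq[:-1], seq[1:]):
            counts[prev, cur] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that were never left stay as all-zero rows instead of being guessed.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)
```

With few observations many rows will be empty or noisy, so some form of smoothing may be needed in practice. That choice is outside what the slides describe.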

Markov Chain Model
Calculate the initial probabilities. The probabilities are placed in a vector of state probabilities: $\pi(J)$ = vector of state probabilities for page $J$ = $(\pi_0, \pi_1, \pi_2, \pi_3, \dots, \pi_n)$, where $\pi_k$ is the probability of having $k$ relevant images. From this model, we can therefore estimate the number of relevant images page by page using the formula $\pi(J) = \pi(J-1) P$, $J = 1, 2, 3, \dots, n$.
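The recursion $\pi(J) = \pi(J-1) P$ can be sketched in a few lines, assuming NumPy. The function names are hypothetical. Reading the per-page estimate as the expected value $\sum_k k \, \pi_k$ is an interpretation, since the slides do not state how a single number is taken from the state vector.

```python
import numpy as np


def propagate(pi0, P, pages):
    """Return [pi(0), pi(1), ..., pi(pages)] using pi(J) = pi(J-1) @ P."""
    pi = np.asarray(pi0, dtype=float)
    P = np.asarray(P, dtype=float)
    history = [pi]
    for _ in range(pages):
        pi = pi @ P  # row vector times transition matrix
        history.append(pi)
    return history


def expected_relevant(pi):
    """Expected number of relevant images under state vector pi: sum of k * pi_k."""
    return float(np.dot(np.arange(len(pi)), pi))
```

Summing `expected_relevant` over the propagated pages gives one possible running estimate of the total number of relevant images, which could then serve as the denominator of recall.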

Experiment
Image search engine selection: Google, Yahoo, Msn.
Query selection: the queries consist of one-word, two-word and three-or-more-word queries, ranging from simple words like apple, to specific queries like apple computers, and finally to more specific queries like eagle catching fish.
Record the stochastic sequence $X = \{X_1, X_2, \dots\}$ for each query.
Apply the models: Independent Model and Markov Chain Model.
Test the models on the results returned for the queries: volcano, tibetan girl, desert camel shadow.

Independent Model and Testing Results for Google
Figure 1. Independent Model for Google. Figure 2. Testing Results and Independent Distribution Model for Google.

Independent Model and Testing Results for Yahoo
Figure 3. Independent Model for Yahoo. Figure 4. Testing Results and Independent Distribution Model for Yahoo.

Independent Model and Testing Results for Msn
Figure 5. Independent Model for Msn. Figure 6. Testing Results and Independent Distribution Model for Msn.

Markov Chain Model and Testing Results for Google
Figure 7. Search Result of Testing Queries and Markov Chain Model for Google.

Markov Chain Model and Testing Results for Yahoo
Figure 8. Search Result of Testing Queries and Markov Chain Model for Yahoo.

Markov Chain Model and Testing Results for Msn
Figure 9. Search Result of Testing Queries and Markov Chain Model for Msn.

Measure of Accuracy
One measure of accuracy is the mean absolute deviation (MAD). The table reports the MAD of each model (INDP Model, MC Model) for each image search engine (ISE: Google, Yahoo, Msn) and each query length (one-word, two-word, three-word); the numerical values are not preserved in this transcript.
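The slides give no formula for MAD. The sketch below assumes the usual forecast-accuracy definition, the mean of the absolute differences between the observed and model-predicted numbers of relevant images per page. The function name is hypothetical.

```python
def mean_absolute_deviation(observed, predicted):
    """Mean of |observed - predicted| over paired per-page values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
```

A lower MAD means the model's page-by-page predictions sit closer to the counts observed for the test queries.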

Conclusion
In terms of MAD, we conclude that the Markov Chain Model estimates the number of relevant images for the image search engines (ISE) better than the Independent Model does. Except for three-word queries on Msn, such models could estimate the total number of relevant images for the image search engines quite well.

Future Work
Optimal stopping rules for the different models will be established. Time series modeling and exponential smoothing will also be considered, because the previous models indicate that the situation may be modeled as a time series with the page number representing the time.

Q & A