Using Pattern Recognition Techniques to Derive a Formal Analysis of Why Heuristic Functions Work. B. John Oommen, a joint work with Luis G. Rueda. March 7, 2002.

Presentation transcript:

Using Pattern Recognition Techniques to Derive a Formal Analysis of Why Heuristic Functions Work. B. John Oommen, a joint work with Luis G. Rueda. School of Computer Science, Carleton University.

Optimization Problems. An arbitrary optimization problem consists of: instances drawn from a finite set X, an objective function, and some feasibility functions. The aim is to find an instance of X (hopefully the unique one) that maximizes (or minimizes) the objective function subject to the feasibility constraints.

An Example. The Traveling Salesman Problem (TSP): consider cities numbered from 1 to n; the salesman starts from city 1, visits every city exactly once, and returns to city 1. An instance of X is a permutation of the cities, for example 1 4 3 2 5 if five cities are considered. The objective function is the sum of the inter-city distances: 1 → 4, 4 → 3, 3 → 2, 2 → 5, 5 → 1.
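To make the objective function concrete, here is a minimal Python sketch that computes the cost of a tour given as a permutation of city indices; the distance matrix is illustrative only, not from the slides.

```python
# Minimal sketch of the TSP objective function: the cost of a tour is the sum
# of the inter-city distances along the permutation, returning to the start.
# The distance matrix below is illustrative only.

def tour_cost(tour, dist):
    """Sum of inter-city distances for a cyclic tour given as 0-based city indices."""
    total = 0.0
    for i in range(len(tour)):
        a = tour[i]
        b = tour[(i + 1) % len(tour)]  # wrap around: the last city returns to the first
        total += dist[a][b]
    return total

# Five cities; the tour 1 4 3 2 5 from the slide becomes [0, 3, 2, 1, 4] with 0-based indices.
dist = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
print(tour_cost([0, 3, 2, 1, 4], dist))  # distances 1->4, 4->3, 3->2, 2->5, 5->1
```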

Heuristic Functions. A heuristic algorithm is an algorithm that attempts to find an instance of X that maximizes the objective function; it iteratively invokes a heuristic function. The heuristic function estimates (or measures) the cost of a solution, while the heuristic itself is a method that performs one or more changes to the current instance.
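As an illustration of how a heuristic algorithm couples the heuristic move with the heuristic function, here is a minimal hill-climbing sketch; the swap move, the estimator argument and the parameter values are illustrative assumptions, not the specific heuristics analyzed in the talk.

```python
import random

def hill_climb(initial_tour, estimate_cost, n_iters=1000, seed=0):
    """Generic heuristic algorithm: repeatedly apply a small change to the current
    instance (here, swapping two cities) and keep the change whenever the heuristic
    function estimates a lower cost."""
    rng = random.Random(seed)
    current = list(initial_tour)
    best_est = estimate_cost(current)
    for _ in range(n_iters):
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]  # the heuristic's move
        est = estimate_cost(candidate)  # the heuristic function's cost estimate
        if est < best_est:              # accept the change if it looks better
            current, best_est = candidate, est
    return current, best_est

# Usage with the TSP sketch above: hill_climb(list(range(5)), lambda t: tour_cost(t, dist))
```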

An Open Problem. Consider a heuristic algorithm that may invoke either of two heuristic functions, H1 and H2, used in estimating the solution to an optimization problem. If the estimation accuracy of H1 is greater than that of H2, does this imply that H1 has a higher probability of leading to the optimal QEP (Query Evaluation Plan)?

Pattern Recognition Modeling. Two heuristic functions, H1 and H2. The cost value each one assigns to a solution is modeled by two independent random variables, X1 and X2. The distribution is doubly exponential (Laplace), f(x) = (λ/2) e^(−λ|x − c|), where λ is the rate of the corresponding estimator and c is the mean cost of the solution being estimated.
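For concreteness, a minimal sketch of the doubly exponential (Laplace) density and sampler assumed by this model; the rate lam and the center are illustrative parameters.

```python
import math
import random

def dexp_pdf(x, lam, center=0.0):
    """Doubly exponential (Laplace) density with rate lam centered at `center`:
    f(x) = (lam / 2) * exp(-lam * |x - center|)."""
    return 0.5 * lam * math.exp(-lam * abs(x - center))

def dexp_sample(lam, center=0.0, rng=random):
    """Draw one sample: an exponential deviate with a random sign around the center."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return center + sign * rng.expovariate(lam)
```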

Pattern Recognition Modeling. Our model: the error function is doubly exponential, which is typical in reliability analysis and failure models; it asks how reliable a solution is when only its estimate is known. Assumptions: the mean cost of the optimal solution is μ; shifting the origin by μ gives E[X] = 0. Variances: the estimate X1 is better (has smaller variance) than the estimate X2.

Main Result (Exponential). H1 and H2 are two heuristic functions, and X1 and X2 are the random variables for the cost estimate of the optimal solution obtained with H1 and H2; X1' and X2' are the corresponding random variables for a sub-optimal solution. Let p1 and p2 be the probabilities that H1 and H2, respectively, make the wrong decision. It is shown that if X1 is the more accurate estimate (λ1 ≥ λ2), then p1 ≤ p2.
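A Monte Carlo sanity check of this result is sketched below, assuming a minimization setting in which the wrong decision is that the sub-optimal solution's estimate falls below the optimal solution's estimate; the rates and the separation c are illustrative values, not the slides'.

```python
import random

def wrong_decision_rate(lam, c, n=200_000, seed=1):
    """Monte Carlo estimate of the probability that a heuristic function whose
    doubly exponential estimates have rate lam ranks the sub-optimal solution
    (true cost c > 0, after shifting the optimal cost to 0) below the optimal one."""
    rng = random.Random(seed)

    def dexp(center):
        sign = 1.0 if rng.random() < 0.5 else -1.0
        return center + sign * rng.expovariate(lam)

    wrong = sum(1 for _ in range(n) if dexp(c) < dexp(0.0))
    return wrong / n

c = 1.0                                   # illustrative separation between the mean costs
p1 = wrong_decision_rate(lam=2.0, c=c)    # H1: more accurate (larger rate, smaller variance)
p2 = wrong_decision_rate(lam=0.5, c=c)    # H2: less accurate
print(p1, p2)                             # empirically p1 <= p2, consistent with the theorem
```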

Proof (Graphical Sketch). For a particular x, the probability that x leads to a wrong decision by H1 is derived from the densities of the optimal and sub-optimal estimates. Figure: densities of X1 (opt), X1 (subopt), X2 (subopt), X2 (opt).

Proof (Cont'd). The expression takes a different form when x < c. Figure: densities of X1 (opt), X1 (subopt), X2 (subopt), X2 (opt).

Proof (Cont'd). The total probability p1 that H1 makes the wrong decision is obtained by integrating this expression over all values of x; similarly, the total probability p2 that H2 makes the wrong decision is obtained by integrating over all values of x.

Proof (Cont'd). Solving the integrals and comparing p1 with p2, and using the inequality ln x ≤ x − 1, we conclude that p1 ≤ p2. QED. Here α1 = λ1 c and α2 = λ2 c, and λ2 is substituted by kλ1.

Second Theorem. F(α1, k) can also be written in terms of α1 and k as a function G(α1, k). Suppose that α1 ≥ 0 and 0 ≤ k ≤ 1; then G(α1, k) ≥ 0, and G(α1, k) = 0 has two solutions. Proof: by taking the partial derivatives with respect to α1 and k and solving.

Graphical Analysis (Histograms). For the pairs R-ACM / Eq-width, R-ACM / Eq-depth, T-ACM / Eq-width and T-ACM / Eq-depth: G >> 0, i.e. p1 << p2. For the pairs R-ACM / T-ACM and Eq-width / Eq-depth: G ≈ 0, i.e. p1 ≈ p2. The minimum occurs at α1 = 0, for 0 ≤ k ≤ 1.

Analysis: Normal Distributions. No closed-form integration is possible for the normal pdf, but it has been shown numerically that p1 ≤ p2.
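The same kind of numerical check can be sketched for normal estimates; the standard deviations and the separation below are illustrative, not the values used in the talk.

```python
import random

def wrong_decision_rate_normal(sigma, c=1.0, n=200_000, seed=2):
    """Monte Carlo estimate of the probability that the sub-optimal solution's
    normal estimate (mean c, std sigma) falls below the optimal one's (mean 0, std sigma)."""
    rng = random.Random(seed)
    wrong = sum(1 for _ in range(n) if rng.gauss(c, sigma) < rng.gauss(0.0, sigma))
    return wrong / n

print(wrong_decision_rate_normal(sigma=0.5))  # more accurate estimator: smaller error probability
print(wrong_decision_rate_normal(sigma=2.0))  # less accurate estimator: larger error probability
```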

Plot of the Function G.

Estimation for Histograms. The rate parameter λ of the doubly exponential model is estimated from the data, where N is the number of samples.
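One standard way to obtain such an estimate, sketched here under the assumption of a zero-mean doubly exponential error, is the maximum-likelihood estimator lambda_hat = N / sum(|x_i|); whether this is exactly the estimator used on the slide is an assumption.

```python
def estimate_lambda(samples):
    """Maximum-likelihood estimate of the rate of a zero-mean doubly exponential
    (Laplace) distribution: lambda_hat = N / sum(|x_i|) over the N samples."""
    n = len(samples)
    total_abs = sum(abs(x) for x in samples)
    if total_abs == 0:
        raise ValueError("all samples are zero; the rate estimate is unbounded")
    return n / total_abs
```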

Similarities of R-ACM and d-Exp. Figure: the error distribution estimated for the R-ACM compared with the true doubly exponential.

Simulation Details. Simulations were performed in query optimization: 4 independent runs per simulation, with 100 random databases per run (400 per simulation); each database has 6 relations, 6 attributes per relation, and 100 tuples per relation. The four independent runs on 100 databases compared R-ACM vs. the traditional histograms, using 11 bins and 50 values.

Empirical Results. Table: the number of times R-ACM yields the better QEP, the number of times Eq-width yields the better QEP, and the number of times Eq-depth yields the better QEP.

Conclusions. We applied pattern recognition techniques to the problem of relating heuristic-function accuracy to solution optimality, using a reasonable model of accuracy (the doubly exponential distribution). We showed analytically how a more accurate heuristic function leads to superior solutions, and showed the corresponding result numerically for normal distributions. We showed that R-ACM yields better QEPs more often than Equi-width and Equi-depth; empirical results on randomly generated databases also confirmed the superiority of R-ACM. Finally, we graphically demonstrated the validity of our model.