Geometric Hashing Visual Recognition Lecture 9 “Answer me speedily” Psalm, 17.


Main approaches to recognition:
- Pattern recognition
- Invariants
- Alignment
- Part decomposition
- Functional description

Geometric Hashing
- A technique for model-based recognition of 3-D objects from unknown viewpoints using single gray-scale images
- Especially useful for recognition of scenes with overlapping and partially occluded objects
- An efficient matching algorithm which assumes an affine approximation
- The algorithm has an off-line model preprocessing phase and a recognition phase, to reduce matching complexity
- Successfully tested in recognition of flat industrial objects appearing in composite occluded scenes

Definition of the Problem
- Object recognition in a cluttered 3-D scene
- The models of the objects to be recognized are assumed to be known in advance
- The objects in the scene may overlap and may also be partially occluded by other (unknown) objects
- The image may be obtained from an arbitrary viewpoint
- At this stage we assume that we are dealing with flat objects

Pliers and their composite scene – observe different lengths of handles in the composite scene due to tilt

- We assume that the depth of the centroids of the objects in the scene is large compared to the focal length of the camera, and that the depth variation of the objects is small compared to the depth of their centroids
- Under these assumptions the perspective projection is well approximated by a parallel (orthographic) projection with a scale factor
- Hence, two different images of the same flat object are in an affine 2-D correspondence
- There is a non-singular 2x2 matrix A and a 2-D (translation) vector b such that each point x in the first image is mapped to the corresponding point Ax + b in the second image

Our problem is: to recognize the objects in the scene, and for each recognized object to find the affine transformation that gives the best least-squares fit between the model of the object and its transformed image in the scene.

Choice of ‘Interest Points’
- The matching algorithm extracts ‘interest points’ both in the object model images and in the scene image, to find the best match between those point sets
- Point extraction methods should be database dependent: different databases of models will suggest different natural ‘interest points’
- For example, a database of polyhedral objects naturally suggests the use of polyhedral vertices as ‘interest points’, while ‘curved’ objects suggest the use of sharp convexities, deep concavities and, maybe, zero-curvature points

Extracted interest points in the composite scene image

- ‘Interest points’ do not have to appear physically in the image. For example, a point may be taken as the intersection of two non-parallel line segments which are not necessarily touching
- An ‘interest point’ does not necessarily have to correspond to a geometrical feature (e.g. an ‘interest operator’ based on high variance in intensity, Barnard 1980)
- The basic assumption is that enough ‘interest points’ can be extracted in the relevant images
- No special classification of these points is assumed

Recognition of a Single Model
- An affine transformation of the plane is uniquely defined by the transformation of three non-collinear points
- There is a unique affine transformation which maps any non-collinear triplet in the plane to another non-collinear triplet
- Hence we may extract interest points on the model and the scene and try to match non-collinear triplets of such points to obtain candidate affine transformations
- Each such transformation can be checked by matching the transformed model against the scene (classical alignment, see Huttenlocher and Ullman 87)
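The uniqueness claim above can be made concrete with a minimal sketch that recovers A and b from one triplet correspondence. The function name `affine_from_triplets` and the use of NumPy are assumptions for illustration, not part of the lecture.

```python
import numpy as np

def affine_from_triplets(src, dst):
    """Return (A, b) such that A @ src[i] + b == dst[i] for three point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Edge vectors measured from the first point of each triplet (as columns).
    S = np.column_stack([src[1] - src[0], src[2] - src[0]])
    D = np.column_stack([dst[1] - dst[0], dst[2] - dst[0]])
    if abs(np.linalg.det(S)) < 1e-12:
        raise ValueError("source triplet is collinear")
    A = D @ np.linalg.inv(S)   # A maps source edge vectors to target edge vectors
    b = dst[0] - A @ src[0]    # translation fixed by the first correspondence
    return A, b

# Usage: recover a known transformation from a single triplet.
A_true = np.array([[2.0, 1.0], [0.5, 3.0]])
b_true = np.array([4.0, -1.0])
model_triplet = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
scene_triplet = model_triplet @ A_true.T + b_true
A, b = affine_from_triplets(model_triplet, scene_triplet)
```

In the alignment scheme every candidate (A, b) obtained this way would still have to be verified against the rest of the scene.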

Unfavorable Complexity
- Given m points in the model and n points in the scene, the worst-case complexity is $(m \cdot n)^3 \cdot t$, where t is the complexity of matching the model against the scene
- If we assume that m and n are of the same magnitude, and t is at least of magnitude m, the worst-case complexity is of order $n^7$!
- One way to reduce complexity is to classify the points in a distinctive way, so that each triplet can match only a small number of other triplets (however, such a distinction might not exist or cannot be made in a reliable way)
- A more efficient triplet matching algorithm: Geometric Hashing (GH)

The Algorithm
Two major steps:
- A preprocessing step
  - applied to the model points
  - does not use any information about the scene
  - is executed off-line, before actual matching is attempted
- Proper matching
  - uses the data prepared by the first step to match the models against the scene
  - the execution time of this second step is the actual recognition time

Independence that allows comparison
- An affine transformation is uniquely defined by the transformation of three non-collinear points
- Consider a set of m points and pick any ordered subset of three non-collinear points
- The two linearly independent vectors based on these points are a 2-D linear basis
- The coordinates of all model points can be expressed in this basis

Affine Transformation
- Any affine transformation applied to the point set will not change the set of coordinates based on the same ordered basis triplet
- Let $e_{00}, e_{10}, e_{01}$ be an ordered affine basis triplet in the plane
- The affine coordinates $(\xi, \eta)$ of a point v are: $v = \xi (e_{10} - e_{00}) + \eta (e_{01} - e_{00}) + e_{00}$
- Application of an affine transformation T will transform the point v to: $Tv = \xi (Te_{10} - Te_{00}) + \eta (Te_{01} - Te_{00}) + Te_{00}$
- Hence $Tv$ has the same coordinates $(\xi, \eta)$ in the basis triplet $Te_{00}, Te_{10}, Te_{01}$
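A small numerical check of this invariance, as a sketch: the helper name `affine_coords`, the sample points, and the particular matrix A and vector b are illustrative assumptions.

```python
import numpy as np

def affine_coords(basis, v):
    """Coordinates (xi, eta) of v in the ordered basis triplet (e00, e10, e01)."""
    e00, e10, e01 = (np.asarray(p, dtype=float) for p in basis)
    # Solve xi * (e10 - e00) + eta * (e01 - e00) = v - e00 for (xi, eta).
    E = np.column_stack([e10 - e00, e01 - e00])
    return np.linalg.solve(E, np.asarray(v, dtype=float) - e00)

basis = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
v = (3.0, 2.0)

A = np.array([[1.5, -0.5], [0.3, 2.0]])   # arbitrary non-singular matrix
b = np.array([10.0, -4.0])

def T(p):
    """The affine map x -> Ax + b."""
    return A @ np.asarray(p, dtype=float) + b

before = affine_coords(basis, v)                      # coordinates in the model frame
after = affine_coords([T(p) for p in basis], T(v))    # coordinates after transforming everything
```

Both `before` and `after` come out as the same pair of coordinates up to floating-point error, which is what lets coordinates computed off-line on the model be compared with coordinates computed on the scene.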

Preprocessing
- Given an image of a model, where m ‘interest points’ have been extracted
- For each ordered non-collinear triplet of points, the coordinates of all other m-3 model points are computed, taking this triplet as an affine basis of the 2-D plane
- Each such coordinate (after a proper quantization) is used as an entry to a hash table, where we record the number of the basis triplet at which the coordinate was obtained and the number of the model (in case of more than one model)
- The complexity of this preprocessing step is of order $m^4$ per model ($m^3$ ordered basis triplets, each with m-3 remaining points)
- New models added to the database can be processed independently, without re-computing the hash table

Recognition
- Given an image of a scene, where n ‘interest points’ have been extracted
- Choose an arbitrary ordered triplet in the scene and compute the coordinates of the scene points, taking this triplet as an affine basis
- For each such coordinate check the appropriate entry in the hash table, and for every pair (model number, basis-triplet number) which appears there, tally a vote for the model and the basis triplet as corresponding to the triplet which was chosen in the scene (if there is only one model, we have to vote for the basis triplet alone)
- If a certain pair (model, basis triplet) scores a large number of votes, decide that this triplet corresponds to the one chosen in the scene
- The uniquely defined affine transformation between these triplets is assumed to be the transformation between the model and the scene
- If the current triplet doesn’t score high enough, pass to another basis triplet in the scene
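The preprocessing and voting steps can be sketched together as follows. This is a minimal illustration rather than the lecture's implementation: the bin width `BIN`, the rounding-based quantization, the names `preprocess` and `vote`, and the toy point sets are all assumptions, and no noise handling is included.

```python
import itertools
from collections import Counter, defaultdict
import numpy as np

BIN = 0.25  # hypothetical quantization step for the hash bins

def affine_coords(basis, pts):
    """Coordinates of pts (k x 2) in an ordered basis triplet; None if the triplet is collinear."""
    e00, e10, e01 = basis
    E = np.column_stack([e10 - e00, e01 - e00])
    if abs(np.linalg.det(E)) < 1e-9:
        return None
    return np.linalg.solve(E, (pts - e00).T).T

def bin_of(c):
    """Quantize a coordinate pair to an integer bin index."""
    return tuple(int(x) for x in np.rint(c / BIN))

def preprocess(models):
    """Off-line step: store (model, ordered basis triplet) under every coordinate bin."""
    table = defaultdict(list)
    for name, pts in models.items():
        for idx in itertools.permutations(range(len(pts)), 3):
            rest = [i for i in range(len(pts)) if i not in idx]
            cs = affine_coords(pts[list(idx)], pts[rest])
            if cs is None:
                continue
            for c in cs:
                table[bin_of(c)].append((name, idx))
    return table

def vote(table, scene, basis_idx):
    """Recognition step for one scene basis triplet: tally votes per (model, basis)."""
    rest = [i for i in range(len(scene)) if i not in basis_idx]
    cs = affine_coords(scene[list(basis_idx)], scene[rest])
    votes = Counter()
    if cs is not None:
        for c in cs:
            votes.update(table.get(bin_of(c), []))
    return votes

# Toy data: one 6-point model, seen under an affine map together with 2 clutter points.
model = np.array([[0, 0], [1, 0], [0, 1], [2.1, 0.6], [-0.7, 1.3], [1.3, 1.8]], dtype=float)
clutter = np.array([[4.0, 4.0], [-3.0, -2.1]])
A = np.array([[1.2, 0.4], [-0.3, 0.9]])
b = np.array([5.0, 2.0])
scene = np.vstack([model, clutter]) @ A.T + b

table = preprocess({"model0": model})
votes = vote(table, scene, (0, 1, 2))   # scene basis = images of model points 0, 1, 2
```

Here the pair `("model0", (0, 1, 2))` collects one vote for each of the three remaining model points. A real system would loop over scene triplets until some pair passes a vote threshold, and would then go on to verification.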

Some Remarks
- For the algorithm to be successful it is enough, theoretically, to pick three non-collinear points in the scene belonging to one model
- The voting process, per triplet, is linear in the number of points in the scene
- Hence, the overall recognition time depends on the number of model points in the scene and on the number of additional ‘interest points’ which belong to the scene but do not appear on any of the models
- In the worst case, we might have an order of $n^4$ operations ($n^3$ candidate basis triplets, each with a voting pass linear in n)

- When the number of models is small the algorithm will be much faster
- If there are k model points in a scene of n points, the probability of not choosing a model triplet in t trials is approximately: $p = (1 - (k/n)^3)^t$
- Hence, for a given $\varepsilon$, if we assume a lower bound on the ‘density’ $d = k/n$ of model points in a scene, then the number of trials t giving $p < \varepsilon$ is of order $\log \varepsilon / \log(1 - d^3)$, which is a constant independent of n
- Since the verification process is linear in n, we have an algorithm of complexity $O(n)$ which will succeed with probability of at least $1 - \varepsilon$
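As a worked example of the trial count, the sketch below solves $(1 - d^3)^t \le \varepsilon$ for the smallest integer t, with d the density of model points in the scene and $\varepsilon$ the tolerated failure probability. The function name `trials_needed` and the sample values are assumptions for illustration.

```python
import math

def trials_needed(density, eps):
    """Smallest t with (1 - density**3)**t <= eps."""
    return math.ceil(math.log(eps) / math.log(1.0 - density ** 3))

# Half of the scene points belong to the model; accept a 1% chance of missing it.
t = trials_needed(0.5, 0.01)   # -> 35
```

The result depends only on the density and on the tolerated failure probability, not on the number of scene points n.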

Close Basis Points
- Numerical errors in the point coordinates are more severe when the basis points are close to each other compared to the other model points in the scene
- To overcome this problem: if a certain basis triplet gets a number of votes which, on the one hand, is not enough to accept it as a ‘candidate’ basis but, on the other hand, does not justify total rejection, replace this triplet with another triplet consisting of points which were among the ‘voting’ coordinates and are more distant from each other than the previous basis points
- In the correct case this procedure will result in a growing match, as the numerical errors become less significant
- Even if a basis triplet belonging to some model did not get enough votes due to noisy data, we still have a chance to recover this model from another basis triplet

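Finding the Best Least-Squares Match
- Assume that we are looking for an affine match between the sequences of planar points $(u_j)_{j=1}^{n}$ and $(v_j)_{j=1}^{n}$
- We would like to find the affine transformation $T(x) = Ax + b$ of the plane which minimizes the distance between the sequences $(Tu_j)$ and $(v_j)$: $\Delta = \min_T \sum_{j=1}^{n} |Tu_j - v_j|^2$
- To simplify the calculation, first translate the set $(u_j)$ so that $\sum_{j=1}^{n} u_j = 0$
- Then $\Delta = \min_{A,b} \sum_{j=1}^{n} |Au_j + b - v_j|^2$
- But $\sum_j |Au_j + b - v_j|^2 = \sum_j |Au_j - v_j|^2 + 2\, b \cdot \sum_j (Au_j - v_j) + n|b|^2 = \sum_j |Au_j - v_j|^2 - 2\, b \cdot \sum_j v_j + n|b|^2$, because $\sum_j Au_j = A \sum_j u_j = 0$
- Hence b and A appear independently in $\Delta$, and we can minimize their contributions separately
- To minimize over b we simply put $b = \frac{1}{n} \sum_{j=1}^{n} v_j$
- As to A, denote $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$. We have to find $\min g(a_{11}, a_{12}, a_{21}, a_{22})$ with $g = \sum_{j=1}^{n} |Au_j - v_j|^2$
- To find this minimum one has to solve the following system of 4 equations: (*) $\partial g / \partial a_{ik} = 0$, $i, k = 1, 2$
- Since g is a quadratic function in each of its unknowns, (*) is a system of four linear equations with four unknowns (actually two independent sets of two linear equations with two unknowns)
- For i = 1, 2 define the following four n-dimensional vectors: $x^i = (u_{1i}, \ldots, u_{ni})$ and $y^i = (v_{1i}, \ldots, v_{ni})$, where $u_{ji}$ and $v_{ji}$ are the i-th coordinates of $u_j$ and $v_j$
- The solution of (*) is given by $\begin{pmatrix} a_{i1} \\ a_{i2} \end{pmatrix} = M^{-1} \begin{pmatrix} x^1 \cdot y^i \\ x^2 \cdot y^i \end{pmatrix}$, $i = 1, 2$, where $M = \begin{pmatrix} x^1 \cdot x^1 & x^1 \cdot x^2 \\ x^1 \cdot x^2 & x^2 \cdot x^2 \end{pmatrix}$
- As we can see, M depends only on one set of points (in this case the model points), so we can know in advance which sets of model points will give a solution for the minimum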
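The closed-form least-squares solution can be sketched numerically as below. The function name `best_affine_fit` and the sample data are assumptions. The sketch centers the model points first, as the derivation does, and then converts the translation back to the original (uncentered) model coordinates.

```python
import numpy as np

def best_affine_fit(u, v):
    """Least-squares A, b minimizing sum |A u_j + b - v_j|^2 over corresponding rows of u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u_mean = u.mean(axis=0)
    uc = u - u_mean                  # translate the model points so that they sum to zero
    b_centered = v.mean(axis=0)      # optimal translation for the centered model points
    M = uc.T @ uc                    # 2x2 matrix of dot products; depends on the model points only
    # Column i of the solve result holds row i of A, hence the transpose.
    A = np.linalg.solve(M, uc.T @ v).T
    b = b_centered - A @ u_mean      # translation expressed for the uncentered points
    return A, b

# Usage on noise-free correspondences: the generating transformation is recovered.
A_true = np.array([[0.8, -0.6], [0.7, 1.1]])
b_true = np.array([3.0, -2.0])
u = np.array([[0, 0], [1, 0], [0, 1], [2, 3], [-1, 2]], dtype=float)
v = u @ A_true.T + b_true
A, b = best_affine_fit(u, v)
```

If the model points are collinear, M is singular and `np.linalg.solve` raises an error, which mirrors the remark that solvability can be checked on the model points alone.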

Left – Pliers rotated and tilted in space (see the different lengths of the handles). Right – Extracted ‘interest points’

Left – A fit obtained by calculating the affine transformation from three basis points. Right – The same model fitted using the best least-squares affine match based on 10 points (all of which were recovered by the transformation obtained in the left image)

Summary of the Algorithm
Our algorithm can be summarized as follows:
A. Represent the model objects by sets of ‘interest points’.
B. For each non-collinear triplet of model points, compute the coordinates of all the other model points according to this basis triplet, and hash these coordinates into a table which stores all the pairs (model number, basis triplet number) for every coordinate.
C. Given an image of a scene, extract its interest points, choose a triplet of non-collinear points as a basis triplet, and compute the coordinates of the other points in this basis. For each such coordinate vote for the pairs (model number, basis triplet number), and find the pairs which obtained the most coincidence votes. If a certain pair scored a large number of votes, decide that its model and basis triplet correspond to the one chosen in the scene. If not, continue by checking another basis triplet.
D. For each candidate model and basis triplet from the previous step, establish a correspondence between the model points and the appropriate scene points, and find the affine transformation giving the best least-squares match for these corresponding sets. If the least-squares difference is too big, go back to Step C for another candidate triplet. Finally, the transformed model is compared with the scene (this time considering not only the previously extracted ‘interest points’). If this comparison gives a bad result, go back again to Step C.

Recognition under Similarity
- The situation when the viewing angle of the camera is the same for both the model and the image (e.g. an industrial setting)
- Similarity is a special case of affine, so no change to the algorithm is needed
- A similarity transformation preserves angles and length ratios, so two points are enough to form a basis which spans the 2-D plane (the third basis point is uniquely defined by the two)
- Same algorithm with pairs instead of triplets
- Complexity is reduced by a factor of m for preprocessing and by a factor of n for the worst case of recognition
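One convenient way to see why two points suffice is to treat planar points as complex numbers, so that a similarity (rotation, uniform scale, translation) is the map z -> a*z + c with a != 0. This representation and the name `similarity_coord` are assumptions chosen for the sketch, not taken from the lecture.

```python
import cmath

def similarity_coord(p0, p1, v):
    """Coordinate of v in the ordered two-point basis (p0, p1), as a complex number."""
    p0, p1, v = complex(*p0), complex(*p1), complex(*v)
    return (v - p0) / (p1 - p0)

def similarity(a, c):
    """Return the map (x, y) -> a*z + c acting on points given as (x, y) pairs."""
    def T(p):
        w = a * complex(*p) + c
        return (w.real, w.imag)
    return T

T = similarity(2.0 * cmath.exp(0.7j), 3.0 - 1.0j)   # rotate by 0.7 rad, scale by 2, translate
p0, p1, v = (0.0, 0.0), (2.0, 0.0), (1.0, 1.0)

before = similarity_coord(p0, p1, v)               # 0.5 + 0.5j
after = similarity_coord(T(p0), T(p1), T(v))       # unchanged up to rounding
```

The factor a cancels in the ratio, so the coordinate is invariant and can be quantized and hashed exactly like the affine coordinates, with ordered pairs of points playing the role of the basis.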

Line Matching
- Extraction of points might be quite noisy. A line is a more stable feature than a point. Whenever lines can be extracted in a reliable way, e.g. in scenes of polyhedral objects, we can apply similar procedures to lines
- All the point matching techniques apply directly to lines, since lines can be viewed as points in the dual space
- Three lines which have no parallel pair are a basis of the affine space; each line has unique coordinates in this basis
- We repeat exactly the matching procedure as is
- We can use line segments to reduce the complexity of the matching algorithm

- If the endpoints of line segments can be reliably extracted, then instead of a triplet of points or lines as a basis, we can take a line segment plus an additional point
- The reduction of complexity is significant
- Since an affine transformation maps collinear points into collinear points, and points of line intersection into points of the same line intersection, we may develop algorithms which combine point and line information
- For example, even if the algorithm utilizes point triplets as an affine basis, the verification can be done not only on other ‘interest point’ coordinates, but also on line equations, etc.

Experimental Results
- Recognition results of a composite overlapping scene of both pliers, which was also significantly tilted
- In the scene we have additional ‘interest points’ which are created by the superposition of the two objects. These points do not correspond to the ‘interest points’ of the original models
- A number of the original ‘interest points’ are occluded in the scene
- The total number of ‘interest points’ in the scene (next) is 28. 16 of them are unoccluded model points of the second pliers, out of 21 original model points (see next)

Extracted interest points in the composite scene image

Experimental Results
- The recognition algorithm was run on all the possible basis triplets of the scene
- For each triplet we found the set of best (maximum vote) matching model triplets
- The number of points identified by such a triplet as model points is the so-called number of votes in the first column of the table
- The second column gives the number of triplets which obtained these votes
- The third column gives the number of triplets which were verified as belonging to the model (correct triplets)

Remarks:
a) Since we have 16 model points in the scene, we expect a maximum of 13 votes for a correct triplet.
b) Since all 6 ordered occurrences of the same unordered triplet will give the same voting result, unordered triplets are counted in the statistics. In the algorithm we are dealing with ordered triplets; thus, for example, we have 4x6=24 ordered basis triplets with the maximal number of votes.
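The counts in these remarks follow from simple combinatorics, spelled out in the sketch below. The variable names are illustrative, and the triplet totals are upper bounds that ignore any collinear triplets.

```python
import math

scene_points = 28            # interest points extracted in the composite scene
model_points_in_scene = 16   # unoccluded model points of the second pliers

# A correct basis uses 3 of the model points, so at most the other 13 can vote.
max_votes = model_points_in_scene - 3

# Unordered triplets available as scene bases, and those lying entirely on the model.
scene_triplets = math.comb(scene_points, 3)
model_triplets = math.comb(model_points_in_scene, 3)

# Each unordered triplet corresponds to 3! = 6 ordered triplets.
ordered_max_vote_bases = 4 * math.factorial(3)
```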

The former composite pliers scene with an additional object which does not belong to the model database

Conclusions
- The method is based on the representation of objects by point sets and on matching corresponding sets of points
- By applying geometric constraints, these sets of points can be further represented by a small subset of points (basis points)
- The size of the basis depends on the transformation applied to the models
- A basis of 2 points is sufficient for 2-D scenes under rotation, translation and scale
- A basis of 3 points is sufficient for the affine transformation which approximates the perspective view
- The process is divided into preprocessing and recognition, which reduces complexity and enables off-line preprocessing

Error Analysis
- Analysis of the effect of noise on the accuracy of the measurements obtained from the image
- Feature coordinates are quantized for hashing
- In the presence of noise there might be some error in the extracted values of the coordinates
- This may result in accessing incorrect bins of the hash table
- Calculate the range of hash table bins which are consistent with the feature coordinates extracted in the presence of noise
- By accessing all these bins we ensure that votes for the correct solution will not be lost

Redundancy factor
- The need to access a range of bins for a given coordinate results in an increased number of candidate (model, basis) pairs participating in the voting
- Incorrect (model, basis) pairs might get a high vote at random
- To estimate this effect on the likelihood of getting false matches to a given basis, we estimate the size of the set of (model, basis) pairs counted for a given image point
- The number of bins in the hash table which are consistent with a given coordinate, assuming a certain noise model, is defined as the redundancy factor
- Estimate the redundancy factor for the case of point matching under various transformations, and estimate the probability of a ‘random’ candidate solution scoring a relatively high vote
- Worst-case analysis is assumed
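To illustrate the range of bins and the redundancy factor, the sketch below assumes a very simple noise model that is not taken from the lecture: each computed coordinate may be off by at most `err` along each axis, and bin i covers the interval [i * bin_size, (i + 1) * bin_size). The function name `consistent_bins` is hypothetical.

```python
import itertools
import math

def consistent_bins(coord, err, bin_size):
    """All 2-D bin indices that intersect the error box coord +/- err."""
    ranges = [
        range(math.floor((c - err) / bin_size), math.floor((c + err) / bin_size) + 1)
        for c in coord
    ]
    return list(itertools.product(*ranges))

bins = consistent_bins((0.48, 1.10), err=0.05, bin_size=0.25)
redundancy = len(bins)   # number of bins accessed for this one coordinate
```

For this coordinate the error box straddles a bin boundary along the first axis only, so `bins` is `[(1, 4), (2, 4)]` and the redundancy is 2. Larger errors or finer bins increase the count, and with it the number of (model, basis) pairs receiving votes.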

The Probability of False Matches
- In order to evaluate the efficiency of the voting stage we have to estimate the average number of solutions that may get a high vote
- Given a certain vote threshold and a random (model, basis) pair, we would like to know the probability that this pair gets more votes than the threshold
- Although such ‘false solutions’ will be rejected in the following verification stages, their expected number directly affects the computational efficiency of the technique
- We assume that each of the bins in the hash table has equal probability of being picked in the voting procedure
- Note that the coordinates of the points in different bases are dependent, hence the computation of their distribution is not straightforward, and the former assumption is simplistic

What is the probability that a certain random basis will obtain more than $\alpha m$ votes?
Let k be the size of a basis; M the number of models; m the number of features on a model; n the number of features in the image; $\alpha$ the fraction of model features serving as an acceptance threshold; N the size of the hash table; b the voting redundancy factor.
Assuming one model in the DB, the entries of the hash table contain the information on the bases at which the address coordinates occurred.
Given a k-feature basis in the image, the coordinates of all the other n - k features are computed, and each of them votes for a certain bin in the hash table.

Once a k-tuple of basis features in the image is chosen, the coordinates of the n - k other image features are computed, and for each such coordinate the hash table is accessed (approximately) b times.
Since each model basis has m - k entries in the N-bin table, we assume that each basis has a probability of $p = (m-k)/N$ to be chosen in a single access.
The probability of choosing a certain basis B in b accesses is $q = 1 - (1-p)^b$ (for small p it is ~bp).
The number of votes V scored by a basis B in n - k accesses can be computed using the Binomial Distribution with probability q, namely
$$P(V = v) = \binom{n-k}{v}\, q^{v} (1-q)^{n-k-v}.$$
The probability that V exceeds the threshold $\alpha m$, where $\alpha$ is the acceptance fraction of the m model features:
$$P(V > \alpha m) = \sum_{v > \alpha m}^{n-k} \binom{n-k}{v}\, q^{v} (1-q)^{n-k-v}.$$

Since q, the probability that a given basis is chosen in the b accesses made for one image coordinate, is usually very small and n - k is large, the Binomial Distribution is well approximated by the Poisson Distribution with $\lambda = (n-k)\,q$.
Hence $P(V > \alpha m)$, the probability that the basis scores more than the threshold of $\alpha m$ votes, is well approximated by:
$$P(V > \alpha m) \approx \sum_{v > \alpha m} e^{-\lambda} \frac{\lambda^{v}}{v!}.$$
The calculation of $P(V > \alpha m)$ gave us the probability of one specific basis being voted as a correct match.
However, we are interested in the average number of bases that will be accepted as a correct match.
Let S be the number of model bases that can be a priori matched to a given k-tuple basis in the image. S is of order $m^2$, since each basis is defined by a pair of model points.

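Let X be the number of bases that received more than $\alpha m$ votes, where $\alpha m$ is the acceptance threshold.
Then X is modeled by the Binomial Distribution with S trials, S being the number of candidate model bases, and success probability $P(V > \alpha m)$, the probability that a single random basis exceeds the threshold.
Hence the expected number of ‘accepted bases’ is
$$E[X] = S \cdot P(V > \alpha m).$$
The above calculation is for one basis k-tuple chosen in the image. It increases linearly with the number of image k-tuples (bases) examined and with the number of models M in the database.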
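The estimate above can be checked numerically. The following is a minimal sketch, not the lecture's own code: the function names are hypothetical, the redundancy factor b = 9 is an arbitrary illustrative value, and the count of candidate model bases is assumed to be m(m-1) (ordered 2-point bases). The table size N = 7,200 and the 0.6 m threshold are the values quoted with the typical examples.

```python
import math

def hit_probability(m, k, N, b):
    """Probability that one image coordinate (b bin accesses) hits a given basis."""
    p = (m - k) / N                      # single-access probability
    return 1.0 - (1.0 - p) ** b          # close to b*p when p is small

def binomial_tail(trials, q, threshold):
    """P(V > threshold) for V ~ Binomial(trials, q)."""
    first = math.floor(threshold) + 1    # smallest integer strictly above threshold
    return sum(math.comb(trials, v) * q**v * (1.0 - q) ** (trials - v)
               for v in range(first, trials + 1))

def poisson_tail(lam, threshold, terms=200):
    """P(V > threshold) for V ~ Poisson(lam), summed term by term."""
    v = math.floor(threshold) + 1
    term = math.exp(-lam) * lam**v / math.factorial(v)
    total = 0.0
    for _ in range(terms):
        total += term
        v += 1
        term *= lam / v                  # next Poisson term from the previous one
    return total

def expected_false_bases(m, n, k, N, b, alpha, num_model_bases):
    """Expected number of random bases above the threshold, for one image basis."""
    q = hit_probability(m, k, N, b)
    return num_model_bases * binomial_tail(n - k, q, alpha * m)

# Illustrative inputs; b and the basis count m*(m-1) are assumptions.
m, n, k, N, b, alpha = 12, 20, 2, 7200, 9, 0.6
q = hit_probability(m, k, N, b)
print("q =", q)
print("binomial tail =", binomial_tail(n - k, q, alpha * m))
print("poisson tail  =", poisson_tail((n - k) * q, alpha * m))
print("expected false bases =", expected_false_bases(m, n, k, N, b, alpha, m * (m - 1)))
```

Multiplying the last value by the number of image bases examined and by the number of models M gives the overall expected count, following the linear scaling stated above.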

The probability to score 0.6 m votes.
In the table one can see some typical examples of the expected number of ‘random’ bases achieving a 0.6 m vote. (The total number of bins in this case is 7,200.)
One can see that these numbers are very small.

Coordinate Error Estimation
We assume 2-D recognition under affine transformation.
We assume that the models can be acquired under ‘ideal’ circumstances (e.g. from a CAD model), hence the preprocessing step is noiseless.
In the recognition step, image coordinates of interest points are measured and are represented by 2-D vectors.
We may define a norm on this 2-D vector space. We will usually use either the Euclidean or the maximum coordinate norm.
Assume that image point measurements introduce an error of at most $\epsilon$ in the given norm.

The computation of the coordinates of an interest point in the affine basis (a,b,c) can be formulated as the solution of a linear system of 2 equations in 2 unknowns, Ax=d.
If a is the origin of the affine basis triplet, then the two columns of the matrix A are the difference vectors of the basis interest points, b-a and c-a respectively, and the free vector d is the difference between the interest point and a.
These vectors are represented in image coordinates, while the solution vector x gives the representation of the point in the affine basis (a,b,c) coordinates.
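As a small illustration of the system Ax=d, the sketch below solves it with Cramer's rule for 2-D points given as tuples. The function names and the sample affine map are hypothetical, chosen only to show that the coordinates do not change when the same affine transformation is applied to the basis and to the point.

```python
def affine_coordinates(a, b, c, p):
    """Coordinates x of point p in the affine basis (a, b, c): solve A x = d."""
    # Columns of A are b - a and c - a; the free vector d is p - a.
    a11, a21 = b[0] - a[0], b[1] - a[1]
    a12, a22 = c[0] - a[0], c[1] - a[1]
    d1, d2 = p[0] - a[0], p[1] - a[1]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("basis points are collinear")
    x1 = (d1 * a22 - a12 * d2) / det     # Cramer's rule, first unknown
    x2 = (a11 * d2 - a21 * d1) / det     # Cramer's rule, second unknown
    return (x1, x2)

def apply_affine(L, t, v):
    """Apply the affine map v -> L v + t to a 2-D point."""
    return (L[0][0] * v[0] + L[0][1] * v[1] + t[0],
            L[1][0] * v[0] + L[1][1] * v[1] + t[1])

# Sample points and an arbitrary invertible affine map (illustrative values).
a, b, c, p = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.3, 0.7)
L, t = ((2.0, 1.0), (0.5, 3.0)), (4.0, -1.0)

before = affine_coordinates(a, b, c, p)
after = affine_coordinates(*(apply_affine(L, t, v) for v in (a, b, c, p)))
print(before, after)
```

The two printed pairs agree up to rounding, which is the invariance that makes x usable as a hash-table address.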

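Taking the errors into account, our task can be formulated as the solution of the following linear system:
$$(A + \delta A)(x + \delta x) = d + \delta d,$$
where $\delta A$, $\delta x$ and $\delta d$ are the errors of the matrix A and the vectors x and d respectively.
By the nature of our point measurements, we may assume that the absolute values of the entries of the matrix $\delta A$ and the vector $\delta d$ are less than some given measurement error $\epsilon$.
Note that (e.g. Golub)
$$\frac{\|\delta x\|}{\|x\|} \le \frac{k(A)}{1 - k(A)\,\frac{\|\delta A\|}{\|A\|}} \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta d\|}{\|d\|} \right),$$
provided $k(A)\,\|\delta A\|/\|A\| < 1$, where $k(A) = \|A\|\,\|A^{-1}\|$ is the condition number of the matrix A.
The above inequality holds for any vector norm and its appropriate matrix norm.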
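A minimal sketch of how such a bound could be evaluated for one basis and one point, using the maximum coordinate norm and its induced matrix norm (maximum absolute row sum). It uses the standard perturbation bound for linear systems, k(A)(||δA||/||A|| + ||δd||/||d||) / (1 - k(A)||δA||/||A||). The function names are hypothetical, and the step from an entrywise error ε to the norms ||δA|| ≤ 2ε and ||δd|| ≤ ε is an assumption of this sketch.

```python
import math

def inf_norm(A):
    """Matrix norm induced by the maximum coordinate norm: max absolute row sum."""
    return max(abs(A[0][0]) + abs(A[0][1]), abs(A[1][0]) + abs(A[1][1]))

def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("singular matrix: basis points are collinear")
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def condition_number(A):
    """k(A) = ||A|| * ||A^-1|| in the infinity norm."""
    return inf_norm(A) * inf_norm(inverse_2x2(A))

def relative_error_bound(A, d, eps):
    """Upper bound on ||dx|| / ||x|| when entries of dA and dd are at most eps."""
    rel_A = 2.0 * eps / inf_norm(A)              # assumed: ||dA|| <= 2*eps (two entries per row)
    rel_d = eps / max(abs(d[0]), abs(d[1]))      # assumed: ||dd|| <= eps
    kappa = condition_number(A)
    denom = 1.0 - kappa * rel_A
    if denom <= 0:
        return math.inf                          # bound not applicable: noise too large
    return kappa * (rel_A + rel_d) / denom

# Basis a=(0,0), b=(10,0), c=(0,10) and point p=(5,5): columns of A are b-a, c-a.
A = [[10.0, 0.0], [0.0, 10.0]]
d = (5.0, 5.0)
print(condition_number(A), relative_error_bound(A, d, eps=0.5))
```

Because the bound contains ||δd||/||d||, a point close to the basis origin yields a larger bound than a distant one for the same matrix A.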

The above inequality gives an estimate of the maximal relative error that the image measurement noise can introduce into the coordinates of the hash-table address x.
Hence, the voting procedure reflects this noise:
For an address x, all the bins with addresses within the estimated error neighborhood of x participate in the voting.
This ensures that votes for a correct model basis are not missed due to noise.
In practice, tighter bounds usually apply.
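One way to picture voting over a neighborhood of bins is sketched below. Everything about the table layout is an assumption made for illustration: square bins of side `bin_size`, an absolute error `radius` measured in the maximum coordinate norm, and a dictionary mapping bin indices to lists of (model, basis) entries. The number of bins returned for one address plays the role of the redundancy factor.

```python
import math
from collections import Counter

def bins_in_neighborhood(x, radius, bin_size):
    """Indices of all square bins that intersect the box [x - radius, x + radius]."""
    lo0 = math.floor((x[0] - radius) / bin_size)
    hi0 = math.floor((x[0] + radius) / bin_size)
    lo1 = math.floor((x[1] - radius) / bin_size)
    hi1 = math.floor((x[1] + radius) / bin_size)
    return [(i, j) for i in range(lo0, hi0 + 1) for j in range(lo1, hi1 + 1)]

def cast_votes(table, x, radius, bin_size, votes):
    """Vote once per (model, basis) entry found in any bin consistent with x."""
    seen = set()
    for key in bins_in_neighborhood(x, radius, bin_size):
        seen.update(table.get(key, ()))
    for entry in seen:                   # one vote per image point, even if hit in several bins
        votes[entry] += 1
    return votes

# Hypothetical table with two entries; the address sits on a bin corner.
table = {(0, 0): [("model-1", 0)], (1, 1): [("model-1", 0), ("model-1", 1)]}
votes = cast_votes(table, (1.0, 1.0), 0.1, 1.0, Counter())
print(len(bins_in_neighborhood((1.0, 1.0), 0.1, 1.0)), dict(votes))
```

Counting each entry at most once per image point is a design choice of this sketch; it keeps a single noisy point from inflating the score of one basis.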

Since appropriate voting bins for each address can be evaluated in advance, we do not expect a correct basis triplet to achieve fewer votes than the corresponding number of unoccluded model points.
There still remains the possibility of a ‘random’ basis triplet achieving a large number of votes.
Such a ‘wrong’ candidate will be discovered by two verification procedures that are incorporated in the algorithm.
Although ‘wrong’ candidates will be discovered in the verification step and discarded, we would still like to show that the probability of a ‘random’ configuration getting a high vote is small.
See simulation results next.

Simulation - Affine Transformation
Under the affine transformation we expect a greater effect of noise, since the condition number of the matrix A is no longer bounded by 2.
We often have basis triplets with k(A) between 6 and 10.
Basis triplets resulting in a matrix A with a relatively big condition number represent unstable solutions, hence are not too informative.
Such bases can be eliminated from the recognition process without entering the voting procedure.

Relative error $\|\delta x\|/\|x\|$ in point coordinate for a given basis triplet
This error depends also on the distance of that point from the origin of the basis triplet (see inequality).
Hence, even if the condition number is of moderate size, $\|\delta x\|/\|x\|$ might still be relatively large.
Hence, only coordinates with $\|\delta x\|/\|x\|$ under a prescribed threshold participate in the voting procedure.
The threshold on $\|\delta x\|/\|x\|$ was taken to be 0.25, namely we have allowed x to deviate by at most 25% of its size in the given norm.
Such thresholding usually resulted in approximately 70% of all the possible coordinate values participating in the voting procedure.
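The thresholding can be mimicked with a small Monte Carlo experiment. This is a hedged sketch and not the simulation reported in the lecture: it measures the largest relative error observed over random perturbations (rather than the worst-case bound), uses the maximum coordinate norm, and all names, sample points and the noise level are hypothetical.

```python
import random

def affine_coords(a, b, c, p):
    """Coordinates of p in the affine basis (a, b, c), via Cramer's rule."""
    a11, a21 = b[0] - a[0], b[1] - a[1]
    a12, a22 = c[0] - a[0], c[1] - a[1]
    d1, d2 = p[0] - a[0], p[1] - a[1]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("basis points are collinear")
    return ((d1 * a22 - a12 * d2) / det, (a11 * d2 - a21 * d1) / det)

def max_norm(v):
    return max(abs(v[0]), abs(v[1]))

def jitter(pt, eps, rng):
    """Perturb each coordinate of pt by at most eps."""
    return (pt[0] + rng.uniform(-eps, eps), pt[1] + rng.uniform(-eps, eps))

def observed_relative_error(a, b, c, p, eps, rng, trials=200):
    """Largest ||dx|| / ||x|| seen when all four points are measured with noise."""
    x = affine_coords(a, b, c, p)
    size = max_norm(x)
    if size == 0:
        return float("inf")
    worst = 0.0
    for _ in range(trials):
        xn = affine_coords(jitter(a, eps, rng), jitter(b, eps, rng),
                           jitter(c, eps, rng), jitter(p, eps, rng))
        worst = max(worst, max_norm((xn[0] - x[0], xn[1] - x[1])) / size)
    return worst

def fraction_voting(a, b, c, points, eps, threshold=0.25, seed=0):
    """Fraction of points whose observed relative error stays within the threshold."""
    rng = random.Random(seed)
    kept = [p for p in points
            if observed_relative_error(a, b, c, p, eps, rng) <= threshold]
    return len(kept) / len(points)

basis = ((0.0, 0.0), (10.0, 0.0), (0.0, 10.0))
points = [(1.0, 1.0), (5.0, 5.0), (20.0, 3.0), (0.2, 0.1)]
print(fraction_voting(*basis, points, eps=0.5))
```

Points close to the basis origin, such as (0.2, 0.1) here, tend to fail the threshold even for a well-conditioned basis, in line with the dependence on distance noted above.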

Percentages of coordinates which were obtained in three different simulations of the recognition having ratio or less with the parameters m=12,n=20,.

Simulation results for m=12, n=20
Some representative results of the simulation experiments:
Number of votes obtained by all the image bases
Estimated probability of random basis matches
The total number of possible model-image basis pairings is
The probabilities in both columns are of the same magnitude.

Simulation results for M=1, m=15
Although the absolute number of bases with a high vote may look large, we should note that in these cases the search space is much bigger than in the case of the similarity transformation.
Thus the probability that a randomly chosen image basis obtains a high score remains very low.

Discussion
The recognition part of the Geometric Hashing technique is based on two major stages: voting and verification.
Are they both necessary?
Can the voting procedure on its own recover the correct solution only, without introducing false ‘candidates’?
- The examples that we have examined strongly suggest that the voting procedure by itself can serve as a reliable recognition technique only for the case of rigid motion (rotation and translation) and for uncomplicated scenes under the similarity transformation. It cannot be the only procedure in complicated scenes under the affine transformation.

Is the voting stage useful? Why not apply the verification stage directly to the candidate solutions?
- The voting stage is just a ‘filtering’ procedure which should eliminate a ‘big chunk’ of the false candidate solutions before the direct verification is applied.
A reliable verification procedure is usually quite tedious and time consuming, hence a big time saving can be achieved by avoiding this procedure.
Thus, we have to examine the ratio of the ‘false candidates’ emerging from the voting stage to the total number of candidate solutions which would have to be examined by direct verification.
This ratio is the ‘filtering factor’ of the voting stage.

The ‘filtering factor’ of the voting stage equals the probability that a false model basis will get a vote above the preset threshold.
The results show that the estimated ‘filtering factor’ of the Geometric Hashing voting stage is quite significant even for the more difficult affine transformation case.
Note that the error analysis assumed a worst-case error, so that no correct solution would be missed. By using a different (e.g. average-case) error model, one can increase the time saved, although the recognition might be somewhat less reliable.
Conclusion: The application of the voting procedure causes a significant reduction in the complexity of recognition.

Extensions
3-D object recognition from range data can be accomplished by similar methods using 3-point bases.
Recognition of non-flat 3-D objects from 2-D images, using the following options:
1. Approximation of the model objects by ‘almost’ planar faces and treating each such face as a model. The problem then reduces to recognition of flat 3-D objects. This method will be especially favorable for polyhedral objects; however, it will not apply to objects without a stable polyhedral approximation.

2. Discretization of the space into viewing directions. Given a viewing direction we are faced with a similarity transformation only, whose solution has a reduced complexity. However, the procedure will have to register all allowed viewing directions.
3. Looking for 4-point correspondences between the 3-D model and the 2-D image. Four non-coplanar points define a 3-D basis. Other model points can be represented by their coordinates in this basis. Assuming the affine approximation of the viewing transformation, image points will have the same linear representation by the corresponding four-point set. Note, however, that this set is not an affine 2-D basis but only a spanning set, hence the representation is not unique.

Extensions – continued
Implementation of a similar matching procedure based on a synthesis of point and line information
Affine invariant curve matching
Recognition of objects using parameterized models

Conclusions
The method is based on the representation of objects by point sets and on matching corresponding sets of points.
By applying geometric constraints, these sets of points can be further represented by a small subset of points (basis points).
The size of the basis depends on the transformation applied to the models.
A basis of 2 points is sufficient for 2-D scenes under rotation, translation and scale.
A basis of 3 points is sufficient for the affine transformation, which approximates the perspective view.
The process is divided into preprocessing and recognition, which reduces complexity and enables off-line preprocessing.
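The 2-point case can be illustrated compactly by treating 2-D points as complex numbers. The sketch below is an illustration under that representation, not the lecture's implementation; the function name and the sample transformation are hypothetical. The coordinate (p - a)/(b - a) is unchanged by rotation, translation and uniform scale applied to all three points.

```python
import cmath

def similarity_coordinates(a, b, p):
    """Coordinate of p relative to the ordered 2-point basis (a, b)."""
    if a == b:
        raise ValueError("basis points must be distinct")
    return (p - a) / (b - a)

def similarity_transform(z, scale, angle, shift):
    """Rotate by angle, scale uniformly, then translate."""
    return scale * cmath.exp(1j * angle) * z + shift

# Sample basis and point, then the same configuration after a similarity transform.
a, b, p = 0 + 0j, 1 + 0j, 0.5 + 0.5j
moved = [similarity_transform(z, 2.0, 0.7, 3 - 4j) for z in (a, b, p)]

original = similarity_coordinates(a, b, p)
transformed = similarity_coordinates(*moved)
print(original, transformed)
```

The two values agree up to rounding, so the pair of real and imaginary parts can serve as a hash-table address for the similarity case in the same way the affine coordinates do for 3-point bases.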

Recognize !
