1
SIFT Guest Lecture by Jiwon Kim
2
SIFT Features and Its Applications
3
Autostitch Demo
4
Autostitch
Fully automatic panorama generation
Input: set of images
Output: panorama(s)
Uses SIFT (Scale-Invariant Feature Transform) to find/align images
5
1. Solve for homography
6
1. Solve for homography
7
1. Solve for homography
8
2. Find connected sets of images
9
2. Find connected sets of images
10
2. Find connected sets of images
11
3. Solve for camera parameters
New images initialised with rotation, focal length of best matching image
12
3. Solve for camera parameters
New images initialised with rotation, focal length of best matching image
13
4. Blending the panorama
Burt & Adelson 1983
Blend frequency bands over range ∝ λ
14
2-band Blending
Low frequency (λ > 2 pixels)
High frequency (λ < 2 pixels)
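A rough one-dimensional sketch of the two-band idea follows. It assumes the low band is a box blur of each signal, the low bands are mixed with smooth weights, and the high band is taken from whichever signal has the larger weight. The function names, the box filter, and the `radius` parameter are illustrative assumptions, not Autostitch's actual implementation.

```python
def box_blur(signal, radius):
    """Low-pass a 1-D signal with a simple moving average (assumed filter)."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def two_band_blend(a, b, weight_a, radius=2):
    """Blend two aligned 1-D signals in two frequency bands.

    weight_a[i] in [0, 1] is the blend weight of signal a at sample i.
    Low frequencies are mixed linearly; high frequencies are copied from
    the signal with the larger weight, which avoids blurring fine detail.
    """
    low_a, low_b = box_blur(a, radius), box_blur(b, radius)
    blended = []
    for i in range(len(a)):
        high_a, high_b = a[i] - low_a[i], b[i] - low_b[i]
        w = weight_a[i]
        low = w * low_a[i] + (1 - w) * low_b[i]
        high = high_a if w >= 0.5 else high_b
        blended.append(low + high)
    return blended


# Sanity check: blending a signal with itself should return it unchanged.
a = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
weights = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
same = two_band_blend(a, a, weights)
```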
15
Linear Blending
16
2-band Blending
17
So, what is SIFT?
Scale-Invariant Feature Transform
David Lowe at UBC
Scale/rotation invariant
Currently best known feature descriptor
Many real-world applications
  Object recognition
  Panorama stitching
  Robot localization
  Video indexing
  …
18
Example: object recognition
19
SIFT properties
Locality: features are local, so robust to occlusion and clutter
Distinctiveness: individual features can be matched to a large database of objects
Quantity: many features can be generated for even small objects
Efficiency: close to real-time performance
20
SIFT algorithm overview
Feature detection
  Detect points that can be repeatably selected under location/scale change
Feature description
  Assign orientation to detected feature points
  Construct a descriptor for image patch around each feature point
Feature matching
21
1. Feature detection
Detect points stable under location/scale change
Build continuous space (x, y, scale)
Approximated by multi-scale Difference-of-Gaussian pyramid
Select maxima/minima in (x, y, scale)
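The maxima/minima selection can be sketched as a comparison of each sample with its 26 neighbours in (x, y, scale). This is a minimal illustration that assumes the Difference-of-Gaussian stack has already been computed and is stored as a nested list `dog[s][y][x]`; the function name `find_extrema` is hypothetical.

```python
def find_extrema(dog):
    """Return (s, y, x) positions that are strict maxima or minima
    among their 26 neighbours in a DoG stack indexed as dog[s][y][x]."""
    extrema = []
    n_scales, height, width = len(dog), len(dog[0]), len(dog[0][0])
    for s in range(1, n_scales - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                centre = dog[s][y][x]
                neighbours = [
                    dog[s + ds][y + dy][x + dx]
                    for ds in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (ds, dy, dx) != (0, 0, 0)
                ]
                if centre > max(neighbours) or centre < min(neighbours):
                    extrema.append((s, y, x))
    return extrema


# Toy stack: three 3x3 layers, all zeros except a peak in the middle.
dog = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 1.0
peaks = find_extrema(dog)
```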
22
1. Feature detection
23
1. Feature detection
Localize extrema by fitting a quadratic
Sub-pixel/sub-scale interpolation using Taylor expansion
Take derivative and set to zero
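The slide's equations did not survive extraction. For reference, the quadratic fit as given in Lowe (2004) is shown below, where D is the Difference-of-Gaussian function and its derivatives are evaluated at the sample point. The offset vector x = (x, y, σ)ᵀ is measured from that sample point.

```latex
\[
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x}
  + \frac{1}{2}\,\mathbf{x}^{T}\,\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\,\mathbf{x},
\qquad
\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}}
\]
```

Setting the derivative of the expansion to zero gives the interpolated extremum location x̂.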
24
1. Feature detection
Discard low-contrast/edge points
Low contrast: discard keypoints whose interpolated DoG value |D| is below a threshold
Edge points: high contrast in one direction, low in the other
  Compute principal curvatures from eigenvalues of 2x2 Hessian matrix, and limit ratio
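The curvature-ratio limit can be checked without computing eigenvalues, using the trace and determinant of the Hessian as in Lowe (2004). The sketch below assumes the second derivatives at the keypoint are already available; the function name is hypothetical and `r = 10` is the value from Lowe's paper, not from this lecture.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if the ratio of principal curvatures is below r.

    dxx, dyy, dxy are second derivatives of the DoG image at the keypoint
    (the entries of the 2x2 Hessian). Uses trace^2 / det < (r + 1)^2 / r.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have different signs: reject the point.
        return False
    return trace * trace / det < (r + 1) ** 2 / r


blob_like = passes_edge_test(dxx=-2.0, dyy=-1.8, dxy=0.1)  # similar curvatures
edge_like = passes_edge_test(dxx=-5.0, dyy=-0.1, dxy=0.0)  # one dominant direction
```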
25
1. Feature detection
Example
(a) 233x189 image
(b) 832 DOG extrema
(c) 729 left after peak value threshold
(d) 536 left after testing ratio of principal curvatures
26
2. Feature description
Assign orientation to keypoints
Create histogram of local gradient directions computed at selected scale
Assign canonical orientation at peak of smoothed histogram
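A minimal sketch of the orientation assignment follows. It assumes a small grayscale patch given as a list of rows, central-difference gradients, magnitude-weighted votes, and a circular [1, 2, 1]/4 smoothing kernel. The 36-bin histogram follows Lowe (2004); the Gaussian weighting used there is omitted, and the function name is hypothetical.

```python
import math


def dominant_orientation(patch, n_bins=36):
    """Estimate the canonical orientation (degrees) of a grayscale patch."""
    hist = [0.0] * n_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle / (360.0 / n_bins)) % n_bins] += magnitude
    # Circular smoothing; hist[i - 1] wraps around for i == 0.
    smoothed = [
        (hist[i - 1] + 2 * hist[i] + hist[(i + 1) % n_bins]) / 4.0
        for i in range(n_bins)
    ]
    peak = max(range(n_bins), key=lambda i: smoothed[i])
    return (peak + 0.5) * 360.0 / n_bins  # centre of the winning bin


# Intensity increases left to right, so gradients point along +x.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
theta = dominant_orientation(ramp)
```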
27
2. Feature description
Construct SIFT descriptor
Create array of orientation histograms
8 orientations x 4x4 histogram array = 128 dimensions
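The 4x4x8 layout can be illustrated with a simplified descriptor. The sketch assumes an 18x18 patch (16x16 samples plus a one-pixel border for central differences) that has already been rotated to the canonical orientation. It leaves out the Gaussian weighting, interpolation between bins, and value clamping described in Lowe (2004), and the function name is hypothetical.

```python
import math


def sift_like_descriptor(patch, grid=4, cell=4, n_orient=8):
    """Build a grid*grid*n_orient (default 128) element descriptor."""
    size = grid * cell  # 16 samples per side
    desc = [0.0] * (grid * grid * n_orient)
    for y in range(size):
        for x in range(size):
            # Sample (y, x) sits at patch[y + 1][x + 1] because of the border.
            gx = patch[y + 1][x + 2] - patch[y + 1][x]
            gy = patch[y + 2][x + 1] - patch[y][x + 1]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * n_orient) % n_orient
            cell_index = (y // cell) * grid + (x // cell)
            desc[cell_index * n_orient + o] += magnitude
    # Normalise to unit length to reduce sensitivity to contrast changes.
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


patch = [[float(x) for x in range(18)] for _ in range(18)]
desc = sift_like_descriptor(patch)
```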
28
2. Feature description
Advantage over simple correlation
Gradients less sensitive to illumination change
Gradients may shift: robust to deformation, viewpoint change
29
Performance: stability to noise
Match features after random change in image scale & orientation, with differing levels of image noise
Find nearest neighbor in database of 30,000 features
30
Performance: stability to affine change
Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
Find nearest neighbor in database of 30,000 features
31
Performance: distinctiveness
Vary size of database of features, with 30 degree affine change, 2% image noise
Measure % correct for single nearest neighbor match
32
3. Feature matching
For each feature in A, find nearest neighbor in B
33
3. Feature matching
Nearest neighbor search too slow for large database of 128-dimensional data
Approximate nearest neighbor search:
  Best-bin-first [Beis et al. 97]: modification to k-d tree algorithm
  Use heap data structure to identify bins in order by their distance from query point
Result: Can give speedup by factor of 1000 while finding nearest neighbor (of interest) 95% of the time
34
3. Feature matching
Reject false matches
Compare distance of nearest neighbor to second nearest neighbor
Common features aren’t distinctive, therefore bad
Distance-ratio threshold of 0.8 provides excellent separation
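The ratio test is short enough to sketch directly. The version below uses brute-force nearest-neighbour search over small tuples rather than 128-dimensional descriptors or best-bin-first search. The function name and toy data are illustrative assumptions.

```python
import math


def ratio_test_matches(features_a, features_b, ratio=0.8):
    """Return (i, j) pairs where feature i in A matches feature j in B and
    the nearest neighbour is clearly closer than the second nearest."""
    matches = []
    for i, fa in enumerate(features_a):
        ranked = sorted((math.dist(fa, fb), j) for j, fb in enumerate(features_b))
        if len(ranked) < 2:
            continue
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


features_a = [(0.0, 0.0), (5.0, 5.0)]
features_b = [(0.1, 0.0), (10.0, 10.0), (4.0, 5.0), (6.0, 5.0)]
# The first feature has one clear neighbour; the second is equally close to
# two candidates, so it is ambiguous and gets rejected.
matches = ratio_test_matches(features_a, features_b)
```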
35
3. Feature matching
Now, given feature matches…
Find an object in the scene
Solve for homography (panorama)
…
36
3. Feature matching Example: 3D object recognition
37
3. Feature matching
3D object recognition
Assume affine transform: clusters of size >=3
Looking for 3 matches out of 3000 that agree on same object and pose: too many outliers for RANSAC or LMS
Use Hough Transform
  Each match votes for a hypothesis for object ID/pose
  Voting for multiple bins & large bin size allow for error due to similarity approximation
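The voting step can be sketched with a dictionary of pose bins. This simplified version assumes each match has already been converted to an implied pose (object ID, rotation, scale ratio, translation) and casts a single vote per match. Voting for multiple neighbouring bins, as mentioned on the slide, is left out. The bin widths and function name are illustrative assumptions.

```python
import math
from collections import defaultdict


def hough_vote(matches, angle_bin=30.0, loc_bin=32.0, min_votes=3):
    """Group matches by coarse pose bin and keep bins with enough votes.

    Each match is (object_id, d_theta_deg, scale_ratio, dx, dy).
    Scale is binned by factors of 2.
    """
    bins = defaultdict(list)
    for idx, (obj, d_theta, scale, dx, dy) in enumerate(matches):
        key = (
            obj,
            int((d_theta % 360.0) // angle_bin),
            math.floor(math.log2(scale)),
            math.floor(dx / loc_bin),
            math.floor(dy / loc_bin),
        )
        bins[key].append(idx)
    return {k: v for k, v in bins.items() if len(v) >= min_votes}


matches = [
    ("cup", 10, 1.1, 5, 6),
    ("cup", 12, 1.2, 7, 3),
    ("cup", 15, 1.05, 9, 9),
    ("cup", 200, 3.0, 100, 40),  # outlier pose
    ("book", 10, 1.1, 5, 6),     # lone vote for another object
]
clusters = hough_vote(matches)
```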
38
3. Feature matching
3D object recognition: solve for pose
Affine transform of [x,y] to [u,v]:
Rewrite to solve for transform parameters:
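The two equations on this slide were lost in extraction. The form below is the one given in Lowe (2004), which the slide appears to follow. The symbols m1..m4 (rotation, scale and stretch) and tx, ty (translation) are taken from that paper.

```latex
\[
\begin{bmatrix} u \\ v \end{bmatrix}
=
\begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
\]
\[
\begin{bmatrix}
x & y & 0 & 0 & 1 & 0 \\
0 & 0 & x & y & 0 & 1 \\
  &   & \vdots &  &  &
\end{bmatrix}
\begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix}
=
\begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}
\]
```

Each match contributes two rows, so three matches give six equations for the six unknowns. Writing the system as Ap = b, additional matches are handled by the least-squares solution p = (AᵀA)⁻¹Aᵀb.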
39
3. Feature matching
3D object recognition: verify model
Discard outliers for pose solution in prev step
Perform top-down check for additional features
Evaluate probability that match is correct
  Use Bayesian model, with probability that features would arise by chance if object was not present
  Takes account of object size in image, textured regions, model feature count in database, accuracy of fit [Lowe 01]
40
Planar recognition Training images
41
Planar recognition
Reliably recognized at a rotation of 60° away from the camera
Affine fit approximates perspective projection
Only 3 points are needed for recognition
42
3D object recognition Training images
43
3D object recognition
Only 3 keys are needed for recognition, so extra keys provide robustness
Affine model is no longer as accurate
44
Recognition under occlusion
45
Illumination invariance
46
Applications of SIFT
Object recognition
Panoramic image stitching
Robot localization
Video indexing
…
The Office of the Past
  Document tracking and recognition
47
Location recognition
48
Robot Localization
49
Map continuously built over time
50
Locations of map features in 3D
51
Sony Aibo
SIFT usage:
  Recognize charging station
  Communicate with visual cards
  Teach object recognition
52
The Office of the Past Paper everywhere
53
Unify physical and electronic desktops
Recognize video of paper on physical desktop
  Tracking
  Recognition
  Linking
[Figure: video camera, desktop]
54
Unify physical and electronic desktops
Applications
  Find lost documents
  Browse remote desktop
  Find electronic version
  History-based queries
[Figure: video camera, desktop]
55
Example input video
56
Demo – Remote desktop
57
System overview
[Figure: video camera, computer, user, desk]
Here is an overview of our system. In the setup, a video camera is mounted above the desk looking straight down to record the desktop.
58
System overview
[Figure: video of desk]
Given the video of the physical desktop,
59
System overview
[Figure: video of desk, images from PDF]
…and images of corresponding electronic documents extracted from PDFs,
60
System overview
[Figure: video of desk, images from PDF, track & recognize]
…the system tracks and recognizes the paper documents by matching between the two,
61
System overview
[Figure: video of desk, images from PDF, track & recognize, internal representation of the desk at times T and T+1]
…and produces an internal graphical representation that encodes the evolution of the stack structure over time.
62
System overview
[Figure: as before, with the internal representation labeled "Scene Graph"]
We call each of these graphs a “scene graph”.
63
System overview
[Figure: as before, with the query “Where is my W-2?”]
Then, when the user issues a query, such as “Where is my W-2 form?”,
64
System overview
[Figure: as before, with the query “Where is my W-2?” and the answer]
…the system answers the query by consulting the scene graphs.
65
Assumptions
Document
  Corresponding electronic copy exists
  No duplicates of same document
We make a number of assumptions to simplify the tracking & recognition problem. First, we assume that each paper document has a corresponding electronic copy on the computer, and also that there are no duplicate copies of the same document; in other words, each document is unique.
66
Assumptions
Document
  Corresponding electronic copy exists
  No duplicates of same document
Motion
  3 event types: move/entry/exit
  One document at a time
  Only topmost document can move
A number of other assumptions are made to constrain the motion of the documents. For instance, we assume that there are 3 types of events, move/entry/exit, and only one document on top of a stack can move at a time. Although these assumptions do limit the capability of our system to handle more realistic situations, they were carefully chosen to make the problem tractable while still allowing interesting applications, as we will demonstrate later in the talk.
67
Non-assumptions
Desk need not be initially empty
Also note that there are certain assumptions we don’t make. For instance, we don’t require the desk to be initially empty. The desk is allowed to start with unknown papers on it, and our system automatically discovers the documents as observations accumulate over time.
68
Non-assumptions
Desk need not be initially empty
Stacks may overlap
Also, the paper stacks are allowed to overlap with each other, forming a complex graph structure, rather than cleanly separated stacks.
69
Algorithm overview
Input frames
Here is a step-by-step overview of the tracking & recognition algorithm. Given the input sequence,
70
Algorithm overview
Input frames
Event Detection: before, after
71
Algorithm overview
Input frames
Event Detection: before, after
Event Interpretation: “A document moved from (x1,y1) to (x2,y2)”
72
Algorithm overview
Input frames
Event Detection: before, after
Event Interpretation: “A document moved from (x1,y1) to (x2,y2)”
Document Recognition: File1.pdf, File2.pdf, File3.pdf
73
Algorithm overview
Input frames
Event Detection: before, after
Event Interpretation: “A document moved from (x1,y1) to (x2,y2)”
Document Recognition: File1.pdf, File2.pdf, File3.pdf
Scene Graph Update
74
Algorithm overview
Input frames
Event Detection: before, after
Event Interpretation: “A document moved from (x1,y1) to (x2,y2)”
SIFT
Document Recognition: File1.pdf, File2.pdf, File3.pdf
Scene Graph Update
Lastly, we update the scene graph according to the event. The above 4 steps are repeated for each event in the input sequence. Now, I’ll explain each step of the algorithm.
75
Document tracking example
[Figure: desk before and after the event]
Here’s an example of a move event,
76
Document tracking example
…where this top-left document
77
Document tracking example
…moves to the right.
78
Document tracking example
To classify the event, we first extract image features in both images,
79
Document tracking example
…and match them between the two images.
80
Document tracking example
We identify features that have no match, shown in green,
81
Document tracking example
…and discard them.
82
Document tracking example
Next we cluster matching pairs of features according to their relative transformation. Red features moved under the same transformation, while blue ones stayed where they are.
83
Document tracking example
We look at the red cluster, and if it contains sufficiently many features, the event is considered a move. Otherwise it’s a non-move and subjected to further classification.
84
Document tracking example
Motion: (x,y,θ)
If it’s a move, we obtain the motion from the transformation of the red cluster.
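The clustering and move test can be sketched as follows. This simplified version clusters matched feature pairs by translation only and ignores the rotation θ. The bin size, the `min_moved` threshold, and the function name are illustrative assumptions, not values from the talk.

```python
from collections import Counter


def classify_event(pairs, bin_size=10.0, min_moved=4):
    """Classify an event from matched feature positions.

    pairs is a list of ((x0, y0), (x1, y1)): a feature's position before
    and after the event. Returns ("move", (dx, dy)) or ("non-move", None).
    """
    def bin_of(pair):
        (x0, y0), (x1, y1) = pair
        return (round((x1 - x0) / bin_size), round((y1 - y0) / bin_size))

    votes = Counter(bin_of(p) for p in pairs)
    votes.pop((0, 0), None)  # features that stayed where they were
    if not votes:
        return "non-move", None
    best_bin, count = votes.most_common(1)[0]
    if count < min_moved:
        return "non-move", None
    # Average the displacement of the features in the winning cluster.
    moved = [p for p in pairs if bin_of(p) == best_bin]
    mean_dx = sum(x1 - x0 for (x0, _), (x1, _) in moved) / len(moved)
    mean_dy = sum(y1 - y0 for (_, y0), (_, y1) in moved) / len(moved)
    return "move", (mean_dx, mean_dy)


static = [((i, i), (i, i)) for i in range(5)]
moving = [((10 + i, 20), (110 + i, 22)) for i in range(6)]
kind, motion = classify_event(static + moving)
```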
85
Document Recognition
Match against PDF image database
[Figure: File1.pdf, File2.pdf, File3.pdf, File4.pdf, File5.pdf, File6.pdf, …]
…where we match features in the region identified as the document against a database of PDF images stored on the computer, also using SIFT features.
86
Document Recognition
Performance analysis
Tested 20 pages against database of 162 pages
We tested the performance of our recognition method by testing 20 pages against a database of 162 pages of documents, both of which were mostly from computer science research papers, and the method was able to correctly differentiate and recognize all of them.
87
Document Recognition
Performance analysis
Tested 20 pages against database of 162 pages
~200x300 pixels per document for reliable match
[Plot: recognition rate vs. document resolution]
We also tested the performance with varying document image resolutions. In this graph, the X axis shows the length of the longer side of the document in pixels, and the Y axis shows the success rate of recognition.
88
Document Recognition
Performance analysis
Tested 20 pages against database of 162 pages
~200x300 pixels per document for reliable match
[Plot: recognition rate vs. document resolution, with a recognition rate of 0.9 marked at 300 pixels]
We found that to achieve a recognition rate of 90% the documents must be at least 200 by 300 pixels large. Note that this resolution is not high enough for recognizing text using techniques such as OCR, but is still good enough for reliable recognition of individual documents.
89
Results
Input video
  ~40 minutes
  1024x768 @ 15 fps
  22 documents, 49 events
Running time
  Video processed offline
  No optimization
  A few hours for entire video
Before showing a demo of our system, let me provide some statistics on the input data and video processing. The input video was recorded over a period of 40 minutes, at 1024x768 resolution and 15 frames per second. It contained 22 documents on the desk, with 49 events. The input video was analyzed offline, that is, after the recording was over. We did not optimize the performance at all, and it took a few hours to process the entire input sequence.
90
Demo – Paper tracking
Let me show a demo of the query interface to our system, using the same input sequence I demoed at the beginning of the talk. The right window is the visualization panel showing the current state of the desktop. The left window shows a list of thumbnails of the documents found by the system. The user can browse this list and click on the thumbnail of the document of interest to query its location in the stack. The visualization expands the stack that contains the selected document and highlights the document. The user can open the PDF file of the selected document as well.
The interface also supports a couple of alternative ways to specify a document. The user can locate a document by doing a keyword search for the title or the author. Here I’m looking for the document that contains the string “digitaldesk” in its title. The system tells me the paper is in this stack.
The user can also sort the thumbnails in various ways. For example, the documents can be sorted in decreasing order of the last time the user accessed each document. The oldest document at the end of this list lies at the bottom of this stack; the second oldest document no longer exists on the desk; and the next oldest document is at the bottom of this stack, and so forth. On the other hand, the most recent document at the beginning of this list is on top of this stack; the next most recent document is on top of this stack, and so forth.
91
Photo sorting example
Here’s an example of using our system for sorting digital photographs. Sorting a large number of digital photographs using the computer interface is usually a fairly tedious task.
92
Photo sorting example
In contrast, it is very easy to sort printed photographs into physical stacks. So we printed out digital photographs on sheets of paper, and recorded the user sorting them into physical stacks on the desk. Here we sort the photographs from two source stacks, one shown on the bottom right of the video, and the other outside the camera view in the user's hand, into three target stacks based on the content of the pictures.
93
Demo – Photo sorting
After processing this video with our system, we can click on each of the three stacks in the query interface, and assign it to an appropriate folder on the computer. Then our system automatically organizes the corresponding digital photographs into the designated folder, and pops up the folder in thumbnail view. I should point out that one clear drawback is the overhead of first having to print out the photographs on paper. However, we think that this can be useful for people who are not familiar with computer interfaces.
94
Future work
Enhance realism
  Handle more realistic desktops
  Real-time performance
More applications
  Support other document tasks
    E.g., attach reminder, cluster documents
  Beyond documents
    Other 3D desktop objects, books/CDs
95
Summary
SIFT is:
  Scale/rotation invariant local feature
  Highly distinctive
  Robust to occlusion, illumination change, 3D viewpoint change
  Efficient (real-time performance)
Suitable for many useful applications
96
References
Distinctive image features from scale-invariant keypoints. David G. Lowe, International Journal of Computer Vision, 60, 2 (2004).
Recognising panoramas. Matthew Brown and David G. Lowe, International Conference on Computer Vision (ICCV 2003), Nice, France (October 2003).
Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops. Jiwon Kim, Steven M. Seitz and Maneesh Agrawala, ACM Symposium on User Interface Software and Technology (UIST 2004).