Superresolution of Texts from Nonideal Video Xin Li Lane Dept. of CSEE West Virginia University Morgantown, WV 26506-6109 This work is partially supported by NASA WV EPSCoR Award 2005-2006
Outline Introduction What is SR? Why SR? How to achieve SR? A general framework for SR: registration + restoration Understand the boundary of formulating SR as an inverse problem SR of texts from nonideal video Problem statement: why texts and nonideal video? Analyze error accumulation in multiframe registration Address the issue of quality/PSF consistency in restoration Experimental Results Conclusions
Image Resolution [Figure: image sensor of width W and height H, after Gonzalez, “Digital Image Processing”] Chip size: field of view, H×W Pixel size: sampling distance
Why Higher Resolution? Improved objective fidelity Natural scenes are seldom band-limited Higher resolution implies smaller representation errors Improved subjective quality Attention enhances spatial resolution Spatial resolution enhances attention? Improved mensuration/recognition Law enforcement, forensics/biometrics: face recognition grand challenge (FRGC), iris recognition, vehicle license plate recognition
Towards Gigapixel: Artistic Approach Mega-pel Giga-pel Photographers and artists have manually or semi-automatically stitched hundreds of mega-pel pictures together to demonstrate what a giga-pel picture looks like: the power of pixels http://triton.tpd.tno.nl/gigazoom/Delft2.htm
Scientific Solutions Sensor-based Reduce pixel size: limit of 0.40 μm² for a 0.35 μm CMOS process Increase chip size: ineffective due to increased capacitance (which makes it hard to speed up the charge transfer rate) Computational (Super-resolution) Exploit the tradeoff between space and time: obtain an HR image from multiple LR copies Physical principles of imaging play the fundamental role in defining the relationship between LR and HR Hybrid: the convergence of the camera and the computer Computational cameras: catadioptric camera, jitter camera (Ben-Ezra, Zomet and Nayar)
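The space-time tradeoff can be illustrated in its most idealized form. The sketch below assumes no blur, no noise, and exactly known integer offsets on the HR grid, none of which hold for real video; the function names `decimate` and `interleave` are illustrative and do not come from the talk.

```python
import numpy as np

def decimate(hr, factor, dy, dx):
    """Simulate one LR frame: keep every `factor`-th HR pixel,
    starting at offset (dy, dx) on the HR grid."""
    return hr[dy::factor, dx::factor]

def interleave(lr_frames, factor):
    """Place each LR frame back at its known offset on the HR grid.
    `lr_frames` maps (dy, dx) offsets to LR images of equal shape."""
    h, w = next(iter(lr_frames.values())).shape
    hr = np.full((h * factor, w * factor), np.nan)  # NaN marks unfilled pixels
    for (dy, dx), lr in lr_frames.items():
        hr[dy::factor, dx::factor] = lr
    return hr

# Four LR frames with distinct offsets cover every pixel of a 2x HR grid.
rng = np.random.default_rng(0)
hr_true = rng.random((8, 8))
frames = {(dy, dx): decimate(hr_true, 2, dy, dx)
          for dy in range(2) for dx in range(2)}
hr_rec = interleave(frames, 2)
```

With fewer than four frames, or with repeated offsets, some HR pixels stay unfilled (NaN here), which is one simple way to see why the achievable resolution depends on how much data is available.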
SR: A General Framework S.C. Park et al., “Super-resolution image reconstruction: a technical overview”, IEEE Signal Processing Magazine, pp. 21-36, May 2003 SR can be formulated as an inverse problem, assuming a mathematical model linking LR to HR images is known
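For reference, the observation model commonly used in this inverse-problem formulation, as given in the overview by Park et al., can be written as follows. The symbols are those of the overview paper rather than of these slides: x is the HR image in vector form, y_k the k-th of p LR observations, M_k the warp (motion) matrix, B_k the blur matrix, D the subsampling matrix, and n_k additive noise.

```latex
\[
  \mathbf{y}_k = D\,B_k\,M_k\,\mathbf{x} + \mathbf{n}_k,
  \qquad k = 1, \dots, p
\]
```

Registration amounts to estimating the warps M_k, and restoration amounts to inverting the blur B_k and the subsampling D given those warps.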
SR: At the Intersection of SP and CV Registration problem Translational models Subpixel accuracy phase correlation (Foroosh, Zerubia and Berthod’1996) Subspace methods in the frequency domain (Vandewalle, Sbaiz, Süsstrunk and Vetterli) Projective models or planar homography (Capel and Zisserman’2003) Images of a planar surface under arbitrary camera motion or images of a scene under fixed camera Restoration problem Model-based: regularized deblurring, robust SR (Farsiu, Elad and Milanfar’2004) Learning-based: exemplar-based SR (Freeman, Jones and Pasztor’2002), video epitome (Cheung, Frey and Jojic’2005)
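As a concrete illustration of the translational model, here is a minimal sketch of classical phase correlation. It recovers only integer-pixel circular shifts and is not the subpixel extension of Foroosh et al. cited above; the function name `phase_correlation_shift` is hypothetical.

```python
import numpy as np

def phase_correlation_shift(ref, moved):
    """Estimate the integer circular shift (dy, dx) such that
    moved is approximately np.roll(ref, (dy, dx), axis=(0, 1))."""
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(moved)
    cross = F_mov * np.conj(F_ref)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep only the phase
    corr = np.real(np.fft.ifft2(cross))         # ideally a single peak
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices above half the image size correspond to negative shifts.
    return tuple(int(p if p <= n // 2 else p - n)
                 for p, n in zip(peak, corr.shape))

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moved = np.roll(ref, (3, -5), axis=(0, 1))
shift = phase_correlation_shift(ref, moved)
```

Subpixel methods refine this by modelling the shape of the correlation peak (or the phase plane in the frequency domain) rather than taking the location of the maximum.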
Understand the Boundary of SR as an Inverse Problem Limited modeling capability Fixed enhancement ratio specified by the down-sampling operation We formulate scalable (progressive) SR: as more data become available, higher resolution can be achieved Inevitable approximation when warping gets complex We advocate a forward approach based on nonuniform interpolation in the case of arbitrary camera motion Sensor PSF is often unknown and time-varying We propose to adaptively select a subset of LR images
Outline Introduction What is SR? Why SR? How to achieve SR? A general framework for SR: registration + restoration Understand the boundary of formulating SR as an inverse problem SR of texts from nonideal video Problem statement: why texts and nonideal video? Analysis of error accumulation in multiframe registration Issue of phase/PSF consistency in restoration : NOT all LR images are useful Experimental Results Conclusions
SR-of-Texts from Nonideal Video [Figure: LR video frames, SR, HR image of license plate] Problem Statement Given a video clip containing text that is illegible due to limited resolution, how can we produce an HR image in which the text becomes clearly readable (by a human)?
Defining the Boundary of the Problem Why texts? Texts represent an important class of visual information (e.g., law enforcement applications) Relatively easy assessment of SR results by human observers Texts are often printed on a planar surface, which facilitates the registration What do we mean by nonideal video? Uncontrolled real-world acquisition conditions: handheld camera (arbitrary camera motion), unfavorable illumination, unknown PSF, inevitable compression artifacts, and so on
Our Practical Approach Consistency-guided Preprocessing Not all LR images are used in our SR scheme Homography-based Registration Accuracy is guaranteed by planar surface assumption Nonuniform Interpolation Search for an appropriate magnifying ratio and phase Diffusion-aided Blind Deconvolution Tailored for bimodal textual images
LR Image Consistency Quality consistency PSF consistency Human vision helps the selection of consistent LR images
Homography-based Multiframe Registration Sequential: image 1, image 2, …, image K registered in a chain Parallel: image 2, …, image K each registered to image 1 Each registration step is a homography matrix Mosaicing: slightly-overlapped FOV, sequential Superresolution: severely-overlapped FOV, parallel
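The difference between the two strategies can be illustrated with a small simulation. This is a sketch under stated assumptions, not the registration used in the talk: estimation error is modelled as i.i.d. Gaussian noise of equal size on every estimated homography, the true motion is a constant small translation in normalized coordinates, and all names (`apply_h`, `noisy_estimate`, `mean_errors`) are hypothetical.

```python
import numpy as np

def apply_h(H, pts):
    """Map N x 2 points through a 3 x 3 homography."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def noisy_estimate(H_true, rng, sigma):
    """Stand-in for a real estimator: the true homography plus noise."""
    H = H_true + sigma * rng.standard_normal((3, 3))
    return H / H[2, 2]

def mean_errors(K, sigma=1e-3, trials=200, seed=0):
    """Mean corner error of the image-1-to-image-K mapping for
    sequential (chained) and parallel (direct) registration."""
    rng = np.random.default_rng(seed)
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    step = np.array([[1.0, 0.0, 0.02], [0.0, 1.0, 0.01], [0.0, 0.0, 1.0]])
    H_true = np.linalg.matrix_power(step, K - 1)
    target = apply_h(H_true, pts)
    seq_err = par_err = 0.0
    for _ in range(trials):
        H_seq = np.eye(3)
        for _ in range(K - 1):                      # chain K-1 noisy estimates
            H_seq = noisy_estimate(step, rng, sigma) @ H_seq
        H_par = noisy_estimate(H_true, rng, sigma)  # one direct estimate
        seq_err += np.linalg.norm(apply_h(H_seq, pts) - target, axis=1).mean()
        par_err += np.linalg.norm(apply_h(H_par, pts) - target, axis=1).mean()
    return seq_err / trials, par_err / trials

seq4, par4 = mean_errors(4)
seq8, par8 = mean_errors(8)
```

Under this noise model the sequential error grows with the number of chained estimates while the parallel error does not, which is why direct registration to a reference frame is attractive when all frames overlap heavily.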
Nonuniform Interpolation [Figure labels: phase of HR lattice, distance of HR lattice] Data grid: fused data points from registered LR images Lattice: targeted data points at HR Target HR lattice: min d(data grid, lattice) over two parameters: distance and phase
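The lattice search can be sketched in one dimension. The slide does not define the distance d, so the sketch assumes one plausible choice: the mean distance from each lattice point to its nearest fused data point. The exhaustive search over a fixed candidate list and the names `lattice_cost` and `fit_lattice` are also illustrative assumptions.

```python
import numpy as np

def lattice_cost(data, spacing, phase, extent):
    """Assumed d(data grid, lattice): mean distance from each lattice
    point to the nearest fused data point."""
    lattice = np.arange(phase, extent, spacing)
    nearest = np.abs(lattice[:, None] - data[None, :]).min(axis=1)
    return nearest.mean()

def fit_lattice(data, spacings, extent, n_phases=20):
    """Exhaustive search over candidate spacings (the 'distance'
    parameter) and phases; returns (cost, spacing, phase)."""
    best = None
    for s in spacings:
        for p in np.linspace(0.0, s, n_phases, endpoint=False):
            c = lattice_cost(data, s, p, extent)
            if best is None or c < best[0]:
                best = (c, s, p)
    return best

# Four registered LR frames with unit sample spacing and subpixel offsets.
offsets = [0.1, 0.35, 0.6, 0.85]
data = np.sort(np.concatenate([np.arange(8) + o for o in offsets]))
cost, spacing, phase = fit_lattice(data, [0.2, 0.25, 0.3], extent=8.0)
```

The candidate spacings must be restricted to the magnification range of interest: a coarser lattice has fewer points to match and can trivially reach a low cost.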
Experimental Results (I): SR Comparison on Benchmark Data Input: 20 LR images [Figure panels: UCSC-SR vs. ours, before deblurring and after deblurring] Thanks to Prof. Milanfar for providing us the UCSC-SR software
Experimental Results (II): SR Results Comparison on Nonideal Video Input: 4 LR images UCSC-SR Ours
Experimental Results (II): SR Results Comparison on Nonideal Video Input: 4 LR images Ours UCSC-SR After deblurring
Experimental Results (III): Impact of Error Accumulation [Figure panels: K=4, parallel vs. sequential; K=8, parallel vs. sequential] Error accumulation in sequential registration degrades image quality when K is large
Conclusions and Perspectives SR of texts from nonideal video A class of SR problems whose boundary can be well defined An example supporting a practical, forward approach towards SR To better understand SR techniques, we need to look at the problem from a perceptual perspective New applications such as video compression, distributed coding, iris recognition, and biomedical imaging will help us define the boundary of SR Spatial vs. temporal SR: fundamental space-time tradeoff