


1 Historic Document Image De-Noising using Principal Component Analysis (PCA) and Local Pixel Grouping (LPG)
Han-Yang Tang¹, Azah Kamilah Muda¹, Yun-Huoy Choo¹, Mohd Sanusi Azmi
¹Computational Intelligence and Technologies Lab (C.I.T Lab), Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka (UTeM), Durian Tunggal, Melaka, Malaysia

2 Historic document image de-noising using LPG-PCA
We propose a two-stage LPG-PCA de-noising method for degraded historic document images. The first stage produces an initial estimate of the clean image by removing most of the noise. The second stage further refines the output of the first stage.
Figure 1. Two-stage LPG-PCA de-noising scheme

3 Principal Component Analysis (PCA)
PCA is commonly used as a dimensionality reduction technique that highlights the dominant parts of the data while reducing noise. The original dataset is transformed into the PCA domain and only the few most significant principal components are preserved, so that noise and unimportant information are removed. The advantages of PCA are:
1. Reduced complexity when grouping image data
2. Noise reduction, since only the most significant components are kept and unimportant background information is discarded automatically
3. Modest computational cost
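The idea of keeping only the most significant principal components can be illustrated with a minimal NumPy sketch. The function name pca_truncate and the synthetic rank-1 test data are hypothetical and only demonstrate the principle; they are not the authors' code.

```python
import numpy as np


def pca_truncate(samples: np.ndarray, n_components: int) -> np.ndarray:
    """Project samples (rows = observations, columns = variables) onto their
    n_components leading principal components and map them back.

    Dropping the trailing components removes the low-variance directions,
    which for white noise contain mostly noise.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / samples.shape[0]   # covariance of the variables
    _, eigvecs = np.linalg.eigh(cov)                  # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :n_components]        # leading eigenvectors
    coeffs = centered @ basis                         # transform into the PCA domain
    return coeffs @ basis.T + mean                    # back to the original domain


# Hypothetical example: rank-1 "signal" rows corrupted by white noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16)
clean = np.outer(rng.uniform(0.5, 1.5, 200), np.sin(2 * np.pi * t))
noisy = clean + rng.normal(0.0, 0.2, clean.shape)
denoised = pca_truncate(noisy, n_components=1)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

Because the clean rows span a single direction, one component captures all of the signal, while only a small fraction of the noise energy lies along that direction.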

4 Local Pixel Grouping (LPG)
LPG ensures that the local image structure is preserved during PCA-based noise removal. Because LPG is based on block matching, only sample blocks that are similar to the central K x K block (the noisy variable to be de-noised) are used in the PCA process.
Figure 2. Local Pixel Grouping
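Block-matching LPG can be sketched as follows. This is a minimal sketch under stated assumptions: the function local_pixel_grouping and its parameters are hypothetical, similarity is measured by mean squared difference, and a fixed number (n_best) of the closest blocks is kept; the slides do not specify the exact selection rule.

```python
import numpy as np


def local_pixel_grouping(img: np.ndarray, row: int, col: int, K: int = 3,
                         L: int = 9, n_best: int = 20) -> np.ndarray:
    """Group the K x K blocks in an L x L training window that best match the
    central K x K block whose top-left corner is (row, col).

    Returns an (n, K*K) array with one flattened block per row, sorted by mean
    squared difference to the central block; row 0 is the central block itself.
    """
    ref = img[row:row + K, col:col + K].astype(float)
    half = (L - K) // 2                  # maximum shift of a candidate block
    blocks, dists = [ref.ravel()], [0.0]
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = row + dr, col + dc
            if (dr, dc) == (0, 0):
                continue                 # central block already added
            if r < 0 or c < 0 or r + K > img.shape[0] or c + K > img.shape[1]:
                continue                 # candidate falls outside the image
            block = img[r:r + K, c:c + K].astype(float)
            blocks.append(block.ravel())
            dists.append(float(np.mean((block - ref) ** 2)))
    # A stable sort keeps the central block (distance 0, index 0) first.
    order = np.argsort(dists, kind="stable")[:n_best]
    return np.array(blocks)[order]


rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(32, 32))
group = local_pixel_grouping(image, row=12, col=12, K=3, L=9, n_best=20)
print(group.shape)  # (20, 9)
```

Each row of the result is one training sample for the PCA step, so only structurally similar neighbourhoods contribute to the estimated transform.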

5 First Stage of LPG-PCA based de-noising
In the original dataset, the noise is distributed evenly. To estimate the PCA transform, we therefore need a set of training samples for the noisy variable. To obtain them, we first take a K x K window as the kernel (the block to be de-noised) and centre an L x L training block around it. LPG then performs classification by selecting and grouping the training samples that are similar to the central K x K block. These training samples are used to compute the local statistics needed to estimate the PCA transform. The goal of PCA is to generate an orthogonal transformation matrix that de-correlates the dataset. In the transformed domain, the most significant components can be distinguished from the noise, which reduces the noise in the image.
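The first-stage procedure can be sketched end to end in NumPy. This is a simplified sketch, not the authors' implementation: the function name lpg_pca_stage and its parameters (K, L, n_best, step) are hypothetical, the noise standard deviation sigma is assumed to be known, and noise is suppressed with a linear minimum mean-square-error (LMMSE) shrinkage of each PCA coefficient, a common choice for LPG-PCA that is assumed here because the slides do not state the shrinkage rule.

```python
import numpy as np


def lpg_pca_stage(noisy: np.ndarray, sigma: float, K: int = 3, L: int = 9,
                  n_best: int = 30, step: int = 2) -> np.ndarray:
    """One LPG-PCA de-noising pass (simplified sketch).

    For each K x K block on a grid: group similar blocks (LPG), estimate a
    PCA transform from them, shrink the PCA coefficients of the central block
    (assumed LMMSE rule), transform back, and average overlapping estimates.
    """
    assert 1 <= step <= K, "step must not exceed K, or some pixels are never covered"
    img = noisy.astype(float)
    H, W = img.shape
    half = (L - K) // 2
    out, hits = np.zeros_like(img), np.zeros_like(img)
    rows = list(range(0, H - K + 1, step))
    cols = list(range(0, W - K + 1, step))
    if rows[-1] != H - K:
        rows.append(H - K)               # cover the bottom border
    if cols[-1] != W - K:
        cols.append(W - K)               # cover the right border
    for r in rows:
        for c in cols:
            ref = img[r:r + K, c:c + K].ravel()
            # LPG: candidate blocks inside the L x L training window.
            cands = [ref]
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or rr < 0 or cc < 0 or rr + K > H or cc + K > W:
                        continue
                    cands.append(img[rr:rr + K, cc:cc + K].ravel())
            cands = np.array(cands)
            dist = np.mean((cands - ref) ** 2, axis=1)
            X = cands[np.argsort(dist, kind="stable")[:n_best]]  # row 0 is the central block
            # PCA: orthogonal transform from the local statistics of the group.
            mean = X.mean(axis=0)
            Xc = X - mean
            _, P = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
            Y = Xc @ P                   # de-correlated coefficients
            # Assumed LMMSE shrinkage: white noise adds sigma**2 to every component.
            var_signal = np.maximum(np.mean(Y ** 2, axis=0) - sigma ** 2, 0.0)
            shrink = var_signal / (var_signal + sigma ** 2 + 1e-12)
            block = (Y[0] * shrink) @ P.T + mean
            out[r:r + K, c:c + K] += block.reshape(K, K)
            hits[r:r + K, c:c + K] += 1.0
    return out / hits


# Hypothetical test image: light background with two dark "strokes".
rng = np.random.default_rng(2)
clean = np.full((40, 40), 200.0)
clean[:, 10:13] = 40.0
clean[20:23, :] = 40.0
sigma = 20.0
noisy = clean + rng.normal(0.0, sigma, clean.shape)
first = lpg_pca_stage(noisy, sigma)
print(np.mean((noisy - clean) ** 2), np.mean((first - clean) ** 2))
```

Components whose variance barely exceeds sigma**2 are treated as mostly noise and shrunk towards zero, while components carrying stroke structure keep a weight close to one.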

6 Second stage of LPG-PCA based de-noising
The first stage produces a de-noised output image from which most of the noise has been removed. The LPG-PCA de-noising procedure is then iterated once more to further improve the result. The second stage follows almost the same procedure as the first, except for the noise-level parameter of the dataset. The rationale for a second stage is that, because the noise in the current dataset has been significantly reduced, the LPG block matching in the second stage is more accurate.
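The two-stage structure can be sketched as a small driver that runs the same stage twice with a new noise-level parameter. The slide does not give the update rule, so the reestimate_sigma formula below (scaling by c_s = 0.35 the square root of the noise variance not yet removed) is an assumption modelled on published LPG-PCA work; all function names are hypothetical, and box_filter_stage is only a stand-in so the sketch runs on its own.

```python
from typing import Callable, Tuple

import numpy as np

Stage = Callable[[np.ndarray, float], np.ndarray]


def reestimate_sigma(noisy: np.ndarray, first: np.ndarray, sigma: float,
                     c_s: float = 0.35) -> float:
    """Assumed rule: the variance removed by stage 1 is subtracted from sigma**2,
    and the square root of the remainder is scaled by the empirical constant c_s."""
    removed = float(np.mean((noisy.astype(float) - first) ** 2))
    return c_s * float(np.sqrt(max(sigma ** 2 - removed, 0.0)))


def two_stage_denoise(noisy: np.ndarray, sigma: float,
                      stage: Stage) -> Tuple[np.ndarray, np.ndarray, float]:
    first = stage(noisy, sigma)                    # stage 1: initial estimate
    sigma2 = reestimate_sigma(noisy, first, sigma) # updated noise-level parameter
    second = stage(first, sigma2)                  # stage 2: same procedure, refined
    return first, second, sigma2


def box_filter_stage(img: np.ndarray, sigma: float) -> np.ndarray:
    """Stand-in stage for demonstration only (3 x 3 mean filter, not LPG-PCA)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


rng = np.random.default_rng(3)
clean = np.full((32, 32), 128.0)
noisy = clean + rng.normal(0.0, 20.0, clean.shape)
first, second, sigma2 = two_stage_denoise(noisy, 20.0, box_filter_stage)
print(sigma2)
```

Passing the stage as a callable keeps the driver independent of the particular first-stage implementation; in practice an LPG-PCA stage would be plugged in instead of the box filter.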

7 Assessment Metrics
We use objective quality evaluation methods to assess the quality of the de-noised images.
Peak signal-to-noise ratio (PSNR): PSNR is the ratio between the maximum possible signal power and the power of the noise, where the noise is the mean squared difference (MSE) between two images: PSNR = 10 log10(MAX^2 / MSE), with MAX = 255 for 8-bit images. It is commonly used as a quality measure between an original and an enhanced image.
Structural Similarity Index (SSIM): SSIM measures the similarity between two images in terms of their structure, and therefore reflects structural rather than purely pixel-wise differences.
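Both metrics can be computed in a few lines of NumPy. This is a sketch: psnr follows the standard definition, whereas ssim_global is a simplified single-window SSIM over the whole image, an assumption made for brevity. The standard SSIM averages the index over local windows, and library implementations such as scikit-image's skimage.metrics.peak_signal_noise_ratio and skimage.metrics.structural_similarity are better suited to reproducing published figures.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(peak**2 / MSE)."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


def ssim_global(x: np.ndarray, y: np.ndarray, peak: float = 255.0) -> float:
    """Simplified SSIM computed over the whole image as one window."""
    x, y = x.astype(float), y.astype(float)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # usual stabilising constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))


ref = np.zeros((8, 8))
test = np.ones((8, 8))
print(round(psnr(ref, test), 2))         # 48.13 (MSE = 1, so 20 * log10(255))
a = np.arange(64, dtype=float).reshape(8, 8)
print(ssim_global(a, a), ssim_global(a, a[::-1]))
```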

8 Assessment Metrics: OCR Accuracy
OCR accuracy is used to rank the performance of the LPG-PCA method when its output is fed to an OCR program. The OCR software used for the comparison was FREE OCR 5.41, a Windows OCR program that includes a Windows build of the free Tesseract OCR engine.
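The slides do not define how OCR accuracy is computed. One common definition, assumed here, is character accuracy based on the Levenshtein edit distance between the ground-truth transcription and the OCR output; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed metric: 1 - edit_distance / len(ground_truth), floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


print(levenshtein("kitten", "sitting"))              # 3
print(character_accuracy("sitting", "kitten"))       # 0.5714285714285714
```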

9 Experiment Results
Table 1. PSNR (dB) and SSIM of the proposed LPG-PCA method (SSIM values in parentheses)
Image     Noisy Image   First Stage   Second Stage
Image1    (0.6513)      (0.8436)      (0.8775)
Image2    (0.4726)      (0.8200)      (0.8633)
Image3    (0.4615)      (0.8115)      (0.8542)
Image4    (0.6186)      (0.8388)      (0.8574)
Image5    (0.7086)      (0.8711)      (0.8807)
Image6    (0.6803)      (0.8694)      (0.8844)
Image7    (0.5221)      (0.8429)      (0.8827)
Image8    (0.4305)      (0.8107)      (0.8634)
Image9    (0.5009)      (0.8478)      (0.8971)
Image10   (0.5712)      (0.8469)      (0.8712)
The large PSNR improvement after the first stage indicates that most of the noise is removed in that stage. For some images, PSNR improves only slightly in the second stage. However, SSIM, which reflects visual image quality more closely, increases for every image after the second-stage refinement, by between 0.0096 (Image5) and 0.0527 (Image8).
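The SSIM gains quoted for the second stage can be checked directly from the Table 1 values. This short script only restates the table's numbers; the variable names are illustrative.

```python
# SSIM values from Table 1: (noisy image, first stage, second stage)
ssim = {
    "Image1": (0.6513, 0.8436, 0.8775),
    "Image2": (0.4726, 0.8200, 0.8633),
    "Image3": (0.4615, 0.8115, 0.8542),
    "Image4": (0.6186, 0.8388, 0.8574),
    "Image5": (0.7086, 0.8711, 0.8807),
    "Image6": (0.6803, 0.8694, 0.8844),
    "Image7": (0.5221, 0.8429, 0.8827),
    "Image8": (0.4305, 0.8107, 0.8634),
    "Image9": (0.5009, 0.8478, 0.8971),
    "Image10": (0.5712, 0.8469, 0.8712),
}
# SSIM improvement contributed by the second stage, per image
gains = {name: s2 - s1 for name, (_, s1, s2) in ssim.items()}
# Mean SSIM per column (noisy, first stage, second stage)
means = [sum(v[i] for v in ssim.values()) / len(ssim) for i in range(3)]
print(min(gains, key=gains.get), round(min(gains.values()), 4))  # Image5 0.0096
print(max(gains, key=gains.get), round(max(gains.values()), 4))  # Image8 0.0527
print([round(m, 4) for m in means])                               # [0.5618, 0.8403, 0.8732]
```

On average, the first stage raises SSIM by roughly 0.28 and the second stage adds roughly 0.03.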

10 Experiment Results
Table 2. OCR accuracy results of the proposed LPG-PCA method
Image     Noisy Image   First Stage   Second Stage
Image1    %             %             %
Image2    %             %             %
Image3    %             %             %
Image4    %             %             %
Image5    %             %             %
Image6    %             %             %
Image7    %             %             %
Image8    %             %             %
Image9    %             %             %
Image10   %             %             %
OCR accuracy after the first stage is much better than on the noisy images. The second-stage output images show a further small improvement in OCR accuracy over the first stage.

11 Experiment Results
Figure 3. Image10: noisy image
Figure 4. Image10: second-stage output image

12 Conclusion
The training samples for PCA are chosen with the block-matching-based LPG technique in order to preserve the local image structure and obtain a better de-noising result. The proposed LPG-PCA algorithm is iterated once more to achieve better noise reduction and better preservation of image characteristics. The experimental results above demonstrate the effectiveness of the LPG-PCA algorithm.


