Published by Steven Ryan. Modified over 9 years ago.
1
Imaged Document Text Retrieval without OCR
IEEE Trans. on PAMI, vol. 24, no. 6, June 2002
Presenter: 周遵儒
2
Outline
  Introduction
  HTD and VTD
  Class of Character Objects
  Similarity Measure of Documents
  Experimental Results
  Conclusions
3
Introduction
  Retrieval of imaged documents
  Processing with OCR vs. without OCR
  Language dependence vs. language independence
4
Procedure
  Image preprocessing
  Feature extraction of character objects
    Horizontal Traverse Density (HTD)
    Vertical Traverse Density (VTD)
  Clustering
    To identify classes of character objects
  Document representation
    Hash table
    N-gram
    To construct indexes for imaged document retrieval
5
Features: HTD and VTD
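The slide's figure did not survive in this transcript, so the following is only a minimal sketch of what a traverse-density feature could look like. It assumes that a traverse density counts how many times a scan line enters a character stroke (a background-to-ink transition), that HTD scans along rows and VTD along columns, and that a character object is a binary bitmap with 1 for ink. The function names `traverse_density` and `htd_vtd` are hypothetical, not taken from the paper.

```python
def traverse_density(lines):
    """Count background-to-ink transitions along each scan line (assumed definition)."""
    counts = []
    for line in lines:
        prev = 0  # treat the area outside the bitmap as background
        crossings = 0
        for pixel in line:
            if pixel == 1 and prev == 0:
                crossings += 1
            prev = pixel
        counts.append(crossings)
    return counts


def htd_vtd(bitmap):
    """Return (HTD, VTD) for a binary bitmap given as a list of rows (1 = ink)."""
    htd = traverse_density(bitmap)              # one count per row
    vtd = traverse_density(list(zip(*bitmap)))  # one count per column
    return htd, vtd


# A crude 5x5 letter "H" used only as toy input.
H = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
htd, vtd = htd_vtd(H)
print(htd)  # [2, 2, 1, 2, 2]
print(vtd)  # [1, 1, 1, 1, 1]
```

Under these assumptions the two vectors describe stroke layout without recognising the character, which is what makes the feature usable without OCR.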
6
Class of Character Objects
  Unsupervised clustering with HTD and VTD
  Distance measure of character objects
7
Distance Measure of Character Objects
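The distance formula on this slide was an image and is not recoverable from the transcript. As a stand-in, the sketch below shows one way unsupervised clustering over (HTD, VTD) pairs could work: both vectors are resampled to a fixed length so that objects of different sizes are comparable, an L1 distance is summed over the two vectors, and each object joins the first class whose representative lies within a threshold. The resampling, the L1 distance, the threshold, and the names `resample`, `distance`, and `cluster` are all assumptions for illustration, not the paper's definitions.

```python
def resample(vec, n):
    """Nearest-neighbour resampling of a non-empty vector to length n."""
    return [vec[i * len(vec) // n] for i in range(n)]


def distance(a, b, n=8):
    """L1 distance between two (HTD, VTD) pairs after resampling (assumed measure)."""
    total = 0
    for va, vb in zip(a, b):
        ra, rb = resample(va, n), resample(vb, n)
        total += sum(abs(x - y) for x, y in zip(ra, rb))
    return total


def cluster(objects, threshold):
    """Assign each object to the first class whose representative is close enough."""
    representatives = []
    labels = []
    for obj in objects:
        for class_id, rep in enumerate(representatives):
            if distance(obj, rep) <= threshold:
                labels.append(class_id)
                break
        else:
            # No existing class matched: this object starts a new class.
            representatives.append(obj)
            labels.append(len(representatives) - 1)
    return labels


# Toy (HTD, VTD) pairs: an "H", the same shape at double size, and a solid bar.
small_h = ([2, 2, 1, 2, 2], [1, 1, 1, 1, 1])
big_h = ([2, 2, 2, 2, 1, 1, 2, 2, 2, 2], [1] * 10)
bar = ([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])

labels = cluster([small_h, big_h, bar], threshold=2)
print(labels)  # [0, 0, 1]
```

The resulting class ids play the role that character codes would play in an OCR pipeline: a document becomes a sequence of class ids rather than a sequence of recognised characters.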
8
Examples of Character Objects
9
Similarity Measure of Documents
  N-gram algorithm
  Cosine of the angle between two document vectors
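A minimal sketch of the similarity step, assuming each document has already been reduced to a sequence of character-class ids. N-grams over that sequence are counted in a hash table (here a `collections.Counter`), and two documents are compared by the cosine of the angle between their count vectors. The n-gram length of 3, the raw-count weighting, and the function names are illustrative assumptions; the slides do not state the values used in the paper.

```python
from collections import Counter
from math import sqrt


def ngram_counts(class_ids, n=3):
    """Count n-grams over a document's sequence of character-class ids."""
    return Counter(tuple(class_ids[i:i + n]) for i in range(len(class_ids) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy documents written as sequences of class ids.
doc_a = ngram_counts([0, 1, 2, 0, 1, 2, 3])
doc_b = ngram_counts([0, 1, 2, 3, 4])
doc_c = ngram_counts([5, 6, 7, 8])

sim_ab = cosine_similarity(doc_a, doc_b)  # shares two trigrams with doc_a
sim_ac = cosine_similarity(doc_a, doc_c)  # shares none
print(round(sim_ab, 4), sim_ac)
```

Because the comparison only needs class ids to be consistent within the corpus, not tied to any alphabet, the same code applies regardless of the document's language.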
10
Corpus: UW1 database (600 dpi)
11
Experimental Results: Corpus I (E01-E26)
12
Experimental Results: Corpus II
13
Experimental Results
17
Conclusion and Future Work
  A new method for imaged document retrieval without OCR
  Language-independent retrieval
  Improved robustness to different fonts and noisy documents