Tuesday, 6th November 2007
Work Group CALYPOD: graphiCs imAge anaLYsis from Printed Old Document
Thierry Brouard, Mathieu Delalandre, Nicholas Journet and Frédéric Nicolier
NaviDoMass Meeting, 6th November 2007
Paris V University, Paris, France
2 General Presentation (1/2)
Research Work Group
A group of researchers from different laboratories, teams and projects, working toward a common, specific research topic.
Specific topic of research
Automatic processing of the graphical parts of old printed books (segmentation, pre-processing, matching, OCR, retrieval, …)
Objectives
1. To develop and maintain a website that collects and centralizes information (web links, bibliographic references, papers, …)
2. To put in contact (mailings, meetings) everyone from human and computer sciences working on this topic, and to strengthen collaborations
3. To develop "real-life" applications (AGORA, DMOS, ...) for the end-user partners in human sciences (CESR, ...)
Figure labels: ornamental letter, headline, figure
3 General Presentation (2/2)
Calendar (month axis running from November through December of the following year)
- …th June, 1st Meeting (Paris)
- Starting date of Calypod Group
- GDR-ISIS "Jeune Chercheur" Application "SILCIL"
- 5th July, opening of …
- 13th July, 2nd Calypod Meeting (La Rochelle)
- 13th November, 3rd Calypod Meeting (Tours)
- Break period …
- 6th November, Calypod talk at NaviDoMass Meeting (Paris)
Calypod People (17)
Busson Sébastien, Baudrier Etienne, Nicolier Frédéric, Landré Jérôme, Delalandre Mathieu, Karatzas Dimosthenis, Lladós Josep, Nicolas Stéphane, Ramos Oriol, Petitjean Caroline, Journet Nicholas, Salmon Jean-Pierre, Coustaty Mickael, Brouard Thierry, Ogier Jean-Marc, Ramel Jean-Yves, Sidere Nicolas
4 Research Project (1/2)
Multi-Criterion Retrieval of Ornamental Letters
Research problem?
Criteria (with example values)
- Color (black, white)
- Size (small, large)
- Background (almost empty, rich graphics)
- Letter (c)
- Topic (vegetal)
- Pattern (cross)
5 Research Project (2/2)
Project components: OLR (Ornamental Letter Recognition), Image Pre-Processing, Printing Retrieval, Style Retrieval, Performance Evaluation
Figure label: L (90%)
6 Image Pre-Processing
Degradation: offset, skewing
Approach (overview)
- Translation: SPOMF (Symmetric Phase-Only Matched Filter), a correlation-based method
- Rotation: SPOMF on the polar form of the images
- Scale: SPOMF + Mellin transform
[Thévenaz98] A Pyramid Approach to Subpixel Registration Based on Intensity, IEEE Trans. Image Processing
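The translation step can be illustrated with a small phase-only correlation sketch. This is not the group's implementation: the function name spomf_shift is hypothetical, the images are assumed to be same-sized grayscale NumPy arrays, the shift is assumed circular, and only integer shifts are recovered (no subpixel refinement as in [Thévenaz98]).

```python
import numpy as np

def spomf_shift(ref, moved):
    """Estimate the integer shift (dy, dx) with moved ~ np.roll(ref, (dy, dx), axis=(0, 1)).

    Hypothetical helper: phase-only matched filtering keeps only the phase of the
    cross-power spectrum, so the inverse FFT is a sharp peak at the displacement.
    """
    spec_ref = np.fft.fft2(ref)
    spec_moved = np.fft.fft2(moved)
    cross = spec_moved * np.conj(spec_ref)
    # Discard magnitudes, keep phase (guard against division by zero).
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices from [0, n) to signed shifts in (-n/2, n/2].
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((32, 32))
    moved = np.roll(ref, (3, -5), axis=(0, 1))
    print(spomf_shift(ref, moved))
```

Applying the same function to log-polar resamplings of the spectra magnitudes is the usual route to rotation and scale, which is what the rotation and scale bullets point at; that part is not sketched here.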
7 Printing Retrieval (1/2)
(1) Historians are interested in tracking wood plugs as a tool for dating old books.
Figure: printing houses Vascosan (1555) and Marnef (1576); a plug passes between them by exchange or copy
(2) Most of the images are copyrighted, so a system must retrieve them in real time in order to allow cross-queries between the databases.
Figure: query sent to the DB, returning results r1, r2, r3
8 Printing Retrieval (2/2)
Our key ideas
(1) To use a Run-Length Encoding (RLE) of the image
Figure: runs (x1, x2) on line (y) of image 1 matched against runs on line (y + dy) of image 2, alternating "handle image 1" and "handle image 2" according to the relative positions of x1 and x2
(2) To use different levels of operators (from the fastest to the most accurate)
- Level 1: image sizes
- Level 2: image density
- Level 3: RLE comparison
Figure: the query passes through the 1st level, then the 2nd level; axes: speed, depth
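The two key ideas can be sketched together as follows. This is only an illustration under stated assumptions: images are binary NumPy arrays (nonzero = ink), the names rle_row, run_overlap and cascade_similarity are hypothetical, the tolerances size_tol and density_tol are invented for the example, and the level-3 score (intersection over union of ink pixels) is one possible RLE comparison, not necessarily the one used by the authors.

```python
import numpy as np

def rle_row(row):
    """Encode one binary row as a list of (start, end) ink runs, end exclusive."""
    padded = np.concatenate(([0], (np.asarray(row) != 0).astype(np.int8), [0]))
    changes = np.flatnonzero(np.diff(padded))  # alternating run starts and ends
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))

def run_overlap(runs_a, runs_b):
    """Count pixels covered by both run lists with a two-pointer merge."""
    i = j = total = 0
    while i < len(runs_a) and j < len(runs_b):
        a0, a1 = runs_a[i]
        b0, b1 = runs_b[j]
        total += max(0, min(a1, b1) - max(a0, b0))
        # Advance whichever run ends first.
        if a1 <= b1:
            i += 1
        else:
            j += 1
    return total

def cascade_similarity(img_a, img_b, size_tol=2, density_tol=0.05):
    """Coarse-to-fine comparison: sizes, then density, then RLE overlap."""
    # Level 1: image sizes (cheapest test).
    if any(abs(a - b) > size_tol for a, b in zip(img_a.shape, img_b.shape)):
        return 0.0
    # Level 2: image density (fraction of ink pixels).
    dens_a = np.count_nonzero(img_a) / img_a.size
    dens_b = np.count_nonzero(img_b) / img_b.size
    if abs(dens_a - dens_b) > density_tol:
        return 0.0
    # Level 3: line-by-line RLE comparison on the common rows.
    rows = min(img_a.shape[0], img_b.shape[0])
    inter = sum(run_overlap(rle_row(img_a[y]), rle_row(img_b[y]))
                for y in range(rows))
    union = np.count_nonzero(img_a) + np.count_nonzero(img_b) - inter
    return inter / union if union else 1.0
```

The early returns are what give the cascade its speed: most candidates are rejected at level 1 or 2, so the more expensive run comparison only runs on plausible matches.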
9 Style Retrieval (1/3)
2 steps
- 1) Cluster the ornamental letters according to their styles
- 2) Apply letter recognition algorithms according to the cluster (black or white letter, background specificity, …)
Preprocessing
- Binarization
- Resizing
Feature Extraction
- FFT, DCT, [Radon] coefs.
- Zernike moments
- Threshold adj. stats.
- [Haralick, QMF]
Model Training
- SVM
- N-fold cross-validation
Evaluation of the best model on a test database
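The preprocessing and FFT feature steps listed above could look like the sketch below. It rests on several assumptions not stated on the slide: binarization uses a global threshold at the mean gray level, resizing is nearest-neighbour to a 64 x 64 grid, and the feature vector is the 10 x 10 block of lowest-frequency FFT magnitudes (100 coefficients). The function name fft_features and all parameter names are hypothetical.

```python
import numpy as np

def fft_features(image, size=64, n_side=10, threshold=None):
    """Binarize, resize and return n_side * n_side low-frequency FFT magnitudes."""
    img = np.asarray(image, dtype=float)
    # Binarization (assumption: global threshold, mean gray level by default).
    t = img.mean() if threshold is None else threshold
    binary = (img > t).astype(float)
    # Resizing (assumption: nearest-neighbour sampling onto a size x size grid).
    rows = np.arange(size) * binary.shape[0] // size
    cols = np.arange(size) * binary.shape[1] // size
    resized = binary[np.ix_(rows, cols)]
    # Feature extraction: magnitudes of the lowest-frequency FFT coefficients,
    # normalized so the first coefficient is the fraction of foreground pixels.
    spectrum = np.abs(np.fft.fft2(resized)) / (size * size)
    return spectrum[:n_side, :n_side].ravel()

if __name__ == "__main__":
    demo = np.zeros((30, 50))
    demo[:, 25:] = 255          # right half bright
    features = fft_features(demo)
    print(features.shape)       # one 100-dimensional vector per image
```

Vectors of this kind would then be passed to the model training stage (an SVM selected by N-fold cross-validation, per the slide); that stage is left out here to keep the sketch dependency-free.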
10 Style Retrieval (2/3)
Graphical style retrieval (homogeneous vs. textured), classes C1 and C2
Test samples (FFT, 100 coefs.)
- Overall: 89.25% (420/466)
- C1: 87.5% (47/54)
- C2: 91.0% (375/412)
11 Style Retrieval (3/3)
Letter color retrieval (black vs. white), classes C1 and C2
Test samples (FFT, 100 coefs.)
- Overall: 93.1% (298/320)
- C1: 90.6% (145/160)
- C2: 95.6% (…/160)
12 Ornamental Letter Recognition (1/2)
Figure: letter segmentation followed by character recognition (example letter: A)
13 Ornamental Letter Recognition (2/2)
14 Performance Evaluation (1/1)
Metadata-driven metadata acquisition
Figure: the base and our retrieval engine (control, display, retrieve) are used to produce the metadata files
- Bench 1: without retrieval, producing a metadata file
- Bench 2: with retrieval, producing a metadata file
Expected effect of retrieval: faster acquisition, fewer errors
15 Conclusion
Website
- 35 references, 20 web links, 4 test databases, 1 wiki
- Visits: August 144, September 196, October 334
Human Network
- 17 people from computer and human sciences, still growing (BCU Lausanne, …)
- 4 meetings, 3 invited talks
Research Work
- A common research project under way; joint publications expected in the first half of 2008