FRE 2645 SCSIT Talk, Nottingham University, Thursday 16th June 2005 Indexing of Graphic Document Images : a Perceptive Approach Mathieu Delalandre¹, ².


1 FRE 2645 SCSIT Talk, Nottingham University, Thursday 16th June 2005
Indexing of Graphic Document Images: a Perceptive Approach
Mathieu Delalandre¹, ²
¹ PSI Laboratory, Rouen University, France
² SCSIT, Nottingham University, UK

2 Who am I?
- Mathieu Delalandre
- Thesis: fourth year of PhD (defence in September)
- Lab: PSI Laboratory, Rouen, France
- Supervisors: E. Trupin, J.M. Ogier, J. Labiche
- Team: S. Adam, H. Locteau, P. Héroux, E. Barbu, Y. Lecourtier
- Field: Document Image Analysis (Graphics Recognition)
- Postdoc: IPI, SCSIT, from April to September (4-5 months) with Tony Pridmore

3 Indexing of Graphic Document Images: a Perceptive Approach
- Introduction
- Systems Overview
- The Knowledge Level
- Conclusion

4 Introduction: Indexing & Retrieval (I & R)
Indexing & Retrieval [Greengrass’00]
- Indexing: identification and recording of attributes of data that will aid retrieval.
- Retrieval: ability of a database management system to get back data that were stored there previously.
Applications
- videos (MPEG, AVI, …)
- Web pages (XML, XHTML, …)
- structured documents (PDF, PS, Word, …)
- images (JPG, GIF, …)
Section outline: Indexing & Retrieval (I & R); Categorization of Images; I & R of Document Images; My Topic

5 Introduction: Categorization of Images
Figure labels: document images (trademark, logo, heading, journal, manual); photographs; foreground/background images.

6 Introduction: I & R of Document Images (1/3)
- Today, document images are not indexed by search engines because of the complexity of the Document Image Analysis (DIA) task [Doerman’98] [Walker’00] [Baird’03].
- Is indexing of document images really needed? This raises two questions.
- Question: how many document images are there, and where [Spring’95] [Cleveland’98] [Steve’99] [Ouf’01] [Baird’03] [Hu’04]?
Figure labels (Web pages): Images; Markup Languages (HTML, XHTML, ..); 30%, 70%. Images: Document Images (Logos, Headings, …); Photographs; 60%, 40%.
Figure labels (Web): Deep Web; Web (8.10^15 ko); 0.3%, 99.3%; Digital Libraries; Others (Software, Data Bases, …); large (or main) part; Document Images: minor part; Structured Documents: main part.

7 Introduction: I & R of Document Images (2/3)
- Question: are there new document images, or just old ones?
- Paper (and image) has too many desirable properties; document images and structured documents will increasingly co-exist in the future [Breul’04].

8 Introduction: I & R of Document Images (3/3)
To conclude:
- (1) DIA is needed (and will be needed) in the future of I & R of documents [Baird’03] [Breul’04].
- (2) DIA must be revisited today from the angle of I & R [Baird’03].

9 Introduction: My Topic
Indexing of graphic document images
- Indexing & Retrieval → Indexing
  - Identification and recording of attributes of data that will aid retrieval
  - First step before retrieval
- document images → graphic document images
Figure labels: line drawing, symbol, logo, Asian script, historical heading.

10 Indexing of Graphic Document Images: a Perceptive Approach
- Introduction
- Systems Overview
- The Knowledge Level
- Conclusion

11 Systems Overview: Introduction
- Overview of systems that index graphic document images; we call these Graphics Indexing Systems.
- Graphics Indexing Systems are a specialization of DIA systems applied to recognition and understanding of graphic document images [Tombre’03]; we call the latter Graphics Recognition Systems.
Section outline: Introduction; Graphics Recognition Systems; Graphics Indexing Systems; Open Problems

12 Systems Overview: Graphics Recognition Systems (1/3)
- Applications deal with graphics parts (symbol and linear):
  - text/graphics segmentation [Tombre’02], vectorisation [Mejbri’02], symbol recognition [Llados’02], document interpretation (or understanding) [Ablameko’00], …
- Graphics Recognition Systems: graphic document images → structured documents
Figure labels: symbol, linear, text.

13 Systems Overview: Graphics Recognition Systems (2/3)
- Graphics are structured and connected.
- Graphics Recognition Systems are based on structural methods: “relational organization of low-level features (graphic primitives) into higher-level structures (graph)” [Tombre’96] [Shi’89].
Figure labels: symbol and its structure; connected symbol in drawing; low-level features (graphic primitives): line, connect point, T link; higher-level structure (graph): line, connect edge, T edge; symbol recognition.
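To make the structural idea on this slide concrete, here is a minimal Python sketch of a graph whose nodes are graphic primitives and whose edges carry relation labels such as "connect" or "T". The class `PrimitiveGraph` and the helper `build_t_shape` are hypothetical names chosen for illustration; they are not taken from the talk or from any cited system.

```python
from dataclasses import dataclass, field


@dataclass
class PrimitiveGraph:
    """Attributed graph: nodes are graphic primitives, edges are relations."""
    nodes: dict = field(default_factory=dict)  # node id -> primitive type
    edges: list = field(default_factory=list)  # (id_a, id_b, relation label)

    def add_primitive(self, node_id, kind):
        self.nodes[node_id] = kind

    def relate(self, a, b, label):
        # Only primitives already in the graph can be related.
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both primitives must exist before relating them")
        self.edges.append((a, b, label))

    def neighbours(self, node_id):
        """Return (other node, relation label) pairs for an undirected graph."""
        out = []
        for a, b, label in self.edges:
            if a == node_id:
                out.append((b, label))
            elif b == node_id:
                out.append((a, label))
        return out


def build_t_shape():
    """Two lines meeting in a T junction (illustrative example only)."""
    g = PrimitiveGraph()
    g.add_primitive("l1", "line")
    g.add_primitive("l2", "line")
    g.relate("l1", "l2", "T")
    return g
```

A recognition step would then compare such a graph against model graphs; that matching stage is not sketched here.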

14 Systems Overview: Graphics Recognition Systems (3/3)
- Graphic Primitive Extraction, some methods [Wenyin’98] [Delalandre’04]:
  - skeletonization [Hilaire’04], contouring [Ramel’00], tracking [Song’00], labelling [Badawy’02], transform [Couasnon’01], meshes [Vaxiviere’95], region segmentation [Cao’00], run-length [Burge’98], …
- Recognition:
  - Graph Matching [Bunke’00], Graph Transform [Blostein’05], Primitive Matching [Foggia’99], …
- Architecture of Graphics Recognition Systems: document images → Graphic Primitive Extraction → graph of graphic primitives → Recognition (using Graphic Models) → structured document

15 Systems Overview: Graphics Indexing Systems (1/3)
Graphics Indexing Systems [Doerman’98] [Tombre’03], 3 classes:
- Title block recognition [Arias’98], [Najman’01], [Lamiroy’02], …
- Statistical framework [Samet’96], [Worring’99], [Tabbone’03], [Terrades’03], …
- Graphics indexing [Kasturi’88], [Lorenz’95], [Huang’97], [Hu’97], [Barbu’04], [Valasoulis’04], …
Figure annotations: "connected, so not matched"; "partial matching".

16 Systems Overview: Graphics Indexing Systems (2/3)
Architecture of Graphics Indexing Systems:
- Graphic Primitive Extraction → graph of graphic primitives → Indexing → index (attributes + document links)
- Indexing attributes: a specific set of graphic primitives
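The final stage of this architecture, an index that stores attributes together with document links, can be sketched as a simple inverted index. This is only an illustration under assumptions: the attribute names and file names in the usage note, and the functions `build_index` and `retrieve`, are hypothetical and do not come from the talk.

```python
from collections import defaultdict


def build_index(documents):
    """Map each indexing attribute to the set of document links that exhibit it.

    `documents` maps a document link (e.g. a file name) to an iterable of
    attribute names extracted from that document.
    """
    index = defaultdict(set)
    for doc_link, attributes in documents.items():
        for attr in attributes:
            index[attr].add(doc_link)
    return index


def retrieve(index, query_attributes):
    """Return the documents that carry every attribute in the query."""
    sets = [index.get(attr, set()) for attr in query_attributes]
    if not sets:
        return set()
    return set.intersection(*sets)
```

For example, with `build_index({"a.png": ["loop", "long_line"], "b.png": ["loop"]})`, querying `["loop", "long_line"]` returns only `a.png`, while querying `["loop"]` returns both documents.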

17 Systems Overview: Graphics Indexing Systems (3/3)
Works compared (graphic primitives extraction; graph of graphic primitives; indexing):
- [Huang’97]: thinning and chaining; line graph of skeleton; cycle search, width and length matching of lines
- [Kasturi’88]: run length encoding and polygonisation; straight line graph of contours and skeleton; Fourier approximation of line graph
- [Lorenz’95]: contouring and polygonisation; 2-D strings of contours; string matching
- [Barbu’04]: thinning and neighbour analysis of skeleton’s pixels; region adjacency graph; graph mining
- [Hu’04]: thinning, chaining, and polygonisation; set of straight lines of skeleton; string matching
- [Dosh’04]: thinning, chaining, and polygonisation; set of straight lines of skeleton; vectorial signature
Figure labels: thinning, contouring, region graph, skeleton graph, statistical, structural.

18 Systems Overview: Open Problems (1/2)
- All these systems use a Lexical/Syntactic (or Bottom/Up) approach [Tombre’96]:
  - Lexical (Bottom): extraction of graphical primitives from images in a fixed way
  - Syntactic (Up): analysis of graphical primitives without returning to the image
- So all these systems use a Document Understanding approach, but I & R is not an Understanding problem.
Criterion: Understanding / I & R
- Complexity
  - Image Size: large / small and medium
  - Data Base Size: small / large
  - Process Execution: one shot / every-time
- Robustness
  - Graphic Primitives: accurate / approximated
  - Noise Level: high and medium / low and medium
- Content adaptation
  - Prior Knowledge: yes / no
  - Document Class: few and known / several and unknown
Content adaptation is the most important feature of I & R systems.

19 Systems Overview: Open Problems (2/2)
Examples of content adaptation:
- A broad class of documents: region based [Roque’03], both based [Ramel’00], line based [Hilaire’04]
- Context: text/graphics segmentation, noise adaptation
To conclude:
- An I & R system must deal with content adaptation.
- Content adaptation can’t be solved without a knowledge-based approach.

20 Indexing of Graphic Document Images: a Perceptive Approach
- Introduction
- Systems Overview
- The Knowledge Level
- Conclusion

21 The Knowledge Level: Introduction
Some (general) definitions [Tuthill’90] [Holsapple’04]:
- Knowledge: human mental grasp of reality
- Representation: placement (and meaning) of knowledge into (from) computer memory
- Formalism: a set of symbols corresponding to knowledge inside computers
Diagram labels: Knowledge (human); Formalism(s) (computer); placement and meaning (human/computer).
Different types of knowledge:
- on strategies []
- on case based reasoning []
- on ontologies []
- …
Section outline: Introduction; Graphical Knowledge; Graphics Model; a Perceptive Approach

22 The Knowledge Level: Graphical Knowledge (1/2)
- Graphical Knowledge [Delalandre’05]: a type of knowledge corresponding to the human mental grasp of graphics.
Levels of Graphical Knowledge (figure labels):
- abstraction levels: image, symbol; perception, interpretation
- formalism levels: graphic primitives (pixel-based formalisms, vector-based formalisms); high-level objects (graph-based formalisms)
- annotation: "it is a gate!"

23 The Knowledge Level: Graphical Knowledge (2/2)
Two formalism levels [Tombre’96]:
- Graphic Primitives [Murray’96]
  - Pixel-based formalism: pixel, raster, run, connected component, …
  - Vector-based formalism: vector, arc, curve, ellipse, square, …
- Graph-based formalisms [Sowa 99]: Relational Attributed Graphs (RAG), Frames, Object-Oriented Languages, …
Figure labels: primitives; line images; Relational Attributed Graphs [Seong’93].
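As a small illustration of one pixel-based primitive named on this slide, the "run", the following Python sketch extracts horizontal foreground runs from one row of a binary image. The function name `row_runs` and the (start, length) encoding are assumptions made for this example, not a convention stated in the talk.

```python
def row_runs(row):
    """Return the foreground runs of a binary row as (start, length) pairs.

    `row` is a sequence of 0/1 values, where 1 marks a foreground pixel.
    """
    runs = []
    start = None  # column where the current run began, or None if outside a run
    for x, value in enumerate(row):
        if value and start is None:
            start = x  # a new run begins
        elif not value and start is not None:
            runs.append((start, x - start))  # the run ended just before x
            start = None
    if start is not None:
        runs.append((start, len(row) - start))  # run reaching the row's end
    return runs
```

Applied to every row of an image, this yields a run-length representation that is much more compact than the raw raster for line drawings.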

24 The Knowledge Level: Graphics Model (1/2)
- Model [Seguela’01]: a knowledge representation using given formalisms, built for a given system’s purposes
- Graphics Model [Delalandre’05]: a model used to represent graphical knowledge
Figure labels: a (simple) shape; graphic primitives: extremity, junction, line; line based model: junction edge, line; junction based model: extremity, junction, line edge.

25 The Knowledge Level: Graphics Model (2/2)
- One system = one model, hence a considerable number of models:
  - [Joseph’92] [Pasternak’93] [Han’94] [Burgue’95] [Yu’97] [Lee’98] [Ramel’00] [Couasnon’01] [Badawy’02] [Yan’04] …
- Models depend on the extracted graphic primitives, so we can define a graphics model taxonomy with 3 classes [Delalandre’05]:
  - region-based models: component, loop, neighbour, include
  - contour-based models: quadrilateral, line link, junction link
  - skeleton-based models: extremity, junction, line edge

26 The Knowledge Level: a Perceptive Approach (1/6)
Figure labels: Perception Level of Representations, from global to local: Region Level, Contour Level, Skeleton Level; two links between levels: specialisation, aggregation.

27 The Knowledge Level: a Perceptive Approach (2/6)
Figure labels: Perception Level of Representations, from global to local: Region Level, Contour Level, Skeleton Level; classic models; hybrid models; perceptive approach (jump or browse).

28 The Knowledge Level: a Perceptive Approach (3/6)
- First step, the region level: connected component analysis [Alnuweiri’92]
Figure labels: foreground, background; foreground’s components, background’s components; main background, loops.
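The region-level step on this slide can be illustrated with a minimal Python sketch of connected component labelling on a binary image. It is not the algorithm of [Alnuweiri’92]; it is a plain breadth-first flood fill. Two assumptions are made here: the foreground is labelled with 8-connectivity and the background with 4-connectivity, and a "loop" is taken to be any background component that does not touch the image border (the border-touching component playing the role of the main background). The function names are hypothetical.

```python
from collections import deque

OFFSETS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def label_components(image, target, connectivity):
    """Label connected components of pixels equal to `target` (0 or 1).

    Returns (labels, count); labels use 1..count, and 0 means "not target".
    """
    h = len(image)
    w = len(image[0]) if h else 0
    offsets = OFFSETS_8 if connectivity == 8 else OFFSETS_4
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] != target or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and image[ny][nx] == target
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def find_loops(image):
    """Return labels of background components that never touch the border."""
    labels, count = label_components(image, 0, 4)
    h = len(image)
    w = len(image[0]) if h else 0
    on_border = set()
    for y in range(h):
        for x in range(w):
            if labels[y][x] and (y in (0, h - 1) or x in (0, w - 1)):
                on_border.add(labels[y][x])
    return [lab for lab in range(1, count + 1) if lab not in on_border]
```

On a small ring-shaped drawing, this gives one foreground component, two background components, and one loop (the hole inside the ring).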

29 The Knowledge Level: a Perceptive Approach (4/6)
Six features:
- (F) Foreground
- (B) Background
- (R) Resolution (i.e. distance)
- (N) Neighboring
- (S) Size
- (I) Inclusion
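The slide does not define how each feature is computed. As one hedged illustration, the Size feature (S) could be taken as the pixel count of each labelled component; that definition, and the function names `component_sizes` and `filter_by_size`, are assumptions made for this sketch rather than the author's method.

```python
from collections import Counter


def component_sizes(labels):
    """Count pixels per component in a label map (0 means unlabelled)."""
    counts = Counter(lab for row in labels for lab in row if lab)
    return dict(counts)


def filter_by_size(labels, min_size):
    """Return, in ascending order, the labels of components with at least `min_size` pixels."""
    sizes = component_sizes(labels)
    return sorted(lab for lab, size in sizes.items() if size >= min_size)
```

A size threshold of this kind is one simple way to discard small noise components before building a region-level representation.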

30 The Knowledge Level: a Perceptive Approach (5/6)
Use-case queries
Figure labels: starting image; FR 1; FR 2; BR 2; BR 2 S 2; BR 2 S 2 N 2.

31 The Knowledge Level: a Perceptive Approach (6/6)
True-life query
Figure labels: FS 1; BR 2 N>2.

32 Indexing of Graphic Document Images: a Perceptive Approach
- Introduction
- Systems Overview
- The Knowledge Level
- Conclusion

33 Conclusion
Conclusion:
- This is just a bibliographic study and some ideas.
- Start working on these ideas?
Perspectives:
- Contour and skeleton levels?
- A system to control the building of the representation?

