1
FRE 2645 SCSIT Talk, Nottingham University, Thursday 16th June 2005
Indexing of Graphic Document Images: a Perceptive Approach
Mathieu Delalandre¹, ²
¹ PSI Laboratory, Rouen University, France
² SCSIT, Nottingham University, UK
2
Who am I?
Mathieu Delalandre
Thesis: fourth year of PhD (defence in September)
Lab: PSI Laboratory, Rouen, France
Supervisors: E. Trupin, J.M. Ogier, J. Labiche
Team: S. Adam, H. Locteau, P. Héroux, E. Barbu, Y. Lecourtier
Field: Document Image Analysis (Graphics Recognition)
Postdoc: IPI, SCSIT, from April to September (4-5 months) with Tony Pridmore
3
Indexing of Graphic Document Images: a Perceptive Approach
Outline: Introduction, Systems Overview, The Knowledge Level, Conclusion
4
Introduction: Indexing & Retrieval (I & R)
Indexing & Retrieval [Greengrass’00]:
- Indexing: identification and recording of attributes of data that will aid retrieval.
- Retrieval: ability of a database management system to get back data that were stored there previously.
Applications: videos (MPEG, AVI, …), Web pages (XML, XHTML, …), structured documents (PDF, PS, Word, …), images (JPG, GIF, …)
-Indexing & Retrieval (I & R) -Categorization of Images -I & R of Document Images -My Topic
5
Introduction: Categorization of Images
[Figure: images divided into document images (trademarks, logos, headings, journals, manuals) and photographs (foreground/background images)]
6
Introduction: I & R of Document Images (1/3)
[Figure: Web pages are made of images and markup languages (HTML, XHTML, …), 30% and 70%; the images are document images (logos, headings, …) and photographs, 60% and 40%]
Today, document images are not indexed by search engines because of the complexity of the Document Image Analysis (DIA) task [Doerman’98] [Walker’00] [Baird’03].
Is indexing of document images really needed? Two questions.
Question: how many document images are there, and where [Spring’95] [Cleveland’98] [Steve’99] [Ouf’01] [Baird’03] [Hu’04]?
[Figure: the deep web and the Web (8·10^15 KB), 0.3% and 99.3%; the deep web contains digital libraries and others (software, databases, …), digital libraries being a large (or main) part; document images and structured documents labelled as minor part and main part]
7
Introduction: I & R of Document Images (2/3)
Question: new document images, or just old ones?
Paper (and the image) has too many desirable properties to disappear: document images and structured documents will increasingly co-exist in the future [Breul’04].
8
Introduction: I & R of Document Images (3/3)
To conclude:
(1) DIA is needed, and will remain needed, for the I & R of documents [Baird’03] [Breul’04]
(2) DIA must come back today by way of I & R [Baird’03]
9
Introduction: My Topic
Indexing of graphic document images.
Indexing (within Indexing & Retrieval): identification and recording of attributes of data that will aid retrieval; it is the first step before retrieval.
[Figure: among document images, the graphic document images: line drawings, symbols, logos, Asian scripts, historical headings]
11
Systems Overview: Introduction
An overview of the systems that index graphic document images: we call them Graphics Indexing Systems.
Graphics Indexing Systems are specializations of the DIA systems applied to the recognition and understanding of graphic document images [Tombre’03], which we call Graphics Recognition Systems.
-Introduction -Graphics Recognition Systems -Graphics Indexing Systems -Open Problems
12
Systems Overview: Graphics Recognition Systems (1/3)
Applications deal with the graphic parts of documents (symbols and linear graphics): text/graphics segmentation [Tombre’02], vectorisation [Mejbri’02], symbol recognition [Llados’02], document interpretation (or understanding) [Ablameko’00], …
[Figure: a Graphics Recognition System turns graphic document images (symbol, linear and text parts) into structured documents]
13
Systems Overview: Graphics Recognition Systems (2/3)
Graphics are structured and connected, so Graphics Recognition Systems are based on structural methods: “relational organization of low-level features (graphic primitives) into higher-level structures (graph)” [Tombre’96] [Shi’89].
[Figure: a symbol and its structure, and the same symbol connected within a drawing; the low-level features (graphic primitives: lines, connect points, T links) are organized into a higher-level structure (a graph with line nodes and connect/T edges) used for symbol recognition]
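To make the structural idea concrete, here is a minimal Python sketch, under stated assumptions, of how line primitives could be organized into a graph whose edges carry the link types seen on the slide: “connect” when two lines share an endpoint, “T” when an endpoint of one line touches the interior of another. The segment representation, the tolerance `tol` and all function names are hypothetical; real systems derive these relations from vectorised images with more robust geometry.

```python
from itertools import combinations
from math import hypot

# Hypothetical representation of a line primitive: its two endpoints.
Segment = tuple[tuple[float, float], tuple[float, float]]

def _dist_point_segment(p, seg):
    """Distance from point p to segment seg, and the projection parameter t in [0, 1]."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return hypot(p[0] - x1, p[1] - y1), 0.0
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / length2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    cx, cy = x1 + t * dx, y1 + t * dy
    return hypot(p[0] - cx, p[1] - cy), t

def classify_link(a: Segment, b: Segment, tol: float = 1.0):
    """Return 'connect', 'T' or None for the relation between segments a and b."""
    # Endpoint-to-endpoint contact: a connect point.
    for p in a:
        for q in b:
            if hypot(p[0] - q[0], p[1] - q[1]) <= tol:
                return "connect"
    # An endpoint of one segment lying on the interior of the other: a T link.
    for ends, other in ((a, b), (b, a)):
        for p in ends:
            d, t = _dist_point_segment(p, other)
            if d <= tol and 0.0 < t < 1.0:
                return "T"
    return None

def build_primitive_graph(segments, tol=1.0):
    """Nodes are segment indices; each edge (i, j) is labelled with its link type."""
    edges = {}
    for i, j in combinations(range(len(segments)), 2):
        link = classify_link(segments[i], segments[j], tol)
        if link is not None:
            edges[(i, j)] = link
    return edges
```

For a horizontal line with a vertical line standing on its middle and another at its right end, `build_primitive_graph([((0, 0), (10, 0)), ((5, 0), (5, 5)), ((10, 0), (10, 5))])` yields one T edge and one connect edge.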
14
Systems Overview: Graphics Recognition Systems (3/3)
Graphic primitive extraction, some methods [Wenyin’98] [Delalandre’04]: skeletonization [Hilaire’04], contouring [Ramel’00], tracking [Song’00], labelling [Badawy’02], transform [Couasnon’01], meshes [Vaxiviere’95], region segmentation [Cao’00], run-length [Burge’98], …
Recognition: graph matching [Bunke’00], graph transform [Blostein’05], primitive matching [Foggia’99], …
Architecture of Graphics Recognition Systems: document images → Graphic Primitive Extraction → graph of graphic primitives → Recognition (using Graphic Models) → structured document
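Run-length extraction, one of the primitive extraction methods listed above, can be sketched as follows. This minimal Python version is an illustration rather than the method of [Burge’98]; it assumes a binary image given as a list of rows of 0/1 values and returns the maximal runs of foreground pixels. The function names are hypothetical.

```python
def horizontal_runs(image):
    """Return (row, start_col, length) for each maximal run of 1-pixels in each row."""
    runs = []
    for r, row in enumerate(image):
        start = None
        for c, pixel in enumerate(row):
            if pixel and start is None:
                start = c                            # a run begins
            elif not pixel and start is not None:
                runs.append((r, start, c - start))   # the run ended at column c - 1
                start = None
        if start is not None:                        # run touching the right border
            runs.append((r, start, len(row) - start))
    return runs

def vertical_runs(image):
    """Same extraction on columns: returns (col, start_row, length)."""
    return horizontal_runs([list(col) for col in zip(*image)])
```

Long runs are typically candidates for line primitives, while short runs tend to come from text or noise; any threshold separating them would be an application-specific choice.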
15
Systems Overview: Graphics Indexing Systems (1/3)
Graphics Indexing Systems [Doerman’98] [Tombre’03] fall into 3 classes:
- Title block recognition [Arias’98], [Najman’01], [Lamiroy’02], …
- Statistical framework [Samet’96], [Worring’99], [Tabbone’03], [Terrades’03], …: symbols connected to the rest of the drawing cannot be matched
- Graphics indexing [Kasturi’88], [Lorenz’95], [Huang’97], [Hu’97], [Barbu’04], [Valasoulis’04], …: partial matching
16
Systems Overview: Graphics Indexing Systems (2/3)
Architecture of Graphics Indexing Systems: Graphic Primitive Extraction → graph of graphic primitives → Indexing (a specific set of graphic primitives gives the indexing attributes) → Index (attributes + document links)
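The Index stage (attributes + document links) can be pictured as an inverted index. The sketch below is a minimal illustration, assuming each document has already been reduced to a dictionary of indexing attributes; the class name and attribute names such as `loops` and `junctions` are hypothetical.

```python
from collections import defaultdict

class GraphicsIndex:
    """Inverted index: (attribute, value) -> set of document links."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_link, attributes):
        # attributes: computed from a specific set of graphic primitives,
        # e.g. {"loops": 2, "junctions": 4} (hypothetical attribute names).
        for key, value in attributes.items():
            self._postings[(key, value)].add(doc_link)

    def query(self, **attributes):
        """Return the document links matching every requested attribute value."""
        sets = [self._postings.get((k, v), set()) for k, v in attributes.items()]
        if not sets:
            return set()
        return set.intersection(*sets)
```

Usage: after `idx.add("plan_01.tif", {"loops": 2, "junctions": 4})`, the call `idx.query(loops=2, junctions=4)` returns the links of all documents indexed with both values. Exact matching is a simplification; the slide on open problems explains why I & R needs approximate attributes.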
17
Systems Overview: Graphics Indexing Systems (3/3)
Works, with their graphic primitive extraction, graph of graphic primitives, and indexing method:
- [Huang’97]: thinning and chaining; line graph of skeleton; cycle search, width and length matching of lines
- [Kasturi’88]: run-length encoding and polygonisation; straight-line graph of contours and skeleton; Fourier approximation of line graph
- [Lorenz’95]: contouring and polygonisation; 2-D strings of contours; string matching
- [Barbu’04]: thinning and neighbour analysis of skeleton pixels; region adjacency graph; graph mining
- [Hu’04]: thinning, chaining and polygonisation; set of straight lines of skeleton; string matching
- [Dosh’04]: thinning, chaining and polygonisation; set of straight lines of skeleton; vectorial signature
[Figure: the works classified by extraction (thinning, contouring, region), representation (graph, skeleton graph) and indexing (statistical, structural)]
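Several of these works index with string matching. As an illustration only (the cited systems use their own string representations and matching schemes, such as 2-D strings), the sketch below ranks documents by the Levenshtein edit distance between strings of primitive labels. The one-letter coding (`L` for a line, `A` for an arc) and the function names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit insert/delete/substitute costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def rank(query, database):
    """Sort document names by the distance of their primitive string to the query."""
    return sorted(database, key=lambda name: edit_distance(query, database[name]))
```

With a hypothetical database `{"d1": "LLAL", "d2": "AAAA", "d3": "LLL"}`, `rank("LLL", ...)` puts `d3` first (distance 0), then `d1` (one deletion), then `d2`.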
18
Systems Overview: Open Problems (1/2)
All these systems use a lexical/syntactic (or bottom-up) approach [Tombre’96]:
- Lexical (bottom): graphic primitives are extracted from images in a fixed way.
- Syntactic (up): graphic primitives are analysed without any return to the image.
So all these systems use a document understanding approach, but I & R is not an understanding problem:
Criterion: Understanding vs I & R
- Image size: large vs small and medium
- Database size: small vs large
- Process execution: one shot vs every time (complexity)
- Graphic primitives: accurate vs approximated
- Noise level: high and medium vs low and medium (robustness)
- Prior knowledge: yes vs no
- Document classes: few and known vs several and unknown (content adaptation)
Content adaptation is the most important feature of I & R systems.
19
Systems Overview: Open Problems (2/2)
Examples of content adaptation: a broad class of documents, context, text/graphics segmentation, noise adaptation.
[Figure: region-based [Roque’03], both region- and line-based [Ramel’00], and line-based [Hilaire’04] processing]
To conclude:
- An I & R system must deal with content adaptation.
- Content adaptation can’t be solved without a knowledge-based approach.
21
The Knowledge Level: Introduction
Some (general) definitions [Tuthill’90] [Holsapple’04]:
- Knowledge: human mental grasp of reality
- Representation: placement (and meaning) of knowledge into (from) computer memory
- Formalism: a set of symbols corresponding to knowledge inside computers
[Figure: human knowledge is placed into formalism(s) in the computer, and the formalisms give meaning back to the human]
Different types of knowledge: on strategies [], on case-based reasoning [], on ontologies [], …
-Introduction -Graphical Knowledge -Graphics Model -a Perceptive Approach
22
The Knowledge Level: Graphical Knowledge (1/2)
Graphical knowledge [Delalandre’05] is a type of knowledge corresponding to the human mental grasp of graphics.
Levels of graphical knowledge: [Figure: abstraction levels run from the image (perception) to the symbol (interpretation: “it is a gate!”); the matching formalism levels are pixel-based and vector-based formalisms for graphic primitives, and graph-based formalisms for high-level objects]
23
The Knowledge Level: Graphical Knowledge (2/2)
Two formalism levels [Tombre’96]:
- Graphic primitives [Murray’96]: pixel-based formalisms (pixel, raster, run, connected component, …) and vector-based formalisms (vector, arc, curve, ellipse, square, …)
- Graph-based formalisms [Sowa’99]: Relational Attributed Graphs (RAG), frames, object-oriented languages, …
[Figure: images, their line primitives, and their Relational Attributed Graphs [Seong’93]]
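A Relational Attributed Graph can be held in a very small data structure: attributed nodes for the primitives and attributed edges for their relations. This Python sketch is a hypothetical illustration, not the formalism of [Seong’93]; the attribute names (`type`, `length`, `radius`, `relation`) are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class RAG:
    """Relational attributed graph: attributed nodes linked by attributed relations."""
    nodes: dict = field(default_factory=dict)   # node id -> attribute dict
    edges: dict = field(default_factory=dict)   # (id1, id2) -> attribute dict

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, a, b, **attrs):
        # Undirected relation: store the pair in a canonical (sorted) order.
        self.edges[tuple(sorted((a, b)))] = attrs

    def neighbours(self, node_id):
        """Ids of the nodes related to node_id, sorted."""
        return sorted(b if a == node_id else a
                      for (a, b) in self.edges if node_id in (a, b))

g = RAG()
g.add_node("l1", type="line", length=10.0)
g.add_node("l2", type="line", length=5.0)
g.add_node("a1", type="arc", radius=2.0)
g.add_edge("l1", "l2", relation="T")
g.add_edge("a1", "l1", relation="connect")
```

Here `g.neighbours("l1")` gives `["a1", "l2"]`, and the relation between the arc and the first line is stored under the canonical key `("a1", "l1")`.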
24
The Knowledge Level: Graphics Model (1/2)
Model [Seguela’01]: a knowledge representation that uses given formalisms for a given system purpose.
Graphics model [Delalandre’05]: a model used to represent graphical knowledge.
[Figure: a (simple) shape and its graphic primitives (extremities, junctions, lines); in the line-based model, lines are nodes and junctions are edges; in the junction-based model, extremities and junctions are nodes and lines are edges]
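The two models of a shape are duals of each other: starting from the junction-based model (extremities and junctions as nodes, lines as edges), the line-based model links two lines whenever they share a junction, which is the classic line graph construction. A minimal Python sketch, with hypothetical names, assuming every line has two distinct end nodes:

```python
from collections import defaultdict
from itertools import combinations

def junction_to_line_model(lines):
    """
    Junction-based model in: `lines` maps a line id to its two end nodes
    (extremities or junctions). Line-based model out: an edge between two
    lines for every junction they share, labelled with the shared junction(s).
    """
    incident = defaultdict(list)                 # node -> lines ending at it
    for line_id, (u, v) in lines.items():
        incident[u].append(line_id)
        incident[v].append(line_id)
    edges = defaultdict(list)
    for node, line_ids in incident.items():
        # An extremity touches a single line, so only junctions create edges.
        for a, b in combinations(sorted(line_ids), 2):
            edges[(a, b)].append(node)
    return dict(edges)
```

For a shape with one junction `j1` and three lines ending at extremities, every pair of lines becomes linked through `j1`; for a closed quadrilateral, each line is linked to its two adjacent sides.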
25
The Knowledge Level: Graphics Model (2/2)
One system = one model, hence a considerable number of models [Joseph’92] [Pasternak’93] [Han’94] [Burgue’95] [Yu’97] [Lee’98] [Ramel’00] [Couasnon’01] [Badawy’02] [Yan’04] …
Models depend on the extracted graphic primitives, so we can define a taxonomy of graphics models in 3 classes [Delalandre’05]:
- region-based models: component, loop, neighbour, include
- contour-based models: quadrilateral, line link, junction link
- skeleton-based models: extremity, junction, line, edge
26
The Knowledge Level: a Perceptive Approach (1/6)
Levels of representation, from global to local perception: region level, contour level, skeleton level.
Two links between levels: specialisation and aggregation.
27
The Knowledge Level: a Perceptive Approach (2/6)
[Figure: classic models, hybrid models and the perceptive approach (jump or browse) placed on the region, contour and skeleton levels of representation, from global to local perception]
28
The Knowledge Level: a Perceptive Approach (3/6)
First step, the region level: connected component analysis [Alnuweiri’92].
[Figure: an image split into foreground and background; the foreground’s components; the background’s components: the main background and the loops]
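Connected component analysis at the region level can be sketched with breadth-first labelling. This is a plain sequential version, not the algorithm of [Alnuweiri’92]. It assumes a binary image (1 = foreground), uses 8-connectivity for the foreground and 4-connectivity for the background (a common pairing that keeps loops well defined), and treats background components touching the image border as the main background and the others as loops. Function names are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def label_components(image, value, neighbourhood):
    """Label connected components of pixels equal to `value`; returns (labels, count)."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] == value and labels[r][c] == 0:
                count += 1                       # new component found
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                     # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in neighbourhood:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == value and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

def region_level(image):
    """Count foreground components; split background components into main background and loops."""
    _, n_fg = label_components(image, 1, N8)
    bg_labels, n_bg = label_components(image, 0, N4)
    h, w = len(image), len(image[0])
    border = {bg_labels[r][c] for r in range(h) for c in range(w)
              if (r in (0, h - 1) or c in (0, w - 1)) and bg_labels[r][c]}
    loops = [k for k in range(1, n_bg + 1) if k not in border]
    return {"foreground": n_fg, "main_background": sorted(border), "loops": loops}
```

The main background is returned as a list because a drawing line crossing the whole image can split it into several border-touching components.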
29
The Knowledge Level: a Perceptive Approach (4/6)
Six features:
- (F) Foreground
- (B) Background
- (R) Resolution (i.e. distance)
- (N) Neighboring
- (S) Size
- (I) Inclusion
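Three of these features (S, N, I) can be computed from an image whose pixels are already labelled with component ids. The sketch below is a hypothetical illustration under simplifying assumptions: 4-adjacency between components, and a background component counted as included in a foreground component when it does not touch the image border and has that component as its only neighbour (nested shapes are therefore not handled). Resolution (R) is not covered.

```python
from collections import Counter, defaultdict

def region_features(labels, foreground):
    """Size (S), neighbouring (N) and inclusion (I) from a label image.

    labels: 2-D list where each pixel holds its component id (foreground and
    background components have distinct ids); foreground: set of foreground ids.
    """
    h, w = len(labels), len(labels[0])
    size = Counter(v for row in labels for v in row)            # (S) pixel count
    neighbours = defaultdict(set)                               # (N) adjacent components
    touches_border = set()
    for r in range(h):
        for c in range(w):
            a = labels[r][c]
            if r in (0, h - 1) or c in (0, w - 1):
                touches_border.add(a)
            for nr, nc in ((r + 1, c), (r, c + 1)):             # 4-adjacency, each pair once
                if nr < h and nc < w and labels[nr][nc] != a:
                    b = labels[nr][nc]
                    neighbours[a].add(b)
                    neighbours[b].add(a)
    inclusion = {}                                              # (I) loop -> container
    for comp, adj in neighbours.items():
        if comp not in foreground and comp not in touches_border and len(adj) == 1:
            (container,) = adj
            if container in foreground:
                inclusion[comp] = container
    return dict(size), {k: sorted(v) for k, v in neighbours.items()}, inclusion
```

On a 5×5 image where a foreground ring (id 2) encloses a one-pixel hole (id 3) inside the main background (id 1), the hole is reported as included in the ring.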
30
The Knowledge Level: a Perceptive Approach (5/6)
Use-case queries: [Figure: queries applied to a starting image, labelled FR 1, FR 2, BR 2, BR 2 S 2, BR 2 S 2 N 2]
31
The Knowledge Level: a Perceptive Approach (6/6)
True-life query: [Figure: query labelled FS 1, BR 2, N > 2]
33
Conclusion
This is only a bibliographic study and some ideas. Should work start on these ideas?
Perspectives:
- the contour and skeleton levels?
- a system to control the building of representations?