1
Document Analysis Group
Sophea Prum, Mickael Coustaty, Gaël Ducerf, Van Nhu Nguyen, Norbert Tsopze, Surapong Uttama
2
History
Starting date: 2000 / 2001
Context: administrative document analysis, historical document analysis, technical document analysis, comics analysis
Scientific topics: image analysis, computer vision, pattern recognition, data mining, knowledge management
L3i members:
Permanent: 8 (Professors: 3, Assistant Professors: 5)
PostDoc: 3
PhD: more than 10
Engineers: 7
Note: Permanent counts professors and assistant professors (maîtres de conférences); PostDoc includes visiting researchers, etc.
3
Scientific and technological advances
...E D M...
Human-made documents: high-level structuring, huge amount of data, specific features
Feature extraction: robust, statistical, structural
Indexing of data masses
4
Results
Publications: more than 20 journal papers (about 2 per year), more than 100 conference papers (about 10 per year)
Software: more than 20 libraries and applications
Collaborations with labs and companies: Europe (CVC, DKFI, …), Asia (Vietnam, Cambodia, …)
5
Current and future projects
Comics analysis: E-BDTheque
Administrative documents: Itesoft, SOOD, RecoNomad
Historical documents: piXL
6
PEDIVHANDI
Starting date: 10/2011
Duration: 3 years
Context: education and equal opportunities
Collaboration: L3i – Université de La Rochelle, IRMA – Université de Poitiers – Université de La Rochelle
L3i members (12)
Permanent: 6
PostDoc:
PhD: 3
Engineers: 3
Number of internships: 0
7
PEDIVHANDI: core system
Inputs: audio, text, lecture video, video, sensor data
Stages: capturing, abstraction, fusion and structuring, storage
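The four stages named on this slide (capturing, abstraction, fusion and structuring, storage) can be illustrated with a toy pipeline. This is only a sketch under strong assumptions: the slides do not describe the actual PEDIVHANDI implementation, so every function name, the use of plain text payloads for all modalities, and the fixed 60-second fusion window are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: all names and the 60-second window are assumptions,
# not part of the PEDIVHANDI system described on the slides.

def capture(streams):
    """Capturing: flatten per-modality streams into time-ordered events."""
    return sorted(
        (t, modality, payload)
        for modality, events in streams.items()
        for t, payload in events
    )

def abstract(events):
    """Abstraction: reduce each payload to a set of lowercase index terms."""
    return [(t, modality, set(payload.lower().split()))
            for t, modality, payload in events]

def fuse(abstracted, window=60):
    """Fusion and structuring: merge terms from all modalities per time window."""
    segments = defaultdict(set)
    for t, _modality, terms in abstracted:
        segments[t // window * window] |= terms
    return dict(segments)

def store(segments):
    """Storage: build an inverted index from term to segment start times."""
    index = defaultdict(list)
    for start in sorted(segments):
        for term in segments[start]:
            index[term].append(start)
    return dict(index)

streams = {
    "audio": [(5, "Fourier transform"), (70, "sampling theorem")],
    "text": [(10, "Fourier series")],
}
index = store(fuse(abstract(capture(streams))))
```

Looking up a term in `index` returns the start times (in seconds) of the lecture segments where any modality mentioned it, which is the kind of lookup a search engine over the indexed corpus would need.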
8
PEDIVHANDI: project functionality
Automatic indexing of audiovisual educational podcasts
  Definition of relevant indices of audiovisual content
  Development of tools for extracting indices
  Development of multimodal combination strategies to improve the quality of indexing
Construction of Rich-Media documents
Navigation in the audiovisual (Rich-Media) corpus
  Efficient search engines, which rely on indexing
9
RecoNomad
Starting date: 2008
Duration: 3 years
Context: Eurêka
Collaboration: DocLedge, a Belgian company
L3i members (11)
Permanent: 3
PostDoc: 1
PhD: 1
Engineers: 6
Internships: 2
10
RecoNomad: project functionality
Input: on-line signal ...
Step 1: Form identification
Step 2: Isolated handwritten character recognition
Step 3: Cursive handwritten word recognition
Step 4: Writer identification
Step 5: Database indexing
11
RecoNomad: results and perspectives
Results
  Publications: 5 papers at international conferences
  Commercialized
Perspectives
  Handwritten word recognition to be completed
  Industrialize and modularize the library
  Vision: create a company
12
Madonne / Navidomass
Starting date: January 2003
Duration: 8 years
Context: historical document analysis
Collaboration: 8 labs from France
L3i members
Permanent: 5 / 6
PostDoc: 0 / 1
PhD: 2 / 1
Engineers: 1 / 1
Number of internships: 6 / 4
13
Madonne
Extract document content to characterize it
  Text / graphic separation
  Image description using specific signatures
14
Navidomass
To preserve the content of historical documents from degradation
To make them available
  Online consultation
  Simultaneous consultation
To navigate / retrieve similar images
To date them
To identify the printer
…
15
Navidomass: three-step process
Image description: lettrine indexing and retrieval (CBIR)
Image annotation: associate keywords with images (or subparts)
Inference rules: to reduce the semantic gap
16
Navidomass: lettrine indexing and retrieval
Content-based image retrieval system
Diagram elements: database, feature extraction (with / without segmentation, keypoint localization, feature extraction), indexing (offline), query image, matching, result, verification against ground truth, feedback
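The offline indexing and query matching flow of a content-based image retrieval system can be sketched as follows. This is a minimal illustration, not the Navidomass method: the slide mentions segmentation and keypoint localization, whereas this sketch swaps in a much simpler global grey-level histogram as the feature and a Euclidean distance for matching. All function names are hypothetical, and images are assumed to be flat lists of grey values in the range 0 to 255.

```python
from math import sqrt

# Simplified stand-in for the feature extraction step: a global grey-level
# histogram (an assumption, not the descriptor used in Navidomass).
def grey_histogram(pixels, bins=8):
    """Normalised grey-level histogram of a flat list of 0..255 values."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(images):
    """Offline step: map each image name to its feature vector."""
    return {name: grey_histogram(px) for name, px in images.items()}

def query(index, pixels, k=3):
    """Online step: return the k indexed images closest to the query image."""
    q = grey_histogram(pixels)
    ranked = sorted(index.items(), key=lambda item: euclidean(q, item[1]))
    return [name for name, _ in ranked[:k]]

images = {
    "dark": [10] * 100,
    "light": [240] * 100,
    "mixed": [10] * 50 + [240] * 50,
}
index = build_index(images)
nearest = query(index, [20] * 80 + [230] * 20, k=2)
```

The ranked list returned by `query` is what would then be compared with the ground truth to evaluate the system.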
17
Navidomass: image annotation
Diagram elements: image classification, image annotation, final ontology (knowledge database), regions, region annotation, image processing tools, inference rules, new attributes, attribute value deduction, consistency, taxonomy, ...
18
Inference rules
Knowledge sources: computer science knowledge, historian knowledge, deduced knowledge, spatial relations
Complex knowledge database: texture regions, shape regions, inference rules, historian knowledge, user queries
Rule isLetter
  Located in the center of the image
  With few holes
  The biggest region that satisfies the first two criteria
Rule isBody
  The lettrine has a figurative pattern
  The region has few holes
  The region is light grey
  The region is in the center of the lettrine
  The region is not labelled as « isLetter »
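The two rules on this slide, isLetter and isBody, can be written as simple predicates over region attributes. The sketch below is only an illustration: the slide gives the criteria in words, so the `Region` fields, the numeric thresholds for "center", "few holes" and "light grey", and the `annotate` function are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    name: str
    cx: float         # centroid x, normalised to [0, 1]
    cy: float         # centroid y, normalised to [0, 1]
    area: int         # region size in pixels
    holes: int        # number of holes in the region
    mean_grey: float  # 0 = black, 255 = white
    labels: set = field(default_factory=set)

# Hypothetical thresholds; the slide only says "center", "few", "light grey".
CENTER_TOLERANCE = 0.2
MAX_HOLES = 2
LIGHT_GREY = (150, 220)

def in_center(r):
    return (abs(r.cx - 0.5) <= CENTER_TOLERANCE
            and abs(r.cy - 0.5) <= CENTER_TOLERANCE)

def few_holes(r):
    return r.holes <= MAX_HOLES

def annotate(regions, figurative_pattern):
    """Apply the isLetter rule first, then the isBody rule, in place."""
    # isLetter: the biggest region that is central and has few holes.
    candidates = [r for r in regions if in_center(r) and few_holes(r)]
    if candidates:
        max(candidates, key=lambda r: r.area).labels.add("isLetter")
    # isBody: only applies when the lettrine has a figurative pattern.
    if figurative_pattern:
        for r in regions:
            if (few_holes(r)
                    and LIGHT_GREY[0] <= r.mean_grey <= LIGHT_GREY[1]
                    and in_center(r)
                    and "isLetter" not in r.labels):
                r.labels.add("isBody")
    return regions

regions = annotate([
    Region("letter", 0.50, 0.50, 900, 1, 40),
    Region("body", 0.55, 0.45, 400, 0, 180),
    Region("border", 0.05, 0.50, 2000, 12, 100),
], figurative_pattern=True)
```

The order matters: isBody depends on the isLetter label, so the letter rule has to run before the body rule.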
19
Madonne / Navidomass: results and perspectives
Results
  35 publications: 5 journal papers, 30 conference papers
Perspectives
  Extend this topic to other kinds of documents
  piXL (Pole d’Excellence du Numérique): brings together 7 labs and 8 companies
  2 big projects: BNF & SDP