Document Analysis Group

Presentation transcript:

Document Analysis Group Sophea Prum, Mickael Coustaty, Gaël Ducerf, Van Nhu Nguyen, Norbert Tsopze, Surapong Uttama

History
Starting date: 2000 / 2001
Context: administrative document analysis, historical document analysis, technical document analysis, comics analysis
Scientific topics: image analysis, computer vision, pattern recognition, data mining, knowledge management
L3i members: Permanent: 8 (Professors: 3, Assistant Professors: 5); PostDoc: 3; PhD: more than 10; Engineers: 7
Note: "Permanent" counts professors and maîtres de conférences (assistant professors); "PostDoc" also counts visiting researchers, etc.

Scientific and technological advances ...E D M...
Human-made documents: high-level structuration, huge amount of data, specific features
Feature extraction: robust, statistical, structural
Indexing of data masses

Results
Publications: more than 20 journal papers (±2 per year); more than 100 conference papers (±10 per year)
Software: more than 20 libraries and applications
Collaborations with labs and companies: Europe (CVC, DKFI, …); Asia (Vietnam, Cambodia, …)

Current and future...
Comics analysis: E-BDTheque
Administrative documents: Itesoft, SOOD, RecoNomad
Historical documents: piXL

PEDIVHANDI
Starting date: 10/2011
Duration: 3 years
Context: Education and equal opportunities
Collaboration: L3i – Université de La Rochelle; IRMA – Université de Poitiers; Cellule @ctice – Université de La Rochelle
L3i members (12): Permanent: 6; PostDoc: (not given); PhD: 3; Engineers: 3
Number of internships: 0

PEDIVHANDI: Core system (diagram)
Data: audio, text, lecture video, video, sensor data
Stages: capturing, abstraction, fusion & structuration, storage

PEDIVHANDI: Project functionality
Automatic indexing of audiovisual educational podcasts:
  Definition of relevant indices of audiovisual content
  Development of tools for extracting indices
  Development of multimodal combination strategies to improve the quality of indexing
Construction of Rich-Media documents
Navigation in the audiovisual (Rich-Media) corpus:
  Efficient search engines, which rely on indexing
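The idea of a search engine that relies on indexing can be illustrated with a minimal sketch. It is not the PEDIVHANDI implementation: it assumes that the extracted indices are plain text attached to timestamped lecture segments, and the names `tokenize`, `build_index`, `search` and the sample `SEGMENTS` are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(segments):
    """Map each word to the set of segment ids whose text contains it.

    `segments` is a list of (start_seconds, text) pairs; a segment id is
    simply the position of the segment in that list.
    """
    index = defaultdict(set)
    for seg_id, (_start, text) in enumerate(segments):
        for word in tokenize(text):
            index[word].add(seg_id)
    return index


def search(index, segments, query):
    """Return the start times of the segments containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(segments[i][0] for i in hits)


# Hypothetical indices extracted from one lecture (speech and slide text).
SEGMENTS = [
    (0, "Introduction to image analysis"),
    (95, "Feature extraction for document images"),
    (240, "Indexing and retrieval of document images"),
]
INDEX = build_index(SEGMENTS)
print(search(INDEX, SEGMENTS, "document images"))  # [95, 240]
```

Returning start times rather than segment ids lets a player jump straight to the matching point of the lecture, which is one possible form of navigation in a Rich-Media corpus.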

RecoNomad
Starting date: 2008
Duration: 3 years
Context: Eurêka
Collaboration: DocLedge, Belgian company
L3i members (11): Permanent: 3; PostDoc: 1; PhD: 1; Engineers: 6
Internships: 2

RecoNomad: Project functionality
Input: on-line signal (176 53 1 7 175 53 1 8 175 54 1 9 ...)
Step 1: Form identification
Step 2: Handwritten isolated character recognition
Step 3: Handwritten cursive word recognition
Step 4: Writer identification
Step 5: Database indexing

RecoNomad: Results and perspectives
Results: 5 papers at international conferences; commercialized
Perspectives: handwritten word recognition to be completed; industrialize and modularize the library
Vision: create a company

Madonne / Navidomass
Starting date: January 2003
Duration: 8 years
Context: Historical document analysis
Collaboration: 8 labs from France
L3i members: Permanent: 5 / 6; PostDoc: 0 / 1; PhD: 2 / 1; Engineers: 1 / 1
Number of internships: 6 / 4

Madonne
Extract document content to characterize it:
  Text / graphic separation
  Image description using specific signatures

Navidomass: goals for historical documents
To preserve their content from degradation
To make them available: online consultation, simultaneous consultation
To navigate / retrieve similar images: to date them, to identify the printer, …

Navidomass: Three-step process
1. Image description
2. Image annotation: associate keywords with images (or subparts); inference rules; to reduce the semantic gap
3. Lettrine (ornamental initial letter) indexing and retrieval: CBIR

Navidomass: Lettrine Indexing and Retrieval
Content-based Image Retrieval System (diagram)
Offline: feature extraction (with / without segmentation, keypoint localization, feature extraction), indexing, database
Query: query image, matching, result, verify with ground truth, feedback
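The offline and query stages named on this slide can be sketched in a few lines. This is only an illustration under strong assumptions: the feature is a coarse gray-level histogram and matching is an L1 distance, not the keypoint or segmentation features used in Navidomass. All names (`gray_histogram`, `build_database`, `query`) and the tiny sample images are hypothetical, and images are assumed to be non-empty.

```python
def gray_histogram(image, bins=4, levels=256):
    """Normalized gray-level histogram of a non-empty 2-D list of pixels."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            counts[value * bins // levels] += 1
            total += 1
    return [c / total for c in counts]


def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_database(images):
    """Offline stage: extract and store one feature vector per image name."""
    return {name: gray_histogram(img) for name, img in images.items()}


def query(database, image, top_k=2):
    """Query stage: rank stored images by distance to the query features."""
    q = gray_histogram(image)
    ranked = sorted(database, key=lambda name: l1_distance(database[name], q))
    return ranked[:top_k]


# Hypothetical 2x2 gray-level images standing in for lettrine scans.
IMAGES = {
    "dark": [[10, 20], [30, 40]],
    "light": [[220, 230], [240, 250]],
    "mixed": [[10, 240], [20, 250]],
}
DATABASE = build_database(IMAGES)
print(query(DATABASE, [[0, 15], [25, 35]]))  # ['dark', 'mixed']
```

Verification against ground truth would then compare the ranked names with the images known to be relevant for the query.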

Navidomass: Image Annotation (diagram)
Labels: image processing tools, regions, regions annotation, inference rules, new attributes, attribute values deduction, consistency, taxonomy, image classification, final ontology (knowledge database), ...

Inference rules
Knowledge database (diagram labels): computer science knowledge, historian knowledge, deduced knowledge, spatial relations, texture regions, shape regions, complex knowledge database, user queries
Rule isLetter:
  The region is located in the center of the image
  The region has few holes
  The region is the biggest one that satisfies the first two criteria
Rule isBody:
  The lettrine has a figurative pattern
  The region has few holes
  The region is light grey
  The region is in the center of the lettrine
  The region is not labelled as « isLetter »
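The isLetter rule on this slide can be written as a small predicate over region descriptors. The sketch below is an assumption-laden illustration, not the project's rule engine: the `Region` fields, the tolerance of 0.2 around the image center and the limit of 2 holes are hypothetical values chosen only to make the three criteria concrete.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    area: int    # number of pixels in the region
    holes: int   # number of holes in the region
    cx: float    # centroid x, normalized to [0, 1]
    cy: float    # centroid y, normalized to [0, 1]


def is_central(region, tolerance=0.2):
    """Criterion 1: the centroid lies close to the image center."""
    return (abs(region.cx - 0.5) <= tolerance
            and abs(region.cy - 0.5) <= tolerance)


def has_few_holes(region, max_holes=2):
    """Criterion 2: the region has at most `max_holes` holes."""
    return region.holes <= max_holes


def find_letter(regions):
    """Criterion 3: the biggest region meeting criteria 1 and 2, or None."""
    candidates = [r for r in regions if is_central(r) and has_few_holes(r)]
    return max(candidates, key=lambda r: r.area, default=None)


# Hypothetical regions extracted from one lettrine image.
REGIONS = [
    Region("pattern", area=900, holes=12, cx=0.50, cy=0.50),
    Region("letter", area=400, holes=1, cx=0.52, cy=0.48),
    Region("ornament", area=150, holes=0, cx=0.10, cy=0.90),
    Region("dot", area=30, holes=0, cx=0.50, cy=0.50),
]
print(find_letter(REGIONS).name)  # letter
```

A rule such as isBody could be expressed the same way, with an extra test that the region has not already been labelled by `find_letter`.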

Madonne / Navidomass: Results and perspectives
Results: 35 publications (5 journal papers, 30 conference papers)
Perspectives: extend this topic to other kinds of documents
piXL (Pole d’Excellence du Numérique): brings together 7 labs and 8 companies; 2 big projects: BNF & SDP