Writer identification through information retrieval Ralph Niels, Franc Grootjen & Louis Vuurpijl.

Presentation transcript:

Writer identification through information retrieval Ralph Niels, Franc Grootjen & Louis Vuurpijl

A search engine for forensic experts Writer identification through information retrieval Ralph Niels Franc Grootjen Louis Vuurpijl

Overview
– Forensic writer identification
– Prototypical shapes in handwriting
– Information retrieval (IR)
  – Traditional
  – Writer identification using prototypes
– Experiments
  – Method
  – Results
– Conclusions & future work

Forensic writer identification

Forensic information retrieval
– Web search: query of words to search in documents containing words
– Forensic search: query of characters to search in documents containing characters
– Previous work*: sub-character level, binary features
– Based on characters: improves justification possibilities
* A. Bensefia, T. Paquet, and L. Heutte. A writer identification and verification system. Pattern Recogn. Letters, 26(13):2080–2092, 2005.

Forensic information retrieval
Dictionary of character shapes: prototypes
– Experts use prototypes
– Describe query & documents by prototype usage
[Figure: instances of a prototype]

Character to prototype matcher
Find the most similar prototype for each character
Example matcher output: W48 h16 a9 t1 y2 o1 u23 d16 i25 d12 i6 s12 (…)
Example prototype labels: a5 a9 a16 a52 (…)

Prototypes
– Averaged shapes of real handwritten characters
– Dynamic Time Warping distance to find the most similar prototype
R. Niels & L. Vuurpijl & L. Schomaker. Automatic allograph matching in forensic writer identification. International Journal of Pattern Recognition and Artificial Intelligence. Vol. 21, No. 1. Pages February 2007.
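
The prototype lookup on this slide can be sketched as follows. This is a minimal illustration, assuming each character is a sequence of (x, y) pen coordinates and using a textbook Dynamic Time Warping with Euclidean point distance; the DTW variant in the cited paper may use different features and constraints. The function names `dtw_distance` and `nearest_prototype` and the toy prototype shapes are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nearest_prototype(char, prototypes):
    """Return the label of the prototype with the smallest DTW distance."""
    return min(prototypes, key=lambda label: dtw_distance(char, prototypes[label]))

# Toy data (assumed): two made-up prototype trajectories for the letter "a".
prototypes = {
    "a5": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    "a9": [(0.0, 0.0), (1.0, -1.0), (2.0, 0.0)],
}
sample = [(0.0, 0.1), (0.5, 0.6), (1.0, 0.9), (2.0, 0.1)]
best = nearest_prototype(sample, prototypes)
```

Running the matcher over every character of a document yields the list of prototype labels from which the usage vectors are built.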

The IR model for writer identification
[Diagram: writer input and query input each pass through the character to prototype matcher, which uses the prototype list. Writer input yields af(w), which indexing turns into aw(w); query input yields af(q). Matching compares af(q) with aw(w) and outputs a ranked list with justification.]

Indexing: create weighted vectors
– Vector of prototype usage for each writer: af(w)
– Adjust the weight of prototypes in that vector
  – Prototypes used by many writers: not distinctive -> lower weight
  – wf(p) = number of writers using prototype p
– Result: weighted vector of prototype use for each writer, aw(w)
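
The indexing step can be sketched as follows. The slides do not give the weighting formula, so this sketch assumes an inverse writer frequency iwf(p) = log(N / wf(p)) for N writers and aw(w, p) = af(w, p) · iwf(p), by analogy with tf-idf in text retrieval. The function name `index_writers` and the toy data are hypothetical.

```python
import math
from collections import Counter

def index_writers(writer_protos):
    """writer_protos: dict mapping writer -> list of prototype labels,
    one label per character (the output of the matcher)."""
    n_writers = len(writer_protos)
    # af(w): how often each writer uses each prototype
    af = {w: Counter(labels) for w, labels in writer_protos.items()}
    # wf(p): number of writers using prototype p
    wf = Counter()
    for counts in af.values():
        wf.update(counts.keys())
    # Assumed weighting: prototypes used by many writers get a lower weight
    iwf = {p: math.log(n_writers / wf[p]) for p in wf}
    # aw(w): weighted prototype usage per writer
    aw = {w: {p: c * iwf[p] for p, c in counts.items()} for w, counts in af.items()}
    return af, wf, iwf, aw

# Toy data (assumed): "d16" is used by every writer, so it carries no weight.
writers = {
    "w1": ["a5", "a5", "d16", "i25"],
    "w2": ["a9", "d16", "i6"],
    "w3": ["a5", "d16", "i25"],
}
af, wf, iwf, aw = index_writers(writers)
```

Under this assumed weighting, a prototype used by all writers gets weight zero and drops out of the comparison entirely.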

The IR model for writer identification (diagram repeated)
af(q): prototype frequency in query

Matching
Input
– 'Database writers': indexed writer vectors aw(w)
– 'Query writer': vector af(q)
Match
– Calculate the cosine of the angle between af(q) and each aw(w)
Output
– Ranked list of writers (by similarity to query)
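
The matching step can be sketched as follows, assuming the vectors are stored as sparse dictionaries from prototype label to value. The function names `cosine` and `rank_writers` and the toy numbers are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts)."""
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def rank_writers(af_q, aw):
    """Return (writer, similarity) pairs, most similar writer first."""
    scores = [(w, cosine(af_q, vec)) for w, vec in aw.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy data (assumed): indexed writer vectors and a query vector.
aw = {
    "w1": {"a5": 0.8, "i25": 0.4},
    "w2": {"a9": 1.1, "i6": 1.1},
}
af_q = {"a5": 3, "i25": 1, "d16": 2}
ranking = rank_writers(af_q, aw)
```

Because the cosine normalises by vector length, a short query can be compared with a much longer database sample.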

Justification
– Similarity value (cosine of angle)
– Prototype contribution to retrieval result
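
One way to compute a per-prototype contribution is sketched below. The slides do not define how the contribution is calculated; this sketch assumes it is the prototype's term in the cosine, af(q, p) · aw(w, p) / (|af(q)| · |aw(w)|), so that the contributions add up to the similarity value. The function name `contributions` and the toy numbers are hypothetical.

```python
import math

def contributions(af_q, aw_w):
    """Split the cosine similarity into one term per shared prototype,
    largest contribution first (assumed definition)."""
    norm = (math.sqrt(sum(x * x for x in af_q.values()))
            * math.sqrt(sum(x * x for x in aw_w.values())))
    if norm == 0.0:
        return {}
    contrib = {p: af_q[p] * aw_w[p] / norm for p in af_q if p in aw_w}
    return dict(sorted(contrib.items(), key=lambda kv: kv[1], reverse=True))

# Toy data (assumed): query vector and one indexed writer vector.
af_q = {"a5": 3, "i25": 1, "d16": 2}
aw_w = {"a5": 0.8, "i25": 0.4}
per_prototype = contributions(af_q, aw_w)
similarity = sum(per_prototype.values())
```

A list like this lets an expert see which character shapes drove a match.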

Justification
Forensic expert can further inspect justification

Experiment
– 43 writers from the plucoll database
– Online data
– Segmented into characters
– How well does our technique perform given a certain amount of data (characters)?
  – Number of characters in database (d)
  – Number of characters in query (q)

Experiment
– Pick d random letters from each database writer
– Pick q random other letters from one writer, and use those as query
– Find the most similar writer
– Repeat 10 times for each writer
– Vary d and q; repeat 10 times for each combination of d and q
[Diagram labels: Prototypes; iwf(p), aw(w); Matching]
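
The sampling procedure can be sketched as follows. This is a simplified illustration, not the authors' experimental code: it assumes characters have already been mapped to prototype labels, assumes iwf(p) = log(N / wf(p)) as the weighting, and redraws the database sample once per repetition before querying every writer. The function names and toy data are hypothetical, and each writer must have at least d + q characters.

```python
import math
import random
from collections import Counter

def cosine(u, v):
    dot = sum(x * v.get(p, 0.0) for p, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def run_experiment(writer_protos, d, q, repeats=10, seed=0):
    """Return the fraction of queries whose top-ranked writer is correct."""
    rng = random.Random(seed)
    writers = sorted(writer_protos)
    correct = total = 0
    for _ in range(repeats):
        # Pick d random letters per database writer; keep the rest for queries.
        db, rest = {}, {}
        for w in writers:
            labels = list(writer_protos[w])
            rng.shuffle(labels)
            db[w], rest[w] = labels[:d], labels[d:]
        # Indexing (assumed weighting): af(w), wf(p), then aw(w).
        af = {w: Counter(db[w]) for w in writers}
        wf = Counter(p for counts in af.values() for p in counts)
        aw = {w: {p: c * math.log(len(writers) / wf[p]) for p, c in af[w].items()}
              for w in writers}
        # Pick q random other letters from one writer and find the best match.
        for query_writer in writers:
            af_q = Counter(rng.sample(rest[query_writer], q))
            best = max(writers, key=lambda w: cosine(af_q, aw[w]))
            correct += best == query_writer
            total += 1
    return correct / total

# Toy data (assumed): three writers with one distinctive and one shared prototype.
toy = {
    "w1": ["a5"] * 30 + ["d16"] * 10,
    "w2": ["a9"] * 30 + ["d16"] * 10,
    "w3": ["a16"] * 30 + ["d16"] * 10,
}
accuracy = run_experiment(toy, d=20, q=10)
```

Calling `run_experiment` over a grid of d and q values gives the kind of accuracy table the experiment describes.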

Results
[Figure: result plots with axes d and q]

Conclusions & future work
– Needed for 100% correct identification: 70 chars (q), 300 chars (d)
  – Average English sentence: characters
– No black box: results are justified
– Online data: forensic practice?
  – Extract semi-automatically with the help of an expert
  – Use offline matching technique
– Just 43 writers
  – Bigger (n writers & n techniques) experiments planned
– Promising results

A search engine for forensic experts