Document Examiner Feature Extraction: Thinned vs Skeletonised Images

Vladimir Pervouchine and Graham Leedham
Forensics and Security Laboratory, School of Computer Engineering, Nanyang Technological University, Singapore

Outline
- Forensic handwriting examination
- The need for accurate stroke extraction
- Thinning based method
- Vector skeletonisation method
- Feature extraction
  - From thinned images
  - From vector skeletons
- Writer classification method
- Results
- Conclusions

Forensic handwriting examination
Variation of the word “the” written by 8 different writers. Source: Harrison, 1981

Forensic handwriting examination
Variation of the letters “G” and “R” written by 15 different writers. Source: Harrison, 1981

Forensic handwriting examination
Example of variation in letter formation styles in 10 letters from 9 different writers. Source: Harrison, 1981

Current methods used by forensic document examiners
Examination primarily involves manual extraction and comparison of various global and local visible features. Examiners usually compare a “Questioned Document” with a set of “Known Documents”. The objective is to determine whether or not the “Questioned Document” was written by a particular individual. The “Questioned Document” may be written in disguised handwriting.

Forgery / disguise / alteration
- Is the writing GENUINE? (The author is who he claims to be.)
- Is the writing FORGED? (The author is not who he claims to be and is attempting to pass the writing off as someone else’s.)
- Is the writing DISGUISED? (The author wishes to deny having written it at a later date.)
- Is the writing ALTERED? (Has someone modified the original document?)

Extraction of handwritten strokes from images
- Forensic document examiners analyse the pen tip trajectory.
- The trajectory is not readily available from grayscale handwriting images.
- To mimic the extraction of document examiner features, it is necessary to approximate the pen trajectory.
- Individual information in character shapes must be preserved.
- Many algorithms have been proposed for a similar problem in offline handwriting recognition, but they do not need to preserve the individual traits of characters.

Need for accurate stroke representation: many features that forensic document examiners use to describe character shapes are extracted from the pen tip trajectory. To extract the same or similar features automatically, the trajectory must be recovered from grayscale images of handwriting. Since these features are used to distinguish between writers, the approximation of the pen tip trajectory should preserve the individual traits of the trajectory. Many algorithms for approximating handwritten strokes were designed for handwriting recognition, where preserving the individual traits of characters is a drawback rather than an advantage: recognition works best when characters written by different people look as similar as possible. By contrast, when extracting features to distinguish writers, the letters that the characters represent are assumed to be known, and feature extraction focuses on differences in the shapes of the same characters written by different people. Hence, stroke approximation for this task should differ from that used in handwriting recognition. The more accurately the original pen trajectory is approximated, the more accurately individual features can be measured from it.

Thinning based stroke approximation
- Thinning from the MATLAB Image Processing Toolbox (Zhang and Suen thinning algorithm) is used for the first approximation.
- Post-processing is applied to remove extra branches, remove spurious loops and remove small connected components.
- Feature extraction attempts to overcome the remaining artifacts.
Flowchart: Original image → Binarisation → Thinning → Remove small connected components → Find junction points → Find end points → Correct spurious loops → Prune short branches (repeated while changes are made)
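The slide names the Zhang and Suen algorithm but shows no code. As a rough illustration, the sketch below binarises a grayscale image with a fixed global threshold (the value 128 is an assumption; the slides do not say how binarisation was done) and then applies the standard two-subiteration Zhang-Suen thinning in plain Python, on images stored as lists of lists. It is not the MATLAB toolbox implementation used in the study.

```python
def binarise(gray, threshold=128):
    """Map an 8-bit grayscale image (lists of ints) to 1 = ink, 0 = background.

    The fixed threshold is an assumption; dark pixels are treated as ink.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def zhang_suen_thin(img):
    """Zhang-Suen thinning of a binary image (1 = ink). Returns a new image.

    Border pixels are never examined, so the input should have at least one
    row/column of background padding around the writing.
    """
    img = [row[:] for row in img]
    rows, cols = len(img), len(img[0])

    def ring(r, c):
        # Neighbours P2..P9, clockwise starting from the pixel above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for first_pass in (True, False):
            marked = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if not img[r][c]:
                        continue
                    p = ring(r, c)
                    b = sum(p)  # number of ink neighbours
                    # number of 0 -> 1 transitions in the sequence P2..P9, P2
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, _, p4, _, p6, _, p8, _ = p
                    if first_pass:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        marked.append((r, c))
            # Delete all marked pixels at once, as the algorithm requires.
            for r, c in marked:
                img[r][c] = 0
            changed = changed or bool(marked)
    return img
```

For example, a solid 3 x 5 block of ink surrounded by background thins to two adjacent pixels on its middle row; the post-processing steps in the flowchart then operate on such one-pixel-wide output.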

Thinning based stroke approximation
Figure: 1. Original image; 2. Binarised image; 3. Thinned image; 4. Corrected image.
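The correction steps (find junction points, find end points, prune short branches, repeat while changes are made) can be sketched as follows. The slides do not say how junctions are detected or what counts as a "short" branch, so this sketch assumes the common crossing-number test for junctions, treats pixels with exactly one ink neighbour as end points, and uses a hypothetical `max_len` threshold in pixels. Spurious-loop correction is omitted.

```python
# 8-neighbourhood offsets, clockwise starting from the pixel above.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def pixel(img, r, c):
    """Value at (r, c), treating everything outside the image as background."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0


def ink_neighbours(img, r, c):
    return [(r + dr, c + dc) for dr, dc in RING if pixel(img, r + dr, c + dc)]


def crossing_number(img, r, c):
    """Number of background-to-ink transitions around (r, c)."""
    ring = [pixel(img, r + dr, c + dc) for dr, dc in RING]
    return sum(ring[i] == 0 and ring[(i + 1) % 8] == 1 for i in range(8))


def is_junction(img, r, c):
    return bool(pixel(img, r, c)) and crossing_number(img, r, c) >= 3


def end_and_junction_points(img):
    """End points have exactly one ink neighbour; junctions have crossing number >= 3."""
    ends, junctions = [], []
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            if not v:
                continue
            if len(ink_neighbours(img, r, c)) == 1:
                ends.append((r, c))
            elif is_junction(img, r, c):
                junctions.append((r, c))
    return ends, junctions


def prune_short_branches(img, max_len):
    """Delete branches of at most max_len pixels that run from an end point to a junction.

    Repeats until a full pass removes nothing. Returns a new image.
    """
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        ends, _ = end_and_junction_points(img)
        for start in ends:
            if not pixel(img, *start) or len(ink_neighbours(img, *start)) != 1:
                continue  # altered by a deletion earlier in this pass
            path, visited = [start], {start}
            cur, reached_junction = start, False
            while len(path) <= max_len:
                nxt = [p for p in ink_neighbours(img, *cur) if p not in visited]
                if any(is_junction(img, *p) for p in nxt):
                    reached_junction = True
                    break
                if len(nxt) != 1:
                    break  # free end or ambiguous pixel: not a spur
                cur = nxt[0]
                path.append(cur)
                visited.add(cur)
            if reached_junction:
                for r, c in path:
                    img[r][c] = 0
                changed = True
    return img
```

Branches that end freely (no junction) are left alone, since removing isolated fragments is the job of the separate small-connected-component step.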

Vector skeletonisation method
- 1st stage: vectorisation. Spline-approximated skeletal branches are formed.
- 2nd stage: the minimum-cost configuration of branch interconnections is found, and branches are grouped into strokes. For each retraced segment of a stroke, restoration of a hidden loop is attempted.
- 3rd stage: near-junction and loop spline knots are adjusted to make the strokes smoother.
Flowchart: Original image → Vectorisation → Binary encoding of junction point configuration → GA (genetic algorithm) optimisation to find the configuration with the lowest cost → Adjustment of loop and near-junction knots
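The second stage encodes the junction configuration in binary and searches it with a genetic algorithm, but the slides give neither the encoding details nor the cost function. The sketch below is a generic GA over bit strings (tournament selection, one-point crossover, bit-flip mutation, elitism); the toy cost, the meaning of each bit, and all GA parameters are hypothetical stand-ins, not the authors' choices.

```python
import random


def ga_minimise(cost, n_bits, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Minimise cost(bits) over bit strings of length n_bits with a simple GA.

    Returns (best_bits, best_cost, history of the best cost after each generation).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)[:]
    history = [cost(best)]

    def tournament():
        # Binary tournament: the cheaper of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best configuration always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits) if n_bits > 1 else 0
            child = [1 - b if rng.random() < p_mut else b for b in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
        candidate = min(pop, key=cost)
        if cost(candidate) < cost(best):
            best = candidate[:]
        history.append(cost(best))
    return best, cost(best), history


# Hypothetical toy cost: bit i = 1 joins the i-th candidate pair of branches across
# a junction. Joining costs the turning angle (smooth continuations are cheap);
# leaving a pair unjoined costs a fixed penalty for fragmenting the stroke.
TURN_ANGLES = [10, 150, 35, 120, 5, 80]  # degrees, made-up values


def toy_cost(bits):
    return sum(a / 180 if b else 0.5 for a, b in zip(TURN_ANGLES, bits))


best, best_cost, history = ga_minimise(toy_cost, len(TURN_ANGLES))
```

Because of elitism the best cost never increases from one generation to the next; with only a handful of junctions an exhaustive search would also be feasible, and a GA pays off only when the number of bits grows.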

Vector skeletonisation method
Figure: 1. Original image; 2. Skeletal branches; 3. Strokes with retraced segments and loops; 4. Adjusted skeleton.
Red arrows show segments that changed after adjustment (left to right: two branches, a retraced branch, a hidden loop). Thick lines were drawn manually over the skeleton to make it more visible.

Feature extraction: list of features
Features extracted from both raster and vector skeletons:
1. Height
2. Width
3. Height to width ratio
4. Distance HC
5. Distance TC
6. Distance TH
7. Angle between TH and TC
8. Slant of stem of t
9. Slant of stem of h
10. Position of t-bar
11. Connected/disconnected t and h
12. Average stroke width
13. Average pseudo-pressure
14. Standard deviation of average pseudo-pressure
Features extracted from the vector skeleton only:
15. Standard deviation of stroke width
16. Number of strokes
17. Number of loops and retraced branches
18. Straightness of t-stem
19. Straightness of t-bar
20. Straightness of h-stem
21. Presence of loop at top of t-stem
22. Presence of loop at top of h-stem
23. Maximum curvature of h-knee
24. Average curvature of h-knee
25. Relative size (diameter) of h-knee
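Several of the listed features are simple geometry once the skeleton and some reference points are known. The sketch below computes features 1 to 3 from the bounding box of the skeleton pixels and shows how a distance such as TH and the angle between TH and TC could be computed. The slides do not define points T, H and C, so the coordinates here are hypothetical placeholders, and counting height and width inclusively in pixels is an assumption.

```python
import math


def bbox_features(points):
    """Height, width and height-to-width ratio of a set of (row, col) skeleton pixels.

    Sizes are counted inclusively in pixels (an assumption).
    """
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return height, width, height / width


def angle_at(t, h, c):
    """Angle in degrees at point t between segments t-h and t-c."""
    v1 = (h[0] - t[0], h[1] - t[1])
    v2 = (c[0] - t[0], c[1] - t[1])
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))  # clamp rounding error


# Hypothetical reference points (row, col); the slides do not define T, H and C.
T, H, C = (0, 0), (0, 4), (3, 0)
distance_th = math.dist(T, H)
distance_tc = math.dist(T, C)
angle_th_tc = angle_at(T, H, C)
```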

Feature extraction
- The position of the t-bar is a binary feature: 1 if the t-bar crosses the stem, and 0 if it touches the stem, is separated from it, or is missing.
- The size of the h-knee is measured along a horizontal line.
- Pseudo-pressure is measured as the gray level normalised to 1.
- Straightness is measured as the ratio of the stroke length to the distance between its ends.
Figure labels: h-knee, t-bar, t-stem, h-stem.
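The straightness and pseudo-pressure definitions above translate directly into code. In this sketch a stroke is a polyline of (row, col) points and pseudo-pressure is sampled from gray levels along it. Two choices are assumptions not stated on the slides: gray levels are 8-bit and inverted so that darker ink gives values nearer 1, and the population standard deviation (`statistics.pstdev`) is used.

```python
import math
import statistics


def polyline_length(points):
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def straightness(points):
    """Stroke length divided by the distance between its end points (1.0 = straight)."""
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("end points coincide; straightness is undefined")
    return polyline_length(points) / chord


def pseudo_pressure(gray_values, max_level=255):
    """Mean and standard deviation of normalised pseudo-pressure along a stroke.

    Assumes 8-bit gray levels with dark ink, inverted so darker ink gives values nearer 1.
    """
    p = [1.0 - g / max_level for g in gray_values]
    return statistics.mean(p), statistics.pstdev(p)
```

An L-shaped stroke with legs of 3 and 4 pixels, for instance, has length 7 and end-to-end distance 5, so its straightness is 1.4; a perfectly straight stroke scores 1.0.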

Writer classification scheme
- A constructive artificial neural network with spherical threshold units (DistAl) was used as the classifier.
- 100 samples of the grapheme “th”, drawn from 20 different writers, were used.
- 5-fold cross-validation was used to evaluate classification accuracy.
- Three experiments:
  1. Original feature set (features 1-14), extracted using the raster skeleton
  2. Original feature set, extracted using the vector skeleton
  3. Extended feature set (features 1-25), extracted using the vector skeleton
- Additionally, the accuracy of feature extraction was measured.
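The slides report 5-fold cross-validation with DistAl, whose implementation is not shown. The sketch below covers only the cross-validation bookkeeping. A 1-nearest-neighbour classifier is swapped in as a clearly labelled stand-in for DistAl, and random fold assignment without stratification by writer is an assumption.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier (1-nearest neighbour). This is NOT DistAl."""
    i = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))
    return train_y[i]


def cross_val_accuracy(xs, ys, k=5, seed=0, classify=nearest_neighbour):
    """Fraction of samples classified correctly when each fold is held out in turn."""
    correct = 0
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct += sum(classify(train_x, train_y, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)
```

Every sample is tested exactly once, by a model that never saw it during training, which is what makes the reported accuracies comparable across the three experiments.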

Results: accuracy of feature extraction
- The extraction software analysed the shape to detect the various parts of the character.
- The analysis was performed step by step, and at each step some feature was extracted.
- If at least one feature was not extracted, or was extracted incorrectly, the sample was counted as a “failure”.
Flowchart: Input (original image, binarised image, skeleton) → Height, width, height to width ratio → Analysis of branches originating from top end points → Stem features → Search for t-bar → … → Feature vector

Method   Accuracy, %
Raster   87
Vector   94
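The failure-counting rule above can be sketched as a chain of extraction steps that stops at the first missing feature. The step functions, the sample format and the correctness check below are hypothetical placeholders; the slides do not describe how incorrect extractions were judged.

```python
def run_steps(sample, steps):
    """Apply extraction steps in order; each step may use features found so far.

    A step returns None when it cannot extract its feature, which ends the analysis.
    """
    features = {}
    for name, step in steps:
        value = step(sample, features)
        if value is None:
            return None
        features[name] = value
    return features


def extraction_accuracy(samples, steps, is_correct):
    """Percentage of samples for which every feature was extracted and judged correct."""
    successes = 0
    for sample in samples:
        features = run_steps(sample, steps)
        if features is not None and is_correct(sample, features):
            successes += 1
    return 100.0 * successes / len(samples)


# Hypothetical toy steps and samples, for illustration only.
steps = [
    ("height", lambda s, f: s["h"]),
    ("width", lambda s, f: s["w"] or None),  # treat width 0 as "not found"
    ("ratio", lambda s, f: f["height"] / f["width"]),
]
samples = [{"h": 2, "w": 1}, {"h": 3, "w": 0}, {"h": None, "w": 2}, {"h": 4, "w": 2}]
print(extraction_accuracy(samples, steps, lambda s, f: f["ratio"] == 2))  # 50.0
```

Under this rule a sample counts as a success only if the whole chain completes, so an early failure (for example, no stem found) also costs every later feature.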

Results: accuracy of writer classification

Method                                    Writer classification accuracy, %
Original feature set + raster skeleton    73
Original feature set + vector skeleton    87
Extended feature set + vector skeleton    98

Conclusions
- Use of the vector skeleton results in fewer feature extraction failures.
- Use of the vector skeleton produces higher writer classification accuracy even on the same feature set, which indicates that feature values are measured more accurately.
- Vector skeletonisation enables the extraction of more structural features, which in turn increases writer classification accuracy.
(On the original slide, advantages were shown in green and drawbacks in red.)