Document Examiner Feature Extraction: Thinned vs Skeletonised Images

Vladimir Pervouchine and Graham Leedham
Forensics and Security Laboratory, School of Computer Engineering, Nanyang Technological University, Singapore

Outline
- Forensic handwriting examination
- The need for accurate stroke extraction
- Thinning based method
- Vector skeletonisation method
- Feature extraction
  - From thinned images
  - From vector skeletons
- Writer classification method
- Results
- Conclusions

Forensic handwriting examination
Variation of the word “the” written by 8 different writers. Source: Harrison, 1981

Forensic handwriting examination
Variation of the letters “G” and “R” written by 15 different writers. Source: Harrison, 1981

Forensic handwriting examination
Example of variation in letter formation styles in 10 letters from 9 different writers. Source: Harrison, 1981

Current Methods used by Forensic Document Examiners
Current practice primarily involves manual extraction and comparison of various global and local visible features. Examiners usually compare a “Questioned Document” against a set of “Known Documents”. The objective is to determine whether the “Questioned Document” was, or was not, written by a particular individual. The “Questioned Document” may be in disguised handwriting.

Forgery / Disguise / Alteration
- Is the writing GENUINE? (the author is who he claims to be)
- Is the writing FORGED? (the author is not who he claims to be and is attempting to assert the writing is the same as someone else’s), or
- Is the writing DISGUISED? (the author wishes to deny doing the writing at a later date), or
- Is the writing ALTERED? (has someone modified or altered the original document?)

Extraction of handwritten strokes from images
- Forensic document examiners analyse the pen tip trajectory
- The trajectory is not readily available from the grayscale handwriting images
- To mimic extraction of document examiner features it is necessary to approximate the pen trajectory
- We need to preserve individual information in character shapes
- Many algorithms have been proposed for a similar problem in offline handwriting recognition, but they do not need to preserve the individual traits of characters

Need for accurate stroke representation: Many features used by forensic document examiners to describe character shapes are extracted from the pen tip trajectory. In order to extract the same or similar features automatically, it is necessary to extract the trajectory from grayscale images of handwriting. Since the features are used to distinguish between different writers, the approximation of the pen tip trajectory should preserve the individual traits of the trajectory. Many algorithms for approximation of handwritten strokes have been designed for handwriting recognition. There, preservation of the individual traits of characters is a drawback rather than an advantage, since recognition works better when characters written by different people look as similar as possible. In contrast, when extracting features to distinguish writers, the letters that the characters represent are assumed to be known, and feature extraction focuses on differences in the shapes of the same characters written by different people. Hence, approximation of handwritten strokes should be different from that used in handwriting recognition. The more accurately we approximate the original pen trajectory, the more accurately we can measure individual features from it.

Thinning based stroke approximation
- Matlab Image Processing Toolbox thinning (Zhang and Suen thinning algorithm) is used for the first approximation
- Post-processing is applied to:
  - remove extra branches
  - remove spurious loops
  - remove small connected components
- Feature extraction attempts to overcome remaining artifacts

Flowchart: Original image, Binarisation, Thinning, Remove small connected components, Find junction points, Find end points, Correct spurious loops, Prune short branches (with a loop labelled “While changes are made”)
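The thinning step and the search for end points and junction points can be illustrated with a minimal sketch. The slides use the Matlab toolbox; the code below is instead a plain Python rendering of the published Zhang and Suen deletion rules, and it is not the authors' implementation. The function names, the list-of-lists image format (1 = ink, with a border of background pixels), and the neighbour-counting rule for end points and junction points are all assumptions made for illustration.

```python
def neighbours(img, r, c):
    """Return the 8 neighbours P2..P9 of pixel (r, c), clockwise starting from north."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def transitions(nb):
    """Count 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2."""
    return sum(a == 0 and b == 1 for a, b in zip(nb, nb[1:] + nb[:1]))


def zhang_suen(image):
    """Thin a binary image (1 = ink); border pixels are assumed to be background."""
    img = [row[:] for row in image]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of the algorithm
            to_clear = []
            for r in range(1, len(img) - 1):
                for c in range(1, len(img[0]) - 1):
                    if img[r][c] != 1:
                        continue
                    nb = neighbours(img, r, c)
                    p2, _, p4, _, p6, _, p8, _ = nb
                    if not (2 <= sum(nb) <= 6 and transitions(nb) == 1):
                        continue
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_clear.append((r, c))
            # Pixels are deleted in parallel, after the whole pass has been evaluated.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


def end_and_junction_points(skel):
    """Heuristic (an assumption): end point = exactly one neighbour,
    junction point = three or more 0 -> 1 transitions around the pixel."""
    ends, junctions = [], []
    for r in range(1, len(skel) - 1):
        for c in range(1, len(skel[0]) - 1):
            if skel[r][c] != 1:
                continue
            nb = neighbours(skel, r, c)
            if sum(nb) == 1:
                ends.append((r, c))
            elif transitions(nb) >= 3:
                junctions.append((r, c))
    return ends, junctions
```

The post-processing steps named on the slide (loop correction, branch pruning, removal of small components) would operate on the end points and junction points found this way; they are not sketched here because the slides do not give their rules.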

Thinning based stroke approximation
1. Original image 2. Binarised image 3. Thinned image 4. Corrected image

Vector skeletonisation method
- 1st stage: vectorisation. Spline-approximated skeletal branches are formed
- 2nd stage: the minimum cost configuration of branch interconnections is found. Branches are grouped into strokes. For each retraced segment of a stroke, restoration of the hidden loop is attempted
- 3rd stage: near-junction and loop spline knots are adjusted to make strokes smoother

Flowchart: Original image, Vectorisation, Binary encoding of junction points configuration, GA optimisation to find configuration with lowest cost, Adjustment of loop and near-junction knots
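The second stage (a binary-encoded configuration optimised by a GA) can be illustrated with a generic sketch. The slides do not give the cost function, the encoding details, or the GA parameters, so everything specific below is hypothetical: the `cost` callable, the tournament selection, the one-point crossover, the mutation rate, and the toy cost used in the example.

```python
import random


def genetic_search(cost, n_bits, pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Search bit strings of length n_bits (n_bits >= 2) for a low value of cost(bits).

    Returns the best bit string found and the best cost after each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    history = [cost(best)]

    def pick():
        # Tournament selection of size 2: the cheaper of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best configuration always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = children
        best = min(pop, key=cost)
        history.append(cost(best))
    return best, history


# Toy cost (hypothetical): number of bits that differ from a reference configuration.
reference = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_cost(bits):
    return sum(b != t for b, t in zip(bits, reference))


best_bits, cost_history = genetic_search(toy_cost, n_bits=len(reference))
```

In the method on the slide, each bit string would presumably encode how branches are interconnected at the junction points, and the cost would score the resulting grouping of branches into strokes.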

Vector skeletonisation method
1. Original image
2. Skeletal branches
3. Strokes with retraced segments and loops
4. Adjusted skeleton
Red arrows show segments that changed after adjustment (left to right: two branches, retraced branch, hidden loop). Thick lines were put manually on the skeleton to make it more visible.

Feature extraction: list of features
Features extracted from both raster and vector skeletons
1. Height
2. Width
3. Height to width ratio
4. Distance HC
5. Distance TC
6. Distance TH
7. Angle between TH and TC
8. Slant of stem of t
9. Slant of stem of h
10. Position of t-bar
11. Connected/disconnected t and h
12. Average stroke width
13. Average pseudo-pressure
14. Standard deviation of average pseudo-pressure

Features extracted from vector skeleton only
15. Standard deviation of stroke width
16. Number of strokes
17. Number of loops and retraced branches
18. Straightness of t-stem
19. Straightness of t-bar
20. Straightness of h-stem
21. Presence of loop at top of t-stem
22. Presence of loop at top of h-stem
23. Maximum curvature of h-knee
24. Average curvature of h-knee
25. Relative size (diameter) of h-knee

Feature extraction
- The position of t-bar feature is binary: 1 if the t-bar crosses the stem and 0 if it touches the stem, is separated from it, or is missing
- Size of h-knee is measured parallel to a horizontal line
- Pseudo-pressure is measured as the gray level normalised to 1
- Straightness is measured as the ratio of the stroke length to the distance between its ends
Figure labels: h-knee, t-bar, t-stem, h-stem
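Two of these definitions translate directly into arithmetic, sketched below. The representation of a stroke as a list of (x, y) points, the 8-bit gray range, and the function names are assumptions; the slides also do not say whether dark or light pixels count as high pressure, so the gray level is normalised without inversion.

```python
import math
import statistics


def straightness(points):
    """Ratio of stroke length to the distance between its ends.

    `points` is a polyline of (x, y) samples along the stroke.
    A perfectly straight stroke gives 1.0; curved strokes give larger values.
    """
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("stroke ends coincide; the ratio is undefined")
    return length / chord


def pseudo_pressure(gray_levels, max_level=255):
    """Average and standard deviation of gray level normalised to 1.

    `gray_levels` holds the gray values sampled along the stroke;
    max_level=255 assumes 8-bit images.
    """
    normalised = [g / max_level for g in gray_levels]
    return statistics.mean(normalised), statistics.pstdev(normalised)
```

For example, a stroke sampled at (0, 0), (0, 3), (4, 3) has length 7 and end-to-end distance 5, so its straightness under this definition is 1.4.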

Writer classification scheme
- A constructive ANN with spherical threshold units (DistAl) was used as the classifier
- 100 samples of grapheme “th” drawn from 20 different writers
- 5-fold cross-validation was used to evaluate classification accuracy
- Three experiments:
  - Original feature set (features 1-14), features extracted using raster skeleton
  - Original feature set, features extracted using vector skeleton
  - Extended feature set (features 1-25), features extracted from vector skeleton
- Additionally, accuracy of feature extraction was measured
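The 5-fold evaluation protocol can be sketched as follows. This is only an illustration of k-fold cross-validation: the slides do not say how the folds were formed, and the nearest-neighbour classifier below is a stand-in so that the example runs, not DistAl. All function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(features, labels, fit, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the results.

    `fit(train_features, train_labels)` must return a function that maps
    one feature vector to a predicted label.
    """
    correct = 0
    for held_out in k_fold_indices(len(features), k, seed):
        held = set(held_out)
        train = [i for i in range(len(features)) if i not in held]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(features[i]) == labels[i] for i in held_out)
    return correct / len(features)


def fit_nearest_neighbour(xs, ys):
    """Stand-in classifier (NOT DistAl): label of the closest training sample."""
    def predict(x):
        j = min(range(len(xs)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(xs[i], x)))
        return ys[j]
    return predict
```

A call such as `cross_validate(feature_vectors, writer_ids, fit_nearest_neighbour)` returns the fraction of held-out samples assigned to the correct writer.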

Results: accuracy of feature extraction
- Extraction software performed analysis of shape to detect various parts of the character
- Analysis was performed step by step
- At each step some feature was extracted
- If at least one feature was not extracted or was extracted incorrectly, the sample was counted as a “failure”

Flowchart: Input (original image, binarised image, skeleton); Height, width, height to width ratio; Analysis of branches originating from top end points; Stem features; Search for t-bar; …; Feature vector

Method   Accuracy, %
Raster   87
Vector   94
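The failure-counting rule can be written out as a short sketch. The data layout is an assumption: each sample is a dictionary mapping a feature name to True when the feature was extracted correctly and False when it was missing or wrong. How correctness was judged is not stated on the slides.

```python
def extraction_accuracy(samples):
    """Percentage of samples for which every feature was extracted correctly.

    A sample counts as a failure as soon as one of its features is marked False.
    """
    successes = sum(all(sample.values()) for sample in samples)
    return 100.0 * successes / len(samples)


# Hypothetical example: four samples, one with a failed t-bar search.
example_samples = [
    {"height": True, "width": True, "t_bar": True},
    {"height": True, "width": True, "t_bar": False},
    {"height": True, "width": True, "t_bar": True},
    {"height": True, "width": True, "t_bar": True},
]
example_accuracy = extraction_accuracy(example_samples)
```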

Results: accuracy of writer classification
Method                                   Writer classification accuracy, %
Original feature set + raster skeleton   73
Original feature set + vector skeleton   87
Extended feature set + vector skeleton   98

Conclusions
- Use of the vector skeleton results in fewer feature extraction failures
- Use of the vector skeleton produces higher writer classification accuracy even on the same feature set, which indicates that feature values are measured more accurately
- Vector skeletonisation enables extraction of more structural features, which, in turn, increases writer classification accuracy
Advantages in green, drawbacks in red.
