Handwriting Copybook Style Analysis of Pseudo-Online Data. Student and Faculty Research Day. Mary L. Manfredi, Dr. Sung-Hyuk Cha, Dr. Charles Tappert, Dr. Sungsoo Yoon. May 6, 2005.
Sample of Copybook Style
Common Forensic Problem: determine the writer of a questioned document (e.g., a ransom note).
Feasibility study that might help in the writer identification process: reduce the suspect population for the writer of a questioned document; identify the copybook style of a questioned document; use cluster analysis to reveal similarities among copybook styles.
I/O Sample. Input: [figure]. Output: copybook style = usa2.
Many Techniques Used for Writer Identification (Online and Offline): fragmented connected-component contours; hidden Markov methods; neural networks; gray-level distribution; fractal analysis; directional element features and linear transform. All techniques have strengths and limitations. More accurate methods are always being sought.
Offline versus Online Data: Offline data are usually scanned images and carry only static information. Online data are captured with special equipment (e.g., tablet digitizers, pen computers) that also records dynamic information (stroke number, order, direction, pressure, velocity, etc.) as a person writes.
Pseudo-Online Data: offline data that have been traced with a mouse or a pen-enabled tablet to give them dynamic characteristics.
Offline Feature Extraction: distances between characters using template matching. [Figure: distance table comparing the characters 3 (left), 8 (middle), and 8 (right); extracted values, cell boundaries lost: 54610]
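The slides do not specify which template-matching measure was used. As a minimal sketch, assuming both characters have already been normalized to binary grids of the same size, one simple distance is the number of pixels that differ. The function name and the tiny bitmaps below are illustrative assumptions, not data from the study.

```python
def template_distance(a, b):
    """Count the pixels that differ between two equal-sized binary images.

    Each image is a list of equal-length strings of '0'/'1' characters.
    This pixel-mismatch count is an assumed stand-in for the study's measure.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Hypothetical 5x3 bitmaps of the digits 3 and 8 (illustration only).
THREE = ["111", "001", "111", "001", "111"]
EIGHT = ["111", "101", "111", "101", "111"]

print(template_distance(THREE, EIGHT))
print(template_distance(EIGHT, EIGHT))
```

A purely static measure like this sees only the final ink, which is why similar-looking shapes such as 3 and 8 can end up close together.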
Pseudo-Online Feature Extraction: distances between characters using SDSS (stroke direction sequence strings). [Figure: distance table comparing the characters 3 (left), 8 (middle), and 8 (right); extracted values, cell boundaries lost: 1980]
Stroke Directions
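To illustrate how a traced stroke could become a stroke direction sequence string, the sketch below quantizes each pen movement into one of eight direction codes and compares two strings with the Levenshtein edit distance. The eight-way coding (code 0 along +x, increasing counter-clockwise, y pointing up), the function names, and the choice of edit distance are all assumptions; the slides do not state the coding scheme or the string distance actually used.

```python
import math


def direction_code(p, q, n_dirs=8):
    """Quantize the move from point p to point q into one of n_dirs codes.

    Assumed convention: code 0 points along +x and codes increase
    counter-clockwise, with the y axis pointing up.
    """
    angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
    sector = 2 * math.pi / n_dirs
    return int((angle + sector / 2) // sector) % n_dirs


def stroke_to_sdss(points):
    """Turn a list of (x, y) points into a string of direction codes.

    Repeated points (no movement) are skipped.
    """
    return "".join(
        str(direction_code(p, q)) for p, q in zip(points, points[1:]) if p != q
    )


def edit_distance(s, t):
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


# Hypothetical trace: right, then up, then left.
trace = [(0, 0), (1, 0), (1, 1), (0, 1)]
sdss = stroke_to_sdss(trace)
print(sdss, edit_distance(sdss, "0246"))
```

Because the string records the order and direction of pen movement, two letters that look alike as images can still differ clearly when they are drawn differently.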
DB Creation: Obtain various Roman copybook styles, both cursive and manuscript. Scan in each letter. Apply normalization procedures to the letters. Trace each letter with either a mouse or a pen-enabled tablet and capture its dynamic characteristics.
Methodology for Processing a Questioned Document (Determine Copybook Style): Trace each letter of interest from the questioned document. Match each traced letter against each of the corresponding letters in the DB. The copybook style of the DB letter with the smallest distance is taken as the match.
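The matching step can be sketched as a nearest-neighbor lookup. In the sketch below, the DB layout, the style name "usa1", the direction strings, and the stand-in `mismatch_distance` are hypothetical; only the rule "smallest distance wins" comes from the slides. Combining the per-letter matches by majority vote is an added assumption, since the slides do not say how results for several letters are merged.

```python
from collections import Counter


def mismatch_distance(s, t):
    """Stand-in string distance: differing positions plus length difference."""
    return sum(a != b for a, b in zip(s, t)) + abs(len(s) - len(t))


def match_letter(sample, candidates, distance):
    """Return (style, distance) for the DB entry closest to one traced letter."""
    return min(
        ((style, distance(sample, ref)) for style, ref in candidates.items()),
        key=lambda pair: pair[1],
    )


def match_document(letters, db, distance):
    """Match every traced letter and pick the most frequent style.

    letters: list of (letter, sample) pairs from the questioned document.
    db: {letter: {style: reference}}.
    The majority vote is an assumption, not part of the original method.
    """
    matches = [(letter, *match_letter(sample, db[letter], distance))
               for letter, sample in letters]
    votes = Counter(style for _, style, _ in matches)
    return matches, votes.most_common(1)[0][0]


# Hypothetical DB of direction strings for two letters and two styles.
db = {
    "A": {"usa1": "0246", "usa2": "1357"},
    "B": {"usa1": "0011", "usa2": "2244"},
}
letters = [("A", "1357"), ("B", "2245")]
matches, style = match_document(letters, db, mismatch_distance)
print(matches, style)
```

Passing the distance in as a parameter keeps the lookup independent of whether template matching or a string distance is used.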
Questioned Document
Distance Matrix (used to determine the copybook style of the questioned document)
Further Analysis of Copybook Styles: Clustering. Clustering is the unsupervised organization of patterns into groups so that patterns in the same group are more similar to each other than to patterns in other groups. It brings the most similar letters together based on a chosen characteristic (here, distance). We take each letter in the copybook style DB and calculate its distance to every other instance of the same letter (e.g., each uppercase A is compared against all other uppercase As).
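The all-against-all comparison for a single letter can be sketched as building a symmetric distance matrix. The style names, direction strings, and the stand-in `mismatch_distance` below are hypothetical placeholders rather than values from the study.

```python
from itertools import combinations


def mismatch_distance(s, t):
    """Stand-in string distance: differing positions plus length difference."""
    return sum(a != b for a, b in zip(s, t)) + abs(len(s) - len(t))


def pairwise_distances(items, distance):
    """Build a symmetric distance table for one letter across styles.

    items: {style: representation of that style's letter}.
    Returns the sorted style names and a dict {(style_a, style_b): distance}.
    """
    styles = sorted(items)
    matrix = {(s, s): 0 for s in styles}
    for a, b in combinations(styles, 2):
        matrix[a, b] = matrix[b, a] = distance(items[a], items[b])
    return styles, matrix


# Hypothetical uppercase-A representations for three copybook styles.
upper_a = {"usa1": "0246", "usa2": "0247", "uk1": "1357"}
styles, matrix = pairwise_distances(upper_a, mismatch_distance)
for row in styles:
    print(row, [matrix[row, col] for col in styles])
```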
Partial Distance Matrix for Cluster Analysis
Dendrogram of Uppercase As
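A dendrogram records the order in which items are merged by agglomerative clustering. The sketch below uses single linkage (cluster distance is the smallest distance between any two members), which is an assumption: the slides do not name the linkage method. The labels s1 to s4 and their distances are hypothetical.

```python
def single_linkage(labels, dist):
    """Agglomerative clustering with single linkage (assumed linkage rule).

    dist(a, b) returns the distance between two labels.
    Returns the merge history as (cluster_a, cluster_b, merge_distance),
    which is the information a dendrogram displays.
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical distances between four copybook styles of one letter.
D = {("s1", "s2"): 1, ("s1", "s3"): 5, ("s2", "s3"): 4,
     ("s1", "s4"): 9, ("s2", "s4"): 8, ("s3", "s4"): 2}


def lookup(a, b):
    """Symmetric lookup into the hypothetical distance table D."""
    return D[a, b] if (a, b) in D else D[b, a]


merges = single_linkage(["s1", "s2", "s3", "s4"], lookup)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

Cutting the merge history at a chosen distance yields flat clusters; other linkage rules, such as complete or average linkage, would generally give a different tree.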
Clusters Resulting from the Uppercase A copybook style
Summary: A feasibility study of a new approach to aid writer identification: narrow the suspect population by determining the copybook style of a questioned document. Cluster analysis of the copybook styles. Will it help? More work is necessary to determine this.
Future Work: Carry out a more extensive analysis of the copybook styles, including adding more copybook styles to the DB, producing and naming the resulting clusters, and producing various copybook style classifications. Get input from document examiners on the possible value of this work.