Published by Julianna Osborne. Modified over 9 years ago.
1
IIIT Hyderabad
2
Handwriting
– Graphical representation of thoughts using predefined symbols
– Still used frequently (e.g., note taking)
– An acquired skill: years of habituation and practice
– Complex generation process: a neuromuscular perceptual-motor task
– The hand contains some 27 bones and 40 muscles
3
Handwriting Identification
– Handwritten documents carry an associated identity
– Handwriting identification is the study of the writership of documents
– Done by comparison with reference handwritten documents
4
IIIT Hyderabad Individuality (example)
5
Recognition vs. Identification
Handwriting recognition
– Automatically understand the underlying text in the document
– Design of automated handwritten document reading systems
– Suppresses variation due to writer or handwriting style
Handwriting identification
– Determine the writer of the document
– Enhances the variation due to different handwriting styles
6
Problem Statement
Writer identification
– Identify the writer of a questioned document, given a pool of writers
Writer verification
– Verify whether the claimed identity is correct, given a database of writers
Forensic document analysis
– Verify whether two given documents were written by the same person
7
Identification (1:N matching)
– Question: who wrote this document?
– The questioned document is compared with every writer in the reference database
– Example matching scores: 35, 50, 65; result: Writer 3
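A minimal sketch of the 1:N decision, assuming that a higher matching score means a closer match and that the slide's scores 35, 50 and 65 belong to writers 1, 2 and 3 in that order. The function name and the dictionary layout are illustrative and do not come from the slides.

```python
def identify_writer(scores):
    """Return the reference writer with the highest matching score.

    `scores` maps a writer label to the score obtained by comparing the
    questioned document with that writer's reference data (hypothetical layout).
    """
    if not scores:
        raise ValueError("at least one reference writer is required")
    return max(scores, key=scores.get)


# Scores taken from the slide; the writer labels are assumed.
example = {"Writer 1": 35, "Writer 2": 50, "Writer 3": 65}
print(identify_writer(example))
```

If the comparator returned distances instead of similarity scores, the same sketch would use `min` in place of `max`.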
8
Verification (1:1 matching)
– Claim: "Mayank: I wrote this document!" (the reference database holds Mayank, Sachin and Amit)
– The comparator checks whether the distance is below a threshold: yes accepts the claim, no rejects it
– Threshold: decided from the within-writer and between-writer distance distributions of the training documents
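The threshold rule can be sketched as follows. The slide only says that the threshold comes from the within-writer and between-writer distance distributions; choosing the candidate that minimises the sum of false rejections and false acceptances is an assumption made here for illustration, and both function names are hypothetical.

```python
def choose_threshold(within, between):
    """Pick the distance threshold with the lowest FRR + FAR on training data.

    within  -- distances between documents of the same writer
    between -- distances between documents of different writers
    Assumed criterion; the slides do not state how the threshold is chosen.
    """
    best_t, best_err = None, float("inf")
    for t in sorted(set(within) | set(between)):
        frr = sum(d >= t for d in within) / len(within)    # genuine claims rejected
        far = sum(d < t for d in between) / len(between)   # false claims accepted
        if frr + far < best_err:
            best_t, best_err = t, frr + far
    return best_t


def verify(distance, threshold):
    """Accept the claimed identity when the distance is below the threshold."""
    return distance < threshold
```

A threshold biased towards acceptance, as the later slides describe, would weight `frr` more heavily than `far` in the comparison.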
9
Individuality Features
Sub-character and character level
– Shape and size
– Choice of allograph
Word level
– Connections and character spacing
– Aspect ratio
Line level
– Slant and slope
– Word spacing
Paragraph and page level
– Indentation and arrangement of text
– Uniformity of margins
Figures: character-level individuality (W1, W2); word-level individuality (W1, W2)
10
Line and Paragraph Level (Writer 1 vs. Writer 2)
– Slant and slope of lines
– Parallelism of lines
– Word spacing (number of words in a line)
– Uniformity of margins
– Overall texture
11
Challenges
High within-writer variation
– Handwriting depends on the writer's mood
– No two pieces of handwriting by an individual are the same
Low between-writer variation
– Handwriting must remain readable
– So the degree of variation is low
12
Online vs. Offline
Offline
– Matrix of integers
– Only shape and size information is available
– Temporal information about how a stroke was drawn is lost
Online
– Sequence of x-y coordinates and pen up/down events
– Shape and size information is available
– Sequencing of points and strokes is available
13
Data Collection and Annotation
Major hurdles
– Sequential process: devices are needed for online handwriting
– People are reluctant to write
– Standard databases are not available
– Online handwriting collection devices are not accurate
– Automatic segmentation and annotation is itself a research problem
Data collection
– 600 pages of data from around 50 writers in various scripts
14
State of the Art
– Done by handwriting experts, mostly manually
– State-of-the-art systems are not available
– Experts use context-dependent information such as the origin, type and condition of the documents
– This information is difficult to model mathematically
15
Theme
– Identifying consistent features automatically, to discriminate between writers
– Usability of discriminating features
– Preserve discrimination
16
Major Contributions
Text-independent writer identification
– Designing a codebook of writers
– Automatically identifying and extracting discriminating features
Text-dependent writer verification
– Writer-specific text generation
– Robust to forgery
Forensic document examination
– Repudiation detection in handwritten documents
17
IIIT Hyderabad Text-independent writer identification
18
Text-independent?
– The underlying text is not known
– Data is not annotated
– Given: sequence of strokes and x-y coordinate values
Challenges of text-independent identification
– Extract consistent curves (features) from documents
– Compare similar features between two documents
– Design a codebook of individual writers
19
IIIT Hyderabad Consistency…
20
Codebook of a Writer
Figure: six different clusters extracted from Devanagari script
21
Theoretical Background
– Based on handwriting modeling studies
– A stroke is the combination of different forces
– Handwriting curves become consistent due to habituation
– Relative velocity points of strokes are constant for the same writer (empirical result)
Figures: stroke from Devanagari script; velocity profile of that stroke
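The role of the velocity profile can be illustrated with a short sketch. It assumes that each pen sample is an `(x, y, t)` tuple and that a stroke is cut at local minima of pen speed, following the deck's backup slide, which defines a shape curve as the curve between two consecutive minimum velocity points. The function names and the exact local-minimum rule are illustrative choices.

```python
import math


def speeds(points):
    """Pen speed on each segment between consecutive (x, y, t) samples."""
    out = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        out.append(math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0)
    return out


def min_velocity_indices(v):
    """Indices of interior local minima of the speed profile."""
    return [i for i in range(1, len(v) - 1)
            if v[i] < v[i - 1] and v[i] <= v[i + 1]]


def split_into_curves(points):
    """Cut a stroke into curves at the minimum-velocity points.

    Segment i joins points i and i+1; the cut is placed at point i+1,
    which is an arbitrary but consistent convention for this sketch.
    """
    v = speeds(points)
    cuts = [0] + [i + 1 for i in min_velocity_indices(v)] + [len(points) - 1]
    return [points[a:b + 1] for a, b in zip(cuts, cuts[1:]) if b > a]
```

Adjacent curves share their boundary point, so no sample is lost at a cut.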
22
Summarized Framework
– Questioned document
– Cluster its curves into different clusters
– One classifier per cluster (NN 1, NN 2, NN 3, ..., NN n), each giving a soft classification
– Combine the results to classify the writer
23
Results
– Experiments on Roman, Hindi, Cyrillic, Arabic and Hebrew scripts
– Training data: approx. 300-400 curves for Roman, approx. 700-800 curves for the others
– Test data: 100 curves for Roman, 200-300 curves for the others
– Tables and graphs are on the following slides
24
Varying Number of Curves
– Accuracy increases with the number of curves
– More than 85% accuracy is reached with 200 curves (10-12 words)
Figure: accuracy with 12 words
25
Script vs. Accuracy
– About 10 writers for each script
– For most scripts, top-2 accuracy is nearly 100%; Chinese is the exception
– Confusion occurs between pairs of writers
26
Related Work
Line-level features
– Word spacing
– Lower and upper profile
– Fractal and wavelet features
– Loops and blobs
Paragraph-level features
– Image processing: grey-scale histogram, run-length coding, fractal image compression
– Texture features: Gabor filter, wavelet, contourlet GGD, grey-scale covariance matrix
– Online features: pen pressure, velocity, azimuth, velocity of barycenter
– Codebook generation: using directional features
Our approach
– Codebook design using sub-character features
– Script-independent framework
– Online handwriting data
– Identification with a small amount of data
– Automatic identification of consistent and discriminating features
27
Result Comparison
Schomaker et al. [28]
– Combination of directional, texture and image-processing features
– Identification: accuracy of 87% with 900 writers
– Verification: equal error rate of 3%-8%
– Test data size: 1 page of handwritten data
Our approach [5]
– Shape-based features
– Identification: accuracy of ~85% with 15 writers
– Test data size: 12 words (1 line)
28
Analysis
– Shape- and size-based primitives obtain reasonable accuracy with a simple algorithm
– Chinese script: most strokes are straight line segments, so features based on inter-stroke relations can be used
To increase accuracy
– Robust clustering and classification algorithms
– Fusion with higher-level primitives such as line and paragraph features
29
IIIT Hyderabad Text dependent writer Verification
30
Problem Statement
Text-independent systems
– Need a large amount of data
Text-dependent framework
– Higher accuracy
– Needs only a small amount of data
Problems of text-dependent systems
– Forgery (the fixed text is known in advance)
– Authentication text not known (usually random text is used)
31
Signature vs. Text-dependent Handwriting
Signature
– Variations are unlimited; a signature need not be readable
– The writer consciously tries to write the same signature
Challenges
– Within-writer and between-writer variation have to be told apart
– A discriminating distance measure has to be found
32
System Specification
Empirical finding
– The discriminating power of primitives varies between individuals
– Primitives: sub-characters, characters, words, etc.
System specifications
– Writer-specific text, for higher accuracy with a limited amount of text
– Varying text across multiple authentications, for robustness to forgery
33
Boosting?
– A classifier combination method
– Combines weak classifiers to produce an accurate learning algorithm
– Greedy algorithm: selects a weak classifier at each stage based on the previously selected classifiers
– Maintains a distribution of weights over the training samples
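As an illustration of these four points, here is a compact sketch of standard discrete AdaBoost with labels in {-1, +1}. It is the textbook algorithm, assumed here because the slides do not state which boosting variant or which weak classifiers were used; `adaboost` and `predict` are hypothetical names.

```python
import math


def adaboost(samples, labels, weak, rounds):
    """Greedy selection of weak classifiers under a sample-weight distribution.

    samples -- training inputs
    labels  -- +1 / -1 labels
    weak    -- candidate weak classifiers, each a function x -> +1 / -1
    """
    m = len(samples)
    weights = [1.0 / m] * m                      # uniform initial distribution
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every candidate under the current distribution.
        errs = [sum(w for w, x, y in zip(weights, samples, labels) if h(x) != y)
                for h in weak]
        j = min(range(len(weak)), key=errs.__getitem__)
        eps = errs[j]
        if eps >= 0.5:                           # no candidate beats chance
            break
        eps = max(eps, 1e-10)                    # avoid division by zero
        alpha = 0.5 * math.log((1.0 - eps) / eps)
        ensemble.append((alpha, weak[j]))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * weak[j](x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of the selected weak classifiers."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

In the verification setting of these slides, each weak classifier would plausibly correspond to one handwriting primitive compared against a threshold, but that mapping is an interpretation rather than something the slides spell out.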
34
Framework
– Verification as a 2-class problem: positive samples vs. negative samples
– Given: a set of writers and primitives, and a table of discriminating power
– Randomness is included at each stage, in proportion to the discriminating power of the classifier
– More discriminating means more likely to be accepted
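One way to read "more discriminating means more likely to be accepted" is sketched below. It assumes that each primitive is accepted with probability equal to its discriminating power divided by the largest power in the table; this normalisation, the function name and the example labels are illustrative, not taken from the slides.

```python
import random


def select_primitives(power, rng):
    """Randomly accept primitives, favouring the more discriminating ones.

    power -- mapping from primitive label to a non-negative discriminating power
    rng   -- a random.Random instance, passed in so that runs are repeatable
    """
    top = max(power.values(), default=0.0)
    if top <= 0:
        return []
    # A primitive with power d is accepted with probability d / top (assumed rule).
    return [p for p, d in power.items() if rng.random() < d / top]


rng = random.Random(0)
print(select_primitives({"X1": 0.9, "X2": 0.4, "X3": 0.1}, rng))
```

Because the draw is random, repeated authentications can ask for different text, which is the property the slides rely on for robustness to forgery.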
35
Text Generation Process
– Inputs: bag of primitives; list of writers (W1, W2, W3, W4, W5, W6)
– For each primitive: select it or not? Randomness is included in the selection process
– Fix the threshold and reject writers
– The selected threshold is biased towards accepting the writer, for lower false rejection rates
– Output: accuracy
36
Effect of Boosting
Figure: probability vs. distance for X1, showing within-writer and between-writer distances; axis: number of boosting stages
37
Dynamic Time Warping
– Time series alignment using a dynamic programming approach
– Feature vectors of different lengths can be compared
Figures: naive alignment of re-sampled series; DTW alignment
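A minimal sketch of the classic DTW recurrence, to show how sequences of different lengths are compared. It uses the absolute difference of scalar values as the local cost by default; the cost actually used for strokes in this work is not given on the slide, so the `dist` parameter is a placeholder.

```python
def dtw_distance(a, b, dist=lambda p, q: abs(p - q)):
    """Dynamic time warping distance between two sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            # Extend the best of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated 2 can be absorbed by the warping, whereas a naive point-by-point comparison would first need re-sampling to equal length.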
38
Stroke Comparison
Dynamic time warping
– Alignment of strokes is done using dynamic programming
Directional features
– Stroke representation: 12 bins of curvature directions
– Curvature angle: difference between adjacent tangent directions
– Example (12 bin values over the range 0 to 360): 1 1 2 3 3 4 3 0 0 0 0 1
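The directional representation can be sketched as a 12-bin histogram. The sketch assumes that points are `(x, y)` pairs, that the tangent direction is taken per segment, and that the curvature angle is wrapped into the range 0 to 360 degrees before binning in steps of 30 degrees; these details are assumptions, since the slide only names the feature.

```python
import math


def curvature_histogram(points, bins=12):
    """Histogram of curvature angles along a stroke.

    The curvature angle is the difference between the tangent directions
    of adjacent segments, wrapped into [0, 360) degrees.
    """
    tangents = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(points, points[1:])]
    hist = [0] * bins
    width = 360.0 / bins
    for t0, t1 in zip(tangents, tangents[1:]):
        angle = math.degrees(t1 - t0) % 360.0
        # min() guards against rounding that would land exactly on 360.
        hist[min(int(angle // width), bins - 1)] += 1
    return hist
```

Two strokes can then be compared through any distance between their histograms, which avoids the point-level alignment that DTW needs.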
39
Results
– Experiments on English script (20 writers) and Hindi script (10 writers)
– DTW and directional feature extraction methods are used
– Each user wrote about 10-12 words
– 3-fold cross-validation is used
40
Performance Measures
False acceptance rate
– Percentage of forgers who are accepted
– Should be low for forensic applications, where security is the major concern
False rejection rate
– Percentage of genuine users who are rejected
– Should be low for civilian applications, where usability is the major concern
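Both rates follow directly from the comparator distances and the acceptance threshold. The sketch below assumes that a claim is accepted when its distance is below the threshold, as on the verification slide; the function name and the sample values are made up for illustration.

```python
def far_frr(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate) in percent.

    genuine  -- distances for claims made by the true writer
    impostor -- distances for claims made by someone else
    """
    far = 100.0 * sum(d < threshold for d in impostor) / len(impostor)
    frr = 100.0 * sum(d >= threshold for d in genuine) / len(genuine)
    return far, frr


# Illustrative distances only, not data from the experiments.
print(far_frr([1, 2, 3, 6], [4, 5, 7, 8], threshold=5))
```

Raising the threshold lowers the false rejection rate and raises the false acceptance rate, which is the trade-off behind the forensic versus civilian distinction.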
41
False Accept Rate (Directional Features)
42
False Reject Rate (Directional Features)
43
False Accept Rate (DTW)
44
False Reject Rate (DTW)
45
Definition
Threshold-1
– Controls the range of variation within writers
– Decided from positive samples
Threshold-2
– Confidence required before rejecting other writers (negative samples)
– A lower threshold-2 means higher confidence
46
Effect of thresholds (DTW and Hindi script)
47
Effect of thresholds (DTW and Hindi script)
48
No. of word comparisons (DTW and Hindi script)
49
Effect of thresholds (directional features and Hindi script)
50
Effect of thresholds (directional features and Hindi script)
51
Effect of thresholds (directional features and English script)
52
Effect of thresholds (directional features and English script)
53
No. of word comparisons (directional features and Hindi script)
54
No. of word comparisons (directional features and English script)
55
Number of writers vs. accuracy (English script)
56
Number of writers vs. accuracy (Hindi script)
57
Analysis and Summary
– Writer-specific text generation framework
– Automatic text generation
– Automatic threshold generation
– Text is varied
– Robust to forgery
58
Related Work
Features
– Character level: GSC features, structural features, directional features
– Word level: word model recognition, shape curvature, shape context, morphological features
Feature selection
– Static feature selection
– PCA-based discriminating power
Our approach
– Writer-specific text generation
– Boosting-based framework
– Text variation
– Higher accuracy with a limited amount of data
59
Comparison
Srihari et al. [17]
– Features: shape context, shape curvature, GSC features, WMR features
– Performance: 42%, 22%, 62% and 28% respectively (1000 writers)
– Test data size: 10 words
Our approach
– Directional features
– Performance: 95% (20 writers)
– Test data size: 5 words
60
Repudiation Detection in Handwritten Documents
61
Traditional Writer Identification vs. QDE (Questioned Document Examination)
– Assumption of natural handwriting
– Biometrics terms: repudiation (negative biometrics), forgery (positive biometrics)
– Quantity and quality of data available
– Cost factor involved
– Used as expert witness in legal verdicts
62
Repudiation
– "The rejection or renunciation of a duty or obligation (as under a contract)" (Merriam-Webster's Dictionary of Law)
Handwriting repudiation
– A writer deliberately alters his or her natural handwriting to avoid detection
– The aim is to deny involvement in the case
63
Repudiation (1:1 matching, no database)
– Inputs: questioned document and reference document
– Comparator: calculate the distance between them
– Hypothesis testing: is the distance significant?
– Question answered: written by the same writer, or by different writers?
64
Verify whether the given documents were written by the same person or by different persons, without assuming natural handwriting
65
Hard Problem?
Figures: normal handwriting; repudiated handwriting
66
Challenges
– Within-writer variation becomes high
– Between-writer variation becomes low by comparison
– Learning cannot be done because data is not available
67
Ray of Hope
– One cannot exclude from one's own writing those discriminating elements of which one is not aware
– Maximum and minimum velocity points remain the same in spite of changes in absolute velocity
– Words have significant overlap at the sub-character level
68
Framework
– Statistically significant score between two documents
– Utilize online information where it is available
– No assumptions about the distribution of the data, since such assumptions may lead to erroneous conclusions
69
Assumptions
– The questioned and reference documents either overlap significantly or are the same at the word level
– The reference document is collected in online mode
70
System Framework
– Hypothesis testing
– Word segmentation
– Word comparison
71
Hypothesis Testing
– Calculates the significance of the distance between two distributions
– Follows the Neyman-Pearson paradigm
– H0 (null hypothesis): the documents were written by the same writer
– H1 (alternative hypothesis): the documents were written by different writers
– Intra-document word distances and inter-document word distances are the two distributions to be compared
– The distributions are compared to find out whether they are generated from the same population
72
Distribution Comparison
– KL divergence test (makes assumptions about the nature of the distribution)
– Kolmogorov-Smirnov test (makes no assumptions)
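The two-sample Kolmogorov-Smirnov comparison can be sketched with the standard library alone. The statistic is the largest gap between the two empirical distribution functions; the decision rule uses the usual large-sample critical value, which is an assumption here because the slides do not say how significance was computed. The function names and the default `alpha` are illustrative.

```python
import math


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def ks_rejects_same_population(a, b, alpha=0.05):
    """True when H0 (same population) is rejected at level alpha.

    Uses the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-0.5 * ln(alpha / 2)).
    """
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical
```

In the framework of these slides, `a` would hold intra-document word distances and `b` inter-document word distances, and rejecting H0 points towards different writers.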
73
Results
– Data collected from 23 different users in English
– For each user, 3 pages of normal data and 3 pages of repudiated data were collected
– Preprocessing: words are segmented using a semi-automatic toolkit for word segmentation
74
Results
Figure: intra-document distance and inter-document distance
75
ROC Curve
– Genuine rejection of 82% at genuine acceptance of 100%
76
Analysis of Results
– Semi-automatic system, used as an aid to the expert
– The null hypothesis is never accepted without expert intervention
Scale used by forensic experts, from similar (1) to different (0)
– Strong probability of identification
– Probable
– Indications
– No conclusion
– Indications did not
– Probably did not
– Strong probability did not
77
Conclusion and Future Work
– A learning-based framework to learn similarity in spite of discrimination between documents
– Can we tell whether a writer is trying to repudiate?
– A framework that can learn more features and give an independent score for each feature
78
Conclusions
– Proposed algorithms for automatic identification and extraction of discriminating features for online handwriting
– Proposed a framework for writer-specific text generation and text variation for text-dependent systems
– Introduced the problem of repudiation and proposed a hypothesis-testing-based framework for it
79
Publications
– Sachin Gupta and Anoop M. Namboodiri, "Repudiation Detection in Handwritten Documents", Proc. of the 2nd International Conference on Biometrics (ICB'07), pp. 356-365, Seoul, Korea, 27-29 August 2007.
– Anoop M. Namboodiri and Sachin Gupta, "Text Independent Writer Identification from Online Handwriting", International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), 23-26 October 2006, La Baule, Centre de Congress Atlantia, France.
– Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification using Boosting", submitted to International Conference on Frontiers in Handwriting Recognition (ICFHR'08), Montreal, Canada.
– Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification", planned for IEEE Transactions on Information Forensics and Security, 2008.
80
Future Work
– Fusion of online and offline features for higher accuracy
– Can we automatically detect a person's intention to repudiate or forge, based on a single document?
– More robust algorithms for feature extraction, different from standard feature selection approaches
81
IIIT Hyderabad THANKING YOU gupta.sachin25@gmail.com
82
Proposed Framework
Pipeline: online text document, curve extraction, representation, characteristic curve extraction, writer identification
Curve extraction
– Critical points: minimum and maximum velocity points
– Shape curve: the curve between any two consecutive minimum velocity points
– Figures: stroke from Devanagari script; velocity profile of that stroke
Representation (feature indices in brackets)
– Incident angle [1]
– Curvature [2-4]
– Size [5-8]
Consistent primitive (repeating curves) extraction
– Unsupervised learning algorithms; experimental setup: K-means
– Figure: six different clusters extracted from Devanagari script
Notation
– S_j is the j-th primitive
– C_k is the k-th cluster
– W_i is the i-th writer
– [symbol not preserved] is the discriminability of the k-th cluster for the i-th writer
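The clustering step can be illustrated with a plain K-means (Lloyd's algorithm) over curve feature vectors. The sketch assumes that every curve has already been turned into a fixed-length numeric vector, such as the 8 values listed under Representation, and it seeds the centroids with the first `k` vectors for repeatability; both choices, and the function names, are illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Group feature vectors into k clusters; returns (centroids, labels)."""
    centroids = [list(v) for v in vectors[:k]]          # naive seeding (assumed)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

With `k = 6`, the centroids would play the role of the six clusters shown for Devanagari, and a writer's codebook could then record how that writer's curves fall into each cluster.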
83
Number of Writers vs. Accuracy
– Results for Devanagari script
– Accuracy depends on the individuality of the specific writer
Figure: accuracy vs. number of writers
84
IIIT Hyderabad Proposed Framework (example)
85
IIIT Hyderabad Framework (Authentication)
86
Writer-specific Text Generation
Given
– A set of primitives
– Varying discriminating power for different pairs of writers
Aim
– Select the optimal set of weights for the primitives
– Discriminate a specific writer from the others
Dynamic feature selection
– Static feature selection achieves a single optimum
87
Writer-specific Text Generation
– Text variation requires features that are robust to forgery
– Handwriting can have different optima
– Different combinations of handwriting can provide the desired results
88
Boosting Algorithm
– Given: a set of training samples (X) with underlying labels (Y), and a set of weak hypotheses (h)
– Initialize the weight distribution (D) over the training samples
– Select the weak hypothesis h_j such that [selection criterion not preserved]
– m: total number of training samples; t: boosting stage
89
Boosting
– Update the weights [update equation not preserved]
– Final hypothesis [equation not preserved]
– Symbols: weight of the classifier [symbol not preserved]; t: boosting stage; T: total number of boosting stages
90
IIIT Hyderabad Discriminating Power of primitives
91
Text Generation Process
– Figures (probability vs. distance, with the rejected writer marked) for three successive sets of primitives: X1 X2 X3 X4 X5 X6; then X1 X3 X4 X6; then X1 X4
– At each stage: calculate the threshold, then decide whether to select the classifier
– Randomness is included at each stage: each classifier might be rejected, based on its discriminating power
– The threshold is biased towards accepting the writer
– Thresholds are writer-specific
– Rejection at any stage also rejects the claim
92
Why Is Repudiation a Hard Problem?
Figures: normal handwriting (normal writer 1, normal writer 2); repudiated handwriting (repudiated writer 1, repudiated writer 2)
Caption: "I am confused"
93
Word Comparison
– Sub-character information
– DTW matching