Evaluation of a Stylometry System on Various Length Portions of Books


1 Evaluation of a Stylometry System on Various Length Portions of Books
Ida Schulstad, Mark Boga, Cranston Jordan, Kara Pally, Vinnie Monaco, Richard DeStefano, John Stewart, and Charles Tappert
Authentication, Identification, Verification

2 Stylometry “Stylometry is the application of the study of linguistic style, usually to written language …” and “… is often used to attribute authorship to anonymous or disputed documents” – Wikipedia

3 Book Text Experiments In this study, stylometry was used to verify the identity of authors. Data: 30 authors and 10 books from each author. System: a previously developed stylometry system, enhanced with additional features. Performance of the stylometry system was determined on these literary texts, in particular the degree to which performance increases with increasing text length.

4 Classification System: Cha’s Dichotomy Model
Used in All of Our Biometric Authentication Systems. The feature space is transformed into a feature-difference space by calculating vector distances between pairs of samples of the same person (intra-person distances) and between pairs of samples of different people (inter-person distances). This reduces the multidimensional feature space to a two-class problem: same person vs. different people. For example, with 3 people and 3 samples each, calculate the differences between samples in terms of distance and plot them. NN can then determine whether a difference is within-class or between-class (same or different person). Figure: transformation from feature space (a) to feature-difference space (b).
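The transformation described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the "vector distance" is the component-wise absolute difference between two feature vectors, and the function name `to_difference_space` and the random toy features are hypothetical.

```python
import random
from itertools import combinations

def to_difference_space(features, labels):
    """Split all sample pairs into intra-person and inter-person differences.

    Assumption: the difference vector is the component-wise absolute
    difference of the two feature vectors.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        diff = [abs(a - b) for a, b in zip(features[i], features[j])]
        if labels[i] == labels[j]:
            intra.append(diff)   # same person
        else:
            inter.append(diff)   # different people
    return intra, inter

# Toy data matching the slide's example: 3 people, 3 samples each.
random.seed(0)
labels = [person for person in range(3) for _ in range(3)]
features = [[random.random() for _ in range(4)] for _ in labels]

intra, inter = to_difference_space(features, labels)
print(len(intra), len(inter))  # 9 intra-person pairs, 27 inter-person pairs
```

With 9 samples there are 36 pairs in total: 3 per person within each class (9 intra-person) and the remaining 27 across classes. A two-class classifier is then trained on these difference vectors rather than on the original features.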

5 Receiver Operating Characteristic (ROC) Curves
Book Text Experiments - #1: The 30 Author Main Experiment. The 10 books of each author were split into 5 for training and 5 for testing. Strong training: the system was trained on the test subjects. EERs for text lengths of 2K, 5K, and 10K words: 34%, 30%, and 25%. Receiver Operating Characteristic (ROC) curves for 250, 500, 1K, 2K, 5K, and 10K words. The Equal Error Rate (EER) decreases as the text length increases.
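The EER is the point on the ROC curve where the false accept rate equals the false reject rate. The sketch below shows one simple way to approximate it from distance scores; it assumes that a pair is accepted as "same author" when its distance is at or below a threshold, and the function name and the toy scores are hypothetical rather than taken from the study.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from two lists of distance scores.

    genuine:  distances between samples of the same author
    impostor: distances between samples of different authors
    A pair is accepted as "same author" when distance <= threshold.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(d > t for d in genuine) / len(genuine)     # false rejects
        far = sum(d <= t for d in impostor) / len(impostor)  # false accepts
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

# Hypothetical toy scores, for illustration only.
genuine = [0.2, 0.3, 0.4, 0.7]
impostor = [0.5, 0.6, 0.8, 0.9]
print(equal_error_rate(genuine, impostor))
```

Sweeping the threshold over every observed score traces out the ROC curve; the EER summarizes that curve as a single number, which is why it is convenient for comparing text lengths.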

6 Receiver Operating Characteristic (ROC) Curves
Book Text Experiments - #2: Strong training on 15 of the authors. Trained on 5 books from each author, tested on the remaining 5. Performance improved with fewer subjects: EERs of ~20% for 10K, 24% for 5K, and 30% for 2K word samples. Receiver Operating Characteristic (ROC) curves for 2K, 5K, and 10K words.

7 Equal Error Rate (EER) vs. Text Length
in Literary Book Texts from 30 Authors. EER decreases logarithmically as a function of text length.
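A logarithmic relationship of this kind can be checked by fitting EER = a + b ln(length) with ordinary least squares. The sketch below uses only the three EER values quoted for the 30-author experiment (34%, 30%, and 25% at 2K, 5K, and 10K words); the slides do not give the authors' fitted curve or the EERs at shorter lengths, so the function name `fit_log` and the resulting coefficients are illustrative assumptions only.

```python
import math

def fit_log(lengths, eers):
    """Least-squares fit of eer = a + b * ln(length); returns (a, b)."""
    xs = [math.log(n) for n in lengths]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(eers) / len(eers)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, eers))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# EERs (percent) quoted for the 30-author experiment.
lengths = [2000, 5000, 10000]
eers = [34.0, 30.0, 25.0]

a, b = fit_log(lengths, eers)
print(f"EER ~ {a:.1f} + ({b:.2f}) * ln(words)")
```

A negative slope b is what "decreases logarithmically" means here: each multiplicative increase in text length lowers the EER by a roughly constant amount. Three points are too few to validate the model, so this is a way to reproduce the shape of the trend, not the study's result.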

