Published by Buddy Hunter. Modified over 9 years ago.
1 Authorship Verification as a One-Class Classification Problem
Moshe Koppel, Jonathan Schler
Bar-Ilan University, Israel
International Conference on Machine Learning (ICML), July 2004
Acceptance rate = 65/368 = 18%
Presenter's note: Good idea; the experiments …; the paper is not detailed enough … (or maybe I did not read it carefully).
2 Introduction
Authorship attribution: who wrote it? (>90% accuracy)
Authorship verification: did he/she write it?
Presenter's note: Authorship attribution has been done to death …
3 Authorship Verification Problems
Lots of negative examples
Need to construct a set of representative negative examples
Consider an SVM for Shakespeare vs. non-Shakespeare
What is non-Shakespeare?
4 Corpus (1/2)
21 English books written by 10 different authors
Each book > 500 KB
5 Corpus (2/2)
6 Training
7 Unmasking Method (1/2)
Given chunks of an anonymous text (X)
Given chunks of an author’s writing (A)
Choose the 250 most frequently occurring words in A and X as features
Transform each chunk in A and X into an SVM feature vector
Presenter's note: What are the feature values (0/1…)?
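The feature-construction step can be sketched in plain Python. This is only an illustration under stated assumptions: the slide does not say what the feature values are, so relative word frequency within a chunk is assumed here, and the helper names (`tokenize`, `split_into_chunks`, `top_words`, `to_vector`) and the tokenization rule are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())

def split_into_chunks(tokens, chunk_size):
    """Cut a token list into consecutive chunks, dropping a short tail."""
    return [tokens[i:i + chunk_size]
            for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]

def top_words(chunks_a, chunks_x, n_features=250):
    """Pick the n_features most frequent words over A and X together."""
    counts = Counter()
    for chunk in chunks_a + chunks_x:
        counts.update(chunk)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n_features]]

def to_vector(chunk, vocabulary):
    """Assumed feature value: relative frequency of each word in the chunk."""
    counts = Counter(chunk)
    total = len(chunk) or 1
    return [counts[word] / total for word in vocabulary]
```

Each chunk of A and X would then be passed through `to_vector` with the shared vocabulary before any classifier is trained.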
8 Unmasking Method (2/2)
1. Use an SVM to measure the 10-fold cross-validation accuracy for A against X
2. Remove the 6 most discriminating features
3. Go to step 1
4. Plot a graph of accuracy vs. iteration
Presenter's note: How are the 6 features chosen?
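The loop in steps 1 to 4 can be sketched as follows. Two things here are assumptions, not the method as presented: a centroid-difference linear classifier stands in for the SVM so that the sketch needs only the standard library, and the six features removed per round are those with the largest absolute weight, since the slide does not say how they are chosen. The function names are hypothetical, and each class is assumed to have at least two chunks.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train_linear(pos, neg):
    """Stand-in for a linear SVM: weights are the centroid difference."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    weights = [p - q for p, q in zip(c_pos, c_neg)]
    # Place the decision boundary halfway between the two centroids.
    bias = -sum(w * (p + q) / 2 for w, p, q in zip(weights, c_pos, c_neg))
    return weights, bias

def predict(model, vec):
    """True means the chunk is assigned to A, False to X."""
    weights, bias = model
    return sum(w * v for w, v in zip(weights, vec)) + bias > 0

def cv_accuracy(a_vecs, x_vecs, folds=10):
    """k-fold cross-validation accuracy for telling A chunks from X chunks."""
    data = [(v, True) for v in a_vecs] + [(v, False) for v in x_vecs]
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [d for i, d in enumerate(data) if i % folds != f]
        model = train_linear([v for v, y in train if y],
                             [v for v, y in train if not y])
        correct += sum(predict(model, v) == y for v, y in test)
    return correct / len(data)

def unmask(a_vecs, x_vecs, rounds=10, drop_per_round=6, folds=10):
    """Return the accuracy curve: one cross-validation accuracy per round."""
    active = list(range(len(a_vecs[0])))  # indices of features still in use
    curve = []
    for _ in range(rounds):
        if not active:
            break
        a = [[v[i] for i in active] for v in a_vecs]
        x = [[v[i] for i in active] for v in x_vecs]
        curve.append(cv_accuracy(a, x, folds))
        # Assumed rule: drop the features with the largest absolute weight.
        weights, _ = train_linear(a, x)
        ranked = sorted(range(len(active)), key=lambda j: -abs(weights[j]))
        dropped = set(ranked[:drop_per_round])
        active = [i for j, i in enumerate(active) if j not in dropped]
    return curve
```

The returned list is the curve that would be plotted as accuracy against iteration. If A and X differ only in a handful of features, the accuracy falls quickly once those features are removed.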
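9 Observation
[Figure: cross-validation accuracy (y-axis) vs. iteration (x-axis) for the book An Ideal Husband, showing the same-author curve and the different-author curves]
Presenter's note: The key point of the whole paper is here.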
10 Identify Same-Author Curves (1/2)
How to tell the difference between same-author and different-author curves?
Characterize a curve by a feature vector:
  Accuracy after k elimination rounds
  Accuracy difference between round k and k+1
  Accuracy difference between round k and k+2
  k'th highest accuracy drop in one iteration
  k'th highest accuracy drop in two iterations
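The five groups of curve features listed above can be computed from an accuracy curve as in this minimal sketch. The sign convention is an assumption: a difference is taken as accuracy at round k minus accuracy at the later round, so a drop is positive. The names `curve_features` and `as_vector` are hypothetical.

```python
def curve_features(curve):
    """Describe an accuracy curve (one accuracy per elimination round)."""
    # Difference between round k and round k+1 (positive means a drop).
    drop_1 = [curve[k] - curve[k + 1] for k in range(len(curve) - 1)]
    # Difference between round k and round k+2.
    drop_2 = [curve[k] - curve[k + 2] for k in range(len(curve) - 2)]
    return {
        "accuracy": list(curve),                     # accuracy after k rounds
        "drop_1": drop_1,
        "drop_2": drop_2,
        "top_drop_1": sorted(drop_1, reverse=True),  # k'th highest one-round drop
        "top_drop_2": sorted(drop_2, reverse=True),  # k'th highest two-round drop
    }

def as_vector(features):
    """Flatten the feature groups into one vector, in a fixed order."""
    order = ["accuracy", "drop_1", "drop_2", "top_drop_1", "top_drop_2"]
    return [value for name in order for value in features[name]]
```

Curves of equal length give vectors of equal length, which is what a classifier trained on same-author and different-author curves needs.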
11 Identify Same-Author Curves (2/2)
Train an SVM on same-author and different-author curve vectors
Presenter's note: Representative negative examples are needed …
12 Classifying
13 Classification
Given an anonymous text Y
Use the unmasking method on Y
Get the accuracy curve for Y
Convert the curve to a feature vector
Classify the new curve vector as same-author or different-author using the SVM
14 Training
Presenter's note: What on earth is this?
15 Classifying
Presenter's note: What on earth is this?
16 Testing
17 Experiment (1/2)
Leave-one-book-out test
Take out one book (B) from the 21 books
Perform the unmasking method on the remaining 20 books against all authors
Train an SVM on same-author and different-author curves
Perform the unmasking method on B against all authors
18 Experiment (2/2)
Classify B’s curves as same-author or different-author using the trained SVM
Presenter's note: I have good reason to suspect that something is off with this experiment …
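One possible reading of the leave-one-book-out protocol is sketched below. It only enumerates which (book, author) pairs would be unmasked for training and for testing. The rules that an author's known writing excludes the book under test and the held-out book, and that authors with no other writing are skipped, are assumptions; the slides do not spell these details out. The function names are hypothetical.

```python
def candidate_pairs(books, test_books, excluded=()):
    """Pair each test book with each author who has other known writing.

    books maps title -> author. A pair is (title, author, is_same_author).
    The author's known writing never includes the test book itself or any
    title in `excluded`.
    """
    pairs = []
    for title in sorted(test_books):
        for author in sorted(set(books.values())):
            known = [t for t, a in books.items()
                     if a == author and t != title and t not in excluded]
            if known:  # skip authors with nothing left to compare against
                pairs.append((title, author, books[title] == author))
    return pairs

def leave_one_book_out(books):
    """Yield (held_out, training_pairs, test_pairs) for every book."""
    for held_out in sorted(books):
        remaining = [t for t in books if t != held_out]
        training_pairs = candidate_pairs(books, remaining, excluded=(held_out,))
        test_pairs = candidate_pairs(books, [held_out])
        yield held_out, training_pairs, test_pairs
```

In each split, the curves from `training_pairs` would train the same-author vs. different-author classifier, and the curves from `test_pairs` would be classified with it.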
20 Baseline
Given chunks of an anonymous text (X)
Given chunks of an author’s writing (A)
Choose the 250 most frequently occurring words in A as features
Train an SVM on the A chunks
If more than half of the X chunks are assigned to A, then X was written by the author
Presenter's note: What about the negative examples?
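The majority-vote rule of the baseline can be sketched like this. Because the baseline trains on A alone, a one-class model is needed. As a clearly labeled stand-in for the SVM, this sketch uses a centroid-plus-radius rule: a chunk is assigned to A if it lies no farther from A's centroid than the farthest A chunk. That rule and all function names are assumptions for illustration, not the baseline as implemented in the paper.

```python
import math

def fit_one_class(a_vecs):
    """Stand-in one-class model: centroid of A and the largest distance to it."""
    center = [sum(col) / len(a_vecs) for col in zip(*a_vecs)]
    radius = max(math.dist(v, center) for v in a_vecs)
    return center, radius

def assigned_to_author(model, vec):
    """A chunk is assigned to A if it falls within the fitted radius."""
    center, radius = model
    return math.dist(vec, center) <= radius

def written_by_author(model, x_vecs):
    """Baseline decision: more than half of the X chunks are assigned to A."""
    votes = sum(assigned_to_author(model, v) for v in x_vecs)
    return votes > len(x_vecs) / 2
```

An exact tie does not count as "more than half", so a text with half of its chunks assigned to A is rejected.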
21 Experiment Result
Unmasking
  181 of 189 different-author curves correct
  19 of 20 same-author curves correct
  95.7% accuracy
Baseline
  143 of 189 different-author curves correct
  6 of 20 same-author curves correct
  71.3% accuracy
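The overall accuracy figures follow from the per-class counts, as this short check shows. The function name is hypothetical; the counts are the ones reported on the slide.

```python
def overall_accuracy(correct_different, correct_same, n_different=189, n_same=20):
    """Fraction of all curves (different-author plus same-author) classified correctly."""
    return (correct_different + correct_same) / (n_different + n_same)

unmasking = overall_accuracy(181, 19)  # 200 of 209 curves
baseline = overall_accuracy(143, 6)    # 149 of 209 curves

print(f"Unmasking: {unmasking:.1%}")   # prints "Unmasking: 95.7%"
print(f"Baseline:  {baseline:.1%}")    # prints "Baseline:  71.3%"
```

The gap is widest on the same-author curves: 19 of 20 for unmasking against 6 of 20 for the baseline.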
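22 Real World Mystery
The Case of the Bashful Rabbi: Ben Ish Chai
[Figure: cross-validation accuracy (y-axis) vs. iteration (x-axis), with a curve for Ben Ish Chai and curves for others]
Presenter's note: Ben Ish Chai was a famous 19th-century Jewish preacher who falsely claimed that his own writings had been found elsewhere.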
23 Conclusion
Using accuracy curves to solve the authorship verification problem achieves high accuracy
The unmasking method can be extended to other fields
Presenter's note: Redo the experiments!