Authorship Verification as a One-Class Classification Problem. Moshe Koppel, Jonathan Schler. Bar-Ilan University, Israel. International Conference on Machine Learning (ICML).


1 Authorship Verification as a One-Class Classification Problem
Moshe Koppel, Jonathan Schler
Bar-Ilan University, Israel
International Conference on Machine Learning (ICML), July 2004
Acceptance rate = 65/368 = 18%
[Presenter's note: Good idea; the experiments …; the paper is not detailed enough … (or maybe I did not read it carefully).]

2 Introduction
• Authorship attribution: who wrote it? (>90% accuracy)
• Authorship verification: did he/she write it?
[Presenter's note: Authorship attribution has been done to death …]

3 Authorship Verification Problems
• Lots of negative examples
• Need to construct a set of representative negative examples
• Consider an SVM for Shakespeare vs. non-Shakespeare
• What is non-Shakespeare?

4 Corpus (1/2)
• 21 English books written by 10 different authors
• Each book is larger than 500 KB

5 Corpus (2/2)

6 Training

7 Unmasking Method (1/2)
• Given chunks of an anonymous text (X)
• Given chunks of an author's writing (A)
• Choose the 250 most frequently occurring words in A and X as features
• Transform each chunk in A and X into an SVM feature vector
[Presenter's note: What are the feature values (0/1 …)?]
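The feature-construction step can be sketched as follows. This is a minimal illustration, not the authors' code: the slide does not say what the feature values are (the presenter's note asks the same question), so relative word frequency per chunk is an assumption here, as are the hypothetical helper names `top_words` and `to_vector` and the whitespace tokenization.

```python
from collections import Counter


def top_words(chunks, n=250):
    """Return the n most frequent words over all given chunks (A and X together)."""
    counts = Counter(word for chunk in chunks for word in chunk.lower().split())
    return [word for word, _ in counts.most_common(n)]


def to_vector(chunk, vocab):
    """Map one chunk to a vector of relative frequencies over vocab.

    Assumption: relative frequency is used as the feature value; the slides
    leave this open (binary 0/1 presence would be another option).
    """
    words = chunk.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on an empty chunk
    return [counts[word] / total for word in vocab]
```

Each chunk of A and of X would be passed through `to_vector` with the same vocabulary, so that all vectors share one feature space.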

8 Unmasking Method (2/2)
1. Use an SVM to measure the 10-fold cross-validation accuracy of distinguishing A from X
2. Remove the 6 most discriminating features
3. Go to step 1
4. Plot a graph of accuracy vs. iteration
[Presenter's note: How are the 6 features chosen?]
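The elimination loop above can be sketched in plain Python. Two things in this sketch are stand-ins rather than the method on the slide: a nearest-centroid classifier replaces the SVM so that the example needs no external library, and features are ranked by the absolute gap between the two class centroids because the slide does not say how the 6 features are picked. The function names are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def cv_accuracy(a_rows, x_rows, active, folds=10):
    """k-fold cross-validation accuracy of A vs. X using only `active` features.

    Stand-in classifier: nearest centroid (the slides use an SVM).
    """
    data = [(r, 0) for r in a_rows] + [(r, 1) for r in x_rows]
    random.Random(0).shuffle(data)  # fixed seed keeps the folds reproducible
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [d for i, d in enumerate(data) if i % folds != f]
        c0 = centroid([r for r, y in train if y == 0])
        c1 = centroid([r for r, y in train if y == 1])
        for r, y in test:
            d0 = sum((r[j] - c0[j]) ** 2 for j in active)
            d1 = sum((r[j] - c1[j]) ** 2 for j in active)
            correct += int((1 if d1 < d0 else 0) == y)
    return correct / len(data)


def unmask(a_rows, x_rows, rounds=10, drop=6):
    """Return the accuracy curve: one cross-validation accuracy per round."""
    active = list(range(len(a_rows[0])))
    curve = []
    for _ in range(rounds):
        curve.append(cv_accuracy(a_rows, x_rows, active))
        ca, cx = centroid(a_rows), centroid(x_rows)
        # Assumption: "most discriminating" = largest centroid gap.
        active.sort(key=lambda j: abs(ca[j] - cx[j]), reverse=True)
        active = active[drop:]
        if not active:
            break
    return curve
```

The returned list is the curve that step 4 plots: accuracy on the vertical axis, iteration on the horizontal axis.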

9 Observation
[Figure: cross-validation accuracy vs. iteration for the book "An Ideal Husband", showing the same-author curve and the different-author curves.]
[Presenter's note: This is the key point of the whole paper.]

10 Identify Same-Author Curves (1/2)
• How to tell the difference between same-author and different-author curves?
• Characterize a curve by a feature vector
  - Accuracy after k elimination rounds
  - Accuracy difference between round k and k+1
  - Accuracy difference between round k and k+2
  - k'th highest accuracy drop in one iteration
  - k'th highest accuracy drop in two iterations
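The five feature families listed above can be computed from an accuracy curve as in the sketch below. The function name `curve_features` and the ordering of the features in the output are assumptions, and differences are taken as drops (earlier round minus later round), which the slide does not specify.

```python
def curve_features(curve):
    """Turn an accuracy curve (one value per elimination round) into a feature vector."""
    # Drop in accuracy between round k and round k+1.
    one_step = [curve[k] - curve[k + 1] for k in range(len(curve) - 1)]
    # Drop in accuracy between round k and round k+2.
    two_step = [curve[k] - curve[k + 2] for k in range(len(curve) - 2)]
    return (
        list(curve)                        # accuracy after k elimination rounds
        + one_step                         # difference between round k and k+1
        + two_step                         # difference between round k and k+2
        + sorted(one_step, reverse=True)   # k'th highest one-iteration drop
        + sorted(two_step, reverse=True)   # k'th highest two-iteration drop
    )
```

A curve of n points yields 5n - 6 features, so all curves must come from the same number of elimination rounds to be comparable.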

11 Identify Same-Author Curves (2/2)
• Train an SVM on same-author and different-author curve vectors
[Presenter's note: Representative negative examples are needed …]

12 Classifying

13 Classification
• Given an anonymous text Y
• Apply the unmasking method to Y
• Get the accuracy curve for Y
• Convert the curve to a feature vector
• Classify the new curve vector as same-author or different-author using the SVM

14 Training
[Presenter's note: What on earth is this?]

15 Classifying
[Presenter's note: What on earth is this?]

16 Testing

17 Experiment (1/2)
• Leave-one-book-out test
• Hold out one book (B) from the 21 books
• Apply the unmasking method to each of the remaining 20 books against all authors
• Train an SVM on the resulting same-author and different-author curves
• Apply the unmasking method to B against all authors

18 Experiment (2/2)
• Classify B's curves as same-author or different-author using the trained SVM
[Presenter's note: I have a reasonable suspicion that something is off about this experiment …]
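The leave-one-book-out protocol can be made concrete by enumerating which (book, candidate author) pairs are used for training and which for testing. The sketch below assumes that an author's "known writing" for a given book is that author's other books, and that a pair is skipped when no such books exist; neither detail is stated on the slides, and both function names are hypothetical.

```python
def verification_pairs(book, books):
    """List (book, candidate_author, is_same_author) triples for one book.

    `books` maps book title -> author. A candidate author is included only
    if at least one of their books other than `book` is available.
    """
    pairs = []
    for author in sorted(set(books.values())):
        known = [b for b, a in books.items() if a == author and b != book]
        if known:
            pairs.append((book, author, books[book] == author))
    return pairs


def leave_one_book_out(books):
    """Yield (held_out_book, training_pairs, test_pairs) for every book."""
    for held_out in books:
        # Training curves come only from the remaining books.
        rest = {b: a for b, a in books.items() if b != held_out}
        train = [p for b in rest for p in verification_pairs(b, rest)]
        # Test curves: the held-out book against all authors.
        test = verification_pairs(held_out, books)
        yield held_out, train, test
```

Each pair stands for one unmasking run and therefore one accuracy curve. With a hypothetical layout of 21 books over 10 authors in which exactly one author has a single book, the test pairs add up to 20 same-author and 189 different-author pairs, which matches the counts on the results slide; the actual corpus layout is not given in this transcript.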

19

20 Baseline
• Given chunks of an anonymous text (X)
• Given chunks of an author's writing (A)
• Choose the 250 most frequently occurring words in A as features
• Train an SVM on the chunks of A
• If more than half of the chunks of X are assigned to A, conclude that X was written by the author
[Presenter's note: What about the negative examples?]
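The decision rule in the last bullet is a strict majority vote over chunks. A minimal sketch, assuming the per-chunk predictions are already available as booleans (how the SVM produces them is not described on the slide, and the function name is hypothetical):

```python
def baseline_verdict(chunk_assigned_to_author):
    """Return True if more than half of X's chunks were assigned to author A.

    `chunk_assigned_to_author` is an iterable of booleans, one per chunk of X.
    """
    votes = list(chunk_assigned_to_author)
    return sum(votes) > len(votes) / 2
```

Note that an exact tie counts as "not written by the author", since the rule requires more than half.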

21 Experiment Result
• Unmasking
  - 181 of 189 different-author curves correct
  - 19 of 20 same-author curves correct
  - 95.7% accuracy
• Baseline
  - 143 of 189 different-author curves correct
  - 6 of 20 same-author curves correct
  - 71.3% accuracy
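The overall accuracies follow directly from the per-class counts above, as this short check shows. The function name `accuracy` is only illustrative.

```python
def accuracy(correct_different, total_different, correct_same, total_same):
    """Overall accuracy over all same-author and different-author curves."""
    return (correct_different + correct_same) / (total_different + total_same)


unmasking = accuracy(181, 189, 19, 20)  # 200 of 209 curves correct
baseline = accuracy(143, 189, 6, 20)    # 149 of 209 curves correct
print(f"unmasking: {unmasking:.1%}, baseline: {baseline:.1%}")
```

Because the classes are imbalanced (189 vs. 20), the per-class figures are worth reading alongside the totals: on same-author curves the unmasking method gets 19/20 = 95% right, while the baseline gets only 6/20 = 30%.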

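22 Real World Mystery
The Case of the Bashful Rabbi: Ben Ish Chai
[Figure: cross-validation accuracy vs. iteration, with curves labeled "Ben Ish Chai" and "Others".]
[Presenter's note: Ben Ish Chai was a famous 19th-century Jewish preacher who falsely claimed that his own writings had been found elsewhere.]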

23 Conclusion
• Using accuracy curves to solve the authorship verification problem achieves high accuracy
• The unmasking method can be extended to other fields
[Presenter's note: Redo the experiment!]

