Authorship Attribution
CS533 – Information Retrieval Systems
Metin KOÇ, Metin TEKKALMAZ, Yiğithan DEDEOĞLU
7 April 2006


Outline
- Overview: What is Authorship Attribution? A Brief History; Where and How to use it?
- Stylometry: Style Markers
- Classification Methods: Naïve Bayes, Support Vector Machines, k-Nearest Neighbour

What is Authorship Attribution?
The task of determining who wrote a text when its authorship is unclear. It is useful when two or more people claim to have written something, or when no one is willing (or able) to say that he or she wrote the piece. In a typical scenario, a set of documents with known authorship is used for training; the problem is then to identify which of these authors wrote the unattributed documents.

A Brief History
The advent of non-traditional authorship attribution techniques can be traced back to 1887, when Mendenhall first proposed counting features such as word length. His work was followed by Yule (1938) and Morton (1965), who used sentence lengths to judge authorship.

Where to use it?
Authorship attribution can be used in a broad range of applications:
- To analyze anonymous or disputed documents/books, such as the plays of Shakespeare (shakespeareauthorship.com)
- Plagiarism detection: establishing whether claimed authorship is valid

Where to use it? (Cont'd)
- Criminal investigation: Ted Kaczynski was targeted as a primary suspect in the Unabomber case because authorship attribution methods determined that he could have written the Unabomber's manifesto
- Forensic investigations: verifying the authorship of e-mails and newsgroup messages, or identifying the source of a piece of intelligence

Motivation
Many publications exist, but no detailed work has been done for Turkish literature. The idea originated from "Kayıp Yazarın İzi, Elias'ın Gizi" by S. Oğuzertem. Will our work support his idea?

How to do it?
When authors write, they use certain words unconsciously. The goal is to find some underlying 'fingerprint' of an author's style. The fundamental assumption of authorship attribution is that each author has habits in wording that make their writing unique.

How to do it? (Cont'd)
It is well known that certain writers can be quickly identified by their writing style.
- Extract features from text that distinguish one author from another
- Apply a statistical or machine learning technique to training data, showing examples and counterexamples of an author's work

How to do it – Problems
- Highly interdisciplinary area: expertise in linguistics, statistics, text authentication, literature?
- Too many style measures to choose from
- Which statistical method – a complicated one or a simple one? Too many exist in the literature as well

How to do it? (Cont'd)
1. Determine style markers
2. Parse all of the documents and extract the features
3. Combine the results in order to get certain characteristics about the authors
4. Apply each of the statistical/machine learning approaches to assign a given document to the most likely author

Stylometry
The science of measuring literary style. What are the distinguishing styles?
- Study the rarest, most striking features of the writer?
- Study how writers use bread-and-butter words (e.g. "to", "with" etc. in English)?

Stylometry
"People's unconscious use of everyday words comes out with a certain stamp." – David Holmes, stylometrist at the College of New Jersey
"Rare words are noticeable words, which someone else might pick up or echo unconsciously. It's much harder for someone to imitate my frequency pattern of 'but' and 'in'." – John Burrows, emeritus English professor at the University of Newcastle in Australia

Style Markers in Our Study
- Frequency of most frequent words
- Token and type lengths (token: all words; type: unique words). For the sentence "I cannot bear to see a bear" there are 7 tokens and 6 (context-free) types
- Sentence lengths
- Syllable count in tokens
- Syllable count in types
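The token/type counts in the example above can be sketched in a few lines of Python (a minimal illustration, not the extraction pipeline used in the study):

```python
# Count tokens (all words) and types (unique words) in a sentence.
sentence = "I cannot bear to see a bear"

tokens = sentence.lower().split()   # all words, case-folded
types_ = set(tokens)                # unique (context-free) words

print(len(tokens))  # 7 tokens
print(len(types_))  # 6 types ("bear" occurs twice)
```

A real system would also need tokenization rules for punctuation and, for the syllable-count markers, a syllabification routine for the target language.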

Style Markers in General
Some commonly used style markers:
- Average sentence length
- Average syllables per word
- Average word length
- Distribution of parts of speech
- Function word usage
- The type–token ratio
- Word frequencies
- Vocabulary distributions

Test Set
(The test-set tables were figures and are not captured in the transcript.)

Classification Methods
How are the style markers used? Several methods exist, such as:
- k-NN (k-Nearest Neighbour)
- Bayesian analysis
- SVM (Support Vector Machines)
- PCA (Principal Components Analysis)
- Markovian models
- Neural networks
- Decision trees
We are planning to use Naïve Bayes, SVM and k-NN.

Naïve Bayes Approach
In general, each style marker is considered to be a feature or a feature set. Existing text whose author is known is used for training. Several choices are possible for estimating the distributions of the feature values in a text with a known author, such as:
- Maximum likelihood estimation
- Bayesian density estimation
- Expectation–Maximization (EM), etc.

Naïve Bayes Approach
The values of the features (x) for the unattributed text are computed. Since the probability densities are known for each author, the Bayes formula is used to find the author of the "anonymous" text:
A* = argmax_{A_i} P(A_i | x) = argmax_{A_i} p(x | A_i) P(A_i)

An Oversimplified Sample Scenario
Assume that:
- There are texts from two authors (two classes)
- The only style marker is the number of words with 3 characters (one feature)
- The classifier is trained with the texts, and the pdfs (probability density functions) are obtained

An Oversimplified Sample Scenario
Assume the unattributed text has 10 words with 3 characters. Check whether author 1 or author 2 has the higher probability of producing 10 words with 3 characters; the unattributed text is assigned to the author with the higher probability.
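The two-author, one-feature scenario can be sketched as follows. The per-author Gaussian parameters and priors below are invented for illustration; a real system would estimate them from the training texts:

```python
# Two authors, one feature: the number of 3-character words in a text.
# Each author's feature distribution is modelled as a Gaussian (parameters
# here are hypothetical), and classification uses argmax p(x|A_i) P(A_i).
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

authors = {
    "author1": {"mean": 8.0,  "std": 2.0, "prior": 0.5},
    "author2": {"mean": 14.0, "std": 3.0, "prior": 0.5},
}

def classify(x):
    # A* = argmax_{A_i} p(x | A_i) * P(A_i)
    return max(authors,
               key=lambda a: gaussian_pdf(x, authors[a]["mean"], authors[a]["std"])
                             * authors[a]["prior"])

print(classify(10))  # 10 is closer to author1's distribution -> "author1"
```

With several style markers, the naïve Bayes assumption multiplies the per-feature likelihoods together, which is what makes the approach tractable.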

Support Vector Machines (SVMs)
- A supervised learning method for classification and regression
- Quite popular and successful in text categorization (Joachims et al.)
- Seeks a hyperplane separating two classes by maximizing the margin and minimizing the classification error
- The solution is obtained using quadratic optimization techniques

Support Vector Machines (SVMs)
(A sequence of figures, adapted from Andrew Moore's SVM slides, steps through +1/-1 training points, candidate separating lines and the margin; the figures are not captured in the transcript.)
- Support vectors define the hyperplane
- The maximum margin linear classifier is the simplest SVM
- Support vectors lie on the margin and carry all the relevant information

Support Vector Machines (SVMs)
How to find the hyperplane? Move the training data into a higher dimension with kernel functions; the hyperplane may not be linear in the original space. (Figures not captured in the transcript.)

Support Vector Machines (SVMs)
Basis functions are of the form φ(x); the kernel computes K(x, y) = φ(x)·φ(y) without forming φ explicitly. Common kernel functions:
- Polynomial: K(x, y) = (x·y + c)^d
- Sigmoidal: K(x, y) = tanh(κ x·y + θ)
- Radial basis: K(x, y) = exp(−‖x − y‖² / (2σ²))
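These kernels are straightforward to compute directly; in the sketch below the parameter values (c, d, κ, θ, σ) are arbitrary examples, not values from the presentation:

```python
# Standard SVM kernel functions evaluated on plain Python lists.
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    # K(x, y) = (x.y + c)^d
    return (dot(x, y) + c) ** d

def sigmoid_kernel(x, y, kappa=0.01, theta=0.0):
    # K(x, y) = tanh(kappa * x.y + theta)
    return math.tanh(kappa * dot(x, y) + theta)

def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

x, y = [1.0, 2.0], [2.0, 0.5]
print(polynomial_kernel(x, y))  # (1*2 + 2*0.5 + 1)^2 = 16.0
```

In a trained SVM these functions replace the dot products in the decision function, which is how a linear algorithm yields a non-linear boundary in the original space.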

Multi-class SVM
SVM only works for binary classification; how to handle multi-class (N classes) cases? Create N SVMs:
- SVM 1 learns "Output == 1" vs "Output != 1"
- SVM 2 learns "Output == 2" vs "Output != 2"
- …
- SVM N learns "Output == N" vs "Output != N"
When predicting, assign the label of the SVM that puts the input point furthest into the positive region.
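The one-vs-rest scheme can be sketched independently of any particular SVM implementation; the binary decision functions below are hand-made stand-ins for trained SVMs (a real system would use each SVM's signed distance to its hyperplane):

```python
# One-vs-rest prediction: evaluate all N binary decision functions and
# assign the label whose decision value (positive-region depth) is largest.
def one_vs_rest_predict(decision_functions, x):
    # decision_functions: {label: f}, where f(x) > 0 means "x is in this class"
    return max(decision_functions, key=lambda label: decision_functions[label](x))

# Toy 1-D example: three classes centred at 0, 5 and 10 (illustrative only).
decision_functions = {
    "A": lambda x: -abs(x - 0) + 2.5,
    "B": lambda x: -abs(x - 5) + 2.5,
    "C": lambda x: -abs(x - 10) + 2.5,
}

print(one_vs_rest_predict(decision_functions, 6.0))  # "B" scores highest
```

For authorship attribution with N candidate authors, each binary SVM would learn "this author" vs "all other authors".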

SVM Issues
- Choice of kernel functions
- Computational complexity of the optimization problem

k-Nearest Neighbour Classification Method
- Key idea: keep all the training instances
- Given a query example, take a vote amongst its k neighbours
- Neighbours are determined by using a distance function

k-Nearest Neighbour Classification Method
(Figures for k = 1 and k = 4, adapted from Rong Jin's slides, are not captured in the transcript.)
Probability interpretation: estimate p(y|x) as the fraction of the k nearest neighbours of x that have label y.

k-Nearest Neighbour Classification Method
Advantages:
- Training is really fast
- Can learn complex target functions
Disadvantages:
- Slow at query time; efficient data structures are needed to speed up the query
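A minimal k-NN classifier following these ideas might look like this (toy feature vectors and Euclidean distance as the distance function; the labels and points are invented):

```python
# k-NN: store all training instances; classify a query by majority vote
# among its k nearest neighbours under Euclidean distance.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k):
    # train: list of (feature_vector, label) pairs
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [([1.0, 1.0], "author1"), ([1.2, 0.9], "author1"),
         ([4.0, 4.2], "author2"), ([4.1, 3.9], "author2")]

print(knn_predict(train, [1.1, 1.0], k=3))  # "author1"
```

The sort over all training points is what makes naive k-NN slow at query time; tree or hash-based index structures address exactly this cost.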

How to choose k?
Use validation with the leave-one-out method. For k = 1, 2, …, K:
1. Set Err(k) = 0
2. Select a training data point and hide its class label
3. Use the remaining data and the given k to predict the class label for the held-out point
4. Increment Err(k) if the predicted label differs from the true label
5. Repeat until all training examples have been tested
Choose the k whose Err(k) is minimal. In the worked example on the slides, Err(1) = 3, Err(2) = 2 and Err(3) = 6, so k = 2 is chosen.
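The leave-one-out procedure can be sketched with a simple 1-D k-NN predictor (toy data invented for illustration; in the authorship setting each point would be a style-marker vector):

```python
# Leave-one-out selection of k for a k-NN classifier: for each k, hide one
# point at a time, predict it from the rest, count the errors, and keep
# the k with the fewest errors.
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def choose_k(data, max_k):
    errors = {}
    for k in range(1, max_k + 1):
        err = 0
        for i, (x, y) in enumerate(data):       # hide one point at a time
            rest = data[:i] + data[i + 1:]
            if knn_predict(rest, x, k) != y:    # compare prediction to truth
                err += 1
        errors[k] = err                         # Err(k)
    return min(errors, key=errors.get), errors

data = [(1.0, "A"), (1.5, "A"), (2.0, "A"),
        (8.0, "B"), (8.5, "B"), (9.0, "B")]
best_k, errors = choose_k(data, max_k=3)
print(best_k, errors)
```

On this cleanly separated toy data every k achieves zero errors; on real, noisier feature sets the Err(k) values differ and the minimum picks out a useful k, as in the slide's Err(1) = 3, Err(2) = 2, Err(3) = 6 example.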

Future Work & Conclusion
- Preliminary feature distributions seem discriminative
- We will apply the classification methods on the feature set
- We will rank the features by success rate
- We may come up with new style markers