Stylometry Project IT691 & CS615 Computer Information Systems Projects December, 2007.

Team Members
- Geraldine McCabe: Team Leader
- Huriya Manzar: Programmer
- Melissa Connors: Programmer
- Kristina Calix: Implementer
- De Havaland Levy: Quality Assurance

Overview
This program can be used by any researcher attempting to identify the author of an email text message.

Overview of Existing Program
- A pattern recognition system that identifies the author of an arbitrary email using stylometric features
- The existing C# program used raw keystroke data and converted it into simple text files
- It performs feature extraction for statistical analysis, followed by classification using k-nearest neighbor (k-NN)
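The slide does not say which distance measure or value of k the existing C# program used. As a rough illustration only, the Python sketch below shows k-NN classification over stylometric feature vectors, assuming Euclidean distance and a simple majority vote. The names `euclidean` and `knn_classify` and the default `k=3` are hypothetical, not taken from the project.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, unknown, k=3):
    """Predict the author of `unknown` by majority vote of its k nearest samples.

    `training` is a list of (feature_vector, author) pairs (hypothetical layout).
    """
    # Sort known samples by distance to the unknown vector and keep the k closest.
    neighbors = sorted(training, key=lambda item: euclidean(item[0], unknown))[:k]
    votes = Counter(author for _, author in neighbors)
    best = max(votes.values())
    # Walk the neighbors nearest-first so a tied vote goes to the closest author.
    for _, author in neighbors:
        if votes[author] == best:
            return author
```

For example, with two samples per author, `knn_classify(samples, [0.12, 0.22], k=3)` returns whichever author contributes most of the three nearest samples.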

Program Modifications
- Collected a larger data set of plain-text email samples to improve testing accuracy: 10 samples from each of 12 different authors (120 samples in total), averaging 150 words per sample
- Removed the keystroke features from the existing program and added new ones, for a total of 55 stylistic features for extraction
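The slide gives the number of stylistic features (55) but not the list itself. The sketch below is a hypothetical illustration of how a few common stylometric features could be computed from a plain-text email; the feature names and the regular expressions used for words and sentences are assumptions, not the project's actual feature set.

```python
import re


def extract_features(text):
    """Compute a few illustrative stylometric features from plain text.

    Hypothetical examples only; not the project's actual 55 features.
    """
    words = re.findall(r"[A-Za-z']+", text)
    # Rough sentence split on runs of ., ! or ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_chars = len(text)
    n_words = len(words)
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "comma_rate": text.count(",") / n_chars if n_chars else 0.0,
        "uppercase_rate": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }
```

Each email then becomes one fixed-length row of numbers, which is the form a statistical classifier needs.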

Modifications (cont.)
- Added demographic information for each author, at the client's request
- Added a Reset option so that an author's demographic information is entered once and applied to all of that author's samples
- Normalized the feature vector data to the range 0 to 1 and exported it as a CSV file
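The slide states that the feature vectors were normalized to the range 0 to 1 but not which method was used. Assuming per-feature min-max scaling, a minimal Python sketch of the normalization and CSV export could look like this; `min_max_normalize`, `to_csv`, and the `author` column name are hypothetical.

```python
import csv
import io


def min_max_normalize(rows):
    """Scale each column of `rows` (lists of numbers) into the range 0 to 1.

    A column whose values are all equal maps to 0.0 to avoid dividing by zero.
    Also returns the per-column (min, max) pairs so the scaling can be reused.
    """
    columns = list(zip(*rows))
    bounds = [(min(col), max(col)) for col in columns]
    normalized = [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]
    return normalized, bounds


def to_csv(header, labels, rows):
    """Return CSV text with an author column followed by the feature columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["author"] + header)
    for label, row in zip(labels, rows):
        writer.writerow([label] + row)
    return buffer.getvalue()
```

To write a file that Excel can open, the same `csv.writer` calls can target `open(path, "w", newline="")` instead of a `StringIO` buffer.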

Modifications (cont.)
- Enhanced the GUI to remove unnecessary menu options and to provide options for the new modifications

Demonstration
- Plain-text email samples
- Create Base Data Set
- Normalize Base Data Set
- Output normalized data as a CSV/Excel file
- Compare unknown author
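The slides do not say how an unknown author's sample is scaled before it is compared with the base data set. One reasonable assumption is that it reuses the per-feature minimum and maximum recorded from the base data set, so that both sit on the same 0-to-1 scale. The hypothetical helper below does that, clipping values that fall outside the base range.

```python
def normalize_unknown(vector, bounds):
    """Scale an unknown sample's feature vector using the base data set's bounds.

    `bounds` holds one (min, max) pair per feature from the base data set.
    Values outside the base range are clipped so the result stays within 0 to 1.
    """
    scaled = []
    for value, (lo, hi) in zip(vector, bounds):
        # A constant feature in the base data set carries no information; use 0.0.
        x = (value - lo) / (hi - lo) if hi > lo else 0.0
        scaled.append(min(1.0, max(0.0, x)))
    return scaled
```

Recomputing the minimum and maximum with the unknown sample included would shift every stored base-set value, which is why the sketch reuses the stored bounds instead.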

Future Work
- Add features at the client's request. Since formatting plays a big part in stylometry, features such as indentation, the number of blank lines between paragraphs, the number of blank lines between the last sentence and the closing, and the number of spaces after periods (some people type one space, others two) could be added
- Add grammatical features: for example, stylometry experts have noticed that women tend to use adverbs more than men
- Identify gender based on stylistic and linguistic habits
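The formatting features proposed above are simple to count from the raw text, provided the original spacing and line breaks are preserved. The sketch below is a hypothetical illustration covering spaces after periods, blank lines between paragraphs, and indentation; the feature names are assumptions, and the period check is approximate because it also matches abbreviations such as "Mr.".

```python
import re


def formatting_features(text):
    """Count a few layout habits proposed as future stylometric features."""
    lines = text.split("\n")
    # Lengths of the space runs after a period that is followed by more text.
    gaps = [len(g) for g in re.findall(r"\.( +)(?=\S)", text)]
    # Lengths of runs of blank lines that sit between two non-blank lines.
    blank_runs, run, seen_text = [], 0, False
    for line in lines:
        if line.strip():
            if seen_text and run:
                blank_runs.append(run)
            run, seen_text = 0, True
        else:
            run += 1
    indented = sum(1 for line in lines if line.strip() and line[0] in " \t")
    return {
        "two_space_ratio": sum(g >= 2 for g in gaps) / len(gaps) if gaps else 0.0,
        "avg_blank_lines_between_paragraphs": (
            sum(blank_runs) / len(blank_runs) if blank_runs else 0.0
        ),
        "indented_lines": indented,
    }
```

Counting blank lines specifically before the closing would additionally require locating the closing line (for example "Thanks,"), which this sketch does not attempt.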

Questions
Contact gm60518w@pace.edu for more information or visit http://utopia.csis.pace.edu/cs691/2007-2008/team2/index2.htm