Stylometry Project IT691 & CS615 Computer Information Systems Projects December, 2007.

Team Members
Geraldine McCabe: Team Leader
Huriya Manzar: Programmer
Melissa Connors: Programmer
Kristina Calix: Implementer
De Havaland Levy: Quality Assurance

Overview
This program can be used by any researcher trying to identify the author of a text message.

Overview of Existing Program
The existing program is a pattern recognition system that identifies the author of arbitrary text using stylometric features.
The existing C# program took raw keystroke data and converted it into simple text files.
It performs feature extraction for statistical analysis, followed by classification using k-nearest neighbor (k-NN).
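The classification step assigns an unknown sample to the author whose known samples lie closest to it in feature space. The following is a minimal Python sketch of k-NN with Euclidean distance and majority voting, under stated assumptions. The original program was written in C#, and the slides do not give its distance measure, its value of k, or its data layout, so `knn_classify`, the `(vector, author)` pair format, and the example vectors are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k=3):
    """Return the majority author label among the k nearest training samples.

    train: list of (feature_vector, author_label) pairs (hypothetical layout).
    """
    # Rank every training sample by its distance to the unknown sample
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Majority vote; on a tie, Counter keeps first-seen order, so the closer label wins
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized two-feature vectors
train = [
    ([0.10, 0.20], "Author A"),
    ([0.15, 0.25], "Author A"),
    ([0.90, 0.80], "Author B"),
    ([0.85, 0.90], "Author B"),
]
print(knn_classify(train, [0.12, 0.22], k=3))  # Author A
```

An odd k avoids most voting ties when there are two candidate authors. With 12 authors, ties can still happen, and this sketch breaks them in favor of the nearest sample.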

Program Modifications
A larger data set of plain-text samples was collected to improve the accuracy of testing: 10 samples from each of 12 different authors (120 samples in total), each averaging 150 words.
Keystroke features were removed from the existing program and new features were added, giving a total of 55 stylistic features for extraction.
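The slides do not list the 55 stylistic features. As an illustration, the following sketch computes four commonly used ones from plain text. The feature names, the word and sentence regular expressions, and the `extract_features` function are assumptions made for this example and do not come from the project's code.

```python
import re


def extract_features(text):
    """Compute a few example stylistic features from plain text (hypothetical set)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Treat runs of ., ! or ? as sentence boundaries and drop empty fragments
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1  # guard against division by zero on empty input
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "commas_per_word": text.count(",") / n_words,
    }


print(extract_features("The cat sat. The dog ran, fast!"))
```

Ratio features such as commas per word keep samples of different lengths comparable. This matters here because the samples average about 150 words but are not all the same length.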

Modifications Cont.
Demographic information for each author was added at the client's request.
A Reset option was added so that demographic information can be entered once and reused for multiple samples from the same author.
Feature vector data was normalized to the range 0 to 1 and written out as a CSV file.
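Normalizing to the range 0 to 1 is typically done with min-max scaling, applied separately to each feature column. Without it, features with large raw values would dominate the distance used for classification. The following is a minimal sketch of min-max scaling followed by CSV output. It assumes min-max scaling is the method used, which the slides do not state. The function names and the `author` column header are hypothetical.

```python
import csv
import io


def min_max_normalize(vectors):
    """Scale each feature column of a list of equal-length vectors into [0, 1]."""
    columns = list(zip(*vectors))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    normalized = []
    for v in vectors:
        row = []
        for x, lo, hi in zip(v, mins, maxs):
            # A constant column carries no information; map it to 0.0
            row.append((x - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(row)
    return normalized, mins, maxs


def write_csv(stream, header, labels, rows):
    """Write one row per sample: the author label followed by the scaled features."""
    writer = csv.writer(stream)
    writer.writerow(["author"] + header)
    for label, row in zip(labels, rows):
        writer.writerow([label] + [f"{x:.4f}" for x in row])


norm, mins, maxs = min_max_normalize([[2, 10], [4, 10], [6, 10]])
buf = io.StringIO()
write_csv(buf, ["f1", "f2"], ["A", "B", "C"], norm)
print(buf.getvalue())
```

To produce a file that Excel can open, pass a stream opened with `open("features.csv", "w", newline="")` in place of the in-memory buffer. The returned `mins` and `maxs` should be kept, so that later samples can be scaled on the same basis.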

Modifications Cont.
The GUI was enhanced to remove unnecessary menu options and to add options for the new modifications.

Demonstration
Plaintext samples
Create Base Data Set
Normalize Base Data Set
Output normalized data as a CSV/Excel file
Compare unknown author
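The "Compare unknown author" step involves a detail the slides do not cover. An unknown sample has to be scaled with the base data set's per-feature minimum and maximum, not with its own, or its vector will not be comparable to the base vectors. The following sketch assumes min-max scaling and clips out-of-range values back into [0, 1]. Both choices, and the `apply_normalization` function, are illustrative assumptions rather than the project's documented behavior.

```python
def apply_normalization(vector, mins, maxs):
    """Scale an unknown sample with the base data set's per-feature min and max."""
    scaled = []
    for x, lo, hi in zip(vector, mins, maxs):
        # A feature that was constant in the base set maps to 0.0, as it did there
        value = (x - lo) / (hi - lo) if hi > lo else 0.0
        # An unknown sample can fall outside the base range; clip it back into [0, 1]
        scaled.append(min(1.0, max(0.0, value)))
    return scaled


base_mins, base_maxs = [2.0, 10.0], [6.0, 10.0]
print(apply_normalization([5.0, 12.0], base_mins, base_maxs))  # [0.75, 0.0]
```

Clipping keeps the unknown vector in the same 0-to-1 space as the base data. The alternative, leaving out-of-range values unclipped, preserves how unusual the sample is but lets a single feature dominate the distance.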

Future Work
Add further features at the client's request:
Formatting features: since formatting plays a big part in stylometry, features could be added such as indentation, the number of blank lines between paragraphs, the number of blank lines between the last sentence and the closing, and the number of spaces after periods (some people type one space, others type two).
Grammatical features: for example, stylometry experts have noticed that women tend to use adverbs more than men.
Identify gender based on stylistic and linguistic habits.
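Several of the proposed formatting features can be counted directly from the raw text, as long as whitespace is preserved when samples are collected. The following sketch covers spaces after periods, blank lines, and indented lines. The regular expressions are rough heuristics (for example, they only count a period when the next word starts with a capital letter), and the feature names and `formatting_features` function are hypothetical.

```python
import re


def formatting_features(text):
    """Count a few layout habits proposed as future work (hypothetical feature names)."""
    lines = text.splitlines()
    return {
        # A period followed by exactly one space, then a capital letter
        "one_space_after_period": len(re.findall(r"\. (?=[A-Z])", text)),
        # A period followed by exactly two spaces, then a capital letter
        "two_spaces_after_period": len(re.findall(r"\.  (?=[A-Z])", text)),
        # Empty or whitespace-only lines, e.g. between paragraphs or before the closing
        "blank_lines": sum(1 for line in lines if not line.strip()),
        # Non-empty lines that begin with a space or tab, a rough proxy for indentation
        "indented_lines": sum(
            1 for line in lines if line[:1] in (" ", "\t") and line.strip()
        ),
    }


sample = "Dear Sam.  Thanks.\n\n    I came. It was fine.\n\nBest"
print(formatting_features(sample))
```

These counts depend on sample length, so they would probably need to be turned into rates (per sentence or per line) before joining the existing 0-to-1 feature vector.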

Questions
Contact for more information or visit /team2/index2.htm