Strategies for Cleaning Organizational Emails with an Application to Enron Email Dataset Yingjie Zhou, Research Assistant, RPI Mark Goldberg, Professor,

Presentation transcript:

Strategies for Cleaning Organizational Emails with an Application to Enron Email Dataset
Yingjie Zhou, Research Assistant, RPI
Mark Goldberg, Professor, RPI
Malik Magdon-Ismail, Associate Professor, RPI
William A. Wallace, Professor, RPI
Supported by the NSF Grants # , # , # , # , and by the ONR Grant # N

6/8/2007 NAACSOS
Outline
- Introduction
- Properties of Organizational Emails
- Difficulties in Cleaning Organizational Emails
- Procedures of Cleaning Organizational Emails
- Introduction to Enron Email Dataset
- Application of Cleaning Procedures to Enron Email Dataset
- Results
- Conclusions and Future Work

Introduction
- Emails
- Organizational emails
- Inter-organizational emails
- Intra-organizational emails
- The features of organizational email data give it potential for various studies
- Email data has its own problems and is noisy

Properties of Organizational Emails
- Emails are formatted, and the format is usually defined and followed.
- Emails are normally stored on a server and can be easily collected.
- Emails are unobtrusive.
- Emails are time stamped.
In addition:
- The senders and recipients of the emails are employees of the organization.
- Each employee is normally assigned one or more unique email addresses within the organizational domain.

Difficulties in Cleaning Organizational Emails
- Multiple email addresses, names, or IDs exist for the same person.
- Duplicate emails exist.
- The content of the email is difficult to extract.

Procedures of Cleaning Organizational Emails
- Map aliases to employees
- Parse last name, first name, and ID in email headers
[Diagram labels: Raw Formats, Extracted Formats (for each of Employee 1, Employee 2, ..., Employee N); Organizational Email Dataset; Generalized Formats]

Procedures of Cleaning Organizational Emails (Cont'd)
- Remove duplicate emails: content + date + recipients
- Consolidate date and time: convert to machine time
- Extract content: signatures, features of parent message, greetings and names
[Diagram labels: Organizational Email Dataset, Generalized Formats, Unique Message Dataset, Remove Duplicates, Employee Email Dataset, Cleaned Employee Email Dataset, Date & Time Consolidation, Content Extraction]
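The duplicate-removal and time-consolidation steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the message layout (a dict with `body`, `date`, and `to` keys), the whitespace and case normalization, and the use of a SHA-1 digest as the comparison key are all assumptions made for the example. The dates used with it are hypothetical.

```python
import hashlib
from email.utils import parsedate_to_datetime


def to_machine_time(date_header):
    """Convert an RFC 2822 style Date header to seconds since the Unix epoch."""
    return int(parsedate_to_datetime(date_header).timestamp())


def dedup_key(body, date_header, recipients):
    """Build a key from content + date + recipients (the three fields named on the slide)."""
    # Assumption: whitespace differences and recipient order/case are not significant.
    normalized_body = " ".join(body.split())
    normalized_rcpts = ",".join(sorted(r.strip().lower() for r in recipients))
    payload = "\x1f".join(
        [normalized_body, str(to_machine_time(date_header)), normalized_rcpts]
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def remove_duplicates(messages):
    """Keep the first message seen for each key; `messages` is a list of dicts."""
    seen = set()
    unique = []
    for msg in messages:
        key = dedup_key(msg["body"], msg["date"], msg["to"])
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```

Converting the date to machine time before hashing means that two copies of a message whose Date headers differ only in how the time zone is written still compare as equal.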

Introduction to Enron Email Dataset
- Federal Energy Regulatory Commission (FERC) posted the Enron email dataset on the web in May of [...]
- [...],446 emails
- Professor Leslie Kaelbling from MIT purchased the dataset
- SRI: integrity and security
- Professor William W. Cohen: CMU dataset
- 150 user folders
- 517,431 emails
- 400 MB

Introduction to Enron Email Dataset (Cont'd)
- Sender
- Receiver/Receivers
- Date + Time
- Subject
- Body
- ? Forwarded or replied text
- ? Signature
- ✗ Attachment
Example message:
    Message-ID:
    Date: Thu, 30 Nov :50: (PST)
    From:
    To:
    Subject: Self Evaluation - Short Version
    Mime-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    X-From: Eugenio Perez
    X-To: Sally Beck
    X-cc:
    X-bcc:
    X-Folder: \Sally_Beck_Nov2001\Notes Folders\All documents
    X-Origin: BECK-S
    X-FileName: sbeck.nsf

    Please let me know if you need anything else.

    Regards,
    Eugenio
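A message with this layout can be split into its fields with Python's standard `email` package. The sketch below is illustrative only: the Message-ID, the addresses, and the time of day are placeholders invented for the example (the slide's own values did not survive), and `extract_fields` is a hypothetical helper name.

```python
from email import message_from_string

# Placeholder message modeled on the slide; ID, addresses and time are invented.
RAW = """\
Message-ID: <placeholder-id@example.com>
Date: Thu, 30 Nov 2000 02:50:00 -0800 (PST)
From: sender@example.com
To: recipient@example.com
Subject: Self Evaluation - Short Version
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Eugenio Perez
X-To: Sally Beck
X-cc:
X-bcc:

Please let me know if you need anything else.

Regards,
Eugenio
"""


def extract_fields(raw):
    """Return the header fields and body that the cleaning steps work on."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "x_from": msg["X-From"],
        "x_to": msg["X-To"],
        "date": msg["Date"],
        "subject": msg["Subject"],
        # For a non-multipart text message the payload is the body as a string.
        "body": msg.get_payload(),
    }
```

Reading both the `From`/`To` and the `X-From`/`X-To` pairs matters because the two sets of headers can carry different forms of the same person's name and address.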

Introduction to Enron Email Dataset (Cont'd)
- From, To, Cc, Bcc
- X-From, X-To, X-cc, X-bcc
Example 1: davis-d\deleted_items\101
    From:
    To:
    X-From: Davis, Mark Dana
    X-To: Davis, Dana
Example 2: cash-m\sent_items\505
    From:
    To: legal
    X-From: Cash, Michelle
    X-To: Taylor, Mark E (Legal)
Slide callouts: "Doesn't make sense!", "Wrong!"

Application of Cleaning Procedures to Enron Email Dataset
Name formats shown for one employee:
    phillip k allen
    phillip allen
    allen, phillip
    allen, phillip k.
    phillip k allen
    allen, phillip
    allen, phillip k.
    “phillip allen”
    phillip
    phillip allen
    “allen, phillip k”
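Variants like these can be collapsed to one key per person. The sketch below is one possible approach under stated assumptions, not the procedure used in the talk: it assumes names appear either as "last, first [middle]" or "first [middle] last", ignores middle names and initials, and uses the hypothetical helper names `normalize_name` and `group_aliases`.

```python
import re
from collections import defaultdict


def normalize_name(raw):
    """Reduce a display name to a (last, first) key, ignoring middle initials."""
    # Lower-case and drop straight/curly double quotes and periods.
    cleaned = re.sub(r'["“”.]', "", raw).strip().lower()
    if "," in cleaned:
        # "last, first [middle]" form
        last, _, rest = cleaned.partition(",")
        given = rest.split()
        return (last.strip(), given[0] if given else "")
    parts = cleaned.split()
    if not parts:
        return ("", "")
    if len(parts) == 1:
        # A single token cannot be split; treat it as a first name.
        return ("", parts[0])
    # "first [middle] last" form
    return (parts[-1], parts[0])


def group_aliases(raw_names):
    """Map each normalized (last, first) key to the set of raw forms seen for it."""
    groups = defaultdict(set)
    for raw in raw_names:
        groups[normalize_name(raw)].add(raw)
    return dict(groups)
```

A bare first name such as "phillip" ends up under its own key here, since it cannot be tied to a last name without extra evidence such as the folder owner or the email address.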

Application of Cleaning Procedures to Enron Email Dataset (Cont'd)
- 150 folders => 156 employees
- 517,431 emails => 252,830 unique emails
- All emails are from the same time zone, and emails with wrong dates are discarded
- 22,241 emails among 156 employees from Nov – Jun
- “Original Message”, “Forwarded by”, “Thanks”, “Regards”, etc.
- Signatures
    Susan S. Bailey
    Senior Legal Specialist
    Enron Wholesale Services
    Legal Department
    1400 Smith Street, Suite 3803A
    Houston, Texas
    phone: (713)
    fax: (713)
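The marker phrases listed above suggest a simple way to isolate the text a sender actually wrote: cut the body at the first quoted-message marker or closing line. The sketch below assumes exactly that rule; the regular expressions and the name `extract_content` are illustrative choices, and a real cleaner would need a longer marker list and signature matching against known employee names.

```python
import re

# Lines that start quoted or forwarded text, e.g. "-----Original Message-----".
QUOTE_MARKER = re.compile(r"^\s*-*\s*(Original Message|Forwarded by)\b", re.IGNORECASE)
# Lines that consist only of a closing word, e.g. "Regards," or "Thanks!".
CLOSING_LINE = re.compile(r"^\s*(Thanks|Regards)\b[\s,.!]*$", re.IGNORECASE)


def extract_content(body):
    """Return the part of the body before any quoted text, closing, or signature."""
    kept = []
    for line in body.splitlines():
        if QUOTE_MARKER.match(line) or CLOSING_LINE.match(line):
            break  # everything from here on is treated as non-content
        kept.append(line)
    return "\n".join(kept).strip()
```

Requiring the closing word to stand alone on its line keeps sentences such as "Thanks for the update" in the extracted content.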

Conclusions and Future Work
Conclusions
- In general, the procedures are practical and served well in cleaning the Enron emails.
Future Work
- Name disambiguation
- Misdirected email detection
- Broadcast email removal
- Various analyses

Thank you! Any Comments?