Pixel Visualization of keyword search results in large email databases. Jay Koven Fall 2013.

1 Pixel Visualization of keyword search results in large email databases. Jay Koven Fall 2013

2 Research Overview ● The problem: Both criminal and civil investigations are being overwhelmed with information in the cyber age. ● New techniques are needed to handle the overload ● Visualization of data can provide solutions

3 The Investigative Problem ● Datasets are rapidly growing in size for all types of investigation ○ National Security ○ Criminal ○ Civil ● The Datasets ○ Most investigations focus on communications ○ Emails are the largest portion of these communications ○ Chats, IM, phone logs and other social communication channels are also becoming important.

4 Related Research ● Jigsaw ○ Open Source Investigative tool kit being developed at Georgia Tech. ○ Focus on entity relationships and time relationships ○ Views are traditional

5 Related Research continued ● Daniel Keim ○ Pixel oriented display visualization ○ Large amounts of data can be viewed at once ○ Alternative display methodologies ○ Personal mailbox analysis

6 Related Research continued ● Other Visual Email Analysis Techniques ○ EmailTime SFU Vancouver ■ Plots email relationships over time by sender or by threads ● Run on Enron dataset ● Not sure why ○ Thread arcs - IBM ■ Traces a single thread using arcs to show trends ● Interactive, highlights individuals, can highlight attributes ● Used to analyze trends ○ Graphs and maps ■ Show relationships but not very useful for ultra-large datasets

7 Related Research continued ● Chris North - Use of Large Displays ○ Not specific to email but useful thoughts ● W. Bradford Paley - TextArc ○ Relationships of words in a concordance ○ Images behind my proposal

8 My proposed research ●Pixel Visualization of Large Email Datasets ○ Search by Keywords ○ Multiple displays of returned email sets ■ Entity - Entity ■ Entity - Keyword ■ Keyword - Time ■ Entity - Time ○ Interaction to Refine Search ■ Add / Remove Keywords ■ Add / Remove Entities ■ Limit time frame ○ Interaction to Drill Down to actual messages ■ By Subject ■ By Message Content

9 Key issues to be solved for investigative visualization of emails ●Relative weights of emails must be calculated against some standard ●Visualizations should minimize the distance of related emails between points to show important clusters around entities, keywords and time.
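The slide does not say which standard the weights should be measured against. One possible reading, sketched here purely as an assumption, is to score each email by how often the search keywords occur in it and then divide by the average rate across the whole result set, so that a weight above 1 means "more keyword-dense than typical". The function names `keyword_rate` and `relative_weights` and the sample emails are hypothetical.

```python
def keyword_rate(text, keywords):
    """Fraction of words in the email that are search keywords."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in keywords)
    return hits / len(words)


def relative_weights(emails, keywords):
    """Weight each email relative to the mean keyword rate of the set.

    Assumption: the "standard" is the average rate over all emails.
    Returns all zeros if no email contains any keyword.
    """
    rates = [keyword_rate(email, keywords) for email in emails]
    baseline = sum(rates) / len(rates) if rates else 0.0
    if baseline == 0:
        return [0.0] * len(rates)
    return [rate / baseline for rate in rates]


# Hypothetical sample data, not taken from any real dataset.
emails = ["merger merger talk", "lunch plans today", "merger news"]
weights = relative_weights(emails, {"merger"})
```

Other baselines (a fixed reference corpus, or per-sender averages) would fit the same structure by swapping how `baseline` is computed.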

10 My proposal - “Document Galaxy” ● Basic idea is to treat documents as stars in a circular galaxy ○ Place relevant data points, such as entities, around the outside with associated weights. ○ Place documents inside the galaxy based on relative “attraction” to outside points. ● Possible to have multiple outside rings to add additional attributes to calculations ● User interacts with outside rings to add / remove / move attraction points. ● User can explore contents of inner points and clusters to derive information about document content. ● Colors of documents can be used to show additional attributes
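The placement idea on this slide can be sketched as a weighted average of anchor positions, similar in spirit to the RadViz technique. This is only an illustration under stated assumptions, not the method from the presentation: the function names `anchor_positions` and `place_document`, the single unit-circle ring, and the attraction values are all hypothetical.

```python
import math


def anchor_positions(names, radius=1.0):
    """Spread anchor points (e.g. entities) evenly around a circle."""
    count = len(names)
    return {
        name: (
            radius * math.cos(2 * math.pi * index / count),
            radius * math.sin(2 * math.pi * index / count),
        )
        for index, name in enumerate(names)
    }


def place_document(weights, anchors):
    """Place a document at the weighted mean of the anchors attracting it.

    weights maps anchor name -> non-negative attraction (assumed values).
    A document with no attraction at all sits at the center of the galaxy.
    """
    total = sum(weights.get(name, 0.0) for name in anchors)
    if total == 0:
        return (0.0, 0.0)
    x = sum(weights.get(n, 0.0) * pos[0] for n, pos in anchors.items()) / total
    y = sum(weights.get(n, 0.0) * pos[1] for n, pos in anchors.items()) / total
    return (x, y)


# Hypothetical entities placed on the outer ring.
anchors = anchor_positions(["alice", "bob", "carol", "dave"])
only_alice = place_document({"alice": 3.0}, anchors)
alice_and_carol = place_document({"alice": 1.0, "carol": 1.0}, anchors)
```

In this sketch a document attracted to a single entity lands on that entity's anchor, while one pulled equally by two opposite entities lands at the center. That ambiguity (a central point may mean "balanced" or "unrelated") is one reason moving or removing anchors interactively, as the slide proposes, would matter.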

11 Might look something like this

12 What use is this? ● Might make a good lead-in tool for Jigsaw, reducing the size of the document set to be explored ● Separate tool for exploring e-discovery datasets

