Published by Baldric Parrish. Modified over 9 years ago.
1
Pixel Visualization of Keyword Search Results in Large Email Databases
Jay Koven, Fall 2013
2
Research Overview
● The problem: both criminal and civil investigations are being overwhelmed with information in the cyber age.
● New techniques are needed to handle the overload.
● Visualization of data can provide solutions.
3
The Investigative Problem
● Datasets are rapidly growing in size for all types of investigation:
  ○ National security
  ○ Criminal
  ○ Civil
● The datasets
  ○ Most investigations focus on communications.
  ○ Emails are the largest portion of these communications.
  ○ Chats, IM, phone logs and other social communication channels are also becoming important.
4
Related Research
● Jigsaw
  ○ Open-source investigative toolkit being developed at Georgia Tech
  ○ Focuses on entity relationships and time relationships
  ○ Uses traditional views
5
Related Research (continued)
● Daniel Keim
  ○ Pixel-oriented display visualization
  ○ Large amounts of data can be viewed at once
  ○ Alternative display methodologies
  ○ Personal mailbox analysis
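The core of the pixel-oriented idea is that each data item gets exactly one pixel, so a very large collection fits on one screen. Below is a minimal Python sketch of that mapping only. It makes several assumptions. `snake_layout` is a hypothetical helper. The values are taken to be per-email scores already sorted in some meaningful order, such as by date. The simple row-alternating ("snake") order stands in for Keim's more elaborate arrangements, such as spiral or recursive patterns.

```python
import math

def snake_layout(values):
    """Map each value to one pixel (row, col) in a near-square grid.
    Odd rows run right-to-left so consecutive items stay adjacent on screen.
    Unused cells at the end of the last row are left as None."""
    n = len(values)
    width = max(1, math.ceil(math.sqrt(n)))
    height = math.ceil(n / width) if n else 0
    grid = [[None] * width for _ in range(height)]
    for i, v in enumerate(values):
        row, col = divmod(i, width)
        if row % 2 == 1:
            col = width - 1 - col  # reverse direction on odd rows
        grid[row][col] = v
    return grid

grid = snake_layout([1, 2, 3, 4, 5])
# grid -> [[1, 2, 3], [None, 5, 4]]
```

In a real display each value would be mapped to a color, such as relevance from light to dark. The grid position then encodes order, and the color encodes the attribute.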
6
Related Research (continued)
● Other visual email analysis techniques
  ○ EmailTime (SFU, Vancouver)
    ■ Plots email relationships over time by sender or by thread
      ● Run on the Enron dataset
      ● Not sure why
  ○ Thread Arcs (IBM)
    ■ Traces a single thread using arcs to show trends
      ● Interactive; highlights individuals and can highlight attributes
      ● Used to analyze trends
  ○ Graphs and maps
    ■ Show relationships but are not very useful for ultra-large datasets
7
Related Research (continued)
● Chris North - Use of Large Displays
  ○ Not specific to email, but offers useful ideas
● W. Bradford Paley - TextArc
  ○ Shows relationships of words in a concordance
  ○ Source of the imagery behind my proposal
8
My Proposed Research
● Pixel visualization of large email datasets
  ○ Search by keywords
  ○ Multiple displays of returned email sets
    ■ Entity - Entity
    ■ Entity - Keyword
    ■ Keyword - Time
    ■ Entity - Time
  ○ Interaction to refine the search
    ■ Add / remove keywords
    ■ Add / remove entities
    ■ Limit the time frame
  ○ Interaction to drill down to actual messages
    ■ By subject
    ■ By message content
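The following Python sketch shows how a keyword search, its refinement, and the Keyword - Time display could fit together. It makes several assumptions. `Email`, `search` and `keyword_time_matrix` are hypothetical names. Matching is a plain case-insensitive substring test. Time is binned by calendar month. Each matrix cell stands for one pixel whose value is a hit count.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Email:
    # Hypothetical minimal record; a real corpus carries many more fields.
    sender: str
    sent: date
    subject: str
    body: str

def matches(email, keywords):
    """True if any keyword occurs as a case-insensitive substring of subject or body."""
    text = (email.subject + " " + email.body).lower()
    return any(k.lower() in text for k in keywords)

def search(emails, keywords, start=None, end=None):
    """Keyword search, optionally limited to the inclusive time frame [start, end]."""
    return [
        e for e in emails
        if (start is None or e.sent >= start)
        and (end is None or e.sent <= end)
        and matches(e, keywords)
    ]

def keyword_time_matrix(emails, keywords):
    """One row per keyword, one column per (year, month) present in the results.
    Each cell (one pixel) holds the number of emails matching that keyword in that month."""
    months = sorted({(e.sent.year, e.sent.month) for e in emails})
    col = {m: j for j, m in enumerate(months)}
    grid = [[0] * len(months) for _ in keywords]
    for e in emails:
        j = col[(e.sent.year, e.sent.month)]
        for i, k in enumerate(keywords):
            if matches(e, [k]):
                grid[i][j] += 1
    return months, grid

emails = [
    Email("a@example.com", date(2001, 5, 3), "Q2 budget", "see attached budget"),
    Email("b@example.com", date(2001, 6, 1), "lunch", "pipeline update"),
    Email("a@example.com", date(2001, 6, 9), "budget again", "pipeline and budget"),
]
hits = search(emails, ["budget", "pipeline"])
months, grid = keyword_time_matrix(hits, ["budget", "pipeline"])
# months -> [(2001, 5), (2001, 6)], grid -> [[1, 1], [0, 2]]

# Refining the search: drop a keyword and limit the time frame, then redraw.
refined = search(emails, ["budget"], start=date(2001, 6, 1))
```

The other displays (Entity - Time, Entity - Keyword, Entity - Entity) would follow the same pattern with a different row or column key. Every refinement simply re-runs `search` and rebuilds the matrix.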
9
Key Issues to Be Solved for Investigative Visualization of Emails
● The relative weights of emails must be calculated against some standard.
● Visualizations should minimize the distance between related emails so that important clusters form around entities, keywords and time.
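The slide leaves the weighting standard open. One common choice, swapped in here purely as an assumption, is TF-IDF. Each email is scored by how strongly the query keywords occur in it, and the scores are then divided by the best score, so the top match is the standard (1.0). The function `relative_weights` is hypothetical, and keywords are assumed to be single words. The formula uses a smoothed IDF, log((1 + n) / (1 + df)) + 1.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens made of letters and digits only."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relative_weights(docs, keywords):
    """Score each email by the summed TF-IDF of the (single-word) query keywords,
    then divide by the highest score so the best match is 1.0 and the rest are relative to it."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    scores = []
    for c in counts:
        length = sum(c.values()) or 1
        score = 0.0
        for k in keywords:
            k = k.lower()
            df = sum(1 for other in counts if k in other)  # emails containing k
            idf = math.log((1 + n) / (1 + df)) + 1  # smoothed inverse document frequency
            score += (c[k] / length) * idf
        scores.append(score)
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]

weights = relative_weights(
    ["budget budget meeting", "pipeline update", "budget and pipeline"],
    ["budget"],
)
# weights[0] is 1.0, weights[1] is 0.0, weights[2] is about 0.5
```

Normalizing by the top score keeps weights comparable across different searches. The trade-off is that one very strong match compresses all other weights toward zero.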
10
My Proposal - “Document Galaxy”
● The basic idea is to treat documents as stars in a circular galaxy.
  ○ Place relevant data points, such as entities, around the outside with associated weights.
  ○ Place documents inside the galaxy based on their relative “attraction” to the outside points.
● Multiple outer rings can be used to add further attributes to the calculations.
● The user interacts with the outer rings to add / remove / move attraction points.
● The user can explore the contents of inner points and clusters to derive information about document content.
● Colors of documents can be used to show additional attributes.
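One way to read the “attraction” placement is as a weighted average of the outer points' positions. That is essentially the RadViz technique, used here as an assumption rather than as the author's stated method. In the Python sketch below, `ring_anchors` and `place_document` are hypothetical names. Anchors are spread evenly on a unit circle, and attraction weights are assumed non-negative, for example the relative weights of a document toward each entity or keyword.

```python
import math

def ring_anchors(names, radius=1.0):
    """Spread the attraction points (entities, keywords, ...) evenly around the outer ring."""
    n = len(names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(names)
    }

def place_document(attraction, anchors):
    """Place one document ("star") at the attraction-weighted average of the anchors.
    With non-negative weights this is a convex combination, so it stays inside the ring.
    Anchors missing from `attraction` get weight 0."""
    total = sum(attraction.get(name, 0.0) for name in anchors)
    if total == 0:
        return (0.0, 0.0)  # no attraction at all: park the document at the center
    x = sum(attraction.get(name, 0.0) * px for name, (px, py) in anchors.items()) / total
    y = sum(attraction.get(name, 0.0) * py for name, (px, py) in anchors.items()) / total
    return (x, y)

anchors = ring_anchors(["alice", "bob", "fraud", "merger"])
only_alice = place_document({"alice": 1.0}, anchors)           # sits on the "alice" anchor
between = place_document({"alice": 1.0, "bob": 1.0}, anchors)  # midpoint of the chord, about (0.5, 0.5)

# Removing an attraction point means rebuilding the ring and re-placing every document.
smaller_ring = ring_anchors(["alice", "fraud", "merger"])
```

This kind of placement has one known limitation. A document pulled equally by opposite anchors lands at the center, just like a document with no attraction at all. Clusters near the middle therefore need drill-down or color to be told apart.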
11
Might look something like this
12
What Use Is This?
● It might make a good lead-in tool to add to Jigsaw, reducing the size of the document set to be explored.
● It could be a separate tool for exploring e-discovery datasets.