Published by Bertha Greer. Modified over 9 years ago.
1
Social Networks and Surveillance: Evaluating Suspicion by Association
Ryan P. Layfield, Dr. Bhavani Thuraisingham, Dr. Latifur Khan, Dr. Murat Kantarcioglu
The University of Texas at Dallas
{layfield, bxt043000, lkhan, muratk}@utdallas.edu
2
Overview
Introduction
► Our Goal
► System Design
► Social Networks
► Threat Detection
► Correlation Analysis
The Experiment
► Setup
► Current Results
► Issues
► Future Work
3
Introduction
Automated message surveillance is essential to communication monitoring:
► Widespread use of electronic communication
► Exponential data growth
► Impossible to sift through all messages ‘by hand’
Going beyond basic surveillance:
► Identifying groups rather than individuals
► Monitoring conversations rather than messages
4
Our Goal
Design new techniques and apply existing algorithms to:
► Create a machine-understandable model of existing social networks
► Identify abnormal conversations and behavior
► Monitor a given communication system in real time
► Continuously learn and adapt to a dynamic environment
5
System Design
Three major components:
► Social Network Modeler
► Initial Activity Detector
► Correlated Activity Investigator
6
Social Networks
Individuals engaged in suspicious or undesirable behavior rarely act alone.
We can infer that those associated with a person positively identified as suspicious have a high probability of being either:
► Accomplices (participants in the suspicious activity)
► Witnesses (observers of the suspicious activity)
Under these assumptions, we create a context of association between users of a communication network.
7
Social Networks
Within our model:
► Every node is a unique user
► Every message creates or strengthens a link between nodes
Over time, the network changes:
► Frequent communication leads to stronger links
► Intermittent messaging implies weakening social ties
The strength of a link indicates how strongly two individuals are associated.
From this data, we can theoretically identify:
► Hubs
► Groups
► Liaisons
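The slide describes links that are created or strengthened by each message and that weaken when messaging becomes intermittent. A minimal sketch of such a weighted association graph follows, assuming a fixed per-message reinforcement and a multiplicative decay per time step; the class name `SocialNetwork` and the parameters `REINFORCE` and `DECAY` are hypothetical, since the slides do not say how link strength is computed. The `hubs` method uses total link strength as one simple hub heuristic, not the authors' method.

```python
from collections import defaultdict


class SocialNetwork:
    """Weighted, undirected association graph with one node per user.

    REINFORCE and DECAY are hypothetical illustration parameters; the
    slides do not specify how link strength evolves over time.
    """

    REINFORCE = 1.0  # hypothetical: strength added to a link per message
    DECAY = 0.9      # hypothetical: multiplier applied to every link per time step

    def __init__(self):
        # frozenset({a, b}) -> current link strength
        self.links = defaultdict(float)

    def record_message(self, sender, recipients):
        # Every message creates or strengthens a sender-recipient link.
        for r in recipients:
            if r != sender:
                self.links[frozenset((sender, r))] += self.REINFORCE

    def tick(self):
        # Intermittent messaging weakens ties: decay all links once per step.
        for key in self.links:
            self.links[key] *= self.DECAY

    def strength(self, a, b):
        # .get() avoids inserting a zero-strength link for unknown pairs.
        return self.links.get(frozenset((a, b)), 0.0)

    def hubs(self, top=1):
        # Simple hub heuristic: users with the largest total link strength.
        totals = defaultdict(float)
        for pair, weight in self.links.items():
            for user in pair:
                totals[user] += weight
        return sorted(totals, key=lambda u: (-totals[u], u))[:top]
```

For example, after `alice` messages `bob` twice and `carol` once, followed by one `tick()`, the alice-bob link (1.8) is stronger than the alice-carol link (0.9), and `alice` is reported as the hub. Decaying every link on each tick keeps the rule simple at the cost of touching every edge; a timestamp-based decay computed lazily would be a cheaper alternative.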
8
Social Networks
9
Threat Detection
Every message sent is scrutinized to identify suspicious communication, using:
► Keyword analysis
► Prior context (i.e., previous message content)
When a detection algorithm yields a strong result, a token is created:
► The token is created at the origin and passed to the recipient(s)
► Existing tokens, if any, are cloned instead
The result is a web that potentially reflects the dissemination of suspicious information.
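The token mechanism above can be sketched as follows, under one reading of the slide: when the initial detector fires, the sender's existing tokens are passed on if it holds any, and otherwise a new token is created at the sender first. The keyword-fraction detector, the `threshold` value, and the names `Token`, `keyword_score`, and `TokenWeb` are all hypothetical stand-ins, not the authors' detection algorithm. Because tokens are immutable here, a "clone" is recorded by adding the same token to the recipient's set, so every copy shares one lineage id.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass

_lineage_ids = itertools.count(1)


@dataclass(frozen=True)
class Token:
    lineage: int       # shared by a token and all of its clones
    origin_text: str   # the message that triggered the initial detection


def keyword_score(text, keywords):
    # Hypothetical initial detector: fraction of suspicious keywords present.
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords)


class TokenWeb:
    def __init__(self, keywords, threshold=0.5):
        self.keywords = keywords
        self.threshold = threshold     # hypothetical "strong result" cutoff
        self.held = defaultdict(set)   # user -> tokens held by that user
        self.edges = []                # (sender, recipient, lineage) hops

    def on_message(self, sender, recipients, text):
        if keyword_score(text, self.keywords) < self.threshold:
            return  # detector did not yield a strong result
        tokens = self.held[sender]
        if not tokens:
            # No existing token: create one at the origin.
            tokens.add(Token(next(_lineage_ids), text))
        # Pass the sender's tokens (new or existing) to each recipient.
        for r in recipients:
            if r == sender:
                continue
            for tok in tokens:
                self.held[r].add(tok)
                self.edges.append((sender, r, tok.lineage))
```

The `edges` list is the "web" from the slide: replaying it shows how a single suspicious message's token spread from user to user.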
10
Correlation Analysis
Later messages on similar suspicious topics are not always identifiable with the same ‘initial’ techniques, because of:
► Quick replies
► Pronoun use
► The assumption that the recipient is already aware of the topic
If a token is present at the sender when a message is sent:
► The message the token is associated with and the new message are analyzed together
► If the analysis yields a strong match, the token is cloned again and passed to the recipient(s)
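The correlation step above can be sketched as a function that checks each token held by the sender against the outgoing message and clones only the tokens whose associated message matches strongly. Word-set Jaccard similarity and the `threshold=0.2` value are hypothetical placeholders (the later slides describe a word-frequency correlation instead), and `held` is assumed to map each user to a set of `(lineage, origin_text)` tuples.

```python
def jaccard(a, b):
    # Hypothetical similarity stand-in: overlap of lower-cased word sets.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def propagate(held, sender, recipients, text, threshold=0.2, similarity=jaccard):
    """Clone each of the sender's tokens whose message matches `text`.

    `held` maps user -> set of (lineage, origin_text) tuples and is updated
    in place. Returns the (recipient, lineage) pairs that were passed on.
    """
    passed = []
    # Copy to a list so adding tokens cannot disturb the iteration.
    for lineage, origin_text in list(held.get(sender, ())):
        if similarity(origin_text, text) >= threshold:
            for r in recipients:
                held.setdefault(r, set()).add((lineage, origin_text))
                passed.append((r, lineage))
    return passed
```

With `bob` holding a token for "the shipment arrives at midnight", a reply "when does the shipment arrive" shares 2 of 8 distinct words (similarity 0.25) and is passed on, while "see you later" shares none and is not. The `similarity` parameter is there so a better measure can replace the Jaccard placeholder without touching the propagation logic.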
11
The Experiment
Rare words shared between two or more messages are candidates for keyword analysis, but they are not always easily separated from ‘noise’.
Noise within text-based messages comes in a variety of forms:
► Misspelled words
► Unusual word choice
► Incompatible variations of the same language (e.g., British vs. American English)
► Unexpected language
However, we do not want to eliminate potential keywords such as:
► Document names
► Terminology specific to a subject
► ‘Buzz’ words
12
The Experiment
We propose an experiment that attempts to eliminate false positives caused by noisy data while strengthening and expanding our correlation techniques.
13
Setup
Tools:
► Running word ‘rank’ database
► Implementation of word set theory infrastructure
► JAMA matrix library (Singular Value Decomposition)
Our approach:
► Apply SVD noise filtering to the last 100 messages
► Analyze word frequency correlation between the current message and prior suspicious messages
► Generate a score based on the results
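The "running word rank database" listed above is not described further in the slides. A minimal in-memory sketch, assuming it simply keeps global word counts updated after every message, might look like this; the class name `RunningWordRank` and its methods are hypothetical, and the original system presumably used a persistent database rather than a `Counter`.

```python
from collections import Counter


class RunningWordRank:
    """Hypothetical running word-rank store: global counts updated per message."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0  # total number of words seen so far

    def update(self, text):
        words = text.lower().split()
        self.counts.update(words)
        self.total += len(words)

    def frequency(self, word):
        # Relative frequency of `word` across every message seen so far.
        return self.counts[word] / self.total if self.total else 0.0

    def rank(self, word):
        # 1 = most common word; unseen words rank after every seen word.
        ordered = [w for w, _ in self.counts.most_common()]
        if word in self.counts:
            return ordered.index(word) + 1
        return len(ordered) + 1
```

Seeding the store with an initial batch of messages before scoring, as the results slide describes for the first 100 messages, gives the frequencies a chance to settle; `rank` recomputes the full ordering on each call, which is fine for a sketch but would need caching for real-time use.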
14
Setup
Construct a matrix based on the last 100 messages.
[Figure: matrix with axes labeled "words" and "messages"; words ordered from more common to less common]
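Based on the figure labels, the matrix appears to count words per message, with words ordered from more common to less common. The sketch below assumes rows are words and columns are messages, holding raw occurrence counts, and keeps a sliding window of the last 100 messages with a bounded `deque`; the orientation, the use of raw counts, and the names `build_matrix`, `WINDOW`, and `recent` are assumptions.

```python
from collections import Counter, deque


def build_matrix(window):
    """Word-by-message count matrix for the messages in `window`.

    Rows are words sorted from more common to less common across the
    window (ties broken alphabetically); columns are messages, oldest first.
    """
    docs = [Counter(m.lower().split()) for m in window]
    totals = Counter()
    for d in docs:
        totals.update(d)
    vocab = sorted(totals, key=lambda w: (-totals[w], w))
    matrix = [[d[w] for d in docs] for w in vocab]
    return vocab, matrix


WINDOW = 100                   # the slides use the last 100 messages
recent = deque(maxlen=WINDOW)  # appending message 101 evicts message 1
```

For `["a b", "a c", "a"]` the vocabulary is `["a", "b", "c"]` and the matrix is `[[1, 1, 1], [1, 0, 0], [0, 1, 0]]`. Rebuilding the whole matrix for every new message is simple but costly, which matches the construction cost listed under Issues.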
15
Setup
Decompose and rebuild: $A = U \Sigma V^{T}$
Eliminate ‘weak’ singular values before rebuilding $A$
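The decompose-and-rebuild step corresponds to a truncated SVD: keep the $k$ largest singular values of $A = U \Sigma V^{T}$, set the rest to zero, and multiply the factors back together. The original system used JAMA in Java; this sketch uses NumPy's `numpy.linalg.svd` instead, and the cutoff `k` (the name `svd_filter` too) is hypothetical, since the slides only say that "weak" singular values are eliminated.

```python
import numpy as np


def svd_filter(A, k):
    """Rebuild A from its k strongest singular values (rank-k approximation).

    k is a hypothetical cutoff; the slides only say that 'weak' singular
    values are eliminated.
    """
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    s = s.copy()
    s[k:] = 0.0                   # singular values come sorted largest first
    return U @ np.diag(s) @ Vt    # A_k = U Σ_k V^T
```

Keeping every singular value reproduces $A$ exactly (up to rounding), while keeping only the first gives the closest rank-1 matrix in the Frobenius norm, so the reconstruction error equals the root of the sum of the discarded squared singular values.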
16
Setup
[Figure: scoring formula; annotations:]
► Pulled from messages j and k
► ‘Raw’ total score for word $w_i$
► Pulled from ‘running’ word database
► Counts only intersection of words
► Predefined fixed threshold
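The scoring formula itself did not survive extraction; only its annotations did. The following is one plausible form consistent with those annotations, not the authors' formula: sum over words shared by messages $j$ and $k$, let each shared word $w_i$ contribute its smaller count in the two messages divided by its running global frequency, and compare the total against a fixed threshold. The weighting, the `THRESHOLD` value, and the names `correlation_score` and `is_correlated` are all hypothetical.

```python
from collections import Counter

THRESHOLD = 5.0  # hypothetical fixed threshold; the slides give no value


def correlation_score(msg_j, msg_k, frequency):
    """Hypothetical score: shared words weighted by how rare they are globally.

    Only words in the intersection of the two messages contribute. Each
    shared word w_i adds min(count in j, count in k) / frequency(w_i), where
    `frequency` returns the word's running global frequency.
    """
    cj = Counter(msg_j.lower().split())
    ck = Counter(msg_k.lower().split())
    score = 0.0
    for w in cj.keys() & ck.keys():
        f = frequency(w)
        if f > 0:  # skip words absent from the running database
            score += min(cj[w], ck[w]) / f
    return score


def is_correlated(msg_j, msg_k, frequency, threshold=THRESHOLD):
    return correlation_score(msg_j, msg_k, frequency) >= threshold
```

With a running frequency of 0.5 for "the" and 0.01 for "manifest", the pair "the manifest is ready" / "send the manifest" scores 2 + 100 = 102, while "the cat" / "the dog" scores only 2. Dividing by frequency is one way to give uncommon words more weight; whether it avoids the fluctuations reported in the results is untested here.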
17
Current Results
Method is not currently accurate
Large fluctuations:
► Correlation easily swayed by the plethora of common words
► Uncommon words not given enough weight
18
Current Results
[Figure] 1000 messages evaluated; the first 100 were used to seed word ranks.
19
Issues
Word frequencies fluctuate wildly at the beginning of the experiment (0.0 to 10.0+)
Extreme cost of the current construction methods and computation
Filtering context limited to recent global history
Affected by large bodies of text
20
Future Work
Tap the potential of the existing matrix for further analysis
Adaptive filtering feedback algorithms
Speed improvements to accommodate real-time streams
Flexible communication platform monitoring
Addition of a pipe architecture for modular threat detection and correlation