Beyond Keyword Filtering for Message and Conversation Detection
David Skillicorn
School of Computing, Queen’s University
Math and CS, Royal Military College

The problem: Pick out the most ‘interesting’ intercepted messages when conventional markers (sender/receiver identities, etc.) are missing.
The solution: Look for correlated use of words that are used with the “wrong” frequency, caused by word substitution to evade keyword filtering.
The technique: Apply singular value decomposition (SVD) and independent component analysis (ICA) to the noun-frequency profiles of messages; suspicious related messages appear as outliers. Messages with ordinary word frequencies, and lone eccentrics, do not show up, so the technique can be applied to large sets of messages to select the interesting few.

THE PROBLEM

Many governments collect and analyze message traffic (e.g. Echelon): email, file and web traffic, cellphone traffic, radio. There are three levels of analysis:

1. Match the content of individual messages against a watch list of words that suggest the message is suspicious. For example, the German Federal Intelligence Service's lists, as of 2000 (certainly changed since), covered nuclear proliferation (2000 terms), arms trade (1000), terrorism (500) and drugs (400).
Countermeasures: use a speech code (hard in real time) or use locutions (“the package is ready”).
Main benefit: it changes the behavior of those who DON’T want their messages intercepted.

2. Look for sets of messages that are connected, forming a conversation, based on some of their properties: sender/receiver identities, time of transmission, specialized word use, etc. (social network analysis).
Countermeasures: conceal the connections between the messages by making sure they share no obvious attributes:
* use temporary addresses, stolen cell phones
* decouple by using intermediaries
* smear time factors, e.g. by using web sites
In general, hide in the background noise.

3. Look for sets of messages that are connected in more subtle ways, through correlation among their properties. Workable countermeasures are hard to find because:
* conversations are about something, so correlation in their content arises naturally
* sensitivity to watch-list surveillance alters the way words are used
We hypothesize that related messages within a threat group, in the context of watch-list surveillance, will be characterized by correlated word use, but that the words will be used with the “wrong” frequencies: common words will be used as if they were uncommon, and uncommon words as if they were common.
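Levels 1 and 2 can be sketched in a few lines of Python. Everything here is hypothetical: the watch-list terms, the message format (dicts with `sender` and `receivers`), and the function names. `watch_list_hits` does whole-word matching for level 1; `conversations` uses a union-find structure to group messages linked, directly or transitively, by a shared identity for level 2. The example calls show why the countermeasures above work: a locution contains no watch-list term, and messages sent from temporary addresses share no identity.

```python
import re

# Level 1: hypothetical watch list (real lists run to thousands of terms).
WATCH_LIST = {"uranium", "centrifuge", "detonator"}

def watch_list_hits(message, watch_list=WATCH_LIST):
    """Return the watch-list terms that occur as whole words in the message."""
    return set(re.findall(r"[a-z]+", message.lower())) & watch_list

# Level 2: group messages that share a sender/receiver identity.
def conversations(messages):
    """Return groups of message indices linked, directly or transitively,
    by a shared identity."""
    parent = list(range(len(messages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # identity -> index of the first message using it
    for i, msg in enumerate(messages):
        for ident in [msg["sender"], *msg["receivers"]]:
            if ident in first_seen:
                parent[find(i)] = find(first_seen[ident])  # merge the two groups
            else:
                first_seen[ident] = i

    groups = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

print(watch_list_hits("Centrifuge parts arrive Tuesday"))  # {'centrifuge'}
print(watch_list_hits("The package is ready"))             # set(): the locution slips through
print(conversations([
    {"sender": "a", "receivers": ["b"]},
    {"sender": "b", "receivers": ["c"]},
    {"sender": "x", "receivers": ["y"]},  # temporary addresses: no shared identity
]))  # [[0, 1], [2]]
```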

THE DATA

Word frequencies in English (and many other languages) follow a Zipf distribution: frequent words are very frequent, and frequency drops off very quickly with rank. We restrict our attention to nouns. In English, the most common noun is “time”; the 3262nd most common noun is “quantum”. We assume that each message is reduced to a frequency histogram of its nouns (this can be done reliably with a part-of-speech tagger).
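The reduction to a noun histogram can be sketched as follows, assuming the message has already been run through a part-of-speech tagger that emits Penn Treebank tags (the noun tags NN, NNS, NNP and NNPS all start with `NN`). The tagger itself is not shown, and the function name and example sentence are hypothetical.

```python
from collections import Counter

def noun_histogram(tagged_tokens):
    """Count the nouns in a tagged message.

    tagged_tokens: iterable of (word, tag) pairs produced by a POS tagger.
    Only tokens whose Penn Treebank tag starts with "NN" are counted.
    """
    return Counter(word.lower() for word, tag in tagged_tokens if tag.startswith("NN"))

tagged = [("The", "DT"), ("time", "NN"), ("is", "VBZ"), ("right", "JJ"),
          (",", ","), ("check", "VB"), ("the", "DT"), ("time", "NN"),
          ("and", "CC"), ("the", "DT"), ("place", "NN")]
print(noun_histogram(tagged))  # Counter({'time': 2, 'place': 1})
```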

A message-frequency matrix has a row for each message and a column for each noun; the entry in row i, column j is the frequency of noun j in message i. The matrix is very sparse. We generate artificial datasets using a Poisson distribution with mean f/(j+1) for column j, where f models the base frequency. We then add 10 extra rows representing the correlated threat messages, using a block of 6 columns of uniformly random 0s and 1s, added at columns 301 to 306.
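A sketch of such an artificial dataset in NumPy, under stated assumptions: the sizes (1000 messages, 1000 nouns) and the base frequency f = 10 are hypothetical choices; noun columns are indexed from j = 0 so that the mean f/(j+1) is defined for the first column; the ten threat rows are drawn from the same Poisson background with the random 0/1 block added on top; and “columns 301 to 306” is read as 1-based, i.e. zero-based indices 300 to 305. The function name `make_dataset` is hypothetical.

```python
import numpy as np

def make_dataset(n_messages=1000, n_nouns=1000, f=10.0, n_threat=10,
                 block_start=300, block_width=6, seed=0):
    """Zipf-like Poisson message-frequency matrix with a planted block.

    Returns (A, threat_rows): A has n_messages + n_threat rows; the last
    n_threat rows get random 0/1 extra counts in the planted columns.
    """
    rng = np.random.default_rng(seed)
    j = np.arange(n_nouns)
    means = f / (j + 1)                          # Poisson mean for column j
    A = rng.poisson(means, size=(n_messages + n_threat, n_nouns))
    block = rng.integers(0, 2, size=(n_threat, block_width))  # uniform 0s and 1s
    A[n_messages:, block_start:block_start + block_width] += block
    threat_rows = np.arange(n_messages, n_messages + n_threat)
    return A, threat_rows

A, threat_rows = make_dataset()
```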

A message-rank matrix has a row for each message; its jth column holds the rank, in English, of the jth most frequent noun in the message. Message-rank matrices have many fewer columns, which makes them easier and faster to work with (e.g. the Enron dataset has 200,000+ ‘words’, but the average number of nouns per message is under 200). Message-frequency matrices have been studied extensively in information retrieval (IR), but message-rank matrices not at all. Message-rank matrices are also robust to countermeasures such as using words with almost the right frequency.
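Building a message-rank matrix can be sketched as below. The rank table `ENGLISH_RANK` is a hypothetical stand-in for a real English noun-frequency list: only the ranks of “time” (1) and “quantum” (3262) come from these slides, the rest are invented. Breaking ties within a message by English rank and padding short rows with 0 are also assumptions of this sketch.

```python
from collections import Counter

# Hypothetical English noun ranks (1 = most common). Only "time" and
# "quantum" are taken from the slides; the other ranks are made up.
ENGLISH_RANK = {"time": 1, "people": 2, "place": 40, "package": 900, "quantum": 3262}

def message_rank_row(nouns, english_rank=ENGLISH_RANK):
    """Column j holds the English rank of the jth most frequent noun in the message."""
    counts = Counter(nouns)
    # Most frequent first; ties broken by English rank (an assumption).
    ordered = sorted(counts, key=lambda w: (-counts[w], english_rank[w]))
    return [english_rank[w] for w in ordered]

def message_rank_matrix(messages, english_rank=ENGLISH_RANK):
    rows = [message_rank_row(m, english_rank) for m in messages]
    width = max(len(r) for r in rows)
    return [r + [0] * (width - len(r)) for r in rows]  # pad short messages with 0

M = message_rank_matrix([
    ["time", "place", "time"],
    ["quantum", "package", "quantum", "people"],
])
print(M)  # [[1, 40, 0], [3262, 2, 900]]
```

The width of the matrix is the number of distinct nouns in the longest message, not the size of the vocabulary, which is where the reduction in columns comes from.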

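[Figure: message-frequency matrix; rows are messages, columns are nouns.]

[Figure: message-rank matrix; rows are messages, columns hold the rank of the jth noun in the message.]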

THE TECHNIQUES

Matrix decompositions. The basic idea:
* Treat the dataset as a matrix, A, with n rows and m columns.
* Factor A into the product of two matrices, C and F: A = C F, where C is n x r, F is r x m, and r is smaller than m.
Think of F as a set of underlying ‘real’ somethings, and C as a way of ‘mixing’ these somethings together to get the observed attribute values. Choosing r smaller than m forces the decomposition to represent the data more compactly.
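One way to obtain A = C F with r smaller than m is a truncated SVD, A ≈ U_r S_r V_r^T, taking C = U_r S_r (n x r) and F = V_r^T (r x m). The sketch below uses NumPy's `np.linalg.svd`; the function name, the random test matrix and r = 3 are arbitrary choices for illustration.

```python
import numpy as np

def truncated_factor(A, r):
    """Return C (n x r) and F (r x m) such that C @ F is the best
    rank-r approximation of A (in the least-squares sense)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    C = U[:, :r] * s[:r]   # scale each kept column of U by its singular value
    F = Vt[:r, :]          # rows of F are orthonormal axes
    return C, F

rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(50, 20)).astype(float)
C, F = truncated_factor(A, r=3)
print(C.shape, F.shape)  # (50, 3) (3, 20)
```

Keeping all 20 components (r = m) reproduces A exactly; smaller r trades accuracy for compactness.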

Two matrix decompositions are useful:
* Singular value decomposition (SVD): the rows of F are orthogonal axes such that the maximum possible variation in the data lies along the first axis, the maximum of what remains along the second, and so on. The rows of C are coordinates in this space.
* Independent component analysis (ICA): the rows of F are statistically independent factors, and the rows of C describe how to mix these factors to produce the original data. Strictly speaking, the rows of C are not coordinates, but we can plot them to get some idea of structure.
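The slides do not say which ICA algorithm was used. As one standard choice, here is a minimal FastICA sketch in NumPy (whitening, symmetric updates, tanh nonlinearity); the function names and defaults are hypothetical. Rows of X are treated as samples. To match A = C F with independent rows of F, one would pass A.T so that each noun is a sample: the columns of the returned S then play the role of the rows of F, and the mixing matrix M plays the role of C, after each message's mean has been removed.

```python
import numpy as np

def _sym_decorrelate(W):
    """W <- (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_components, max_iter=500, tol=1e-8, seed=0):
    """Minimal FastICA. X: (n_samples, n_features).

    Returns S (n_samples, n_components), the estimated independent sources,
    and M (n_features, n_components), the mixing matrix: X - mean ~ S @ M.T.
    """
    Xc = X - X.mean(axis=0)
    # Whitening: project onto the top principal axes, rescale to unit variance.
    d, E = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    top = np.argsort(d)[::-1][:n_components]
    d, E = d[top], E[:, top]
    Z = Xc @ (E / np.sqrt(d))              # whitened data, covariance = identity

    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)               # g(w_i^T z) for every sample and component
        W_new = G.T @ Z / len(Z) - (1 - G**2).mean(axis=0)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # Converged when each new row equals the old one up to sign.
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if done:
            break
    S = Z @ W.T
    M = (E * np.sqrt(d)) @ W.T
    return S, M
```

For production use, an established implementation (for example scikit-learn's `FastICA`) is the safer choice; this sketch only shows the moving parts.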

[Figure: first 3 dimensions, SVD. The messages with correlated unusual word usage are marked with red circles.]

[Figure: first 3 dimensions, ICA.]
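Picking out the outliers in the first three dimensions can be sketched as ranking messages by their distance from the origin in the truncated SVD space. The slides show plots rather than a scoring rule, so the column z-scoring and the distance-based score are assumptions of this sketch, and the name `outlier_scores` is hypothetical.

```python
import numpy as np

def outlier_scores(A, k=3):
    """Score each message (row) by its distance from the origin in the
    first k SVD dimensions of the column-standardized matrix."""
    A = np.asarray(A, dtype=float)
    std = A.std(axis=0)
    std[std == 0] = 1.0                    # constant columns stay at zero
    Z = (A - A.mean(axis=0)) / std         # z-score each noun column
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    coords = U[:, :k] * s[:k]              # message coordinates in k dimensions
    return np.linalg.norm(coords, axis=1)

# Toy check: 45 background messages use none of the rare nouns; 5 messages
# share the same unusual nouns (columns 10 to 13).
A = np.zeros((50, 20))
A[:5, 10:14] = 1
scores = outlier_scores(A)
print(sorted(np.argsort(scores)[::-1][:5].tolist()))  # [0, 1, 2, 3, 4]
```

On realistic data, how cleanly the planted rows separate depends on the background frequencies; the toy matrix only shows the mechanics.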

(Fortunately) both unusual word use and correlated word use are necessary to make such messages detectable.

[Figure: correlation with proper word frequencies, SVD.]

[Figure: correlation with proper word frequencies, ICA.]

So ordinary conversations don’t show up as false positives!

[Figure: uncorrelated, with unusual word frequencies, SVD.]

[Figure: uncorrelated, with unusual word frequencies, ICA.]

Conversations about unusual things don’t show up as false positives either!

This trick permits a new level of sophistication in connecting related messages into conversations when the usual indicators are not available. It does exactly the right thing: it ignores conversations about ordinary topics and conversations about unusual topics, but homes in on conversations about unusual topics that use inappropriate words. Because the dataset is sparse, SVD takes time linear in the number of messages when only the first few dimensions are computed. The complexity of ICA is less clear, but there are direct hardware implementations.
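Why sparsity gives linear cost can be seen in the simplest iterative method: power iteration for the leading singular triplet. Each iteration touches every nonzero twice (once for A v, once for A^T u), so the work per iteration grows with the number of nonzeros, and hence with the number of messages when message length is bounded. This is a pure-Python sketch with a hypothetical dict-per-row storage format; production code would use a Lanczos-type routine such as SciPy's `scipy.sparse.linalg.svds` and compute more than one dimension.

```python
import math
import random

def top_singular_triplet(rows, n_cols, iters=100, seed=0):
    """Largest singular value and vectors of a sparse matrix by power iteration.

    rows: one dict {column: value} per message, holding only the nonzeros.
    Assumes the matrix is not all zeros.
    """
    rnd = random.Random(seed)
    v = [rnd.random() + 0.5 for _ in range(n_cols)]
    for _ in range(iters):
        u = [sum(val * v[j] for j, val in row.items()) for row in rows]  # u = A v
        v = [0.0] * n_cols
        for i, row in enumerate(rows):                                    # v = A^T u
            for j, val in row.items():
                v[j] += val * u[i]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(val * v[j] for j, val in row.items()) for row in rows]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v

sigma, u, v = top_singular_triplet([{0: 3.0}, {1: 1.0}], n_cols=2)
print(round(sigma, 6))  # 3.0
```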

Message-rank matrices are useful because they defend against the countermeasure of rules like “use the word 5 ranks below the one you want to use”. Such rules are easy to apply with access to the internet (for example, the site …), but this isn’t so easy in real-time communication.
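The effect of such a rule on a message-rank row can be sketched with a short ranked noun list. `RANKED_NOUNS` is invented for illustration and is not a real frequency list, and the helper names are hypothetical. Because the substitution is one-to-one, every entry of the row moves up by exactly 5, a systematic change that the message-rank representation records directly; in a message-frequency matrix the same substitution only moves counts into other noun columns.

```python
from collections import Counter

# Hypothetical noun list in English frequency order (index 0 = rank 1).
RANKED_NOUNS = ["time", "people", "way", "year", "day", "thing", "man",
                "world", "life", "hand", "part", "child", "eye", "woman"]
RANK = {w: i + 1 for i, w in enumerate(RANKED_NOUNS)}

def shift_words(nouns, offset=5):
    """Apply the rule 'use the word `offset` ranks below the one you want'."""
    return [RANKED_NOUNS[RANK[w] - 1 + offset] for w in nouns]

def rank_row(nouns):
    """English ranks of the message's nouns, most frequent first (ties by rank)."""
    counts = Counter(nouns)
    return [RANK[w] for w in sorted(counts, key=lambda w: (-counts[w], RANK[w]))]

plain = ["time", "day", "time", "way"]
coded = shift_words(plain)
print(coded)                             # ['thing', 'hand', 'thing', 'world']
print(rank_row(plain), rank_row(coded))  # [1, 3, 5] [6, 8, 10]
```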

The SVD of a message-rank matrix has a fan shape.
[Figure: SVD of a message-rank matrix; points are labelled with the length of each message.]

[Figure: the same plot, with messages labelled by the average rank of the nouns they contain.]
Message length and average rank are correlated, partly because of opportunity (a longer message has more chances to include rare nouns), but it’s not clear that this is the whole story.

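Replacing words with those, say, five positions down the list does not show up in the SVD of a message-frequency matrix.
[Figure: SVD of a message-frequency matrix with substituted words.]

But it’s very clear in the SVD of a message-rank matrix.
[Figure: SVD of a message-rank matrix with substituted words.]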

We have been applying these techniques to the Enron dataset, which is a good surrogate for intercepted communications:
* about 500,000 emails
* about 1500 people
* partially known ‘command and control’ structure
Early results from several groups were presented at the Workshop on Link Analysis, Counterterrorism and Security; see also the New York Times Week in Review this weekend.
