Published by Vivian Webb. Modified over 9 years ago.
1
Crime Scene Investigation: SMS Spam Data Analysis
Ilona Murynets, AT&T Security Research Center, New York, NY, ilona@att.com
Roger Piqueras Jover, AT&T Security Research Center, New York, NY, roger.jover@att.com
IMC’12, November 14–16, 2012, Boston, Massachusetts, USA.
4
SMS spam:
– consumes network resources that would otherwise be used for legitimate services
– costs money for users who pay on a per-received-message basis
– exposes smartphone users to viruses
– enables fraudulent messaging activities such as phishing, identity theft and fraud
This paper: the analysis is used for an SMS spam detection engine.
5
Outline
Three data sets for analysis
Data analysis
– Account information
– Messaging Abuse: Response ratio; Message timing and time series
– The Scene of the Crime: Location & targets; Mobility
– Hardware choice
– Voice and IP traffic
6
Three data sets: SMS (spammers), cell (legitimate cellphone users) and M2M (machine-to-machine systems)
Source: a tier-1 cellular operator
Call Detail Records (CDR) of 9,000 SMS spammers and 17,000 legitimate accounts (cell and M2M)
Mobile Originated (MO): transmitting party
Mobile Terminated (MT): receiving party
Spammers were identified and disconnected from the network.
SMS: prepaid; cell: postpaid; M2M: TAC
7
three data sets for analysis
9
Notes: In all the figures throughout the paper, legitimate cellphone users, M2M systems and spammers (SMS) are represented in green, blue and red, respectively.
10
Account information
Nearly all spammers (99.64%) use pre-paid accounts with unlimited messaging plans.
SIM cards are constantly switched to circumvent detection schemes: spammers discard a SIM once its account is canceled and continue with a new one.
The average age of a spammer account is 7 to 11 days (a legitimate account is several months to a couple of years old).
12
Messaging Abuse
13
Spammers generate a large load of messages.
Spammers not only send but also receive more messages than legitimate customers do:
– opt-out replies
– replies to messages that trick the recipient
14
Messaging Abuse
Actual spam messages often attempt to trick the recipient into replying. Although only a small percentage of users reply, the large number of accounts targeted in a spam campaign results in many responses.
15
Messaging Abuse
16
Legitimate accounts have a small set of recipients (7 on average).
Spammers hit a couple of thousand victims.
Legitimate users send multiple messages to a small set of destinations.
Spammers send one message to each victim.
18
Response ratio
19
Legitimate users: messages are sent sequentially, each in response to a previous message, so the response ratio is close to 1.
Spammers: the number of MT (received) SMSs is very small in proportion to the number of transmitted messages, so the response ratio is close to 0.
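The response ratio can be sketched in a few lines. This is a minimal illustration, assuming the ratio is the number of received (MT) messages divided by the number of sent (MO) messages for an account; the slides do not give the exact formula. The record layout, the toy data and the name `response_ratio` are hypothetical.

```python
def response_ratio(records, account):
    """Return received/sent message counts for `account`.

    `records` is a list of (sender, receiver) pairs, a hypothetical
    simplification of CDR rows. Returns None if the account sent nothing.
    """
    sent = sum(1 for sender, _ in records if sender == account)
    received = sum(1 for _, receiver in records if receiver == account)
    return received / sent if sent else None


# Toy data (made up for illustration): a two-way conversation and a spam burst.
records = [
    ("alice", "bob"), ("bob", "alice"),
    ("alice", "bob"), ("bob", "alice"),
    ("spammer", "v1"), ("spammer", "v2"), ("spammer", "v3"), ("spammer", "v4"),
    ("v2", "spammer"),
]
print(response_ratio(records, "alice"))    # conversational account
print(response_ratio(records, "spammer"))  # mostly one-way traffic
```

With this toy data the conversational account sits at 1 and the one-way sender sits much closer to 0, which mirrors the contrast described on the slide.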
21
Message timing and time series
23
Inter-SMS intervals for spammers are short and less random (low entropy).
Intervals for legitimate messages are longer, since messages are sent less frequently, and more random (higher entropy).
Messaging activities of certain M2M devices are prescheduled.
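One way to put a number on "how random" the inter-SMS intervals are is the Shannon entropy of the binned intervals. The sketch below is an assumption about how such a feature could be computed, not the paper's method: the bin width `bin_seconds`, the function name `interval_entropy` and the timestamps are all hypothetical.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of the gaps between consecutive messages.

    `timestamps` are send times in seconds, sorted ascending. Gaps are
    grouped into bins of `bin_seconds` (a hypothetical choice) before the
    entropy of the bin distribution is computed.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        return 0.0
    bins = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Made-up examples: a machine-like sender and a human-like sender.
regular = [0, 10, 20, 30, 40]        # one message every 10 seconds
irregular = [0, 30, 400, 460, 2000]  # gaps of very different lengths
print(interval_entropy(regular))
print(interval_entropy(irregular))
```

A sender whose gaps all fall in the same bin gets zero entropy, while gaps spread over many bins raise the value.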
24
Message timing and time series
26
Location & targets
27
California: Sacramento, Orange County and Los Angeles
New York / New Jersey / Long Island
Miami Beach
Illinois, Michigan, North Carolina and Texas
28
Location & targets
29
Recipients of legitimate messages are concentrated in the subscriber’s local area (i.e. the area around the subscriber’s home or areas where the subscriber works, used to live or where friends and relatives reside).
Spam recipients are distributed uniformly over the US population.
30
Location & targets
31
Spammers are characterized by messaging a large number of area codes, always greater than the number contacted by cell-phone users and M2M systems.
32
Location & targets
33
Low entropy (legitimate cell): the user repeatedly contacts the same area codes.
High entropy (SMS): the spammer sends messages to a more random set of area codes.
Network-enabled appliances (M2M): they message a predefined set of cell phones, so their entropy is the lowest.
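The area-code entropy contrasted above can be illustrated with a short sketch. It assumes destinations are 10-digit US numbers stored as strings, with the area code taken as the first three digits; that format, the name `area_code_entropy` and the sample numbers are assumptions made for illustration.

```python
import math
from collections import Counter


def area_code_entropy(destinations):
    """Shannon entropy (in bits) of the area codes of `destinations`.

    Each destination is assumed to be a 10-digit string whose first three
    digits are the area code (a simplifying assumption).
    """
    if not destinations:
        return 0.0
    codes = Counter(number[:3] for number in destinations)
    total = len(destinations)
    return sum((n / total) * math.log2(total / n) for n in codes.values())


# Made-up destination lists for the three account types.
m2m = ["9735550100"] * 4                                   # one fixed destination
cell = ["9735550101"] * 3 + ["2125550102"]                 # mostly local contacts
spam = ["9735550103", "2125550104", "3055550105", "4155550106"]
print(area_code_entropy(m2m), area_code_entropy(cell), area_code_entropy(spam))
```

In this toy data the ordering follows the slide: the M2M list scores lowest, the cellphone list is in between, and the spam list scores highest.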
34
Location & targets
35
SMS spammers show a linear relation.
Both M2M systems and cell-phone users cluster around the bottom-left area of the graph.
Some M2M systems send up to 20,000 messages to one single destination (?).
36
Location & targets
37
Cellphone users have a low destinations-to-messages ratio and a small set of area codes.
A great majority of spammers exhibit the opposite behavior.
Spammers in the bottom-right corner of the graph target very specific geographical regions: their ratio is one destination per message, but the number of targeted area codes is limited.
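The two quantities compared on this slide, the destinations-to-messages ratio and the number of targeted area codes, can be computed together. This is a minimal sketch assuming one list entry per sent message and 10-digit US numbers stored as strings; the function name `targeting_profile` and the sample numbers are hypothetical.

```python
def targeting_profile(destinations):
    """Return (destinations-to-messages ratio, number of distinct area codes).

    `destinations` holds one entry per sent message. Each entry is assumed
    to be a 10-digit string whose first three digits are the area code.
    """
    if not destinations:
        return 0.0, 0
    unique_destinations = set(destinations)
    area_codes = {number[:3] for number in unique_destinations}
    return len(unique_destinations) / len(destinations), len(area_codes)


# Made-up examples: a chatty cellphone user and a regionally focused spammer.
cell_user = ["9735550101"] * 4 + ["9735550102"] * 2
regional_spammer = ["3055550101", "3055550102", "3055550103",
                    "3055550104", "3055550105"]
print(targeting_profile(cell_user))
print(targeting_profile(regional_spammer))
```

Both toy accounts touch a single area code, but only the second reaches one new destination per message, which is the pattern attributed to spammers that focus on a specific region.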
39
Mobility
42
Hardware choice
1. USB Modem/Aircard A1
2. Feature mobile-phone M1
3. Feature mobile-phone M2
4. USB Modem/Aircard A2
5. USB Modem/Aircard A3
44
Voice call
46
IP traffic
47
Voice call
48
IP traffic
49
STOPPING THE CRIME
An advanced SMS spam detection algorithm is proposed, based on an ensemble of decision trees.
Over 40 specific features are extracted from messaging patterns and processed through a combination of decision trees.
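The slides do not describe the detection algorithm beyond "an ensemble of decision trees", so the following is only a toy sketch of the general idea: several very small trees (depth-1 "stumps") each vote on an account, and the majority decides. The three feature names, the thresholds and the function names are hypothetical placeholders, not values from the paper, and a real system would learn its trees from labeled data.

```python
def make_stump(feature, threshold, spam_if_above):
    """Build a depth-1 decision tree that votes on a single feature."""
    def stump(features):
        above = features[feature] > threshold
        return above == spam_if_above  # True means "vote spam"
    return stump


# Hypothetical hand-set stumps; real trees would be trained, not hard-coded.
STUMPS = [
    make_stump("response_ratio", 0.5, spam_if_above=False),
    make_stump("destinations_per_message", 0.8, spam_if_above=True),
    make_stump("area_code_entropy", 1.5, spam_if_above=True),
]


def is_spammer(features, trees=STUMPS):
    """Flag an account as spam when a strict majority of trees vote spam."""
    votes = sum(1 for tree in trees if tree(features))
    return votes > len(trees) / 2


# Made-up feature vectors for two accounts.
legit = {"response_ratio": 1.0, "destinations_per_message": 0.2,
         "area_code_entropy": 0.8}
suspect = {"response_ratio": 0.05, "destinations_per_message": 1.0,
           "area_code_entropy": 3.0}
print(is_spammer(legit), is_spammer(suspect))
```

Majority voting is used here because it is the simplest way to combine trees; the paper may combine them differently.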
50
CONCLUSIONS
Spammers use pre-paid accounts with an average age of 7 to 11 days.
They send a large number of messages to a wide set of targets (and also receive a large number).
They use five different models of hardware.
They make a large number of phone calls of very short duration.
The main geographical sources in the US are Sacramento, Los Angeles-Orange County and Miami Beach.
Certain networked appliances have messaging behavior close to that of a spammer.