Authors: Qinglong Wang Amir Yahyavi Bettina Kemme Wenbo He

Slides:



Advertisements
Similar presentations
Multiple Criteria for Evaluating Land Cover Classification Algorithms Summary of a paper by R.S. DeFries and Jonathan Cheung-Wai Chan April, 2000 Remote.
Advertisements

Traffic Morphing: An Efficient Defense Against Statistical Traffic Analysis Presented by Yang Gao 11/2/2011 Charles V. Wright MIT Lincoln Laboratory Scott.
Exploring timing based side channel attacks against i CCMP Suman Jana, Sneha K. Kasera University of Utah Introduction
Ensemble Learning: An Introduction
Authors: Thomas Ristenpart, et at.
Ensemble Learning (2), Tree and Forest
Signatures As Threats to Privacy Brian Neil Levine Assistant Professor Dept. of Computer Science UMass Amherst.
1 Secure Cooperative MIMO Communications Under Active Compromised Nodes Liang Hong, McKenzie McNeal III, Wei Chen College of Engineering, Technology, and.
Presentation on Osi & TCP/IP MODEL
Computer Concepts 2014 Chapter 5 Local Area Networks.
MANETS Justin Champion Room C203, Beacon Building Tel 3292,
Network Security. 2 SECURITY REQUIREMENTS Privacy (Confidentiality) Data only be accessible by authorized parties Authenticity A host or service be able.
Ensembles. Ensemble Methods l Construct a set of classifiers from training data l Predict class label of previously unseen records by aggregating predictions.
Rushing Attacks and Defense in Wireless Ad Hoc Network Routing Protocols ► Acts as denial of service by disrupting the flow of data between a source and.
Understanding Computer Viruses: What They Can Do, Why People Write Them and How to Defend Against Them Computer Hardware and Software Maintenance.
CHAPTER 9 Sniffing.
Bradley Cowie Supervised by Barry Irwin Security and Networks Research Group Department of Computer Science Rhodes University DATA CLASSIFICATION FOR CLASSIFIER.
Performance of Adaptive Beam Nulling in Multihop Ad Hoc Networks Under Jamming Suman Bhunia, Vahid Behzadan, Paulo Alexandre Regis, Shamik Sengupta.
Intrusion Detection Systems Paper written detailing importance of audit data in detecting misuse + user behavior 1984-SRI int’l develop method of.
Classification Ensemble Methods 1
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Automated Worm Fingerprinting Authors: Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage Publish: OSDI'04. Presenter: YanYan Wang.
1 January 24, 2016Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 7 — Classification Ensemble Learning.
Ensemble Methods Construct a set of classifiers from the training data Predict class label of previously unseen records by aggregating predictions made.
Xiangnan Kong,Philip S. Yu An Ensemble-based Approach to Fast Classification of Multi-label Data Streams Dept. of Computer Science University of Illinois.
SESSION HIJACKING It is a method of taking over a secure/unsecure Web user session by secretly obtaining the session ID and masquerading as an authorized.
Introduction to Machine Learning, its potential usage in network area,
Ensemble Classifiers.
Machine Learning: Ensemble Methods
When CSI Meets Public WiFi: Inferring Your Mobile Phone Password via WiFi Signals Warren Yeu When CSI Meets Public Wifi.
BUILD SECURE PRODUCTS AND SERVICES
When CSI Meets Public WiFi: Inferring Your Mobile Phone Password via WiFi Signals Adekemi Adedokun May 2, 2017.
Authors: Jiang Xie, Ian F. Akyildiz
Free for All! Assessing User Data Exposure to Advertising Libraries on Android Campbell Foskin.
Instructor Materials Chapter 6 Building a Home Network
POLYGRAPH: Automatically Generating Signatures for Polymorphic Worms
Honeypot in Mobile Network Security
THE OSI MODEL By: Omari Dasent.
Sampling Distributions and Estimation
Panagiotis Demestichas
Outline Introduction Characteristics of intrusion detection systems
Location Cloaking for Location Safety Protection of Ad Hoc Networks
Understanding the OSI Reference Model
Firewalls.
Cloud Computing By P.Mahesh
Unit 27: Network Operating Systems
De-anonymizing the Internet Using Unreliable IDs By Yinglian Xie, Fang Yu, and Martín Abadi Presented by Peng Cheng 03/22/2017.
Internet of Things Vulnerabilities
Data Mining Practical Machine Learning Tools and Techniques
Home Internet Vulnerabilities
Provision of Multimedia Services in based Networks
Chap. 7 Regularization for Deep Learning (7.8~7.12 )
Ensembles.
Ensemble learning.
Ensemble learning Reminder - Bagging of Trees Random Forest
Identifying Slow HTTP DoS/DDoS Attacks against Web Servers DEPARTMENT ANDDepartment of Computer Science & Information SPECIALIZATIONTechnology, University.
ONLINE SECURE DATA SERVICE
Parametric Methods Berlin Chen, 2005 References:
Protocols.
MIS2502: Data Analytics Classification Using Decision Trees
Sofia Pediaditaki and Mahesh Marina University of Edinburgh
WJEC GCSE Computer Science
Yining ZHAO Computer Network Information Center,
Modeling IDS using hybrid intelligent systems
When Machine Learning Meets Security – Secure ML or Use ML to Secure sth.? ECE 693.
LM 5. Wireless Network Security
Protocols.
Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017
TRANCO: A Research-Oriented Top Sites Ranking Hardened Against Manipulation By Prudhvi raju G id:
Presentation transcript:

I KNOW WHAT YOU DID ON YOUR SMARTPHONE:INFERRING APP USAGE OVER ENCRYPTED DATA TRAFFIC Authors: Qinglong Wang Amir Yahyavi Bettina Kemme Wenbo He Presented By: -Sai Lekha

Agenda Experiments Defense Mechanism Conclusion Introduction Related Work Design of Architecture Adversary Model Wireless Traffic Collection Experiments Defense Mechanism Conclusion

Introduction Currently 167.9 million people in the U.S. own smart phones and 42% own tablets and the numbers are increasing. Most of internet traffic is generated by apps on smart phones. These applications provide native support for a wide range of services and offer a faster and simpler interface than their web-based counterparts. 74.1% of user uses FACEBOOK for wide range of services like browsing, chatting, social networks, streaming audio & video, etc.,

Devices rely on wireless connectivity, they can be highly susceptible to security and privacy risks. Networks use encryption technologies such as WEP, WPA, and WPA2 to prevent unauthorized access to the network traffic. Traffic analysis can infer online activity performed by employing packet level statistical information to find identifiable patterns of traffic.

What data can they get access to? Knowing what kind of activity a user might perform is useful for all kinds of malicious behaviour. But knowing the specific apps used is an even more serious privacy & security concern. For privacy, in its simplest form, an attacker simply aims to monitor a victim’s activities. Some apps such as health monitoring apps and dating apps can be particularly privacy sensitive.

Possible threats Android Virus Detectors (AVDs) can be nullified by the android system during engine update, leaving the device exposing to severe threats. Home security systems controlled by smart phones might leak information about the presence of the owner. Companies may track customer’s movements, behaviour and their credit card usage. An insurance company might collect information about users using a specific health monitoring app.

What the papers talks about.. It is possible to determine the particular apps a victim is using even if encryption is in place and the attacker has no access to the encryption keys. Author of this paper performed this attack using packet-level traffic analysis in which he has used side-channel information leaks to identify specific patterns in packets regardless of whether encryption is used or not. Traffic generated by an individual was quite unique. Traffic from different apps have a good potential to be distinguishable from each other.

Side-Channel Information Leaks Side-channel information leakage comes from information that mobile application developers don't realize that it is being cached, logged or stored. Either due to their application or by the operating system of the mobile device.

Random forest Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.

Related works By using patterns in the encrypted traffic, one can infer private user activities such as web browsing, identification and behaviours . The features of encrypted traffic used for privacy attacks are often referred to as side-channel information leaks . Mobile devices have to rely on wireless communications making them particularly vulnerable to wireless traffic analysis attacks. 3G side-channel information has been used to reliably identify smart phone usage.

Inferred behavioural patterns of users depending on techniques that use the statistical features contained in the encrypted traffic, to identify users’ identity or their behaviours. Analyzed the traffic generated by 3 apps to identifying the traffics generated for each specific apps based on the source and destination IP addresses. Developed a wireless MAC-based approach to infer laptop users’ online activities regardless of the encryption techniques used in the Wi-Fi network. Exploited RF for evaluating the importance of extracted features.

DESIGN OF ARCHITECTURE Adversary Model Wireless Traffic Collection Classification

Adversary Model The attacker sniffs the encrypted traffic on the same WLAN channel. In order to perform the attack against a known user, the author assumed that they have the user device’s MAC address. Wireless localization techniques can also determine location of the user and identify him.

Wireless Traffic Collection The author collected data from different WPA2-encrypted environments like at home or campus. The wireless card was put in monitor mode to collect all the wireless traffic between hosts and the access point. A popular sniffing tool like was used to collect the traffic and perform network protocol analysis. In order to perform the classification analysis the attacker filters the traffic collected according to the target’s MAC address

App Selection Selected some of the popular apps covering a wide range of different app categories including browsing, games, multimedia, online chat, and social networks. Showed that the attack can effectively detect a large part of current application activity. Focused on detecting individual app use and not whether the user uses any kind of app within a certain category. Different apps in the same category can have very different traffic patterns. For any other apps out of the scope of our consideration, the system should ideally indicate that it is an unknown app.

Data Set and Feature Extraction Each app was run for sufficient monitoring time and traffic information was collected which was then fed into machine learning approach. This analysis learns each app’s traffic pattern, signature, in order to maximize the accuracy of the detection. This sequence of traffic flows can be represented by a burst. Each burst collected is comprised of WPA2-encrypted frames. The sample were further pre-processed to obtain additional statistical features hidden in the original logs: frame arrival time, frame size, and direction.

Then calculate different distributions of frame sizes from both sending and receiving transmissions. They also calculate the inter-arrival time between sequential frames for both sending and receiving traffic. Compute the average inter-arrival time for different distributions, and the average and deviation of overall inter-arrival time of this sample. In total, authors have twenty features, ten of which are for transmitting(Tx) traffic and the other ten are for receiving(Rx) traffic.

Classification Employed random forests as the classification method to build a forest of uncorrelated decision trees for classification. Properties that make RF useful for classification task: RF is an ensemble learning method which can handle situations where substantial noise exists in the dataset . Both variance and bias are somewhat mitigated by using RF. RF is capable of providing evaluation of feature importance. RF provides an internal unbiased out-of-bag estimate of the classification performance on the fly. RF is faster than both bagging and boosting, especially when using parallelization

EXPERIMENTS Analyzed the influence of the many parameters in our configuration. Looked how feature selection and importance as well as the number of estimators can influence performance. Looked at the impact of data set pre-processing parameters. Analyzed how services could be correctly determined whether they are accessed through an app or through a web-browser. Analyzed the influence of noise, that is what happens when multiple apps are running simultaneously or there exist unknown apps.

Classification Performance Analyzed the classification performance under different parameter configurations. 1) Feature Importance: Not all these features are equally important regarding their influence on the classification performance. Insignificant features have importance values lower than 0:05. Eliminating them might not lead to significant performance loss. By using RF and generating many decision trees, we can exploit all the features when needed, and at the same time avoid over fitting for those applications where the features are not relevant

2)Influence of the Number of Estimators: The number of estimators(Ne) used when building the RF can influence the classification performance Both the strength of each independent estimator and the correlation between any two estimators are closely related to Ne. The strength and correlation together influence the generalization error, which has been proved to converge to a limit as Ne increase

3)Influence of Monitoring Time: Explored the influence of monitoring time on the classification performance. Investigated how long we have to observe the app use before we can achieve a high classification accuracy. An attacker might have to adjust the monitoring time depending on the applications he wants to detect. RF allows for an effective online app usage attack.

4) Influence of Window Size: Divided the original traffic into samples using a sliding window of size W. A larger window size means each sample will contain more frames within itself, and therefore has a higher chance of covering frames with various sizes. This might lead to some (burst) features to be less distinctive. Meanwhile, a smaller window narrows the observation to a few frames, possibly leaving the classification more vulnerable to noise. Result implied that when the traffic is aggregated to larger samples, the averaged statistical features are less affected, and as a result the statistical distribution of frames is closer to the real average values.

Implies that taking W = 30can be considered as sufficient to achieve high classification accuracy. Classification probability distributions for Boom Beach with varying window sizes.

Browser vs App Analyzed whether the network traffic generated by a service used through a browser is similar to the one generated by using the corresponding app.

Influence of Noise App traffic can be contaminated by different types of noise. A User can use several apps at the same time. Traffic might contain flows generated by apps that are not included in the training set. An effective and efficient inference method should be capable of identifying that a given traffic pattern does not belong to any of the apps under consideration. Different samples should be categorized by the classifier to belong to a number of different known apps.

Not possible or desirable to build a classifier that effectively classify all these cases. There may be too many combinations of concurrent apps when the number of known apps are large. New apps are appearing continuously, and older apps are becoming unpopular rendering them unimportant. Any given attacker is likely only interested in a small subset of apps. The tested data assigned to a wide range of apps, the attacker can derive that there are unknown apps or noise caused by concurrent apps.

Detection of Unknown and Concurrent Apps.

DEFENSE MECHANISMS Research on defence mechanisms against side-channel traffic analysis techniques is limited. Methods such as traffic padding and traffic morphing can battle these attacks but they have high overhead problems. One possible solution is to develop more intelligent scheduling algorithms for message transmission. Second, the detection of several concurrent apps is not as effective. Thus, by running multiple apps simultaneously, or combining their traffic with other apps more noise can be added to traffic.

Conclusion In this paper, author focuses on the problem of side-channel information leaks in mobile systems. Authors develop machine learning algorithms to verify the possibility of inferring users’ online activities by analysing packet-level traffic, even when the traffic is encrypted. These attacks are hard to defend against & have a high accuracy and can be used in monitoring of a victim’s behaviour. A wide range of experiments are done to verify the feasibility and accuracy of these attacks. These attacks can be further used in different wireless networks including cellular networks and detection of new apps can be fairly easily added to the system.