Qiang Xu†, Yong Liao‡, Stanislav Miskovic‡, Z. Morley Mao†, Mario Baldi‡, Antonio Nucci‡, Thomas Andrews† †University of Michigan, ‡Symantec, Inc.


• What is the problem? Mobile applications remain hidden in generic HTTP traffic.
• Why does it matter? Service providers want to regain control of their networks.
• Solution: FLOWR.
• FLOWR focuses on key-value pairs in HTTP headers.
• Supervised learning approach: app identification.
• Why do we need app identification?
  o App market providers use it for promotion.
  o Users can be served only the content they are interested in.
  o Network providers can optimize resource allocation for apps.

• Similarity: widespread use of content delivery networks (CDNs) and cloud services.
• Scalability: the large number of apps makes a supervised learning approach impractical.
• Coverage: app popularity is volatile.

1. The developer includes specific prior information in the app that is connected to the CDN.
2. Flows repeatedly observed from the same device within short time intervals are likely to come from the same app.
What do you think about these assumptions? I think No. 2 is suspicious.
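Assumption 2 can be made concrete with a small sketch. The Python below groups flows from the same device whenever consecutive flows are at most `max_gap_s` seconds apart. The function name, the tuple layout, and the 5-second default are all hypothetical; the slides do not state the interval FLOWR uses.

```python
def group_flows(flows, max_gap_s=5.0):
    """Group flows per device when consecutive flows are close in time.

    flows: iterable of (device_ip, timestamp_s, feature) tuples.
    max_gap_s: hypothetical gap threshold, not a value from the slides.
    Returns a list of (device_ip, [features]) groups.
    """
    groups = []
    last_seen = {}  # device_ip -> (timestamp of latest flow, index of its group)
    for ip, ts, feature in sorted(flows, key=lambda f: (f[0], f[1])):
        prev = last_seen.get(ip)
        if prev is not None and ts - prev[0] <= max_gap_s:
            idx = prev[1]
            groups[idx][1].append(feature)  # same burst: extend the open group
        else:
            groups.append((ip, [feature]))  # gap too large: start a new group
            idx = len(groups) - 1
        last_seen[ip] = (ts, idx)
    return groups


flows = [
    ("10.0.0.1", 0.0, "a"),
    ("10.0.0.1", 2.0, "b"),
    ("10.0.0.1", 60.0, "c"),
    ("10.0.0.2", 1.0, "d"),
]
groups = group_flows(flows)
```

The sketch also shows why the assumption is questionable: if two apps on the same device generate traffic within the same window, their features land in one group and are treated as co-occurring.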

• The solution automatically identifies app signatures through online traffic analysis.
• FLOWR is a self-learning system with minimal supervised training.
• FLOWR scales automatically to the size of the app market.
• Methodology:
  1. App Features and Signatures
  2. Counting Co-occurrence of App Features
  3. Flow Regression
  4. Seeding the Knowledge Base

• Definition I. An app feature is a concatenation of the name of a web service employed by the app and a key-value pair in the query part of the service’s HTTP URI, i.e. F = {name : K = V}.
• Definition II. An app feature F that identifies app X with good confidence is a signature of app X.
• Definition III. Feature F’s co-occurrence likelihood with app X is the ratio of the number of unique IP addresses for which F co-occurs with app X’s signatures to the total number of IP addresses for which F is observed.
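Definition III is a ratio of two counts of unique IP addresses, which the following Python sketch computes directly. The function name, the shape of `observations` (one entry per IP and co-occurrence window), and the example feature strings are assumptions made for illustration, not part of FLOWR itself.

```python
def cooccurrence_likelihood(observations, feature, app_signatures):
    """Ratio of unique IPs where `feature` co-occurs with any known signature
    of the app to unique IPs where `feature` is observed at all.

    observations: iterable of (ip, set_of_features) pairs (hypothetical layout).
    app_signatures: set of features already known to be signatures of app X.
    """
    ips_with_feature = set()
    ips_with_cooccurrence = set()
    for ip, feats in observations:
        if feature in feats:
            ips_with_feature.add(ip)
            if feats & app_signatures:  # at least one signature of X is present
                ips_with_cooccurrence.add(ip)
    if not ips_with_feature:
        return 0.0  # feature never observed: no evidence either way
    return len(ips_with_cooccurrence) / len(ips_with_feature)


observations = [
    ("10.0.0.1", {"ads.example:appid=42", "cdn.example:v=3"}),
    ("10.0.0.2", {"ads.example:appid=42", "cdn.example:v=3"}),
    ("10.0.0.3", {"cdn.example:v=3"}),
    ("10.0.0.4", {"cdn.example:v=3", "other.example:id=7"}),
]
signatures_x = {"ads.example:appid=42"}
likelihood = cooccurrence_likelihood(observations, "cdn.example:v=3", signatures_x)
```

In this made-up trace the feature is seen from four IP addresses and co-occurs with a signature of app X on two of them, so the likelihood is 0.5.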

• GET /pagead/images/go_arrow.png HTTP/1.1
  Host: pagead2.googlesyndication.com
  Referer: &msid=zz.rings.rww2&...
  User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.3;...
• GET /getAd.php5?sdkapid=67526&...&country=US&age=45&zip=90210&income=50000&... HTTP/1.1
  Connection: Keep-Alive
  Host: androidsdk.ads.mp.mydas.mobi
  User-Agent: Apache-HttpClient/UNAVAILABLE (java 1.4)
• google.com/store/apps/details?id=com.facebook.katana.
• “mydas.mobi:country=US” is also an app feature but NOT a SIGNATURE.
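As a rough illustration of how features such as “mydas.mobi:country=US” could be derived from the second request above, the Python sketch below splits the query string into key-value pairs and prefixes each with a service name. Taking the last two labels of the Host header as the service name is an assumption that happens to match this example; it is not necessarily how FLOWR names services. The elided parts of the query ("...") are simply left out of the sample input.

```python
from urllib.parse import parse_qsl, urlsplit


def service_name(host):
    """Hypothetical helper: use the last two labels of the host as the service name."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def extract_features(host, request_target):
    """Build 'name:K=V' strings from the query part of an HTTP request target."""
    query = urlsplit(request_target).query
    name = service_name(host)
    return [f"{name}:{k}={v}" for k, v in parse_qsl(query, keep_blank_values=True)]


features = extract_features(
    "androidsdk.ads.mp.mydas.mobi",
    "/getAd.php5?sdkapid=67526&country=US&age=45&zip=90210&income=50000",
)
```

Every pair becomes a feature, including user attributes such as `country=US`. Only features that single out one app with high confidence would qualify as signatures.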

• FLOWR has problems identifying encrypted or hashed network traffic, and traffic originated by apps that do not use a* services.
• The current solution is not applicable to apps using protocols other than HTTP.
• However, FLOWR’s basic methodology of tracking co-occurrence for signature building is generic and could be applied to apps that use other protocols.
• Coverage is bounded by the initial seeding signatures.
• Two datasets are employed. The first (FlowSet) is a network trace from a nationwide cellular network provider. The second (AppSet) is a lab trace generated by running more than 10K of the most popular Android apps in software emulators.
• A feature is promoted to a signature when P[X|F] ≈ 1; this promotion inevitably incurs some false positives in app identification.

• The initial training set of known signatures will vary by location; it should consider the top N apps of the location as well as the global top apps.
• Co-occurring features do not necessarily belong to the same app, even when the time difference between them is small. Wrong attributions increase the false positive rate.
• Real-time analysis of billions of flows will require a very large amount of memory.
• New known signatures must be constantly added to the initial seeding signature set.
• An app feature with no prior knowledge will simply be ignored.

• It performs real-time app identification at speeds of up to 5 Gb/s of input traffic.
• In a 6-day, 10-billion-flow trace from a nationwide cellular network, FLOWR identified 86–95% of flows related to the signature seeds with “tolerable” false positives.
• With a false positive rate lower than 1%, FLOWR uniquely identifies the generating apps of 26–30% of the flows; for another 60–65% of the flows, FLOWR narrows down the generating app of each flow to 5 or fewer candidates.
• Guaranteeing a false positive rate lower than 5% means setting p higher than 0.8. To avoid any false positives, according to our extensive datasets, p should be set to
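The role of the threshold p can be sketched as follows, assuming p is a cutoff on the likelihood P[X|F]: a feature is promoted to a signature of its most likely app only when that likelihood exceeds p, and otherwise the flow is narrowed down to a short list of candidate apps. The function names, the dictionary layout, and ranking candidates by likelihood are assumptions for illustration; the numbers in the example are invented.

```python
def promote_signatures(likelihoods, p=0.8):
    """Promote a feature to a signature of its most likely app if P[X|F] > p.

    likelihoods: {feature: {app: likelihood}} (hypothetical layout).
    Returns {feature: app} for the promoted features only.
    """
    promoted = {}
    for feature, per_app in likelihoods.items():
        app, score = max(per_app.items(), key=lambda kv: kv[1])
        if score > p:
            promoted[feature] = app
    return promoted


def top_candidates(per_app, k=5):
    """Return up to k candidate apps, most likely first."""
    ranked = sorted(per_app.items(), key=lambda kv: kv[1], reverse=True)
    return [app for app, _ in ranked[:k]]


likelihoods = {
    "ads.example:appid=42": {"app_x": 0.97, "app_y": 0.02},
    "cdn.example:v=3": {"app_x": 0.50, "app_y": 0.45},
}
signatures = promote_signatures(likelihoods, p=0.8)
shortlist = top_candidates(likelihoods["cdn.example:v=3"])
```

Raising p trades coverage for precision: fewer features are promoted, but those that are promoted are less likely to be attributed to the wrong app.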

• Future work could combine FLOWR with an additional technique, such as “man-in-the-middle” interception, to cope with encrypted traffic.
• It may also include a classification strategy to reduce noise.