Qiang Xu†, Yong Liao‡, Stanislav Miskovic‡, Z. Morley Mao†, Mario Baldi‡, Antonio Nucci‡, Thomas Andrews† †University of Michigan, ‡Symantec, Inc.


1 Qiang Xu†, Yong Liao‡, Stanislav Miskovic‡, Z. Morley Mao†, Mario Baldi‡, Antonio Nucci‡, Thomas Andrews† †University of Michigan, ‡Symantec, Inc.

2
• What is the problem? Mobile applications remain hidden in generic HTTP traffic.
• Why does that matter? Service providers want to regain control of the network.
• Solution? FLOWR
• FLOWR focuses on key-value pairs in HTTP headers
• Supervised learning approach: app identification
• Why do we even need app identification?
  o App market providers use it for promotion
  o Users can be shown only the content that interests them
  o Network providers can optimize resource allocation for apps

3

4
• Similarity: widespread use of content delivery networks (CDNs) and cloud services
• Scalability: the large number of apps makes a supervised learning approach impractical
• Coverage: the volatile nature of app popularity

5
1. The developer includes specific prior information in the app that is connected to the CDN.
2. Flows repeatedly observed from the same device within short time intervals are likely to come from the same app.
What do you think about these assumptions? I think No. 2 is suspicious.
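Assumption 2 can be made concrete with a small sketch. The code below groups flows from one device into runs whose consecutive gaps stay under a threshold. The function name `group_flows`, the tuple layout, and the 5-second default are all hypothetical choices for illustration; the slides do not state how FLOWR defines a "short time interval".

```python
from collections import defaultdict


def group_flows(flows, max_gap=5.0):
    """Group flows per device into runs of closely spaced flows.

    flows: iterable of (device_ip, timestamp_seconds, flow_id) tuples.
    max_gap: hypothetical threshold (seconds) between consecutive flows.
    Returns a list of groups, each a list of flow_ids from one device.
    """
    by_device = defaultdict(list)
    for device, ts, flow_id in flows:
        by_device[device].append((ts, flow_id))

    groups = []
    for device in sorted(by_device):
        events = sorted(by_device[device])  # order by timestamp
        current = [events[0][1]]
        last_ts = events[0][0]
        for ts, flow_id in events[1:]:
            if ts - last_ts <= max_gap:
                current.append(flow_id)  # same burst of activity
            else:
                groups.append(current)   # gap too large: start a new run
                current = [flow_id]
            last_ts = ts
        groups.append(current)
    return groups


example = [
    ("10.0.0.1", 0.0, "a"),
    ("10.0.0.1", 2.0, "b"),
    ("10.0.0.1", 60.0, "c"),
    ("10.0.0.2", 1.0, "d"),
]
groups = group_flows(example)
```

The sketch also shows why the assumption is fragile: two different apps active on the same device within `max_gap` seconds would land in the same group.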

6
• The solution automatically identifies app signatures through online traffic analysis
• FLOWR is a self-learning system with minimal supervised training
• FLOWR scales automatically to the size of the app market
• Methodology:
  1. App Features and Signatures
  2. Counting Co-occurrence of App Features
  3. Flow Regression
  4. Seeding the Knowledge Base

7
• Definition I. An app feature is a concatenation of the name of a web service employed by the app and a key-value pair in the query part of the service's HTTP URI, i.e. F = {name : K = V}.
• Definition II. An app feature F that identifies app X with good confidence is a signature of app X.
• Definition III. Feature F's co-occurrence likelihood with app X is defined as the ratio of the number of unique IP addresses for which feature F co-occurs with app X's signatures to the total number of IP addresses at which F can be observed.
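Definition III amounts to a ratio of two IP-address counts, which the following sketch computes. It assumes the input has already been reduced to (IP address, set of co-occurring features) pairs; that input shape and the name `cooccurrence_likelihood` are illustrative assumptions, not something the slides specify.

```python
from collections import defaultdict


def cooccurrence_likelihood(observations, app_signatures):
    """Estimate each feature's co-occurrence likelihood with one app.

    observations: iterable of (ip, features) pairs, where features is the
        set of app features seen together at that IP (assumed input shape).
    app_signatures: set of features already known as signatures of app X.
    Returns {feature: likelihood} for every non-signature feature.
    """
    seen_ips = defaultdict(set)   # IPs at which F is observed at all
    cooc_ips = defaultdict(set)   # IPs at which F co-occurs with a signature
    for ip, features in observations:
        has_signature = bool(features & app_signatures)
        for feature in features:
            if feature in app_signatures:
                continue
            seen_ips[feature].add(ip)
            if has_signature:
                cooc_ips[feature].add(ip)
    return {f: len(cooc_ips[f]) / len(seen_ips[f]) for f in seen_ips}


signatures = {"doubleclick.net:msid=zz.rings.rww2"}
observations = [
    ("ip1", {"doubleclick.net:msid=zz.rings.rww2", "example.com:v=1"}),
    ("ip2", {"doubleclick.net:msid=zz.rings.rww2", "example.com:v=1",
             "mydas.mobi:country=US"}),
    ("ip3", {"mydas.mobi:country=US"}),
]
likelihoods = cooccurrence_likelihood(observations, signatures)
```

Here the made-up feature `example.com:v=1` is seen at two IPs and co-occurs with the signature at both, while `mydas.mobi:country=US` co-occurs at only one of its two IPs.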

8
• GET /pagead/images/go_arrow.png HTTP/1.1
  Host: pagead2.googlesyndication.com
  Referer: http://googleads.g.doubleclick.net:80/&... &msid=zz.rings.rww2&...
  User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.3;...
• GET /getAd.php5?sdkapid=67526&...&country=US&age=45&zip=90210&income=50000&... HTTP/1.1
  Connection: Keep-Alive
  Host: androidsdk.ads.mp.mydas.mobi
  User-Agent: Apache-HttpClient/UNAVAILABLE (java 1.4)
• https://play.google.com/store/apps/details?id=com.facebook.katana
• "mydas.mobi:country=US" is also an app feature, but NOT a signature
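The feature format "name:K=V" used on this slide can be sketched with the Python standard library. Taking the last two DNS labels of the Host header as the service name is an assumption made here so that the output matches "mydas.mobi:country=US"; the slides do not say how FLOWR derives the name, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def service_name(host):
    """Assumed rule: strip any port and keep the last two DNS labels."""
    labels = host.lower().split(":")[0].split(".")
    return ".".join(labels[-2:])


def app_features(host, request_target):
    """Build 'name:key=value' features from the query part of a request."""
    query = urlsplit(request_target).query
    name = service_name(host)
    return [f"{name}:{key}={value}"
            for key, value in parse_qsl(query, keep_blank_values=True)]


features = app_features(
    "androidsdk.ads.mp.mydas.mobi",
    "/getAd.php5?sdkapid=67526&country=US&age=45",
)
```

With this rule every key-value pair becomes a feature; which of them deserve to be signatures is decided separately, which is why "mydas.mobi:country=US" is a feature without being a signature.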

9

10
• FLOWR has trouble identifying encrypted or hashed network traffic, as well as traffic originated by apps that do not use a* services
• The current solution is not applicable to apps using protocols other than HTTP
• However, FLOWR's basic methodology of tracking co-occurrence for signature building is generic and can be applied to apps that use other protocols
• Coverage is bounded by the initial seed signatures
• Two datasets are employed. The first dataset (FlowSet) is a network trace from a nationwide cellular network provider. The second dataset (AppSet) is a lab trace generated by running more than 10K of the most popular Android apps in software emulators
• A feature is promoted to a signature when P[X|F] ≈ 1; this promotion inevitably incurs some false positives in app identification

11
• The initial training set of known signatures will vary by location; it should cover the top N apps of the location as well as the global top apps
• Co-occurring features do not necessarily belong to the same app, even when the time difference between them is small. When this assumption fails, the false positive rate increases
• Real-time analysis of billions of flows will require a very large amount of memory
• The initial seed signature set must be constantly updated with newly known signatures
• An app feature with no prior knowledge will simply be ignored

12
• FLOWR performs real-time app identification at input traffic speeds of up to 5 Gb/s.
• In a 6-day, 10-billion-flow trace from a nationwide cellular network, FLOWR was capable of identifying 86–95% of flows related to the signature seeds with "tolerable" false positives.
• Guaranteeing a false positive rate lower than 5% means setting p higher than 0.8. To avoid any false positives, according to the authors' extensive datasets, p should be set to 0.97.
• With a false positive rate lower than 1%, FLOWR uniquely identifies the generating apps of 26–30% of the flows; for another 60–65% of the flows, FLOWR narrows down the generating app of each flow to 5 or fewer candidates.
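The role of the threshold p can be illustrated with a short sketch: a (feature, app) pair is promoted to a signature once its estimated likelihood reaches p. The function name, the dictionary layout, and the likelihood values are hypothetical; only the thresholds 0.8 and 0.97 come from the slide.

```python
def promote_signatures(likelihoods, p=0.97):
    """Return the (feature, app) pairs whose likelihood is at least p.

    likelihoods: dict mapping (feature, app) -> estimated P[app | feature]
        (assumed layout; the values below are made up for illustration).
    p: promotion threshold; the slide reports 0.97 as the value at which
        no false positives were observed, and 0.8 for a rate below 5%.
    """
    return {pair for pair, value in likelihoods.items() if value >= p}


estimates = {
    ("doubleclick.net:msid=zz.rings.rww2", "zz.rings.rww2"): 1.0,
    ("example.com:v=1", "app.b"): 0.9,
    ("mydas.mobi:country=US", "app.a"): 0.2,
}
strict = promote_signatures(estimates, p=0.97)
loose = promote_signatures(estimates, p=0.8)
```

Lowering p promotes more features, which raises coverage at the cost of more false positives; that is the trade-off the reported numbers describe.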

13
• Future work could combine FLOWR with an additional technique such as "man-in-the-middle" interception to cope with encrypted traffic.
• It may also include a classification strategy to reduce noise

