Slide 1: Security Analytics Thrust
Anthony D. Joseph (UCB), Rachel Greenstadt (Drexel), Ling Huang (Intel), Dawn Song (UCB), Doug Tygar (UCB)
Slide 2: Outline
- Our view of Security Analytics
- Adversaries, humans, and machine learning
- Joint research with McAfee
- Our proposed malware analysis pipeline
- Today's Security Analytics talks
Slide 3: Our View of Security Analytics
- Using robust ML for adversary-resistant security metrics and analytics
  - Pattern mining and prediction at scale on big data
- Detecting malware, spam, and malicious sites/URLs
- Identifying authors of user-generated content (UGC) and malware
  - Also: Sybil detection in crowds, and obfuscating authors of UGC
- Detecting human biosignals: EEG, vision tracking, SAFE continuous authentication
- Helping the humans in the loop (situational awareness):
  - End users of systems
  - Crowds and human reviewers
  - Domain experts
Slide 4: Adversarial Exploitation of ML
- Traditional approach: the evading adversary
  - Attacker determines the decision boundary
  - Crafts (positive-instance) content that is classified as negative
- Newer approach: the influencing adversary
  - A patient attacker operates during the periodic retraining stage by injecting "tricky" positive instances
  - Shifts the decision boundary over time during retraining, so that (positive-instance) content is eventually classified as negative
- We need novel adaptive, robust ML techniques to defend against influencing adversaries
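The influencing adversary's "boundary drift" can be sketched with a toy 1-D classifier. Everything here is hypothetical for illustration: the scores, the midpoint retraining rule, and the attacker's injection schedule are invented, not part of any deployed system.

```python
# Toy 1-D "classifier": flag a sample as malicious when score >= threshold,
# where the threshold is the midpoint of the benign and malicious class means.
def train(benign, malicious):
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(benign) + mean(malicious)) / 2

benign = [0.1, 0.2, 0.15, 0.25]      # hypothetical benign scores
malicious = [0.8, 0.9, 0.85, 0.95]   # hypothetical malware scores

threshold = train(benign, malicious)
target = 0.6                          # attacker's malware sample
print(target >= threshold)            # initially flagged as malicious

# Influencing adversary: at each retraining round, inject benign-labeled
# samples that creep toward the malicious region ("boiling frog").
for round_ in range(1, 6):
    benign.append(0.2 + 0.1 * round_)     # 0.3, 0.4, ..., 0.7
    threshold = train(benign, malicious)  # periodic retraining

print(target >= threshold)  # boundary has drifted past the target sample
```

The point of the sketch is that each injected sample looks only slightly unusual on its own, which is why defenses need to reason about drift across retraining rounds rather than about individual instances.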
Slide 5: Synergy between Humans and ML
- Users: providing clear answers and usable security
  - Is this content spam or malicious?
  - What is the reasoning behind a security decision?
  - Can my UGC be identified as mine?
  - Also: understanding how users reason about security
- Crowds: augmenting ML with human capabilities
  - Leveraging humans to disambiguate borderline instances (e.g., is this application or website malicious or benign?)
- Domain experts: prioritizing a limited resource
  - Identifying when to rely on experts to evaluate model changes
  - Helping determine authorship identification for malware
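One common way to route borderline instances to humans is confidence-based triage: auto-label predictions the model is sure about and escalate the rest. A minimal sketch, with thresholds that are illustrative assumptions rather than values from this project:

```python
# Hypothetical triage thresholds; in practice these would be tuned
# against reviewer capacity and the cost of misclassification.
LOW, HIGH = 0.35, 0.65

def triage(p_malicious):
    """Route one instance based on the model's malicious probability."""
    if p_malicious < LOW:
        return "auto: benign"
    if p_malicious > HIGH:
        return "auto: malicious"
    return "escalate: human review"

for p in (0.05, 0.50, 0.92):
    print(p, triage(p))
```

Widening the [LOW, HIGH] band sends more work to crowds and domain experts but reduces automated mistakes; narrowing it does the opposite.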
Slide 6: Collaboration with McAfee
- Special academic-industry collaboration
  - Unique opportunity for academic access to massive-scale, real-world adversarial data
  - Pathway for research to yield real-world impact
- Two robust ML research efforts
  - Current: active protection
  - Future: malicious URL/site detection (Site Advisor)
- Update
  - Signed university-level NDAs with UC Berkeley and Drexel
  - Held meetings at Intel and UC Berkeley
  - Delivered a prototype ML-based malware classification system that supports large-scale classification of polymorphic threats
  - Ongoing: refining research focus and exploring the Artemis sample dataset
Slide 7: Artemis and GTI
Artemis and GTI collect voluminous suspicious events and metadata from millions of end hosts.
- McAfee needs to:
  - Classify events with clean/dirty labels
  - Cluster events into groups
  - Rank groups according to their suspiciousness level
  - Help identify malware families (authorship classification)
- Our planned efforts:
  - Build a large-scale, online, adaptive ML system for automated malware classification with humans in the loop
  - Apply stylometry for forensic analysis and malware classification
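The classify / cluster / rank workflow can be sketched end to end on toy data. The event records, scores, and the naive grouping key below are all invented for illustration; real features and family labels would come from the Artemis/GTI feeds and the ML models.

```python
# Hypothetical event records; "score" stands in for a model's malicious score.
events = [
    {"id": "e1", "score": 0.92, "family_hint": "packer_A"},
    {"id": "e2", "score": 0.88, "family_hint": "packer_A"},
    {"id": "e3", "score": 0.15, "family_hint": "updater"},
    {"id": "e4", "score": 0.71, "family_hint": "dropper_B"},
]

# 1. Classify: label each event clean or dirty.
for e in events:
    e["label"] = "dirty" if e["score"] >= 0.5 else "clean"

# 2. Cluster: group dirty events (naively, by a shared feature).
groups = {}
for e in events:
    if e["label"] == "dirty":
        groups.setdefault(e["family_hint"], []).append(e)

# 3. Rank: order groups by mean suspiciousness for expert review.
ranked = sorted(
    groups, key=lambda g: -sum(e["score"] for e in groups[g]) / len(groups[g])
)
print(ranked)
```

The ranking step is where the humans in the loop enter: experts review the most suspicious groups first, and their verdicts feed back into retraining.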
Slide 8: Proposed Malware Analysis Pipeline
Program code (mobile apps, executables) -> Program Analysis (static / dynamic / human analysis) -> Program Features -> Machine Learning Feature Encoding -> Machine Learning -> Malware Classification Models, with a feedback loop routing samples to further analysis by human domain experts.
- Data from McAfee's GTI and Google's VirusTotal
- Categorization and prioritization are critical!
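The "Feature Encoding" stage might, for example, hash arbitrary program features (imported APIs, embedded strings, byte n-grams) into a fixed-length count vector so downstream models can handle an unbounded feature vocabulary. The dimension and feature names below are made up; this is a generic hashing-trick sketch, not the pipeline's actual encoder.

```python
import zlib

DIM = 16  # assumed vector size; real systems use far larger dimensions

def encode(features, dim=DIM):
    """Hash string features into a fixed-length count vector (hashing trick)."""
    vec = [0] * dim
    for f in features:
        vec[zlib.crc32(f.encode()) % dim] += 1
    return vec

# Hypothetical features extracted from one sample by static analysis
sample = ["api:CreateRemoteThread", "api:WriteProcessMemory", "str:cmd.exe"]
v = encode(sample)
print(len(v), sum(v))  # fixed length; total count equals number of features
```

Using a stable hash such as `zlib.crc32` keeps the encoding deterministic across processes, which matters when features are extracted on many hosts and aggregated centrally.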
Slide 9: Security Analytics Talks (Session 1)
- Big data for security analytics
  - Using adaptive, large-scale ML to identify and classify malware families using code features
- Learning as an "attack": de-anonymization
  - Automated analysis of encrypted traffic: identifying the URLs/topics of SSL-encrypted web pages
- Learning for web-based malware detection
  - Not code features; rather: where scripts and objects come from, who makes the requests, and how the user gets to the site
Slide 10: Security Analytics Talks (Session 2)
- Using network science to detect Sybils in social networks
  - Leveraging social structure to detect fake accounts and improve user authentication
- Learning as an "attack": de-anonymization
  - Automated analysis and identification of underground-forum users
- Understanding how end users reason about risk
  - Security, privacy, and a 9-dimensional model for users
Slide 11: Security Analytics Goals
- Developing tools that combine machine learning and analysis to automatically extract features and build models
- Improving users' experiences by translating the reasoning behind security decisions into human-understandable concepts
- Designing robust algorithms for large-scale machine learning in the presence of adversarial manipulation