Slide 1: Security Analytics Thrust
Anthony D. Joseph (UCB), Rachel Greenstadt (Drexel), Ling Huang (Intel), Dawn Song (UCB), Doug Tygar (UCB)
Slide 2: Outline
- Our view of Security Analytics
- Adversaries, humans, and machine learning
- Joint research with McAfee
- Our proposed malware analysis pipeline
- Today's Security Analytics talks
Slide 3: Our View of Security Analytics
- Using robust ML for adversary-resistant security metrics and analytics
  - Pattern mining and prediction at scale on big data
- Detecting malware, spam, and malicious sites/URLs
- Identifying authors of user-generated content (UGC) and malware
  - Also: Sybil detection in crowds, and obfuscating authors of UGC
- Detecting human biosignals: EEG, vision tracking, SAFE continuous authentication
- Helping the humans in the loop (situational awareness):
  - End users of systems
  - Crowds and human reviewers
  - Domain experts
Slide 4: Adversarial Exploitation of ML
- Traditional approach: the evading adversary
  - Attacker determines the decision boundary
  - Crafts (positive-instance) content that is classified as negative
- Newer approach: the influencing adversary
  - A patient attacker operates during the periodic retraining stage by injecting "tricky" positive instances
  - Shifts the decision boundary over time during retraining, so that (positive-instance) content is eventually classified as negative
- We need novel adaptive, robust ML techniques to defend against influencing adversaries
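The influencing adversary's "boundary drift" can be sketched with a toy 1-D classifier. Everything here is hypothetical for illustration: the scores, the midpoint retraining rule, and the attacker's injection schedule are invented, not part of any deployed system.

```python
# Toy 1-D "classifier": flag a sample as malicious when score >= threshold,
# where the threshold is the midpoint of the benign and malicious class means.
def train(benign, malicious):
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(benign) + mean(malicious)) / 2

benign = [0.1, 0.2, 0.15, 0.25]      # hypothetical benign scores
malicious = [0.8, 0.9, 0.85, 0.95]   # hypothetical malware scores

threshold = train(benign, malicious)
target = 0.6                          # attacker's malware sample
print(target >= threshold)            # initially flagged as malicious

# Influencing adversary: at each retraining round, inject benign-labeled
# samples that creep toward the malicious region ("boiling frog").
for round_ in range(1, 6):
    benign.append(0.2 + 0.1 * round_)     # 0.3, 0.4, ..., 0.7
    threshold = train(benign, malicious)  # periodic retraining

print(target >= threshold)  # boundary has drifted past the target sample
```

The point of the sketch is that each injected sample looks only slightly unusual on its own, which is why defenses need to reason about drift across retraining rounds rather than about individual instances.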
Slide 5: Synergy between Humans and ML
- Users: providing clear answers and usable security
  - Is this content spam or malicious?
  - What is the reasoning behind a security decision?
  - Can my UGC be identified as mine?
  - Also: understanding how users reason about security
- Crowds: augmenting ML with human capabilities
  - Leveraging humans to disambiguate borderline instances (e.g., is this application or website malicious or benign?)
- Domain experts: prioritizing a limited resource
  - Identifying when to rely on experts to evaluate model changes
  - Helping determine authorship identification for malware
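One common way to route borderline instances to humans is confidence-based triage: auto-label predictions the model is sure about and escalate the rest. A minimal sketch, with thresholds that are illustrative assumptions rather than values from this project:

```python
# Hypothetical triage thresholds; in practice these would be tuned
# against reviewer capacity and the cost of misclassification.
LOW, HIGH = 0.35, 0.65

def triage(p_malicious):
    """Route one instance based on the model's malicious probability."""
    if p_malicious < LOW:
        return "auto: benign"
    if p_malicious > HIGH:
        return "auto: malicious"
    return "escalate: human review"

for p in (0.05, 0.50, 0.92):
    print(p, triage(p))
```

Widening the [LOW, HIGH] band sends more work to crowds and domain experts but reduces automated mistakes; narrowing it does the opposite.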
Slide 6: Collaboration with McAfee
- Special academic-industry collaboration
  - Unique opportunity for academic access to massive-scale, real-world adversarial data
  - Pathway for research to yield real-world impact
- Two robust ML research efforts
  - Current: active protection
  - Future: malicious URL/site detection (Site Advisor)
- Update
  - Signed university-level NDAs with UC Berkeley and Drexel
  - Held meetings at Intel and UC Berkeley
  - Delivered a prototype ML-based malware classification system that supports large-scale classification of polymorphic threats
  - Ongoing: refining research focus and exploring the Artemis sample dataset
Slide 7: Artemis and GTI
Artemis and GTI collect voluminous suspicious events and metadata from millions of end hosts.
- McAfee needs to:
  - Classify events with clean/dirty labels
  - Cluster events into groups
  - Rank groups according to their suspiciousness level
  - Help identify malware families (authorship classification)
- Our planned efforts:
  - Build a large-scale, online, adaptive ML system for automated malware classification with humans in the loop
  - Apply stylometry for forensic analysis and malware classification
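The classify / cluster / rank workflow can be sketched end to end on toy data. The event records, scores, and the naive grouping key below are all invented for illustration; real features and family labels would come from the Artemis/GTI feeds and the ML models.

```python
# Hypothetical event records; "score" stands in for a model's malicious score.
events = [
    {"id": "e1", "score": 0.92, "family_hint": "packer_A"},
    {"id": "e2", "score": 0.88, "family_hint": "packer_A"},
    {"id": "e3", "score": 0.15, "family_hint": "updater"},
    {"id": "e4", "score": 0.71, "family_hint": "dropper_B"},
]

# 1. Classify: label each event clean or dirty.
for e in events:
    e["label"] = "dirty" if e["score"] >= 0.5 else "clean"

# 2. Cluster: group dirty events (naively, by a shared feature).
groups = {}
for e in events:
    if e["label"] == "dirty":
        groups.setdefault(e["family_hint"], []).append(e)

# 3. Rank: order groups by mean suspiciousness for expert review.
ranked = sorted(
    groups, key=lambda g: -sum(e["score"] for e in groups[g]) / len(groups[g])
)
print(ranked)
```

The ranking step is where the humans in the loop enter: experts review the most suspicious groups first, and their verdicts feed back into retraining.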
Slide 8: Proposed Malware Analysis Pipeline
Program code (mobile apps, executables) -> Program Analysis (static / dynamic / human analysis) -> Program Features -> Machine Learning Feature Encoding -> Machine Learning -> Malware Classification Models, with a feedback loop routing samples to further analysis by human domain experts.
- Data from McAfee's GTI and Google's VirusTotal
- Categorization and prioritization are critical!
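The "Feature Encoding" stage might, for example, hash arbitrary program features (imported APIs, embedded strings, byte n-grams) into a fixed-length count vector so downstream models can handle an unbounded feature vocabulary. The dimension and feature names below are made up; this is a generic hashing-trick sketch, not the pipeline's actual encoder.

```python
import zlib

DIM = 16  # assumed vector size; real systems use far larger dimensions

def encode(features, dim=DIM):
    """Hash string features into a fixed-length count vector (hashing trick)."""
    vec = [0] * dim
    for f in features:
        vec[zlib.crc32(f.encode()) % dim] += 1
    return vec

# Hypothetical features extracted from one sample by static analysis
sample = ["api:CreateRemoteThread", "api:WriteProcessMemory", "str:cmd.exe"]
v = encode(sample)
print(len(v), sum(v))  # fixed length; total count equals number of features
```

Using a stable hash such as `zlib.crc32` keeps the encoding deterministic across processes, which matters when features are extracted on many hosts and aggregated centrally.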
Slide 9: Security Analytics Talks (Session 1)
- Big data for security analytics
  - Using adaptive, large-scale ML to identify and classify malware families using code features
- Learning as an "attack": de-anonymization
  - Automated analysis of encrypted traffic: identifying the URLs/topics of SSL-encrypted web pages
- Learning for web-based malware detection
  - Not code features; rather: where scripts and objects come from, who makes the requests, and how the user gets to the site
Slide 10: Security Analytics Talks (Session 2)
- Using network science to detect Sybils in social networks
  - Leveraging social structure to detect fake accounts and improve user authentication
- Learning as an "attack": de-anonymization
  - Automated analysis and identification of underground-forum users
- Understanding how end users reason about risk
  - Security, privacy, and a 9-dimensional model for users
Slide 11: Security Analytics Goals
- Developing tools that combine machine learning and analysis to automatically extract features and build models
- Improving users' experiences by translating the reasoning behind security decisions into human-understandable concepts
- Designing robust algorithms for large-scale machine learning in the presence of adversarial manipulation