Semantics for Big Data (,) Security and Privacy Tim Finin and Anupam Joshi University of Maryland, Baltimore County Baltimore MD NSF Workshop on Big Data.


Semantics for Big Data (,) Security and Privacy Tim Finin and Anupam Joshi University of Maryland, Baltimore County Baltimore MD NSF Workshop on Big Data Security and Privacy , University of Texas at Dallas

The plot outline
Big data → Variety → Need for integration & fusion → Must understand data semantics → Use semantic languages & tools (reasoners, ML) → Have shared ontologies & background knowledge
Relevance to security and privacy
– Protect personal information, especially in mobile/IoT scenarios
– Better intrusion detection systems

Use Case Examples
We’ve used semantic technologies in support of assured information tasks including
– Representing & enforcing information sharing policies
– Negotiating for cloud services respecting organizational constraints (e.g., data privacy, location, …)
– Modeling context for mobile users and using this to manage information sharing
– Acquiring, using and sharing knowledge for situationally-aware intrusion detection systems
Key technologies include Semantic Web languages (OWL, RDF) and tools, and information extraction from text

Context-Aware Privacy and Security
– Smart mobile devices know a great deal about their users, including their current context
– Acquiring and using this knowledge helps them provide better services
– Sharing the information with other users, organizations and service providers can also be beneficial (Mobile Ad-Hoc Knowledge Networks)
– Context-aware policies can be used to limit information sharing as well as to control the actions and information access of mobile apps
Example context messages, from most to least detailed: “We’re in a two-hour budget meeting at X with A, B and C”; “We’re in an important meeting”; “We’re busy”

Context-aware power management
Maintaining the context model uses power
We empirically determine power usage for a phone’s sensors and use this for optimization

Context-aware power management
Maintaining the context model uses power
We developed accurate power models for a phone’s sensors and use them for optimization
When updating the context model
1. Only enable sensors required by policy, and reuse recent sensor readings whenever appropriate, e.g., disable the GPS sensor when at home in the evening
2. Prefer sensors with a lower energy footprint or already in use when several are available, e.g., choose Wi-Fi over GPS for location at the office during the day
3. Reorder rule conditions to reduce energy use, e.g., check conditions requiring no sensor access first
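The sensor-selection and reordering steps can be illustrated with a small Python sketch. It is not the authors' implementation: the energy figures, the `Condition` class and the `plan` function are all hypothetical, chosen only to show how preferring cheap or already-active sensors and checking sensor-free conditions first might fit together.

```python
from dataclasses import dataclass

# Hypothetical per-reading energy costs in millijoules; illustrative values only,
# not measurements from the slides.
SENSOR_COST_MJ = {"wifi": 30.0, "gps": 150.0}


@dataclass(frozen=True)
class Condition:
    name: str
    sensors: tuple  # sensors able to answer this condition (any one suffices)


def cheapest_sensor(cond, active):
    """Pick a sensor for a condition, treating already-active sensors as free."""
    if not cond.sensors:
        return None
    return min(cond.sensors, key=lambda s: 0.0 if s in active else SENSOR_COST_MJ[s])


def plan(conditions, active=frozenset()):
    """Order conditions by marginal energy cost, so sensor-free checks come first."""
    def cost(cond):
        sensor = cheapest_sensor(cond, active)
        if sensor is None or sensor in active:
            return 0.0
        return SENSOR_COST_MJ[sensor]

    ordered = sorted(conditions, key=cost)
    return [(c.name, cheapest_sensor(c, active)) for c in ordered]


# Hypothetical rule: "at the office during the day".
conditions = [
    Condition("at_office", ("gps", "wifi")),
    Condition("is_daytime", ()),  # needs only the clock, no sensor
]

print(plan(conditions))
print(plan(conditions, active=frozenset({"gps"})))
```

With no sensor running, the sketch checks the time of day first and then uses Wi-Fi for location. If GPS is already in use by another app, it is reused rather than powering up a second sensor.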

Intrusion Detection Systems
– Current intrusion detection systems are poor at handling zero-day and “low and slow” attacks, and APTs
– Sharing information from heterogeneous data sources can provide useful information even when an attack signature is unavailable
– We implemented prototypes that integrate and reason over data from IDSs, host and network scanners, and text at the knowledge level
– We’ve established the feasibility of the approach in simple evaluation experiments

From dashboards & watchstanding (Simple) Analysis

… to situational awareness
Diagram labels: Non Traditional “Sensors”; Traditional Sensors; Facts / Information; Context/Situation; Rules; Policies; Analytics; Alerts
Example text: Use-after-free vulnerability in Microsoft Internet Explorer 6 through 8 ….
Example assertions:
[ a IDPS:text_entity; IDPS:has_vulnerability_term "true"; IDPS:has_security_exploit "true"; IDPS:has_text "Internet Explorer"; IDPS:has_text "arbitrary code"; IDPS:has_text "remote attackers". ]
[ a IDPS:system; IDPS:host_IP " ". ]
[ a IDPS:scannerLog IDPS:scannerLogIP " "; … ]
[ a IDPS:gatewayLog IDPS:gatewayLogIP " "; … ]
Example rule:
[ IDPS:scannerLog IDPS:hasBrowser ?Browser
  IDPS:gatewayLog IDPS:hasURL ?URL
  ?URL IDPS:hasSymantecRating "unsafe"
  IDPS:scannerLog IDPS:hasOutboundConnection "true"
  IDPS:WiresharkLog IDPS:isConnectedTo ?IPAddress
  ?IPAddress IDPS:isZombieAddress "true" ]
=>
[ IDPS:system IDPS:isUnderAttack "use-after-free vulnerability"
  IDPS:attack IDPS:hasMeans "Backdoor"
  IDPS:attack IDPS:hasConsequence "UnauthorizedRemoteAccess" ]
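To make the rule above concrete, here is a minimal Python sketch of matching a conjunctive rule body against a set of triples. It is a toy pattern matcher, not the reasoner used in the prototypes. The facts (including the URL and the documentation-range IP address) are hypothetical, and the namespace prefixes are dropped for brevity.

```python
def match(pattern, triple, bindings):
    """Unify one (s, p, o) pattern with a triple; terms starting with '?' are variables."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind the variable, or fail if it is already bound to something else.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def solve(patterns, facts, bindings=None):
    """Yield every variable binding that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in facts:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from solve(patterns[1:], facts, extended)


# Rule body and head, mirroring the slide's rule.
RULE_BODY = [
    ("scannerLog", "hasBrowser", "?Browser"),
    ("gatewayLog", "hasURL", "?URL"),
    ("?URL", "hasSymantecRating", "unsafe"),
    ("scannerLog", "hasOutboundConnection", "true"),
    ("WiresharkLog", "isConnectedTo", "?IPAddress"),
    ("?IPAddress", "isZombieAddress", "true"),
]
RULE_HEAD = [
    ("system", "isUnderAttack", "use-after-free vulnerability"),
    ("attack", "hasMeans", "Backdoor"),
]

# Hypothetical facts; the URL and IP address are placeholders.
FACTS = [
    ("scannerLog", "hasBrowser", "Internet Explorer"),
    ("gatewayLog", "hasURL", "http://bad.example/"),
    ("http://bad.example/", "hasSymantecRating", "unsafe"),
    ("scannerLog", "hasOutboundConnection", "true"),
    ("WiresharkLog", "isConnectedTo", "203.0.113.7"),
    ("203.0.113.7", "isZombieAddress", "true"),
]


def infer(facts):
    """Return the rule head if at least one binding satisfies the body."""
    derived = set()
    for _ in solve(RULE_BODY, facts):
        derived.update(RULE_HEAD)
    return derived


print(infer(FACTS))
```

Removing any one fact, for example the zombie-address assertion, leaves the body unsatisfied and no alert is derived. This is the sense in which evidence from several sources is combined at the knowledge level.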

Maintaining the vulnerability KB
– Our approach requires us to keep the KB of software products and known or suspected vulnerabilities and attacks up to date
– Resources like NVD are great, but tapping into text can enrich their info and give earlier warnings of problems
Timeline figure labels: CVE disclosed (01/14/13) Vendor deploys software Attacker finds vuln. & exploits it (01/10/13) Exploit reported in mailing list (01/10/13) Vuln. reported in NVD RSS feed Analysis Vuln. Analyzed & included in NVD feed (02/16/2013) Vendor Analysis Threat disclosed in vendor bulletin (03/04/2013) Patch development Patch released (Critical Patch Update) (06/18/2013) Resolution System update

Information extraction from text
Example text: CVE Buffer overflow in msvcrt.dll in Microsoft Windows Vista SP2, Windows Server 2008 SP2, R2, and R2 SP1, and Windows 7 Gold and SP1 allows remote attackers to execute arbitrary code via a crafted media file, aka "Msvcrt.dll Buffer Overflow Vulnerability."
Figure labels: ebqids:hasMeans; Identify relationships; e/Buffer_overflow; Link concepts to entities; ows_7; ebqids:affectsProduct
– We use information extraction techniques to identify entities, relations and concepts in security related text
– These are mapped to terms in our ontology and the DBpedia LOD KB (based on Wikipedia)
Google’s slogan: “Things, not strings”
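As a rough illustration of turning a vulnerability description into triples, the Python sketch below uses simple phrase lookup. The actual system uses information extraction and entity linking, which are far more capable. The gazetteers, the `example:` identifiers and the `example:vuln1` subject are hypothetical placeholders; only the two `ebqids:` property names come from the slide.

```python
# Hypothetical gazetteers mapping surface phrases to placeholder identifiers.
MEANS = {"buffer overflow": "example:Buffer_overflow"}
PRODUCTS = {
    "windows 7": "example:Windows_7",
    "windows vista": "example:Windows_Vista",
}


def extract_triples(vuln_id, text):
    """Spot known phrases in the text and emit (subject, property, object) triples."""
    lowered = text.lower()
    triples = []
    for phrase, concept in MEANS.items():
        if phrase in lowered:
            triples.append((vuln_id, "ebqids:hasMeans", concept))
    for phrase, entity in PRODUCTS.items():
        if phrase in lowered:
            triples.append((vuln_id, "ebqids:affectsProduct", entity))
    return triples


description = (
    "Buffer overflow in msvcrt.dll in Microsoft Windows Vista SP2, "
    "Windows Server 2008 SP2, R2, and R2 SP1, and Windows 7 Gold and SP1 "
    "allows remote attackers to execute arbitrary code via a crafted media file"
)

for triple in extract_triples("example:vuln1", description):
    print(triple)
```

Linking mentions to shared identifiers rather than keeping raw strings is what lets the extracted facts join with structured sources such as the NVD data.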

Maintaining the vulnerability KB
Pipeline figure labels: Security Bulletins; Blogs; Unstructured Data (Vuln. Summaries); Entity & Concept Spotter; Extracted Concepts; Web Text; Triple Store; NVD dataset; Structured Data (XML); IDS Ontology; Linked Cybersecurity Data; Consumers; Linking & Mapping Entities; RDF Generation

Faceblock
An 80-second demo video is available on YouTube

Faceblock Ontology
Faceblock’s (OWL) ontology lets one write context policy rules using predefined activity and place types

Faceblock Protocols
The user’s device maintains context, reasons with policy rules and informs glass devices of the Faceblock property: True or False
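A minimal Python sketch of this flow follows, assuming rules over activity and place types and a first-match-wins evaluation order. The type names, the `Rule` and `Context` classes and the evaluation order are all assumptions for illustration. The real system expresses policies against its OWL ontology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Context:
    activity: str
    place: str


@dataclass(frozen=True)
class Rule:
    activity: Optional[str]  # None matches any activity
    place: Optional[str]     # None matches any place
    faceblock: bool

    def applies(self, ctx):
        return (self.activity in (None, ctx.activity)
                and self.place in (None, ctx.place))


def faceblock_property(rules, ctx, default=False):
    """First matching rule wins; the result is what the device would announce."""
    for rule in rules:
        if rule.applies(ctx):
            return rule.faceblock
    return default


# Hypothetical policy using made-up activity and place type names.
RULES = [
    Rule("Meeting", None, True),               # block pictures in any meeting
    Rule(None, "Home", True),                  # block pictures at home
    Rule("Socializing", "Restaurant", False),  # allow pictures when out with friends
]

print(faceblock_property(RULES, Context("Meeting", "Office")))
print(faceblock_property(RULES, Context("Socializing", "Restaurant")))
```

Only the resulting True or False value needs to leave the user's device, so nearby glass devices learn the preference without learning the underlying context.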

Taming Wild Big Data
– WBD is structured or semi-structured data for which we lack schema-level understanding, e.g., raw tables, graphs, XML, logs
– Developed tools to generate semantic data from background ontologies & KBs, e.g., for clinical trial tables
– It’s harder when the domain is not even known. We’re developing systems that use large background KBs (e.g., Google’s Freebase) to predict types/subtypes of data instances
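One simple way to picture type prediction from a background KB is majority voting over the types of a column's values, sketched below in Python. The tiny `KB_TYPES` dictionary stands in for a large KB and is entirely hypothetical. Skipping overly generic types is an assumption made here to favor specific subtypes, not a description of the authors' systems.

```python
from collections import Counter

# Hypothetical miniature background KB: entity label -> set of types.
KB_TYPES = {
    "baltimore": {"City", "Place"},
    "dallas": {"City", "Place"},
    "maryland": {"State", "Place"},
    "texas": {"State", "Place"},
}


def predict_column_type(values, generic=frozenset({"Place"})):
    """Vote on the most common specific type among the values found in the KB."""
    votes = Counter()
    for value in values:
        for kb_type in KB_TYPES.get(value.strip().lower(), ()):
            if kb_type not in generic:  # skip types too broad to be informative
                votes[kb_type] += 1
    return votes.most_common(1)[0][0] if votes else None


print(predict_column_type(["Baltimore", "Dallas", "Springfield"]))
print(predict_column_type(["Maryland", "Texas"]))
```

Values missing from the KB, such as "Springfield" here, simply contribute no votes, so a column can still be typed when only some of its cells are recognized.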

Conclusion
– Google’s new slogan: things, not strings
– We also need: measurements, not numbers
– Common ontologies in semantic representations enable big data integration at a “knowledge level”: data, meta-data, provenance, certainty, rules
Many advantages:
– Enhancing discovery, integration and interoperability
– Enabling inference and knowledge-level analytics
– Expressing policy constraints in common semantic terms

System Architecture
Diagram labels: Threat/Vulnerability Alert; Knowledge Base; Reasoner; Ontology; Domain Expert Knowledge; RDFS Knowledge; Web Text Sources (Blogs, Forums, Feeds); Entity/Concept Extractor; Named Entities; Security Vulnerability Entities Extractor; Security Vulnerability Terms; IDS/IPS sensors; Reports and Logs; Host Based Activity Monitor; Host Activity Logs; Network Activity Monitor; Network Activity Logs; Hardware Security Sensors; Security Logs

Populating KBs from Text
Kelvin is a system for populating KBs with entities and relations extracted from text
– Developed at the JHU Human Language Technology Center of Excellence
– E.g., extracts 300K entities and 3M relations from 50K newswire articles
Supports analytics at the KB level: inference, probabilistic reasoning, entity linking across KBs, …
Top system in the 2012 & 2013 NIST Text Analysis Conference Cold Start KBP task evaluations