Towards A Semantic Web Application for NVD-CPE

Vaibhav Khadilkar, Jyothsna Rachapalli, Dr. Bhavani Thuraisingham
The University of Texas at Dallas
PC: High-level comment: make sure all words in slide headers are capitalized (with the exception of a, the…)

Semantic Web
Humans are capable of using the Web to carry out tasks such as finding the Finnish word for "monkey", reserving a library book, or searching for a low price for a DVD. However, a computer cannot accomplish the same tasks without human direction, because web pages are designed to be read by people, not machines. The Semantic Web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing, and combining information on the Web.

Common Platform Enumeration
CPE is a structured naming scheme for IT systems, platforms, and packages. A CPE name is represented by a URI. Each name consists of the prefix "cpe:" followed by up to seven components, which help build consistent and unique names: platform part, vendor, product name, version, update level, edition, and language.
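To make the naming scheme concrete, here is a minimal Python sketch that splits a CPE name into the seven components listed above. It assumes the URI form `cpe:/part:vendor:product:version:update:edition:language` with trailing components omitted when unused; the function name `parse_cpe` and the sample name are illustrative only and do not come from the slides.

```python
from urllib.parse import unquote

# Component order as listed on the slide (assumed to match the URI order).
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition", "language")


def parse_cpe(name: str) -> dict:
    """Split a CPE URI such as 'cpe:/o:microsoft:windows_nt:4.0' into named components."""
    prefix = "cpe:/"
    if not name.startswith(prefix):
        raise ValueError(f"not a CPE URI: {name!r}")
    values = name[len(prefix):].split(":")
    if len(values) > len(CPE_FIELDS):
        raise ValueError(f"too many components in {name!r}")
    # Trailing components may be omitted; report them as empty strings.
    values += [""] * (len(CPE_FIELDS) - len(values))
    return {field: unquote(value) for field, value in zip(CPE_FIELDS, values)}


parsed = parse_cpe("cpe:/o:microsoft:windows_nt:4.0")
print(parsed["vendor"], parsed["product"], parsed["version"])
```

Returning a dictionary keyed by component name keeps later steps, such as mapping each component to an ontology property, independent of positional order.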

Agenda
Motivation to opt for semantic web technology
Architecture of a semantic web application
Semantic web technologies overview
Strategy for creation of a semantic web application
Performance metrics
JR: Highlighted words alone can be retained for each bullet if that sounds better (?).
PC: I like the way you did this; I would keep it how it is.
JR: The data migration utility has not been included in this slide, as that can be done after we justify migration from the relational model to the semantic model (?).

Motivation
National Vulnerability Database (NVD)
  Contains product and vulnerability management data
  Based on a relational model
Goal is to enable automation of
  Vulnerability management
  Security measurement and compliance
Relational model imposes limitations
  Product composition is difficult to achieve
    Find all products containing a TCP/IP device?
    Find all products within a common codebase?
Advantage of the semantic model: reasoning!
PC: Capitalize the v in Vulnerability (NVD).
PC: I would not say "cannot" be achieved; maybe use "difficult to achieve." Using words like "cannot" always gets a rise out of hecklers who are going to explain how they could do it. It is possible to do; the graph model is just much better suited to it.
PC: The bullets "Find all products containing a TCP/IP device? Find all products within common codebase?" are not really limitations. Were these supposed to be sub-bullets under "Product composition cannot be achieved"?

Ontology
"An ontology provides a precise vocabulary with which knowledge can be represented."
"This vocabulary allows us to specify which entities will be represented, how they can be grouped, and what relationships connect them together."

Resource Description Framework
RDF is a language for representing information about resources in the World Wide Web. RDF is intended for situations in which this information needs to be processed by applications, rather than only being displayed to people. RDF is intended to provide a simple way to make statements about resources. In each statement, the part that identifies the thing the statement is about is called the subject, the part that identifies the property of the subject is called the predicate, and the part that identifies the value of that property is called the object.
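The subject/predicate/object structure can be sketched in a few lines of plain Python. This is an illustration of the data model only, not the Jena API; the namespace `http://example.org/nvd#` and the property names `hasVendor` and `hasProductName` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the thing the statement is about
    predicate: str  # the property of the subject
    obj: str        # the value of that property


NVD = "http://example.org/nvd#"  # hypothetical namespace, for illustration only

graph = {
    Triple("cpe:/o:microsoft:windows_nt", NVD + "hasVendor", "microsoft"),
    Triple("cpe:/o:microsoft:windows_nt", NVD + "hasProductName", "windows_nt"),
    Triple("cpe:/a:mozilla:firefox", NVD + "hasVendor", "mozilla"),
}


def objects(graph, subject, predicate):
    """Return every object asserted for a given subject and predicate."""
    return sorted(t.obj for t in graph if t.subject == subject and t.predicate == predicate)


print(objects(graph, "cpe:/o:microsoft:windows_nt", NVD + "hasVendor"))
```

Because a graph is just a set of such statements, adding a new fact about a product never requires a schema change, which is the property the later migration steps rely on.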

Project Objectives
Creation of a products ontology for NVD-CPE
Creation of a corresponding view in the relational DB
Migrate data from the relational to the semantic model
Create a web application using the new model
This application should enable the user to
  Navigate
  Search
  Query the data
JR: Speaker is expected to elaborate the points during the presentation.
PC: Understood. For the first bullet, be sure to elaborate that the ontology represents NVD's viewpoint on products and is geared towards the use case of supporting statements of applicability between IT concepts (e.g., CVE, CCE, checklists) and IT products.

Semantic Technology
Converter
  Converts data from various sources (e.g., tables, spreadsheets, web pages) into RDF
RDF Parser and Serializer
  Facilitates reading and writing RDF in one of several file formats (e.g., N3, N-Triples, RDF/XML)
RDF Store (or triple store)
  A database that is optimized for the storage and retrieval of many short statements called triples
PC: I don't think you have defined SDB yet, so you may want to do that before using it. Also, I don't quite get the difference between relational and storing to disk directly (doesn't a relational database use a disk?). You may want to clarify that it is possible to store RDF in a relational database (optimized for storing triples) as well as in an application designed just to store triples (a dedicated triple store like AllegroGraph).

Semantic Technology
Reasoner
  A program that performs inferences according to specified inference rules
SPARQL
  The W3C standard query language for RDF
Application interface
  Uses the content of an RDF store in an interaction with some user

Semantic Technology - Examples
Converters
  D2RQ, used during the first approach
  Jena API to read relational data into a Jena model
Parser/Serializer
  Jena API to read and write the triples in any serialization format
RDF Store
  RDB, SDB, and AllegroGraph
Inferencing
  Pellet reasoner
SPARQL
  ARQ, a query engine for Jena that supports SPARQL
PC: I would put Pellet Reasoner on a sub-bullet, since that is how the rest of the examples work.

Semantic Technology - Jena
The Jena framework provides
  An RDF API
  Reading and writing RDF in RDF/XML, N3, and N-Triples
  An OWL API
  In-memory and persistent storage
  A SPARQL query engine
  Built-in reasoners
  A plug-in mechanism for external reasoners
PC: It may be a good idea to list that Jena also supplies extension points for plugging in third-party functionality (e.g., third-party reasoners).

Application Architecture
[Architecture diagram. Labels: Application; SPARQL; Ontology API; Core RDF Model API; Inference API (Reasoners); Converters; Parser; Serializer; DB; RDF files; RDF/triple stores (RDB, SDB, AllegroGraph)]
PC: I think this slide should go after you discuss the technology (after slide 9). The audience will understand this better if they have an overview of the individual parts. (I feel like I may be contradicting something I previously said; sorry about that.)

Strategy Step 1 - Use Cases
Describe the initial, most difficult requirements in conversational, informal English
Work with domain experts to create the use cases required by a given domain
Use case examples
  Searching: "What are all the products that have a vendor of Microsoft and a product name of windows_nt?"
  Equality: "Determine if two instances are equal"
PC: Label your second use case like you do for the first, something like "Equality: We need to be able to determine …"
PC: Were these the only use cases we had? If not, you may want to list them all.

Strategy Step 2 - Ontology creation and validation
Use an ontology editor to create an ontology/schema based on the use cases created in Step 1
  Ontology editor used: Protégé 4.0
  External reasoner plug-in: Pellet
Creation of
  Classes and corresponding subclasses
  Properties: object properties as well as data properties
  Individuals of a class
Run the reasoner to validate the correctness of the model

High-level NVD Ontology Overview
[Ontology diagram: an identification concept hierarchy and a product category concept hierarchy, with hasIdentification as the relationship connecting the two structures. Legend: <owl:Class>, <rdfs:subClassOf>, ABC = <rdf:Property>]

Strategy Step 3 - Ontology migration to Jena
Create Java classes using the ontology generated in Step 2
  Java classes are created using Schemagen
  Input to Schemagen: Ontology.owl
  Output from Schemagen: Ontology.java
Strategy Step 4 - Data migration
Perform data migration: two approaches
  First approach: mapping relational data to RDF with a mapping tool
  Second approach: mapping relational data to RDF using a database view
PC: I am assuming that explaining what Schemagen is will be part of your talking track.

Data migration utility–First approach
Database to Relational Query (D2RQ) allows us to view the relational database as RDF triples
D2RQ mapping file
  Maps database columns to predicates in the ontology
  Use the mapping file to convert the relational database into triples
A triple is created as follows
  primary key of table ---> subject
  column name ---> predicate
  value of the cell ---> object
PC: You may want to explain what D2RQ is before stating what you did with it. Audience members will likely lose you before you start. Also explain exactly what a mapping file is for (i.e., it maps database columns to predicates in the ontology).
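The row-to-triple rule above (primary key to subject, column to predicate, cell value to object) can be sketched in plain Python. This does not use D2RQ or its mapping-file syntax; the column names, the `nvd:` predicates, and the helper `row_to_triples` are assumptions made for illustration.

```python
def row_to_triples(row: dict, key_column: str, column_to_predicate: dict) -> list:
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = row[key_column]  # primary key of the table becomes the subject
    triples = []
    for column, predicate in column_to_predicate.items():
        value = row.get(column)
        if value is None:
            continue  # a NULL cell produces no statement
        triples.append((subject, predicate, value))  # cell value becomes the object
    return triples


# Hypothetical mapping from database columns to ontology predicates.
MAPPING = {"vendor": "nvd:hasVendor", "product": "nvd:hasProductName", "version": "nvd:hasVersion"}

row = {"cpe_name": "cpe:/o:microsoft:windows_nt", "vendor": "microsoft",
       "product": "windows_nt", "version": None}
for triple in row_to_triples(row, "cpe_name", MAPPING):
    print(triple)
```

Keeping the column-to-predicate mapping in one dictionary plays the role the slides assign to the mapping file: the conversion logic stays fixed while the mapping changes with the schema.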

Data Migration Utility
First approach limitations
  D2RQ is not required when a combined view of different tables is used, as is the case with the NVD-CPE database
  D2RQ does not allow us to update database tables
Second approach
  Involves creating a new relational schema that is closely related to the ontology
  This schema will serve as a stepping stone for the data along the path to the semantic store

Data migration utility–Second approach
Create a view that combines the required columns from various tables
Read tuples from this view (table) to convert the product information into triples
The triple is now created as
  primary key (CPE name) ---> subject
  predicate based on the ontology ---> predicate
  value of the cell ---> object
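The view-based approach can be sketched with Python's built-in `sqlite3` module standing in for the real database. The table layout, the view name `product_view`, and the `nvd:` predicates are hypothetical; the actual NVD-CPE schema is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor  (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE product (cpe_name TEXT PRIMARY KEY, product_name TEXT, vendor_id INTEGER);
    INSERT INTO vendor  VALUES (1, 'microsoft');
    INSERT INTO product VALUES ('cpe:/o:microsoft:windows_nt', 'windows_nt', 1);
    CREATE VIEW product_view AS
        SELECT p.cpe_name, p.product_name, v.vendor_name
        FROM product p JOIN vendor v ON v.vendor_id = p.vendor_id;
""")

# Predicates are chosen from the ontology, not derived from column names.
PREDICATES = {"product_name": "nvd:hasProductName", "vendor_name": "nvd:hasVendor"}


def view_to_triples(conn):
    """Read each tuple of the combined view and emit one triple per mapped column."""
    cursor = conn.execute("SELECT cpe_name, product_name, vendor_name FROM product_view")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = record["cpe_name"]  # the CPE name is the subject
        for column, predicate in PREDICATES.items():
            triples.append((subject, predicate, record[column]))
    return triples


for triple in view_to_triples(conn):
    print(triple)
```

The join lives in the view, so the conversion code reads one flat relation and never needs to know how the source tables are related.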

Strategy - Continued Step 5 - Reasoning
Reasoning
  The process by which new triples are systematically added to a graph based on patterns in existing triples
Inference rules
  Systematic patterns defining which triples should be inferred
Steps involved
  Choose a reasoner: Pellet (external reasoner)
  Create inference rules as part of the ontology using OWL
  Run the reasoner
  Verify the correctness of the inference rules using the inferred triples
PC: When you mention rules, are you talking about Horn clauses (SWRL rules), or about the rules inherent in RDFS/OWL? If the latter, you may want to state that these rules are created as part of the ontology when modeling with W3C modeling languages like OWL.
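As a rough illustration of "adding new triples based on patterns in existing triples", the following Python sketch applies two RDFS-style rules by forward chaining until nothing new can be derived. It is a toy, not Pellet, and it covers only a tiny fraction of what an OWL reasoner does; the class names (`nvd:Router`, `nvd:TcpIpDevice`, `nvd:Device`) and the individual `ex:r1` are hypothetical.

```python
def infer(triples):
    """Forward-chain two rules to a fixed point.

    Rule 1: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)        =>  (x type B)
    """
    graph = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in graph if p == "rdfs:subClassOf"]
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        for s, p, o in graph:
            if p == "rdf:type":
                for a, b in sub:
                    if o == a:
                        new.add((s, "rdf:type", b))
        if new <= graph:  # nothing new was derived: fixed point reached
            return graph
        graph |= new


asserted = {
    ("nvd:Router", "rdfs:subClassOf", "nvd:TcpIpDevice"),
    ("nvd:TcpIpDevice", "rdfs:subClassOf", "nvd:Device"),
    ("ex:r1", "rdf:type", "nvd:Router"),
}
inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

Comparing the inferred set against what the use cases expect is one way to carry out the verification step: a missing or surprising triple points at a modeling error in the ontology.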

Strategy Step 6 - SPARQL queries
SPARQL queries are very similar to SQL queries
Write SPARQL queries for each of the use cases from Step 1
Strategy Step 7 - Application
Integrate the newly implemented functionality with the web application
Create a user interface that enables
  Navigation
  Search
  Querying
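To show what a SPARQL query for the Step 1 search use case asks of the data, the sketch below implements basic triple-pattern matching in plain Python. It is not a SPARQL engine and does not use ARQ; the SPARQL text in the comment, the `nvd:` prefix, and the property names are assumptions for illustration.

```python
# Roughly the query being imitated (hypothetical vocabulary):
#   SELECT ?product WHERE {
#     ?product nvd:hasVendor      "microsoft" .
#     ?product nvd:hasProductName "windows_nt" .
#   }

GRAPH = [
    ("cpe:/o:microsoft:windows_nt", "nvd:hasVendor", "microsoft"),
    ("cpe:/o:microsoft:windows_nt", "nvd:hasProductName", "windows_nt"),
    ("cpe:/o:microsoft:windows_ce", "nvd:hasVendor", "microsoft"),
    ("cpe:/o:microsoft:windows_ce", "nvd:hasProductName", "windows_ce"),
]

PATTERNS = [
    ("?product", "nvd:hasVendor", "microsoft"),
    ("?product", "nvd:hasProductName", "windows_nt"),
]


def match(graph, patterns, binding=None):
    """Yield variable bindings satisfying every pattern; terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


products = [b["?product"] for b in match(GRAPH, PATTERNS)]
print(products)
```

The shared variable `?product` acts like a join key, which is where the resemblance to SQL comes from: each additional pattern narrows the result the way an extra join condition would.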

Strategy Step 8 - Performance with triple stores
Performance metrics to test for
  Load time: time to load triples into the triple store
  Query time: running time of the SPARQL queries for the various use cases
Perform testing on triple stores such as RDB, SDB, and AllegroGraph and document the corresponding performance metrics
Strategy Step 9 - Cyclic process
Write additional use case scenarios and repeat the process until all use cases have been modeled
Refine the model until correct inferences are being drawn
PC: For Step 9, you may also want to state that you need to keep tweaking the model until the correct inferences are being drawn.
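A minimal timing harness for the two metrics might look like the Python sketch below. It times an in-memory list rather than a real triple store, and averaging over several runs is an assumption; the slides do not say how the measurements were taken. The helpers `load` and `query_vendors` are hypothetical stand-ins for store-specific load and query calls.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the mean wall-clock time of func(*args) in milliseconds over `repeats` runs."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        elapsed.append((time.perf_counter() - start) * 1000.0)
    return sum(elapsed) / len(elapsed)


def load(n):
    """Stand-in for loading n triples into a store."""
    return [(f"cpe:/a:vendor{i % 100}:product{i}", "nvd:hasVendor", f"vendor{i % 100}")
            for i in range(n)]


def query_vendors(store):
    """Stand-in for the 'list all the vendors' query."""
    return {o for _, p, o in store if p == "nvd:hasVendor"}


store = load(1000)
print(f"load:  {time_call(load, 1000):.3f} ms")
print(f"query: {time_call(query_vendors, store):.3f} ms")
```

Measuring load and query separately, with the same harness for every store, keeps the comparison between stores consistent even though absolute numbers depend on hardware and configuration.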

Strategy - Cyclic Process
[Cycle diagram of the strategy: Use cases -> Ontology creation -> Ontology migration -> Data migration -> Reasoning -> SPARQL queries -> Application -> Performance]

Performance Metrics
RDB, SDB, and AllegroGraph triple stores are optimized and indexed
Metrics measure performance on
  94216 products without reasoning
  5961 products with reasoning
Example queries
  List all the vendors
  List all the products
  List products created in a given time period
  List all products for a given vendor or a given creation date
Example queries with reasoning
  Products containing TCP/IP devices
  Products containing a given shared library

Performance Metrics: Load Statistics
Columns: Relational View, RDB, SDB, AllegroGraph
Version: SQL Server 05, Jena-2.5.6, SDB-1.1, AllegroGraph-3.2
Size (Rows/Triples): 96485 (R) (T)
Total Space (MB): 13.08 302.63 387.00
Index Space (MB): 0.008 674.22 75.55 316.06
Log Space (MB): - 285.06 82.44
Load time: 231.6 s 284.6 s 164.8 s

Load time with reasoning
Metric | RDB | SDB | AllegroGraph
Version | Jena-2.5.6 | SDB-1.1 | AllegroGraph-3.2
Size (Rows/Triples) | 97814 (T)
Total Space (MB) | 118.31 | 61.38 | 38
Index Space (MB) | 66.98 | 9.65 | 31.46
Log Space (MB) | 13.31 | 38.38 | -
Load time | 18.58 hrs | 17.62 hrs | 19.06 hrs

Performance Metrics: Query time
Query (Triples) | RDBMS (ms) | RDB (ms) | SDB (ms) | AllegroGraph (ms)
Vendors (9898) | 53.2 | 737.4 | 711.2 | 945.6
Products (96216) | 10.6 | 1013.2 | 723.4 | 5572.8
MS Products (2616) | 12 | 26.4 | 30.0 | 141.4
'win ce' Agent (1) | 27 | 74.8 | 8.4 | 11.0
All CPE names (96216) | 11 | 1235.0 | 1274.6 | 7321.2
Given CPE name (1) | 1 | 838.6 | 472.2 | 5425
All creation dates (96216) | 8.2 | 1183.8 | 1499.4 | 5464.4
Given creation date (56811) | 70.6 | 937.4 | 1427.4 | 5519
Type 'a' (82981) | 34 | 749.6 | 1120.6 | 5325
Group by Type (h=4941, o=8294, a=82981) | 92.6 | 768.4 | 1243.8 | 5406.2
PC: Be sure to cover what each query was for in your talking track.
PC: Was AllegroGraph indexed for this benchmark?
PC: You may want to just put the row counts into the first column (e.g., Vendors (9898 rows/triples)), because having them as a separate column makes it confusing whether the number represents milliseconds or rows returned. If everything else is in milliseconds, you should not have a column counting something else.

Query times with reasoning
[Bar chart of query times with reasoning. Labels: reasoning performed on 5961 products; total number of products]
JR: Table or bar graph for representing the performance metrics? Views?
PC: I like this style; be sure to label the x axis with ms if that is what it is counting.
JR: Yet to label the y axis as milliseconds.

Application

Conclusion
Choosing a semantic model instead of a relational model enhances automation of vulnerability management
Creating a comprehensive list of use cases at once is challenging
  A cyclical process makes incorporation of new use cases flexible
Efforts must be taken to optimize triple store performance
Implementers of a system must carefully choose a triple store/reasoner for their implementation
  Trade-off between speed and power
JR: Need input for this slide.
PC: I would focus on the conversion process and what you learned. Some suggestions:
  Traditional relational database modeling and semantic modeling differ in many ways (e.g., relational models are table/class focused, while semantic models are relationship focused, i.e., what inferences can be drawn from certain relationships), and modelers must take these differences into account when conducting a conversion. If an ontology is built in the same way as the relational database, many of the added benefits of using a semantic model will be lost.
  Creating a comprehensive list of use cases at the beginning of an effort is challenging, even for domain experts. Many use cases pop up in the middle of an effort as modelers begin to realize things are missing (it truly is a cyclical process).
  Efforts must be taken to optimize triple store performance (e.g., indexing) in order to gain the most power out of a triple store.
  Implementers of a system must carefully choose a triple store/reasoner for their implementation (there can be multiple pairings if needed). By adding speed you will normally lose power (i.e., the store will support less and less of OWL as speed goes up).

References
http://jena.sourceforge.net/
http://nvd.nist.gov/
Dean Allemang, James Hendler: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL
John Hebeler, Matthew Fisher, Ryan Blace, Andrew Perez-Lopez: Semantic Web Programming
