Integrated Security Framework for Semantically Enhanced Semi-Structured Data
Andrei G. Stoica and Csilla Farkas
Department of Computer Science & Engineering
University of South Carolina
Overview
  Machine-understandable data semantics:
    domain and context definition
    ontologies
    metadata
  What are the security implications?
    New security mechanisms?
    New security paradigm?
XML Language
  High-level application messaging
  Used for storage:
    reduces application computation overhead
    uniform access
  Base for semantics-oriented languages: RDF, DAML
  Increased popularity
Semantic Tools
  The information process is augmented with a semantic layer.
  The infrastructure allows computers to reason about data meaning.
  Computers exchange information transparently on behalf of the user.
  Implications: intelligent, high-volume processing
Security Setup
  Increased connectivity + extensive XML support + semantic infrastructure = new security threats
  Established security models do not address this dimension:
    indirect disclosure
    undesired inference
  Available inference models are difficult to transfer from database security to open domains.
Related Work
  Document instance security:
    XML access control models
    digital signatures
    encryption
  XML access control models:
    security label assignment
    multi-level XML security
    extensions from database security
Problems?
  Semantic correlations are ignored
  Inconsistent replies
  Indirect unauthorized disclosure
Example
  Security labels: UC (unclassified), S (secret), TS (top secret).

  <medicalFiles>                          UC
    <countyRec>                           S
      <patient>                           S
        <name>John Smith</name>           UC
        <phone>111-2222</phone>           S
      </patient>
      <physician>Jim Dale</physician>     UC
    </countyRec>
    <milBaseRec>                          TS
      <name>Harry Green</name>            UC
      <phone>333-4444</phone>             S
      <physician>Joe White</physician>    UC
      <milTag>MT78</milTag>               TS
    </milBaseRec>
  </medicalFiles>

  [Figure: tree of the document (medicalFiles; countyRec and milBaseRec; patient, name, phone, physician, milTag) alongside the view over UC data]
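A minimal sketch of how a naive single-level view over the labeled document in this example could be computed. The `label` attribute, the `LEVELS` ordering, and the function `naive_view` are assumptions made for illustration (the slide only prints the label next to each tag). Promoting visible children of a hidden element to the nearest visible ancestor is one possible policy, not necessarily the authors' method.

```python
import xml.etree.ElementTree as ET

# Assumed dominance order of the labels used on the slide.
LEVELS = {"UC": 0, "S": 1, "TS": 2}

# The slide's document, with each label stored in a hypothetical "label" attribute.
DOC = """
<medicalFiles label="UC">
  <countyRec label="S">
    <patient label="S">
      <name label="UC">John Smith</name>
      <phone label="S">111-2222</phone>
    </patient>
    <physician label="UC">Jim Dale</physician>
  </countyRec>
  <milBaseRec label="TS">
    <name label="UC">Harry Green</name>
    <phone label="S">333-4444</phone>
    <physician label="UC">Joe White</physician>
    <milTag label="TS">MT78</milTag>
  </milBaseRec>
</medicalFiles>
"""

def naive_view(elem, clearance):
    """Return [copy of elem] if elem is visible at `clearance`;
    otherwise return elem's visible descendants, promoted one level up."""
    visible_children = []
    for child in elem:
        visible_children.extend(naive_view(child, clearance))
    if LEVELS[elem.get("label")] <= LEVELS[clearance]:
        copy = ET.Element(elem.tag)          # labels are not copied into the view
        copy.text = (elem.text or "").strip() or None
        copy.extend(visible_children)
        return [copy]
    return visible_children

root = ET.fromstring(DOC)
uc_view = naive_view(root, "UC")[0]
print(ET.tostring(uc_view, encoding="unicode"))
```

Under these assumptions the UC view keeps both names and both physicians but drops the record and patient elements that grouped them, so the reply no longer mirrors the structure of the source document.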
Inference
  A set of data plus associations is used to derive the target data.
  Traditionally a human task.
  In the limit, any target can be inferred given enough related data and metadata.
Problems?
  If the inference target is confidential information, the result is a security violation.
Example: Simulation Exploitation Using Open Source Information
  Objective: the US Government would like to share limited simulation software with friendly countries.
  Can this software be used to explore the capabilities of US weaponry?
  Can sufficient information be found in public sources to create such a simulation?
Example
  Findings:
    Most of the information needed for the simulation was available on the Internet.
    Human aid was needed to combine the available information.
Proposed Solution
  What do we do?
  XML views that consider the semantic dimension:
    do not disclose additional information (including the structure of the document)
    cover stories
  Web inference:
    make sure the information we publish does not lead to our confidential data
Proposed Solution
  XML Access Control
    semantically consistent reply
    prevent illegal inference from the query reply (cover stories)
  Global Disclosure Control
    detect and prevent a set of undesired inferences that use public Internet data in correlation with public local data
Global Data Privacy Control
  [Figure: system architecture. Labeled components: Security Engine, Local Organization, Access Control, SecView, Local XML Database, Interface Module, Oxsegin, Local Ontology, Global Data Privacy Control, Local Data, Internet Data. Labeled flows: Request, Return, Update, Upload, Corrective Measures.]
Secure XML Views
  Builds secure and semantically consistent single-security-level partial views
  Minimum Semantic Conflict Graph (MSCG): avoids semantic conflicts
  Multi-Plane DTD Graph (MPG): structural relationships between tags
  Andrei Stoica, Csilla Farkas. “Secure XML Views”, In Proc. of IFIP 2002.
Example
  [Figure: DTD graph and MSCG over the tags medicalFiles, countyRec, milBaseRec, emrgRec, patient, physician, name, phone, milTag]
Oxsegin
  [Figure: the Inference Engine, with the labeled data sources Local Classified Database, Local Public Database, and Internet Databases, and the labeled results Security Violations and Corrective Measures]
Corrective Measures
  Local public data:
    remove information
    release misleading information
  Internet public data:
    target desirable inference results
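Inference Engine
  [Figure: inference engine internals. Labeled modules: Replicated Data Inference, Correlated Data Inference. Labeled inputs: Public + Local Database, Local Classified Database, Inference Structures, Ontology. Labeled outputs of each module: Violation Pointers, Probability Coefficients.]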
Replicated Data Inference
  Identifies replicated information under different security classifications
  Violation pointer = similar units of data at different security levels
  Inference is guided by inference structures built on the ontology concept hierarchy
  Andrei Stoica, Csilla Farkas. “Ontology guided XML Security Engine”, In Journal of Intelligent Information Systems, to appear.
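A minimal sketch of the replicated-data idea, assuming the simplest possible notion of "similar units of data": leaf elements with the same tag and the same normalized text in a public and a classified XML file. The two sample documents, the tag names, and the functions `leaf_units` and `violation_pointers` are hypothetical. The sketch does not use the ontology-based inference structures or produce the probability coefficients that the slides describe.

```python
import xml.etree.ElementTree as ET

def normalize(text):
    """Lower-case and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())

def leaf_units(xml_text):
    """Map (tag, normalized text) of every non-empty leaf to the paths where it occurs."""
    units = {}

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        if len(elem) == 0 and (elem.text or "").strip():
            units.setdefault((elem.tag, normalize(elem.text)), []).append(here)
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return units

def violation_pointers(public_xml, classified_xml):
    """Pair up paths of leaves that appear in both files (exact match only)."""
    public, classified = leaf_units(public_xml), leaf_units(classified_xml)
    return [
        (pub_path, cls_path)
        for key in sorted(public.keys() & classified.keys())
        for pub_path in public[key]
        for cls_path in classified[key]
    ]

# Hypothetical sample files with placeholder values.
PUBLIC = "<brochure><system><name>Alpha</name><freq>F1</freq></system></brochure>"
CLASSIFIED = "<specs><radar><freq> f1 </freq><range>R9</range></radar></specs>"

pointers = violation_pointers(PUBLIC, CLASSIFIED)
print(pointers)  # [('/brochure/system/freq', '/specs/radar/freq')]
```

Each returned pair points at one unit of data that is available at both security levels, which is the situation a violation pointer is meant to flag.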
Replicated Data Inference
  [Figure: an inference tree and ontology relating a public data file to a classified data file. Node labels: A to E, M1 to M7, N0 to N7. Concept labels: Patriot Freq., PAC-2 Freq., PAC-3 Freq., Missiles, Tracking Systems, Scientific data on radar components. Caption: Confidence Level (M7, N5) = ƒ(,,,), with the four arguments of ƒ missing from the slide text.]
Correlated Data Inference
  Identifies sensitive data in the public domain (relative to a given classified database, usually the local database)
  Inference guidance:
    ontology concept hierarchy
    structural similarity of public data
  Csilla Farkas, Andrei Stoica. “Correlated Data Inference, Ontology Guided XML Security Engine”, In Proc. of IFIP 2003.
Correlated Data Inference
  Features of similarity:
    levels of abstraction for each node
    distance of associated nodes from the association root:
      similarity of the distances
      length of the distance
    similarity of sub-trees originating from correlated nodes
Correlated Data Inference
  Association similarity:
    distance of each node from the association root
    difference between the nodes' distances from the association root
    similarity of the sub-trees originating at the nodes
  [Figure: example trees with the nodes Air show, address, fort]
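The first two association-similarity features can be sketched as follows, assuming the association root is the lowest common ancestor of the two associated nodes in the XML tree. That reading, the sample fragment, and the function `association_features` are assumptions for illustration; sub-tree similarity is not covered.

```python
import xml.etree.ElementTree as ET

def path_to_root(node, parent):
    """Return [node, its parent, ..., document root]."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def association_features(root, tag_a, tag_b):
    """Return (association root tag, distance of a, distance of b, |difference|)
    for the first elements tagged tag_a and tag_b below `root`."""
    # ElementTree has no parent links, so build a child -> parent map first.
    parent = {child: p for p in root.iter() for child in p}
    a, b = root.find(f".//{tag_a}"), root.find(f".//{tag_b}")
    if a is None or b is None:
        raise ValueError("both tags must occur below the root")
    path_a, path_b = path_to_root(a, parent), path_to_root(b, parent)
    ancestors_b = set(path_b)
    assoc_root = next(n for n in path_a if n in ancestors_b)
    dist_a, dist_b = path_a.index(assoc_root), path_b.index(assoc_root)
    return assoc_root.tag, dist_a, dist_b, abs(dist_a - dist_b)

# Hypothetical fragment: an air show entry that mentions an address and a fort.
SAMPLE = """
<airShow>
  <location><address>A1</address></location>
  <fort>F1</fort>
</airShow>
"""

features = association_features(ET.fromstring(SAMPLE), "address", "fort")
print(features)  # ('airShow', 2, 1, 1)
```

Small distances and a small difference between them would suggest a tight association. How the features are weighted into a single score is not specified on the slides.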
Correlated Data Inference
  Ontology:
    Object[].
    waterSource :: Object
    basin :: waterSource
    place :: Object
    district :: place
    address :: place
    base :: Object
    fort :: base
  [Figure: nodes fort, address, basin, and district, with a "?" on the association between base and water source]
Correlated Data Inference
  Ontology:
    Object[].
    waterSource :: Object
    basin :: waterSource
    place :: Object
    district :: place
    address :: place
    base :: Object
    fort :: base
  [Figure: the associations address/fort and district/basin are marked Public; through the concepts place, base, and water source they are related to the association water source/base, which is marked Confidential]
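One way to read this example is sketched below, using the concept hierarchy exactly as listed on the slide. The matching rule is an assumption: two public associations are linked when their first members share a concept more specific than `Object`, and the linked second members are then generalized and checked against the confidential associations. The function names are hypothetical, and no confidence value is computed.

```python
from itertools import combinations

# Concept hierarchy from the slide: child :: parent.
PARENT = {
    "waterSource": "Object", "basin": "waterSource",
    "place": "Object", "district": "place", "address": "place",
    "base": "Object", "fort": "base",
}
ROOT = "Object"

def chain(concept):
    """Return the concept followed by its ancestors, most specific first."""
    out = [concept]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out

def common_concept(a, b):
    """Most specific concept shared by a and b, or None if they only share the root."""
    shared = set(chain(b))
    for concept in chain(a):
        if concept in shared:
            return None if concept == ROOT else concept
    return None

def correlated_inferences(public, confidential):
    """Return (linking concept, concept 1, concept 2) for every confidential
    association that a pair of public associations generalizes to."""
    hits = []
    for (x1, y1), (x2, y2) in combinations(public, 2):
        link = common_concept(x1, x2)
        if link is None:
            continue
        for g1 in chain(y1):
            for g2 in chain(y2):
                if ROOT in (g1, g2):
                    continue  # generalizing all the way to Object says nothing
                if frozenset((g1, g2)) in confidential:
                    hits.append((link, g1, g2))
    return hits

public = [("address", "fort"), ("district", "basin")]
confidential = {frozenset(("base", "waterSource"))}
print(correlated_inferences(public, confidential))
# [('place', 'base', 'waterSource')]
```

Here `address` and `district` are both places, so the public fort and basin entries can be correlated, and their generalizations `base` and `waterSource` match the confidential association.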
Summary
  Secure XML Views provide semantically consistent query replies and cover stories.
  The Oxsegin architecture and methods detect undesired inferences using:
    structural similarity
    semantic concept hierarchy
    confidence in derived inferences
Next Class
  Stream data