QUANTIFYING INFORMATION LOSS AFTER REDACTING DATA PROVENANCE
TEAM: AVINI SOGANI, VAISHNAVI SUNKU, VENUGOPAL BOPPA
INTERNET OF THINGS
SEMANTIC WEB AND PROVENANCE
Meaning behind anything you say
The Semantic Web is a platform that supports secure sharing of heterogeneous data on the web.
The provenance of data can be traced back to the origin of the data, or simply to its immediate source.
Provenance supports assessment of authenticity, enables trust, and gives assurance of data quality, which in turn allows the resource to be reproduced.
REDACTION
Imposing restrictions on data access by users
Types – DAC (Discretionary Access Control), MAC (Mandatory Access Control), RBAC (Role-Based Access Control)
Process of removing or hiding sensitive data
Protects sensitive information from unauthorized users
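The idea of hiding sensitive data from unauthorized users can be sketched as a small role-based filter. This is only an illustration under stated assumptions: the roles, field names, and the `VISIBLE_FIELDS` policy table are hypothetical and do not come from the slides.

```python
# Minimal sketch of role-based redaction (all names are hypothetical).
REDACTED = "[REDACTED]"

# Assumed policy: the fields each role is allowed to see.
VISIBLE_FIELDS = {
    "physician": {"patient_name", "diagnosis", "medication"},
    "billing": {"patient_name", "medication"},
    "researcher": {"diagnosis", "medication"},
}


def redact(record, role):
    """Return a copy of record with every field the role may not see masked."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {
        field: (value if field in allowed else REDACTED)
        for field, value in record.items()
    }


record = {"patient_name": "Alice", "diagnosis": "flu", "medication": "med_1"}
print(redact(record, "researcher"))
```

Masking a value with a placeholder keeps the shape of the record visible, whereas dropping the field would hide even its existence. Which of the two is appropriate depends on the policy.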
RELATED WORK
PRIVACY CONTROL ACTS
HIPAA – Health Insurance Portability and Accountability Act
Regulates EMR/EPR
PHI – Protected Health Information
PII – Personally Identifiable Information
HITECH Act – Health Information Technology for Economic and Clinical Health
Minimum necessary for the stated purpose
W3C RECOMMENDATIONS
A.C. model applications: File systems, Database, Provenance?
Data Models: RDF (Triples: subject, predicate, object), OPM
Querying: OPQL (From(e), to, from^-1(n), to^-1, prev(n), next), SPARQL (Regular expressions)
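To make the triple data model and path-style querying concrete, here is a minimal sketch in plain Python. It does not use an RDF library or real SPARQL/OPQL syntax. The graph, the `ex:` node names, and the `lineage` helper are hypothetical; the edge labels follow OPM's `used`, `wasGeneratedBy`, and `wasControlledBy`. The traversal loosely mirrors what a regular-expression path such as `(wasGeneratedBy|used)+` would match.

```python
from collections import deque

# Hypothetical provenance graph as (subject, predicate, object) triples.
triples = [
    ("ex:report_2", "wasGeneratedBy", "ex:surgery_1"),
    ("ex:surgery_1", "used", "ex:report_1"),
    ("ex:surgery_1", "wasControlledBy", "ex:surgeon_1"),
    ("ex:report_1", "wasGeneratedBy", "ex:checkup_1"),
    ("ex:checkup_1", "wasControlledBy", "ex:physician_1"),
]


def lineage(triples, start, predicates):
    """Collect every node reachable from start by following only the
    given predicates, one or more steps (breadth-first)."""
    reached = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, predicate, obj in triples:
            if subject == node and predicate in predicates and obj not in reached:
                reached.add(obj)
                queue.append(obj)
    return reached


# Everything ex:report_2 was derived from, ignoring who controlled each step.
print(sorted(lineage(triples, "ex:report_2", {"wasGeneratedBy", "used"})))
```

Restricting the predicate set is what keeps the agents (`ex:surgeon_1`, `ex:physician_1`) out of the result. Adding `wasControlledBy` to the set would bring them in.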
REDACTION POLICIES Medical Scenario
REDACTION ON DATA PROVENANCE Why med: Doc1_2?
REDACTION BY GRAPH GRAMMAR AND R.E.
ARCHITECTURE
LIMITATIONS
No quantification of the information lost through the process of redaction
Redacted information may still be available from other sources (the internet, knowledge of the context, ...)
OUR PROPOSALS
INFORMATION LOSS
Relevance of the data to the user
Vectorial model formula for calculating the relevance
Terms: True relevant data, Retrieved data, Relevant data
F-Measure (precision and recall)
NMI (Normalized Mutual Information)
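The measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation. It assumes relevance is scored as cosine similarity over raw term counts (one common form of the vectorial model) and that information loss is read as the drop in F-measure when the same query runs against the redacted data. The document IDs and result sets are made up.

```python
import math
from collections import Counter


def cosine_relevance(query, document):
    """Vector-space relevance: cosine of the angle between term-count vectors."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[term] * d[term] for term in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def precision_recall_f(retrieved, relevant):
    """Precision, recall, and their harmonic mean (F-measure)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical result sets for one query.
relevant = {"d1", "d2", "d3", "d4"}           # truly relevant data
retrieved_original = {"d1", "d2", "d3", "d5"}  # retrieved before redaction
retrieved_redacted = {"d1", "d5"}              # retrieved after redaction

_, _, f_before = precision_recall_f(retrieved_original, relevant)
_, _, f_after = precision_recall_f(retrieved_redacted, relevant)
information_loss = f_before - f_after  # assumed loss measure: drop in F
print(f_before, f_after, information_loss)
```

A loss of zero would mean redaction did not change what the user can usefully retrieve for this query. NMI is omitted here to keep the sketch short.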
INFORMATION LOSS
CONCLUSION
REFERENCES:
Murali Mani, Mohamad Alawa, Arunlal Kalyanasundaram. Query Language Constructs for Provenance.
Tyrone Cadenhead, Vaibhav Khadilkar, Murat Kantarcioglu, and Bhavani Thuraisingham. Transforming provenance using redaction. In Proceedings of the 16th ACM symposium on Access control models and technologies (SACMAT '11). ACM, New York, NY, USA.
Tyrone Cadenhead, Vaibhav Khadilkar, Murat Kantarcioglu, and Bhavani Thuraisingham. A Language for Provenance Access Control.
David F. Nettleton and Daniel Abril. An Information Retrieval Approach to Document Sanitization. Advanced Research in Data Privacy. Springer International Publishing.
R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edn. ACM Press Books, England (2011).
THANK YOU..