Enforcing Policies on Social Media Data Extracted from the Web Nicoletta Fornara and Truc-Vien T. Nguyen Università della Svizzera italiana Lugano, Switzerland.


Summary
Web/Internet data collection is becoming increasingly important in many social science fields.
Being able to formalize and enforce policies that regulate the collection and use of those data is crucial, especially given the privacy and confidentiality wishes of the people who provided the data.
Even if data publishers do not enforce all such policies, complying with them is essential to ethical Internet research.
We present the SemPolicy Manager Tool, which enforces a given set of policies by taking into account the meaning of the collected data.

Web/Internet data collection technologies
Internet data collection by means of:
– web service interfaces: software designed to support machine-to-machine interaction over a network, or
– system-specific APIs (Application Programming Interfaces): an interface for accessing the data of a particular data provider
Web data collection by means of web crawlers: software that systematically browses the World Wide Web, building a local repository of the portion of the Web it visits, very often for the purpose of Web indexing
Examples used in the paper:
– Facebook: RestFB, a Facebook Graph API client written in Java
– Twitter: the REST API, an interface for programmatic access to read and write Twitter data

Types of Policies
Ethical guidelines proposed by various associations for social research (e.g. the American Association for Public Opinion Research, at point I.A.5)
Legal constraints on the processing of personal data (e.g. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data)
– This directive states the necessity of anonymization at point (26), defines the notions of personal data and of processing of personal data in Article 2, and constrains the processing of personal data in Article 8
Web site policies/terms on how the data available on a web site can be used for automatic data collection (e.g. the Facebook Automated Data Collection Terms and robots.txt files)
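As an illustration of the third kind of policy, a crawler can consult a site's robots.txt rules before fetching a page. The sketch below uses Python's standard `urllib.robotparser` module; the robots.txt content, the user agent name `research-crawler`, and the example.org URLs are invented for illustration and are not taken from the paper or from any real site.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for this example.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("https://example.org/private/profile",
                "https://example.org/public/page"):
        print(url, allowed_by_robots(EXAMPLE_ROBOTS_TXT, "research-crawler", url))
```

A robots.txt check of this kind covers only crawler access rules; site terms such as automated data collection terms are written in natural language and still have to be read and formalized separately.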

The SemPolicy Manager Tool
Innovative technologies used for realizing the tool:
1. Semantic Web technologies for expressing the meaning of the data
2. Declarative formalization and enforcement of norms for expressing policies
3. Natural Language Processing techniques for enriching the collected data with new semantic information contained in unstructured text

Architecture of the SemPolicy Manager Tool

Using the SemPolicy Manager Tool (1)
We evaluated the tool on a specific use case: the collection of social network data from Facebook and Twitter, and the enforcement on those data of certain provisions of EU Directive 95/46/EC that state the necessity of anonymizing personal data and data revealing confidential information about people (point 26, Articles 2 and 8).
The enforced policies are:
Policy 1. It is obligatory to anonymize all personal data relating to an identified or identifiable natural person in order to store, retrieve, and use them. These data include: username, user ID, first name, last name, full name, web site.
Policy 2. It is obligatory to anonymize or remove a text if it reveals racial or ethnic origin, political opinions, or religious or philosophical beliefs.
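Policy 1 can be pictured as a field-level transformation of a collected user record. The following is a minimal sketch, not the tool's implementation: the record layout, the field names in `PERSONAL_FIELDS`, and the salted SHA-256 scheme are all assumptions made for illustration. Note that salted hashing is strictly a form of pseudonymization; whether it satisfies a given legal definition of anonymization depends on how the salt is protected.

```python
import hashlib

# Hypothetical field names mirroring the properties listed in Policy 1.
PERSONAL_FIELDS = {"username", "user_id", "first_name",
                   "last_name", "full_name", "website"}


def pseudonym(value: str, salt: str) -> str:
    """Map a value to a stable, non-reversible token using a secret salt."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "anon_" + digest[:12]


def anonymize_record(record: dict, salt: str) -> dict:
    """Return a copy of record with every personal field replaced by a token."""
    return {
        key: pseudonym(str(value), salt) if key in PERSONAL_FIELDS else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    # Invented example record; not real data.
    user = {"username": "jdoe", "full_name": "Jane Doe", "friends_count": 42}
    print(anonymize_record(user, salt="keep-this-secret"))
```

Because the same input and salt always yield the same token, links between records of the same user survive the transformation, which matters when the data are later analysed as a network.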

Policies 1 and 2 -> 3 Obligations
From Policies 1 and 2 we formalized the following three obligations, each with an activation condition and an action to be performed:
Policy 1-Obligation 1: it is activated when the SN Ontology contains user personal data that is not popular. The obliged action consists in retrieving all of the user's personal information and anonymizing it.
Policy 1-Obligation 2: it is activated when the SN Ontology contains a message (the content of a post, comment, or tweet) that includes personal information. The obliged action consists in anonymizing all personal information that appears in the content of posts, comments, or tweets.
Policy 2-Obligation 1: it is activated when the semantically enriched collected data contain a statement (post, comment, or tweet) whose content is related to a sensitive topic. The obliged action consists in removing sensitive topics from the content of posts, comments, or tweets.
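The pairing of an activation condition with an obliged action can be sketched as a small data structure. The code below is an illustrative Python analogue only: in the tool the conditions are evaluated against ontologies, whereas here each collected item is a plain dictionary, and the keys `kind`, `popular`, `sensitive`, `full_name`, and `text` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Item = dict  # hypothetical flat representation of one collected data item


@dataclass(frozen=True)
class Obligation:
    name: str
    is_active: Callable[[Item], bool]  # activation condition
    action: Callable[[Item], Item]     # obliged action


def enforce(item: Item, obligations: list[Obligation]) -> Item:
    """Apply the action of every obligation whose condition holds for item."""
    for obligation in obligations:
        if obligation.is_active(item):
            item = obligation.action(item)
    return item


# Two of the three obligations, reduced to toy conditions and actions.
OBLIGATIONS = [
    Obligation(
        name="Policy 1-Obligation 1",
        is_active=lambda i: i.get("kind") == "user" and not i.get("popular", False),
        action=lambda i: {**i, "full_name": "[anonymized]"},
    ),
    Obligation(
        name="Policy 2-Obligation 1",
        is_active=lambda i: i.get("kind") == "message" and i.get("sensitive", False),
        action=lambda i: {**i, "text": "[removed]"},
    ),
]

if __name__ == "__main__":
    print(enforce({"kind": "user", "popular": False, "full_name": "Jane Doe"},
                  OBLIGATIONS))
```

Keeping the condition separate from the action means a condition can be swapped without touching the code that performs the anonymization or removal.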

Using the SemPolicy Manager Tool (2)
The Semantic Analysis Component needs to identify in the collected data (posts, comments, and tweets):
1. personal data: first name, last name, full name (of people), web sites (popular names do not need to be anonymized)
2. sensitive data: data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs.
The Enforcement Service is in charge of checking whether the policies stored in the Policy Ontology are active (this depends on the semantic content of the collected data) and of enforcing the active policies.
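To make the second task concrete, the sketch below flags messages that touch a sensitive category. It is a deliberately crude keyword-matching stand-in, not the Natural Language Processing techniques the tool uses; the category labels follow the slide, while the keyword lists and the function name `sensitive_categories` are invented for illustration.

```python
import re

# Hypothetical keyword lists; a real component would use NLP, not word lookup.
SENSITIVE_KEYWORDS = {
    "political opinions": {"election", "vote", "party"},
    "religious or philosophical beliefs": {"church", "mosque", "atheist"},
}


def sensitive_categories(text: str) -> list[str]:
    """Return the sorted sensitive categories whose keywords occur in text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(
        category
        for category, keywords in SENSITIVE_KEYWORDS.items()
        if words & keywords
    )


if __name__ == "__main__":
    print(sensitive_categories("I will vote for the green party"))
    print(sensitive_categories("Nice weather today"))
```

A non-empty result would correspond to the activation condition of Policy 2-Obligation 1 being met for that message.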

Evaluation of the SemPolicy Manager Tool
The response time for the enforcement of the three obligations* reaches a stable level at some point, which suggests that the application can be used in practice.
1. The first obligation takes more time than the others (50 minutes for 200 Facebook seed users, 12 minutes for 500 Twitter users) because Facebook/Twitter users have many private attributes, even more than the number of private data entries found within the messages.
2. The second obligation requires 5 minutes for 200 Facebook seed users and 12 minutes for 500 Twitter users.
3. The third obligation requires 0.20 minutes for 200 Facebook seed users and 0.28 minutes for 400 Twitter users.
* Using a PC with an Intel(R) Core(TM) 2 Quad CPU at 3.00 GHz and 4 GB RAM

Conclusions
Thanks to the use of Semantic Web technologies for representing the collected data and the policies, the activation conditions of the formalized policies can be changed without reprogramming the tool.
The tool can be used to enforce other policies, but it may be necessary to program the software that executes the obliged action and/or to extend the Semantic Analysis Component.
In future work we plan to study how to improve the user interface of the SemPolicy Manager Tool.

Thank you for your attention! Questions?