Big Data Security and Privacy

1 Big Data Security and Privacy
FUNG Kin Shing, Kenneth; ZHANG Yinglu; LI Danmin

2 What Is Big Data? Four Characteristics
• Volume – data sizes (from terabytes to zettabytes)
• Variety – data in different formats and structures
• Velocity (speed) – the time available to act on the data is very short
• Huge number of data sources – data sets from different sources must be integrated and cross-correlated

3 What Is Big Data For?
Technological advances and novel applications are making it possible to capture, process, and share huge amounts of data, referred to as big data. The aim is to extract useful knowledge, such as patterns, from these data and to predict trends and events. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science, and hold governments to account (The Economist, 2010).

4 CIA Triad
• Confidentiality: protection of data against unauthorized disclosure
• Integrity: prevention of unauthorized and improper data modification
• Availability: prevention of, and recovery from, hardware and software errors and malicious denials of data access that would make the database system unavailable

5 Data Confidentiality Challenges
Several data confidentiality techniques exist, the most notable being access control and encryption. The challenges include:
1. Merging large numbers of access control policies
– Multiple systems = multiple sets of data = multiple sets of access control policies
– Integrating systems = integrating databases = a merged access control policy that must be enforced
2. Automatically administering authorizations for big data, in particular granting permissions
– Authorizations could be granted automatically, possibly based on the user profile (e.g., using machine learning)
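As a rough illustration of challenge 1, the sketch below merges per-system policies using a deny-overrides rule, a combining strategy borrowed from XACML rather than one the slides prescribe. The policy shape (a map from role and resource to a decision), the role names, and the resources are all hypothetical; real big data stores would need far richer policies.

```python
from typing import Dict, List, Tuple

# Hypothetical policy shape: (subject role, resource) -> "permit" or "deny".
Policy = Dict[Tuple[str, str], str]


def merge_deny_overrides(policies: List[Policy]) -> Policy:
    """Merge per-system policies so that any explicit deny wins over a permit."""
    merged: Policy = {}
    for policy in policies:
        for key, decision in policy.items():
            if merged.get(key) == "deny":
                continue  # an earlier deny already overrides everything else
            merged[key] = decision
    return merged


def is_allowed(policy: Policy, role: str, resource: str) -> bool:
    # Default deny: a rule absent from every source policy is refused.
    return policy.get((role, resource)) == "permit"


hr_system = {("analyst", "salaries"): "deny", ("analyst", "headcount"): "permit"}
sales_system = {("analyst", "salaries"): "permit", ("analyst", "orders"): "permit"}
merged = merge_deny_overrides([hr_system, sales_system])
print(is_allowed(merged, "analyst", "salaries"))  # False
print(is_allowed(merged, "analyst", "orders"))    # True
```

Deny-overrides is a conservative choice: when integrated systems disagree, the merged policy never grants more than the strictest source allows.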

6 Data Confidentiality (cont.)
3. Enforcing access control policies in big data stores
– Current solutions for querying big data sets rely on scripts and jobs written in programming languages
– The challenge is how to embed fine-grained access control policies into these jobs and scripts, and how to extend such approaches to support more complex access control policies
– Encryption-based approaches for enforcing access control policies in big data stores also need to be investigated
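A minimal sketch of embedding a fine-grained policy into a job, assuming a hypothetical policy made of a row predicate plus a set of allowed columns per role: a wrapper filters rows and projects columns before the user's job ever sees the data. Names such as `POLICIES`, `enforce`, and `run_job` are illustrative only and do not come from any particular big data framework.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical fine-grained policy per role: a row predicate and visible columns.
POLICIES = {
    "regional_analyst": {
        "row_filter": lambda r: r["region"] == "EU",
        "columns": {"region", "amount"},
    },
}


def enforce(role: str, records: Iterable[Record]) -> List[Record]:
    """Apply the role's row filter and column projection before any job runs."""
    policy = POLICIES.get(role)
    if policy is None:
        return []  # default deny for roles with no policy
    allowed = policy["columns"]
    return [
        {k: v for k, v in r.items() if k in allowed}
        for r in records
        if policy["row_filter"](r)
    ]


def run_job(role: str, records: Iterable[Record],
            job: Callable[[List[Record]], object]) -> object:
    # The job only ever receives the policy-filtered view of the data.
    return job(enforce(role, records))


data = [
    {"region": "EU", "amount": 10, "customer": "alice"},
    {"region": "US", "amount": 20, "customer": "bob"},
    {"region": "EU", "amount": 5, "customer": "carol"},
]
total = run_job("regional_analyst", data, lambda rows: sum(r["amount"] for r in rows))
print(total)  # 15
```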

7 Data Trustworthiness
One major application of big data is decision making, so the data must be trustworthy: not only free of errors, but also protected from malicious parties aiming to deceive the data users. Currently there is no comprehensive solution to the problem of high-assurance data trustworthiness, although several relevant techniques have been proposed in different areas.

8 Data Trustworthiness (cont.)
1. User support for data use based on trustworthiness assessment
As data ultimately have to be used by human users, it is critical that users be given indicators of the trustworthiness level of the data they receive. For example, the trust score of a data item value provided by a given data source can be computed as a function of two factors:
trust score = f(reputation of the source, difference between the value and the values reported by other sources)
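One way to make the two-factor trust score concrete, under stated assumptions: the source's reputation is already scaled to [0, 1], agreement is measured against the median of the other sources' values, and the two factors are combined by a weighted average. The `weight` and `tolerance` parameters and the combination rule are hypothetical choices for illustration, not the authors' formula.

```python
from statistics import median
from typing import List


def trust_score(value: float, reputation: float, other_values: List[float],
                weight: float = 0.5, tolerance: float = 10.0) -> float:
    """Hypothetical trust score in [0, 1] for one reported value.

    reputation: the source's reputation, assumed already scaled to [0, 1].
    other_values: values reported by other sources for the same item.
    weight: assumed relative importance of reputation versus agreement.
    tolerance: assumed deviation at which agreement drops to zero.
    """
    if other_values:
        deviation = abs(value - median(other_values))
        agreement = max(0.0, 1.0 - deviation / tolerance)
    else:
        agreement = 0.5  # no corroborating sources: neutral agreement
    return weight * reputation + (1 - weight) * agreement


# A reputable source that agrees with its peers scores higher than the
# same source reporting an outlying value.
print(trust_score(21.0, reputation=0.9, other_values=[20.0, 22.0, 21.5]))
print(trust_score(80.0, reputation=0.9, other_values=[20.0, 22.0, 21.5]))
```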

9 Data Trustworthiness (cont.)
2. Data correlation techniques
– Interconnected big data often form large heterogeneous information networks with information redundancy
– Such redundancy is an important opportunity to cross-check conflicting data values and to correlate data
3. High-assurance and efficient provenance
– Data provenance is often a critical factor in assessing data trustworthiness
– Provenance information must be protected from tampering as it flows across the various parties in a system
– Provenance techniques must be extended for use in dynamic mobile environments
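To illustrate tamper protection for provenance, the sketch below chains each provenance step to the previous one with an HMAC, so editing, reordering, or removing an intermediate step is detectable. It assumes a single shared key (`KEY`, hypothetical) for brevity; a deployment spanning mutually distrustful parties would more plausibly use per-party digital signatures. The party names and actions are invented.

```python
import hashlib
import hmac
import json
from typing import Dict, List

KEY = b"demo-shared-key"  # hypothetical key shared by all parties (illustration only)


def _tag(party: str, action: str, prev_tag: str) -> str:
    body = json.dumps({"party": party, "action": action, "prev": prev_tag}, sort_keys=True)
    return hmac.new(KEY, body.encode(), hashlib.sha256).hexdigest()


def append_step(chain: List[Dict], party: str, action: str) -> None:
    """Append a provenance step whose tag also covers the previous step's tag."""
    prev_tag = chain[-1]["tag"] if chain else ""
    chain.append({"party": party, "action": action, "prev": prev_tag,
                  "tag": _tag(party, action, prev_tag)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every tag. Editing, reordering, or removing an intermediate
    step breaks the chain (truncating the tail is not detected by this sketch)."""
    prev_tag = ""
    for step in chain:
        if step["prev"] != prev_tag:
            return False
        expected = _tag(step["party"], step["action"], prev_tag)
        if not hmac.compare_digest(expected, step["tag"]):
            return False
        prev_tag = step["tag"]
    return True


chain: List[Dict] = []
append_step(chain, "sensor-17", "collected reading")
append_step(chain, "gateway", "aggregated hourly mean")
append_step(chain, "analytics", "flagged anomaly")
print(verify(chain))  # True
chain[1]["action"] = "deleted outliers"
print(verify(chain))  # False
```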

10 Data Trustworthiness (cont.)
4. Source correlation techniques
Relationships among data sources should also be taken into account. Suppose we observe that the same data value is provided by three different sources. In general this may lead one to conclude that the data value is trustworthy. However, if these three sources have a very strong relationship, it may not be realistic to assume that the value is provided by three independent sources.
[Figure: sources labelled Wiki, Journal A, Journal C, and Kenneth, with the question "Three independent sources?"]
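A sketch of source-correlation-aware counting: instead of one vote per source, each value receives one vote per cluster of strongly related sources. The cluster assignment is a hypothetical input; here it assumes Wiki, Journal A, and Journal C all trace back to one author, in the spirit of the Wiki/Journal A/Journal C/Kenneth example on the slide. `Journal B` and `Archive` are made-up independent sources.

```python
from collections import Counter, defaultdict
from typing import Dict


def independent_support(claims: Dict[str, str], cluster_of: Dict[str, str]) -> Counter:
    """Count support for each value with one vote per cluster of related sources.

    claims: source -> reported value.
    cluster_of: source -> id of its group of strongly related sources
    (hypothetical input; a source missing from the map is its own cluster).
    """
    clusters_per_value = defaultdict(set)
    for source, value in claims.items():
        clusters_per_value[value].add(cluster_of.get(source, source))
    return Counter({value: len(c) for value, c in clusters_per_value.items()})


claims = {"Wiki": "X", "Journal A": "X", "Journal C": "X",
          "Journal B": "Y", "Archive": "Y"}
# Assumption for illustration: these three sources all derive from one author.
cluster_of = {"Wiki": "kenneth", "Journal A": "kenneth", "Journal C": "kenneth"}

print(Counter(claims.values()))                  # naive count: X has 3, Y has 2
print(independent_support(claims, cluster_of))   # X has 1, Y has 2
```

With naive counting "X" looks better corroborated; once the shared origin is accounted for, "Y" has more independent support.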

11 Privacy Risks
• Exchange and integration of data across multiple sources
– Data becomes available to multiple parties
– Re-identification of anonymized data becomes easier
• Security tasks such as authentication and access control may require detailed information about users
– For example, location-based access control requires information about user location and may lead to collecting data about user mobility
– Continuous authentication requires collecting information such as typing speed, browsing habits, and mouse movements
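To make the re-identification risk concrete, the sketch below links a hypothetical "anonymized" release (names removed) with a hypothetical named data set through shared quasi-identifiers (ZIP code, birth year, sex). Both data sets and all values are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
medical = [
    {"zip": "10001", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10001", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10002", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
]
# Hypothetical second source with names and the same quasi-identifiers.
voters = [
    {"name": "Ann", "zip": "10001", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "10001", "birth_year": 1975, "sex": "M"},
]

QUASI = ("zip", "birth_year", "sex")


def qi_key(row: Dict) -> Tuple:
    return tuple(row[q] for q in QUASI)


def group(rows: List[Dict]) -> Dict[Tuple, List[Dict]]:
    groups: Dict[Tuple, List[Dict]] = {}
    for row in rows:
        groups.setdefault(qi_key(row), []).append(row)
    return groups


def link(anon: List[Dict], named: List[Dict]) -> Dict[str, str]:
    """Re-identify records whose quasi-identifier combination is unique in both sets."""
    anon_groups, named_groups = group(anon), group(named)
    result = {}
    for k, people in named_groups.items():
        matches = anon_groups.get(k, [])
        if len(people) == 1 and len(matches) == 1:
            result[people[0]["name"]] = matches[0]["diagnosis"]
    return result


print(link(medical, voters))  # {'Ann': 'asthma', 'Bob': 'diabetes'}
```

Neither data set alone reveals who has which diagnosis; integrating them does, which is the risk the slide describes.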

12 Privacy Risks (cont.)
• Social networking sites open their users' real-time data to varying degrees, and this data is collected not only by data providers but also by monitoring and data analysis agencies.

13 Privacy Enhancing Techniques
Privacy-preserving data matching protocols based on hybrid strategies (M. Kantarcioglu et al.)
Open issues:
– Scalability
– Support for complex matching, such as semantic matching
– Definition of new security models
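For intuition only, the sketch below shows a much simpler technique than the hybrid protocols cited: two data holders who share a secret key exchange keyed hashes (HMAC-SHA256) of normalized identifiers and intersect them. The key and the records are hypothetical. The approach supports only exact matching after normalization, which illustrates why support for complex and semantic matching is listed as an open issue.

```python
import hashlib
import hmac
from typing import Dict, Iterable

SHARED_KEY = b"agreed-out-of-band"  # hypothetical key known only to the two data holders


def normalize(identifier: str) -> str:
    # Lowercase and collapse whitespace so trivially different spellings still match.
    return " ".join(identifier.lower().split())


def tokenize(identifiers: Iterable[str]) -> Dict[str, str]:
    """Map keyed-hash tokens back to the local identifiers that produced them."""
    return {
        hmac.new(SHARED_KEY, normalize(i).encode(), hashlib.sha256).hexdigest(): i
        for i in identifiers
    }


hospital = tokenize(["Alice Smith", "Bob  Jones", "Carol White"])
insurer = tokenize(["alice smith", "Dave Brown", "bob jones"])

# Each side sends only its token set; the intersection identifies matched records.
common = hospital.keys() & insurer.keys()
print(sorted(hospital[t] for t in common))  # ['Alice Smith', 'Bob  Jones']
```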

14 Privacy Enhancing Techniques (cont.)
Privacy-preserving collaborative data mining (earlier work by C. Clifton, M. Kantarcioglu et al.)
Open issues:
– Scalability
– Data mining on the cloud
Privacy-preserving biometric authentication (E. Bertino et al.)
– Reducing false rejection rates
– Using homomorphic techniques
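A classic building block for privacy-preserving collaborative data mining is secure sum. The sketch simulates the ring-based version in a single process, assuming semi-honest, non-colluding parties and a hypothetical modulus larger than any possible total; it is an illustration of the general technique, not code taken from the cited works.

```python
import secrets
from typing import List

MODULUS = 2**64  # assumed bound; the true total must be smaller than this


def secure_sum(local_values: List[int]) -> int:
    """Simulate the ring-based secure-sum protocol in one process.

    Party 0 masks its value with a random R; every other party adds its own
    value to the masked running total it receives, so no party sees another
    party's individual value. Party 0 finally removes R.
    """
    mask = secrets.randbelow(MODULUS)
    running = (mask + local_values[0]) % MODULUS  # what party 0 sends to party 1
    for value in local_values[1:]:
        running = (running + value) % MODULUS     # each hop looks uniformly random
    return (running - mask) % MODULUS             # party 0 unmasks the total


sales = [120, 340, 95]  # each entry held privately by a different organization
print(secure_sum(sales))  # 555
```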

15 Privacy Enhancing Techniques (cont.)
Privacy-preserving data management on the cloud
– CryptDB (MIT CSAIL)
– DBMask (E. Bertino et al.)
Issues:
– Weak security (CryptDB)
– Weak protection (or lack of protection) of access patterns
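The "weak security" issue can be illustrated with deterministic encryption, which CryptDB uses to support equality queries. In the sketch, HMAC stands in for deterministic encryption (a hypothetical simplification: it yields matching tokens for equal inputs but cannot be decrypted), and the key and column values are invented. Equality matching works, but the server also learns how often each value occurs.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"server-never-sees-this"  # hypothetical client-side key


def det_token(value: str) -> str:
    # HMAC stands in for deterministic encryption: same input -> same output.
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:12]


# The client stores only tokens of a sensitive column on the untrusted server.
diagnoses = ["flu", "flu", "hiv", "flu", "asthma", "flu"]
stored = [det_token(d) for d in diagnoses]

# Equality queries work: the client sends det_token("flu") and the server matches.
query = det_token("flu")
print(sum(1 for t in stored if t == query))  # 4

# ...but the server also learns the value histogram without holding any key.
print(sorted(Counter(stored).values(), reverse=True))  # [4, 1, 1]
```

An attacker who knows the rough distribution of diagnoses can often map the most frequent token back to its plaintext, which is one reason deterministic schemes are considered weak.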

16 Privacy-Enhancement Challenges
1. Efficiency
Challenge:
– Current techniques are unable to scale to large data sets
Solutions:
– Develop efficient cryptographic building blocks
More work needs to be done on:
– engineering protocols and systems
– parallel processing techniques for cryptographic protocols
– metrics to assess efficiency, data privacy, and utility in the use of the different building blocks
– support for mission-oriented tradeoffs among efficiency, data privacy, and data utility

17 Privacy-Enhancement Challenges (cont.)
2. Security with privacy
Challenge:
– Can security and privacy be reconciled? If we want to achieve security, we must give up privacy; if we are keen on assuring privacy, we may undermine security.
Solutions:
– Recent advances in applied cryptography are making it possible to work on encrypted data, for example to perform analytics on encrypted data.
More work needs to be done:
– Data privacy techniques depend heavily on the specific use of the data and on the security tasks at hand.
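As a small example of computing on encrypted data, the sketch uses the Paillier cryptosystem, one well-known additively homomorphic scheme; the slide does not name a specific scheme. The primes are toy values, far too small to be secure, and the salary figures are hypothetical.

```python
import math
import secrets

# Toy parameters: far too small to be secure, chosen only so the example runs fast.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid since G = N + 1 gives L(G^LAM mod N^2) = LAM mod N


def encrypt(m: int) -> int:
    """Paillier encryption: c = g^m * r^n mod n^2 with random r coprime to n."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % N2


# An untrusted server sums encrypted salaries without seeing any of them.
salaries = [4200, 3900, 5100]
total_ct = encrypt(0)
for s in salaries:
    total_ct = add_encrypted(total_ct, encrypt(s))
print(decrypt(total_ct))  # 13200
```

Only the key holder can decrypt the total; the party doing the aggregation handles ciphertexts exclusively. Requires Python 3.8+ for `pow(x, -1, n)`.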

18 Privacy-Enhancement Challenges (cont.)
3. Data ownership
Challenge:
– Who is the owner of a data item? The owner can be defined as the user whose information is recorded in the data item, or as the party that created the data item by collecting information from the user.
Solutions:
– Replace the concept of data owner with the concept of stakeholder.
More work needs to be done:
– Technological, organizational, and legal solutions need to be investigated to manage conflicts.

19 Privacy-Enhancement Challenges (cont.)
4. Privacy-aware data lifecycle framework
A comprehensive approach to privacy for big data needs to be based on a systematic data lifecycle approach.
• Data acquisition
– Mechanisms and tools are needed to prevent devices from acquiring data about other individuals, which is relevant when devices such as mobile phones are used.
• Data sharing
– Users need to be informed when their data are shared with or transferred to other parties.
– It is thus critical to devise legal guidelines on which technical mechanisms can be based.

20 Conclusion
• We focused on research challenges specific to confidentiality, trustworthiness, and privacy in big data.
• We suggested some possible solutions to these challenges.
• Multidisciplinary research is still required, drawing from many different areas, including computer science and engineering, information systems, statistics, risk models, economics, social sciences, political sciences, human factors, and psychology.
• We believe that all these perspectives are needed to achieve effective solutions to the problem of privacy and security in the era of big data, and especially to the problem of reconciling security and privacy.

21 Thank you!

