A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud.

Slides:



Advertisements
Similar presentations
Abstract Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding.
Advertisements

CloudMoV: Cloud-based Mobile Social TV
On the Node Clone Detection inWireless Sensor Networks.
Optimizing Cloud Resources for Delivering IPTV Services Through Virtualization.
Toward a Statistical Framework for Source Anonymity in Sensor Networks.
Abstract Cloud data center management is a key problem due to the numerous and heterogeneous strategies that can be applied, ranging from the VM placement.
Abstract Load balancing in the cloud computing environment has an important impact on the performance. Good load balancing makes cloud computing more.
A Secure Protocol for Spontaneous Wireless Ad Hoc Networks Creation.
Personalized QoS-Aware Web Service Recommendation and Visualization.
Abstract Provable data possession (PDP) is a probabilistic proof technique for cloud service providers (CSPs) to prove the clients' data integrity without.
Secure Encounter-based Mobile Social Networks: Requirements, Designs, and Tradeoffs.
Cross-Domain Privacy-Preserving Cooperative Firewall Optimization.
A Survey of Mobile Cloud Computing Application Models
NICE :Network Intrusion Detection and Countermeasure Selection in Virtual Network Systems.
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Environment.
A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data.
Privacy-Preserving Public Auditing for Secure Cloud Storage
BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform.
Improving Network I/O Virtualization for Cloud Computing.
Privacy Preserving Data Sharing With Anonymous ID Assignment
Mobile Relay Configuration in Data-Intensive Wireless Sensor Networks.
m-Privacy for Collaborative Data Publishing
PACK: Prediction-Based Cloud Bandwidth and Cost Reduction System
EAACK—A Secure Intrusion-Detection System for MANETs
A Fast Clustering-Based Feature Subset Selection Algorithm for High- Dimensional Data.
Optimal Client-Server Assignment for Internet Distributed Systems.
Protecting Sensitive Labels in Social Network Data Anonymization.
Identity-Based Secure Distributed Data Storage Schemes.
Incentive Compatible Privacy-Preserving Data Analysis.
Enabling Dynamic Data and Indirect Mutual Trust for Cloud Computing Storage Systems.
Hiding in the Mobile Crowd: Location Privacy through Collaboration.
LARS*: An Efficient and Scalable Location-Aware Recommender System.
Cooperative Caching for Efficient Data Access in Disruption Tolerant Networks.
Anonymization of Centralized and Distributed Social Networks by Sequential Clustering.
Accuracy-Constrained Privacy-Preserving Access Control Mechanism for Relational Data.
Identity-Based Distributed Provable Data Possession in Multi-Cloud Storage.
Content Sharing over Smartphone-Based Delay- Tolerant Networks.
A System for Denial-of- Service Attack Detection Based on Multivariate Correlation Analysis.
Modeling the Pairwise Key Predistribution Scheme in the Presence of Unreliable Links.
Privacy Preserving Delegated Access Control in Public Clouds.
Scalable Distributed Service Integrity Attestation for Software-as-a-Service Clouds.
Anomaly Detection via Online Over-Sampling Principal Component Analysis.
A Method for Mining Infrequent Causal Associations and Its Application in Finding Adverse Drug Reaction Signal Pairs.
A Highly Scalable Key Pre- Distribution Scheme for Wireless Sensor Networks.
Abstract With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial.
Facilitating Document Annotation using Content and Querying Value.
Traffic Pattern-Based Content Leakage Detection for Trusted Content Delivery Networks.
Privacy Preserving Back- Propagation Neural Network Learning Made Practical with Cloud Computing.
Two tales of privacy in online social networks. Abstract Privacy is one of the friction points that emerges when communications get mediated in Online.
Participatory Privacy: Enabling Privacy in Participatory Sensing
Preventing Private Information Inference Attacks on Social Networks.
Supporting Privacy Protection in Personalized Web Search.
Twitsper: Tweeting Privately. Abstract Although online social networks provide some form of privacy controls to protect a user's shared content from other.
m-Privacy for Collaborative Data Publishing
Attribute-Based Encryption With Verifiable Outsourced Decryption.
Multiparty Access Control for Online Social Networks : Model and Mechanisms.
A New Algorithm for Inferring User Search Goals with Feedback Sessions.
Data Mining with Big Data. Abstract Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development.
Harnessing the Cloud for Securely Outsourcing Large- Scale Systems of Linear Equations.
Securing Broker-Less Publish/Subscribe Systems Using Identity-Based Encryption.
Security Analysis of a Privacy-Preserving Decentralized Key-Policy Attribute-Based Encryption Scheme.
Privacy-Enhanced Web Service Composition. Abstract Data as a Service (DaaS) builds on service-oriented technologies to enable fast access to data resources.
Privacy-Preserving and Content-Protecting Location Based Queries.
Mona: Secure Multi-Owner Data Sharing for Dynamic Groups in the Cloud.
Whole Test Suite Generation. Abstract Not all bugs lead to program crashes, and not always is there a formal specification to check the correctness of.
Distributed Processing of Probabilistic Top-k Queries in Wireless Sensor Networks.
Load Rebalancing for Distributed File Systems in Clouds.
Facilitating Document Annotation Using Content and Querying Value.
Fast Transmission to Remote Cooperative Groups: A New Key Management Paradigm.
Dynamic Query Forms for Database Queries. Abstract Modern scientific databases and web databases maintain large and heterogeneous data. These real-world.
Presentation transcript:

A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud

Abstract: A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k- anonymity is a widely used category of privacy preserving techniques. At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, thereby making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficiency of scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.

Abstract con… As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficiency of scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.

Existing System Cloud computing, a disruptive trend at present, poses a significant impact on current IT industry and research communities [1], [2], [3]. Cloud computing provides massive computation power and storage capacity via utilizing a large number of commodity computers together, enabling users to deploy applications cost-effectively with¬out heavy infrastructure investment. Cloud users can reduce huge upfront investment of IT infrastructure, and concentrate on their own core business. However, numer¬ous potential customers are still hesitant to take advantage of cloud due to privacy and security concerns [4], [5]. The research on cloud privacy and security has come to the picture [6], [7], [8], [9].

Architecture Diagram:

System Specifications HARDWARE REQUIREMENTS Processor : intel Pentium IV Ram : 512 MB Hard Disk : 80 GB HDD SOFTWARE REQUIREMENTS Operating System : windows XP / Windows 7 FrontEnd : Java BackEnd : MySQL 5

THANK YOU