Privacy Protection with Genetic Algorithms. Presenter: 林惠珍. Using genetic algorithms for privacy protection.

Presentation transcript:

1 Privacy Protection with Genetic Algorithms. Presenter: 林惠珍. Using genetic algorithms for privacy protection.

2 Outline: Introduction; Basics of Micro-Aggregation and Methods; Privacy Protection Through Genetic-Algorithm-Based Micro-aggregation; A Hybrid Approach; Experimental Results; Conclusions and Future Challenges.

4 Privacy!! Privacy vs. data utility. Diagram labels: data collection, statistics, data aggregation, releasing, respondent, safe.

5 Contribution: Micro-aggregation distorts the data in order to guarantee respondents' privacy. Optimal micro-aggregation is NP-hard, so the authors use a genetic algorithm (GA) with some modifications to solve the problem. They also propose a hybrid method for the same problem.

7 SDC (Statistical Disclosure Control), also called Statistical Disclosure Limitation (SDL): Data are transformed before being made public, balancing data utility against the statistical confidentiality of respondents. The aim is enough protection with minimal information loss. Method: micro-aggregation of micro-data (personal data). This is a clustering problem in which cluster size matters!

8 Two goals for micro-aggregation: preserving data utility, and protecting the privacy of the respondents.

9 Preserving data utility: Introduce as little noise as possible into the data. So, we should aggregate similar elements rather than dissimilar ones.

10 Protecting the privacy of the respondents: Data have to be sufficiently modified to make re-identification difficult. Increasing the number of aggregated elements can increase data privacy.

11 Whether two elements are similar: measured by a similarity function, e.g. Euclidean distance. Notation (the symbols themselves are missing from the transcript). Univariate data set Duni: the number of elements in Duni; the i-th element in Duni; the average element. Multivariate data set Dmulti: the number of dimensions of each element; the j-th component of the average element; the j-th component of the i-th element in Dmulti. Multiple subsets: the number of subsets; the number of elements in the i-th subset; the j-th element in the i-th subset; the average element of the i-th subset.

12 Micro-aggregation problem (k-micro-aggregation problem). Definition: given a data set D (n elements) and a security parameter k, which determines the minimum cardinality of the subsets, obtain a k-partition of D whose homogeneity is maximized, i.e. whose SSE is as small as possible. A k-partition of D is a partition whose parts each have at least k elements of D. (The slide's example, "ex: k= ... Average element = ...", has lost its values.) The problem is NP-hard for multivariate data sets, so use heuristic methods!!
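
For reference, the homogeneity measure named on this slide is usually written as the within-group sum of squared errors. The slide's own formula did not survive, so the symbols below are assumptions: s groups, n_i records in the i-th group, x_ij the j-th record of the i-th group, and x̄_i the average record of that group.

```latex
\mathrm{SSE} \;=\; \sum_{i=1}^{s} \sum_{j=1}^{n_i}
  (x_{ij} - \bar{x}_i)^{\top} (x_{ij} - \bar{x}_i),
\qquad \text{subject to } n_i \ge k \text{ for every group } i .
```

A lower SSE means more homogeneous groups, which is why the k-micro-aggregation problem is stated as minimizing SSE over all k-partitions.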

13 Multivariate Micro-Aggregation Methods: Minimum Spanning Tree Partitioning (MSTP); Maximum Distance Method (MD); Maximum Distance to Average Vector Method (MDAV); Variable-MDAV (V-MDAV).

14 Minimum Spanning Tree Partitioning (MSTP). Steps: 1. MST construction; 2. Edge cutting; 3. Cluster generation. Limitation: because it is founded on the MST, it fails to adapt properly to scattered data points.

15 Maximum Distance Method (MD): Its main advantage is its simplicity, and it achieves very good results on most data sets. Find the two most distant records r and s (by Euclidean distance). Form a group with r and its closest k-1 elements, and another with s and its closest k-1 elements. Then check the number of remaining elements (num): 1. num >= 2k: repeat; 2. k <= num <= 2k-1: form a new group; 3. num <= k-1: assign each element to the closest group. Micro-aggregated data: replace each record by the centroid of the group to which it belongs. Shortcoming: computational complexity.

16 Maximum Distance to Average Vector Method (MDAV): MDAV improves on MD in terms of computational complexity while maintaining its performance in terms of SSE. MDAV is the most popular method used for micro-aggregating data sets.

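17 MDAV Algorithm: Build two groups at each iteration. When the number of remaining records (RR) meets the final condition (garbled in the transcript as "RR =k"), form a new group.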

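18 MDAV Process: Compute the centroid c. Using the distance matrix, find the record r most distant from c; then, using the distance matrix again, find the record s most distant from r. Groups are built around r and s. Micro-aggregated data: replace each record by the centroid of the group to which it belongs. Shortcoming: lack of flexibility, since it only generates subsets of fixed cardinality k.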
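
The MDAV process can be sketched in a few lines of Python. This is a minimal illustration of MDAV as it is commonly described in the micro-aggregation literature, not the presenter's or the authors' code; the names `centroid`, `mdav` and `microaggregate`, and the choice of tuples for records, are assumptions made for the example. It assumes the data set has at least k records.

```python
import math

def centroid(points):
    """Component-wise average of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def mdav(data, k):
    """Return a list of groups (lists of record indices), each of size >= k."""
    remaining = list(range(len(data)))
    groups = []

    def farthest_from(ref):
        return max(remaining, key=lambda i: math.dist(data[i], ref))

    def extract(seed):
        # Group the seed with its k-1 nearest remaining records.
        others = sorted((i for i in remaining if i != seed),
                        key=lambda i: math.dist(data[i], data[seed]))
        group = [seed] + others[:k - 1]
        for i in group:
            remaining.remove(i)
        groups.append(group)

    # Main loop: build two groups per iteration while enough records remain.
    while len(remaining) >= 3 * k:
        c = centroid([data[i] for i in remaining])
        r = farthest_from(c)            # most distant record from the centroid
        extract(r)
        s = farthest_from(data[r])      # most distant remaining record from r
        extract(s)

    # Tail: one more group around the farthest record, then the leftovers.
    if len(remaining) >= 2 * k:
        c = centroid([data[i] for i in remaining])
        extract(farthest_from(c))
    if remaining:
        groups.append(list(remaining))
    return groups

def microaggregate(data, groups):
    """Replace each record by the centroid of the group it belongs to."""
    out = list(data)
    for group in groups:
        c = centroid([data[i] for i in group])
        for i in group:
            out[i] = c
    return out

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mdav(points, 3))
```

All groups except possibly the last have exactly k records, which is the fixed-cardinality limitation the slide mentions.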

19 Variable-MDAV (V-MDAV): V-MDAV aims to overcome the fixed-cardinality limitation of MDAV by computing a variable-size k-partition at a computational cost similar to that of MDAV.

20 V-MDAV Process: Compute the distance matrix and the centroid c, and find the record e most distant from c to start a group. Check the number of remaining records (RR): 1. RR >= k: form groups; 2. RR <= k-1: assign each element to the closest group. Extend the group (by up to k-1 records): let e_min be the unassigned record closest to the current group, at distance d_in, and let d_out be the distance from e_min to its closest unassigned record. If (d_in < γ*d_out), assign e_min to the current group. MDAV is the most popular method, so the authors use it as the reference for comparison.

22 Coding sequence; initializing the population; the fitness function; selection scheme and genetic operators (crossover & mutation).

23 Coding Sequence: binary codings; N-ary codings, e.g. BADFEACCBF; real-valued codings. (The binary and real-valued examples are missing from the transcript.)

24 Univariate vs. Multivariate. Univariate micro-aggregation: binary codings. The slide shows a data set, the sorted data set, and a possible binary coding (the values are missing from the transcript). But there is no way of sorting multivariate records without giving a higher priority to one of the attributes.

25 Univariate vs. Multivariate (cont.). Multivariate micro-aggregation: N-ary codings. Maximum number of groups: G = ⌊n/k⌋. Each symbol represents one group of the k-partition. Chromosome length: the number of records in the data set. The i-th gene value → the group of the k-partition to which the i-th record in the data set belongs.

26 Example: n = 11, k = 3, G = ⌊11/3⌋ = 3. 3-character alphabet: A, B, C. Chromosome length: 11. Chromosome: ABCAABBCCAA. 3-partition (reading the i-th gene as the group of the i-th record): group A = {1,4,5,10,11}, group B = {2,6,7}, group C = {3,8,9}.
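
Decoding the example chromosome strictly by the rule "the i-th gene names the group of the i-th record" can be sketched as follows. The helper names `decode` and `is_k_partition` are illustrative only and do not come from the slides.

```python
def decode(chromosome):
    """Map each group symbol to the 1-based indices of its records."""
    groups = {}
    for record, symbol in enumerate(chromosome, start=1):
        groups.setdefault(symbol, []).append(record)
    return groups

def is_k_partition(groups, k):
    """A k-partition requires every group to hold at least k records."""
    return all(len(members) >= k for members in groups.values())

groups = decode("ABCAABBCCAA")
print(groups)
# {'A': [1, 4, 5, 10, 11], 'B': [2, 6, 7], 'C': [3, 8, 9]}
print(is_k_partition(groups, 3))  # True
```

The chromosome length equals the number of records (11), and the alphabet size equals the maximum number of groups (3).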

27 Initializing the Population: The population is generally initialized at random. With n records and G different alphabet symbols there are G^n possible chromosomes, but only a small fraction of them meets the cardinality constraints: “In an optimal k-partition, each group has between k and 2k-1 records.” (Domingo & Mateo). The upper bound of 2k-1 records per group also implies a minimum number of groups.

28 Initializing the Population (cont.): Random initialization is not suitable for obtaining candidate optimal k-partitions. So, the cardinality constraints must be embedded in the initialization procedure (→ Algorithm 2), which guarantees that each group (part) has at most 2k-1 elements.
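
The slides do not reproduce Algorithm 2, so the following is only one possible way to embed the cardinality constraints in the initialization, under the assumption that every group should hold between k and 2k-1 records and that n >= k. The function name `init_chromosome` and the use of integer group labels (0, 1, 2, ...) instead of letters are choices made for this sketch.

```python
import random

def init_chromosome(n, k, rng=random):
    """Random chromosome of length n whose groups all have k..2k-1 records.

    Hypothetical stand-in for the slides' Algorithm 2, which is not shown.
    """
    g_min = -(-n // (2 * k - 1))      # ceil(n / (2k-1)): fewest groups allowed
    g_max = n // k                    # floor(n / k): most groups allowed
    g = rng.randint(g_min, g_max)

    # Start every group at the minimum size k, then hand out the leftovers
    # one at a time to groups that are still below the 2k-1 cap.
    sizes = [k] * g
    for _ in range(n - g * k):
        open_groups = [i for i, s in enumerate(sizes) if s < 2 * k - 1]
        sizes[rng.choice(open_groups)] += 1

    # Assign shuffled records to groups; gene i holds the group of record i.
    records = list(range(n))
    rng.shuffle(records)
    chromosome = [None] * n
    pos = 0
    for label, size in enumerate(sizes):
        for rec in records[pos:pos + size]:
            chromosome[rec] = label
        pos += size
    return chromosome

print(init_chromosome(11, 3, random.Random(1)))
```

For n = 11 and k = 3 this always yields three groups, two of size 4 and one of size 3 or one of size 5 and two of size 3, so every chromosome in the initial population is a valid candidate.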

29 The Fitness Function: Use SSE to measure the homogeneity of the groups in the k-partition represented by a given chromosome. The goal is to minimize SSE. Thus, the fitness value of a chromosome is defined from its SSE (the formula is missing from the transcript), where s is the total number of groups and n_i is the number of records in the i-th group. Chromosomes that encode a non-optimal k-partition are penalized.
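
Since the fitness formula itself is missing, the sketch below uses a stand-in: fitness = 1 / (1 + SSE), multiplied by a penalty factor for every group whose size falls outside k..2k-1. Both the formula and the `penalty` parameter are assumptions for illustration and should not be read as the authors' definition; only the ideas of "lower SSE is better" and "penalize invalid partitions" come from the slide.

```python
def sse(data, groups):
    """Within-group sum of squared errors for groups of record indices."""
    total = 0.0
    for group in groups:
        pts = [data[i] for i in group]
        dim = len(pts[0])
        c = [sum(p[j] for p in pts) / len(pts) for j in range(dim)]
        total += sum((p[j] - c[j]) ** 2 for p in pts for j in range(dim))
    return total

def fitness(data, groups, k, penalty=0.5):
    """Assumed fitness: higher is better, so it decreases as SSE grows."""
    value = 1.0 / (1.0 + sse(data, groups))
    for group in groups:
        if not (k <= len(group) <= 2 * k - 1):
            value *= penalty          # assumed penalty for an invalid group size
    return value

data = [(0.0,), (2.0,), (10.0,), (12.0,)]
print(fitness(data, [[0, 1], [2, 3]], k=2))    # SSE = 4, so 1 / 5 = 0.2
print(fitness(data, [[0], [1, 2, 3]], k=2))    # larger SSE and a penalty
```

Turning the minimization of SSE into a quantity to maximize is what lets fitness-proportional selection favor the more homogeneous partitions.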

30 Selection Scheme and Genetic Operators. Selection scheme: roulette-wheel selection. Genetic operators: one-point crossover and mutation.
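
These three operators are standard GA building blocks, and a generic textbook version is sketched below. The function names and the per-gene mutation rule (replace a gene by a random alphabet symbol with probability `rate`) are assumptions; the slides do not say how the authors implemented them. Chromosomes are assumed to be lists of at least two symbols, with non-negative fitness values that are not all zero.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.random() * sum(fitnesses)
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return chromosome
    return population[-1]             # guard against floating-point round-off

def one_point_crossover(a, b, rng=random):
    """Swap the tails of two parents at one random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, alphabet, rate, rng=random):
    """Replace each gene by a random alphabet symbol with probability rate."""
    return [rng.choice(alphabet) if rng.random() < rate else gene
            for gene in chromosome]

rng = random.Random(0)
parents = [list("ABCAABBCCAA"), list("AAABBBCCCAA")]
child1, child2 = one_point_crossover(parents[0], parents[1], rng)
print("".join(mutate(child1, "ABC", 0.1, rng)))
```

Note that crossover and mutation can produce groups that violate the cardinality constraints, which is one reason a penalty in the fitness function is useful.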

32 A Hybrid Approach. GA: good SSE, but low performance on very large data sets. MDAV: adapts to very large data sets, but worse than the GA in terms of SSE. Hybrid approach: 1. good SSE; 2. adapts to very large data sets. Name: two-step partitioning.

33 Two-step partitioning: k is a small value; K is larger than k, a multiple of k (K % k = 0), and small enough to be suitable for the GA. Ex: k = 3, K = 21. 1. Use MDAV to build a k-partition (here a 3-partition) and take the average vector of each group. 2. Use MDAV to build macro-groups (sets of average vectors) of size K/k (21/3 = 7). 3. Replace each average vector by its k original records, which gives a K-partition. 4. Finally, apply the GA to each macro-group in the K-partition in order to generate an optimal or near-optimal k-partition of the macro-group.

34 One-step MDAV vs. Two-step MDAV (comparison figure; one of the results is labeled "Better").

36 Experiment. Approaches: GA-based micro-aggregation and hybrid micro-aggregation, compared with MDAV and ES (exhaustive search). ES is only possible with tiny data sets of up to 11 elements. Data sets: 1. the example data set (Table 1); 2. small data sets; 3. real and large data sets. Each experiment consists of 12,100 runs of the GA. Mutation rate: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 (11 values). Crossover rate: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 (11 values). Population size: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 (10 values). The GA was run 10 times for each parameter setting, so 11 × 11 × 10 × 10 = 12,100.
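
The figure of 12,100 runs follows directly from the parameter grid, as this short sketch shows. The variable names are illustrative; only the parameter values and the 10 repetitions come from the slide.

```python
from itertools import product

mutation_rates = [i / 10 for i in range(11)]       # 0, 0.1, ..., 1
crossover_rates = [i / 10 for i in range(11)]      # 0, 0.1, ..., 1
population_sizes = list(range(10, 101, 10))        # 10, 20, ..., 100
repeats = 10                                       # runs per parameter setting

settings = list(product(mutation_rates, crossover_rates, population_sizes))
total_runs = len(settings) * repeats

print(len(settings))   # 1210 parameter settings
print(total_runs)      # 12100 GA runs per experiment
```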

37 Results for the Running Example: GA running time depends on the number of generations. Most of the tests converge in fewer than 5,000 iterations. Although MDAV is faster, the SSE obtained with the GA is better. (90% → 14.82)

38 Results in Small Data Sets: The mutation rate should be low, e.g. 0.1. The GA-based approach cannot deal with large data sets. (Figure annotation: "Same!!")

39 Results in Real and Large Data Sets: Use the hybrid technique. (Results table not recoverable from the transcript; one entry is labeled "Better".)

41 Conclusions and Future Challenges: The reported experimental results demonstrate the usefulness of the proposed methods and open the door to an invigorating research line. Lots of questions remain open: look for better codings; test the efficiency of other selection algorithms; evaluate the importance of genetic operators such as multiple-point crossover or inversion.