Data Mining and Its Applications to Image Processing. Advisor: Chang, Chin-Chen (張真誠). Graduate student: Lin, Chih-Yang (林智揚). Department of Computer Science and Information Engineering, National Chung Cheng University
The Fields of Data Mining: mining association rules; sequential mining; clustering (declustering); classification; …
Outline. Part I: Design and Analysis of Data Mining Algorithms. Part II: Data Mining Applications to Image Processing
Part I: Design and Analysis of Data Mining Algorithms. 1. Perfect Hashing Schemes for Mining Association Rules (or for Mining Traversal Patterns)
Mining Association Rules. Support: obtain the large itemsets. Confidence: generate the association rules.
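The two measures can be illustrated with a minimal sketch on the four-transaction database used in the Apriori example. The helper names `support` and `confidence` are illustrative assumptions, not names from the slides, and support is counted as an absolute number of transactions.

```python
# Illustrative helpers; the names are assumptions, not taken from the slides.
D = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}

def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= t)

def confidence(antecedent, consequent, db):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)

print(support("BE", D))         # transactions containing both B and E
print(confidence("B", "E", D))  # confidence of the rule B -> E
```

Support decides which itemsets are large; confidence is then evaluated only for rules built from those large itemsets.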
Apriori example (minimum support = 2)
Database D: TID 100: A C D; TID 200: B C E; TID 300: A B C E; TID 400: B E
Scan D, C1: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
L1: {A} 2, {B} 3, {C} 3, {E} 3
Scan D, C2: {A B} 1, {A C} 2, {A E} 1, {B C} 2, {B E} 3, {C E} 2
L2: {A C} 2, {B C} 2, {B E} 3, {C E} 2
Scan D, C3: {B C E} 2
L3: {B C E} 2
Notes: Finding the initial large itemsets is the most time-consuming part of the whole algorithm, and Apriori performs best in this initial stage, so it is used as the baseline for comparison with DHP. For C3, {B C} and {B E} share the same first item, so {C E} is tested to see whether it is also a large itemset; it is, so {B C E} becomes a candidate.
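The level-wise procedure in this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments; the function name `apriori` and the tuple representation of itemsets are choices made here.

```python
from itertools import combinations

def apriori(db, min_sup):
    """Return {k: {itemset tuple: support}} for the large itemsets of each size k."""
    transactions = [frozenset(t) for t in db]
    items = sorted({i for t in transactions for i in t})
    candidates = [(i,) for i in items]
    levels, k = {}, 1
    while candidates:
        # Scan D: count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
        large = {c: s for c, s in counts.items() if s >= min_sup}
        if not large:
            break
        levels[k] = large
        # Join step: merge two large k-itemsets that share their first k-1 items.
        keys = sorted(large)
        joined = {a + (b[-1],) for a in keys for b in keys
                  if a[:-1] == b[:-1] and a[-1] < b[-1]}
        # Prune step: every k-subset of a candidate must itself be large.
        candidates = sorted(c for c in joined
                            if all(s in large for s in combinations(c, k)))
        k += 1
    return levels

D = ["ACD", "BCE", "ABCE", "BE"]   # TIDs 100, 200, 300, 400
for size, itemsets in apriori(D, min_sup=2).items():
    print(size, itemsets)
```

Each pass rescans the whole database, which is the cost that DHP and the perfect hashing schemes try to reduce.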
Apriori (cont.). Disadvantages: it is inefficient; it produces many useless candidates.
DHP (Direct Hashing and Pruning): prunes useless candidates in advance; reduces the database size at each iteration.
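DHP example (min sup = 2)
Database D: TID 100: A C D; TID 200: B C E; TID 300: A B C E; TID 400: B E
C1 counts: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3. L1: {A}, {B}, {C}, {E}
Making a hash table. 2-itemsets per transaction: 100: {A C}; 200: {B C}, {B E}, {C E}; 300: {A B}, {A C}, {A E}, {B C}, {B E}, {C E}; 400: {B E}
Hash function: H({x y}) = ((order of x) * 10 + (order of y)) mod 7, with the orders A = 1, B = 2, C = 3, D = 4, E = 5
Hash table H2 (hash address: itemsets hashed to the bucket; number of items hashed to the bucket; bit vector entry):
0: {C E}; 2; 1
1: {A E}; 1; 0
2: {B C}; 2; 1
3: empty; 0; 0
4: {B E}; 3; 1
5: {A B}; 1; 0
6: {A C}; 2; 1
Notes: This slide introduces the hash method, including the hash function and the procedure. When the scan of database D for the supports of the 1-itemsets finishes, the hash table for 2-itemsets is finished at the same time: each transaction of D is split into 2-itemsets, each 2-itemset is passed through the hash function in item order and placed in the hash table, and the number of entries in each bucket is counted. Bucket counts that reach the minimum support (s = 2) give the bit vector, which is then used to filter L1*L1 and obtain a smaller C2.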
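The bucket counting and filtering on this slide can be sketched as below. It is a minimal illustration, assuming the item orders A = 1 through E = 5 and, as on the slide, dropping items outside L1 before pairs are formed; the function names are hypothetical.

```python
from itertools import combinations

ORDER = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}   # assumed item orders
N_BUCKETS = 7

def h(x, y):
    """Hash function from the slide: ((order of x) * 10 + (order of y)) mod 7."""
    return (ORDER[x] * 10 + ORDER[y]) % N_BUCKETS

def dhp_c2(db, l1, min_sup):
    """Count 2-itemsets per bucket, derive the bit vector, and filter L1*L1."""
    buckets = [0] * N_BUCKETS
    for t in db:
        kept = sorted(i for i in t if i in l1)      # drop items outside L1
        for x, y in combinations(kept, 2):
            buckets[h(x, y)] += 1
    bits = [int(count >= min_sup) for count in buckets]
    c2 = [(x, y) for x, y in combinations(sorted(l1), 2) if bits[h(x, y)]]
    return buckets, bits, c2

D = ["ACD", "BCE", "ABCE", "BE"]
buckets, bits, c2 = dhp_c2(D, l1={"A", "B", "C", "E"}, min_sup=2)
print(buckets, bits, c2)
```

In this toy case no two different 2-itemsets share a bucket, so the filter is exact; with collisions a bucket count only gives an upper bound on support, which is the motivation for a perfect hashing scheme.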
Perfect Hashing Schemes (PHS) for Mining Association Rules
Motivation: Apriori and DHP generate C_i from L_(i-1), which may be the bottleneck; collisions occur in DHP; designing a perfect hashing function for every transaction database is a thorny problem.
Definition. A join operation joins two different (k-1)-itemsets, $p_1 p_2 \ldots p_{k-1}$ and $q_1 q_2 \ldots q_{k-1}$, to produce a k-itemset, where $p_2 = q_1,\ p_3 = q_2,\ \ldots,\ p_{k-2} = q_{k-3},\ p_{k-1} = q_{k-2}$. Example: ABC and BCD join to ABCD. Among the 3-itemsets of ABCD (ABC, ABD, ACD, BCD), only one pair satisfies the join definition.
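A minimal sketch of the join test, assuming itemsets are written as strings of items in a fixed order; the function name `join` and the `None` return value for non-joinable pairs are choices made here for illustration.

```python
from itertools import permutations

def join(p, q):
    """Join two different (k-1)-itemsets given as ordered strings.

    The pair qualifies when p2..p(k-1) equals q1..q(k-2); the result is the
    k-itemset p1..p(k-1) followed by the last item of q. Returns None otherwise.
    """
    if p != q and len(p) == len(q) and p[1:] == q[:-1]:
        return p + q[-1]
    return None

subsets = ["ABC", "ABD", "ACD", "BCD"]          # the 3-itemsets of ABCD
pairs = [(p, q) for p, q in permutations(subsets, 2) if join(p, q)]
print(join("ABC", "BCD"), pairs)
```

Because exactly one ordered pair of subsets produces each k-itemset, every candidate is generated once, without duplicate checks.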
Algorithm PHS (Perfect Hashing and Data Shrinking)
Example 1 (sup = 2)
Database: TID 100: ACD; TID 200: BCE; TID 300: BCDE; TID 400: BE
L1: {B} 3, {C} 3, {D} 2, {E} 3
Transactions rewritten as 2-itemsets: 100: (CD); 200: (BC)(BE)(CE); 300: (BC)(BD)(BE)(CD)(CE)(DE); 400: (BE)
Hash table over (BC)(BD)(BE)(CD)(CE)(DE), supports: (BC) 2, (BD) 1, (BE) 3, (CD) 2, (CE) 2, (DE) 1
Encoding (new item = original itemset): A = (BC), B = (BE), C = (CD), D = (CE)
Example 2 (sup = 2)
Encoded transactions: 100: Null; 200: (AD); 300: (AC)(AD); 400: Null
Hash table over (AB)(AC)(AD)(BC)(BD)(CD), supports: (AC) 1, (AD) 2
Encoding (new item = original itemset): A = (AD)
Decode: AD -> (BC)(CE) = BCE
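One encode-and-join pass over the data of Examples 1 and 2 can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function name `phs_pass` is hypothetical, large itemsets are relabeled with single letters in sorted order, and the direct-addressed hash table is replaced by a plain counter.

```python
from collections import Counter
from string import ascii_uppercase

def phs_pass(transactions, min_sup):
    """One pass: count k-itemsets, encode the large ones, join them per transaction.

    `transactions` maps TID -> list of k-itemset strings. Returns the encoding,
    the shrunken transactions of encoded pairs, and a pair -> (k+1)-itemset map.
    """
    counts = Counter(s for items in transactions.values() for s in set(items))
    large = sorted(s for s, c in counts.items() if c >= min_sup)
    code = {s: ascii_uppercase[i] for i, s in enumerate(large)}  # toy limit: 26 itemsets
    decode, next_tx = {}, {}
    for tid, items in transactions.items():
        kept = sorted(s for s in set(items) if s in code)
        pairs = []
        for p in kept:
            for q in kept:
                if p != q and p[1:] == q[:-1]:        # join definition
                    pair = code[p] + code[q]
                    decode[pair] = p + q[-1]
                    pairs.append(pair)
        next_tx[tid] = pairs
    return code, next_tx, decode

tx = {100: ["CD"], 200: ["BC", "BE", "CE"],
      300: ["BC", "BD", "BE", "CD", "CE", "DE"], 400: ["BE"]}
code, next_tx, decode = phs_pass(tx, min_sup=2)
support = Counter(p for pairs in next_tx.values() for p in pairs)
large3 = {decode[p]: c for p, c in support.items() if c >= 2}
print(code, next_tx, large3)
```

A transaction that contains both (BC) and (CE) necessarily contains B, C, and E, so counting the encoded pair AD counts the support of BCE exactly.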
Problem on Hash Table. Consider a database that contains p transactions, each composed of unique items and of equal length N, with a minimum support of 1. At iteration k, the number of candidate k-itemsets is $p\binom{N}{k}$. The number of buckets required in the next pass is $\binom{m}{2}$, where $m = p\binom{N}{k}$, while the actual number of next candidates is $p\binom{N}{k+1}$. Loading density: $p\binom{N}{k+1} / \binom{m}{2}$.
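The size of the problem can be checked numerically with a small sketch. It assumes that every transaction contributes its own C(N, k) itemsets (items are unique and the minimum support is 1), that the next hash table reserves one bucket per pair of large k-itemsets, as in Example 2 where 4 encoded items need 6 buckets, and that only C(N, k+1) itemsets per transaction actually occur. The function name and these formulas are assumptions made for illustration.

```python
from math import comb

def loading_density(p, n, k):
    """Used buckets divided by reserved buckets for the pass after iteration k."""
    m = p * comb(n, k)                    # large k-itemsets at iteration k
    buckets = comb(m, 2)                  # assumed: one bucket per pair of them
    next_candidates = p * comb(n, k + 1)  # (k+1)-itemsets that really occur
    return next_candidates / buckets

for p in (1, 10, 100):
    print(p, loading_density(p, n=5, k=2))
```

Under these assumptions the density falls roughly as 1/p, because the reserved table grows quadratically in m while the used buckets grow only linearly.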
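How to Improve the Loading Density: two-level perfect hash scheme (partial hash). [Figure: the single-level table over itemsets (AB), (AC), (AD), (BC), (BD), (CD) with supports 1 and 2, next to a two-level hash table with entries A, B, C, D, Null pointers, and counts 1 and 2]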
Experiments
Part II: Data Mining Applications to Image Processing 1. A Prediction Scheme for Image Vector Quantization based on Mining Association Rules 2. Reversible Steganography for VQ-compressed Images Using Clustering and Relocation 3. A Reversible Steganographic Method Using SMVQ Approach based on Declustering
A Prediction Scheme for Image Vector Quantization Based on Mining Association Rules
Vector Quantization (VQ) Image encoding and decoding techniques
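As a reminder of what full-search VQ does, here is a minimal sketch: encoding maps each image block to the index of its nearest codeword, and decoding is a table lookup. The function names and the tiny 3-codeword codebook are illustrative assumptions.

```python
def vq_encode(blocks, codebook):
    """Map each block (a flat list of pixels) to the index of its nearest codeword."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))   # squared Euclidean distance
    return [min(range(len(codebook)), key=lambda i: dist2(block, codebook[i]))
            for block in blocks]

def vq_decode(indices, codebook):
    """Rebuild the image blocks by looking each index up in the codebook."""
    return [codebook[i] for i in indices]

codebook = [[0, 0, 0, 0], [100, 100, 100, 100], [200, 200, 200, 200]]
blocks = [[10, 5, 0, 3], [190, 210, 205, 199], [90, 110, 100, 95]]
indices = vq_encode(blocks, codebook)
print(indices, vq_decode(indices, codebook))
```

Only the indices are stored or transmitted, so the bit rate depends on the codebook size, and every encoded block costs one distance computation per codeword.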
SMVQ (cont.): codebook and state codebook
Framework of the Proposed Method [Figure: framework diagram; values are quantized as v/10]
Association Rules: horizontal, vertical, and diagonal. Condition: if X → y is a rule, there is no such rule X' → y', where X' … X and y' = y.
The Prediction Strategy
Example. A query to the rules DB returns the matched set of rules:
Matched vertical rules: (4, 2, 3, 3 → 5), conf_v = 90%; (4, 2, 3 → 1), conf_v = 85%
Matched horizontal rules: (12, 12, 1, 3 → 5), conf_h = 90%; (12, 12 → 1), conf_h = 95%
Matched diagonal rules: (6, 4, 2, 2, 3 → 5), conf_d = 100%; (6, 4, 2, 2 → 8), conf_d = 70%; (6, 4, 2 → 10), conf_d = 75%
The unknown index ? may be 5, 1, 8, or 10. How to decide?
Example (cont.). Each matched rule contributes the length of its antecedent multiplied by its confidence:
The weight of 5: 4*90% + 4*90% + 5*100% = 12.2
The weight of 1: 3*85% + 2*95% = 4.45
The weight of 8: 4*70% = 2.8
The weight of 10: 3*75% = 2.25
{5, 1} is called the consequence list, whose size is determined by the user.
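The weighting in this example can be reproduced with a short sketch. The function name `consequence_list` and the rule representation (antecedent tuple, consequent, confidence in percent) are assumptions made for illustration; the weights follow the arithmetic on the slide.

```python
from collections import defaultdict

def consequence_list(matched_rules, size):
    """Rank candidate indices by the sum of (antecedent length * confidence)."""
    weights = defaultdict(float)
    for antecedent, consequent, conf_percent in matched_rules:
        weights[consequent] += len(antecedent) * conf_percent / 100
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:size], dict(weights)

matched = [
    ((4, 2, 3, 3), 5, 90), ((4, 2, 3), 1, 85),              # vertical rules
    ((12, 12, 1, 3), 5, 90), ((12, 12), 1, 95),             # horizontal rules
    ((6, 4, 2, 2, 3), 5, 100), ((6, 4, 2, 2), 8, 70),       # diagonal rules
    ((6, 4, 2), 10, 75),
]
top, weights = consequence_list(matched, size=2)
print(top, weights)
```

Longer antecedents capture more spatial context, so they count more for the same confidence.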
Experiments [Figure: original image; image reconstructed by full-search VQ; image reconstructed by the proposed method]
Experiments (cont.): performance comparisons of various methods (images in order: Lena, Pepper, F16)
Full-search VQ: PSNR (dB) 32.25, 31.41, 31.58; bit-rate (bpp) 0.5
SMVQ: PSNR (dB) 28.57, 28.04, 27.94; bit-rate (bpp) 0.33, 0.32
Our scheme: PSNR (dB) 30.64, 30.05, 29.74; bit-rate (bpp) 0.34
Experiments (cont.): the overfitting problem
Advantages: association rule mining can be applied successfully to image prediction; a broader spatial correlation is considered than in SMVQ; the scheme is more efficient than SMVQ because no Euclidean distances need to be calculated.
Reversible Steganography for VQ-compressed Images Using Clustering and Relocation
Flowchart of the Proposed Method
Construction of the Hit Map [Figure: example indices 13 1 13 7 13 4 6 7 1 1 4 4 2 7 3 11 ..., the sorted codebook, and the hit map]
Clustering Codebook. Assume that the size of the codebook is 15: cw0, cw1, …, cw14. Clustering: C1: cw0, cw1, cw3, cw6, cw8, cw10; C2: cw4, cw14; C3: cw2, cw5, cw9; C4: cw12; C5: cw7, cw11, cw13
Relocation. Assume that the size of the state codebook is 4. Clusters after relocation, as listed: (cw0, cw1, cw3, cw6, cw8, cw10); (cw2, cw5, cw9); (cw4, cw14); (cw7, cw11, cw13); (cw12)
Embedding. Secret bits: 1011. Only the codewords in G0 can embed the secret bits. The codewords in G1 should be replaced with the codewords in G2. Original indices: cw14 cw12 cw1 cw2 cw6 cw3 cw10 cw8. After embedding: cw4 cw12 cw0 cw2 cw6 cw5 cw3 cw8 cw1
Extraction & Reversibility. Stego indices: cw4 cw12 cw0 cw2 cw6 cw5 cw3 cw8 cw1. Extracted secret bits, as shown: 1 1 1. Recovered indices: cw14 cw12 cw1 cw2 cw6 cw3 cw10 cw8
Experiments: 12 hit maps (600 bits), 250 clusters (images in order: Lena, Pepper, Sailboat, Baboon)
Modified Tian’s method: PSNR (dB) 26.92, 26.45, 25.05, 22.70; payload (bits) 2777, 3375, 3283, 2339
MFCVQ: PSNR (dB) 28.03, 26.43, 26.60, 24.04; payload (bits) 5892, 5712, 5176, 1798
Proposed method: PSNR (dB) 30.23, 29.15, 28.00; payload (bits) 8707, 8421, 7601, 3400
Experiments [Figure: results of Tian’s method and MFCVQ]
[Figure panels: single hit map; multiple hit maps without clustering; using clustering and multiple hit maps]
Experiments: using Lena as the cover image
A Reversible Steganographic Method Using SMVQ Approach based on Declustering
Find the Most Dissimilar Pairs (De-clustering). Dissimilar pairs: (CW1, CW8), (CW2, CW9), (CW3, CW10), (CW4, CW11), (CW5, CW12), (CW6, CW13), (CW7, CW14), …
Embedding Using Side-Match. Dissimilar pair: (CW1, CW8). Assume X = CW1.
V0 = ((U13 + L4)/2, U14, U15, U16, L8, L12, L16)
V1 = (X1, X2, X3, X4, X5, X9, X13) of CW1
V8 = (X1, X2, X3, X4, X5, X9, X13) of CW8
d1 = Euclidean_Distance(V0, V1)
d8 = Euclidean_Distance(V0, V8)
If (d1 < d8), then block X is replaceable; otherwise, block X is non-replaceable.
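The replaceability test can be sketched as follows, assuming 4×4 blocks stored as 16-element lists in row-major order, so that pixel 1 is element 0. Here U is the block above X and L is the block to its left; the function names are hypothetical.

```python
from math import dist

def side_vector(upper, left):
    """V0: border predicted from the upper block U and the left block L."""
    u = lambda i: upper[i - 1]        # 1-based pixel access, as on the slide
    l_ = lambda i: left[i - 1]
    return [(u(13) + l_(4)) / 2, u(14), u(15), u(16), l_(8), l_(12), l_(16)]

def border(codeword):
    """(X1, X2, X3, X4, X5, X9, X13): top row and left column of a codeword."""
    return [codeword[i - 1] for i in (1, 2, 3, 4, 5, 9, 13)]

def is_replaceable(upper, left, cw, cw_pair):
    """True when the block's own codeword side-matches better than its dissimilar partner."""
    v0 = side_vector(upper, left)
    return dist(v0, border(cw)) < dist(v0, border(cw_pair))

upper = [10] * 16
left = [10] * 16
cw1 = [10] * 16       # close to the neighbouring borders
cw8 = [200] * 16      # its dissimilar partner
print(is_replaceable(upper, left, cw1, cw8))
```

The decoder can repeat the same test using only already-decoded neighbours, which is what makes it possible to tell later which member of a pair was the original.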
Embedding example. Secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1. Codeword groups: (CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15) and (CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0).
Step 1: if (d6 < d13). Embedding result: 6
Step 2: if (d2 < d9). Embedding result: 6 9
Step 3: if (d12 >= d5). Embedding result: 6 9 15||12 (CW15: embed 1)
Step 4: if (d9 >= d2). Embedding result: 6 9 15||12 0||9 (CW0: embed 0)
Extraction and Recovery. Steganographic index table: 6 9 15||12 0||9. Codeword groups: (CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15) and (CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0).
Step 1: if (d6 < d13). Extracted secret bits, as shown: 1. Recovery: 6
Step 2: if (d9 >= d2). Extracted secret bits, as shown: 1. Recovery: 6 2
Step 3: Extracted secret bits, as shown: 1 1. Recovery: 6 2 12
Step 4: Extracted secret bits, as shown: 1 1. Recovery: 6 2 12 9
Find Dissimilar Pairs: PCA projection
Improve Embedding Capacity: partition into more groups
Experiments. Codebook size: 512. Codeword size: 16. Number of original image blocks: 128*128 = 16384. Number of non-replaceable blocks: 139
Experiments. Codebook size: 512. Codeword size: 16. Number of original image blocks: 128*128 = 16384. Number of non-replaceable blocks: 458
Experiments: embedding capacity (columns in order: Tian’s method; MFCVQ; Chang et al.’s method; proposed method with 3 groups; with 9 groups; with 17 groups)
Lena: 2,777; 5,892; 10,111; 16,129; 45,075; 55,186
Baboon: 2,339; 1,798; 4,588; 36,609; 39,014
Time comparison (image: Lena)
Tian’s method: 0.55 sec
MFCVQ: 1.36 sec
Chang et al.’s method, by size of the state codebook (4, 8, 16, 32): 14.59, 29.80, 58.8, 161.2 sec
Proposed method, by number of groups (3, 5, 9, 17): 0.11, 0.13, 0.14, 0.19 sec
Future Research Directions: extend the proposed reversible steganographic methods to other image formats; apply perfect hashing schemes to other applications.
Thanks all