Data Mining and Its Applications to Image Processing


1 Data Mining and Its Applications to Image Processing
(Data Mining and Its Applications to Image Processing) Advisor: Chang, Chin-Chen (張真誠); Graduate student: Lin, Chih-Yang (林智揚). Department of Computer Science and Information Engineering, National Chung Cheng University

2 The Fields of Data Mining
Mining Association Rules
Sequential Mining
Clustering (Declustering)
Classification
……

3 Outline
Part I: Design and Analysis of Data Mining Algorithms
Part II: Data Mining Applications to Image Processing

4 Part I: Design and Analysis of Data Mining Algorithms
1. Perfect Hashing Schemes for Mining Association Rules (or for Mining Traversal Patterns)

5 Mining Association Rules
Support: obtain the large itemsets
Confidence: generate the association rules
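As a small illustrative sketch (the function names are ours, not from the slides), the two measures can be computed over a transaction database as:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """conf(X -> Y) = sup(X union Y) / sup(X)."""
    return support(x | y, transactions) / support(x, transactions)
```

With the four-transaction database of the next slide, sup({B, E}) = 3/4 and conf({B} → {E}) = 1.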

6 Apriori Example (Minimum Support = 2)

Database D:
TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

Scan D → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3 → L1: {A}:2, {B}:3, {C}:3, {E}:3
C2 = L1 * L1: {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
Scan D → C2 counts: {A B}:1, {A C}:2, {A E}:1, {B C}:2, {B E}:3, {C E}:2 → L2: {A C}:2, {B C}:2, {B E}:3, {C E}:2
C3: because {B C} and {B E} share the same first item, test whether {C E} is also large; it is, so {B C E} becomes the candidate.
Scan D → C3 count: {B C E}:2 → L3: {B C E}:2

Note (translated): finding the initial large itemsets takes most of the whole algorithm's execution time, and Apriori performs best in this initial stage, so it is used as the baseline for the comparison with DHP.
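The level-wise loop traced above can be sketched as follows (a minimal illustration, with our own function names, not the paper's code):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise mining of large (frequent) itemsets, as in the example."""
    # C1: count single items, keep those meeting the minimum support
    items = sorted({i for t in transactions for i in t})
    counts = {frozenset([i]): sum(i in t for t in transactions) for i in items}
    large = {s for s, c in counts.items() if c >= min_sup}
    result = dict(counts)          # remember supports of everything counted
    k = 2
    while large:
        # Candidate generation: join L(k-1) with itself, prune by subset test
        cands = {a | b for a in large for b in large if len(a | b) == k}
        cands = {c for c in cands
                 if all(frozenset(s) in large for s in combinations(c, k - 1))}
        # One scan of the database counts every surviving candidate
        counts = {c: sum(c <= set(t) for t in transactions) for c in cands}
        large = {c for c, n in counts.items() if n >= min_sup}
        result.update(counts)
        k += 1
    return {s: n for s, n in result.items() if n >= min_sup}
```

On the slide's database with minimum support 2 this returns {B C E} with support 2, matching L3 above.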

7 Apriori Cont.
Disadvantages:
Inefficient
Produces many useless candidates

8 DHP Prune useless candidates in advance
Reduce database size at each iteration

9 DHP Example (Minimum Support = 2)

Database D:
TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

Scan D → C1 counts: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3 → L1: {A}, {B}, {C}, {E}

While scanning, make a hash table from the 2-itemsets of each transaction (large items only):
100: {A C}
200: {B C}, {B E}, {C E}
300: {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
400: {B E}

Hash function: H[{x y}] = ((order of x) * 10 + (order of y)) mod 7, with A = 1, …, E = 5
Hash addresses: {C E} → 0, {A E} → 1, {B C} → 2, {B E} → 4, {A B} → 5, {A C} → 6
Bucket counts (addresses 0–6): 2, 1, 2, 0, 3, 1, 2

Notes (translated): after the scan of D for 1-subset supports, the 2-item hash table is already complete. The 2-itemsets of each transaction are fed into the hash function in sorted order and dropped into the hash table, and the count of each bucket is taken. Buckets whose count reaches the minimum support (s = 2) give a bit vector, and filtering L1 * L1 with it yields a much smaller C2.
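The bucket-counting step can be sketched like this (illustrative names; the hash function is the slide's):

```python
from itertools import combinations

def dhp_buckets(transactions, order, min_sup, n_buckets=7):
    """DHP: while scanning for 1-item supports, hash every 2-itemset of each
    (large-item-only) transaction into a small table of bucket counts."""
    h = lambda x, y: (order[x] * 10 + order[y]) % n_buckets  # the slide's hash
    buckets = [0] * n_buckets
    for t in transactions:
        for x, y in combinations(sorted(t, key=order.get), 2):
            buckets[h(x, y)] += 1
    # Buckets reaching min_sup yield the bit vector that filters L1 * L1
    bit_vector = [int(c >= min_sup) for c in buckets]
    return buckets, bit_vector

# The example above, items already restricted to L1 = {A, B, C, E}:
order = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}
ts = [{'A', 'C'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
buckets, bv = dhp_buckets(ts, order, 2)
```

{A B} hashes to bucket 5, whose count of 1 falls below the support, so {A B} is pruned from C2 without rescanning the database.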

10 Perfect Hashing Schemes (PHS) for Mining Association Rules

11 Motivation
Apriori and DHP produce Ci from Li-1, which may be the bottleneck
Collisions occur in DHP
Designing a perfect hash function for an arbitrary transaction database is a thorny problem

12 Definition
Definition. A join operation joins two different (k-1)-itemsets P = p1p2…pk-1 and Q = q1q2…qk-1 into a k-itemset, where p2 = q1, p3 = q2, …, pk-2 = qk-3, pk-1 = qk-2.
Example: ABC and BCD can be joined. Among the 3-itemsets of ABCD (ABC, ABD, ACD, BCD), only this one pair satisfies the join definition.
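The join condition reduces to a tail/head comparison on sorted tuples, which a short sketch makes concrete (names are ours):

```python
def joinable(p, q):
    """p = p1...p(k-1), q = q1...q(k-1) satisfy the join condition when
    p2 = q1, p3 = q2, ..., p(k-1) = q(k-2)."""
    return p[1:] == q[:-1]

def join(p, q):
    """Join two (k-1)-itemsets (as tuples) into p1...p(k-1) q(k-1)."""
    return p + q[-1:] if joinable(p, q) else None
```

join(('A','B','C'), ('B','C','D')) yields ('A','B','C','D'); the other 3-subset pairs of ABCD fail the condition.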

13 Algorithm PHS (Perfect Hashing and Data Shrinking)

14 Example 1 (sup = 2)

Original database:
TID  Items
100  ACD
200  BCE
300  BCDE
400  BE

L1: {B}:3, {C}:3, {D}:2, {E}:3

Transactions rewritten as 2-itemsets over the large items:
100: (CD)
200: (BC)(BE)(CE)
300: (BC)(BD)(BE)(CD)(CE)(DE)
400: (BE)

2-itemset supports: (BC):2, (BD):1, (BE):3, (CD):2, (CE):2, (DE):1

Encoding of the frequent 2-itemsets: A ← (BC), B ← (BE), C ← (CD), D ← (CE)

15 Example 2 (sup = 2)

Joinable pairs in the encoded transactions:
100: Null
200: (AD)
300: (AC)(AD)
400: Null

Supports: (AC):1, (AD):2
Encoding: A ← (AD)
Decode: (AD) → (BC)(CE) = BCE
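One shrinking pass of the scheme traced in Examples 1 and 2 can be sketched as follows. This is an illustrative sketch under our own naming, not the paper's exact perfect-hashing implementation: it counts joinable pairs, encodes the frequent ones as fresh symbols A, B, C, …, and rewrites the database.

```python
from itertools import combinations
from collections import Counter

def joinable(p, q):
    # The join condition from the definition (for 1-itemsets it is vacuous)
    return p[1:] == q[:-1]

def phs_pass(transactions, decode, min_sup):
    """One PHS data-shrinking pass.  `decode` maps every current symbol to
    the original itemset (a tuple) that it stands for."""
    def joined_pairs(t):
        for a, b in combinations(sorted(t), 2):
            if joinable(decode[a], decode[b]):
                yield (a, b)
            elif joinable(decode[b], decode[a]):
                yield (b, a)
    counts = Counter(pq for t in transactions for pq in joined_pairs(t))
    frequent = sorted(pq for pq, c in counts.items() if c >= min_sup)
    code = {pq: chr(ord('A') + i) for i, pq in enumerate(frequent)}
    # A new symbol decodes to the join of the two itemsets it replaces
    new_decode = {code[(x, y)]: decode[x] + decode[y][-1:] for x, y in frequent}
    new_ts = [{code[pq] for pq in joined_pairs(t) if pq in code}
              for t in transactions]
    return new_ts, new_decode
```

Running it twice on Example 1's database reproduces the encodings above, and the single surviving symbol decodes to BCE with support 2.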

16 Problem on the Hash Table
Consider a database containing p transactions, each composed of unique items and of equal length N, with a minimum support of 1. At iteration k, every k-itemset of every transaction is a candidate, so the number of buckets a perfect hash table must reserve for the next pass is determined by all pairwise joins of these candidates, while the actual number of next-pass candidates is far smaller. The loading density (actual candidates over reserved buckets) is therefore very low.

17 How to Improve the Loading Density
Two-level perfect hash scheme (partial hash)
[Figure: the itemsets (AB), (AC), (AD), (BC), (BD), (CD) of Example 2 hashed through a first-level table keyed on items A, B, C, D; buckets whose head item never occurs stay Null, so no space is reserved for them]

18 Experiments

19 Experiments

20 Experiments

21 Part II: Data Mining Applications to Image Processing
1. A Prediction Scheme for Image Vector Quantization based on Mining Association Rules
2. Reversible Steganography for VQ-compressed Images Using Clustering and Relocation
3. A Reversible Steganographic Method Using SMVQ Approach based on Declustering

22 A Prediction Scheme for Image Vector Quantization Based on Mining Association Rules

23 Vector Quantization (VQ)
Image encoding and decoding techniques
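A minimal full-search VQ encode/decode sketch (NumPy, illustrative names; blocks and codewords are flattened to row vectors):

```python
import numpy as np

def vq_encode(blocks, codebook):
    """Full-search VQ: map each block (row vector) to the index of the
    nearest codeword under squared Euclidean distance."""
    dists = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def vq_decode(indices, codebook):
    """Decoding is a table lookup: replace each index by its codeword."""
    return codebook[indices]
```

Only the index table needs to be stored or transmitted, which is where the compression comes from.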

24 SMVQ (Side-Match VQ) cont.
Codebook and State Codebook

25 Framework of the Proposed Method
v/10 (Quantized)

26 Condition
Horizontal, vertical, and diagonal association rules.
If X → y is a rule, there is no rule X' → y' such that X' ⊂ X and y' = y.

27 The Prediction Strategy

28 Example
Rules DB + query block → matched set of rules:
Matched vertical rules: (4, 2, 3, 3 → 5), confv = 90%; (4, 2, 3 → 1), confv = 85%
Matched horizontal rules: (12, 12, 1, 3 → 5), confh = 90%; (12, 12 → 1), confh = 95%
Matched diagonal rules: (6, 4, 2, 2, 3 → 5), confd = 100%; (6, 4, 2, 2 → 8), confd = 70%; (6, 4, 2 → 10), confd = 75%
Query result: ? may be 5, 1, 8, or 10. How to decide?

29 Example cont.
The weight of 5: 4×90% + 4×90% + 5×100% = 12.2
The weight of 1: 3×85% + 2×95% = 4.45
The weight of 8: 4×70% = 2.8
The weight of 10: 3×75% = 2.25
{5, 1} is called the consequence list, whose size is determined by the user.
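The weighting above can be sketched as follows (the rule representation and names are ours):

```python
from collections import defaultdict

def consequence_list(matched_rules, size=2):
    """Weight each candidate index by sum(len(antecedent) * confidence)
    over its matched rules and keep the `size` heaviest candidates."""
    weights = defaultdict(float)
    for antecedent, consequent, conf in matched_rules:
        weights[consequent] += len(antecedent) * conf
    return sorted(weights, key=weights.get, reverse=True)[:size]

# The matched rules of the example, as (antecedent, consequent, confidence):
rules = [((4, 2, 3, 3), 5, 0.90), ((12, 12, 1, 3), 5, 0.90),
         ((6, 4, 2, 2, 3), 5, 1.00), ((4, 2, 3), 1, 0.85),
         ((12, 12), 1, 0.95), ((6, 4, 2, 2), 8, 0.70), ((6, 4, 2), 10, 0.75)]
```

consequence_list(rules) gives [5, 1], matching the weights 12.2, 4.45, 2.8, and 2.25 computed above.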

30 Experiments Reconstructed image by the proposed method Original Image
Reconstructed image by full-search VQ

31 Experiments cont. The performance comparisons of various methods

Performance       Lena   Pepper  F16
Full-search VQ
  PSNR (dB)       32.25  31.41   31.58
  Bit-rate (bpp)  0.5    0.5     0.5
SMVQ
  PSNR (dB)       28.57  28.04   27.94
  Bit-rate (bpp)  0.33   0.33    0.32
Our Scheme
  PSNR (dB)       30.64  30.05   29.74
  Bit-rate (bpp)  0.34   0.34    0.34

32 Experiments cont. Overfitting problem

33 Advantages
Mining association rules can be applied successfully to image prediction
A broader spatial correlation is considered than in SMVQ
More efficient than SMVQ, since no Euclidean distances need to be computed

34 Reversible Steganography for VQ-compressed Images Using Clustering and Relocation

35 Flowchart of the Proposed Method

36 Construction of the Hit Map
Index sequence: 13, 1, 13, 7, 13, 4, 6, 7, 1, 1, 4, 4, 2, 7, 3, 11, …
[Figure: the sorted codebook and the hit map built from this index sequence]

37 Clustering the Codebook
Assume that the size of the codebook is 15: cw0, cw1, …, cw14
Clustering:
C1: cw0, cw1, cw3, cw6, cw8, cw10
C2: cw4, cw14
C3: cw2, cw5, cw9
C4: cw12
C5: cw7, cw11, cw13

38 Relocation
Assume that the size of the state codebook is 4
Relocated codeword order: cw0, cw1, cw3, cw6, cw8, cw10, cw2, cw5, cw9, cw4, cw14, cw7, cw11, cw13, cw12

39 Embedding
Only the codewords in G0 can embed secret bits; the codewords in G1 should be replaced with the codewords in G2.
Secret bits: 1011
Index sequence before embedding: cw14, cw12, cw1, cw2, cw6, cw3, cw10, cw8
Index sequence after embedding: cw4, cw12, cw0, cw2, cw6, cw5, cw3, cw8, cw1

40 Extraction & Reversibility
Stego index sequence: cw4, cw12, cw0, cw2, cw6, cw5, cw3, cw8, cw1
Extracted secret bits: 1011
Recovered index sequence: cw14, cw12, cw1, cw2, cw6, cw3, cw10, cw8

41 Experiments
Setting: 12 hit maps (600 bits), 250 clusters

Method                  Measure         Lena   Pepper  Sailboat  Baboon
Modified Tian's method  PSNR (dB)       26.92  26.45   25.05     22.70
                        Payload (bits)  2777   3375    3283      2339
MFCVQ                   PSNR (dB)       28.03  26.43   26.60     24.04
                        Payload (bits)  5892   5712    5176      1798
Proposed method         PSNR (dB)       30.23  29.15   28.00
                        Payload (bits)  8707   8421    7601      3400

42 Experiments Tian’s method MFCVQ

43 Single hit map
Multiple hit maps without clustering
Using clustering and multiple hit maps

44 Experiments
Using Lena as the cover image

45 A Reversible Steganographic Method Using SMVQ Approach based on Declustering

46 Find the Most Dissimilar Pairs (De-clustering)
Dissimilar pairs: (CW1, CW8), (CW2, CW9), (CW3, CW10), (CW4, CW11), (CW5, CW12), (CW6, CW13), (CW7, CW14)

47 Embedding Using Side-Match
(CW1, CW8) is a dissimilar pair. Assume X = CW1.
V0 = ((U13 + L4)/2, U14, U15, U16, L8, L12, L16)
V1 = (X1, X2, X3, X4, X5, X9, X13) taken from CW1
V8 = (X1, X2, X3, X4, X5, X9, X13) taken from CW8
d1 = Euclidean_Distance(V0, V1)
d8 = Euclidean_Distance(V0, V8)
If d1 < d8, block X is replaceable; otherwise, block X is non-replaceable.
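A sketch of this replaceability test for 4×4 blocks, with the slide's 1-based pixel numbers mapped to 0-based row/column indices (the helper names are ours):

```python
import numpy as np

def is_replaceable(cw_x, cw_pair, upper, left):
    """Side-match test: V0 comes from the bottom row of the upper block and
    the rightmost column of the left block; the comparison vectors take the
    top row and left column of the two codewords of a dissimilar pair.
    Block X is replaceable when its own codeword is the closer one to V0."""
    v0 = np.array([(upper[3, 0] + left[0, 3]) / 2,        # (U13 + L4) / 2
                   upper[3, 1], upper[3, 2], upper[3, 3], # U14, U15, U16
                   left[1, 3], left[2, 3], left[3, 3]])   # L8, L12, L16
    border = lambda cw: np.array([cw[0, 0], cw[0, 1], cw[0, 2], cw[0, 3],
                                  cw[1, 0], cw[2, 0], cw[3, 0]])  # X1..X5, X9, X13
    d_x = np.linalg.norm(v0 - border(cw_x))
    d_pair = np.linalg.norm(v0 - border(cw_pair))
    return d_x < d_pair
```

With smooth neighbors, a smooth codeword beats its dissimilar (busy) partner, so the block is judged replaceable; swapping the two codewords flips the verdict.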

48 Embedding
A secret message: 1 1 1 1 1 1 1 1
Dissimilar pairs: (CW1, CW8), (CW2, CW9), (CW3, CW10), (CW4, CW11), (CW5, CW12), (CW6, CW13), (CW7, CW14), (CW15, CW0)
First index: since d6 < d13, the block is replaceable; embedding result so far: 6

49 Embedding cont.
Second index: since d2 < d9, the block is replaceable; embedding result so far: 6, 9

50 Embedding cont.
Third index: since d12 >= d5, the block is non-replaceable; output 15||12 (CW15: embed 1); embedding result so far: 6, 9, 15||12

51 Embedding cont.
Fourth index: since d9 >= d2, the block is non-replaceable; output 0||9 (CW0: embed 0); embedding result so far: 6, 9, 15||12, 0||9

52 Extraction and Recovery
Steganographic index table: 6, 9, 15||12, 0||9
First index: since d6 < d13, extract secret bit 1; recovered index table so far: 6

53 Extraction and Recovery cont.
Second index: since d9 >= d2, index 9 is mapped back through its dissimilar pair to 2; recovered index table so far: 6, 2

54 Extraction and Recovery cont.
Third entry 15||12: extract secret bit 1 from CW15 and recover index 12; recovered index table so far: 6, 2, 12

55 Extraction and Recovery cont.
Fourth entry 0||9: extract secret bit 0 from CW0 and recover index 9; recovered index table so far: 6, 2, 12, 9

56 Find Dissimilar Pairs PCA projection

57 Improve Embedding Capacity
Partition into more groups

58 Experiments Codebook size: 512 Codeword size: 16
The number of original image blocks:128*128=16384 The number of non-replaceable blocks: 139

59 Experiments Codebook size: 512 Codeword size: 16
The number of original image blocks:128*128=16384 The number of non-replaceable blocks: 458

60 Experiments cont.

Embedding capacity (bits):
Images  Tian's method  MFCVQ  Chang et al.'s method  Proposed (3 groups)  (9 groups)  (17 groups)
Lena    2,777          5,892  10,111                 16,129               45,075      55,186
Baboon  2,339          1,798  4,588                  36,609               39,014

Time comparison (image: Lena):
Tian's method: 0.55 sec
MFCVQ: 1.36 sec
Chang et al.'s method, by size of the state codebook (4 / 8 / 16 / 32): 14.59 / 29.80 / 58.8 / 161.2 sec
Proposed method, by number of groups (3 / 5 / 9 / 17): 0.11 / 0.13 / 0.14 / 0.19 sec

61 Future Research Directions
Extend the proposed reversible steganographic methods to other image formats Apply perfect hashing schemes to other applications

62 Thanks all

