1
Data Mining and Its Applications to Image Processing
Data Mining Techniques and Their Applications to Image Processing. Advisor: Chang, Chin-Chen (張真誠). Graduate student: Lin, Chih-Yang (林智揚). Department of Computer Science and Information Engineering, National Chung Cheng University
2
The Fields of Data Mining
Mining association rules, sequential pattern mining, clustering (and declustering), classification, ...
3
Outline Part I: Design and Analysis of Data Mining Algorithms
Part II: Data Mining Applications to Image Processing
4
Part I: Design and Analysis of Data Mining Algorithms
1. Perfect Hashing Schemes for Mining Association Rules (or for Mining Traversal Patterns)
5
Mining Association Rules
Support: obtain the large itemsets. Confidence: generate the association rules.
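As a concrete reference, here is a minimal Python sketch (not from the thesis) showing how support and confidence are computed for a candidate rule; the toy transactions match the Apriori example on the next slide.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Toy transactions (same items as the Apriori example that follows)
transactions = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
print(support({'B', 'C', 'E'}, transactions))        # 0.5 -> large itemset if min support allows
print(confidence({'B', 'C'}, {'E'}, transactions))   # 1.0 -> rule BC -> E
```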
6
Apriori example (minimum support = 2)

Database D:
TID | Items
100 | A C D
200 | B C E
300 | A B C E
400 | B E

Scan D to count C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3. Pruning by support gives L1 = {A}:2, {B}:3, {C}:3, {E}:3.

C2 = L1 × L1: {A B}, {A C}, {A E}, {B C}, {B E}, {C E}. Scanning D gives {A B}:1, {A C}:2, {A E}:1, {B C}:2, {B E}:3, {C E}:2, so L2 = {A C}:2, {B C}:2, {B E}:3, {C E}:2.

C3: because {B C} and {B E} share the same first item, test whether {C E} is also a large itemset; it is, so {B C E} becomes a candidate. Scanning D gives {B C E}:2, so L3 = {B C E}:2.

Note: finding the initial large itemsets dominates the running time of the whole algorithm, and Apriori performs best in this initial phase, so it is used as the baseline for comparison with DHP.
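The following is a minimal Python sketch of the Apriori candidate-generation and counting loop on this example database; the function and variable names are my own, not the thesis's.

```python
from itertools import combinations

# Transaction database from the slide, minimum support = 2
D = {100: {'A', 'C', 'D'}, 200: {'B', 'C', 'E'},
     300: {'A', 'B', 'C', 'E'}, 400: {'B', 'E'}}
MIN_SUP = 2

def count_support(candidates):
    """Scan D once and count how many transactions contain each candidate."""
    return {c: sum(1 for t in D.values() if c <= t) for c in candidates}

# C1 / L1
C1 = [frozenset([i]) for i in set().union(*D.values())]
L = {c for c, s in count_support(C1).items() if s >= MIN_SUP}

k = 2
while L:
    print(f"L{k-1}:", sorted(map(sorted, L)))
    # Join step: unite two large (k-1)-itemsets into a k-itemset
    Ck = {a | b for a in L for b in L if len(a | b) == k}
    # Prune step: every (k-1)-subset of a candidate must itself be large
    Ck = {c for c in Ck if all(frozenset(s) in L for s in combinations(c, k - 1))}
    L = {c for c, s in count_support(Ck).items() if s >= MIN_SUP}
    k += 1
```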
7
Apriori (cont.) Disadvantages: inefficient;
produces many useless candidates.
8
DHP (Direct Hashing and Pruning): prunes useless candidates in advance;
reduces the database size at each iteration.
9
DHP example (minimum support = 2)

Database D:
TID | Items
100 | A C D
200 | B C E
300 | A B C E
400 | B E

Scanning D gives the C1 counts {A}:2, {B}:3, {C}:3, {D}:1, {E}:3, so L1 = {A}, {B}, {C}, {E}.

Making a hash table for the 2-itemsets:
100: {A C}
200: {B C}, {B E}, {C E}
300: {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
400: {B E}

Hash function: H({x y}) = ((order of x) * 10 + (order of y)) mod 7, which places {C E} in bucket 0, {A E} in bucket 1, {B C} in bucket 2, {B E} in bucket 4, {A B} in bucket 5, and {A C} in bucket 6.

While D is scanned to count the 1-itemset supports, the hash table H2 for the 2-itemsets is built at the same time: every 2-subset of a transaction is fed into the hash function in sorted item order and dropped into its bucket, and the number of itemsets hashed to each bucket is counted. Comparing the bucket counts against the minimum support s = 2 yields a bit vector, which is then used to filter L1 × L1 and obtain a much smaller C2.
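Below is a minimal Python sketch of DHP's bucket counting and bit-vector filtering, assuming the item order A=1, ..., E=5 implied by the hash function; unlike the simplified listing above, it hashes every 2-subset of each transaction (including those containing D), which is how DHP fills H2 during the first database scan.

```python
from itertools import combinations

# Transactions from the slide; the numeric order A=1..E=5 is an assumption
D = {100: ['A', 'C', 'D'], 200: ['B', 'C', 'E'],
     300: ['A', 'B', 'C', 'E'], 400: ['B', 'E']}
order = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}
MIN_SUP, N_BUCKETS = 2, 7

def h(pair):
    """Slide's hash function: H({x,y}) = (order(x)*10 + order(y)) mod 7, with x before y."""
    x, y = sorted(pair, key=order.get)
    return (order[x] * 10 + order[y]) % N_BUCKETS

# First scan: count 1-itemsets and fill the 2-itemset hash buckets at the same time
item_count = {}
bucket_count = [0] * N_BUCKETS
for items in D.values():
    for i in items:
        item_count[i] = item_count.get(i, 0) + 1
    for pair in combinations(sorted(items, key=order.get), 2):
        bucket_count[h(pair)] += 1

L1 = {i for i, c in item_count.items() if c >= MIN_SUP}
bit_vector = [int(c >= MIN_SUP) for c in bucket_count]

# C2 keeps only pairs of large items whose bucket passes the bit-vector test
C2 = [p for p in combinations(sorted(L1, key=order.get), 2) if bit_vector[h(p)]]
print(bit_vector, C2)
# -> [1, 0, 1, 0, 1, 0, 1] [('A', 'C'), ('B', 'C'), ('B', 'E'), ('C', 'E')]
```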
10
Perfect Hashing Schemes (PHS) for Mining Association Rules
11
Motivation: Apriori and DHP produce Ci from Li-1, which may be the bottleneck. Collisions occur in DHP. Designing a perfect hash function for every transaction database is a thorny problem.
12
Definition. A join operation joins two different (k-1)-itemsets, p = p1p2...pk-1 and q = q1q2...qk-1, to produce a k-itemset, where p2 = q1, p3 = q2, ..., pk-1 = qk-2.
Example: ABC and BCD can be joined. Among the 3-itemsets of ABCD (ABC, ABD, ACD, BCD), only one pair satisfies the join definition.
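A small Python sketch of this join test (names are my own):

```python
def join(p, q):
    """Join two (k-1)-itemsets (given as sorted strings, e.g. 'ABC' and 'BCD')
    into a k-itemset if p's last k-2 items equal q's first k-2 items."""
    if len(p) == len(q) and p != q and p[1:] == q[:-1]:
        return p + q[-1]
    return None

# Only one ordered pair of the 3-subsets of ABCD satisfies the join definition
subsets = ['ABC', 'ABD', 'ACD', 'BCD']
print([(p, q, join(p, q)) for p in subsets for q in subsets if join(p, q)])
# -> [('ABC', 'BCD', 'ABCD')]
```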
13
Algorithm PHS (Perfect Hashing and Data Shrinking)
14
PHS Example 1 (minimum support = 2)

Database D:
TID | Items
100 | ACD
200 | BCE
300 | BCDE
400 | BE

L1: {B}:3, {C}:3, {D}:2, {E}:3.

Rewriting each transaction as its 2-itemsets over L1:
100 | (CD)
200 | (BC) (BE) (CE)
300 | (BC) (BD) (BE) (CD) (CE) (DE)
400 | (BE)

2-itemset supports: (BC):2, (BD):1, (BE):3, (CD):2, (CE):2, (DE):1.

Encoding the large 2-itemsets as new symbols: A = (BC), B = (BE), C = (CD), D = (CE).
15
PHS Example 2 (minimum support = 2)

Shrunken database after encoding:
TID | Items
100 | Null
200 | (AD)
300 | (AC) (AD)
400 | Null

Candidate pairs of encoded symbols: (AB), (AC), (AD), (BC), (BD), (CD); supports: (AC):1, (AD):2, all others 0.

Encoding the only large pair as a new symbol: A = (AD).

Decode: AD -> (BC)(CE), which joins to BCE.
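The following Python sketch walks through the PHS encode-and-shrink idea on Example 1's database and reproduces Example 2's shrunken database and the final decoding; the helper names and the single-letter encoding are assumptions made for illustration.

```python
from itertools import combinations
import string

# Example 1 database from the slides, minimum support = 2
D = {100: 'ACD', 200: 'BCE', 300: 'BCDE', 400: 'BE'}
MIN_SUP = 2

def joinable(p, q):
    """Two sorted itemsets join iff p's tail equals q's head (see the join definition)."""
    return p != q and p[1:] == q[:-1]

# Level 1: count all 2-itemsets, keep the large ones, and give each a new symbol
count = {}
for items in D.values():
    for pair in combinations(sorted(items), 2):
        count[pair] = count.get(pair, 0) + 1
large2 = [p for p in sorted(count) if count[p] >= MIN_SUP]
code = dict(zip(large2, string.ascii_uppercase))     # ('B','C')->'A', ('B','E')->'B', ...
decode = {v: ''.join(k) for k, v in code.items()}

# Shrink: rewrite each transaction as the candidate pairs of encoded symbols,
# keeping only pairs whose underlying itemsets satisfy the join condition
shrunken = {}
for tid, items in D.items():
    symbols = sorted(code[p] for p in combinations(sorted(items), 2) if p in code)
    shrunken[tid] = [a + b for a, b in combinations(symbols, 2)
                     if joinable(decode[a], decode[b])]
print(shrunken)            # {100: [], 200: ['AD'], 300: ['AC', 'AD'], 400: []}

# Decoding the large pair 'AD' recovers the 3-itemset it represents
print(decode['A'], decode['D'], '->', decode['A'] + decode['D'][-1])   # BC CE -> BCE
```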
16
Problem with the Hash Table. Consider a database containing p transactions, all of equal length N and composed of mutually unique items, with a minimum support of 1. At iteration k, the number of candidate k-itemsets is p·C(N, k). The number of buckets required in the next pass is C(m, 2) = m(m-1)/2, where m = p·C(N, k), while the actual number of candidates in the next pass is only p·C(N, k+1). Loading density: p·C(N, k+1) / C(m, 2), which becomes vanishingly small as p grows.
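Under the stated assumptions the loading density can be evaluated directly; the sketch below uses the formulas reconstructed above and is only illustrative.

```python
from math import comb

def loading_density(p, N, k):
    """Loading density of a full perfect hash table at iteration k, assuming p
    transactions of equal length N over mutually unique items and a minimum
    support of 1 (formulas as reconstructed on the previous slide)."""
    m = p * comb(N, k)                # encoded symbols = candidate k-itemsets
    buckets = comb(m, 2)              # perfect hash table over all symbol pairs
    next_candidates = p * comb(N, k + 1)
    return next_candidates / buckets

# e.g. 1,000 transactions of 10 unique items each, at iteration 2
print(f"{loading_density(p=1000, N=10, k=2):.2e}")
```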
17
How to Improve the Loading Density
Two-level perfect hash scheme (partial hash). Using Example 2: the candidate pairs (AB), (AC), (AD), (BC), (BD), (CD) have supports (AC):1 and (AD):2. The first level of the hash table is indexed by the first encoded item; the entries for B and C point to Null, while the entry for A points to a second-level table holding the counts (AC) = 1 and (AD) = 2, so buckets are allocated only where candidates actually occur.
18
Experiments
19
Experiments
20
Experiments
21
Part II: Data Mining Applications to Image Processing
1. A Prediction Scheme for Image Vector Quantization Based on Mining Association Rules
2. Reversible Steganography for VQ-compressed Images Using Clustering and Relocation
3. A Reversible Steganographic Method Using SMVQ Approach Based on Declustering
22
A Prediction Scheme for Image Vector Quantization Based on Mining Association Rules
23
Vector Quantization (VQ)
Image encoding and decoding techniques
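As background, here is a minimal Python sketch of plain VQ encoding and decoding (nearest codeword by Euclidean distance); the toy codebook and block values are assumptions.

```python
import numpy as np

def vq_encode(blocks, codebook):
    """Map each image block to the index of its nearest codeword (Euclidean distance)."""
    # blocks: (n, d) array of flattened blocks; codebook: (c, d) array of codewords
    dists = np.linalg.norm(blocks[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def vq_decode(indices, codebook):
    """Rebuild the image blocks from the index table."""
    return codebook[indices]

# Toy example: 4-dimensional blocks and a codebook of 4 codewords (assumed values)
codebook = np.array([[0] * 4, [80] * 4, [160] * 4, [240] * 4], dtype=float)
blocks = np.array([[10, 12, 8, 9], [150, 155, 160, 158]], dtype=float)
idx = vq_encode(blocks, codebook)
print(idx, vq_decode(idx, codebook))
```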
24
SMVQ (side-match VQ), cont.: the full codebook and the state codebook.
25
Framework of the Proposed Method
(From the framework diagram: values v are quantized as v/10.)
26
Condition (for the horizontal, vertical, and diagonal association rules): if the rule X → y is kept, then there is no rule X' → y' in the rule set such that X' ⊂ X and y' = y.
27
The Prediction Strategy
28
Example
Query the rule database: the unknown index ? may be 5, 1, 8, or 10. How to decide?

Matched set of rules:
Matched vertical rules: (4, 2, 3, 3 → 5), confv = 90%; (4, 2, 3 → 1), confv = 85%
Matched horizontal rules: (12, 12, 1, 3 → 5), confh = 90%; (12, 12 → 1), confh = 95%
Matched diagonal rules: (6, 4, 2, 2, 3 → 5), confd = 100%; (6, 4, 2, 2 → 8), confd = 70%; (6, 4, 2 → 10), confd = 75%
29
Example (cont.)
The weight of 5: 4 × 90% + 4 × 90% + 5 × 100% = 12.2
The weight of 1: 3 × 85% + 2 × 95% = 4.45
The weight of 8: 4 × 70% = 2.8
The weight of 10: 3 × 75% = 2.25
Each weight sums, over the matched rules predicting that index, the antecedent length multiplied by the rule confidence. {5, 1} is called the consequence list, whose size is determined by the user.
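A small Python sketch of this weighting step, using the matched rules from the slide; the rule representation and function names are my own.

```python
# Matched rules from the slide: (antecedent, consequent, confidence)
matched_rules = [
    ((4, 2, 3, 3), 5, 0.90), ((12, 12, 1, 3), 5, 0.90), ((6, 4, 2, 2, 3), 5, 1.00),
    ((4, 2, 3), 1, 0.85), ((12, 12), 1, 0.95),
    ((6, 4, 2, 2), 8, 0.70), ((6, 4, 2), 10, 0.75),
]

def consequence_list(rules, size):
    """Weight each candidate consequent by sum(len(antecedent) * confidence)
    over the matched rules, and keep the top `size` candidates."""
    weights = {}
    for antecedent, consequent, conf in rules:
        weights[consequent] = weights.get(consequent, 0.0) + len(antecedent) * conf
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(c, round(w, 2)) for c, w in ranked[:size]]

print(consequence_list(matched_rules, size=2))
# -> [(5, 12.2), (1, 4.45)]
```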
30
Experiments
(Images: the original image, the image reconstructed by the proposed method, and the image reconstructed by full-search VQ.)
31
Experiments cont. The performance comparisons on various methods
Performance                      Lena    Pepper  F16
Full-search VQ   PSNR (dB)       32.25   31.41   31.58
                 Bit rate (bpp)  0.5
SMVQ             PSNR (dB)       28.57   28.04   27.94
                 Bit rate (bpp)  0.33 / 0.32
Our scheme       PSNR (dB)       30.64   30.05   29.74
                 Bit rate (bpp)  0.34
32
Experiments cont. Overfitting problem
33
Advantages
Mining association rules can be applied successfully to image prediction.
Broader spatial correlation is considered than in SMVQ.
More efficient than SMVQ, since no Euclidean distances need to be calculated.
34
Reversible Steganography for VQ-compressed Images Using Clustering and Relocation
35
Flowchart of the Proposed Method
36
Construction of the Hit Map
(Figure: an index table mapped through the sorted codebook to produce the hit map.)
37
Clustering the Codebook
Assume the codebook contains 15 codewords: cw0, cw1, ..., cw14.
Clustering:
C1: cw0, cw1, cw3, cw6, cw8, cw10
C2: cw4, cw14
C3: cw2, cw5, cw9
C4: cw12
C5: cw7, cw11, cw13
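The slides do not state which clustering algorithm groups the codewords, so the sketch below uses plain k-means over a toy 15-codeword codebook purely as an illustration; every name and value in it is an assumption.

```python
import numpy as np

def kmeans(codebook, k, iters=20, seed=0):
    """Plain k-means over the codewords; only an illustrative choice of algorithm."""
    rng = np.random.default_rng(seed)
    centers = codebook[rng.choice(len(codebook), size=k, replace=False)]
    for _ in range(iters):
        # Assign each codeword to its nearest cluster centre
        dists = np.linalg.norm(codebook[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its members (keep the old centre if empty)
        centers = np.array([codebook[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

rng = np.random.default_rng(1)
codebook = rng.random((15, 16))                  # toy codebook: cw0 .. cw14, 16-dim
labels = kmeans(codebook, k=5)
for j in range(5):
    print(f"C{j + 1}:", [f"cw{i}" for i in np.where(labels == j)[0]])
```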
38
Relocation
Assume that the size of the state codebook is 4.
Relocation reorders the codewords cluster by cluster: cw0, cw1, cw3, cw6, cw8, cw10, cw2, cw5, cw9, cw4, cw14, cw7, cw11, cw13, cw12.
39
Embedding
Only the codewords in G0 can embed secret bits; the codewords in G1 are replaced with the corresponding codewords in G2.
Secret bits: 1011
Index table before embedding: cw14, cw12, cw1, cw2, cw6, cw3, cw10, cw8
Index table after embedding: cw4, cw12, cw0, cw2, cw6, cw5, cw3, cw8, cw1
40
Extraction & Reversibility
Steganographic index table: cw4, cw12, cw0, cw2, cw6, cw5, cw3, cw8, cw1
Extracted secret bits: 1011
Recovered index table: cw14, cw12, cw1, cw2, cw6, cw3, cw10, cw8
41
Experiments (12 hit maps, 600 bits; 250 clusters)

Method                  Measure         Lena   Pepper  Sailboat  Baboon
Modified Tian's method  PSNR (dB)       26.92  26.45   25.05     22.70
                        Payload (bits)  2777   3375    3283      2339
MFCVQ                   PSNR (dB)       28.03  26.43   26.60     24.04
                        Payload (bits)  5892   5712    5176      1798
Proposed method         PSNR (dB)       30.23  29.15   28.00     —
                        Payload (bits)  8707   8421    7601      3400
42
Experiments: visual comparison with Tian's method and MFCVQ (images omitted).
43
Configurations compared: a single hit map; multiple hit maps without clustering; clustering with multiple hit maps.
44
Experiments: using Lena as the cover image.
45
A Reversible Steganographic Method Using SMVQ Approach based on Declustering
46
Find the Most Dissimilar Pairs (De-clustering)
Each codeword is paired with its most dissimilar counterpart: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, ...
47
Embedding Using Side-Match
CW1–CW8 is a dissimilar pair. Assume X = CW1.
V0 = ((U13 + L4)/2, U14, U15, U16, L8, L12, L16)
V1 = (X1, X2, X3, X4, X5, X9, X13) of CW1
V8 = (X1, X2, X3, X4, X5, X9, X13) of CW8
d1 = Euclidean_Distance(V0, V1)
d8 = Euclidean_Distance(V0, V8)
If d1 < d8, then block X is replaceable; otherwise, block X is non-replaceable.
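A minimal Python sketch of this replaceability test, assuming 4×4 blocks stored row-major (so X1–X4 is the first row and X1, X5, X9, X13 the first column); the array and function names are my own.

```python
import numpy as np

def border_vector(upper, left):
    """Side-match vector V0 built from the upper and left neighbouring 4x4 blocks:
    the shared corner, the upper block's bottom row, the left block's right column."""
    return np.array([(upper[3, 0] + left[0, 3]) / 2,          # (U13 + L4) / 2
                     upper[3, 1], upper[3, 2], upper[3, 3],   # U14, U15, U16
                     left[1, 3], left[2, 3], left[3, 3]])     # L8, L12, L16

def codeword_vector(cw):
    """V_i: first row and first column of a 4x4 codeword (X1..X5, X9, X13)."""
    return np.array([cw[0, 0], cw[0, 1], cw[0, 2], cw[0, 3],
                     cw[1, 0], cw[2, 0], cw[3, 0]])

def is_replaceable(upper, left, cw, cw_pair):
    """A block coded by cw is replaceable if cw matches the border better than
    its dissimilar partner cw_pair (d1 < d8 on the slide)."""
    v0 = border_vector(upper, left)
    d_own = np.linalg.norm(v0 - codeword_vector(cw))
    d_pair = np.linalg.norm(v0 - codeword_vector(cw_pair))
    return d_own < d_pair

# Toy usage with random 4x4 blocks/codewords (assumed values)
rng = np.random.default_rng(0)
upper, left, cw1, cw8 = rng.integers(0, 256, size=(4, 4, 4)).astype(float)
print(is_replaceable(upper, left, cw1, cw8))
```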
48
Embedding, step 1
A secret message: 1 1 1 1 1 1 1 1
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Condition: if (d6 < d13). Steganographic index table so far: 6.
49
Embedding, step 2
A secret message: 1 1 1 1 1 1 1 1
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Condition: if (d2 < d9). Steganographic index table so far: 6, 9.
50
Embedding, step 3
A secret message: 1 1 1 1 1 1 1 1
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Condition: if (d12 >= d5). Steganographic index table so far: 6, 9, 15||12 (CW15: embed 1).
51
Embedding, step 4
A secret message: 1 1 1 1 1 1 1 1
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Condition: if (d9 >= d2). Steganographic index table so far: 6, 9, 15||12, 0||9 (CW0: embed 0).
52
Extraction and Recovery, step 1
Steganographic index table: 6, 9, 15||12, 0||9
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Condition: if (d6 < d13). Extracted secret bits so far: 1. Recovered index table so far: 6.
53
Extraction and Recovery, step 2
Steganographic index table: 6, 9, 15||12, 0||9
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Condition: if (d9 >= d2). Extracted secret bits so far: 1. Recovered index table so far: 6, 2.
54
Extraction and Recovery, step 3
Steganographic index table: 6, 9, 15||12, 0||9
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Extracted secret bits so far: 1, 1. Recovered index table so far: 6, 2, 12.
55
Extraction and Recovery, step 4
Steganographic index table: 6, 9, 15||12, 0||9
Dissimilar pairs: CW1–CW8, CW2–CW9, CW3–CW10, CW4–CW11, CW5–CW12, CW6–CW13, CW7–CW14, CW15–CW0
Extracted secret bits so far: 1, 1. Recovered index table so far: 6, 2, 12, 9.
56
Find Dissimilar Pairs by PCA Projection
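The slide only names a PCA projection, so the following sketch shows one plausible way to pair dissimilar codewords with it (project onto the first principal component and pair opposite extremes); this is an illustrative interpretation, not necessarily the thesis's exact procedure.

```python
import numpy as np

def pair_dissimilar_by_pca(codebook):
    """Project the codewords onto their first principal component and pair the
    codeword at one end of the projection with the one at the opposite end."""
    centered = codebook - codebook.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # first row of vt = first PC
    scores = centered @ vt[0]
    order = np.argsort(scores)                 # from one extreme to the other
    half = len(order) // 2
    # Pair the i-th smallest projection with the i-th largest one
    return [(int(a), int(b)) for a, b in zip(order[:half], order[::-1][:half])]

rng = np.random.default_rng(0)
codebook = rng.random((16, 16))                # toy codebook: 16 codewords of 16 dims
print(pair_dissimilar_by_pca(codebook))
```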
57
Improve Embedding Capacity
Partition into more groups
58
Experiments Codebook size: 512 Codeword size: 16
The number of original image blocks: 128 × 128 = 16,384. The number of non-replaceable blocks: 139.
59
Experiments Codebook size: 512 Codeword size: 16
The number of original image blocks: 128 × 128 = 16,384. The number of non-replaceable blocks: 458.
60
Experiments

Embedding capacity (bits):
Image    Tian's method  MFCVQ  Chang et al.'s method  Proposed (3 groups)  Proposed (9 groups)  Proposed (17 groups)
Lena     2,777          5,892  10,111                 16,129               45,075               55,186
Baboon   2,339          1,798  4,588                  —                    36,609               39,014

Time comparison (image: Lena, in seconds):
Tian's method: 0.55; MFCVQ: 1.36
Chang et al.'s method, by size of the state codebook (4 / 8 / 16 / 32): 14.59 / 29.80 / 58.8 / 161.2
Proposed method, by number of groups (3 / 5 / 9 / 17): 0.11 / 0.13 / 0.14 / 0.19
61
Future Research Directions
Extend the proposed reversible steganographic methods to other image formats.
Apply the perfect hashing schemes to other applications.
62
Thanks all