Tofik Ali, Partha Pratim Roy. Department of Computer Science and Engineering, Indian Institute of Technology Roorkee. CVIP-WM 2017, Paper ID 172: Word Spotting.


1 Tofik Ali, Partha Pratim Roy. Department of Computer Science and Engineering, Indian Institute of Technology Roorkee. CVIP-WM 2017, Paper ID 172. Word Spotting based on Pyramidal Histogram of Characters Code for Handwritten Text Documents

2 Word Spotting: Retrieval of words similar to a query word. The query word takes one of two forms: 1. Query by example 2. Query by string. PHOC Code: It is an encoded representation of a word. Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(12), 2552–2566 (2014)

3 Phase 1: Word segmentation from handwritten text documents. Phase 2: Transformation of word image/string into PHOC code. Phase 3: Retrieval of words similar to the query word.

4 Phase 1: Word segmentation from handwritten text documents. Segmentation CNN layers (outputs written as channels@height x width; merges are placed where their sizes match):
C1_1, C1_2: CONV 5x5 (32), output 32@512x512
MP1: MAX POOL (2,2), output 32@256x256
C2_1, C2_2: CONV 3x3 (64), output 64@256x256
MP2: MAX POOL (2,2), output 64@128x128
C3_1, C3_2: CONV 3x3 (128), output 128@128x128
MP3: MAX POOL (2,2), output 128@64x64
C4_1, C4_2: CONV 3x3 (128), output 128@64x64
UP1: UP-SAMPLE (2,2), output 128@128x128
Merge: 256@128x128
C5_1, C5_2: CONV 3x3 (64), output 64@128x128
UP2: UP-SAMPLE (2,2), output 64@256x256
Merge: 128@256x256
C6_1, C6_2: CONV 3x3 (32), output 32@256x256
UP2 (label repeated on the slide): UP-SAMPLE (2,2), output 32@512x512
Merge: 64@512x512
C7_1: CONV 3x3 (1), output 1@512x512
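The feature-map sizes on this slide can be checked with a small shape trace. This is only a sketch: it assumes size-preserving (padded) convolutions, a single-channel 512x512 input, and that each merge concatenates the up-sampled map with the last convolution output of matching size (C3_2, C2_2, C1_2). The names M1 to M3 and UP3 are hypothetical labels introduced here, not taken from the slide.

```python
# Layer table transcribed from the slide. The skip sources of the merges
# (third field of each "merge" row) are an assumption based on matching sizes.
LAYERS = [
    ("C1_1", "conv", 32), ("C1_2", "conv", 32), ("MP1", "pool", None),
    ("C2_1", "conv", 64), ("C2_2", "conv", 64), ("MP2", "pool", None),
    ("C3_1", "conv", 128), ("C3_2", "conv", 128), ("MP3", "pool", None),
    ("C4_1", "conv", 128), ("C4_2", "conv", 128),
    ("UP1", "up", None), ("M1", "merge", "C3_2"),
    ("C5_1", "conv", 64), ("C5_2", "conv", 64),
    ("UP2", "up", None), ("M2", "merge", "C2_2"),
    ("C6_1", "conv", 32), ("C6_2", "conv", 32),
    ("UP3", "up", None), ("M3", "merge", "C1_2"),
    ("C7_1", "conv", 1),
]


def trace_shapes(layers, channels=1, size=512):
    """Return {layer name: (channels, spatial size)} for a square input."""
    shapes = {}
    for name, kind, arg in layers:
        if kind == "conv":        # padded convolution: only the channel count changes
            channels = arg
        elif kind == "pool":      # 2x2 max pooling halves the spatial size
            size //= 2
        elif kind == "up":        # 2x2 up-sampling doubles the spatial size
            size *= 2
        elif kind == "merge":     # concatenate with an earlier map of equal size
            skip_channels, skip_size = shapes[arg]
            assert skip_size == size, "merged maps must have equal spatial size"
            channels += skip_channels
        shapes[name] = (channels, size)
    return shapes


shapes = trace_shapes(LAYERS)
for name, (c, s) in shapes.items():
    print(f"{name:5s} {c}@{s}x{s}")
```

Under these assumptions the three merges come out with 256, 128 and 64 channels, which agrees with the Merge sizes written on the slide.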

5 Phase 1: Results of word segmentation. [Figure panels: Original Document Image, Ground Truth, Final Bounding Box, Output of Segmentation CNN]

6 Phase 2: Transformation of word image/string into PHOC code. How is the PHOC code generated? [Figure: 3-level PHOC code for the example word "place", showing the Level 1, Level 2 and Level 3 PHOC codes]
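A minimal sketch of how a 3-level PHOC code can be built from a string, following the region-overlap rule of Almazán et al. (a character is assigned to a region when at least half of the character's span falls inside it). The alphabet (26 lowercase letters) and the function names are assumptions made for illustration; the transcript does not state the alphabet or level set behind the 674-dimensional code used later in the deck, so the vector length here differs.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: 26 lowercase letters only


def phoc_regions(word, levels=(1, 2, 3)):
    """For every region of every level, return the set of characters assigned to it."""
    n = len(word)
    regions = []
    for level in levels:
        for r in range(level):
            chars = set()
            for k, ch in enumerate(word):
                # Integer units of 1/(n*level) avoid floating-point rounding:
                # character k spans [k*level, (k+1)*level],
                # region r spans [r*n, (r+1)*n].
                lo = max(k * level, r * n)
                hi = min((k + 1) * level, (r + 1) * n)
                overlap = max(0, hi - lo)
                if 2 * overlap >= level:  # at least half of the character is inside
                    chars.add(ch)
            regions.append(chars)
    return regions


def phoc(word, levels=(1, 2, 3)):
    """Binary PHOC vector: one block of len(ALPHABET) bits per region."""
    vec = []
    for chars in phoc_regions(word.lower(), levels):
        vec.extend(1 if ch in chars else 0 for ch in ALPHABET)
    return vec


for region in phoc_regions("place"):
    print(sorted(region))
```

For "place" the middle letter "a" straddles the boundary at level 2, so it is set in both halves; this is why a PHOC code encodes rough character position as well as character presence.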

7 Phase 2 PHOCNet: Existing deep CNN-based architecture. Sudholt, S., Fink, G.A.: PHOCNet: A deep convolutional neural network for word spotting in handwritten documents. In: 15th International Conference on Frontiers in Handwriting Recognition. pp. 277–282 (2016)

8-37 Phase 2 PHOCNet: Receptive field at the last layer and minimum input size, up to the Spatial Pyramidal max pooling layer (animation frames, one value per slide). Sudholt, S., Fink, G.A.: PHOCNet: A deep convolutional neural network for word spotting in handwritten documents. In: 15th International Conference on Frontiers in Handwriting Recognition. pp. 277–282 (2016)
Receptive field, frame by frame (slides 8-20): 3x3, 5x5, 9x9, 13x13, 21x21, 29x29, 37x37, 45x45, 53x53, 61x61, 69x69, 77x77, 85x85.
Minimum size, frame by frame (slides 21-37): 3x3, 5x5, 7x7, 9x9, 11x11, 13x13, 15x15, 17x17, 19x19, 21x21, 42x42, 44x44, 46x46, 92x92, 94x94, 96x96.
Result: 85x85 receptive field at the last layer; minimum input size 96x96.
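The receptive-field and minimum-size values on the PHOCNet slides can be reproduced with a short calculation. The sketch below rests on assumptions inferred from the numbers themselves: 3x3 convolutions grouped as 2, 2 and 9 layers with a 2x2 max pool between groups; pooling only doubles the step between neighbouring outputs and does not itself enlarge the field; and the minimum size is traced backwards from a region of `last_size` at the last layer. The function names are hypothetical.

```python
def receptive_fields(convs_per_stage, kernel=3):
    """Receptive field after each convolution, stages separated by 2x2 pooling."""
    rf, step = 1, 1
    trace = []
    for stage, n_convs in enumerate(convs_per_stage):
        if stage > 0:
            step *= 2                    # a 2x2 pool doubles the input step
        for _ in range(n_convs):
            rf += (kernel - 1) * step    # each conv widens the field by (k-1) steps
            trace.append(rf)
    return trace


def minimum_input(convs_per_stage, last_size, kernel=3):
    """Walk backwards from the last layer to the input, recording each size."""
    size = last_size
    trace = [size]
    for stage, n_convs in enumerate(reversed(convs_per_stage)):
        if stage > 0:
            size *= 2                    # undo a 2x2 pool
            trace.append(size)
        for _ in range(n_convs):
            size += kernel - 1           # undo an unpadded 3x3 convolution
            trace.append(size)
    return trace


PHOCNET = (2, 2, 9)  # assumption: conv layers per stage, inferred from the slides
print(receptive_fields(PHOCNET))
print(minimum_input(PHOCNET, last_size=3))
```

Under the same counting, a (2, 2, 4) layout gives the 45x45 receptive field quoted for the modified network. Its quoted 48x48 minimum size is reproduced only with `last_size=1`, which is an assumption about how that figure was derived.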

38 Phase 2 Modified PHOCNet: It requires less memory and time to train and test. Minimum size 48x48; 45x45 receptive field.
Layers:
C1_1, C1_2: CONV 3x3 (64)
MP1: MAX POOL (2,2)
C2_1, C2_2: CONV 3x3 (128)
MP1 (label repeated on the slide): MAX POOL (2,2)
C3_1, C3_2: CONV 3x3 (256)
C4_1, C4_2: CONV 3x3 (512)
SPP1_1: SPN 1 x [1,2,4,7,11] (512x25 = 12800)
FCN_1: FCN (4096)
FCN_2: FCN (1024)
FCN_3: FCN (674)
Output: PHOC code (674)
Changes from PHOCNet:
Fixed the input word image height at 48 pixels.
Removed 5 convolution layers to suit the image height.
Introduced an overlapped Spatial Pyramidal max pooling layer.
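The figure 512x25 = 12800 follows from the pyramid [1, 2, 4, 7, 11]: 1 + 2 + 4 + 7 + 11 = 25 horizontal bins per channel, times 512 channels. The sketch below illustrates one way such a 1 x k pyramidal max pooling could work on a feature map of arbitrary width. The bin boundaries (floor for the start, ceiling for the end, so neighbouring bins overlap when the width is not divisible by k) are an assumption; the slide only says the pooling is "overlapped" and does not give the scheme.

```python
PYRAMID = (1, 2, 4, 7, 11)  # bins per pyramid level, as written on the slide


def bin_edges(width, bins):
    """Column ranges [start, end) per bin; floor/ceil rounding lets bins overlap."""
    return [((i * width) // bins, -(-((i + 1) * width) // bins))
            for i in range(bins)]


def pyramid_max_pool(feature_map, pyramid=PYRAMID):
    """feature_map: list of channels, each a list of rows (H x W).

    Returns a flat list with sum(pyramid) values per channel, each the maximum
    over the full height and over the columns of one bin.
    """
    out = []
    for channel in feature_map:
        width = len(channel[0])
        for bins in pyramid:
            for start, end in bin_edges(width, bins):
                out.append(max(row[c] for row in channel
                               for c in range(start, end)))
    return out


# Toy input: 2 channels, 3 rows, 13 columns.
fmap = [[[ch * 100 + r * 10 + x for x in range(13)] for r in range(3)]
        for ch in range(2)]
pooled = pyramid_max_pool(fmap)
print(len(pooled), 512 * sum(PYRAMID))
```

Because every bin always contains at least one column, the output length depends only on the channel count, which is what lets the fully connected layers accept word images of varying width.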


40 Phase 2: Transformation of word image/string into PHOC code

41 [Diagram: segmentation CNN architecture, repeating the layers of slide 4: C1_1 to C7_1 with MP1 to MP3, three up-sampling stages and three merges, from 32@512x512 down to 128@64x64 and back up to 1@512x512]

42 [Diagram: modified PHOCNet architecture, repeating the layers of slide 38: C1_1 to C4_2, SPP1_1 (512x25 = 12800), FCN_1 (4096), FCN_2 (1024), FCN_3 (674), PHOC code (674)]

43-44 [Figures: segmentation examples. Panels: Original Document Image, Ground Truth, Final Bounding Box, Output of Segmentation CNN]

45 "Can you examine" [Figure labels: Document Image, Text, Segmented Words, Retrieved Words]

46 [Flowchart labels: Handwritten Documents; Segmentation using CNN; PHOC code Generation; Query Word; Minimum Edit Distance Calculation; Information Retrieval as Word Spotting; PHOC Codes Repository with Indexing]
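The retrieval step names a minimum edit distance calculation against a repository of PHOC codes. The slide does not say which sequences are compared or how the repository is indexed, so the following is only a sketch: a standard Levenshtein distance over arbitrary sequences (strings or tuples of bits), used to rank a small in-memory repository. The function names and the repository layout (label, code) are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))       # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def spot(query_code, repository, top_k=3):
    """Return labels of the top_k repository entries closest to the query code."""
    ranked = sorted(repository,
                    key=lambda item: edit_distance(query_code, item[1]))
    return [label for label, _ in ranked[:top_k]]


# Toy repository: plain strings stand in for the stored codes.
repo = [("word", "word"), ("plane", "plane"), ("place", "place")]
print(spot("place", repo, top_k=2))
```

A linear scan like this is adequate for a toy repository; the indexing mentioned on the slide would replace the `sorted` call in a larger collection.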

