Department of Computer Science Ben-Gurion University of the Negev

Presentation transcript:

Binarization Free Layout Analysis for Arabic Historical Documents Using Fully Convolutional Networks Berat Kurar Barakat, Jihad El-Sana Hi, I am Berat Kurar, from Ben-Gurion University of the Negev. I am going to present our work, titled "Binarization Free Layout Analysis for Arabic Historical Documents", which I worked on under the supervision of Prof. Jihad El-Sana.

Table of Contents Problem definition Contribution Related work Method Experiments Results Conclusion I will give a problem definition and describe our contribution. Then I will discuss some related work on this problem. Then I will explain our method and the details of our experiments. Finally, I will present the results and conclude.

Problem definition Side text Main text Page layout analysis is an important pre-processing step for other document image processing algorithms. Given a document image, we first need to segment it into homogeneous regions and then classify these regions. For example, in this paper we classify the regions as main text and side text.

Contribution Original Page layout analysis Non-binarized historical manuscript Complex layout Fully Convolutional Network (FCN) Patch based classification Pixel level fine segmentation We propose a method for page layout analysis of non-binarized historical manuscripts with complex layouts, as shown in the image here. The method uses an FCN and makes patch based predictions, and its output is a fine segmentation at the pixel level.

Related work Binarized Bukhari et al, 2012 Binarized historical manuscript Complex layout Multilayer Perceptron (MLP) Connected component based classification Post processing on test set Pixel level fine segmentation Previously, Bukhari et al, 2012 performed pixel level fine segmentation on the same dataset with complex layouts. However, their method differs from ours: their input is binarized, and they used an MLP to make connected component based classification using context features, which leads to redundant and expensive computation over overlapping contexts. In addition, they do post processing with parameters that are fitted to the test set.

Related work Simple layout Complex layout Xu et al, 2017 Non-binarized historical manuscript Simple layout Fully Convolutional Network (FCN) Patch based classification Pixel level fine segmentation An FCN has already been used for page layout analysis by Xu et al, 2017, on non-binarized historical manuscripts, to make pixel level fine segmentation using patch based classification. However, as we can see in these images, their dataset does not cover the full range of difficulties compared to our dataset, which contains multi-skewed and curved lines in multiple orientations.

Method [Figure: FCN8 architecture — a VGG-16 encoder (conv1, pool1, conv2, ... conv7) whose pool3, pool4 (2x upsampled) and conv7 (4x upsampled) outputs are fused into an 8x upsampled m-by-n prediction] The FCN architecture we used is based on the FCN proposed for semantic segmentation by Long et al. The input is the image, followed by convolutional blocks; each block consists of several convolutional layers and a max pooling layer. The first five blocks follow the design of the VGG-16 network, except for the discarded final classification layer. Instead of the final classification layer, two convolutional layers, conv6 and conv7, are added. This is a conventional Convolutional Neural Network (CNN) and is called the encoder part of the FCN. Through the encoder, the input image is downsampled by the max pooling layers. The decoder part then upsamples the downsampled outputs back to the input image size using transpose convolution. FCN8 upsamples the conv7 layer 4 times and the pool4 layer 2 times, adds them to the pool3 layer, then upsamples the summation 8 times to reach the input image size again. The resulting output is a tensor of m by n by number of classes, containing a classification score for each pixel in the input. FCNs have made great improvements in the object segmentation field [20]: an FCN is an end to end segmentation framework that extracts the features and learns the classifier function simultaneously. J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation", 2015
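The FCN8 fusion described above can be sketched with plain NumPy, using nearest-neighbour upsampling as a simple stand-in for the learned transpose convolutions; the values here are dummies, and only the shapes follow the description (a 320-by-320 input and 2 classes):

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour upsampling, a stand-in for transpose convolution."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

# Per-class score maps at three encoder depths for a 320x320 input
# (2 classes: main text, side text). Zeros are dummies; only shapes matter.
num_classes = 2
conv7 = np.zeros((10, 10, num_classes))   # 1/32 of input resolution
pool4 = np.zeros((20, 20, num_classes))   # 1/16 of input resolution
pool3 = np.zeros((40, 40, num_classes))   # 1/8 of input resolution

# FCN8: upsample conv7 4x and pool4 2x, add them to pool3,
# then upsample the summation 8x to recover the input resolution.
fused = upsample(conv7, 4) + upsample(pool4, 2) + pool3   # 40 x 40
prediction = upsample(fused, 8)                           # 320 x 320

print(prediction.shape)  # (320, 320, 2): a score per class for every pixel
```

The three score maps must be brought to a common resolution (1/8 of the input) before they can be summed, which is exactly why conv7 and pool4 are upsampled by 4x and 2x respectively.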

Data
                              Train      Validation   Test
Number of pages               20         4            8
Number of patches (per set)   100,000    20,000       -
Set 1: 0 < D < 1    Set 2: 0.83 < D < 1    (D = density, P(i,j) = pixel value, m = image height, n = image width)
We used a dataset of 20 pages for training, 4 pages for validation and 8 pages for testing. We created two sets of patches: Set 1 contains 100,000 randomly cropped patches for training, and Set 2 contains 100,000 randomly cropped patches for training with density greater than 0.83. Density is defined as the ratio of the number of foreground pixels to the total number of pixels in a patch: D = (sum_{i=0}^{m} sum_{j=0}^{n} P(i,j)) / (m × n)
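The density formula amounts to summing the pixel values of a patch and dividing by its area; a minimal sketch, assuming a binary patch with foreground = 1 and background = 0:

```python
import numpy as np

def patch_density(patch):
    """Density D of a patch: foreground pixels over total pixels (m x n).
    Assumes a binary array with foreground = 1 and background = 0."""
    m, n = patch.shape
    return patch.sum() / (m * n)

# A toy 4x4 patch with 12 foreground pixels -> D = 12/16 = 0.75
patch = np.array([[1, 1, 1, 1],
                  [1, 1, 0, 1],
                  [1, 0, 0, 1],
                  [1, 1, 0, 1]])
print(patch_density(patch))  # 0.75
```

Under the Set 2 criterion, this toy patch (D = 0.75) would be rejected, since its density does not exceed 0.83.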

Data 𝐷<0.83 Set 1 𝐷>0.83 Set 2 320×320 Our input size 224×224 Original input size Here is the difference between a patch with a density less than 0.83 and a patch with a density greater than 0.83: the latter contains mostly text. We follow this scheme because during training with Set 1 we faced a class imbalance problem. We also changed the original input size of the FCN8 network: its original size of 224 by 224 contains at most 2 text lines, so we changed it to 320 by 320 to include approximately 3 lines, which means more context information.
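Putting the two ideas together, the Set 2 sampling can be sketched as random 320-by-320 crops kept only when their density exceeds the threshold; this is an illustrative sketch under the stated assumptions (binary pages, foreground = 1), and the authors' exact sampling procedure may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_patches(page, patch_size=320, n_patches=5, min_density=0.83):
    """Randomly crop square patches from a binary page image, keeping
    only those whose foreground density exceeds min_density
    (the Set 2 criterion). Assumes the page is a 0/1 array."""
    h, w = page.shape
    patches = []
    while len(patches) < n_patches:
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patch = page[y:y + patch_size, x:x + patch_size]
        if patch.mean() > min_density:  # mean of a 0/1 array equals D
            patches.append(patch)
    return patches

# Toy all-foreground page, so every crop passes the density filter.
page = np.ones((640, 640), dtype=np.uint8)
patches = sample_patches(page)
print(len(patches), patches[0].shape)  # 5 (320, 320)
```

On a real page, most random crops of a sparse manuscript would fail the filter, which is precisely how Set 2 counteracts the class imbalance seen with Set 1.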

Training Optimizer = Stochastic Gradient Descent Learning rate = 0.001 VGG-16 pre-trained weights Train on Set 1 Continue on Set 2 Train up to 50 epochs Save least loss model We train by Stochastic Gradient Descent (SGD) with a learning rate of 0.001, and initialize VGG-16 with its publicly available pre-trained weights. We first trained on Set 1 until overfitting; then, using the resulting weights, we further trained on Set 2, and saved the least loss model.
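The optimizer choice comes down to the plain SGD update rule; a minimal sketch of one update step with the stated learning rate (the real training of course applies this to all FCN weights via backpropagation):

```python
learning_rate = 0.001

def sgd_step(weights, gradients, lr=learning_rate):
    """One stochastic gradient descent update: w <- w - lr * dL/dw."""
    return [w - lr * g for w, g in zip(weights, gradients)]

# Toy example: two scalar "weights" and their loss gradients.
weights = [0.5, -0.2]
grads = [10.0, -5.0]
print([round(w, 3) for w in sgd_step(weights, grads)])  # [0.49, -0.195]
```

Starting from the VGG-16 pre-trained weights rather than a random initialization means these updates begin from features already useful for natural images, which typically speeds up convergence on the smaller document dataset.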

Quantitative results
                Main text F1   Side text F1
Set 1           0.90           0.68
Set 2           0.95           0.80
Bukhari et al
Here are the quantitative results. In Set 1
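The F1 scores in the table combine per-pixel precision and recall; a minimal sketch of the computation, using toy pixel counts rather than the paper's actual confusion matrices:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall over pixel counts
    (tp = true positives, fp = false positives, fn = false negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy counts for one class: precision = recall = 0.9 -> F1 = 0.9
print(round(f1_score(tp=90, fp=10, fn=10), 2))  # 0.9
```

Because F1 is a harmonic mean, it penalizes a gap between precision and recall, which is why the harder side-text class scores noticeably lower than the main-text class in the table.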