Using Word Based Features for Word Clustering. The Thirteenth Conference on Language Engineering, 11-12 December 2013.

Department of Electronics and Communications, Faculty of Engineering, Cairo University. Research Team: Farhan M. A. Nashwan and Prof. Dr. Mohsen A. A. Rashwan. Presented by: Farhan M. A. Nashwan.

Contribution:
- Reduce the vocabulary size
- Increase recognition speed

Proposed Approach: generated word image → preprocessing and word segmentation → word grouping → clustering → groups and clusters used for holistic recognition.

Grouping:
- Extract the subwords (PAWs, pieces of Arabic words)
- Extract the dots and diacritics (secondaries)
- Use them to select the group

Grouping pipeline: generated word image → preprocessing and word segmentation → separation of secondaries using contour analysis → recognition of secondaries using SVM → grouping process → groups.

Grouping Example: each word receives a grouping code (PAWs, down secondaries, upper secondaries). When there are several secondaries, their values are written as concatenated digits.
- Code (1, 21, 2): PAW = 1, Down Sec. = 2 & 1, Upper Sec. = 2
- Code (3, 0, 2): PAW = 3, Down Sec. = 0, Upper Sec. = 2
- Code (4, 11, 12): PAW = 4, Down Sec. = 1 & 1, Upper Sec. = 1 & 2
- Code (3, 2, 21): PAW = 3, Down Sec. = 2, Upper Sec. = 2 & 1
- Code (2, 0, 2): PAW = 2, Down Sec. = 0, Upper Sec. = 2
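To make the coding scheme concrete, the following minimal Python sketch rebuilds the grouping codes in the example from a PAW count and the values of the secondaries below and above the word. It then puts words that share a code into the same group. The function names `grouping_code` and `group_words` are hypothetical. So is the rule that several secondaries are concatenated as digits in the listed order, with 0 meaning none. That rule is a reading of the examples above, not the authors' published implementation.

```python
from collections import defaultdict

def grouping_code(n_paws, down_secondaries, upper_secondaries):
    """Hypothetical grouping code: (PAWs, down secondaries, upper secondaries).

    Each secondary list holds the values assigned to the secondaries (e.g. by
    the SVM) in the order they are listed; several values are concatenated
    as digits and an empty list is coded as 0.
    """
    def concat(values):
        return int("".join(str(v) for v in values)) if values else 0
    return (n_paws, concat(down_secondaries), concat(upper_secondaries))

def group_words(words):
    """Put words with identical grouping codes into the same group.

    words: iterable of (word_id, n_paws, down_secondaries, upper_secondaries).
    """
    groups = defaultdict(list)
    for word_id, n_paws, down, upper in words:
        groups[grouping_code(n_paws, down, upper)].append(word_id)
    return dict(groups)

print(grouping_code(4, [1, 1], [1, 2]))  # (4, 11, 12)
```

A word's group can then be looked up in constant time. This lookup is where the reduction in candidate vocabulary comes from.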

Challenges:
- Sticking (touching components)
- Sensitivity to noise
Treatments:
- PAWs
- Down secondaries
- Upper secondaries
Grouping based on:
- Overlapping
- SVM

Clustering:
- Complements grouping
- Uses the LBG (Linde-Buzo-Gray) algorithm
- Applied to groups that contain a large number of words
- Uses the Euclidean distance
Pipeline: groups → feature extraction → clustering using LBG → clusters and groups.
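The slide names LBG with Euclidean distance but gives no parameters. Below is a minimal NumPy sketch of the textbook Linde-Buzo-Gray procedure:
- Start from the centroid of all vectors.
- Split every codeword c into c(1+ε) and c(1−ε).
- Refine by nearest-codeword reassignment until the distortion stops improving.

The split factor `eps`, the stopping tolerance, and the power-of-two codebook size are assumptions of this sketch. The rows of `data` stand for the per-word feature vectors of one group.

```python
import numpy as np

def lbg(data, codebook_size, eps=0.01, tol=1e-4, max_iter=100):
    """Linde-Buzo-Gray codebook training with Euclidean distance.

    codebook_size is assumed to be a power of two (each split doubles it).
    Returns (codebook, labels), where labels[i] is the cluster of data[i].
    """
    data = np.asarray(data, dtype=float)
    codebook = data.mean(axis=0, keepdims=True)  # one codeword: global centroid
    while codebook.shape[0] < codebook_size:
        # Split every codeword into two slightly perturbed copies.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        for _ in range(max_iter):
            # Assign each vector to its nearest codeword (Euclidean distance).
            d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            distortion = np.mean(d[np.arange(len(data)), labels] ** 2)
            # Move each codeword to the centroid of its cell; empty cells stay put.
            for k in range(codebook.shape[0]):
                members = data[labels == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return codebook, d.argmin(axis=1)
```

Because the split is multiplicative, a codeword that is exactly zero cannot be split. An additive perturbation is a common alternative in that case.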

Features:
1- (ICC): Image Centroid and Cells
2- (DCT): Discrete Cosine Transform
3- (BDCT): Block Discrete Cosine Transform
4- (DCT-4B): Discrete Cosine Transform 4-Blocks
5- (BDCT+ICC): Hybrid BDCT with ICC
6- (ICC+DCT): Hybrid DCT with ICC
7- (ICZ): Image Centroid and Zone
8- (DCT+ICZ): Hybrid DCT and ICZ
9- (DTW): Dynamic Time Warping
10- The Moment Invariant Features

Results: Table 1: Clustering rate of Simplified Arabic font using different features. For each feature set (ICC, BDCT, DCT, DCT-4B, ICC+BDCT, ICC+DCT, ICZ, ICZ+DCT, DTW, Moments), the table reports the cluster rate (%), total error rate (ER, %), clustering ER (%), group ER (%), and words per cluster.

Table 2: Processing time for feature extraction and clustering of Simplified Arabic font using different features. For each feature set (ICC, BDCT, DCT, DCT-4B, ICC+BDCT, ICC+DCT, ICZ, ICZ+DCT, DTW, Moments), the table reports the cluster rate (%), words per cluster, feature extraction time (ms), average clustering time (ms), and total average time (ms).

Conclusion: clustering words based on their holistic features:
- increases recognition speed;
- removes unnecessary entries from the vocabulary.
The total average time of ICC and of Moments (0.29 ms) is better than that of the other methods, but their clustering rates are not the best (98.69% for ICC and 82.61% for Moments). DCT has the best clustering rate (99.19%) but the worst time (~12 ms). Considering both parameters (clustering rate and time), ICC may be a good compromise: its clustering rate is only 0.5 percentage points below that of DCT, at roughly one fortieth of the time (0.29 ms vs. ~12 ms).

Thank you for your attention.

Image Centroid and Cells (ICC): the features of each cell are:
- the number of black pixels;
- the number of vertical transitions from black to white;
- the number of horizontal transitions from black to white.
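The transcript lists the three per-cell measurements but does not say how the image centroid defines the cells. The sketch below therefore covers only the cell measurements, computed on a uniform grid. The 4 × 4 grid and the function name `cell_features` are hypothetical choices, and pixels are assumed to be 1 = black, 0 = white.

```python
import numpy as np

def cell_features(img, rows=4, cols=4):
    """Per-cell black-pixel count and black-to-white transitions.

    img: 2-D binary array with 1 = black (ink) and 0 = white.
    The uniform rows x cols grid is an assumption of this sketch.
    Returns [black, vertical, horizontal] for each cell, row by row.
    """
    img = np.asarray(img, dtype=np.int8)
    h, w = img.shape
    r_edges = np.linspace(0, h, rows + 1).astype(int)
    c_edges = np.linspace(0, w, cols + 1).astype(int)
    feats = []
    for i in range(rows):
        for j in range(cols):
            cell = img[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            black = int(cell.sum())
            # Vertical: a black pixel directly above a white pixel.
            vert = int(np.sum((cell[:-1, :] == 1) & (cell[1:, :] == 0)))
            # Horizontal: a black pixel directly left of a white pixel.
            horiz = int(np.sum((cell[:, :-1] == 1) & (cell[:, 1:] == 0)))
            feats.extend([black, vert, horiz])
    return np.array(feats)
```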

DCT:
- The DCT is applied to the whole word image.
- The features are extracted as a vector by reading the DCT coefficients in zigzag order.
- Usually, only the most significant DCT coefficients (160 coefficients) are kept.
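Here is a self-contained NumPy sketch of the whole-image DCT feature. It builds an orthonormal 2-D DCT-II from DCT matrices, applies a JPEG-style zigzag scan, and truncates to the first 160 coefficients, as on the slide. The helper names are hypothetical. When SciPy is available, `scipy.fft.dctn(img, norm='ortho')` computes the same transform.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller normalization
    return m

def dct2(img):
    """Separable 2-D DCT-II: transform the columns, then the rows."""
    img = np.asarray(img, dtype=float)
    return dct_matrix(img.shape[0]) @ img @ dct_matrix(img.shape[1]).T

def zigzag(a):
    """Read a 2-D array along anti-diagonals in JPEG zigzag order."""
    h, w = a.shape
    out = []
    for s in range(h + w - 1):
        lo, hi = max(0, s - w + 1), min(s, h - 1)
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        rows = range(lo, hi + 1) if s % 2 else range(hi, lo - 1, -1)
        out.extend(a[r, s - r] for r in rows)
    return np.array(out)

def dct_features(img, n_coef=160):
    """First n_coef zigzag-ordered DCT coefficients of the word image."""
    return zigzag(dct2(img))[:n_coef]
```

The zigzag order places the low-frequency coefficients, which carry most of the energy of a smooth word shape, at the front of the vector. That ordering is why truncation keeps the "most significant" coefficients.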

Block Discrete Cosine Transform (BDCT): apply the DCT to each cell, then take the average of the differences between all the DCT coefficients.

Discrete Cosine Transform 4-Blocks (DCT-4B):
1- Compute the center of gravity of the input image.
2- Divide the word image into 4 parts, taking the center of gravity as the origin.
3- Apply the DCT to each part.
4- Concatenate the features taken from each part to form the feature set of the given word.
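The following sketch follows the four DCT-4B steps under these assumptions:
- Pixels are 1 = black.
- The center of gravity is the mean position of the black pixels.
- The parts are ordered top-left, top-right, bottom-left, bottom-right.
- Each part contributes a hypothetical `n_coef` = 40 zigzag-ordered coefficients, zero-padded if the part is too small.

The image must be at least 2 × 2 and contain black pixels. All function names are hypothetical.

```python
import numpy as np

def _dct2(block):
    """Orthonormal 2-D DCT-II built from DCT matrices."""
    def mat(n):
        k, i = np.arange(n)[:, None], np.arange(n)[None, :]
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] = np.sqrt(1.0 / n)
        return m
    return mat(block.shape[0]) @ block @ mat(block.shape[1]).T

def _zigzag(a):
    """JPEG-style zigzag scan of a 2-D array."""
    h, w = a.shape
    out = []
    for s in range(h + w - 1):
        lo, hi = max(0, s - w + 1), min(s, h - 1)
        rows = range(lo, hi + 1) if s % 2 else range(hi, lo - 1, -1)
        out.extend(a[r, s - r] for r in rows)
    return np.array(out, dtype=float)

def dct_4b_features(img, n_coef=40):
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Step 1: center of gravity of the black pixels (clamped so no part is empty).
    ys, xs = np.nonzero(img)
    cy = int(np.clip(np.rint(ys.mean()), 1, h - 1))
    cx = int(np.clip(np.rint(xs.mean()), 1, w - 1))
    # Step 2: four parts around the center of gravity.
    parts = [img[:cy, :cx], img[:cy, cx:], img[cy:, :cx], img[cy:, cx:]]
    feats = []
    for part in parts:
        # Step 3: DCT of each part, keeping the first n_coef zigzag coefficients.
        coefs = _zigzag(_dct2(part))[:n_coef]
        feats.append(np.pad(coefs, (0, n_coef - len(coefs))))
    # Step 4: concatenate the four feature vectors.
    return np.concatenate(feats)
```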

Image Centroid and Zone (ICZ): compute the average distance between the points (pixels) in a given zone and the centroid of the word image.
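Here is a minimal sketch of ICZ under these assumptions:
- Pixels are 1 = black, and the image centroid is computed from the black pixels.
- The image is divided into a uniform grid of zones. The 4 × 4 default and the name `icz_features` are hypothetical.
- Zones that contain no black pixels get a feature value of 0.

```python
import numpy as np

def icz_features(img, zone_rows=4, zone_cols=4):
    """Average distance from the image centroid to the black pixels of each zone."""
    img = np.asarray(img)
    h, w = img.shape
    ys, xs = np.nonzero(img)
    cy, cx = ys.mean(), xs.mean()  # centroid of the black pixels
    r_edges = np.linspace(0, h, zone_rows + 1).astype(int)
    c_edges = np.linspace(0, w, zone_cols + 1).astype(int)
    feats = np.zeros(zone_rows * zone_cols)
    for i in range(zone_rows):
        for j in range(zone_cols):
            in_zone = ((ys >= r_edges[i]) & (ys < r_edges[i + 1]) &
                       (xs >= c_edges[j]) & (xs < c_edges[j + 1]))
            if in_zone.any():
                # Euclidean distance of each black pixel in the zone to the centroid.
                d = np.hypot(ys[in_zone] - cy, xs[in_zone] - cx)
                feats[i * zone_cols + j] = d.mean()
    return feats
```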

DTW (Dynamic Time Warping) Features: three types of features are extracted from the binarized images and used in our DTW techniques:
- X-axis and Y-axis histogram profiles
- Profile features (upper, down, left, and right)
- Foreground/background transitions
DTW is an algorithm for measuring the similarity between two sequences. The distance between two time series x1, ..., xM and y1, ..., yN is D(M, N), which is calculated with a dynamic programming approach.
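The recurrence behind D(M, N) is not in the transcript. A common textbook form, assumed here, is D(i, j) = d(x_i, y_j) + min{D(i−1, j), D(i, j−1), D(i−1, j−1)}, with D(0, 0) = 0 and infinite cost elsewhere on the borders. The sketch below implements this form with a Euclidean local cost d, so it accepts both scalar sequences and sequences of per-column feature vectors. It is not necessarily the exact DTW variant used in the paper, and the name `dtw_distance` is hypothetical.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance D(M, N) between sequences x (length M) and y (length N).

    Elements may be scalars or feature vectors; the local cost is Euclidean.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    m, n = len(x), len(y)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            # Best of: repeat y_j, repeat x_i, or advance both sequences.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0, because the repeated 2 is absorbed by the warping path. This tolerance to local stretching suits word images of the same word written at slightly different widths.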

Figure 1: The four profile features: (A) left profile, (B) up profile, (C) down profile, (D) right profile.
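The three DTW feature types can be computed column by column (and row by row) from a binarized word image. The sketch below is one plausible implementation, with these conventions assumed because the transcript does not give them:
- Pixels are 1 = black.
- Each profile is the distance from the corresponding image border to the first black pixel.
- A row or column without ink gets the full image height or width.
- "Transitions" counts every change between foreground and background.

The function name and dictionary keys are hypothetical.

```python
import numpy as np

def profile_features(img):
    """Sequences usable as DTW input, from a binary word image (1 = black)."""
    img = np.asarray(img, dtype=bool)
    h, w = img.shape
    ink_col, ink_row = img.any(axis=0), img.any(axis=1)
    return {
        # Histogram (projection) profiles: black pixels per column / per row.
        "x_histogram": img.sum(axis=0),
        "y_histogram": img.sum(axis=1),
        # Upper/down profiles: per column, distance from the top/bottom edge.
        "upper": np.where(ink_col, img.argmax(axis=0), h),
        "down": np.where(ink_col, img[::-1, :].argmax(axis=0), h),
        # Left/right profiles: per row, distance from the left/right edge.
        "left": np.where(ink_row, img.argmax(axis=1), w),
        "right": np.where(ink_row, img[:, ::-1].argmax(axis=1), w),
        # Foreground/background transitions along each column.
        "transitions": np.sum(img[:-1, :] != img[1:, :], axis=0),
    }
```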

The Moment Invariant Features: Hu moments. Hu defined seven values, computed from central moments through order three, that are invariant to translation, scale, and rotation.
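The seven Hu invariants can be written directly in terms of the normalized central moments η_pq = μ_pq / μ_00^(1+(p+q)/2). The NumPy sketch below follows the standard textbook definitions and treats pixel values (1 for black) as mass. The function name is hypothetical. If OpenCV is available, `cv2.HuMoments(cv2.moments(img))` computes the same quantities.

```python
import numpy as np

def hu_moments(img):
    """Seven Hu moment invariants of a 2-D image (pixel values act as mass)."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00

    def eta(p, q):
        # Central moment mu_pq, normalized for scale invariance.
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])
```

Rotating a word image by 90 degrees (`np.rot90`) or shifting it inside the frame leaves these seven values unchanged. The seventh value changes sign only under reflection.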

Moments: the moment invariant descriptors are calculated and appended to the feature vector.