A language assistant system for smart glasses


A language assistant system for smart glasses
Shi Fan Zhang, Kin Hong Wong
Department of Computer Science and Engineering
The Chinese University of Hong Kong, Shatin, Hong Kong
Email: khwong@cse.cuhk.edu.hk
ICNGC2017b A language assistant system for smart glasses v.7d

Overview
- Introduction and the idea
- Background
- Theory and the algorithm
- Quadrangle detection
- Affine transformation
- Optical Character Recognition (OCR)
- Finger location
- Pose estimation
- Results
- Conclusions and discussions

Introduction
- We developed a language assistant system for people wearing smart glasses.
- The user points to a word that needs translation; our system recognizes the word and reports the result to the user with sound or images.
- The system was tested with satisfactory results.

Background
- Smart cameras are becoming very popular. The Hong Kong Police Force, for example, is running trials of a "Body Worn Video Camera" for evidence capturing and recording [2].
- Some gadgets are even equipped with a Global Positioning System (GPS), a high-density camera, a head-up display (HUD) and smartphone connectivity.
- Personal communication capabilities are also included in some systems: Golden-i [4], a wearable computer for work, lets users send and receive email and browse webpages or files.
- Google Glass is another example. It may also handle face recognition and other computer vision tasks, and some people are working in this direction and exploring new ways of using it [5].

Theory and Methodology
- Our system helps the user with language translation.
- First, the rectangular image of the paper carrying the text is detected. Because the paper may not be perpendicular to the user's view, the text may be geometrically distorted.
- To fix this, we rectify the image so that the text appears normal: the four corners of the rectangular image are detected, and an affine transformation is applied to rectify the image.
- OCR is then carried out on the rectified image.

The algorithm
1. Locate the quadrangle that bounds the paper.
2. Since the 3D geometry of the paper is known (we assume A4 size), a pose estimation technique can recover the pose of the paper. An alternative is to apply an affine transform to rectify the image to the frontal pose.
3. Use a finger-pointing algorithm to find where on the paper the user is pointing, so we know which part of the image contains the text to be translated.
4. Feed the rectified image to the OCR system.

Quadrangle detection
- K. K. Lee [6] combines the Randomized Hough Transform with Q-corners to detect multiple quadrilaterals; a similar idea can be applied to rectangle corner detection.
- Our approach is based on the Hough transform. First, all lines in the image are detected; the goal is to find the four corners that form the quadrilateral of the paper.
- In each processing cycle we pick two detected lines and calculate their intersection point. If the intersection lies inside the image boundary, one vote is cast for that point.
- We repeat this for all line pairs; the four points with the highest scores are taken as the corners of the quadrangle.
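The pairwise-intersection voting described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: lines are assumed to come from a Hough transform in (rho, theta) form, and the function names are ours.

```python
import numpy as np
from itertools import combinations

def intersect(l1, l2):
    """Intersection of two Hough lines given as (rho, theta), where a line
    satisfies x*cos(theta) + y*sin(theta) = rho. Returns None if near-parallel."""
    (r1, t1), (r2, t2) = l1, l2
    A = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(A)) < 1e-9:
        return None
    x, y = np.linalg.solve(A, np.array([r1, r2], dtype=float))
    return x, y

def corner_candidates(lines, width, height):
    """Vote for every pairwise intersection that falls inside the image;
    the four highest-scoring points approximate the paper's corners."""
    votes = {}
    for l1, l2 in combinations(lines, 2):
        p = intersect(l1, l2)
        if p is None:
            continue
        x, y = p
        if 0 <= x < width and 0 <= y < height:
            key = (round(x), round(y))  # bin nearby intersections together
            votes[key] = votes.get(key, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:4]
```

In practice the line set would come from something like OpenCV's `HoughLines`, and the binning would be coarser to merge votes from noisy near-duplicate lines.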

Demo
https://youtu.be/aJd4BenNkTA

Homography transform
- Optical Character Recognition (OCR) performs best under three conditions:
  1. The text region is normal to the principal axis of the camera lens.
  2. The text is not skewed.
  3. The text is clear and distinct from the background.
- Therefore, an affine (homography) transformation is applied to rectify the frames so that the text lines are not geometrically distorted.
- OpenCV [7] provides several functions for warping the input frames, which we use in our project.

Homography transform
X = (a·x + b·y + c) / (g·x + h·y + 1)
Y = (d·x + e·y + f) / (g·x + h·y + 1)
where (X, Y) are the coordinates to be calculated in the second reference system, given the coordinates (x, y) in the first reference system, as a function of the eight transformation parameters a, b, c, d, e, f, g, h.
http://www.corrmap.com/features/homography_transformation.php
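The eight parameters a..h can be recovered from four point correspondences (e.g. the detected paper corners mapped to a frontal rectangle) by solving a linear system. A minimal numpy sketch, assuming the standard 8-parameter form above; in the actual project OpenCV's warp functions would do this work:

```python
import numpy as np

def solve_homography(src, dst):
    """Solve the 8 parameters [a,b,c,d,e,f,g,h] of
    X = (a*x + b*y + c)/(g*x + h*y + 1), Y = (d*x + e*y + f)/(g*x + h*y + 1)
    from four correspondences src[i] -> dst[i]."""
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        # X*(g*x + h*y + 1) = a*x + b*y + c  rearranged into a linear row
        A.append([x, y, 1, 0, 0, 0, -x * X, -y * X]); b.append(X)
        A.append([0, 0, 0, x, y, 1, -x * Y, -y * Y]); b.append(Y)
    return np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))

def apply_homography(p, x, y):
    """Map (x, y) through the 8-parameter homography p = [a..h]."""
    a, b_, c, d, e, f, g, h = p
    w = g * x + h * y + 1.0
    return (a * x + b_ * y + c) / w, (d * x + e * y + f) / w
```

For example, mapping the unit square to a square of side 2 recovers a pure scaling, and the square's center (0.5, 0.5) maps to (1, 1).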

Homography transform to rectify the text
Input: index of the nearest text region
Output: the recognized string of text

Optical Character Recognition (OCR)
- We use Tesseract [8] for our OCR task. It achieves high accuracy on printed English characters.
- Since our users may be Chinese, we also tested Chinese text, and found that Tesseract performs poorly with all parameters left at their default values: often one Chinese character is split into two, so Tesseract has difficulty recognizing characters with radicals (components of a Chinese character).
- To fix this, some parameters are adjusted (for instance, psm=6) to inform Tesseract that the targets are a uniform block of text.
- An example of successful Chinese text recognition is shown in Fig. 2.
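The adjusted setting corresponds to Tesseract's page segmentation mode 6. A hedged example of the command-line invocation (file names are placeholders; the Chinese traineddata must be installed):

```shell
# --psm 6: "assume a single uniform block of text", which stops
# Tesseract from splitting a Chinese character's radicals apart.
# -l chi_tra selects the Traditional Chinese language data.
tesseract page.png result --psm 6 -l chi_tra
# The recognized text is written to result.txt
```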

OCR test
- We performed the experiments with (1) the default settings and (2) the adjusted parameters; the results are shown in Tables 1 and 2, respectively.
- The results show that, with adjusted parameters, Tesseract recognizes Chinese characters precisely, regardless of whether the characters are printed in a standard format or have some strokes eroded to some extent.
- Tesseract can also be trained manually. Although the official dataset contains a huge amount of trained data, it may still fail to recognize every character without mistake. However, the documentation states that the official dataset cannot be revised, and training Tesseract manually means starting from scratch, which is a hard task. This is a target for future work.

Test result
Table 1: Tesseract recognizing Chinese characters with default parameters
                          Total characters   Recognized   Overall accuracy
High quality characters   75                 73           97.3%
Low quality characters    264                180          68.2%

Table 2: Tesseract recognizing Chinese characters with adjusted parameters
                          Total characters   Recognized   Overall accuracy
High quality characters   75                 74           98.7%
Low quality characters    264                262          99.2%

Finger location
- Features of the human hand: five fingers; each finger is a convex polygon; the space between two adjacent fingers forms a hollow region.
- First, the background is removed. Our system then performs image thresholding, contour extraction and convex hull finding.
- The contour with the maximum area (the target contour) should be the hand.
- Computing the distance between the contour points and the convex hull yields several local maxima; these correspond to the hollow regions between adjacent fingers.
- The topmost point of the target contour is taken as the fingertip, and a small image region above the fingertip, which contains the target word, is selected.
https://youtu.be/qsNmY7OW5_s
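The final two steps (fingertip pick and word-region crop) can be sketched in numpy. This is a simplified sketch, not the project's code: the contour is assumed to come from a prior thresholding/contour step (e.g. OpenCV's `findContours`), the image origin is top-left, and the crop size is an arbitrary assumption.

```python
import numpy as np

def fingertip(contour):
    """Pick the topmost contour point (smallest y, image origin at top-left)
    as the pointing fingertip, assuming the hand enters from the bottom."""
    pts = np.asarray(contour)
    return tuple(pts[np.argmin(pts[:, 1])])

def word_region(tip, box=40):
    """Hypothetical crop: a small box just above the fingertip, assumed to
    contain the pointed-at word. Returns (left, top, right, bottom)."""
    x, y = tip
    return (x - box // 2, max(0, y - box), x + box // 2, y)
```

The hollow-region maxima between fingers would be found with convexity defects (OpenCV's `convexityDefects`) on the same contour; they are omitted here for brevity.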

Perspective-Four-Point
- Suppose we have four model points defined at t = 0.
- Their image projection points can be detected.

Pose estimation for future development
- Pose estimation using four points.
- Video demonstrations of our implementation:
https://youtu.be/1vMNN4Gbty8
https://www.youtube.com/watch?v=diN565iTPdM

Results and Demonstration
- We integrated all the modules. The OCR module can look up the meaning of the recognized word in a dictionary and show the result on screen.
- As shown in Fig. 4, the user points to a word (right window) and OCR is used for recognition; the result is the text of the target image, which is then shown on screen (left window).
- Video demonstrations of this process running in real time:
https://youtu.be/ZVW7IUBM0R8
https://www.youtube.com/watch?v=cepRpxxQbDs

Conclusions and Discussions
- The tests show that our idea of building a language assistant for smart-glass users is feasible.
- The camera can detect the quadrangle of the paper, and an affine transform is successfully applied to rectify the text of the page.
- A finger-pointing detection system finds the text to be translated, and an Optical Character Recognition (OCR) system converts the word into text.
- The meaning of the text can then be found with an electronic dictionary.
- The whole process runs in real time and can be very useful for students or travellers who need instant translation of foreign words.

Q&A
Thank you