GW2003, Genoa, April 2003

GesRec3D: A real-time coded gesture-to-speech system with automatic segmentation and recognition thresholding using dissimilarity measures

Michael P. Craven, School of Engineering, University of Technology, Jamaica <michael.craven@ieee.org>
K. Mervyn Curtis, Department of Mathematics and Computer Science, University of the West Indies, Jamaica

Work carried out at the University of Nottingham, School of Electrical and Electronic Engineering, in collaboration with Access to Communication and Technology, Regional Rehabilitation Centre, Oak Tree Lane Centre, Selly Oak, Birmingham. Funded by an Action Research grant.
Motivations and Issues

- Apply gesture recognition to severely disabled users (cerebral palsy, stroke)
  - Augmentative and Alternative Communication (AAC), e.g. gesture-to-speech
  - environmental control, e.g. opening doors, operating appliances
  - replacing mouse buttons in PC applications
- Segment and recognise ‘crude’ gestures
  - be less reliant on fine motor control
  - maintain spatial and temporal differences
  - filter out ‘spurious’ movements
- Control over recognition confidence
  - robust acceptance/rejection strategy
  - reduce confusion between gestures, but avoid excessive rejection
  - may be safety critical
- Human factors
  - user fatigue: incremental training, short overall training time
  - understandability: for both disabled users and their helpers
GesRec3D gesture-to-speech system
GesRec3D: Summary

- Gesture -> Text -> Speech system
- MS Windows application running on a PC with a Soundblaster card
- Polhemus Fastrak tracker (1 to 4 sensors, 20 samples/sec)
- Up to 30 user-defined gestures linked to a user-defined (or preset) table of words/phrases, spoken by the TextAssist speech engine

Minimising fatigue
- On-line segmentation for fast training & recognition
- Only 5 examples of each gesture
- Incremental acquisition (or removal) of gesture examples

Other features
- Speech and/or text to prompt user input
- Sensitive to differences in scale and duration, but invariant to gesture start location
- User control over the segmentation & rejection/confusion trade-off
Real-time on-line segmentation

State machine (three states):
- RESET: remain here while the Starting condition is FALSE; when it becomes TRUE, enter GESTURE [start timer]
- GESTURE: remain here while the Continue condition is TRUE and the Time-out condition is FALSE
  - Continuation condition FALSE, Min. Duration condition FALSE: movement too short, return to RESET
  - Continuation condition FALSE, Min. Duration condition TRUE: enter END
  - Time-out condition TRUE: enter END
- END: add the segmented gesture to the training set, or recognise it; then pause before returning to RESET

Parameters:
1. Starting speed
2. Continuation speed
3. Minimum duration
4. Time-out interval
5. Pause interval

(A minimal code sketch of this state machine follows.)
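Below is a minimal Python sketch of the segmenter, intended only to make the transitions concrete. The class name, default values, and the mapping of the Starting/Continuation conditions onto speed thresholds are illustrative assumptions (the slide only names the five parameters); the pause interval after END is omitted.

```python
import math

class Segmenter:
    """Three-state on-line segmenter: RESET -> GESTURE -> END (sketch)."""

    def __init__(self, start_speed=5.0, cont_speed=2.0,
                 min_duration=0.25, time_out=5.0, sample_rate=20):
        self.start_speed = start_speed    # Starting condition threshold (assumed units)
        self.cont_speed = cont_speed      # Continuation condition threshold
        self.min_duration = min_duration  # Minimum duration (s)
        self.time_out = time_out          # Time-out interval (s)
        self.dt = 1.0 / sample_rate       # Fastrak delivers 20 samples/sec
        self.state = "RESET"
        self.gesture = []                 # (x, y, z) samples of the current gesture
        self.elapsed = 0.0

    def feed(self, prev, curr):
        """Feed consecutive (x, y, z) samples; return the segmented gesture
        when one ends, else None."""
        speed = math.dist(prev, curr) / self.dt
        if self.state == "RESET":
            if speed > self.start_speed:                  # Starting condition TRUE
                self.state, self.gesture, self.elapsed = "GESTURE", [curr], 0.0
        elif self.state == "GESTURE":
            self.elapsed += self.dt
            if speed > self.cont_speed and self.elapsed < self.time_out:
                self.gesture.append(curr)                 # Continue condition TRUE
            elif self.elapsed >= self.min_duration:       # Min. Duration or Time-out
                self.state = "RESET"
                return self.gesture                       # END: train or recognise
            else:
                self.state = "RESET"                      # too short: spurious, discard
        return None
```

In use, consecutive tracker samples would be fed in pairs; at 20 samples/sec each call advances the gesture timer by 50 ms.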
On-line segmentation video
Training - dissimilarity measure

Compare two segmented gestures G_a(x,y,z) and G_b(x,y,z), of lengths m_a and m_b.

Dissimilarity measure d_{ab}: accumulated ‘city block’ distance,

  d_{ab} = \sum_{i=1}^{m} ( |x_a(i) - x_b(i)| + |y_a(i) - y_b(i)| + |z_a(i) - z_b(i)| )   (1)

- Same length (m = m_a = m_b): use (1) directly
- Different lengths (m_a > m_b), three options:
  - dynamic time-warping (non-linear optimal match) - slowest
  - linearly interpolate the shorter gesture to length m_a and use (1) - faster
  - pad the shorter gesture with zeros and use (2) - fastest

  d_{ab} = \sum_{i=1}^{m_a} ( |x_a(i) - x_b(i)| + |y_a(i) - y_b(i)| + |z_a(i) - z_b(i)| ), with G_b zero-padded for i > m_b   (2)

(A code sketch of the three options follows.)
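A sketch of the three length-handling options, assuming gestures are stored as lists of (x, y, z) tuples; the function names are hypothetical, not taken from the GesRec3D code.

```python
def cityblock(ga, gb):
    """Equation (1): accumulated city-block distance, equal-length gestures."""
    return sum(abs(ax - bx) + abs(ay - by) + abs(az - bz)
               for (ax, ay, az), (bx, by, bz) in zip(ga, gb))

def d_zero_pad(ga, gb):
    """Equation (2), fastest: pad the shorter gesture with zeros."""
    if len(ga) < len(gb):
        ga, gb = gb, ga
    gb = gb + [(0.0, 0.0, 0.0)] * (len(ga) - len(gb))
    return cityblock(ga, gb)

def d_interp(ga, gb):
    """Faster: linearly interpolate the shorter gesture to the longer length."""
    if len(ga) < len(gb):
        ga, gb = gb, ga
    m, n = len(ga), len(gb)
    stretched = []
    for i in range(m):
        t = i * (n - 1) / (m - 1) if m > 1 else 0.0
        j, f = int(t), t - int(t)
        k = min(j + 1, n - 1)
        stretched.append(tuple(gb[j][c] + f * (gb[k][c] - gb[j][c])
                               for c in range(3)))
    return cityblock(ga, stretched)

def d_dtw(ga, gb):
    """Slowest: dynamic time-warping (non-linear optimal match)."""
    inf = float("inf")
    m, n = len(ga), len(gb)
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = sum(abs(a - b) for a, b in zip(ga[i - 1], gb[j - 1]))
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]
```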
Training - rejection threshold

- Train C gesture classes, n examples of each
- Calculate the nC x nC dissimilarity matrix, e.g. 60x60 elements for n=5, C=12 (note: the computation scales with both n^2 and C^2)
- For each class:
  - find the worst match internal to the class, d_int (largest value)
  - find the best match external to the class, d_ext (smallest value)
  - calculate the rejection threshold d_th between d_int and d_ext
- Default global rejection parameter K=1 (midpoint threshold)
- Decrease K for stricter rejection
- Bounds may also be set on d_th

(A sketch of the threshold calculation follows.)
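A minimal sketch of the per-class threshold calculation over the nC x nC matrix. The slide does not give the threshold formula; the form below, d_th = d_int + K(d_ext - d_int)/2, is an assumption chosen so that K=1 yields the midpoint between d_int and d_ext and smaller K is stricter, matching the description. Function and argument names are hypothetical.

```python
def train_thresholds(examples, dissim, K=1.0):
    """examples: list of (class_label, gesture), n examples per class;
    dissim: a dissimilarity function such as d_zero_pad above.
    Returns {class_label: d_th}."""
    n = len(examples)
    # nC x nC dissimilarity matrix over all training examples
    D = [[dissim(examples[i][1], examples[j][1]) for j in range(n)]
         for i in range(n)]
    thresholds = {}
    for c in {label for label, _ in examples}:
        idx = [i for i, (label, _) in enumerate(examples) if label == c]
        ext = [i for i in range(n) if i not in idx]
        d_int = max(D[i][j] for i in idx for j in idx if i != j)  # worst internal match
        d_ext = min(D[i][j] for i in idx for j in ext)            # best external match
        thresholds[c] = d_int + K * (d_ext - d_int) / 2.0         # midpoint when K=1
    return thresholds
```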
Recognition - algorithms

The best match between an unknown gesture and any gesture in the training set is the minimum distance, d_min.

Single sensor:
1. Acquire the gesture and compare it with the training set for the best match, d_min
2. Find the gesture class corresponding to d_min
3. If d_min < d_th, select that class; otherwise reject the gesture
4. Perform the action linked to the selected gesture class

Multiple sensors:
1. Find the class with d_min for each sensor
2. (optional: reject the gesture if the classes are different)
3. Find d_th for each class for all sensors
4. Add the d_min values
5. Add the d_th values
6. If the summed d_min < the summed d_th, select the class corresponding to the ‘primary’ sensor; otherwise reject the gesture
7. Perform the action linked to the selected gesture class

(A sketch of both procedures follows.)
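A sketch of both recognition procedures, reusing the hypothetical helpers from the earlier sketches (a dissimilarity function and the per-class thresholds). The `strict` flag implements the optional class-agreement check; all names are assumptions.

```python
def recognise(gesture, examples, thresholds, dissim):
    """Single sensor: return the recognised class label, or None if rejected."""
    # Best match over every training example gives d_min and its class
    d_min, best_class = min((dissim(gesture, g), label) for label, g in examples)
    if d_min < thresholds[best_class]:    # accept only below the class threshold
        return best_class                 # then perform the linked action
    return None                           # rejected

def recognise_multi(gestures, per_sensor, primary=0, strict=False):
    """Multiple sensors: gestures has one segmented gesture per sensor;
    per_sensor is a list of (examples, thresholds, dissim) triples."""
    results = []
    for g, (examples, thresholds, dissim) in zip(gestures, per_sensor):
        d, label = min((dissim(g, ex), lab) for lab, ex in examples)
        results.append((d, label, thresholds[label]))
    if strict and len({label for _, label, _ in results}) > 1:
        return None                       # optional: sensors disagree on class
    d_min_sum = sum(d for d, _, _ in results)   # step 4: add the d_min
    d_th_sum = sum(th for _, _, th in results)  # step 5: add the d_th
    if d_min_sum < d_th_sum:
        return results[primary][1]        # class from the 'primary' sensor
    return None                           # rejected
```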
Experiment 1 - Shape gestures

[Slide shows example shape-gesture trajectories: multiple sensors, fast and slow performances]
Results - Shape gestures

- Hit rates between 82% and 96%
- 100 further arbitrary gestures were all rejected
- Spurious short gestures were rejected by the segmentation algorithm
- Fewer misses arose from confusion than from rejection
- Fast training: 5 minutes to input 60 gestures (5 examples x 12 classes)
- Fast 60x60 dissimilarity matrix calculation (on a Pentium 133MHz):
  - zero padded: 0.06 sec
  - linear interpolation: 0.6 sec
  - dynamic time-warping (DP): 7.5 sec
Dissimilarity data - one row
Experiment 2 - Greeting gestures

[Results table not reproduced.] Figures in brackets demonstrate the use of a stricter threshold to obtain lower confusion: the global threshold parameter K is reduced by 10%.
Dissimilarity data - multiple sensors
Research Directions

- Design alternative algorithms for multiple sensors, e.g. incorporate an arm model
- Use dissimilarity data to suggest ‘better’ gestures
- Further filter out ‘spurious’ movements, e.g. tremor
- Design a mobile tracking device with wireless sensors
- Improve the user interface
  - more intuitive control over recognition parameters, esp. for helpers
  - assess user motivation, esp. for children
  - investigate memorability of gestures