Download presentation
Presentation is loading. Please wait.
Published byGabriella Cross Modified over 8 years ago
1
MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology Richard Ladner, Eve Riskin Dane Barney, Anna Cavender, Neva Cherniavsky, Jessica DeWitt, Rahul Vanam University of Washington Sheila Hemami, Frank Ciaramello Cornell University
2
2 ASL ASL is the preferred language for over 1,000,000 Deaf people in the U.S. ASL is not a code for English Signs usually occur within the “sign-box” Composed of location, orientation, shape of hands and arms + facial expressions Usually uses 2 hands, but one-handed signing not uncommon
3
3 Challenges: Limited network bandwidth Limited processing power on cell phones Our goal: ASL communication using video cell phones over current U.S. cell phone network
4
4 Cell Phone Network Constraints MobileASL is about fair access to the current network –As soon as possible, no special accommodations –Not geographically limited –Lower bitrate = more accessible 3G = 3 rd Generation –Special service, perhaps more expensive –Not yet widespread –Will still have congestion Low bit rate goal –GPRS (General Packet Radio Service) Ranges from 30kbps to 80kbps (download) Perhaps half that for upload
5
5 Codec Used: x264 Open source implementation of H.264 standard –Doubles compression ratio over MPEG2 x264 offers faster encoding Off-the-shelf H.264 decoder can be used
6
6 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression
7
7 MobileASL Focus Group 4 Deaf people, mid-20s to mid-40s, 1 hour Open ended questions: –Physical Setup Camera, distance, … –Features Compatibility, text, … –Privacy Concerns ASL is a visual language –Scenarios Lighting, driving, relay services, … Anna Cavender, cavender@cs.washington.edu
8
8 Implications of Focus Group “I don’t foresee any limitations. I would use the phone anywhere: the grocery store, the bus, the car, a restaurant, … anywhere!” There is a need within the Deaf Community for mobile ASL conversations Existing video phone technology (with minor modifications) would be usable Anna Cavender, cavender@cs.washington.edu
9
9 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression
10
10 Eyetracking Studies Participants watched ASL videos while eye movements were tracked Important regions of the video could be encoded differently Muir et al. (2005) and Agrafiotis et al. (2003) Anna Cavender, cavender@cs.washington.edu
11
11 Eyetracking Results 95% of eye movements within 2 degrees visual angle of the signer’s face (demo) Implications: Face region of video is most visually important –Detailed grammar in face requires foveal vision –Hands and arms can be viewed in peripheral vision Anna Cavender, cavender@cs.washington.edu
12
12 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression
13
13 Video Phone Study: ROI Varied quality in fixed-sized region around the face 2x quality in face 4x quality in face Anna Cavender, cavender@cs.washington.edu
14
14 Video Phone Study: FPS Varied frame rate: 10 fps and 15 fps For a given bit rate: Fewer frames = more bits per frame Anna Cavender, cavender@cs.washington.edu
15
15 Implications of results A mid-range ROI was preferred –Optimal tradeoff between clarity in face and distortion in rest of “sign-box” Lower frame rate preferred –Optimal tradeoff between clarity of frames and number of frames per second Results independent of bit rate Anna Cavender, cavender@cs.washington.edu
16
16 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression
17
17 Encoder Complexity Processors on cell phones are limited Encoding videos is computationally intensive Encoder needs to be real-time Simpler encoding parameters are available, but quality suffers Faster encoding = less quality Dane Barney, dane@cs.washington.edu
18
18 Rate, distortion and complexity optimization H.264 encoder Input parameters Raw video Goal: Best possible quality for least encoding time at a given bitrate Distortion Encoding time Rahul Vanam, rahulv@u.washington.edu
19
19 Rate, distortion and complexity comparison Testing with 10 ASL videos at 30 kb/s Max difference in PSNR = 0.68 dB Amount of time reduction = 99% Rahul Vanam, rahulv@u.washington.edu
20
20 Encoder Complexity Study - Speed
21
21 Implications of Results We can increase the encoding time from 2 to 7.3 frames per second without significantly affecting intelligibility! We are closer to our goal of 10 fps. Rahul Vanam, rahulv@u.washington.edu and Anna Cavender, cavender@cs.washington.edu
22
22 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression
23
23 Activity Recognition Motivation: –Finger spelling requires a higher bit rate and/or frame rate for intelligibility than signing –Don’t waste bits when the user is not signing (i.e., is “listening” as opposed to “speaking”) Goal: –Recognize these three states: finger spelling, signing, not signing –Perform recognition in real time Neva Cherniavsky, nchernia@cs.washington.edu
24
24 Solution Use H.264 motion vectors as input to system Use probabilistic techniques to automatically recognize activity –Hidden Markov Models –Kalman filters Neva Cherniavsky, nchernia@cs.washington.edu
25
25 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression
26
26 Spatial Compression Dynamic Region-of-Interest –Skin detection algorithms New video quality metric based on ASL intelligibility –Traditional video quality measures, such as PSNR, are not good measures of intelligibility Frank Ciaramello, fmc3@cornell.edu
27
27 Challenges: Limited network bandwidth Limited processing power on cell phones Conclusion Video compression for ASL communication using video cell phones over current U.S. cell phone network
28
28 MobileASL Intelligibility of Sign Language as Constrained by Mobile Phone Technology Supported by NSF CCF-0514353 and an NSF graduate fellowship http://www.cs.washington.edu/research/MobileASL *** Come see our poster and talk to Anna and Neva in room 615 ***
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.