Presentation is loading. Please wait.

Presentation is loading. Please wait.

MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology Richard Ladner, Eve Riskin Dane Barney, Anna Cavender, Neva Cherniavsky,

Similar presentations


Presentation on theme: "MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology Richard Ladner, Eve Riskin Dane Barney, Anna Cavender, Neva Cherniavsky,"— Presentation transcript:

1 MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology Richard Ladner, Eve Riskin Dane Barney, Anna Cavender, Neva Cherniavsky, Jessica DeWitt, Rahul Vanam University of Washington Sheila Hemami, Frank Ciaramello Cornell University

2 2 ASL ASL is the preferred language for over 1,000,000 Deaf people in the U.S. ASL is not a code for English Signs usually occur within the “sign-box” Composed of location, orientation, shape of hands and arms + facial expressions Usually uses 2 hands, but one-handed signing not uncommon

3 3 Challenges: Limited network bandwidth Limited processing power on cell phones Our goal: ASL communication using video cell phones over current U.S. cell phone network

4 4 Cell Phone Network Constraints MobileASL is about fair access to the current network –As soon as possible, no special accommodations –Not geographically limited –Lower bitrate = more accessible 3G = 3 rd Generation –Special service, perhaps more expensive –Not yet widespread –Will still have congestion Low bit rate goal –GPRS (General Packet Radio Service) Ranges from 30kbps to 80kbps (download) Perhaps half that for upload

5 5 Codec Used: x264 Open source implementation of H.264 standard –Doubles compression ratio over MPEG2 x264 offers faster encoding Off-the-shelf H.264 decoder can be used

6 6 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

7 7 MobileASL Focus Group 4 Deaf people, mid-20s to mid-40s, 1 hour Open ended questions: –Physical Setup Camera, distance, … –Features Compatibility, text, … –Privacy Concerns ASL is a visual language –Scenarios Lighting, driving, relay services, … Anna Cavender, cavender@cs.washington.edu

8 8 Implications of Focus Group “I don’t foresee any limitations. I would use the phone anywhere: the grocery store, the bus, the car, a restaurant, … anywhere!” There is a need within the Deaf Community for mobile ASL conversations Existing video phone technology (with minor modifications) would be usable Anna Cavender, cavender@cs.washington.edu

9 9 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

10 10 Eyetracking Studies Participants watched ASL videos while eye movements were tracked Important regions of the video could be encoded differently Muir et al. (2005) and Agrafiotis et al. (2003) Anna Cavender, cavender@cs.washington.edu

11 11 Eyetracking Results 95% of eye movements within 2 degrees visual angle of the signer’s face (demo) Implications: Face region of video is most visually important –Detailed grammar in face requires foveal vision –Hands and arms can be viewed in peripheral vision Anna Cavender, cavender@cs.washington.edu

12 12 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

13 13 Video Phone Study: ROI Varied quality in fixed-sized region around the face 2x quality in face 4x quality in face Anna Cavender, cavender@cs.washington.edu

14 14 Video Phone Study: FPS Varied frame rate: 10 fps and 15 fps For a given bit rate: Fewer frames = more bits per frame Anna Cavender, cavender@cs.washington.edu

15 15 Implications of results A mid-range ROI was preferred –Optimal tradeoff between clarity in face and distortion in rest of “sign-box” Lower frame rate preferred –Optimal tradeoff between clarity of frames and number of frames per second Results independent of bit rate Anna Cavender, cavender@cs.washington.edu

16 16 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

17 17 Encoder Complexity Processors on cell phones are limited Encoding videos is computationally intensive Encoder needs to be real-time Simpler encoding parameters are available, but quality suffers Faster encoding = less quality Dane Barney, dane@cs.washington.edu

18 18 Rate, distortion and complexity optimization H.264 encoder Input parameters Raw video Goal: Best possible quality for least encoding time at a given bitrate Distortion Encoding time Rahul Vanam, rahulv@u.washington.edu

19 19 Rate, distortion and complexity comparison Testing with 10 ASL videos at 30 kb/s Max difference in PSNR = 0.68 dB Amount of time reduction = 99% Rahul Vanam, rahulv@u.washington.edu

20 20 Encoder Complexity Study - Speed

21 21 Implications of Results We can increase the encoding time from 2 to 7.3 frames per second without significantly affecting intelligibility! We are closer to our goal of 10 fps. Rahul Vanam, rahulv@u.washington.edu and Anna Cavender, cavender@cs.washington.edu

22 22 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

23 23 Activity Recognition Motivation: –Finger spelling requires a higher bit rate and/or frame rate for intelligibility than signing –Don’t waste bits when the user is not signing (i.e., is “listening” as opposed to “speaking”) Goal: –Recognize these three states: finger spelling, signing, not signing –Perform recognition in real time Neva Cherniavsky, nchernia@cs.washington.edu

24 24 Solution Use H.264 motion vectors as input to system Use probabilistic techniques to automatically recognize activity –Hidden Markov Models –Kalman filters Neva Cherniavsky, nchernia@cs.washington.edu

25 25 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

26 26 Spatial Compression Dynamic Region-of-Interest –Skin detection algorithms New video quality metric based on ASL intelligibility –Traditional video quality measures, such as PSNR, are not good measures of intelligibility Frank Ciaramello, fmc3@cornell.edu

27 27 Challenges: Limited network bandwidth Limited processing power on cell phones Conclusion Video compression for ASL communication using video cell phones over current U.S. cell phone network

28 28 MobileASL Intelligibility of Sign Language as Constrained by Mobile Phone Technology Supported by NSF CCF-0514353 and an NSF graduate fellowship http://www.cs.washington.edu/research/MobileASL *** Come see our poster and talk to Anna and Neva in room 615 ***


Download ppt "MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology Richard Ladner, Eve Riskin Dane Barney, Anna Cavender, Neva Cherniavsky,"

Similar presentations


Ads by Google