MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology Richard Ladner, Eve Riskin Dane Barney, Anna Cavender, Neva Cherniavsky,

Slides:



Advertisements
Similar presentations
DDDAS: Stochastic Multicue Tracking of Objects with Many Degrees of Freedom PIs: D. Metaxas, A. Elgammal and V. Pavlovic Dept of CS, Rutgers University.
Advertisements

Kien A. Hua Division of Computer Science University of Central Florida.
Hand Gesture for Taking Self Portrait Shaowei Chu and Jiro Tanaka University of Tsukuba Japan 12th July 15 minutes talk.
Designing Facial Animation For Speaking Persian Language Hadi Rahimzadeh June 2005.
CLOUD COMPUTING FOR MOBILE USERS: CAN OFFLOADING COMPUTATION SAVE ENERGY? Purdue University.
Motivation Application driven -- VoD, Information on Demand (WWW), education, telemedicine, videoconference, videophone Storage capacity Large capacity.
References [1] Ramanathan Palaniappan, Nitin Suresh and Nikil Jayant, “Objective measurement of transcoded video quality in mobile applications”,IEEE MoVID.
A Colour Face Image Database for Benchmarking of Automatic Face Detection Algorithms Prag Sharma, Richard B. Reilly UCD DSP Research Group This work is.
VIPER DSPS 1998 Slide 1 A DSP Solution to Error Concealment in Digital Video Eduardo Asbun and Edward J. Delp Video and Image Processing Laboratory (VIPER)
December 5, 2013Computer Vision Lecture 20: Hidden Markov Models/Depth 1 Stereo Vision Due to the limited resolution of images, increasing the baseline.
Optimal Scalable Video Multiplexing in Mobile Broadcast Networks
Recursive End-to-end Distortion Estimation with Model-based Cross-correlation Approximation Hua Yang, Kenneth Rose Signal Compression Lab University of.
1 School of Computing Science Simon Fraser University, Canada Rate-Distortion Optimized Streaming of Fine-Grained Scalable Video Sequences Mohamed Hefeeda.
A Layered Hybrid ARQ Scheme for Scalable Video Multicast over Wireless Networks Zhengye Liu, Joint work with Zhenyu Wu.
MUltimo3-D: a Testbed for Multimodel 3-D PC Presenter: Yi Shi & Saul Rodriguez March 14, 2008.
DSI Division of Integrated Systems Design Proven experience in: Applications: Integrated Systems for Multimedia Processing Goals Our group of engineers.
Scalable Wavelet Video Coding Using Aliasing- Reduced Hierarchical Motion Compensation Xuguang Yang, Member, IEEE, and Kannan Ramchandran, Member, IEEE.
Cindy Song Sharena Paripatyadar. Use vision for HCI Determine steps necessary to incorporate vision in HCI applications Examine concerns & implications.
Streaming Video Gabriel Nell UC Berkeley. Outline Scalable MPEG-4 video – Layered coding method – Integrated transport-decoder buffer model RAP streaming.
Hand Movement Recognition By: Tokman Niv Levenbroun Guy Instructor: Todtfeld Ari.
Source-Channel Prediction in Error Resilient Video Coding Hua Yang and Kenneth Rose Signal Compression Laboratory ECE Department University of California,
Smart Traveller with Visual Translator for OCR and Face Recognition LYU0203 FYP.
Visual Speech Recognition Using Hidden Markov Models Kofi A. Boakye CS280 Course Project.
Xinqiao LiuRate constrained conditional replenishment1 Rate-Constrained Conditional Replenishment with Adaptive Change Detection Xinqiao Liu December 8,
Non-invasive Techniques for Human Fatigue Monitoring Qiang Ji Dept. of Electrical, Computer, and Systems Engineering Rensselaer Polytechnic Institute
Statistical Multiplexer of VBR video streams By Ofer Hadar Statistical Multiplexer of VBR video streams By Ofer Hadar.
MobileASL: Making Cell Phones Accessible to the Deaf Community Anna Cavender Richard Ladner, Eve Riskin University of Washington.
Strictly private and confidential
An Introduction to H.264/AVC and 3D Video Coding.
January 26, Nick Feamster Development of a Transcoding Algorithm from MPEG to H.263.
+ Video Compression Rudina Alhamzi, Danielle Guir, Scott Hansen, Joe Jiang, Jason Ostroski.
Database Construction for Speech to Lip-readable Animation Conversion Gyorgy Takacs, Attila Tihanyi, Tamas Bardi, Gergo Feldhoffer, Balint Srancsik Peter.
1 Motivation Video Communication over Heterogeneous Networks –Diverse client devices –Various network connection bandwidths Limitations of Scalable Video.
MACHINE VISION GROUP Multimodal sensing-based camera applications Miguel Bordallo 1, Jari Hannuksela 1, Olli Silvén 1 and Markku Vehviläinen 2 1 University.
JPEG 2000 Image Type Image width and height: 1 to 2 32 – 1 Component depth: 1 to 32 bits Number of components: 1 to 255 Each component can have a different.
Video Streaming © Nanda Ganesan, Ph.D..
Using Multimedia on the Web
Jani Pousi Supervisor: Jukka Manner Espoo,
MPEG-2 Standard By Rigoberto Fernandez. MPEG Standards MPEG (Moving Pictures Experts Group) is a group of people that meet under ISO (International Standards.
 Coding efficiency/Compression ratio:  The loss of information or distortion measure:
HiTrack Introduction Shanghai Zhuo-Shi Software Technology Co., Ltd Shanghai Zhuo-Shi Software Technology Co., Ltd
Frame by Frame Bit Allocation for Motion-Compensated Video Michael Ringenburg May 9, 2003.
Knowledge Systems Lab JN 9/10/2002 Computer Vision: Gesture Recognition from Images Joshua R. New Knowledge Systems Laboratory Jacksonville State University.
School of Computer Science & Information Technology G6DPMM - Lecture 15 Media Design III – Video & Animation.
Abstract Developing sign language applications for deaf people is extremely important, since it is difficult to communicate with people that are unfamiliar.
WebCCTV 1 Contents Introduction Getting Started Connecting the WebCCTV NVR to a local network Connecting the WebCCTV NVR to the Internet Restoring the.
Video Compression: Performance evaluation of available codec software Sridhar Godavarthy.
1 Lecture 1 1 Image Processing Eng. Ahmed H. Abo absa
1 Webcam Mouse Using Face and Eye Tracking in Various Illumination Environments Yuan-Pin Lin et al. Proceedings of the 2005 IEEE Y.S. Lee.
Study-Element Based Adaptation of Lecture Videos to Mobile Devices Ganesh Narayana Murthy (M.Tech IIT Bombay) Sridhar Iyer (Associate Professor, IIT Bombay)
Online Kinect Handwritten Digit Recognition Based on Dynamic Time Warping and Support Vector Machine Journal of Information & Computational Science, 2015.
ECE 8443 – Pattern Recognition EE 3512 – Signals: Continuous and Discrete Objectives: Spectrograms Revisited Feature Extraction Filter Bank Analysis EEG.
Aug 25, 2005 page1 Aug 25, 2005 Integration of Advanced Video/Speech Codecs into AccessGrid National Center for High Performance Computing Speaker: Barz.
Downlink Scheduling With Economic Considerations to Future Wireless Networks Bader Al-Manthari, Nidal Nasser, and Hossam Hassanein IEEE Transactions on.
Compression of Real-Time Cardiac MRI Video Sequences EE 368B Final Project December 8, 2000 Neal K. Bangerter and Julie C. Sabataitis.
Advances in digital image compression techniques Guojun Lu, Computer Communications, Vol. 16, No. 4, Apr, 1993, pp
Real-Time Simultaneous Localization and Mapping with a Single Camera (Mono SLAM) Young Ki Baik Computer Vision Lab. Seoul National University.
Cooperative Layered Wireless Video Multicast Ozgu Alay, Thanasis Korakis, Yao Wang, Elza Erkip, Shivendra Panwar.
AIMS’99 Workshop Heidelberg, May 1999 Assessing Audio Visual Quality P905 - AQUAVIT Assessment of Quality for audio-visual signals over Internet.
Team Members Ming-Chun Chang Lungisa Matshoba Steven Preston Supervisors Dr James Gain Dr Patrick Marais.
MPEG4 Fine Grained Scalable Multi-Resolution Layered Video Encoding Authors from: University of Georgia Speaker: Chang-Kuan Lin.
Activity Analysis of Sign Language Video Generals exam Neva Cherniavsky.
Overview of Digital Video Compression Multimedia Systems and Standards S2 IF Telkom University.
Low Vision Part I Chapter Overview This presentation covers the implications for communication with people who have tunnel vision. It includes:
COMPARATIVE STUDY OF HEVC and H.264 INTRA FRAME CODING AND JPEG2000 BY Under the Guidance of Harshdeep Brahmasury Jain Dr. K. R. RAO ID MS Electrical.
SR: 599 report Channel Estimation for W-CDMA on DSPs Sridhar Rajagopal ECE Dept., Rice University Elec 599.
CDEEP Lecture Video Adaptation to mobile devices Ganesh Narayana Murthy Guided by: Prof. Sridhar Iyer.
Presenting: Shlomo Ben-Shoshan, Nir Straze Supervisors: Dr. Ofer Hadar, Dr. Evgeny Kaminsky.
Quality Evaluation and Comparison of SVC Encoders
Kyoungwoo Lee, Minyoung Kim, Nikil Dutt, and Nalini Venkatasubramanian
Presentation transcript:

MobileASL: Intelligibility of Sign Language as Constrained by Mobile Phone Technology Richard Ladner, Eve Riskin Dane Barney, Anna Cavender, Neva Cherniavsky, Jessica DeWitt, Rahul Vanam University of Washington Sheila Hemami, Frank Ciaramello Cornell University

2 ASL ASL is the preferred language for over 1,000,000 Deaf people in the U.S. ASL is not a code for English Signs usually occur within the “sign-box” Composed of location, orientation, shape of hands and arms + facial expressions Usually uses 2 hands, but one-handed signing not uncommon

3 Challenges: Limited network bandwidth Limited processing power on cell phones Our goal: ASL communication using video cell phones over current U.S. cell phone network

4 Cell Phone Network Constraints MobileASL is about fair access to the current network –As soon as possible, no special accommodations –Not geographically limited –Lower bitrate = more accessible 3G = 3 rd Generation –Special service, perhaps more expensive –Not yet widespread –Will still have congestion Low bit rate goal –GPRS (General Packet Radio Service) Ranges from 30kbps to 80kbps (download) Perhaps half that for upload

5 Codec Used: x264 Open source implementation of H.264 standard –Doubles compression ratio over MPEG2 x264 offers faster encoding Off-the-shelf H.264 decoder can be used

6 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

7 MobileASL Focus Group 4 Deaf people, mid-20s to mid-40s, 1 hour Open ended questions: –Physical Setup Camera, distance, … –Features Compatibility, text, … –Privacy Concerns ASL is a visual language –Scenarios Lighting, driving, relay services, … Anna Cavender,

8 Implications of Focus Group “I don’t foresee any limitations. I would use the phone anywhere: the grocery store, the bus, the car, a restaurant, … anywhere!” There is a need within the Deaf Community for mobile ASL conversations Existing video phone technology (with minor modifications) would be usable Anna Cavender,

9 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

10 Eyetracking Studies Participants watched ASL videos while eye movements were tracked Important regions of the video could be encoded differently Muir et al. (2005) and Agrafiotis et al. (2003) Anna Cavender,

11 Eyetracking Results 95% of eye movements within 2 degrees visual angle of the signer’s face (demo) Implications: Face region of video is most visually important –Detailed grammar in face requires foveal vision –Hands and arms can be viewed in peripheral vision Anna Cavender,

12 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

13 Video Phone Study: ROI Varied quality in fixed-sized region around the face 2x quality in face 4x quality in face Anna Cavender,

14 Video Phone Study: FPS Varied frame rate: 10 fps and 15 fps For a given bit rate: Fewer frames = more bits per frame Anna Cavender,

15 Implications of results A mid-range ROI was preferred –Optimal tradeoff between clarity in face and distortion in rest of “sign-box” Lower frame rate preferred –Optimal tradeoff between clarity of frames and number of frames per second Results independent of bit rate Anna Cavender,

16 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

17 Encoder Complexity Processors on cell phones are limited Encoding videos is computationally intensive Encoder needs to be real-time Simpler encoding parameters are available, but quality suffers Faster encoding = less quality Dane Barney,

18 Rate, distortion and complexity optimization H.264 encoder Input parameters Raw video Goal: Best possible quality for least encoding time at a given bitrate Distortion Encoding time Rahul Vanam,

19 Rate, distortion and complexity comparison Testing with 10 ASL videos at 30 kb/s Max difference in PSNR = 0.68 dB Amount of time reduction = 99% Rahul Vanam,

20 Encoder Complexity Study - Speed

21 Implications of Results We can increase the encoding time from 2 to 7.3 frames per second without significantly affecting intelligibility! We are closer to our goal of 10 fps. Rahul Vanam, and Anna Cavender,

22 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

23 Activity Recognition Motivation: –Finger spelling requires a higher bit rate and/or frame rate for intelligibility than signing –Don’t waste bits when the user is not signing (i.e., is “listening” as opposed to “speaking”) Goal: –Recognize these three states: finger spelling, signing, not signing –Perform recognition in real time Neva Cherniavsky,

24 Solution Use H.264 motion vectors as input to system Use probabilistic techniques to automatically recognize activity –Hidden Markov Models –Kalman filters Neva Cherniavsky,

25 Outline MobileASL Focus Group Eyetracking Motivation User Studies Encoder Complexity Rate, Distortion, Complexity Optimization Activity Recognition Spatial Compression

26 Spatial Compression Dynamic Region-of-Interest –Skin detection algorithms New video quality metric based on ASL intelligibility –Traditional video quality measures, such as PSNR, are not good measures of intelligibility Frank Ciaramello,

27 Challenges: Limited network bandwidth Limited processing power on cell phones Conclusion Video compression for ASL communication using video cell phones over current U.S. cell phone network

28 MobileASL Intelligibility of Sign Language as Constrained by Mobile Phone Technology Supported by NSF CCF and an NSF graduate fellowship *** Come see our poster and talk to Anna and Neva in room 615 ***