From User-friendly to User’s Friend
Dr. Eric Petajan, Founder and Chief Scientist, face2face animation, inc.
Why vision is required for the ideal HCI design

Problem Statement
The electronic extension of human capabilities is primarily limited by Human-Computer Interaction (HCI) systems that fail to meet our needs for fast, reliable, and secure input of information using the most comfortable human communication modes.

Your computer should emulate your best friend
- It should know who you are and if you are present
- It should see and hear you in adverse conditions
- It should respond to you quickly
- It should tell you the truth
- It should keep your secrets
- It should be pleasant or entertaining
- It should follow you around

A humanoid agent is a necessary component for the ultimate HCI

Humanoids can provide:
- Clear focus for audio and visual attention
  - Easier to capture user behavior
  - Less taxing for the user
- Perception of credibility
- Engagement and entertainment
- Increased comprehension
- Guidance with traditional information display

The quality of the virtual human is critically dependent on the amount of real human behavior that informs the humanoid model. Autonomous humanoid agents can’t pass the Turing test today.

The non-invasive capture and machine understanding of human behavior are grand challenges that have yet to be fully accomplished. We are still tethered to the keyboard and mouse.

Significant Human Behaviors Available without Contact
- Audio/visual speech
- Gestures
- Facial expressions
- Gaze direction
- Posture

Ideal HCI Process Graph
[Diagram: a loop in which complete human behavior is captured, a humanoid model is built from it, and the humanoid is presented back to the human, driven by an “AI” engine supplying knowledge and motive power.]
What has been achieved to date?

The Good News
- Processing hardware is fast and cheap
- HD cameras are now 10 times cheaper
- Displays are good and cheap enough
- Mobile data bandwidth is reliable enough for audio plus animation streams
- Individual recognition technologies are approaching maturity (if not utility)

The Bad News
- Computers can’t reliably “hear” humans with a single fixed microphone
- Computers can’t reliably “see” humans with a single cheap video camera
- HCI constraints exhaust and encumber users
- Large segments of the population are unwilling or unable to engage in HCI

Steps in the Right Direction
- Use one or more HD video cameras
- Use a steered microphone array with face tracking
- Track and control the user’s attention with a humanoid
- Continuously identify the user
- Train the user with entertainment
- Use dedicated hardware to minimize the impact of the HCI system on general computing and communication tasks

Multi-modal Speech Recognition
- Audio-visual speech and speaker recognition provides robustness in noise
- Visual speech removes the need for a close-talking microphone and provides robust steering of the microphone array
- MPEG-4 Face Animation Parameters (FAPs) accurately encode visual speech
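The deck doesn't specify a fusion method, but a minimal late-fusion sketch in Python illustrates why the visual channel adds robustness; the word scores, the weight alpha, and the function names are all illustrative assumptions:

    # Hypothetical late fusion of audio and visual word scores for
    # audio-visual speech recognition; nothing here is from MPEG-4.
    def fuse_scores(audio_scores, visual_scores, alpha=0.5):
        """Blend per-word log-likelihoods from the acoustic and
        visual (lip-reading) recognizers; lower alpha trusts the
        lips more, which helps in acoustic noise."""
        words = set(audio_scores) & set(visual_scores)
        fused = {w: alpha * audio_scores[w] + (1 - alpha) * visual_scores[w]
                 for w in words}
        return max(fused, key=fused.get)

    # In a noisy car the audio channel is ambiguous but the lips
    # clearly favor "call":
    audio = {"call": -4.1, "fall": -4.0}
    visual = {"call": -1.2, "fall": -6.5}
    print(fuse_scores(audio, visual))  # -> call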

People want information and communication wherever they happen to be
- Mobile devices need to be small (thin client)
- Device and service costs must be low
- Must be fast and reliable
- Bandwidth must be used efficiently for low latency and cost

People want to be entertained
- Entertaining information is retained better
- Personality attracts attention and is the main component of entertainment
- Personality is manifested mostly in the face and voice
- Face and voice must be synced and delivered with quality (high frame rate)

People like animated characters
- Entertaining/relationship forming
- Can be efficiently delivered anywhere
- Graphical faces scale well to small screens
- Character design is limited only by imagination
- Any person can drive any character (with FAPs)
- Emotional response to animated faces is hardwired

Mobile devices today
- Can deliver animated characters
- Are cheap
- Can deliver low bit-rate content reliably
- Are communicators and entertainers
- Are very popular

User Input to Mobile Devices
- Keyboards are impractical for mobile devices
- The best user interface is speech and face
- Little room for text/menus on small screens
- Acoustic speech recognition is unreliable in mobile environments
- Visual speech and face recognition are needed for a robust mobile user interface

Low bit-rate is the key to mobile happiness
- Reliable delivery of wireless video will not happen for a very long time
- Only kilobits/sec can be sustained everywhere
- MPEG-4 animation streams fit in the available bandwidth along with audio
- 2 kilobits/sec for face animation data
- 6-10 kilobits/sec for body animation data
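As a sanity check on those numbers, a quick bit budget (the 30 frames/sec rate is taken from the player demo later in this deck):

    # Back-of-the-envelope budget for an MPEG-4 FBA stream, using
    # the rates from the slide above and 30 fps from the demo slide.
    FACE_BPS = 2_000    # face animation, bits per second
    BODY_BPS = 10_000   # body animation, upper end of 6-10 kbps
    FPS = 30            # animation frame rate

    print(FACE_BPS / FPS, "bits per face frame")  # ~67 bits (~8 bytes)
    print(BODY_BPS / FPS, "bits per body frame")  # ~333 bits (~42 bytes)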

Mobile Character Player Demo
- Facial expressions, lip movements and head motion are extracted automatically from ordinary video as FAPs
- FAPs are streamed to the player with compressed audio at 10 kbps total
- A 300-triangle 3D mesh face model renders in real time on the phone
- FAPs and audio are decoded in parallel with graphics rendering, in software

Standards
- Facilitate collaboration
- Minimize reinvention of wheels
- Decrease costs with economies of scale
- Allow database sharing
- Provide free or cheap source code
- Enable low latency communication

The MPEG-4 Standard
- Provides a comprehensive framework for 2D and 3D multimedia communication
- Provides Face and Body Animation (FBA) representation and coding
- Low bit-rate coding eliminates network bottlenecks
- Optimized implementations increase speed and reduce costs to consumers

MPEG-4 Face Animation
- The face model is independent of the Face Animation Parameters (FAPs)
- FAPs contain high quality animation data for driving all types of face models, from broadcast to wireless
- FAPs displace feature points from their neutral positions
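A minimal sketch of the last point, assuming a toy two-point face: each FAP displaces one feature point from its neutral position along a fixed axis, scaled by a face-specific unit (the FAPU, covered on the normalization slide later in the deck). The FAP table here is a tiny illustrative subset, not the standard's list of 68 FAPs:

    # Sketch: FAPs as displacements of neutral feature points.
    NEUTRAL = {"bottom_midlip": [0.0, -30.0, 0.0],
               "top_midlip":    [0.0, -20.0, 0.0]}

    # (feature point, axis index, sign) per FAP -- illustrative only.
    FAP_TABLE = {"open_jaw":       ("bottom_midlip", 1, -1),
                 "raise_t_midlip": ("top_midlip",    1, +1)}

    def apply_faps(faps, fapu):
        """faps: {fap_name: value}; fapu: length of one FAP unit in
        model coordinates. Returns the displaced feature points."""
        points = {k: list(v) for k, v in NEUTRAL.items()}
        for name, value in faps.items():
            point, axis, sign = FAP_TABLE[name]
            points[point][axis] += sign * value * fapu
        return points

    print(apply_faps({"open_jaw": 100.0}, fapu=0.01))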

Body Animation
- Harmonized with the VRML H-Anim spec
- Body Animation Parameters (BAPs) are humanoid skeleton joint Euler angles
- A Body Animation Table (BAT) can be downloaded to map BAPs to skin deformation
- BAPs can be highly compressed for streaming

Body Animation Parameters (BAPs)
- 186 humanoid skeleton Euler angles
- 110 free parameters for use with a downloaded body surface mesh
- Coded using the same codecs as FAPs
- Typical bitrates for coded BAPs are 5-10 kbps
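As a sketch of what a decoder does with one BAP triple before skinning, the joint and the Z-Y-X composition order below are illustrative assumptions (the standard fixes the conventions per joint):

    import math

    # Sketch: one skeleton joint driven by three Euler angles.
    def euler_to_matrix(rx, ry, rz):
        """Compose rotations about X, Y, Z (radians), R = Rz Ry Rx."""
        cx, sx = math.cos(rx), math.sin(rx)
        cy, sy = math.cos(ry), math.sin(ry)
        cz, sz = math.cos(rz), math.sin(rz)
        return [
            [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
            [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
            [-sy,     cy * sx,                cy * cx],
        ]

    # e.g. bend a (hypothetical) left elbow 90 degrees about X:
    for row in euler_to_matrix(math.pi / 2, 0.0, 0.0):
        print([round(v, 3) for v in row])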

Neutral Face Definition
- Head axes are parallel to the world axes
- Gaze is in the direction of the Z axis
- Eyelids are tangent to the iris
- Pupil diameter is one third of the iris diameter
- The mouth is closed and the upper and lower teeth are touching
- The tongue is flat and horizontal, with the tip of the tongue touching the boundary between the upper and lower teeth

Face Feature Points
[Figure: the MPEG-4 face feature point layout with local x/y/z axes, including detail views of the mouth, tongue, teeth, nose, and right and left eyes; feature points affected by FAPs are marked distinctly from the other feature points.]

Face Model Independence
- FAPs are always normalized for model independence
- FAPs (and BAPs) can be used without MPEG-4 Systems/BIFS
- Private face models can be accurately animated with FAPs
- Face models can be simple or complex depending on terminal resources

Face Animation Parameter Normalization
- Face Animation Parameters (FAPs) are normalized to facial dimensions
- Each FAP is measured as a fraction of the neutral face mouth width, mouth-nose distance, eye separation, or iris diameter
- 3 head and 2 eyeball rotation FAPs are Euler angles
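A sketch of how these units (FAPUs) might be computed from a neutral face so that one FAP stream drives faces of any size; the feature point coordinates are invented, and the divisor of 1024 for the distance-based units is stated from memory of the MPEG-4 convention, so treat it as an assumption:

    # Sketch: deriving FAP normalization units from a neutral face.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def fapus(n):
        """n: dict of neutral feature point coordinates."""
        return {
            "MW":    dist(n["mouth_left"], n["mouth_right"]) / 1024,
            "MNS":   dist(n["nose_bottom"], n["mouth_mid"]) / 1024,
            "ES":    dist(n["eye_left"], n["eye_right"]) / 1024,
            "IRISD": dist(n["iris_top"], n["iris_bottom"]) / 1024,
        }

    neutral = {"mouth_left": (-30, 0, 0), "mouth_right": (30, 0, 0),
               "mouth_mid": (0, 0, 0), "nose_bottom": (0, 25, 0),
               "eye_left": (-33, 60, 0), "eye_right": (33, 60, 0),
               "iris_top": (-33, 66, 0), "iris_bottom": (-33, 54, 0)}
    print(fapus(neutral))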

Neutral Face Dimensions for FAP Normalization

Lip FAPs
The mouth is closed if the sum of the upper and lower lip FAPs = 0.
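That invariant makes a handy sanity check on a decoded FAP frame; the lip FAP names below are illustrative placeholders:

    # Sketch: closure check on one decoded frame of lip FAPs.
    def mouth_closed(frame, tol=1e-6):
        """True if the vertical lip FAPs cancel out."""
        lip_faps = ("raise_t_midlip", "lower_b_midlip")
        return abs(sum(frame.get(f, 0.0) for f in lip_faps)) < tol

    print(mouth_closed({"raise_t_midlip": 50.0,
                        "lower_b_midlip": -50.0}))  # True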

FAP Compression
- FAPs are adaptively quantized to the desired quality level
- Quantized FAPs are differentially coded (a sketch of this loop follows the diagram on the next slide)
- Adaptive arithmetic coding further reduces the bitrate
- Typical compressed FAP bitrate is less than 2 kilobits/second

FAP Predictive Coding
[Diagram: the previous reconstructed frame (via Q⁻¹ and a frame delay) is subtracted from FAP(t); the difference is quantized (Q) and sent to the arithmetic coder to form the bitstream.]
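A minimal DPCM sketch of that loop, with a uniform quantizer standing in for the adaptive one; the symbol stream printed here is what would feed the arithmetic coder:

    # Sketch: the differential (predictive) core of FAP coding.
    def encode(track, step=2.0):
        """Quantize frame-to-frame differences of one FAP track."""
        symbols, prev = [], 0.0
        for f in track:
            sym = round((f - prev) / step)  # quantized difference
            symbols.append(sym)
            prev += sym * step              # Q^-1 plus frame delay
        return symbols

    def decode(symbols, step=2.0):
        out, prev = [], 0.0
        for sym in symbols:
            prev += sym * step
            out.append(prev)
        return out

    track = [0.0, 3.1, 6.0, 6.2, 5.9]       # one lip FAP over time
    syms = encode(track)
    print(syms, decode(syms))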

General Bandwidth Issues
- Broadband deployment is happening slowly
- 3G will not be ubiquitous for many years
- DSL availability is limited and cable is shared
- Talking heads need a high frame rate
- Consumer graphics hardware is cheap and powerful
- MPEG-4 FBA tools are matched to the available bandwidth and terminals

Markerless Facial Motion Capture for Animation Production
- Track and analyze face features in each video frame
- Captured face feature motion is easily converted to FAPs
- The face model is “puppeteered” by FAPs
- MPEG-4 FAPs only specify the motion of feature points (not the surrounding surface)
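A skeleton of that pipeline; the tracker below is a stub (any per-frame face feature tracker could fill the slot), and the vertical-only conversion is an illustrative simplification:

    # Sketch: markerless video -> FAP conversion pipeline.
    def track_features(frame):
        """Stub tracker: a real one would locate lips, eyes and
        head pose in the image. Here the lip just drifts down."""
        return {"bottom_midlip": (0.0, -30.0 - 0.5 * frame)}

    def to_faps(points, neutral, fapu):
        """Normalized FAPs from tracked vs. neutral positions."""
        return {name: (points[name][1] - neutral[name][1]) / fapu
                for name in points}

    neutral = {"bottom_midlip": (0.0, -30.0)}
    for t in range(3):                      # three "video frames"
        print(to_faps(track_features(t), neutral, fapu=0.05))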

Bones rig for mouth area

Automatic Face Animation Demonstration
- FAPs extracted from camcorder video
- Inner lip, eye region and head rotation FAPs compressed to less than 2 kbits/sec
- 30 frames/sec animation generated automatically
- Face models developed with the face2face plug-in for Maya

Conclusions
- Humanoid agents are required for the best HCI
- Vision-based facial capture is required for humanoid design and human behavior capture
- MPEG-4 Face and Body Animation coding enables high quality mobile communication
- Ultimate HCI systems must continuously see, hear, and identify the user for the best reliability and security