Designing for energy-efficient vision-based interactivity on mobile devices Lectio Praecursoria 15.12.2014 Miguel Bordallo López miguelbl@ee.oulu.fi Honorable Custos, Honorable Opponents, Ladies and Gentlemen: First of all, thank you all for coming to my doctoral defence. I will now present the introduction to my doctoral dissertation: “Designing for energy-efficient vision-based interactivity on mobile devices” University of Oulu Graduate School Faculty of Information Technology and Electrical Engineering Department of Computer Science and Engineering Infotech Oulu
Mobile device as a multimedia platform Mobile devices have become attractive platforms for multimedia applications. Most recent devices are equipped with several built-in cameras, a large set of sensors, and high-resolution touch screens. Future devices are expected to include an even wider range of subsystems, further improving their multimedia capabilities.
Typical interaction method Buttons Touch Screens However, despite the progress in mobile technology, the typical ways of interacting with our devices have stayed relatively simple. Nowadays, most devices include either physical or logical buttons. Pushing these buttons turns on the device in the interaction sense, illuminating the display and providing an input. Once the device is in this ”active state”, it can be used by operating on its touch screen. Touch screens allow the creation of detailed and powerful user interfaces.
Interacting with devices Pointing Clicking It is possible to argue that human-computer interaction has not changed fundamentally for nearly two decades. If we had to define the most typical ways of interacting with our devices, even nowadays, we could summarize most of our interactions in two actions: ”pointing” and ”clicking”.
Pointing and Clicking Several modalities for pointing and clicking have emerged. For example, older devices allowed the selection of user interface elements with keypads and pointers.
Pointing and Clicking The introduction of touch screens allowed the elimination of the on-screen pointer and put the pointer in our hands as a ”stylus” device.
Pointing and Clicking The evolution of touch screens eliminated the stylus, allowing us to use our own fingers as ”clicking pointers”.
Limitations Obstructing the view Two-handed operation However, the latest evolution into touch-screen devices has introduced some limitations and potential problems. Most of the time, the user needs both hands to operate the device. Also, especially on smaller screens, the user’s hand or fingers partially obstruct the view of the device during interaction, compromising the perception of the displayed augmented information.
Novel interaction methods Motion sensors Voice commands Vision-based interactions To complement current tactile user interfaces, researchers and developers have been devising new methods to overcome the limitations of the ones currently in use. Motion-sensor interaction is emerging as a modality to actively interact with the device through simple gestures, such as shaking the device or turning it with a fast movement. However, such gestures are mostly useful as complementary features when the user is already ”actively” interacting with the device. Voice commands are rapidly gaining traction, but using them in public might compromise the privacy of the user, who can easily be overheard. Finally, camera or vision-based user interaction is slowly being integrated into some of the newer platforms. This doctoral work focuses mostly on this kind of method.
Vision-based interactivity Using cameras as an input modality Enables recognizing the context in real time Utilizes already existing cameras The small size of handheld devices and their multiple cameras and sensors are under-exploited assets In vision-based interactivity, the camera of a device is used essentially for sensing purposes, controlling some aspects of the device in order to interact with it. The applications of vision-based interactivity range from gaming and augmented reality to vision-assisted general user interfaces. One advantage of vision-based interactivity is that the rich information provided by the camera enables recognizing the context in real time, in a sense ”seeing” the user and the environment. In addition, all mobile devices already integrate several cameras for different purposes, so utilizing them for interactive purposes is relatively straightforward. However, mobile devices still mostly replicate the same functionality that digital cameras offered some time ago. The small size of handheld devices and their multiple cameras are still under-exploited assets.
History of vision-based interactivity Vision-based interactivity (even on mobile devices) is not as new as we might think. Already in 2003, Siemens presented the first application (in this case a game) that was able to make use of the built-in camera for controlling purposes. The game, called Mozzies (and shown in this video), utilized the built-in camera to make the user search for artificially created mosquitoes, track them, and catch them. Mozzies (Siemens, 2003)
Vision-based interaction methods Since then, a lot of vision-based interaction techniques have been demonstrated on embedded systems. For example, marker-based augmented reality has been demonstrated on mobile devices as a way of showing enhanced information overlaying real-world images. Marker-based augmented reality
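As a rough illustration of the idea (not the method used in this work), the following minimal Python/OpenCV sketch detects fiducial markers in camera frames and draws their outlines and IDs, which is the step that precedes overlaying graphics on top of them. It assumes OpenCV is built with the ArUco module; the marker dictionary and camera index are arbitrary placeholders.

# Minimal marker-detection sketch (illustrative only, assumes OpenCV with the ArUco module).
import cv2

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)  # placeholder dictionary
cap = cv2.VideoCapture(0)                                              # built-in camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
    if ids is not None:
        # a real AR application would render graphics on the estimated marker pose;
        # here we only draw the detected marker outlines and their IDs
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow("markers", frame)
    if cv2.waitKey(1) & 0xFF == 27:   # Esc quits
        break
cap.release()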
Vision-based interaction methods Motion estimation techniques have been utilized in several applications, such as browsing large documents or galleries. Motion estimation-based browsing
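One possible way to realize this, sketched below under the assumption that a simple global motion estimate is enough (phase correlation here; not necessarily the estimator used in the thesis), is to map the frame-to-frame camera shift to a scroll of a virtual viewport. The gains and signs are placeholders.

# Motion-estimation-based browsing sketch: global shift via phase correlation (illustrative only).
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
view_x, view_y = 0.0, 0.0        # current scroll position of the virtual document
prev = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (160, 120)).astype(np.float32)   # downscale to reduce cost
    if prev is not None:
        (dx, dy), _ = cv2.phaseCorrelate(prev, small)          # sub-pixel global shift
        view_x -= dx * 4.0        # placeholder gain: device motion pans the document
        view_y -= dy * 4.0
        print(f"viewport offset: ({view_x:.1f}, {view_y:.1f})")
    prev = small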
Vision-based interaction methods Finger tracking and hand gesture recognition have been demonstrated in several controlling applications, such as different games or map browsing. Finger and hand gesture recognition
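As a very simple stand-in for such detectors (illustrative only, not the recognizer used in this work), the sketch below segments skin-coloured pixels and counts deep convexity defects of the largest blob as an estimate of extended fingers. The colour thresholds are rough and lighting dependent.

# Naive hand-gesture sketch: skin segmentation plus convexity-defect counting (illustrative only).
import cv2

def count_extended_fingers(frame):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))        # rough skin-colour range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    hand = max(contours, key=cv2.contourArea)                    # assume the largest blob is the hand
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return 0
    # deep convexity defects roughly correspond to the gaps between extended fingers
    deep = sum(1 for d in defects[:, 0] if d[3] / 256.0 > 20)
    return deep + 1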
Vision-based interaction methods User’s head in the middle: page shown in full screen. User moves head left: a trigger reveals the bookmarks panel. Head kept in the left position: the bookmarks dialogue floats on the screen. ... And even head and face movement tracking has been proposed as a way to control certain parts of the user interface. Head-movement triggers (looking at something in order to interact)
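A minimal sketch of such a trigger, assuming a front camera and an off-the-shelf Haar-cascade face detector (not the tracker used in the thesis), is shown below: when the detected face stays near the left edge of the frame for a few consecutive frames, a placeholder UI action fires.

# Head-movement trigger sketch using a Haar-cascade face detector (illustrative only).
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
frames_on_left = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces):
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])       # largest detected face
        centre = x + w / 2.0
        # count consecutive frames in which the face sits in the left third of the frame
        frames_on_left = frames_on_left + 1 if centre < frame.shape[1] * 0.3 else 0
        if frames_on_left == 10:                                 # held for roughly 10 frames
            print("reveal bookmarks panel")                      # placeholder UI trigger
    else:
        frames_on_left = 0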
Why don’t we use these kinds of methods on our mobile devices? If all these methods have been demonstrated (some of them even on embedded devices), why is it that we don’t use them (at least consistently) on our mobile devices? What is it that makes them impractical?
Challenges of vision-based interactivity Very low latency (below 100 ms) Computationally costly algorithms Sensors (cameras) ”always” on Energy-efficient solutions The reason is that, together with the difficulties posed by the creation of these interaction methods, there are other challenges that are inherently tied to the platform, in this case the mobile device. Vision-based interactivity requires very low latency and a crisp response. Even response times of 100 ms can be perceived as disturbing for many user interface functionalities. Camera-based algorithms essentially work with very large amounts of pixels, which usually makes them computationally very costly. Interacting with the camera requires that the camera is (or at least appears to be) always on. If we need to push a button to start the camera, the chances of a user interacting with it are smaller. All these challenges essentially trace back to one characteristic of mobile devices: they are battery-powered. Any interaction method included in a future mobile platform needs to be energy-efficient.
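To give a rough sense of the scale of the problem, the back-of-the-envelope calculation below assumes a VGA preview stream at 30 frames per second, a 100 ms latency target, and an illustrative per-pixel workload; the figures are indicative only, not measurements from this work.

# Illustrative compute and latency budget for a camera-based user interface.
width, height, fps = 640, 480, 30
pixels_per_second = width * height * fps          # pixels to ingest every second
latency_budget_s = 0.100                          # perceptible-delay threshold from above
frames_in_budget = latency_budget_s * fps         # only a few frames of processing slack
ops_per_pixel = 50                                # a modest per-pixel filter chain (assumed)

print(f"{pixels_per_second / 1e6:.1f} Mpixel/s to process, "
      f"{pixels_per_second * ops_per_pixel / 1e9:.2f} GOPS sustained, "
      f"{frames_in_budget:.0f} frames within the 100 ms budget")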
Objectives of the research Study the relationship between interactivity and energy efficiency To gain understanding of how to build future mobile platforms in order to satisfy their interactivity requirements To provide insight into the computing needs and characteristics of future camera-based applications In this context, the main objective of this research was to ”establish a relationship between vision-based interactivity and energy efficiency”: how they relate to each other and what the implications are of considering both at the same time, and not as separate concepts. The expected implications of the results of this analysis are twofold. First, to gain understanding of how to build future mobile platforms to satisfy their interactivity needs. Second, to provide insight into the computing needs and characteristics of future camera-based applications.
To define the process for designing future highly interactive mobile platforms, the doctoral work presented today analyzes the implications of vision-based interactivity at several levels. In this context, the creation of interactive capture methods and the provision of sufficient user feedback can engage the user to collaborate in image acquisition, mitigating the limitations of current applications.
These concepts have already had an impact on current mobile applications, such as panorama stitching. This picture, taken from a Nokia N9 panorama application, shows how the image capturing stage asks for the collaboration of the user and guides them at the same time.
Applications are usually built on top of complete user interfaces. The implementation of a vision-based user interface and its analysis allows understanding the needs of vision-based interactivity. Interacting with the device intuitively, using just one hand, can be done by tracking the user’s position and reacting to the user’s movements accordingly, as in the sketch below.
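The sketch, assuming again an off-the-shelf face detector as a stand-in for the tracker actually used, maps the detected face position to a small viewpoint offset that a UI layer could use for parallax rendering; the scaling constant is an arbitrary placeholder.

# Face-position-to-viewpoint-offset sketch for one-handed, vision-based UI control (illustrative only).
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def viewpoint_offset(frame, max_shift_px=40):
    """Return (dx, dy) UI offsets in pixels derived from where the user's face is."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, 1.2, 5)
    if not len(faces):
        return 0, 0
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    h_img, w_img = gray.shape
    # normalised offset of the face centre from the image centre, in [-1, 1]
    nx = (x + w / 2.0 - w_img / 2.0) / (w_img / 2.0)
    ny = (y + h / 2.0 - h_img / 2.0) / (h_img / 2.0)
    return int(-nx * max_shift_px), int(-ny * max_shift_px)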
This concept has already found its way to commercial devices such as the one shown in this short video, the Amazon Fire Phone. In the video it is possible to perceive the effect of the virtual 3D environment, which practically works as a window to the virtual world. Just by tilting the device or changing where we look, it is possible to reveal hidden information, without any interaction with the touch screen or any button.
The integration of data from different sensors together with the camera images, and the development of a specific sensing subsystem, could reduce the camera start-up latency and its impact on battery life. In the picture, we can see how raising the device to eye level already starts turning on the camera application, giving the impression of an ”always on” camera system, always ready for interaction.
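A minimal sketch of the decision logic, assuming the accelerometer alone and purely illustrative thresholds (the platform sensor API and the actual fusion scheme of the thesis are not shown), is given below: when gravity lies mostly along the device's long axis, the phone looks raised to eye level and the camera pipeline could be pre-started.

# Raise-to-eye-level detection from accelerometer readings (illustrative thresholds only).
def raised_to_eye_level(ax, ay, az, g=9.81):
    """True when gravity lies mostly along the device's long (y) axis,
    i.e. the phone is held upright as if framing a picture."""
    return abs(ay) > 0.8 * g and abs(az) < 0.3 * g

# Two illustrative readings in m/s^2: lying flat on a table vs. held upright.
print(raised_to_eye_level(0.1, 0.2, 9.8))   # False: flat on a table, keep the camera off
print(raised_to_eye_level(0.3, 9.6, 1.1))   # True: raised, pre-start the camera pipeline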
”We are stuck with technology when what we really want is just stuff that works” - Douglas Adams The work presented in this thesis aims at making it possible to change the way we interact with our devices in the future. Because, as Douglas Adams, the author of ”The Hitchhiker’s Guide to the Galaxy”, said: ”We are stuck with technology when what we really want is just stuff that works”.