The key to understanding multi-touch touch panels is to realize that a touch is not the same thing as a mouse click.
With a multi-touch projected capacitive touch panel, the user interface of an embedded application can be enhanced with gestures such as pinch, zoom, and rotate. True multi-touch panels, that is, panels that return actual coordinates for each individual touch, can support even more advanced features like multiple-person collaboration and gestures made of a combination of touches (for example, one finger touching while another swipes). The different combinations of gestures are limited only by the designer’s imagination and the amount of code space. As multi-touch projected capacitive touch panels continue to replace single-touch resistive touch panels in embedded systems, designers of those systems must develop expertise on how to interface to these new panels and how to use multi-touch features to enhance applications.
When implementing a touch interface, the most important thing is to keep in mind the way the user will interact with the application. The fastest, most elegant gesture-recognition system will not be appreciated if the user finds the application difficult to understand. The biggest mistake made in designing a touch interface is using the same techniques you would use for a mouse. While a touch panel and a mouse have some similarities, they’re very different kinds of input devices. For example, you can move a mouse around the screen and track its position before taking any action. With tracking, it’s possible to position the mouse pointer precisely before clicking a button. With a touch interface, the touch itself causes the action.
The touch location isn’t as precise as a mouse click. One complication with touch is that it can be difficult to tell exactly where the touch is being reported since the finger obscures the screen during the touch. Another difference is in the adjacency of touch areas; due to the preciseness of a mouse, touch areas can be fairly small and immediately adjacent to each other.
With a touch interface, it’s helpful to leave space between touch areas to allow for the ambiguousness of the touch position. Figure 1 shows some recommended minimum sizes and distances.
Feedback mechanisms need to be tailored to a touch interface to help the user understand what action was taken and why. For example, if the user is trying to touch a button on the screen and the touch position is reported at a location just outside the active button area, the user won’t know why the expected action did not occur. In the book Brave NUI World: Designing Natural User Interfaces for Touch and Gesture, Daniel Wigdor and Dennis Wixon suggest several ways to provide feedback so the user can adjust the position and generate the expected action.1 One example is a translucent ring that appears around the user’s finger. When the finger is over an active touch area, the ring might contract, wiggle, change color, or indicate in some other way that the reported finger position is over an active element (Figure 2a). Another option is that the element itself changes when the finger is over it (Figure 2b).
The authors describe several other strategies for designing touch interfaces, including adaptive positioning (which activates the nearest active area to the touch), various feedback mechanisms, and modeling the algorithmic flow of a gesture.
You’ll need to consider the capabilities of the touch controller when designing the gestures that will be recognized by the user interface. Some multi-touch controllers report gesture information without coordinates. For example, the controller might send a message saying that a rotation gesture is in progress and the current angle of the rotation is 48º, but it won’t reveal the center of the rotation or the location of the touches that are generating the gesture. Other controllers provide gesture messages as well as the actual coordinates and some controllers provide only the touch coordinates without any gesture information. These last two types are considered “true” multi-touch because they provide the physical coordinates of every touch on the panel regardless of whether a gesture is occurring or not.
Even if the controller provides gesture information, its interpretation of the gestures may not match the requirements of the user interface. The controller might support only one gesture at a time while the application requires support for three or four simultaneous gestures; or it may define the center of rotation differently from the way you want it defined. Of course no controller is going to automatically recognize gestures that have been invented for an application such as the “one finger touching while another swipes” example given above. As a result, you will often need to implement your own gesture-recognition engine.
A gesture-recognition engine can be a collection of fairly simple algorithms that generates events for touches, drags, and flicks, or it can be a complicated processing system that uses predictive analysis to identify gestures in real time. Gesture engines have been implemented using straight algorithmic processing, fuzzy logic, and even neural networks. The type of gesture-recognition engine is driven by the user interface requirements, available code space, processor speed, and real-time responsiveness. For example, the Canonical Multitouch library for Linux analyzes multiple gesture frames to determine what kinds of gesture patterns are being executed.2 In the rest of this article I’ll focus on a few simple gesture-recognition algorithms that can be implemented with limited resources.
Common gestures
The simplest and most common gestures are touch (and double touch), drag, flick, rotate, and zoom. A single touch, analogous to a click event with a mouse, is defined by the amount of time a touch is active and the amount of movement during the touch. Typical values might be that the touch-down and touch-up events must be less than a half second apart, and the finger cannot move by more than five pixels.
A double touch is a simple extension of the single touch where the second touch must occur within a certain amount of time after the first touch, and the second touch must also follow the same timing and positional requirements as the first touch. Keep in mind that if you are implementing both a single touch and a double touch, the single touch will need an additional timeout to ensure that the user isn’t executing a double touch.
While a drag gesture is fairly simple to implement, it’s often not needed at the gesture-recognition level. Since the touch controller only reports coordinates when a finger is touching the panel, the application can treat those coordinate reports as a drag. Implementing this at the application level has the added benefit of knowing if the drag occurred over an element that can be dragged. If not, then the touch reports can be ignored or continually analyzed for other events (for example, passing over an element may result in some specific behavior).
A flick is similar to a drag but with a different purpose. A drag event begins when the finger touches the panel and ends when the finger is removed. A flick can continue to generate events after the finger is removed. This can be used to implement the kinds of fast scrolling features common on many cell phones where a list continues to scroll even after the finger is lifted. A flick can be implemented in several ways, with the responsibilities divided between the gesture-recognition layer and the application layer. Before we discuss the different ways to implement flick gestures, let’s first focus on how to define a flick.
A flick is generally a fast swipe of the finger across the surface of the touch panel in a single direction. The actual point locations during the flick do not typically matter to the application. The relevant parameters are velocity and direction. To identify a flick, the gesture-recognition layer first needs to determine the velocity of the finger movement. This can be as simple as determining the amount of time between the finger-down report and the finger-up report divided by the distance traveled. However, this can slow the response time since the velocity is not determined until after the gesture has finished.