

2009, Proceedings of Graphics …


Separability of Spatial Manipulations in Multi-touch Interfaces

Miguel A. Nacenta1, Patrick Baudisch2, Hrvoje Benko2 and Andy Wilson2
1 University of Saskatchewan   2 Microsoft Research
[email protected], {baudisch, benko, awilson}@microsoft.com
1 Computer Science Department, University of Saskatchewan, 110 Science Place, S7N 5C9, Canada. 2 One Microsoft Way, Redmond, WA 98052-6399, US.

ABSTRACT

Multi-touch interfaces allow users to translate, rotate, and scale digital objects in a single interaction. However, this freedom represents a problem when users intend to perform only a subset of manipulations. A user trying to scale an object in a print layout program, for example, might find that the object was also slightly translated and rotated, interfering with what was already carefully laid out earlier. We implemented and tested interaction techniques that allow users to select a subset of manipulations. Magnitude Filtering eliminates transformations (e.g., rotation) that are small in magnitude. Gesture Matching attempts to classify the user's input into a subset of manipulation gestures. Handles adopts a conventional single-touch handles approach for touch input. Our empirical study showed that these techniques significantly reduce errors in layout, while the Handles technique was slowest. A variation of the Gesture Matching technique presented the best combination of speed and control, and was favored by participants.

KEYWORDS: Tabletops, separability, multi-touch interaction.

INDEX TERMS: H5.2. User Interfaces: Input devices and strategies

1 INTRODUCTION

Multi-touch interfaces allow users to apply multiple spatial transformations to a virtual object with a single combined gesture. Using two fingers, for example, users can translate, rotate, and scale a photograph simultaneously. The increase in interaction bandwidth afforded by multi-touch has two main potential advantages: it could improve the speed of complex manipulations because operations need not be applied sequentially, and it is often referred to as "natural" because it resembles how we manipulate objects in the physical world.

Unfortunately, multi-touch gestures also make it difficult to perform only one (or just a subset) of the available operations at a time. For example, it becomes hard to only scale and translate an object (without rotating it) because the object will also react to small variations of the angle between the contact points. Figure 1 shows a scenario in which a designer is laying out a poster for a chess competition. By enlarging and moving one of the pawns of the original figure (Figure 1A), the designer intends to create a sense of depth (Figure 1B). However, when performed on a multi-touch interface, this might result in the pawn not only being translated and scaled, but also rotated (Figure 1C). In many cases, this will interfere with what was carefully laid out earlier and stand out as a flaw, because humans are highly sensitive to variations in rotation and scale [19, 25].

We investigated four strategies to allow users to constrain multi-finger interaction to any subset of translation, rotation, and scaling manipulations while preserving, as much as possible, the freehand nature of the interactions: Handles, Magnitude Filtering, and two variants of Gesture Matching (Frame-to-Frame and First-Touch). The Handles technique allows users to restrict manipulations explicitly. It offers one handle for each possible manipulation style; users select a manipulation by picking the corresponding handle.
Magnitude Filtering acts upon touch input only when the resulting rotation, scaling, or translation exceeds a minimum amplitude; small movements are filtered out. The Gesture Matching techniques help users avoid undesired manipulations by guessing which kind of manipulation the user is performing (e.g., rotation+translation, translation only).

Figure 1. Poster design scenario: A) initial state; B) desired result (translated & scaled); C) likely result (translated, scaled & rotated).

A user study shows that Handles, First-Touch Gesture Matching and Magnitude Filtering reduce the number of unwanted side-effect manipulations by up to 90%, although Handles does so at the expense of increased manipulation time. In addition to technique comparisons, we also performed a movement analysis of unconstrained gestures that allowed us to characterize multi-touch interaction in terms of expected error, simultaneity and order of the different manipulations. Our findings indicate that the First-Touch Gesture Matching and Magnitude Filtering techniques are well suited for bringing the benefits of multi-touch interfaces to layout tasks that would otherwise be difficult due to the extended control freedom. Our movement characterization also provides groundwork that can inform the design and configuration of future techniques.

2 RELATED WORK IN MULTI-TOUCH

Recent advances in input technology have resulted in a broad range of multi-touch devices, i.e., devices that can track more than one contact point (usually a finger) simultaneously. Compared to single-point input, multi-touch offers additional degrees of freedom. These have often been mapped to spatial manipulations such as rotation and scaling [12, 17]. Two main motivations drive multi-touch research. First, since it resembles the way humans manipulate physical objects, multi-touch can lead to more "natural" interactions. Second, multi-touch is expected to be more efficient, because it allows users to manipulate multiple degrees of freedom simultaneously [11].

Multi-touch spatial manipulation shares some problems with touch screens, including the lack of stability on release [17] and the fat finger/occlusion problem [22]. In this paper, however, we focus on issues related to the added number of contacts. Although bimanual interaction (e.g., [11]) can be considered multi-touch, multi-touch does not necessarily require the use of multiple hands. In this paper, we focus on a common mode of multi-touch interaction: manipulation of a single object using multiple fingers of the same hand. Users commonly manipulate objects with a single hand, especially if the objects are located in an area that is hard to reach with the other hand, if the object is too small for two hands, or if the other hand is used for something else, e.g., if it is manipulating a different object, is gesticulating, or is maintaining a posture. See [15, 16] for a detailed discussion of the differences between same-hand and bimanual interaction.

2.1 Characteristics of multi-dimensional manipulation

Many studies have looked at the characteristics of input in systems that allow for more than the standard two degrees of freedom. Among others, Zhai and Milgram [26], Masliah and Milgram [14], and Wang et al. [23] analyze 3D docking tasks. Van Rhijn and Mulder investigate slicing of 3D anatomical data [21]. Mason and Bryden observe rotation tasks with real objects [15]. Latulipe et al. found performance benefits for two mice when performing bimanual alignment tasks [11].
Buxton and Myers examine bimanual translation and scaling [6], and Balakrishnan and Hinckley observe a continuous bimanual tracking task [2]. Other researchers have looked at providing extended manipulations (e.g., rotation) with the standard DOF of single-point input [8, 10, 20]; however, neither this research nor the work from the previous paragraph has focused on the issue of separability.

2.2 Manipulation separability

Not all interactions on a multi-touch device are intended to produce all possible manipulations, but isolating one or more manipulations can be difficult. In [17], Moscovich and Hughes observed that "due to physiological constraints on finger motion it is difficult to rotate the fingers while keeping them at a precisely fixed distance [to keep the same scale]". In 3D spatial manipulation a similar effect was observed by Ware [24], who found that 3D docking was harder when participants had control over all dimensions than when one of them was "locked". We identify these as two instances of lack of separability, where separability is defined as the ability to purposefully avoid variation in one or more of the available manipulations (e.g., rotate but not scale).

Our definition is inspired by Jacob and colleagues' integrality and separability concept [9]. They propose that tasks that require multiple manipulations can be integral or separable according to their perceptual structure; for example, translation and scaling are integral because they are perceptually related, whereas translation and color are separable because they rely on separate perceptual mechanisms. Input devices can also be integral or separable (depending on how the different degrees of input are linked), and they show in an experiment that matching the integrality of the input device and the task results in better performance. Our use of the separability concept is different; although the input and perceptual structure of our task and device would be considered integral in their terms, it is sometimes desirable to achieve separation of control of the different manipulations. We believe that separability can be improved through the design of appropriate interaction techniques.

Separability is also related to Zhai and Milgram's concept of Efficiency [26]. The efficiency of a gesture refers to the simultaneity of manipulation of different dimensions. To quantify efficiency they used a formula that compares the Euclidean distance between the starting position and the final state (considering each manipulation as one of the Euclidean dimensions) to the actual trajectory achieved by the user. Note that this efficiency does not necessarily relate to faster interaction; a gesture could be very slow but also efficient if all dimensions are being manipulated simultaneously at the same rate until the final state is reached. Separability implies efficiency because avoiding unwanted manipulation in one or more dimensions makes for a shorter Euclidean trajectory.

2.3 Related problems

Separability is related to but different from the problem of alignment. The purpose of alignment is to make dimensions equal to pre-set values (guides, preferred directions) or to the dimensional values of other elements in the interface (alignment, equalization). Solutions for alignment include guides, snapping [3], and the use of alignment tools, such as the alignment stick [18]. The goal of separability, in contrast, is to keep certain dimensions unchanged without such external references.
In introducing techniques to interactively control separability in multi-touch interfaces, our hope is to give the user selective control while preserving much of the flavor of existing multi-touch interactions.

3 INTERACTION TECHNIQUES FOR SEPARABILITY

To provide manipulation separability in multi-touch interfaces, we explored a number of techniques, of which we selected the following four for evaluation: Handles, Magnitude Filtering, and two variants of Gesture Matching. The selected techniques assume the use of two or more fingers. We acknowledge that there are other ways to ameliorate the lack of separability: for example, the iPhone interface rarely permits translation, rotation and scaling simultaneously (separability becomes less of an issue when fewer manipulations are possible); virtual tools such as pins or guides [4] can help lock certain dimensions; and the number of touches (fingers) can be used to determine which manipulations are active [7] (one finger means translation only, two fingers means rotation only, and so on). Most of these alternative strategies are compatible with our approaches (we do not study the separability of translation-only tasks because it is more meaningful to assign one-touch interactions to translation only). However, these approaches also have important shortcomings: the assignment of numbers of fingers to operations is somewhat arbitrary beyond the distinction between one finger and more; and using pins and guides for manipulation requires extra steps in the interaction that may slow down the action and complicate the interface.

3.1 Handles

Single-touch interfaces typically require explicit mode changes. Usually, the modes take the form of handles, which are special regions on the object that are assigned to a certain manipulation. For example, in PowerPoint, the user may grab the object with the cursor on specific handles that determine whether the figure is rotated (small green handle in Figure 2A), scaled (handles in the corners) or stretched (handles in the middle of the sides).

Figure 2. Different kinds of handles: A) Standard cursor handles are too small for touch input; B) Apted et al.'s [1] handle implementation for rotation/scaling and translation; C) Our own implementation with areas dedicated to specific operation combinations.

We included Handles in our study because it is common in current single-point interfaces and it has been used before in the multi-touch context [1]. Handles increases separability because the operations are explicitly selected by the user at the moment of touch. In order to prevent an object from rotating, users simply avoid touching the "rotate" handle.

We modeled our implementation of the Handles technique after Apted et al.'s design [1] but modified it to allow for separate control of the rotate and scale dimensions (Figure 2C). We also added labels so that users could identify the operation associated with each handle. We use text labels on the handles instead of icons in order to avoid the possible ambiguities of icon interpretation during the evaluation (e.g., an icon for scale+rotate could easily be confounded with an icon for rotate only). Our prototype supports multi-touch interaction. For example, a translation+rotation gesture can be achieved by placing one finger in the move handle (which activates translation) and another in the rotate area (which activates rotation). Two fingers on the move handle will not cause any rotation or scaling, just as any number of fingers in the scale handle will not change the position or orientation of the object.

3.2 Magnitude Filtering

The Magnitude Filtering technique filters each manipulation of a multi-touch gesture transformation (rotation, scale, translation) such that values below a certain threshold magnitude produce no effect. For example, we may interpose a function between input and output such that the object will only rotate if the rotation indicated by the contact points exceeds 30° (with respect to the original orientation) (Figure 3). Separability is achieved because users can make the desired manipulations large (over the threshold), while small manipulations are ignored. This technique works regardless of where the object is touched because the rotation is calculated using the angle between the line formed by the initial points and the line formed by the current points, regardless of their position on the object.
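As an illustration of the computation just described, the following sketch (ours, not the authors' code) derives the rotation, scale, and translation implied by two touch points: rotation is the angle between the initial and current inter-contact lines, scale is the ratio of the inter-contact distances, and translation is taken here from the displacement of the contact midpoint. Function and variable names are assumptions for the example.

```python
import math

def decompose_two_touch(p0, q0, p1, q1):
    """Rotation (degrees), scale factor, and translation (dx, dy) implied by two
    touch points that moved from (p0, q0) to (p1, q1)."""
    v0 = (q0[0] - p0[0], q0[1] - p0[1])          # line between the initial contacts
    v1 = (q1[0] - p1[0], q1[1] - p1[1])          # line between the current contacts
    # Rotation: angle between the initial and current inter-contact lines.
    rotation = math.degrees(math.atan2(v1[1], v1[0]) - math.atan2(v0[1], v0[0]))
    rotation = (rotation + 180.0) % 360.0 - 180.0     # wrap into [-180, 180)
    # Scale: ratio of the distances between the two contacts.
    scale = math.hypot(*v1) / math.hypot(*v0)
    # Translation: displacement of the midpoint between the two contacts.
    dx = (p1[0] + q1[0]) / 2.0 - (p0[0] + q0[0]) / 2.0
    dy = (p1[1] + q1[1]) / 2.0 - (p0[1] + q0[1]) / 2.0
    return rotation, scale, (dx, dy)
```

A magnitude filter can then, for instance, ignore any rotation whose absolute value stays below the chosen threshold with respect to the original orientation.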
Figure 3. Rotation filtering: A) initial touch points P and Q on an object; B) the touch points rotate to the P' and Q' positions, but the angle is not yet above the rotation threshold; C) further rotation is above the threshold and the object rotates to the angle indicated by the current touch points.

This technique was inspired by snapping techniques that enlarge the motor space over the desired snap locations (e.g., screen limits, or pre-selected values of the x or y coordinates – guides) (e.g., [3, 5, 13]). Magnitude Filtering differs from snapping techniques in the following two aspects. First, Magnitude Filtering "snaps" objects only to the object's initial state, making it easier for an object to maintain its initial rotation, initial scale, or initial position, or to return to it. Snapping techniques, in contrast, generally snap to pre-selected values or to other elements in the environment. Second, we introduce a catch-up zone where the transformations are amplified to allow a continuous transition between the snap zone (where variations of the input do not affect the output) and the unconstrained zone (where the output corresponds exactly with the input). This makes all target positions obtainable (a concept introduced by snap-and-go [3]) and allows dragged objects to catch up with the finger dragging them (unlike snap-and-go), thereby preventing excessive separation of finger and object. Figure 4B explains the concept by comparing it to snapping (Figure 4A).

Figure 4. Filtering functions: A) Snap; B) Snap with buffer zone.

Since the algorithm avoids abrupt transitions between zones, interactions using Magnitude Filtering feel smooth. Initially, objects offer some resistance to change in each dimension. When the gesture becomes large enough in one of the dimensions (e.g., scale) the object starts changing fast until the user's fingers have caught up with the initial contact point. Further expansion proceeds as with a regular unconstrained manipulation.
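To make the snap, catch-up, and unconstrained zones concrete, here is a minimal sketch of one plausible transfer function in the spirit of Figure 4B. The paper does not give the exact curve numerically, so the shape of the catch-up ramp and the parameter names are our assumptions.

```python
import math

def magnitude_filter(value, start, snap, buffer):
    """Map a raw manipulation magnitude to a filtered one using a snap zone
    (no change), a catch-up zone (amplified change) and an unconstrained zone
    (1:1 change), roughly as in Figure 4B.

    value  -- current raw magnitude (e.g., rotation angle in degrees)
    start  -- magnitude at the moment the object was grabbed
    snap   -- half-width of the snap zone around the starting value
    buffer -- width of the catch-up zone
    """
    delta = value - start
    mag = abs(delta)
    if mag <= snap:                         # snap zone: ignore the change
        out = 0.0
    elif mag <= snap + buffer:              # catch-up zone: amplified change
        out = (mag - snap) * (snap + buffer) / buffer
    else:                                   # unconstrained zone: follow the input
        out = mag
    return start + math.copysign(out, delta)
```

With a configuration like the one used in the study (Section 4.1), a rotation call might look like magnitude_filter(raw_angle, start=0.0, snap=11.25, buffer=11.25); the function is continuous at both zone boundaries, which is what makes the interaction feel smooth.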
3.3 Gesture Matching

The Gesture Matching techniques explore the idea that when users desire a pure rotation (for example) they may strive to provide an input gesture that is itself a pure rotation. These techniques are based on a battery of different models that try to explain the combined motion of all the touch points on the object. Each model tries to minimize the root mean square difference between the actual motion and a motion generated with the manipulation subset of that model. There are models for simple gestures (translation) and compound gestures (rotation+translation, scale+translation, and rotation+scale+translation). The technique selects the simplest manipulation mode that still explains the actual motion reasonably well. For example, if we find that the translation-only model approximates the motion well enough, we will not engage the more complex rotation+translation model.

Given a set of starting and ending positions of touch points, each model generates two outputs: the error of the best fit of the model to the data (i.e., how distant the predicted points are from the actual points), and the magnitudes of rotation, scale and translation that minimize that error. The error outputs of each model are then collected by a decision algorithm that chooses which model to apply (see Figure 5): errors are normalized using a sigmoid function, compared to the error of the active model, and reduced by a configurable parameter. The system changes to a new state when, after this subtraction, the error of the corresponding model is still lower than the error of the current model. Naturally, the models with the most parameters will generate the least error, and therefore the configurable parameters have to penalize the more complex models more heavily (this process is analogous to regularization in the machine learning field). Parameters for each transition can be configured individually.

Figure 5. Diagram of the Gesture Matching technique. A) Notation: P, Q (previous touch points); P', Q' (current touch points); P*, Q* (estimated current touch points – translation model); red arrows (estimation errors). B) Schematic of the technique implementation.

The reader might notice that the list of models in Figure 5 does not include rotate-only or scale-only models. The reason is that multi-touch gestures that we might consider pure rotation or scaling are usually a combination of rotation+translation and scaling+translation respectively. For explanation, consider the rotation gestures depicted in Figure 6; all can be considered strictly rotations, but each uses a different implicit rotation center. Instead of trying to deduce which center was meant for rotation, or arbitrarily deciding on one, we decided to merge the rotation and translation models into one, i.e., rotation and translation together explain a rotation gesture around any center. The reasoning is analogous for scale.

Figure 6. Rotation movements around different rotation centers: A) the rotation center is in Q (the Q contact point does not move); B) the rotation center is in the center of the object; C) the rotation center is in the mass center of the contacts.

3.3.1 Frame-to-Frame Gesture Matching

Our two variants of the Gesture Matching technique differ in the period of time over which the models are fitted. In Frame-to-Frame Gesture Matching, models are fitted each time step using the previous frame and the current frame, and a decision is made each frame as to which manipulation model is used to control the object.
Most interactive systems require frame rates of 30Hz or more. If the technique were to select a different model every few milliseconds, the behavior of the object would be very similar to unconstrained movement (it could interleave many alternating types of short manipulations to achieve any desired final result). To avoid this we added hysteresis to the selection process: the configurable parameters add resistance to leaving the current mode.

The Frame-to-Frame Gesture Matching technique is very flexible because it allows for sophisticated behavior configurations; for example, certain transitions (such as rotation+translation to rotation+translation+scale) can be made more difficult, allowing for a certain "feel" of the manipulation. However, the technique proved hard to configure because the information contained in frame-to-frame variation is often not enough to distinguish one type of gesture from another reliably, and so thresholds must be set high. Users must then indicate a change of manipulation mode through a fast gesture that has a strong component of the desired manipulation. For example, an object in rotation mode needs a fast pinch gesture before it will start scaling.

3.3.2 First-Touch Gesture Matching

To avoid the problems of Frame-to-Frame Gesture Matching, we implemented a variant, called First-Touch Gesture Matching, that fits the same models over the entire duration of the gesture. That is, it uses the touch positions from the first frame of the gesture (first touch) and compares them to the most recent data. The result is a more stable estimation of the gesture (the models see much larger differences, which makes it easier to discriminate the manipulation). In exchange, when any of the thresholds is surpassed, the object jumps to a new position (as fit by the new model). For example, if a gesture starts with a slight scaling movement the object will start scaling. If after a while the touch points start to rotate, hysteresis will keep the scaling mode until it is decided that rotation is a much better match; at that moment the object will return to its original size and rotate to match the current rotation of the touch points. If the rotation has gone beyond the desired rotation, the user can rotate back a small amount without switching into scale mode (the hysteresis will prevent the mode change unless the touch points separate again significantly).

First-Touch Gesture Matching has the configurability of the Frame-to-Frame version, but it does not require fast gestures to activate different modes. In fact, the behavior of this technique resembles that of Magnitude Filtering. There are, however, several important differences between the First-Touch Gesture Matching and Magnitude Filtering techniques: First-Touch Gesture Matching will jump to a new position whenever a better fit is found, whereas Magnitude Filtering never "jumps" but changes gradually instead; to access the object positions abandoned by a jump, First-Touch Gesture Matching requires returning towards the initial state, whereas Magnitude Filtering can reach any magnitude in a monotonic movement; and First-Touch Gesture Matching can distinguish between composite manipulations (e.g., rotation and translation), whereas in Magnitude Filtering each manipulation is independent. This last difference is irrelevant for simple configurations, but it can help design a better technique if we know that certain combinations of manipulations are more likely to take place together or in certain orders.
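The following sketch (ours, not the authors' implementation) illustrates the First-Touch flavor of Gesture Matching for exactly two contacts: each candidate model predicts where the contacts should be if only its manipulations were applied, the residual error is computed, and the simplest model that still explains the motion well wins. Fixed additive penalties and a "stickiness" margin stand in for the paper's sigmoid normalization and per-transition parameters, and the model parameters are simple closed-form estimates rather than a full least-squares fit; all names and numbers are assumptions.

```python
import math

MODELS = ("T", "RT", "ST", "RST")    # translation, rotation+translation,
                                     # scale+translation, rotation+scale+translation
PENALTY = {"T": 0.0, "RT": 4.0, "ST": 4.0, "RST": 8.0}   # pixels; penalize complex models
STICKINESS = 3.0                     # extra error a challenger must overcome (hysteresis)

def fit_errors(p0, q0, p1, q1):
    """Residual error (RMS, pixels) of each candidate model for two contacts that
    moved from first-touch positions (p0, q0) to current positions (p1, q1)."""
    c0 = ((p0[0] + q0[0]) / 2.0, (p0[1] + q0[1]) / 2.0)   # first-touch centroid
    c1 = ((p1[0] + q1[0]) / 2.0, (p1[1] + q1[1]) / 2.0)   # current centroid
    v0 = (q0[0] - p0[0], q0[1] - p0[1])                   # first-touch inter-contact vector
    v1 = (q1[0] - p1[0], q1[1] - p1[1])                   # current inter-contact vector
    theta = math.atan2(v1[1], v1[0]) - math.atan2(v0[1], v0[0])
    scale = math.hypot(*v1) / math.hypot(*v0)

    def predict(point, rotate, rescale):
        # Transform a first-touch point about the first-touch centroid, then
        # translate it so the centroid lands on the current centroid.
        x, y = point[0] - c0[0], point[1] - c0[1]
        if rotate:
            x, y = (x * math.cos(theta) - y * math.sin(theta),
                    x * math.sin(theta) + y * math.cos(theta))
        if rescale:
            x, y = x * scale, y * scale
        return c1[0] + x, c1[1] + y

    errors = {}
    for name, rotate, rescale in (("T", False, False), ("RT", True, False),
                                  ("ST", False, True), ("RST", True, True)):
        pp, pq = predict(p0, rotate, rescale), predict(q0, rotate, rescale)
        errors[name] = math.sqrt(((pp[0] - p1[0]) ** 2 + (pp[1] - p1[1]) ** 2 +
                                  (pq[0] - q1[0]) ** 2 + (pq[1] - q1[1]) ** 2) / 2.0)
    return errors

def choose_model(p0, q0, p1, q1, active="T"):
    """Pick the simplest model that explains the motion; keep the active model
    unless a challenger beats its cost by a clear margin (hysteresis)."""
    errors = fit_errors(p0, q0, p1, q1)
    best, best_cost = active, errors[active] + PENALTY[active]
    for name in MODELS:
        cost = errors[name] + PENALTY[name] + (0.0 if name == active else STICKINESS)
        if cost < best_cost:
            best, best_cost = name, cost
    return best
```

In this sketch, choose_model would be evaluated every frame against the first-touch positions, passing in the previously active model so that hysteresis keeps the current mode until another model is clearly better; a Frame-to-Frame variant would instead pass the previous frame's positions as (p0, q0).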
4 EMPIRICAL EVALUATION

We designed our evaluation with two goals in mind: to compare the different alternatives that enhance separability in multi-touch interfaces; and to gain insight into the nature of unrestricted motions that could help us design better techniques in the future.

4.1 Techniques

We tested the four techniques described in the previous section and a baseline condition that does not constrain rotation, scale or translation; we call this the Unconstrained technique. The snap zones of the Magnitude Filtering technique were configured as follows: translation 20 pixels (in each direction); rotation 11.25° (for each, positive and negative angles); scale 20% (for each, enlargement and reduction). Buffer zones were set to the same size as the corresponding snap zones. The other two techniques (Frame-to-Frame and First-Touch Gesture Matching) were configured through an iterative process that resulted in thresholds similar to those of the Magnitude Filtering configuration (numeric configuration values of the different techniques are not comparable because of the different implementations).

4.2 Apparatus

The experiment was run on the commercially available version of Microsoft Surface, which provides a touch input rate of 60Hz. The size of the interactive area is 76.2cm (30") diagonal for a 1024x768px image (4:3 aspect ratio – see Figure 7A).

Figure 7. A) Experimental setting. B) Beginning of a trial.

4.3 Tasks

For each trial, participants manipulated a rectangular object (initial size 10x7.5cm) located on the right side of the screen until it matched the scale, rotation, and relative position of a reference object displayed on the left side of the screen (Figure 7B). Participants were instructed not to change dimensions that already matched the reference object. Participants pressed a button with their non-dominant hand to end the trial. Some trials required manipulating only location, only rotation, or only scale. Others required changing two or all three manipulations, resulting in seven types of trials. Each manipulation had two possible values: short and long trajectories for translation (12 and 21cm respectively), small and large rotation (30° and 60° respectively), and small and large enlargement (50% and 100% size increase respectively). There were tasks for all combinations of values (2 rotation, 2 scaling, 2 translation, 4 rotation+scaling, 4 rotation+translation, 4 scaling+translation, and 8 rotation+scaling+translation): 26 different tasks overall. In order to control noise and make technique comparisons fair, participants were instructed to always grab objects with two fingers. Participants were also encouraged (but not required) to complete each trial with a single gesture, i.e., without releasing and reacquiring the object.
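As a small illustration of how the 26 task types enumerate (the values come from the text above; the structure and names are ours):

```python
from itertools import combinations, product

manipulations = {"translate": (12, 21),     # cm
                 "rotate":    (30, 60),     # degrees
                 "scale":     (50, 100)}    # % size increase

tasks = []
for size in (1, 2, 3):                      # one, two, or all three manipulations
    for subset in combinations(manipulations, size):
        for values in product(*(manipulations[m] for m in subset)):
            tasks.append(dict(zip(subset, values)))

print(len(tasks))                           # 2 + 2 + 2 + 4 + 4 + 4 + 8 = 26
```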
4.4 Participants and Study design

15 participants (8 female, 7 male) between 18 and 59 years of age participated in the study in exchange for a gratuity. All participants were right-handed except one, who could use either the left or right hand as dominant.

The experiment was divided into two parts. The first part consisted of a single block of 78 trials with the Unconstrained condition (3 repetitions of each task, in random order), preceded by training (1 trial of each task – 26 trials). In the second part, the participants performed five blocks like that of the first part, one for each condition (including Unconstrained). The order of the conditions was assigned through a random Latin square and the presentation of each task followed no predictable order. Dividing the experiment into two parts corresponds to the goals of analyzing unconstrained gestures and comparing the proposed techniques. To analyze Unconstrained gestures without any bias from our proposed techniques we ran it first; we replicated the Unconstrained block in the second part to avoid biasing against Unconstrained due to possible learning effects. If anything, our design may have biased this condition positively. At the end of the experiment the participants filled in a questionnaire about their technique preferences.

4.5 Results: Technique comparisons

To compare how well techniques achieved separability we performed statistical analyses on the final rotation and scale errors of trials that did not require rotation or scaling respectively. We did not perform an analysis on translation-only tasks because of a problem similar to that discussed in section 3.3 (it is unclear what a translation-only motion is when rotation and scaling are involved) and because the translation-only case is less relevant (see discussion at the beginning of section 3). The analysis does not include the data from the first part of the experiment (one block of unconstrained trials). Data from one participant in the Frame-to-Frame condition was lost due to an error, and therefore that participant's data is removed from the repeated measures analyses. All significant differences reported in the post-hoc analyses are significant at the 0.05 level after applying Bonferroni's correction.

Two-way repeated measures ANOVAs with technique and task as factors showed a strong main effect of technique on rotation errors (F4,52 = 38.0, p < 0.001, η2 = 0.74) and on scale errors (F4,52 = 19.6, p < 0.001, η2 = 0.60). The rotation data does not meet the sphericity assumption, but the corrected test (Greenhouse-Geisser) shows the same results. The post-hoc analyses show that Frame-to-Frame Gesture Matching has the largest average rotation error (μ = 2.6°), significantly larger than the rest. Unconstrained gestures followed (μ = 1.3°), also significantly larger than the other three. The three remaining techniques had smaller errors, not significantly different from each other (μHandles = 0.4°, μFirst-Touch G.M. = 0.3°, μMagnitude Filtering = 0.2°). Results for the scale errors follow similar lines, but the most error-prone were now Unconstrained (μ = 7%) and Frame-to-Frame Gesture Matching (μ = 4%), not significantly different from each other. Both techniques were statistically different from the rest (μMagnitude Filtering = 2%, μFirst-Touch G.M. = 2%, μHandles = 1%) except for the comparison between the two Gesture Matching techniques. These results are summarized in Figure 8.

Figure 8. Rotation and scale errors for no-rotation and no-scale tasks. Error bars represent 95% confidence intervals.

We also tested the proportion of trials with any rotation and scale errors (for tasks that did not require either rotation or scale changes). The results follow the trend of the previous analysis: Unconstrained resulted in the maximum percentage of trials with error (98%), followed by Frame-to-Frame Gesture Matching (65%), Handles (21%), Magnitude Filtering (8%) and First-Touch Gesture Matching (6%).
Scale errors shuffle the pattern except for Unconstrained, which still shows the most trials with error (98%), followed by Magnitude Filtering (42%), Frame-to-Frame Gesture Matching (36%), First-Touch Gesture Matching (23%), and Handles (13%). These results are summarized in Figure 9.

Figure 9. Percentage of trials with any rotation or scale errors for tasks that do not require rotation or scale (respectively).

Two non-parametric Friedman analyses of the percentage data grouped by user and technique indicate a main effect of technique on both the rotation and scale percentages of trials with errors (χ2rot(4) = 44.3, p < 0.001; χ2scale(4) = 43.0).

We also performed a repeated-measures ANOVA on log-transformed task completion times with the same two factors (task and technique) to find out which techniques were faster. Time was measured from the moment that the user touched the object for the first time and transformed logarithmically (as is usual for linear analysis of temporal data). All means presented henceforth are back-transformed from the logarithmic domain. The analysis shows a strong main effect of technique (F4,52 = 8.3, p < 0.001, η2 = 0.39). The post-hoc analyses show that Magnitude Filtering (μ = 2,622ms), Unconstrained (μ = 2,629ms), and First-Touch Gesture Matching (μ = 2,779ms) were fastest (and statistically indistinguishable from each other), while Frame-to-Frame Gesture Matching (μ = 3,482ms) and Handles (μ = 3,482ms) were significantly slower than the other three.

From pilot studies we had observed differences between techniques in the time users took to start interacting with the object after each trial started. To test these differences we performed an ANOVA on the time to start the gesture. The result shows a strong main effect of technique (F4,52 = 49.3, p < 0.001, η2 = 0.79). Post-hoc analyses show that it took significantly longer to start with Handles (μ = 1,404ms) than with any of the other techniques (μFrame-to-Frame G.M. = 775ms, μFirst-Touch G.M. = 708ms, μUnconstrained = 692ms, μMagnitude Filtering = 686ms). Results are summarized in Figure 10.

Figure 10. Completion and time to start trials. The sum of the two columns is the total trial time. Units are milliseconds. Numbers in the bars indicate order from 1 (fastest) to 5 (slowest).
4.6 Results: Subjective

Across participants, Magnitude Filtering and First-Touch Gesture Matching were ranked as the preferred techniques, closely followed by Unconstrained and, at a distance, by Handles and Frame-to-Frame Gesture Matching (see Table 1). Ordered non-parametric statistical contrasts of the technique preferences (pairwise Wilcoxon Signed Ranks) showed statistical differences between Handles and Frame-to-Frame Gesture Matching and the other three. Participants also ranked the techniques in terms of speed and accuracy with very similar results (not reported here).

Table 1. Subjective preferences (# users assigning each rank).

                      1 (best)   2    3    4   5 (worst)   Mean
  Magnitude Filter.       5      7    0    3       0       2.07
  First-Touch G.M.        6      2    4    3       0       2.27
  Unconstrained           3      5    7    0       0       2.27
  Handles                 1      1    4    5       4       3.67
  Frame-Fr. G.M.          0      0    0    4      11       4.73

Many participants disliked the Handles technique because "[I have] to think a little bit more" and "I cannot just automatically instinctively do it [manipulate the object]". These comments refer to the fact that, with Handles, the type of movement must be decided before contact, whereas with the other techniques you can decide as you go. Several participants commented that the Frame-to-Frame Gesture Matching technique was difficult to control; one participant noted that "it has a mind of its own".

4.7 Results: Characterization of Unconstrained Gestures

Each participant ran a block of unconstrained trials before they used any other technique. We collected these data to understand the basic characteristics of unconstrained rotation-scale-translation gestures in multi-touch interfaces and to look for general patterns that could help us design the next generation of techniques. This section discusses three analyses: gestural noise, allocation of control, and manipulation order. Each of the signals referred to in the following sub-sections was conditioned using standard human movement signal processing procedures: the signal (the variation of a magnitude in time for a given trial) was recorded directly by our software at a typical sampling rate of 60Hz; it was then resampled at 50Hz to correct for sampling period variability, then padded and processed through a fourth-order low-pass Butterworth filter (cut-off frequency: 8Hz). The signals of manipulations that changed were further differentiated to find the rate at which error was reduced towards the goal state, and filtered at 4Hz.

4.7.1 Gestural noise

We analyzed trial data to characterize the expected variability of quantities that were not supposed to change; i.e., we measured the typical scale changes for tasks that do not require scaling, and orientation changes for tasks that do not require rotation. We found that the maximum orientation error in a trial, averaged across all trials that did not require rotation, was 5.1°, and the average maximum scale error across all trials that did not require scaling was 100% (a doubling in size). Figure 11 shows the overall distribution of all orientation and scale points for the trials indicated above. The distribution of the error for the different manipulations can help set appropriate parameters for the techniques.

Figure 11. Histograms of the rotation (left) and scale (right) magnitudes for trials not requiring rotation or scaling respectively.

4.7.2 Allocation of control

An important characteristic of any multi-dimensional gesture is the degree to which objects are manipulated simultaneously in several dimensions [9, 14, 26]. Several metrics of control allocation have been proposed in the input control literature, from which we chose the m-metric proposed in [14] for being the most comprehensive. The m-metric measures the degree of simultaneity of two or more signals on a continuous scale between 0 and 1, where 0 indicates that the signals never change simultaneously (e.g., they take turns in how they change) and 1 indicates that they are perfectly synchronized (e.g., one signal is an amplified version of the other). In our case we used the m-metric to calculate which manipulations are more coordinated with each other. We calculated three m-metric coefficients, one for each of the possible manipulation couples: rotation-translation, rotation-scale and scale-translation. Each unconstrained trial generated one measure for each of the combinations.
A two-way repeated-measures ANOVA with task type and manipulation-couple as factors showed a strong main effect of manipulation-couple (F2,28 = 48.3, p < 0.001, η2 = 0.77). Post-hoc comparisons confirmed that rotation and translation are more simultaneous (average 0.43) than either scale and translation (0.32) or scale and rotation (0.32; all post-hocs p < 0.05, with Bonferroni correction). These results indicate that whereas rotation and translation seem to proceed simultaneously, scaling proceeds more independently.

4.7.3 Order of manipulations

To learn about the temporal distribution of the different manipulations we performed an analysis similar to Mason et al.'s [15] and Wang et al.'s [23] temporal analyses. In a first step we normalize the signals in magnitude and time to have equal areas; then we calculate the contiguous area of the signal that contains the time of fastest change and covers 50% of the total variation towards the goal value. This calculation gives the estimated periods when the signal experienced most of its change. The results are summarized in Figure 12. Periods of high change occur within the first quarter of the gesture, consistent with the usual movement patterns of targeting and docking tasks. The graph also shows how manipulations start in a typical order: first translation, then rotation, and finally scale. The rotation manipulation is contained within the translation manipulation, which is consistent with Wang et al.'s analysis of 3D docking problems (rotation and translation only) [23].

Figure 12. Duration of the periods of maximum activity of each manipulation with respect to the total duration of the gesture. Dotted lines represent confidence intervals.

5 DISCUSSION

We divide our discussion into three main themes: how to reduce undesired manipulations, the limitations of our experiment, and what we learned about unconstrained motions.

5.1 Reducing Undesired Manipulation

The data from our study confirm that the lack of manipulation separability can be a problem for tasks where any error in size or rotation is important: 98% of the unconstrained gestures contain some undesired rotation or scale, and the average errors amount to 2.5° and 7% respectively. All but the Frame-to-Frame Gesture Matching technique succeeded in reducing both the average error and the proportion of trials with errors. However, there are important differences in how the techniques perform.

5.1.1 First-Touch Gesture Matching vs. Magnitude Filtering

These two techniques were rated best by users and were among the most successful in reducing scale and rotation error (together with Handles). The similarity of the results is consistent with the similar behavior of the techniques; however, the main differences between these techniques are not in our empirical data, but in the way they are implemented and the configurability they afford. Magnitude Filtering is a straightforward technique to implement and configure: each manipulation is filtered separately, and only a couple of parameters can be adjusted (the snap and buffer zones of the transfer function). In contrast, the Gesture Matching techniques require setting up parameters for each transition. Our experience showed that the configuration of the Gesture Matching techniques is complex; on the other hand, the large parameter space offers many possibilities.
Gesture Matching techniques could be configured differently for different applications so that interaction designers have control not only over the level of noise that is tolerated in certain manipulations, but also over the way that the technique feels: for example, more or less "sticky". We also believe that Gesture Matching techniques could take advantage of a deeper knowledge of human spatial manipulation gestures. For example, we could configure them to take into account the order in which manipulations are usually performed: transitions from translation to translation+rotation modes could be made easier than transitions from translation+scale to translation+rotation.

5.1.2 Problems with Handles

The Handles technique might seem the obvious choice for preventing unwanted operations because it is explicit and works as most single-point interfaces do. However, we discovered that trials took about 50% longer than with other techniques, and that it does not reduce error better than either of the two winners. We speculate that, with Handles, the user must think at touch time about the manipulations that the movement will require and must also target a smaller region of the object. We also speculate that the initial grip of the object is sometimes not accommodating enough to comfortably reach the goal position (anatomical constraints of the hands, see also [16]), requiring changes in touch positions in the middle of the gesture. The high percentage of trials with errors (21% for no-rotation tasks and 13% for no-scale tasks) also points to the difficulty of selecting proper handles in advance (grabbing the wrong handle was scored as an error). We rule out that these results are due to difficulty in finding the correct handle because the tasks always started with the object in the same position and orientation, and the participants had plenty of opportunities (training) to learn the handle arrangement. Although our handles might be improved through design (e.g., intelligent handles that adapt to the user's position, circular handles), we believe that the problems exposed by our experiment, and some other intrinsic problems of handles (visual clutter, occlusion of content, fat finger issues with small handles), should be carefully taken into account by designers when choosing multi-touch techniques for spatial manipulations.

5.1.3 Frame-to-Frame Gesture Matching

Even though we implemented and configured all techniques to the best of our abilities, Frame-to-Frame Gesture Matching showed very little benefit for separability, and was by far the least preferred. We believe that two factors explain this failure: there is very little information about a gesture between two frames, making the technique error-prone and hard to configure; and participants found it difficult to perform the required fast gestures.

5.2 Limitations of the study

It is important to note several significant limitations of our study. We performed the study on one-handed multi-touch gestures on a horizontal surface. It is difficult to use our results to draw conclusions about the performance of single-touch interactions, bimanual interactions or performance on vertical surfaces, and further studies are needed to investigate those comparisons. Our study was also constrained to tasks with relatively large angle, distance and scale changes. The techniques that we compared do not make it impossible to perform small adjustments (see details in section 3), but they might hinder these tasks.
The trade-off between the configuration of techniques to improve separability (e.g., the thresholds of magnitude filters) and performance with small adjustments deserves further study.

5.3 Nature of multi-touch spatial transformations

Although most of the results from our characterization of unconstrained gestures do not directly help us address the separability problem, we believe that they are useful for understanding the nature of multi-touch movements, and can be useful for configuring and inspiring future techniques. For example, scale error has a larger variability than we expected, which may explain why Magnitude Filtering performed relatively poorly in the no-scaling trials (the threshold was probably too low in the scale dimension). We also learned that unconstrained gestures have large variations in all dimensions before they approximate the goal position. This suggests that it is difficult to solve the separability issues with techniques based on thresholds that disappear once they are surpassed (unlike the tested version of Magnitude Filtering) or techniques where it is difficult to go back to the original state of the object. Our analysis also indicates that there might be benefit in considering scaling as a different class of manipulation (it is less simultaneous than the rest), and that different manipulations tend to start at slightly offset times within the gesture. All of these findings offer promising avenues for future advances.

5.4 Lessons for practitioners

We summarize the contributions of our study in four main statements:
- Separability can be a serious issue for spatial manipulation applications.
- Magnitude Filtering and First-Touch Gesture Matching can help improve separability.
- The Handles technique makes manipulation slower and has intrinsic problems, although it does help separability.
- Gesture Matching techniques can be difficult to configure, although they offer configurability.

6 CONCLUSION AND FUTURE WORK

Layout tasks often require careful control of the ways in which objects are manipulated. Multi-touch interaction can be faster and more natural, but it also presents the problem of separability: it becomes difficult to control one of the dimensions (e.g., orientation) without slightly affecting others (e.g., size). In this research we explored four different techniques to reduce unwanted manipulations for single-hand multi-touch spatial transformations. We found that First-Touch Gesture Matching and Magnitude Filtering improve separability without negatively affecting performance, and that using Handles results in similar gains in separability, but at the cost of extra interaction time.

In the future, we are interested in developing new techniques based on what we have learned. For example, in the case of Gesture Matching, more sophisticated temporal models such as Hidden Markov Models may allow the calculation of transforms from moments other than "first touch". Also interesting are techniques that balance the explicitness of Handles and the implicitness of Gesture Matching; for example, techniques that give subtle cues about which mode is about to be activated and allow the user to react accordingly. Finally, we are also exploring how to improve separability in situations where many other manipulations are available (e.g., stretching, shearing and perspective transforms).

REFERENCES

1. Apted, T., Kay, J., and Quigley, A. 2006. Tabletop sharing of digital photographs for the elderly. Proc. CHI'06, 781-790.
2. Balakrishnan, R. and Hinckley, K. 2000. Symmetric bimanual interaction. Proc. CHI'00, 33-40.
3. Baudisch, P., Cutrell, E., Hinckley, K., and Eversole, A. 2005. Snap-and-go: helping users align objects without the modality of traditional snapping. Proc. CHI'05, 301-310.
4. Beaudouin-Lafon, M. and Lassen, M. 2000. The architecture and implementation of a Post-WIMP Graphical Application. Proc. UIST'00.
5. Bier, E. A. and Stone, M. C. 1986. Snap-dragging. Comput. Graph. 20, 4 (Aug. 1986), 233-240.
6. Buxton, W. and Myers, B. A. 1986. A Study in Two-Handed Input. Proc. CHI'86, 321-326.
7. Hancock, M., Carpendale, S., and Cockburn, A. 2007. Shallow-depth 3D interaction: Design and evaluation of one-, two- and three-touch techniques. Proc. CHI'07, 1147-1156.
8. Hancock, M. S., Vernier, F., Wigdor, D., Carpendale, S., and Shen, C. 2006. Rotation and translation mechanisms for tabletop interaction. Proc. Tabletop'06, 79-86.
9. Jacob, R. J., Sibert, L. E., McFarlane, D. C., and Mullen, M. P. 1994. Integrality and separability of input devices. TOCHI 1, 1 (Mar. 1994), 3-26.
10. Kruger, R., Carpendale, S., Scott, S., and Tang, A. 2005. Fluid integration of rotation and translation. Proc. CHI'05, 601-610.
11. Latulipe, C., Kaplan, C. S., and Clarke, C. L. 2005. Bimanual and unimanual image alignment: an evaluation of mouse-based techniques. Proc. UIST'05, 123-131.
12. MacKenzie, I. S., Soukoreff, R. W., and Pal, C. 1997. A two-ball mouse affords three degrees of freedom. Ext. Abs. CHI'97, 303-304.
13. Mandryk, R. L., Rodgers, M. E., and Inkpen, K. M. 2005. Sticky widgets: pseudo-haptic widget enhancements for multi-monitor displays. Ext. Abs. CHI'05, 1621-1624.
14. Masliah, M. R. and Milgram, P. 2000. Measuring the allocation of control in a 6 degree-of-freedom docking experiment. Proc. CHI'00, 25-32.
15. Mason, A. H. and Bryden, P. J. 2007. Coordination and concurrency in bimanual rotation tasks when moving away from and toward the body. Exp. Brain Res. 183, 541-556.
16. Moscovich, T. and Hughes, J. F. 2008. Indirect mappings of multi-touch input using one and two hands. Proc. CHI'08, 1275-1284.
17. Moscovich, T. and Hughes, J. F. 2006. Multi-finger cursor techniques. Proc. Graphics Interface'06, 1-7.
18. Raisamo, R. and Räihä, K. 1996. A new direct manipulation technique for aligning objects in drawing programs. Proc. UIST'96, 157-164.
19. Schiffman, H. R. 2001. Fundamental Visual Functions and Phenomena. In Sensation and Perception. Wiley, 89-115.
20. Shen, C., Vernier, F., Forlines, C., and Ringel, M. 2004. DiamondSpin: an extensible toolkit for around-the-table interaction. Proc. CHI'04, 167-174.
21. van Rhijn, A. and Mulder, J. D. 2006. Spatial input device structure and bimanual object manipulation in virtual environments. Proc. VRST'06, 51-60.
22. Vogel, D. and Baudisch, P. 2007. Shift: a technique for operating pen-based interfaces using touch. Proc. CHI'07, 657-666.
23. Wang, Y., MacKenzie, C. L., Summers, V. A., and Booth, K. S. 1998. The structure of object transportation and orientation in human-computer interaction. Proc. CHI'98, 312-319.
24. Ware, C. 1990. Using hand position for virtual object placement. Vis. Comput. 6, 5 (Nov. 1990), 245-253.
25. Williams, R. 2008. The Non-Designer's Design Book, Third Edition. Peachpit Press.
26. Zhai, S. and Milgram, P. 1998. Quantifying coordination in multiple DOF movement and its application to evaluating 6 DOF input devices. Proc. CHI'98, 320-327.