% 4-manipulation/visual-hand/1-introduction.tex
\section{Introduction}
\label{intro}

Touching, grasping and manipulating virtual objects are fundamental interactions in \AR (\secref[related_work]{ve_tasks}) and essential for many of its applications (\secref[related_work]{ar_applications}).
The most common current \AR systems, in the form of portable and immersive \OST-\AR headsets \cite{hertel2021taxonomy}, allow real-time hand tracking and direct interaction with virtual objects with bare hands (\secref[related_work]{real_virtual_gap}).
Manipulation of virtual objects is achieved using a virtual hand interaction technique that represents the user's hand in the \VE and simulates interaction with virtual objects (\secref[related_work]{ar_virtual_hands}).
However, direct hand manipulation is still challenging due to the intangibility of the \VE, the lack of mutual occlusion between the hand and the virtual object in \OST-\AR (\secref[related_work]{ar_displays}), and the inherent delays between the user's hand movement and the result of the interaction simulation (\secref[related_work]{ar_virtual_hands}).

In this chapter, we investigate \textbf{visual rendering as hand augmentation} for direct manipulation of virtual objects in \OST-\AR.
To this end, we selected from the literature and compared the most popular visual hand renderings used to interact with virtual objects in \AR.
With these visual renderings, the virtual hand is \textbf{displayed superimposed} on the user's hand, providing \textbf{feedback on the tracking} of the real hand, as shown in \figref{hands}.
The movement of the virtual hand is also \textbf{constrained to the surface} of the virtual object, providing additional \textbf{feedback on the interaction} with the virtual object.
We \textbf{evaluate in a user study}, using the \OST-\AR headset Microsoft HoloLens~2, the effect of six visual hand renderings on user performance and experience in two representative manipulation tasks: pushing-and-sliding and grasping-and-placing a virtual object directly with the hand.

\noindentskip The main contributions of this chapter are:
\begin{itemize}
\item A comparison from the literature of the six most common visual hand renderings used to interact with virtual objects in \AR.
\item A user study with 24 participants evaluating the performance and user experience of the six visual hand renderings as augmentation of the real hand during free and direct hand manipulation of virtual objects in \OST-\AR.
\end{itemize}

\noindentskip In the next sections, we first present the six visual hand renderings we gathered from the literature. We then describe the experimental setup and design, the two manipulation tasks, and the metrics used. Finally, we present the results of the user study and discuss their implications for the direct hand manipulation of virtual objects in \AR.

\bigskip

\begin{subfigs}{hands}{The six visual hand renderings as augmentation of the real hands.}[
As seen by the user through the \AR headset during the two-finger grasping of a virtual cube.
][
\item No visual rendering \level{(None)}.
\item Cropped virtual content to enable hand-cube occlusion \level{(Occlusion, Occl)}.
\item Rings on the fingertips \level{(Tips)}.
\item Thin outline of the hand \level{(Contour, Cont)}.
\item Fingers' joints and phalanges \level{(Skeleton, Skel)}.
\item Semi-transparent \ThreeD hand model \level{(Mesh)}.
]
\subfig[0.22]{method/hands-none}
\subfig[0.22]{method/hands-occlusion}
\subfig[0.22]{method/hands-tips}
\par
\subfig[0.22]{method/hands-contour}
\subfig[0.22]{method/hands-skeleton}
\subfig[0.22]{method/hands-mesh}
\end{subfigs}
% 4-manipulation/visual-hand/2-method.tex
\section{Visual Hand Renderings}
\label{hands}

We compared a set of the most popular visual hand renderings found in the literature (\secref[related_work]{ar_visual_hands}).
Since we address hand-centered manipulation tasks, we only considered renderings including the fingertips (\secref[related_work]{grasp_types}).
Moreover, to keep the focus on the hand rendering itself, we used neutral semi-transparent grey meshes, consistent with the choices made in \cite{yoon2020evaluating,vanveldhuizen2021effect}.
All considered hand renderings are drawn following the tracked pose of the user's real hand.
However, while the real hand can of course penetrate virtual objects, the visual hand is always constrained by the \VE (\secref[related_work]{ar_virtual_hands}).

The six renderings are shown in \figref{hands} and described below, with an abbreviation in parentheses when needed.

\paragraph{None}

As a reference, we considered no visual hand rendering (\figref{method/hands-none}), as is common in \AR \cite{hettiarachchi2016annexing,blaga2017usability,xiao2018mrtouch,teng2021touch}.
Users have no information about hand tracking and no feedback about contact with the virtual objects, other than the objects' movement when touched.
As virtual content is rendered on top of the \RE, the hand of the user can be hidden by the virtual objects when manipulating them (\secref[related_work]{ar_displays}).

\paragraph{Occlusion (Occl)}

To avoid the aforementioned undesired occlusions due to the virtual content being rendered on top of the \RE, we can carefully crop the former whenever it hides real content that should be visible \cite{macedo2023occlusion}, \eg the thumb of the user in \figref{method/hands-occlusion}.
This approach is common in works using \VST-\AR headsets \cite{knorlein2009influence,ha2014wearhand,piumsomboon2014graspshell,suzuki2014grasping,al-kalbani2016analysis}.

\paragraph{Tips}

This rendering shows small visual rings around the fingertips of the user (\figref{method/hands-tips}), highlighting the most important parts of the hand and contact with virtual objects during fine manipulation (\secref[related_work]{grasp_types}).
Unlike work using small spheres \cite{maisto2017evaluation,meli2014wearable,grubert2018effects,normand2018enlarging,schwind2018touch}, this ring rendering also provides information about the orientation of the fingertips.

\paragraph{Contour (Cont)}

This rendering is a \qty{1}{\mm} thick outline contouring the user's hands, providing information about the whole hand while leaving its inside visible.
Unlike the other renderings, it is not occluded by the virtual objects, as shown in \figref{method/hands-contour}.
This rendering is less common in the literature than the previous ones \cite{kang2020comparative}.

\paragraph{Skeleton (Skel)}

This rendering schematically depicts the joints and phalanges of the fingers with small spheres and cylinders, respectively, leaving the rest of the hand visible (\figref{method/hands-skeleton}).
It can be seen as an extension of the Tips rendering to include the complete finger articulations.
It is widely used in \VR \cite{argelaguet2016role,schwind2018touch,chessa2019grasping} and \AR \cite{blaga2017usability,yoon2020evaluating}, as it is considered simple yet rich and comprehensive.

\paragraph{Mesh}

This rendering is a \ThreeD semi-transparent ($\alpha=0.2$) hand model (\figref{method/hands-mesh}), which is common in \VR \cite{prachyabrued2014visual,argelaguet2016role,schwind2018touch,chessa2019grasping,yoon2020evaluating,vanveldhuizen2021effect}.
It can be seen as a filled version of the Contour hand rendering, thus partially covering the view of the real hand.
\section{User Study}
\label{method}

We aim to investigate whether the chosen visual hand rendering affects the performance and user experience of manipulating virtual objects with free hands in \AR.

\subsection{Manipulation Tasks and Virtual Scene}
\label{tasks}

Following the guidelines of \textcite{bergstrom2021how} for designing object manipulation tasks, we considered two variations of a \ThreeD pick-and-place task, commonly found in interaction and manipulation studies \cite{prachyabrued2014visual,blaga2017usability,maisto2017evaluation,meli2018combining,vanveldhuizen2021effect}.

\subsubsection{Push Task}
\label{push-task}

The first manipulation task consists of pushing a virtual object along a real flat surface towards a target placed on the same plane (\figref{method/task-push}).
The virtual object to manipulate is a small opaque blue cube of \qty{50}{\mm} edge, while the target is a slightly bigger semi-transparent blue volume of \qty{70}{\mm} edge.
At every repetition of the task, the cube to manipulate always spawns at the same place, on top of a real table in front of the user.
The target volume, however, can spawn in eight different locations on the same table, located on a \qty{20}{\cm} radius circle centred on the cube, at \qty{45}{\degree} from each other (\figref{method/task-push}).
Users are asked to push the cube towards the target volume using their fingertips in any way they prefer.
In this task, the cube cannot be lifted.
The task is considered completed when the cube is \emph{fully} inside the target volume.

\subsubsection{Grasp Task}
\label{grasp-task}

The second manipulation task consists of grasping, lifting, and placing a virtual object in a target placed on a different (higher) plane (\figref{method/task-grasp}).
The cube to manipulate and the target volume are the same as in the previous task.
However, this time, the target volume can spawn in eight different locations on a plane \qty{10}{\cm} \emph{above} the table, still located on a \qty{20}{\cm} radius circle at \qty{45}{\degree} from each other.
Users are asked to grasp, lift, and move the cube towards the target volume using their fingertips in any way they prefer.
As before, the task is considered completed when the cube is \emph{fully} inside the volume.
\begin{subfigs}{tasks}{The two manipulation tasks of the user study.}[
The cube to manipulate (\qty{5}{\cm} edge, opaque) is in the middle of the table and the eight possible targets to reach (\qty{7}{\cm} edge, semi-transparent) are around it.
Only one target at a time was shown during the experiments.
][
\item Push task: pushing the virtual cube along a table towards a target placed on the same surface.
\item Grasp task: grasping and lifting the virtual cube towards a target placed on a higher plane.
]
\subfig[0.45]{method/task-push}
\subfig[0.45]{method/task-grasp}
\end{subfigs}
\subsection{Experimental Design}
\label{design}

We analyzed the two tasks separately.
For each of them, we considered two independent within-subject variables:
\begin{itemize}
\item \factor{Hand}, consisting of the six possible visual hand renderings discussed in \secref{hands}: \level{None}, \level{Occlusion} (Occl), \level{Tips}, \level{Contour} (Cont), \level{Skeleton} (Skel), and \level{Mesh}.
\item \factor{Target}, consisting of the eight possible locations of the target volume, named from the participant's point of view and as shown in \figref{tasks}: right (\level{R}), right-back (\level{RB}), back (\level{B}), left-back (\level{LB}), left (\level{L}), left-front (\level{LF}), front (\level{F}), and right-front (\level{RF}).
\end{itemize}
Each condition was repeated three times.
To control for learning effects, we counterbalanced the orders of the two manipulation tasks and of the visual hand renderings following a 6 \x 6 Latin square, leading to six blocks within which the position of the target volume was randomized.
This design led to a total of 2 manipulation tasks \x 6 visual hand renderings \x 8 targets \x 3 repetitions $=$ 288 trials per participant.
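The counterbalancing step above can be sketched in code. The following is a minimal illustration (not the actual experiment software) of the standard balanced Latin square construction for an even number of conditions, applied to the six rendering orders; the condition labels are taken from the design above.

```python
# Illustrative sketch: balanced Latin square for counterbalancing the
# order of the 6 visual hand renderings across blocks. Not the actual
# experiment code; the construction is the standard Williams design.
RENDERINGS = ["None", "Occl", "Tips", "Cont", "Skel", "Mesh"]

def balanced_latin_square(n):
    """Rows of a balanced Latin square for an even number of conditions n.

    Each condition appears once per row and once per column, and each
    condition immediately precedes every other condition equally often.
    Row i follows the offset pattern i, i+1, i-1, i+2, i-2, ... (mod n).
    """
    rows = []
    for i in range(n):
        row, k = [i], 1
        while len(row) < n:
            row.append((i + k) % n)      # forward offset ...
            if len(row) < n:
                row.append((i - k) % n)  # ... then backward offset
            k += 1
        rows.append(row)
    return rows

# One rendering order per block (six blocks, six participants per order).
orders = [[RENDERINGS[c] for c in row] for row in balanced_latin_square(6)]
for order in orders:
    print(order)
```

With 24 participants and 6 orders, each order would be assigned to 4 participants, which matches the 6 \x 6 Latin square described above.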
\subsection{Apparatus}
\label{apparatus}

We used the \OST-\AR headset HoloLens~2, as described in \secref[vhar_system]{virtual_real_alignment}.
%It is capable of rendering virtual content within an horizontal field of view of \qty{43}{\degree} and a vertical one of \qty{29}{\degree}. It is also able to track the environment as well as the user's fingers.
It is also able to track the user's fingers.
We measured the latency of the hand tracking at \qty{15}{\ms}, independent of the hand movement speed.

Our experiment was implemented using Unity 2022.1, PhysX 4.1, and the Mixed Reality Toolkit (MRTK) 2.8.
The compiled application ran directly on the HoloLens~2 at \qty{60}{FPS}.

The default \ThreeD hand model from MRTK was used for all visual hand renderings.
By changing the material properties of this hand model, we were able to achieve the six renderings shown in \figref{hands}.
A calibration was performed for every participant to best adapt the size of the visual hand rendering to their real hand.
A set of empirical tests enabled us to choose the best rendering characteristics in terms of transparency and brightness for the virtual objects and hand renderings, which were applied throughout the experiment.

The hand tracking information provided by MRTK was used to construct a virtual articulated physics-enabled hand (\secref[related_work]{ar_virtual_hands}) using PhysX.
It featured 25 DoFs, including the fingers' proximal, middle, and distal phalanges.
To allow effective (and stable) physical interactions between the hand and the virtual cube to manipulate, we implemented an approach similar to that of \textcite{borst2006spring}, where a series of virtual springs with high stiffness couple the physics-enabled hand with the tracked hand.
As before, a set of empirical tests was used to select the most effective physical characteristics in terms of mass, elastic constant, friction, damping, and collider size and shape for the (tracked) virtual hand interaction model.
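The spring coupling described above can be sketched as a per-body linear spring-damper (PD) force. This is a hypothetical illustration in the spirit of \textcite{borst2006spring}, not our tuned implementation; the stiffness and damping values are placeholders, since the actual values were selected empirically as explained above.

```python
import numpy as np

# Hypothetical sketch of the spring-damper coupling between the tracked
# hand pose and the physics-enabled hand (in the spirit of Borst and
# Indugula's spring model). Parameter values are illustrative only.
STIFFNESS = 2000.0  # N/m: high stiffness so the physics hand tracks closely
DAMPING = 50.0      # N*s/m: damps oscillations of the spring

def coupling_force(tracked_pos, physics_pos, physics_vel):
    """Force applied to one physics-hand body, pulling it towards the
    corresponding tracked-hand joint (a linear PD coupling)."""
    tracked_pos = np.asarray(tracked_pos, dtype=float)
    physics_pos = np.asarray(physics_pos, dtype=float)
    physics_vel = np.asarray(physics_vel, dtype=float)
    spring = STIFFNESS * (tracked_pos - physics_pos)  # pull towards target
    damper = -DAMPING * physics_vel                   # resist fast motion
    return spring + damper
```

At each physics step, such a force is applied to each coupled body of the articulated hand; when the physics hand is blocked by the cube's surface, the spring stretches instead of letting the visual hand penetrate the object, which is what keeps the rendering constrained to the cube while the real hand passes through.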
The room where the experiment was held had no windows, with one light source of \qty{800}{\lumen} placed \qty{70}{\cm} above the table.
This setup enabled good and consistent tracking of the user's fingers.
\subsection{Procedure}
\label{procedure}

First, participants were given a consent form that briefed them about the tasks and the procedure of the experiment.
Then, participants were asked to sit comfortably in front of a table and wear the HoloLens~2 headset as shown in~\figref{tasks}, perform the calibration of the visual hand size as described in~\secref{apparatus}, and complete a \qty{2}{min} training to familiarize themselves with the \AR rendering and the two considered tasks.
During this training, we did not use any of the six hand renderings under test, but rather a fully opaque white hand rendering that completely occluded the real hand of the user.
Participants were asked to carry out the two tasks as naturally and as fast as possible.
Similarly to \cite{prachyabrued2014visual,maisto2017evaluation,blaga2017usability,vanveldhuizen2021effect}, we only allowed the use of the dominant hand.
The experiment took around 1 hour and 20 minutes to complete.
\subsection{Participants}
\label{participants}

Twenty-four subjects participated in the study (eight aged between 18 and 24, fourteen aged between 25 and 34, and two aged between 35 and 44; 22~males, 1~female, 1~preferred not to say).
None of the participants reported any deficiencies in their visual perception abilities.
Two subjects were left-handed, while the twenty-two others were right-handed; they all used their dominant hand during the trials.
Ten subjects had significant experience with \VR (\enquote{I use it every week}), while the fourteen others reported little to no experience with \VR.
Two subjects had significant experience with \AR (\enquote{I use it every week}), while the twenty-two others reported little to no experience with \AR.
Participants signed an informed consent, including the declaration of having no conflict of interest.
\subsection{Collected Data}
\label{metrics}

Inspired by \textcite[p.674]{laviolajr20173d}, we collected the following metrics during the experiment:
\begin{itemize}
\item \response{Completion Time}, defined as the time elapsed between the first contact with the virtual cube and its correct placement inside the target volume; as subjects were asked to complete the tasks as fast as possible, lower completion times mean better performance.
\item \response{Contacts}, defined as the number of separate times the user's hand makes contact with the virtual cube; in both tasks, a lower number of contacts means a smoother, more continuous interaction with the object.
\item \response{Time per Contact}, defined as the total time any part of the user's hand contacted the cube divided by the number of contacts; higher values mean that the user interacted with the object for longer uninterrupted periods of time.
\item \response{Grip Aperture} (solely for the grasp-and-place task), defined as the average distance between the thumb's fingertip and the other fingertips during the grasping of the cube; lower values indicate a greater finger interpenetration with the cube, resulting in a greater discrepancy between the real hand and the visual hand rendering constrained to the cube surfaces, and showing how confident users are in their grasp \cite{prachyabrued2014visual,al-kalbani2016analysis,blaga2017usability,chessa2019grasping}.
\end{itemize}
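Two of these metrics can be derived directly from per-frame logs. The following is a minimal sketch under assumed log formats (a boolean hand-cube contact flag per frame, and fingertip positions per grasp frame); it is not the study's actual logging code.

```python
import numpy as np

# Hypothetical sketch of metric extraction from per-frame logs
# (assumed formats, not the study's actual logging code).

def count_contacts(contact_frames):
    """Number of separate contact episodes with the cube.

    contact_frames: list of booleans, one per frame (True = any part of
    the hand touches the cube). An episode starts at each rising edge.
    """
    return sum(1 for prev, cur in zip([False] + contact_frames, contact_frames)
               if cur and not prev)

def grip_aperture(thumb_positions, finger_positions):
    """Average thumb-to-fingertip distance over the grasp frames.

    thumb_positions: array of shape (frames, 3);
    finger_positions: array of shape (frames, fingers, 3).
    """
    thumb = np.asarray(thumb_positions, dtype=float)
    fingers = np.asarray(finger_positions, dtype=float)
    # Distance from the thumb tip to each other fingertip, per frame,
    # averaged over fingers and frames.
    return float(np.linalg.norm(fingers - thumb[:, None, :], axis=-1).mean())
```

Counting only rising edges (rather than contact frames) is what makes \response{Contacts} insensitive to how long each touch lasts, which is instead captured by \response{Time per Contact}.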
Taken together, these measures provide an overview of the performance and usability of each of the visual hand renderings tested, as we hypothesized that they should influence the behavior and effectiveness of the participants.

At the end of each task, participants were asked to rank the visual hand renderings according to their preference with respect to the considered task.
Participants also rated each visual hand rendering individually on six questions using a 7-point Likert scale (1=Not at all, 7=Extremely):
\begin{itemize}
\item \response{Difficulty}: How difficult were the tasks?
\item \response{Fatigue}: How fatiguing (mentally and physically) were the tasks?
\item \response{Precision}: How precise were you in performing the tasks?
\item \response{Performance}: How successful were you in performing the tasks?
\item \response{Efficiency}: How fast/efficient do you think you were in performing the tasks?
\item \response{Rating}: How much do you like each visual hand rendering?
\end{itemize}

Finally, participants were encouraged to comment out loud on the conditions throughout the experiment, as well as in an open-ended question at its end, to gather additional qualitative information.
% 4-manipulation/visual-hand/3-0-results.tex
\section{Results}
\label{results}

Results for each trial metric were analyzed with an \ANOVA on a \LMM, with the order of the two manipulation tasks and of the six visual hand renderings (\factor{Order}), the visual hand rendering (\factor{Hand}), the target volume position (\factor{Target}), and their interactions as fixed effects, and the \factor{Participant} as random intercept.
For every \LMM, residuals were tested with a Q-Q plot to confirm normality.
On statistically significant effects, estimated marginal means of the \LMM were compared pairwise using Tukey's \HSD test.
Only statistically significant results are reported.

Because the \response{Completion Time}, \response{Contacts}, and \response{Time per Contact} measures were Gamma-distributed, they were first log-transformed to approximate a normal distribution.
Their analysis results are reported back-transformed (anti-logged), corresponding to geometric means of the measures.
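The back-transformation step can be made concrete with a short sketch. The \LMM itself would typically be fitted with a dedicated package (for instance R's lme4 with emmeans, or Python's statsmodels \texttt{MixedLM}); the snippet below only illustrates, under no assumptions beyond the text, why the anti-logged mean of log-times is a geometric mean and why percentage differences are ratios of geometric means.

```python
import numpy as np

# Minimal sketch of the transformation step described above (not the
# authors' analysis script): Gamma-distributed times are log-transformed
# before fitting, and effects are reported back-transformed.

def geometric_mean(times):
    """Anti-logged mean of the log-times, i.e. the geometric mean."""
    times = np.asarray(times, dtype=float)
    return float(np.exp(np.log(times).mean()))

def percent_difference(times_a, times_b):
    """A '+18%' style difference on the anti-logged scale is a ratio of
    geometric means, not a difference of arithmetic means."""
    return 100.0 * (geometric_mean(times_a) / geometric_mean(times_b) - 1.0)
```

This is why the percentage differences quoted in the following subsections compare conditions multiplicatively, which suits right-skewed time measures better than arithmetic means.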
% 4-manipulation/visual-hand/3-1-push.tex
\subsection{Push Task}
\label{push}

\paragraph{Completion Time}

On the time to complete a trial, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{24.8}, \pinf{0.001}, see \figref{results/Push-CompletionTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{5.9}, \pinf{0.001}).
\level{Skeleton} was the fastest: completion times were longer with \level{None} (\percent{+18}, \p{0.005}), \level{Occlusion} (\percent{+26}, \pinf{0.001}), \level{Tips} (\percent{+22}, \pinf{0.001}), and \level{Contour} (\percent{+20}, \p{0.001}).

Three groups of target volumes were identified:
(1) the side targets \level{R}, \level{L}, and \level{LF} were the fastest;
(2) the back and front targets \level{RB}, \level{F}, and \level{RF} were slower (\p{0.003});
and (3) the back targets \level{B} and \level{LB} were the slowest (\p{0.04}).

\paragraph{Contacts}

On the number of contacts, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{6.7}, \pinf{0.001}, see \figref{results/Push-ContactsCount-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{27.8}, \pinf{0.001}).

Fewer contacts were made with \level{Skeleton} than with \level{None} (\percent{-23}, \pinf{0.001}), \level{Occlusion} (\percent{-26}, \pinf{0.001}), \level{Tips} (\percent{-18}, \p{0.004}), and \level{Contour} (\percent{-15}, \p{0.02});
and fewer with \level{Mesh} than with \level{Occlusion} (\percent{-14}, \p{0.04}).
This metric indicates how effective a visual hand rendering is: a lower count indicates a smoother, more continuous push and rotation of the cube into the target, as one would do with a real cube.

Targets on the left (\level{L}, \level{LF}) and the right (\level{R}) were easier to reach than the back ones (\level{B}, \level{LB}, \pinf{0.001}).

\paragraph{Time per Contact}

On the mean time spent on each contact, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{8.4}, \pinf{0.001}, see \figref{results/Push-MeanContactTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{19.4}, \pinf{0.001}).

It was shorter with \level{None} than with \level{Skeleton} (\percent{-10}, \pinf{0.001}) and \level{Mesh} (\percent{-8}, \p{0.03});
and shorter with \level{Occlusion} than with \level{Tips} (\percent{-10}, \p{0.002}), \level{Contour} (\percent{-10}, \p{0.001}), \level{Skeleton} (\percent{-14}, \p{0.001}), and \level{Mesh} (\percent{-12}, \p{0.03}).
This result suggests that users pushed the virtual cube with more confidence when a visual hand rendering was visible.
Conversely, the lack of a visual hand forced participants to pay more attention to the cube's reactions.

Targets on the left (\level{L}, \level{LF}) and the right (\level{R}) sides had a higher \response{Time per Contact} than all the other targets (\p{0.005}).

\begin{subfigs}{push_results}{Results of the push task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap \percent{95} \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Time spent on each contact.
]
\subfig[0.32]{results/Push-CompletionTime-Hand-Overall-Means}
\subfig[0.32]{results/Push-ContactsCount-Hand-Overall-Means}
\subfig[0.32]{results/Push-MeanContactTime-Hand-Overall-Means}
\end{subfigs}
% 4-manipulation/visual-hand/3-2-grasp.tex
\subsection{Grasp Task}
\label{grasp}

\paragraph{Completion Time}

On the time to complete a trial, there was one statistically significant effect %
of \factor{Target} (\anova{7}{2868}{37.2}, \pinf{0.001}) %
but not of \factor{Hand} (\anova{5}{2868}{1.8}, \p{0.1}, see \figref{results/Grasp-CompletionTime-Hand-Overall-Means}).
Targets on the back and the left (\level{B}, \level{LB}, and \level{L}) were slower than targets on the front (\level{LF}, \level{F}, and \level{RF}, \p{0.003}), except for \level{RB} (right-back), which was also fast.

\paragraph{Contacts}

On the number of contacts, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{5.2}, \pinf{0.001}, see \figref{results/Grasp-ContactsCount-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{21.2}, \pinf{0.001}).

Fewer contacts were made with \level{Tips} than with \level{None} (\percent{-13}, \p{0.02}) and \level{Occlusion} (\percent{-15}, \p{0.004});
and fewer with \level{Mesh} than with \level{None} (\percent{-15}, \p{0.006}) and \level{Occlusion} (\percent{-17}, \p{0.001}).
This result suggests that having no visible hand rendering increased the number of failed grasps or cube drops.
Surprisingly, only \level{Tips} and \level{Mesh} were statistically significantly better, not \level{Contour} nor \level{Skeleton}.

Targets on the back and left (\level{B}, \level{LB}, and \level{L}) were more difficult than targets on the front (\level{LF}, \level{F}, and \level{RF}, \pinf{0.001}).

\paragraph{Time per Contact}

On the mean time spent on each contact, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{9.6}, \pinf{0.001}, see \figref{results/Grasp-MeanContactTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{5.6}, \pinf{0.001}).

It was shorter with \level{None} than with \level{Tips} (\percent{-15}, \pinf{0.001}), \level{Skeleton} (\percent{-11}, \p{0.001}), and \level{Mesh} (\percent{-11}, \p{0.001});
shorter with \level{Occlusion} than with \level{Tips} (\percent{-10}, \pinf{0.001}), \level{Skeleton} (\percent{-8}, \p{0.05}), and \level{Mesh} (\percent{-8}, \p{0.04});
and shorter with \level{Contour} than with \level{Tips} (\percent{-8}, \pinf{0.001}).
As in the \level{Push} task, the lack of a visual hand rendering led to shorter, more interrupted contacts with the cube.
The \level{Tips} rendering seemed to provide some of the best feedback for grasping, perhaps because it provides information about both the position and rotation of the tracked fingertips.

This time was shortest on the front target \level{F} compared with the other target volumes (\pinf{0.001}).

\paragraph{Grip Aperture}

On the average distance between the thumb's fingertip and the other fingertips during grasping, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{35.8}, \pinf{0.001}, see \figref{results/Grasp-GripAperture-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{3.7}, \pinf{0.001}).

It was smaller with \level{None} than with \level{Occlusion} (\pinf{0.001}), \level{Tips} (\pinf{0.001}), \level{Contour} (\pinf{0.001}), \level{Skeleton} (\pinf{0.001}), and \level{Mesh} (\pinf{0.001});
smaller with \level{Tips} than with \level{Occlusion} (\p{0.008}), \level{Contour} (\p{0.006}), and \level{Mesh} (\pinf{0.001});
and smaller with \level{Skeleton} than with \level{Mesh} (\pinf{0.001}).
This result evidences the participants' lack of confidence with no visual hand rendering: they gripped the cube more tightly to secure it.
The \level{Mesh} rendering seemed to have provided the most confidence to participants, perhaps because it was the closest to the real hand.

The \response{Grip Aperture} was larger on the right-front (\level{RF}) target volume, indicating higher confidence, than on the back and side targets (\level{R}, \level{RB}, \level{B}, \level{L}, \p{0.03}).

\begin{subfigs}{grasp_results}{Results of the grasp task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap \percent{95} \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Time spent on each contact.
\item Distance between thumb and the other fingertips when grasping.
]
\subfig[0.4]{results/Grasp-CompletionTime-Hand-Overall-Means}
\subfig[0.4]{results/Grasp-ContactsCount-Hand-Overall-Means}
\par
\subfig[0.4]{results/Grasp-MeanContactTime-Hand-Overall-Means}
\subfig[0.4]{results/Grasp-GripAperture-Hand-Overall-Means}
\end{subfigs}
% 4-manipulation/visual-hand/3-3-ranks.tex
\subsection{Ranking}
\label{ranks}

\figref{results_ranks} shows the ranking of each visual \factor{Hand} rendering for the \level{Push} and \level{Grasp} tasks.
Friedman tests indicated that both rankings had statistically significant differences (\pinf{0.001}).
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment were then used on both ranking results (\secref{metrics}):

\begin{itemize}
\item \response{Push task ranking}: \level{Occlusion} was ranked lower than \level{Contour} (\p{0.005}), \level{Skeleton} (\p{0.02}), and \level{Mesh} (\p{0.03});
and \level{Tips} was ranked lower than \level{Skeleton} (\p{0.02}).
This good ranking of the \level{Skeleton} rendering for the Push task is consistent with the Push trial results.
\item \response{Grasp task ranking}: \level{Occlusion} was ranked lower than \level{Contour} (\p{0.001}), \level{Skeleton} (\p{0.001}), and \level{Mesh} (\p{0.007});
and \level{None} was ranked lower than \level{Skeleton} (\p{0.04}).
A complete visual hand rendering seemed to be preferred over no visual hand rendering when grasping.
\end{itemize}
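The analysis pipeline for the rankings can be sketched as follows. This is a hypothetical illustration using SciPy, assuming a rankings array of shape (participants, renderings); it is not the script used for the study.

```python
import numpy as np
from scipy import stats

# Hypothetical sketch of the ranking analysis: Friedman omnibus test,
# then pairwise Wilcoxon signed-rank tests with Holm-Bonferroni
# adjustment. `rankings` is assumed to be (participants, renderings).
def ranking_tests(rankings):
    rankings = np.asarray(rankings)
    n_participants, k = rankings.shape
    # Omnibus test across the k related samples (one column per rendering).
    _, p_friedman = stats.friedmanchisquare(*rankings.T)
    # All pairwise comparisons on the same participants' rankings.
    pairs, pvals = [], []
    for i in range(k):
        for j in range(i + 1, k):
            pairs.append((i, j))
            pvals.append(stats.wilcoxon(rankings[:, i], rankings[:, j]).pvalue)
    # Holm-Bonferroni step-down: multiply sorted p-values by decreasing
    # factors and enforce monotonicity of the adjusted values.
    order = np.argsort(pvals)
    adjusted = np.empty(len(pvals))
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = min(1.0, (len(pvals) - rank) * pvals[idx])
        running_max = max(running_max, adj)
        adjusted[idx] = running_max
    return p_friedman, dict(zip(pairs, adjusted))
```

Running the pairwise tests only after a significant omnibus result, and adjusting their p-values, is what keeps the family-wise error rate controlled across the fifteen rendering pairs.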
\begin{subfigs}{results_ranks}{Boxplots of the ranking for each visual hand rendering.}[
Lower is better.
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
][
\item Push task ranking.
\item Grasp task ranking.
]
\subfig[0.4]{results/Ranks-Push}
\subfig[0.4]{results/Ranks-Grasp}
\end{subfigs}
% 4-manipulation/visual-hand/3-4-questions.tex
\subsection{Questionnaire}
|
||||
\label{questions}
|
||||
|
||||
\figref{results_questions} presents the questionnaire results for each visual hand rendering.
|
||||
Friedman tests indicated that all questions had statistically significant differences (\pinf{0.001}).
|
||||
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment were then used each question results (\secref{metrics}):
|
||||
\begin{itemize}
|
||||
\item \response{Difficulty}: \level{Occlusion} was considered more difficult than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.01}), and \level{Mesh} (\p{0.03}).
\item \response{Fatigue}: \level{None} was found more fatiguing than \level{Mesh} (\p{0.04}); and \level{Occlusion} more fatiguing than \level{Skeleton} (\p{0.02}) and \level{Mesh} (\p{0.02}).
\item \response{Precision}: \level{None} was considered less precise than \level{Skeleton} (\p{0.02}) and \level{Mesh} (\p{0.02}); and \level{Occlusion} less precise than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.006}), and \level{Mesh} (\p{0.02}).
\item \response{Performance}: \level{Occlusion} was rated lower than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.006}), and \level{Mesh} (\p{0.03}).
\item \response{Efficiency}: \level{Occlusion} was found less efficient than \level{Contour} (\p{0.01}), \level{Skeleton} (\p{0.02}), and \level{Mesh} (\p{0.02}).
\item \response{Rating}: \level{Occlusion} was rated lower than \level{Contour} (\p{0.02}) and \level{Skeleton} (\p{0.03}).
\end{itemize}
In summary, \level{Occlusion} was worse than \level{Skeleton} for all questions, and worse than \level{Contour} and \level{Mesh} on five of the six questions.
The results of the \response{Difficulty}, \response{Performance}, and \response{Precision} questions are consistent with this pattern.
Moreover, having no visible visual \factor{Hand} rendering was felt by users to be more fatiguing and less precise than having one.
Surprisingly, no clear consensus was found on \response{Rating}.
Every visual hand rendering except \level{Occlusion} simultaneously received the minimum and maximum possible scores.
\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for each visual hand rendering.}[
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
Lower is better for \textbf{(a)} difficulty and \textbf{(b)} fatigue.
Higher is better for \textbf{(c)} precision, \textbf{(d)} performance, \textbf{(e)} efficiency, and \textbf{(f)} rating.
]
\subfig[0.4]{results/Question-Difficulty}
\subfig[0.4]{results/Question-Fatigue}
\par
\subfig[0.4]{results/Question-Precision}
\subfig[0.4]{results/Question-Performance}
\par
\subfig[0.4]{results/Question-Efficiency}
\subfig[0.4]{results/Question-Rating}
\end{subfigs}
\section{Discussion}
\label{discussion}
We evaluated six visual hand renderings, as described in \secref{hands}, displayed on top of the real hand, in two virtual object manipulation tasks in \AR.
During the \level{Push} task, the \level{Skeleton} hand rendering was the fastest (\figref{results/Push-CompletionTime-Hand-Overall-Means}), as participants employed fewer and longer contacts to adjust the cube inside the target volume (\figref{results/Push-ContactsCount-Hand-Overall-Means} and \figref{results/Push-MeanContactTime-Hand-Overall-Means}).
Participants consistently used few, continuous contacts with all visual hand renderings (Fig. 3b); only fewer than ten trials, carried out by two participants, were quickly completed with multiple discrete touches.
However, during the \level{Grasp} task, despite no difference in \response{Completion Time}, providing no visible hand rendering (\level{None} and \level{Occlusion} renderings) led to more failed grasps or cube drops (\figref{results/Grasp-CompletionTime-Hand-Overall-Means} and \figref{results/Grasp-MeanContactTime-Hand-Overall-Means}).
Indeed, participants found the \level{None} and \level{Occlusion} renderings less effective (\figref{results/Ranks-Grasp}) and less precise (\figref{results_questions}).
To understand whether the participants' previous experience might have played a role, we also carried out an additional statistical analysis considering \VR experience as an additional between-subjects factor, \ie \VR novices vs. \VR experts (\enquote{I use it every week}, see \secref{participants}).
We found no statistically significant differences when comparing the considered metrics between \VR novices and experts.
All visual hand renderings showed \response{Grip Apertures} close to the size of the virtual cube, except for the \level{None} rendering (\figref{results/Grasp-GripAperture-Hand-Overall-Means}), with which participants applied stronger grasps, \ie a smaller distance between the fingertips.
Having no visual hand rendering, but only the reaction of the cube to the interaction as feedback, made participants less confident in their grip.
This result contrasts with the wrongly estimated grip apertures observed by \textcite{al-kalbani2016analysis} in an exocentric \VST-\AR setup.
Also, while some participants found the absence of visual hand rendering more natural, many of them commented on the importance of having feedback on the tracking of their hands, as observed by \textcite{xiao2018mrtouch} in a similar immersive \OST-\AR setup.
Yet, participants' opinions of the visual hand renderings were mixed on many questions, except for \level{Occlusion}, which was perceived as less effective than the more \enquote{complete} \level{Contour}, \level{Skeleton}, and \level{Mesh} hands (\figref{results_questions}).
However, due to the latency of the hand tracking and of the visual hand reacting to the cube, almost all participants took the \level{Occlusion} rendering to be a \enquote{shadow} of the real hand on the cube.
The \level{Tips} rendering, which showed the contacts made on the virtual cube, was controversial as it received the minimum and the maximum score on every question.
Many participants reported difficulties in seeing the orientation of the visual fingers,
while others found that it gave them a better sense of the contact points and improved their concentration on the task.
This result is consistent with \textcite{saito2021contact}, who found that displaying the points of contact was more beneficial for grasping a virtual object than an opaque visual hand overlay.
To summarize, when a visual hand rendering overlaid the real hand, participants performed better and were more confident when manipulating virtual objects with bare hands in \AR.
These results contrast with similar manipulation studies in non-immersive, on-screen \AR, where the presence of a visual hand rendering was found by participants to improve the usability of the interaction, but not their performance \cite{blaga2017usability,maisto2017evaluation,meli2018combining}.
Our results show the most effective visual hand rendering to be the \level{Skeleton} one.
Participants appreciated that it provided a detailed and precise view of the tracking of the real hand, without hiding or masking it.
Although the \level{Contour} and \level{Mesh} hand renderings were also highly rated, some participants felt that they were too visible and masked the real hand.
This result is in line with the results of virtual object manipulation in \VR of \textcite{prachyabrued2014visual}, who found that the most effective visual hand rendering was a double representation of both the real tracked hand and a visual hand physically constrained by the \VE.
This type of \level{Skeleton} rendering was also the one that provided the best sense of agency (control) in \VR \cite{argelaguet2016role,schwind2018touch}.
\section{Conclusion}
\label{conclusion}
In this chapter, we addressed the challenge of touching, grasping and manipulating virtual objects directly with the hand in immersive \OST-\AR by providing and evaluating visual renderings as augmentation of the real hand.
Superimposed on the user's hand, these visual renderings provide feedback from the virtual hand, which tracks the real hand and simulates the interaction with virtual objects as a proxy.
We first selected and compared the six most popular visual hand renderings used to interact with virtual objects in \AR.
Then, in a user study with 24 participants and an immersive \OST-\AR headset, we evaluated the effect of these six visual hand renderings on the user performance and experience in two representative manipulation tasks.
Our results showed that a visual hand augmentation improved the performance, perceived effectiveness and confidence of participants compared to no augmentation.
A skeleton rendering, which provided a detailed view of the tracked joints and phalanges without hiding the real hand, was the most effective and yielded the best performance.
The contour and mesh renderings were found by some participants to mask the real hand, while the tips rendering was controversial.
The occlusion rendering had too much tracking latency to be effective.
This is consistent with similar manipulation studies in \VR and in non-immersive \VST-\AR setups.
This study suggests that a \ThreeD visual hand augmentation is important in \AR when interacting with a virtual hand technique, particularly when it involves precise finger movements in relation to virtual content, \eg \ThreeD windows, buttons and sliders, or more complex tasks, such as stacking or assembly.
A minimal but detailed rendering of the virtual hand that does not hide the real hand, such as the skeleton rendering we evaluated, seems to be the best compromise between the richness and effectiveness of the feedback.
%Still, users should be able to choose and adapt the visual hand rendering to their preferences and needs.
\noindentskip This work was published in Transactions on Haptics:
Erwan Normand, Claudio Pacchierotti, Eric Marchand, and Maud Marchal.
\enquote{Visuo-Haptic Rendering of the Hand during 3D Manipulation in Augmented Reality}.
In: \textit{IEEE Transactions on Haptics}. 27.4 (2024), pp. 2481--2487.
\chapter{Visual Augmentation of the Hand for Manipulating Virtual Objects in AR}
\mainlabel{visual_hand}
\chaptertoc
\input{1-introduction}
\input{2-method}
\input{3-0-results}
\input{3-1-push}
\input{3-2-grasp}
\input{3-3-ranks}
\input{3-4-questions}
\input{4-discussion}
\input{5-conclusion}