In this paper, we investigate the role of the visuo-haptic rendering of the hand
We consider two representative manipulation tasks: push-and-slide and grasp-and-place a virtual object.
%
The main contributions of this work are:
%
\begin{itemize}
\item A comparison from the literature of the six most common visual hand renderings used in \AR.
\item A user study with 24 participants evaluating the performance and user experience of the six visual hand renderings superimposed on the real hand during free and direct hand manipulation of \VOs in \OST-\AR.
\end{itemize}
In the next sections, we first present the six visual hand renderings considered in this study and gathered from the literature. We then describe the experimental setup and design, the two manipulation tasks, and the metrics used. Finally, we present the results of the user study and discuss their implications for the manipulation of \VOs directly with the hand in \AR.

\begin{subfigs}{hands}{The six visual hand renderings.}[
As seen by the user through the \AR headset during the two-finger grasping of a virtual cube.
][
\item No visual rendering \level{(None)}.
\item Cropped virtual content to enable hand-cube occlusion \level{(Occlusion, Occl)}.
\item Rings on the fingertips \level{(Tips)}.
\item Thin outline of the hand \level{(Contour, Cont)}.
\item Fingers' joints and phalanges \level{(Skeleton, Skel)}.
\item Semi-transparent 3D hand model \level{(Mesh)}.
]
\subfig[0.22]{method/hands-none}
\subfig[0.22]{method/hands-occlusion}
\subfig[0.22]{method/hands-tips}
\par
\subfig[0.22]{method/hands-contour}
\subfig[0.22]{method/hands-skeleton}
\subfig[0.22]{method/hands-mesh}
\end{subfigs}
\section{User Study}
\label{method}

This first experiment aims to analyze whether the chosen visual hand rendering affects the performance and user experience of manipulating virtual objects with bare hands in \AR.

\subsection{Visual Hand Renderings}
\label{hands}

We compared a set of the most popular visual hand renderings.
%
Since we address hand-centered manipulation tasks, we only considered renderings including the fingertips.
%
Moreover, to keep the focus on the hand rendering itself, we used neutral semi-transparent grey meshes, consistent with the choices made in \cite{yoon2020evaluating, vanveldhuizen2021effect}.
%
All considered hand renderings are drawn following the tracked pose of the user's real hand.
%
However, while the real hand can of course penetrate virtual objects, the visual hand is always constrained by the virtual environment.
\subsubsection{None~(\figref{method/hands-none})}
\label{hands_none}

As a reference, we considered no visual hand rendering, as is common in \AR \cite{hettiarachchi2016annexing, blaga2017usability, xiao2018mrtouch, teng2021touch}.
%
Users have no information about hand tracking and no feedback about contact with the virtual objects, other than their movement when touched.
%
As virtual content is rendered on top of the real environment, the hand of the user can be hidden by the virtual objects when manipulating them (\secref{hands}).
\subsubsection{Occlusion (Occl,~\figref{method/hands-occlusion})}
\label{hands_occlusion}

To avoid the abovementioned undesired occlusions due to the virtual content being rendered on top of the real environment, we can carefully crop the former whenever it hides real content that should be visible \cite{macedo2023occlusion}, \eg the thumb of the user in \figref{method/hands-occlusion}.
%
This approach is frequent in works using \VST-\AR headsets \cite{knorlein2009influence, ha2014wearhand, piumsomboon2014graspshell, suzuki2014grasping, al-kalbani2016analysis}.
\subsubsection{Tips (\figref{method/hands-tips})}
\label{hands_tips}

This rendering shows small visual rings around the fingertips of the user, highlighting the most important parts of the hand and contact with virtual objects during fine manipulation.
%
Unlike work using small spheres \cite{maisto2017evaluation, meli2014wearable, grubert2018effects, normand2018enlarging, schwind2018touch}, this ring rendering also provides information about the orientation of the fingertips.
\subsubsection{Contour (Cont,~\figref{method/hands-contour})}
\label{hands_contour}

This rendering is a \qty{1}{\mm} thick outline contouring the user's hands, providing information about the whole hand while leaving its inside visible.
%
Unlike the other renderings, it is not occluded by the virtual objects, as shown in \figref{method/hands-contour}.
%
This rendering is less common in the literature than the previous ones \cite{kang2020comparative}.
\subsubsection{Skeleton (Skel,~\figref{method/hands-skeleton})}
\label{hands_skeleton}

This rendering schematically depicts the joints and phalanges of the fingers with small spheres and cylinders, respectively, leaving the outside of the hand visible.
%
It can be seen as an extension of the Tips rendering to include the complete finger articulations.
%
It is widely used in \VR \cite{argelaguet2016role, schwind2018touch, chessa2019grasping} and \AR \cite{blaga2017usability, yoon2020evaluating}, as it is considered simple yet rich and comprehensive.
\subsubsection{Mesh (\figref{method/hands-mesh})}
\label{hands_mesh}

This rendering is a 3D semi-transparent ($\alpha=0.2$) hand model, which is common in \VR \cite{prachyabrued2014visual, argelaguet2016role, schwind2018touch, chessa2019grasping, yoon2020evaluating, vanveldhuizen2021effect}.
%
It can be seen as a filled version of the Contour hand rendering, thus partially covering the view of the real hand.
\subsection{Manipulation Tasks and Virtual Scene}
\label{tasks}

Following the guidelines of \textcite{bergstrom2021how} for designing object manipulation tasks, we considered two variations of a 3D pick-and-place task, commonly found in interaction and manipulation studies \cite{prachyabrued2014visual,maisto2017evaluation,meli2018combining,blaga2017usability,vanveldhuizen2021effect}.

\subsubsection{Push Task}
\label{push-task}

The first manipulation task consists in pushing a virtual object along a real flat surface towards a target placed on the same plane (\figref{method/task-push}).
%
The virtual object to manipulate is a small blue opaque cube of \qty{50}{\mm} edge, while the target is a slightly bigger blue semi-transparent volume of \qty{70}{\mm} edge.
%
At every repetition of the task, the cube to manipulate always spawns at the same place, on top of a real table in front of the user.
%
On the other hand, the target volume can spawn in eight different locations on the same table, located on a \qty{20}{\cm} radius circle centered on the cube, at \qty{45}{\degree} from each other (again \figref{method/task-push}).
%
Users are asked to push the cube towards the target volume using their fingertips in any way they prefer.
%
In this task, the cube cannot be lifted.
%
The task is considered completed when the cube is \emph{fully} inside the target volume.
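For illustration, the eight target locations and the completion test can be sketched as follows. This is a simplified sketch with names of our own choosing, and it assumes an axis-aligned cube; the actual check must also account for the cube's rotation:

```python
import math

CUBE_EDGE = 0.05    # 50 mm cube to manipulate
TARGET_EDGE = 0.07  # 70 mm semi-transparent target volume
RADIUS = 0.20       # targets lie on a 20 cm radius circle around the spawn point

def target_positions(center=(0.0, 0.0)):
    """Eight possible target locations, 45 degrees apart on the circle."""
    return [(center[0] + RADIUS * math.cos(math.radians(a)),
             center[1] + RADIUS * math.sin(math.radians(a)))
            for a in range(0, 360, 45)]

def fully_inside(cube_center, target_center):
    """The cube is *fully* inside the target volume iff, on every axis, its
    center is within half the edge difference of the target center."""
    margin = (TARGET_EDGE - CUBE_EDGE) / 2
    return all(abs(c - t) <= margin for c, t in zip(cube_center, target_center))
```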

\subsubsection{Grasp Task}
\label{grasp-task}

The second manipulation task consists in grasping, lifting, and placing a virtual object in a target placed on a different (higher) plane (\figref{method/task-grasp}).
%
The cube to manipulate and target volume are the same as in the previous task.
%
However, this time, the target volume can spawn in eight different locations on a plane \qty{10}{\cm} \emph{above} the table, still located on a \qty{20}{\cm} radius circle at \qty{45}{\degree} from each other.
%
Users are asked to grasp, lift, and move the cube towards the target volume using their fingertips in any way they prefer.
%
As before, the task is considered completed when the cube is \emph{fully} inside the volume.

\begin{subfigs}{tasks}{The two manipulation tasks of the user study.}[
The cube to manipulate is in the middle of the table (\qty{5}{\cm} edge, opaque) and the eight possible targets to reach are around it (\qty{7}{\cm} edge, semi-transparent volumes).
Only one target at a time was shown during the experiments.
][
\item Push task: pushing the virtual cube along a table towards a target placed on the same surface.
\item Grasp task: grasping and lifting the virtual cube towards a target placed on a \qty{20}{\cm} higher plane.
]
\subfig[0.4]{method/task-push}
\subfig[0.4]{method/task-grasp}
\end{subfigs}

\subsection{Experimental Design}
\label{design}

We analyzed the two tasks separately.
%
For each of them, we considered two independent within-subject variables:
\begin{itemize}
\item \factor{Hand}, consisting of the six possible visual hand renderings discussed in \secref{hands}: \level{None}, \level{Occlusion} (Occl), \level{Tips}, \level{Contour} (Cont), \level{Skeleton} (Skel), and \level{Mesh}.
\item \factor{Target}, consisting of the eight possible locations of the target volume, named relative to the user as shown in \figref{tasks}: right (\level{R}), right-back (\level{RB}), back (\level{B}), left-back (\level{LB}), left (\level{L}), left-front (\level{LF}), front (\level{F}), and right-front (\level{RF}).
\end{itemize}
%
Each condition was repeated three times.
%
To control learning effects, we counterbalanced the orders of the two manipulation tasks and of the visual hand renderings following a 6 \x 6 Latin square, leading to six blocks in which the position of the target volume was randomized.
%
This design led to a total of 2 manipulation tasks \x 6 visual hand renderings \x 8 targets \x 3 repetitions $=$ 288 trials per participant.
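As a sketch, the factorial design and the rendering-order counterbalancing can be reproduced as follows, using the standard balanced Latin square construction for an even number of conditions (variable names are ours, not the study's code):

```python
from itertools import product

TASKS = ["push", "grasp"]
HANDS = ["None", "Occlusion", "Tips", "Contour", "Skeleton", "Mesh"]
TARGETS = ["R", "RB", "B", "LB", "L", "LF", "F", "RF"]
REPETITIONS = 3

# Full factorial: 2 tasks x 6 renderings x 8 targets x 3 repetitions = 288 trials.
trials = list(product(TASKS, HANDS, TARGETS, range(REPETITIONS)))

def balanced_latin_square(n):
    """Rows of a balanced Latin square for an even n: the offset pattern
    0, 1, n-1, 2, n-2, ... shifted by the row index, so each condition
    appears once in every position across the n orders."""
    seq, lo, hi = [0], 1, n - 1
    while len(seq) < n:
        seq.append(lo); lo += 1
        if len(seq) < n:
            seq.append(hi); hi -= 1
    return [[(row + off) % n for off in seq] for row in range(n)]

orders = balanced_latin_square(len(HANDS))  # six blocks of rendering orders
```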

\subsection{Apparatus and Implementation}
\label{apparatus}

We used the \OST-\AR headset HoloLens~2, as described in \secref[vhar_system]{virtual_real_alignment}.
%
It is also able to track the user's fingers.
%
We measured the latency of the hand tracking at \qty{15}{\ms}, independent of the hand movement speed.

The implementation of our experiment was done in C\# using Unity 2022.1, PhysX 4.1, and the Mixed Reality Toolkit (MRTK) 2.8\footnoteurl{https://learn.microsoft.com/windows/mixed-reality/mrtk-unity}.
%
The compiled application ran directly on the HoloLens~2 at \qty{60}{FPS}.

The default 3D hand model from MRTK was used for all visual hand renderings.
%
By changing the material properties of this hand model, we were able to achieve the six renderings shown in \figref{hands}.
%
A calibration was performed for every participant, to best adapt the size of the visual hand rendering to their real hand.
%
A set of empirical tests enabled us to choose the best rendering characteristics in terms of transparency and brightness for the virtual objects and hand renderings, which were applied throughout the experiment.

The hand tracking information provided by MRTK was used to construct a virtual articulated physics-enabled hand (\secref[related_work]{ar_virtual_hands}) using PhysX.
%
It featured 25 DoFs, including the fingers' proximal, middle, and distal phalanges.
%
To allow effective (and stable) physical interactions between the hand and the virtual cube to manipulate, we implemented an approach similar to that of \textcite{borst2006spring}, where a series of virtual springs with high stiffness are used to couple the physics-enabled hand with the tracked hand.
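This coupling can be illustrated with a one-dimensional spring-damper sketch, applied per axis; the stiffness, damping, mass, and time step below are illustrative placeholders, not the values tuned in the study:

```python
def coupling_force(tracked, proxy, proxy_vel, k=50.0, c=5.0):
    """Spring-damper force pulling the physics-enabled proxy hand
    towards the tracked hand pose (illustrative gains)."""
    return k * (tracked - proxy) - c * proxy_vel

def step(proxy, proxy_vel, tracked, mass=0.1, dt=1 / 60):
    """One explicit-Euler physics step of the proxy along one axis."""
    acc = coupling_force(tracked, proxy, proxy_vel) / mass
    proxy_vel += acc * dt
    proxy += proxy_vel * dt
    return proxy, proxy_vel

# The proxy converges to the tracked position instead of teleporting,
# letting the physics engine resolve contacts with virtual objects.
pos, vel = 0.0, 0.0
for _ in range(600):  # 10 s at 60 Hz
    pos, vel = step(pos, vel, tracked=0.3)
```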
%
As before, a set of empirical tests enabled us to select the most effective physical characteristics in terms of mass, elastic constant, friction, damping, and collider size and shape for the (tracked) virtual hand interaction model.

The room where the experiment was held had no windows, with one light source of \qty{800}{\lumen} placed \qty{70}{\cm} above the table.
%
This setup enabled a good and consistent tracking of the user's fingers.

\subsection{Protocol}
\label{protocol}

First, participants were given a consent form that briefed them about the tasks and the protocol of the experiment.
%
Then, participants were asked to comfortably sit in front of a table and wear the HoloLens~2 headset as shown in~\figref{tasks}, perform the calibration of the visual hand size as described in~\secref{apparatus}, and complete a \qty{2}{min} training to familiarize themselves with the \AR rendering and the two tasks.
%
During this training, we did not use any of the six hand renderings under evaluation, but rather a fully opaque white hand rendering that completely occluded the user's real hand.

Participants were asked to carry out the two tasks as naturally and as fast as possible.
%
Similarly to \cite{prachyabrued2014visual, maisto2017evaluation, blaga2017usability, vanveldhuizen2021effect}, we only allowed the use of the dominant hand.
%
The experiment took around 1 hour and 20 minutes to complete.

\subsection{Participants}
\label{participants}

Twenty-four subjects participated in the study (eight aged between 18 and 24, fourteen aged between 25 and 34, and two aged between 35 and 44; 22~males, 1~female, 1~preferred not to say).
%
None of the participants reported any deficiencies in their visual perception abilities.
%
Two subjects were left-handed, while the other twenty-two were right-handed; all used their dominant hand during the trials.
%
Ten subjects had significant experience with \VR (\enquote{I use it every week}), while the other fourteen reported little to no experience with \VR.
%
Two subjects had significant experience with \AR (\enquote{I use it every week}), while the other twenty-two reported little to no experience with \AR.
%
Participants signed an informed consent form, including a declaration of having no conflict of interest.

\subsection{Collected Data}
\label{metrics}

Inspired by \textcite{laviolajr20173d}, we collected the following metrics during the experiment:
\begin{itemize}
\item \response{Completion Time}, defined as the time elapsed between the very first contact with the virtual cube and its correct placement inside the target volume; as subjects were asked to complete the tasks as fast as possible, lower completion times mean better performance.
\item \response{Contacts}, defined as the number of separate times the user's hand makes contact with the virtual cube; in both tasks, a lower number of contacts means a smoother, more continuous interaction with the object.
\item \response{Time per Contact}, defined as the total time any part of the user's hand contacted the cube divided by the number of contacts; higher values mean that the user interacted with the object for longer uninterrupted periods of time.
\item \response{Grip Aperture} (solely for the grasp-and-place task), defined as the average distance between the thumb's fingertip and the other fingertips during the grasping of the cube; lower values indicate a greater finger interpenetration with the cube, resulting in a greater discrepancy between the real hand and the visual hand rendering constrained to the cube surfaces, and showing how confident users are in their grasp \cite{prachyabrued2014visual, al-kalbani2016analysis, blaga2017usability, chessa2019grasping}.
\end{itemize}
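Assuming a per-trial log of timestamped contact intervals, these metrics can be computed as in the following sketch (the function names and log format are ours, not the study's code):

```python
import math

def trial_metrics(contacts, placed_at):
    """contacts: list of (start, end) times, in seconds, during which any
    part of the hand touched the cube; placed_at: time at which the cube
    was fully inside the target volume."""
    completion_time = placed_at - contacts[0][0]   # from the very first contact
    n_contacts = len(contacts)
    time_per_contact = sum(end - start for start, end in contacts) / n_contacts
    return completion_time, n_contacts, time_per_contact

def grip_aperture(samples):
    """Mean distance between the thumb tip and the other fingertips over
    grasp samples; each sample is (thumb_xyz, [fingertip_xyz, ...])."""
    dists = [math.dist(thumb, tip) for thumb, tips in samples for tip in tips]
    return sum(dists) / len(dists)
```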
Taken together, these measures provide an overview of the performance and usability of each visual hand rendering tested, as we hypothesized that they would influence the behavior and effectiveness of the participants.

At the end of each task, participants were asked to rank the visual hand renderings according to their preference with respect to the considered task.
%
Participants also rated each visual hand rendering individually on the following questions, using a 7-point Likert scale (1 = Not at all, 7 = Extremely):
\begin{itemize}
\item \response{Difficulty}: How difficult were the tasks?
\item \response{Fatigue}: How fatiguing (mentally and physically) were the tasks?
\item \response{Precision}: How precise were you in performing the tasks?
\item \response{Efficiency}: How fast/efficient do you think you were in performing the tasks?
\item \response{Rating}: How much do you like each visual hand rendering?
\end{itemize}

Finally, participants were encouraged to comment out loud on the conditions throughout the experiment, as well as in an open-ended question at its end, to gather additional qualitative information.
\section{Results}
\label{results}

Results of each trial metric were analyzed with an \ANOVA on a \LMM, with the order of the two manipulation tasks and of the six visual hand renderings (\factor{Order}), the visual hand rendering (\factor{Hand}), the target volume position (\factor{Target}), and their interactions as fixed effects, and the \factor{Participant} as random intercept.
%
For every \LMM, residuals were tested with a Q-Q plot to confirm normality.
%
On statistically significant effects, estimated marginal means of the \LMM were compared pairwise using Tukey's \HSD test.
%
Only statistically significant results are reported.

Because the \response{Completion Time}, \response{Contacts}, and \response{Time per Contact} measures were Gamma distributed, they were first log-transformed to approximate a normal distribution.
%
Their analysis results are reported anti-logged, corresponding to geometric means of the measures.
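Concretely, anti-logging the mean of the logged values yields the geometric mean of the original measures; a minimal check:

```python
import math

def geometric_mean(xs):
    """exp of the mean of logs: the value reported for log-transformed measures."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))
```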

\subsubsection{Completion Time}
\label{push_tct}

On the time to complete a trial, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{24.8}, \pinf{0.001}, see \figref{results/Push-ContactsCount-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{5.9}, \pinf{0.001}).
%
\level{Skeleton} was the fastest, more than \level{None} (\percent{+18}, \p{0.005}), \level{Occlusion} (\percent{+26}, \pinf{0.001}), \level{Tips} (\percent{+22}, \pinf{0.001}), and \level{Contour} (\percent{+20}, \p{0.001}).

Three groups of target volumes were identified:
%
(1) the side targets \level{R}, \level{L}, and \level{LF} were the fastest;
(2) the back and front targets \level{RB}, \level{F}, and \level{RF} were slower (\p{0.003});
and (3) the back targets \level{B} and \level{LB} were the slowest (\p{0.04}).

\subsubsection{Contacts}
\label{push_contacts_count}

On the number of contacts, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{6.7}, \pinf{0.001}, see \figref{results/Push-ContactsCount-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{27.8}, \pinf{0.001}).

Fewer contacts were made with \level{Skeleton} than with \level{None} (\percent{-23}, \pinf{0.001}), \level{Occlusion} (\percent{-26}, \pinf{0.001}), \level{Tips} (\percent{-18}, \p{0.004}), and \level{Contour} (\percent{-15}, \p{0.02});
and fewer with \level{Mesh} than with \level{Occlusion} (\percent{-14}, \p{0.04}).
This measure indicates how effective a visual hand rendering is: a lower count reflects a smoother interaction, pushing and rotating the cube into the target as one would with a real cube.
%
Targets on the left (\level{L}, \level{LF}) and the right (\level{R}) were easier to reach than the back ones (\level{B}, \level{LB}, \pinf{0.001}).

\subsubsection{Time per Contact}
\label{push_time_per_contact}

On the mean time spent on each contact, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{8.4}, \pinf{0.001}, see \figref{results/Push-MeanContactTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{19.4}, \pinf{0.001}).

It was shorter with \level{None} than with \level{Skeleton} (\percent{-10}, \pinf{0.001}) and \level{Mesh} (\percent{-8}, \p{0.03});
and shorter with \level{Occlusion} than with \level{Tips} (\percent{-10}, \p{0.002}), \level{Contour} (\percent{-10}, \p{0.001}), \level{Skeleton} (\percent{-14}, \p{0.001}), and \level{Mesh} (\percent{-12}, \p{0.03}).
This result suggests that users pushed the virtual cube with more confidence when a visual hand rendering was visible.
%
In contrast, the lack of a visual hand forced participants to pay more attention to the cube's reactions.
%
Targets on the left (\level{L}, \level{LF}) and the right (\level{R}) sides had a higher \response{Time per Contact} than all the other targets (\p{0.005}).

\begin{subfigs}{push_results}{Results of the push task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap 95~\% \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Time spent on each contact.
]
\subfig[0.32]{results/Push-CompletionTime-Hand-Overall-Means}
\subfig[0.32]{results/Push-ContactsCount-Hand-Overall-Means}
\subfig[0.32]{results/Push-MeanContactTime-Hand-Overall-Means}
\end{subfigs}

\label{grasp_tct}

On the time to complete a trial, there was one statistically significant effect %
of \factor{Target} (\anova{7}{2868}{37.2}, \pinf{0.001}) %
but not of \factor{Hand} (\anova{5}{2868}{1.8}, \p{0.1}, see \figref{results/Grasp-CompletionTime-Hand-Overall-Means}).
Targets on the back and the left (\level{B}, \level{LB}, and \level{L}) were slower than targets on the front (\level{LF}, \level{F}, and \level{RF}, \p{0.003}), except for \level{RB} (back-right), which was also fast.

\subsubsection{Contacts}
\label{grasp_contacts_count}

On the number of contacts, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{5.2}, \pinf{0.001}, see \figref{results/Grasp-ContactsCount-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{21.2}, \pinf{0.001}).

Fewer contacts were made with \level{Tips} than with \level{None} (\percent{-13}, \p{0.02}) and \level{Occlusion} (\percent{-15}, \p{0.004});
and fewer with \level{Mesh} than with \level{None} (\percent{-15}, \p{0.006}) and \level{Occlusion} (\percent{-17}, \p{0.001}).
This result suggests that having no visible visual hand increased the number of failed grasps or cube drops.
Surprisingly, however, only \level{Tips} and \level{Mesh} were statistically significantly better, not \level{Contour} or \level{Skeleton}.

Targets on the back and left (\level{B}, \level{LB}, and \level{L}) were more difficult than targets on the front (\level{LF}, \level{F}, and \level{RF}, \pinf{0.001}).

\subsubsection{Time per Contact}
\label{grasp_time_per_contact}

On the mean time spent on each contact, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{9.6}, \pinf{0.001}, see \figref{results/Grasp-MeanContactTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{5.6}, \pinf{0.001}).

It was shorter with \level{None} than with \level{Tips} (\percent{-15}, \pinf{0.001}), \level{Skeleton} (\percent{-11}, \p{0.001}), and \level{Mesh} (\percent{-11}, \p{0.001});
shorter with \level{Occlusion} than with \level{Tips} (\percent{-10}, \pinf{0.001}), \level{Skeleton} (\percent{-8}, \p{0.05}), and \level{Mesh} (\percent{-8}, \p{0.04});
and shorter with \level{Contour} than with \level{Tips} (\percent{-8}, \pinf{0.001}).
As for the \factor{Push} task, the lack of a visual hand increased the number of failed grasps or cube drops.
The \level{Tips} rendering seemed to provide some of the best feedback for grasping, possibly because it conveys both the position and the rotation of the tracked fingertips.

This time was shorter on the front \level{F} target than on the other target volumes (\pinf{0.001}).

\subsubsection{Grip Aperture}
\label{grasp_grip_aperture}

On the average distance between the thumb's fingertip and the other fingertips during grasping, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{35.8}, \pinf{0.001}, see \figref{results/Grasp-GripAperture-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{3.7}, \pinf{0.001}).

It was shorter with \level{None} than with \level{Occlusion} (\pinf{0.001}), \level{Tips} (\pinf{0.001}), \level{Contour} (\pinf{0.001}), \level{Skeleton} (\pinf{0.001}), and \level{Mesh} (\pinf{0.001});
shorter with \level{Tips} than with \level{Occlusion} (\p{0.008}), \level{Contour} (\p{0.006}), and \level{Mesh} (\pinf{0.001});
and shorter with \level{Skeleton} than with \level{Mesh} (\pinf{0.001}).
This result is evidence of participants' lack of confidence with no visual hand rendering: they grasped the cube more tightly to secure it.
The \level{Mesh} rendering seemed to have provided the most confidence to participants, maybe because it was the closest to the real hand.

The \response{Grip Aperture} was longer on the right-front (\level{RF}) target volume, indicating a higher confidence, than on back and side targets (\level{R}, \level{RB}, \level{B}, \level{L}, \p{0.03}).
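The \response{Grip Aperture} metric described above (average distance between the thumb's fingertip and the other fingertips during grasping) can be sketched in a few lines; this is an illustrative reconstruction, not the study's actual implementation, and the function name and coordinates are hypothetical.

```python
import numpy as np

def grip_aperture(thumb_tip, other_tips):
    """Mean distance between the thumb tip and the other fingertips.

    thumb_tip: (3,) position; other_tips: (n, 3) positions of the
    fingertips involved in the grasp. Illustrative sketch only.
    """
    other_tips = np.atleast_2d(other_tips)
    # Euclidean distance from the thumb to each fingertip, then averaged.
    return float(np.linalg.norm(other_tips - thumb_tip, axis=1).mean())

# Two-finger grasp of a cube ~4 cm wide: thumb and index on opposite faces.
aperture = grip_aperture(np.array([0.0, 0.0, 0.0]),
                         np.array([[0.04, 0.0, 0.0]]))
```

Averaging this quantity over the grasp duration would then give the per-trial value compared across renderings.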

\begin{subfigs}{grasp_results}{Results of the grasp task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap 95~\% \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Time spent on each contact.
\item Distance between the thumb and the other fingertips when grasping.
]
\subfig[0.4]{results/Grasp-CompletionTime-Hand-Overall-Means}
\subfig[0.4]{results/Grasp-ContactsCount-Hand-Overall-Means}
\par
\subfig[0.4]{results/Grasp-MeanContactTime-Hand-Overall-Means}
\subfig[0.4]{results/Grasp-GripAperture-Hand-Overall-Means}
\end{subfigs}

\subsection{Ranking}
\label{ranks}

\figref{results_ranks} shows the ranking of each visual \factor{Hand} rendering for the \factor{Push} and \factor{Grasp} tasks.
Friedman tests indicated that both rankings had statistically significant differences (\pinf{0.001}).
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment were then used on both ranking results (\secref{metrics}):

\begin{itemize}
\item \response{Push task ranking}: \level{Occlusion} was ranked lower than \level{Contour} (\p{0.005}), \level{Skeleton} (\p{0.02}), and \level{Mesh} (\p{0.03});
\level{Tips} was ranked lower than \level{Skeleton} (\p{0.02}).
This good ranking of the \level{Skeleton} rendering for the \factor{Push} task is consistent with the \factor{Push} trial results.
\item \response{Grasp task ranking}: \level{Occlusion} was ranked lower than \level{Contour} (\p{0.001}), \level{Skeleton} (\p{0.001}), and \level{Mesh} (\p{0.007});
\level{None} was ranked lower than \level{Skeleton} (\p{0.04}).
A complete visual hand rendering seemed to be preferred over no visual hand rendering when grasping.
\end{itemize}

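The ranking analysis above (Friedman omnibus test, then pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment) can be sketched as follows. This is a minimal reconstruction assuming numpy/scipy, with hypothetical rank data, not the authors' analysis code; the Holm step-down is implemented by hand.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjustment of raw p-values."""
    m = len(pvals)
    order = np.argsort(pvals)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k), enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Hypothetical ranks from 24 participants over 3 renderings (lower is better):
# the third rendering is always ranked last here.
ranks = np.array([[1.0, 2.0, 3.0]] * 16 + [[2.0, 1.0, 3.0]] * 8)

stat, p = friedmanchisquare(*ranks.T)  # omnibus test across renderings
pairs = [(0, 1), (0, 2), (1, 2)]
raw = np.array([wilcoxon(ranks[:, a], ranks[:, b]).pvalue for a, b in pairs])
adj = holm_adjust(raw)  # adjusted pairwise p-values
```

Only pairs whose adjusted p-value stays below the threshold would be reported as significant.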

\begin{subfigs}{results_ranks}{Boxplots of the ranking for each visual hand rendering.}[
Lower is better.
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
][
\item Push task ranking.
\item Grasp task ranking.
]
\subfig[0.4]{results/Ranks-Push}
\subfig[0.4]{results/Ranks-Grasp}
\end{subfigs}

\subsection{Questionnaire}
\label{questions}

\figref{results_questions} presents the questionnaire results for each visual hand rendering.
Friedman tests indicated that all questions had statistically significant differences (\pinf{0.001}).
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment were then used on each question's results (\secref{metrics}):
\begin{itemize}
\item \response{Difficulty}: \level{Occlusion} was considered more difficult than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.01}), and \level{Mesh} (\p{0.03}).
\item \response{Fatigue}: \level{None} was found more fatiguing than \level{Mesh} (\p{0.04}); and \level{Occlusion} more than \level{Skeleton} (\p{0.02}) and \level{Mesh} (\p{0.02}).
\item \response{Precision}: \level{None} was considered less precise than \level{Skeleton} (\p{0.02}) and \level{Mesh} (\p{0.02}); and \level{Occlusion} less precise than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.006}), and \level{Mesh} (\p{0.02}).
\item \response{Performance}: \level{Occlusion} was rated lower than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.006}), and \level{Mesh} (\p{0.03}).
\item \response{Efficiency}: \level{Occlusion} was found less efficient than \level{Contour} (\p{0.01}), \level{Skeleton} (\p{0.02}), and \level{Mesh} (\p{0.02}).
\item \response{Rating}: \level{Occlusion} was rated lower than \level{Contour} (\p{0.02}) and \level{Skeleton} (\p{0.03}).
\end{itemize}

In summary, \level{Occlusion} was worse than \level{Skeleton} for all questions, and worse than \level{Contour} and \level{Mesh} on 5 out of 6 questions.
Results of the \response{Difficulty}, \response{Performance}, and \response{Precision} questions are consistent in this regard.
Moreover, having no visible visual \factor{Hand} rendering was felt by users to be more fatiguing and less precise than having one.
Surprisingly, no clear consensus was found on \response{Rating}.
Each visual hand rendering, except for \level{Occlusion}, received both the minimum and maximum possible scores.

\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for each visual hand rendering.}[
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
Lower is better for \textbf{(a)} difficulty and \textbf{(b)} fatigue.
Higher is better for \textbf{(c)} performance, \textbf{(d)} precision, \textbf{(e)} efficiency, and \textbf{(f)} rating.
]
\subfig[0.4]{results/Question-Difficulty}
\subfig[0.4]{results/Question-Fatigue}
\par
\subfig[0.4]{results/Question-Performance}
\subfig[0.4]{results/Question-Precision}
\par
\subfig[0.4]{results/Question-Efficiency}
\subfig[0.4]{results/Question-Rating}
\end{subfigs}

\section{Results}
\label{results}

Results of each trial measure were analyzed with a \LMM, with the order of the two manipulation tasks and the six visual hand renderings (\factor{Order}), the visual hand renderings (\factor{Hand}), the target volume position (\factor{Target}), and their interactions as fixed effects, and the \factor{Participant} as a random intercept.
For every \LMM, residuals were inspected with a Q-Q plot to confirm normality.
On statistically significant effects, estimated marginal means of the \LMM were compared pairwise using Tukey's \HSD test.
Only significant results are reported.

Because the \response{Completion Time}, \response{Contacts}, and \response{Time per Contact} measures were Gamma distributed, they were first log-transformed to approximate a normal distribution.
Their analysis results are reported anti-logged, corresponding to geometric means of the measures.
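The log-transform and anti-log pipeline above, together with the \enquote{geometric means with bootstrap 95~\% \CI} reported in the figures, can be sketched as follows. This is a minimal reconstruction assuming numpy, with a hypothetical helper name and synthetic data, not the authors' actual analysis code.

```python
import numpy as np

def geometric_mean_ci(samples, n_boot=10_000, seed=0):
    """Geometric mean with a bootstrap 95% confidence interval.

    The positively skewed (Gamma-distributed) measures are
    log-transformed, averaged, and anti-logged, so the reported
    central tendency is a geometric mean.
    """
    rng = np.random.default_rng(seed)
    logs = np.log(np.asarray(samples, dtype=float))
    # Resample the log-values with replacement and average each resample.
    idx = rng.integers(0, len(logs), size=(n_boot, len(logs)))
    boot_means = logs[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return np.exp(logs.mean()), (np.exp(lo), np.exp(hi))

# Synthetic Gamma-distributed completion times (seconds) as a stand-in.
times = np.random.default_rng(1).gamma(shape=4.0, scale=1.5, size=200)
gm, (lo, hi) = geometric_mean_ci(times)
```

The mixed-model fitting itself (fixed effects plus a per-participant random intercept) would typically be done with a dedicated library such as statsmodels' `MixedLM` or R's `lme4`, on the log-transformed measures.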
We evaluated six visual hand renderings, as described in \secref{hands}, displayed on top of the real hand, in two virtual object manipulation tasks in \AR.

During the \factor{Push} task, the \level{Skeleton} hand rendering was the fastest (\figref{results/Push-CompletionTime-Hand-Overall-Means}), as participants employed fewer and longer contacts to adjust the cube inside the target volume (\figref{results/Push-ContactsCount-Hand-Overall-Means} and \figref{results/Push-MeanContactTime-Hand-Overall-Means}).
Participants consistently used few, continuous contacts for all visual hand renderings (\figref{results/Push-ContactsCount-Hand-Overall-Means}); fewer than ten trials, carried out by two participants, were instead quickly completed with multiple discrete touches.
However, during the \factor{Grasp} task, despite no difference in \response{Completion Time}, providing no visible hand rendering (\level{None} and \level{Occlusion} renderings) led to more failed grasps or cube drops (\figref{results/Grasp-CompletionTime-Hand-Overall-Means} and \figref{results/Grasp-MeanContactTime-Hand-Overall-Means}).
Indeed, participants found the \level{None} and \level{Occlusion} renderings less effective (\figref{results/Ranks-Grasp}) and less precise (\figref{results_questions}).
To understand whether the participants' previous experience might have played a role, we also carried out an additional statistical analysis considering \VR experience as an additional between-subjects factor, \ie \VR novices vs. \VR experts (\enquote{I use it every week}, see \secref{participants}).
%
We found no statistically significant differences when comparing the considered metrics between \VR novices and experts.

Interestingly, all visual hand renderings showed \response{Grip Apertures} very close to the size of the virtual cube, except for the \level{None} rendering (\figref{results/Grasp-GripAperture-Hand-Overall-Means}), with which participants applied stronger grasps, \ie less distance between the fingertips.
Having no visual hand rendering, but only the reaction of the cube to the interaction as feedback, made participants less confident in their grip.
%
This result contrasts with the wrongly estimated grip apertures observed by \textcite{al-kalbani2016analysis} in an exocentric VST-AR setup.
%
Also, while some participants found the absence of visual hand rendering more natural, many of them commented on the importance of having feedback on the tracking of their hands, as observed by \textcite{xiao2018mrtouch} in a similar immersive OST-AR setup.

Yet, participants' opinions of the visual hand renderings were mixed on many questions, except for the \level{Occlusion} one, which was perceived as less effective than more \enquote{complete} visual hands such as \level{Contour}, \level{Skeleton}, and \level{Mesh} (\figref{results_questions}).
However, due to the latency of the hand tracking and the visual hand reacting to the cube, almost all participants perceived the \level{Occlusion} rendering as a \enquote{shadow} of the real hand on the cube.

The \level{Tips} rendering, which showed the contacts made on the virtual cube, was controversial, as it received both the minimum and the maximum score on every question.
Many participants reported difficulties in seeing the orientation of the visual fingers,
while others found that it gave them a better sense of the contact points and improved their concentration on the task.
This result is consistent with \textcite{saito2021contact}, who found that displaying the points of contact was beneficial for grasping a virtual object over an opaque visual hand overlay.

To summarize, when employing a visual hand rendering overlaying the real hand, participants performed better and were more confident in manipulating virtual objects with bare hands in \AR.
%
These results contrast with similar manipulation studies, but in non-immersive, on-screen \AR, where the presence of a visual hand rendering was found by participants to improve the usability of the interaction, but not their performance \cite{blaga2017usability,maisto2017evaluation,meli2018combining}.

Our results show the most effective visual hand rendering to be the \level{Skeleton} one. Participants appreciated that it provided a detailed and precise view of the tracking of the real hand, without hiding or masking it.
Although the \level{Contour} and \level{Mesh} hand renderings were also highly rated, some participants felt that they were too visible and masked the real hand.
This result is in line with the results of virtual object manipulation in \VR of \textcite{prachyabrued2014visual}, who found that the most effective visual hand rendering was a double representation of both the real tracked hand and a visual hand physically constrained by the virtual environment.
%
This type of \level{Skeleton} rendering was also the one that provided the best sense of agency (control) in \VR \cite{argelaguet2016role, schwind2018touch}.

These results have, of course, some limitations, as they only address limited types of manipulation tasks and visual hand characteristics, evaluated in a specific OST-AR setup.
The two manipulation tasks were also limited to placing a virtual cube in predefined target volumes.
Testing a wider range of virtual objects and more ecological tasks, \eg stacking or assembly, as well as considering bimanual manipulation, will ensure a greater applicability of the results obtained in this work.
Similarly, a broader experimental study might shed light on the role of gender and age, as our subject pool was not sufficiently diverse in this respect.
However, we believe that the results presented here provide a rather interesting overview of the most promising approaches in \AR manipulation.

\label{conclusion}

This paper presented two human subject studies aimed at better understanding the role of visuo-haptic rendering of the hand during virtual object manipulation in OST-AR.
The first experiment compared six visual hand renderings in two representative manipulation tasks in \AR, \ie push-and-slide and grasp-and-place of a virtual object.
Results show that a visual hand rendering improved performance, perceived effectiveness, and user confidence.
A skeleton rendering, providing a detailed view of the tracked joints and phalanges while not hiding the real hand, was the most performant and effective.
\input{1-introduction}
\input{2-method}
\input{3-0-results}
\input{3-1-push}
\input{3-2-grasp}
\input{3-3-ranks}

\section*{Future Work}

\subsection*{Visual Rendering of the Hand for Manipulating Virtual Objects in Augmented Reality}

These results have, of course, some limitations, as they only address limited types of manipulation tasks and visual hand characteristics, evaluated in a specific \OST-\AR setup.
The two manipulation tasks were also limited to placing a virtual cube in predefined target volumes.
Testing a wider range of virtual objects and more ecological tasks, \eg stacking or assembly, as well as considering bimanual manipulation, will ensure a greater applicability of the results obtained in this work.
Similarly, a broader experimental study might shed light on the role of gender and age, as our subject pool was not sufficiently diverse in this respect.
However, we believe that the results presented here provide a rather interesting overview of the most promising approaches in \AR manipulation.

\section*{Perspectives}