\section{Visual Hand Augmentations}
\label{hands}
We compared a set of the most popular visual hand augmentations, as found in the literature \secref[related_work]{ar_visual_hands}.
Since we address hand-centered manipulation tasks, we only considered renderings including the fingertips (\secref[related_work]{grasp_types}).
Moreover, so as to keep the focus on the hand rendering itself, we used neutral semi-transparent grey meshes, consistent with the choices made in \cite{yoon2020evaluating,vanveldhuizen2021effect}.
All considered hand renderings are drawn following the tracked pose of the user's real hand.
However, while the real hand can of course penetrate virtual objects, the visual hand is always constrained by the \VE (\secref[related_work]{ar_virtual_hands}).
They are shown in \figref{hands} and described below, with their abbreviation given in parentheses when needed.
\paragraph{None}
As a reference, we considered no visual hand augmentation (\figref{method/hands-none}), as is common in \AR \cite{hettiarachchi2016annexing,blaga2017usability,xiao2018mrtouch,teng2021touch}.
Users have no information about hand tracking and no feedback about contact with the virtual objects, other than the objects' movement when touched.
As virtual content is rendered on top of the \RE, the hand of the user can be hidden by the virtual objects when manipulating them (\secref[related_work]{ar_displays}).
\paragraph{Occlusion (Occl)}
To avoid the aforementioned undesired occlusions caused by the virtual content being rendered on top of the \RE, the virtual content can be carefully cropped whenever it hides real elements that should remain visible \cite{macedo2023occlusion}, \eg the thumb of the user in \figref{method/hands-occlusion}.
This approach is frequent in works using \VST-\AR headsets \cite{knorlein2009influence,ha2014wearhand,piumsomboon2014graspshell,suzuki2014grasping,al-kalbani2016analysis}.
\paragraph{Tips}
This rendering shows small visual rings around the fingertips of the user (\figref{method/hands-tips}), highlighting the most important parts of the hand and contact with virtual objects during fine manipulation (\secref[related_work]{grasp_types}).
Unlike work using small spheres \cite{maisto2017evaluation,meli2014wearable,grubert2018effects,normand2018enlarging,schwind2018touch}, this ring rendering also provides information about the orientation of the fingertips.
\paragraph{Contour (Cont)}
This rendering is a \qty{1}{\mm} thick outline contouring the user's hands, providing information about the whole hand while leaving its inside visible.
Unlike the other renderings, it is not occluded by the virtual objects, as shown in \figref{method/hands-contour}.
This rendering is less common in the literature than the previous ones \cite{kang2020comparative}.
\paragraph{Skeleton (Skel)}
This rendering schematically depicts the finger joints and phalanges with small spheres and cylinders, respectively, leaving the outside of the hand visible (\figref{method/hands-skeleton}).
It can be seen as an extension of the Tips rendering that includes the complete finger articulations.
It is widely used in \VR \cite{argelaguet2016role,schwind2018touch,chessa2019grasping} and \AR \cite{blaga2017usability,yoon2020evaluating}, as it is considered simple yet rich and comprehensive.
\paragraph{Mesh}
This rendering is a \ThreeD semi-transparent ($\alpha=0.2$) hand model (\figref{method/hands-mesh}), which is common in \VR \cite{prachyabrued2014visual,argelaguet2016role,schwind2018touch,chessa2019grasping,yoon2020evaluating,vanveldhuizen2021effect}.
It can be seen as a filled version of the Contour hand rendering, thus partially covering the view of the real hand.
\section{User Study}
\label{method}
We aim to investigate whether the chosen visual feedback of the virtual hand affects the performance and user experience of manipulating virtual objects with free hands in \AR.
\subsection{Manipulation Tasks and Virtual Scene}
\label{tasks}
Following the guidelines of \textcite{bergstrom2021how} for designing object manipulation tasks, we considered two variations of a \ThreeD pick-and-place task, commonly found in interaction and manipulation studies \cite{prachyabrued2014visual,blaga2017usability,maisto2017evaluation,meli2018combining,vanveldhuizen2021effect}.
\subsubsection{Push Task}
\label{push-task}
The first manipulation task consists in pushing a virtual object along a real flat surface towards a target placed on the same plane (\figref{method/task-push}).
The virtual object to manipulate is a small blue opaque cube with a \qty{5}{\cm} edge, while the target is a slightly bigger blue semi-transparent volume with a \qty{7}{\cm} edge.
At every repetition of the task, the cube to manipulate always spawns at the same place, on top of a real table in front of the user.
On the other hand, the target volume can spawn in eight different locations on the same table, located on a \qty{20}{\cm} radius circle centered on the cube, at \qty{45}{\degree} from each other (\figref{method/task-push}).
Users are asked to push the cube towards the target volume using their fingertips in any way they prefer.
In this task, the cube cannot be lifted.
The task is considered completed when the cube is \emph{fully} inside the target volume.
\subsubsection{Grasp Task}
\label{grasp-task}
The second manipulation task consists in grasping, lifting, and placing a virtual object inside a target located on a different (higher) plane (\figref{method/task-grasp}).
The cube to manipulate and target volume are the same as in the previous task.
However, this time, the target volume can spawn in eight different locations on a plane \qty{10}{\cm} \emph{above} the table, still located on a \qty{20}{\cm} radius circle at \qty{45}{\degree} from each other.
Users are asked to grasp, lift, and move the cube towards the target volume using their fingertips in any way they prefer.
As before, the task is considered completed when the cube is \emph{fully} inside the volume.
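For illustration, the following minimal Python sketch reproduces the trial geometry described above; the function and variable names are hypothetical and not part of the actual Unity implementation, and the completion check assumes the cube and the target volume remain axis-aligned.
\begin{verbatim}
import numpy as np

CUBE_EDGE = 0.05    # cube to manipulate, 5 cm edge
TARGET_EDGE = 0.07  # target volume, 7 cm edge
RADIUS = 0.20       # circle of possible target locations, 20 cm

def target_positions(cube_center, height_offset=0.0):
    """Eight candidate target centers around the cube spawn position,
    every 45 degrees; height_offset is 0 for the Push task and
    0.10 for the Grasp task."""
    angles = np.deg2rad(np.arange(0, 360, 45))
    offsets = np.stack([RADIUS * np.cos(angles),
                        np.full(8, height_offset),
                        RADIUS * np.sin(angles)], axis=1)
    return np.asarray(cube_center) + offsets

def cube_fully_inside(cube_center, target_center):
    """Completion check: the cube lies entirely within the target
    volume (axis-aligned approximation; a rotated cube would need
    a corner-by-corner containment test)."""
    half_gap = (TARGET_EDGE - CUBE_EDGE) / 2.0
    return bool(np.all(np.abs(np.asarray(cube_center)
                              - np.asarray(target_center)) <= half_gap))
\end{verbatim}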
\begin{subfigs}{tasks}{The two manipulation tasks of the user study.}[
The cube to manipulate (\qty{5}{\cm} edge, opaque) is in the middle of the table, and the eight possible target volumes (\qty{7}{\cm} edge, semi-transparent) are around it.
Only one target at a time was shown during the experiments.
][
\item Push task: pushing the virtual cube along a table towards a target placed on the same surface.
\item Grasp task: grasping and lifting the virtual cube towards a target placed on a \qty{10}{\cm} higher plane.
]
\subfig[0.45]{method/task-push}
\subfig[0.45]{method/task-grasp}
\end{subfigs}
\subsection{Experimental Design}
\label{design}
We analyzed the two tasks separately.
For each of them, we considered two independent within-subject variables:
\begin{itemize}
\item \factor{Hand}, consisting of the six possible visual hand augmentations discussed in \secref{hands}: \level{None}, \level{Occlusion} (Occl), \level{Tips}, \level{Contour} (Cont), \level{Skeleton} (Skel), and \level{Mesh}.
\item \factor{Target}, consisting of the eight possible locations of the target volume, named from the participant's point of view and as shown in \figref{tasks}: right (\level{R}), right-back (\level{RB}), back (\level{B}), left-back (\level{LB}), left (\level{L}), left-front (\level{LF}), front (\level{F}) and right-front (\level{RF}).
\end{itemize}
Each condition was repeated three times.
To control for learning effects, we counter-balanced the order of the two manipulation tasks and ordered the six visual hand augmentations following a 6 \x 6 Latin square, leading to six blocks in which the position of the target volume was in turn randomized.
This design led to a total of 2 manipulation tasks \x 6 visual hand augmentations \x 8 targets \x 3 repetitions $=$ 288 trials per participant.
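The following Python sketch illustrates how such a per-participant trial list can be generated; the cyclic Latin-square row and the helper names are illustrative assumptions, not the exact experiment code.
\begin{verbatim}
import random

HANDS = ["None", "Occl", "Tips", "Cont", "Skel", "Mesh"]
TARGETS = ["R", "RB", "B", "LB", "L", "LF", "F", "RF"]
REPETITIONS = 3

def hand_order(participant):
    """Block order of the visual hand augmentations: one row of a
    (cyclic) 6x6 Latin square, selected by participant index."""
    shift = participant % len(HANDS)
    return HANDS[shift:] + HANDS[:shift]

def trials_for_task(participant):
    """One task: 6 blocks x (8 targets x 3 repetitions) = 144 trials."""
    trials = []
    for hand in hand_order(participant):
        block = TARGETS * REPETITIONS      # 24 trials per block
        random.shuffle(block)              # target order randomized
        trials.extend((hand, target) for target in block)
    return trials

# Two tasks, with their order counter-balanced across participants:
# 2 x 144 = 288 trials per participant.
\end{verbatim}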
\subsection{Apparatus}
\label{apparatus}
We used the \OST-\AR headset HoloLens~2, as described in \secref[vhar_system]{virtual_real_registration}.
It is also able to track the user's fingers.
We measured the latency of the hand tracking at \qty{15}{\ms}, independent of the hand movement speed.
The implementation of our experiment was done using Unity (v2022.1), PhysX (v4.1), and the Mixed Reality Toolkit (MRTK, v2.8).
The compiled application ran directly on the HoloLens~2 at \qty{60}{FPS}.
The default \ThreeD hand model from MRTK was used for all visual hand augmentations.
By changing the material properties of this hand model, we were able to achieve the six renderings shown in \figref{hands}.
A calibration was performed for every participant, to best adapt the size of the visual hand augmentation to their real hand.
A set of empirical tests enabled us to choose the best rendering characteristics in terms of transparency and brightness for the virtual objects and hand renderings, which were applied throughout the experiment.
The hand tracking information provided by MRTK was used to construct a virtual articulated physics-enabled hand (\secref[related_work]{ar_virtual_hands}) using PhysX.
It featured 25 \DoFs, including the proximal, middle, and distal phalanges of each finger.
To allow effective (and stable) physical interactions between the hand and the virtual cube to manipulate, we implemented an approach similar to that of \textcite{borst2006spring}, where a series of virtual springs with high stiffness are used to couple the physics-enabled hand with the tracked hand.
As before, a set of empirical tests was used to select the most effective physical characteristics in terms of mass, elastic constant, friction, damping, and collider size and shape for the (tracked) virtual hand interaction model.
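As a rough illustration of this spring coupling, the following Python sketch advances one physics-enabled body towards its tracked target with a damped spring; the stiffness, damping, and mass values are placeholders chosen to remain stable at a \qty{60}{\hertz} step (the actual implementation relies on PhysX with empirically tuned parameters).
\begin{verbatim}
STIFFNESS = 100.0   # N/m, placeholder value
DAMPING = 5.0       # N.s/m, placeholder value
MASS = 0.1          # kg per collider, placeholder value

def spring_step(phys_pos, phys_vel, tracked_pos, dt):
    """One semi-implicit Euler step of a damped spring pulling the
    physics-enabled body towards the tracked pose. The physics body
    collides with the virtual cube and stops at its surface, while
    the tracked hand may freely interpenetrate it."""
    force = STIFFNESS * (tracked_pos - phys_pos) - DAMPING * phys_vel
    phys_vel = phys_vel + (force / MASS) * dt
    phys_pos = phys_pos + phys_vel * dt
    return phys_pos, phys_vel
\end{verbatim}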
The room where the experiment was held had no windows, with one light source of \qty{800}{\lumen} placed \qty{70}{\cm} above the table.
This setup enabled good and consistent tracking of the user's fingers.
\subsection{Procedure}
\label{procedure}
First, participants were given a consent form that briefed them about the tasks and the procedure of the experiment.
Then, participants were asked to sit comfortably in front of a table and wear the HoloLens~2 headset as shown in~\figref{tasks}, perform the calibration of the visual hand size as described in~\secref{apparatus}, and complete a \qty{2}{min} training to familiarize themselves with the \AR rendering and the two considered tasks.
During this training, we did not use any of the six hand renderings under evaluation, but rather a fully opaque white hand rendering that completely occluded the real hand of the user.
Participants were asked to carry out the two tasks as naturally and as fast as possible.
Similarly to \cite{prachyabrued2014visual,maisto2017evaluation,blaga2017usability,vanveldhuizen2021effect}, we only allowed the use of the dominant hand.
The experiment took around 1 hour and 20 minutes to complete.
\subsection{Participants}
\label{participants}
Twenty-four subjects participated in the study (eight aged between 18 and 24, fourteen aged between 25 and 34, and two aged between 35 and 44; 22~males, 1~female, 1~preferred not to say).
None of the participants reported any deficiencies in their visual perception abilities.
Two subjects were left-handed, while the other twenty-two were right-handed; they all used their dominant hand during the trials.
Ten subjects had significant experience with \VR (\enquote{I use it every week}), while the other fourteen reported little to no experience with \VR.
Two subjects had significant experience with \AR (\enquote{I use it every week}), while the other twenty-two reported little to no experience with \AR.
Participants signed an informed consent, including the declaration of having no conflict of interest.
\subsection{Collected Data}
\label{metrics}
Inspired by \textcite[p.674]{laviolajr20173d}, we collected the following metrics during the experiment:
\begin{itemize}
\item \response{Completion Time}, defined as the time elapsed between the first contact with the virtual cube and its correct placement inside the target volume; as subjects were asked to complete the tasks as fast as possible, lower completion times mean better performance.
\item \response{Contacts}, defined as the number of separate times the user's hand makes contact with the virtual cube; in both tasks, a lower number of contacts means a smoother continuous interaction with the object.
\item \response{Time per Contact}, defined as the total time any part of the user's hand contacted the cube divided by the number of contacts; higher values mean that the user interacted with the object for longer non-interrupted periods of time.
\item \response{Grip Aperture} (solely for the Grasp task), defined as the average distance between the thumb's fingertip and the other fingertips while grasping the cube; lower values indicate greater finger interpenetration with the cube, and thus a greater discrepancy between the real hand and the visual hand augmentation constrained to the cube's surface, reflecting how confident users are in their grasp \cite{prachyabrued2014visual, al-kalbani2016analysis, blaga2017usability, chessa2019grasping}.
\end{itemize}
Taken together, these measures provide an overview of the performance and usability of each visual hand augmentation, as we hypothesized that they should influence the behavior and effectiveness of the participants.
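For clarity, the following Python sketch shows how these metrics could be derived offline from a per-frame log of timestamps, a cube-contact flag, and fingertip positions; the log format and function names are illustrative assumptions rather than the actual analysis code.
\begin{verbatim}
import numpy as np

def compute_metrics(t, in_contact, thumb_tip=None, finger_tips=None):
    """t: (T,) timestamps ending at correct placement;
    in_contact: (T,) booleans, True when the hand touches the cube;
    thumb_tip: (T, 3) and finger_tips: (T, 4, 3), Grasp task only."""
    t = np.asarray(t)
    in_contact = np.asarray(in_contact, dtype=bool)

    # Completion Time: first contact until correct placement (end of log).
    completion_time = t[-1] - t[in_contact][0]

    # Contacts: number of separate contact phases (rising edges).
    rises = np.flatnonzero(~in_contact[:-1] & in_contact[1:])
    contacts = len(rises) + int(in_contact[0])

    # Time per Contact: total contact time over the number of phases.
    dt = np.diff(t, prepend=t[0])
    time_per_contact = dt[in_contact].sum() / max(contacts, 1)

    # Grip Aperture: mean thumb-to-fingertip distance while in contact.
    grip_aperture = None
    if thumb_tip is not None and finger_tips is not None:
        dist = np.linalg.norm(np.asarray(finger_tips)
                              - np.asarray(thumb_tip)[:, None, :], axis=-1)
        grip_aperture = dist[in_contact].mean()

    return completion_time, contacts, time_per_contact, grip_aperture
\end{verbatim}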
At the end of each task, participants were asked to rank the visual hand augmentations according to their preference with respect to the considered task.
Participants also rated each visual hand augmentation individually on six questions using a 7-point Likert scale (1=Not at all, 7=Extremely):
\begin{itemize}
\item \response{Difficulty}: How difficult were the tasks?
\item \response{Fatigue}: How fatiguing (mentally and physically) were the tasks?
\item \response{Precision}: How precise were you in performing the tasks?
\item \response{Performance}: How successful were you in performing the tasks?
\item \response{Efficiency}: How fast/efficient do you think you were in performing the tasks?
\item \response{Rating}: How much do you like each visual hand?
\end{itemize}
Finally, to gather additional qualitative information, participants were encouraged to comment aloud on the conditions throughout the experiment and to answer an open-ended question at its end.
The results were analyzed using R (v4.4) and the packages \textit{afex} (v1.4), \textit{ARTool} (v0.11), and \textit{performance} (v0.13).