\part{Improving the Manipulation of Virtual Objects}
\mainlabel{manipulation}

\section{Introduction}
\label{intro}
Touching, grasping and manipulating virtual objects are fundamental interactions in \AR (\secref[related_work]{ve_tasks}) and essential for many of its applications (\secref[related_work]{ar_applications}).
The most common current \AR systems, in the form of portable and immersive \OST-\AR headsets \cite{hertel2021taxonomy}, allow real-time hand tracking and direct interaction with virtual objects with bare hands (\secref[related_work]{real_virtual_gap}).
Manipulation of virtual objects is achieved using a virtual hand interaction technique that represents the user's hand in the \VE and simulates interaction with virtual objects (\secref[related_work]{ar_virtual_hands}).
However, direct hand manipulation is still challenging due to the intangibility of the \VE, the lack of mutual occlusion between the hand and the virtual object in \OST-\AR (\secref[related_work]{ar_displays}), and the inherent delays between the user's hand and the result of the interaction simulation (\secref[related_work]{ar_virtual_hands}).
In this chapter, we investigate the \textbf{visual rendering as hand augmentation} for direct manipulation of virtual objects in \OST-\AR.
To this end, we selected from the literature and compared the most popular visual hand renderings used to interact with virtual objects in \AR.
With these visual renderings, the virtual hand is \textbf{displayed superimposed} on the user's hand, providing \textbf{feedback on the tracking} of the real hand, as shown in \figref{hands}.
The movement of the virtual hand is also \textbf{constrained to the surface} of the virtual object, providing additional \textbf{feedback on the interaction} with the virtual object.
We \textbf{evaluate in a user study}, using the Microsoft HoloLens~2 \OST-\AR headset, the effect of six visual hand renderings on user performance and experience in two representative manipulation tasks: pushing-and-sliding and grasping-and-placing a virtual object directly with the hand.
\noindentskip The main contributions of this chapter are:
\begin{itemize}
\item A comparison from the literature of the six most common visual hand renderings used to interact with virtual objects in \AR.
\item A user study evaluating with 24 participants the performance and user experience of the six visual hand renderings as augmentation of the real hand during free and direct hand manipulation of virtual objects in \OST-\AR.
\end{itemize}
\noindentskip In the next sections, we first present the six visual hand renderings we considered and gathered from the literature. We then describe the experimental setup and design, the two manipulation tasks, and the metrics used. We present the results of the user study and discuss the implications of these results for the manipulation of virtual objects directly with the hand in \AR.
\bigskip
\begin{subfigs}{hands}{The six visual hand renderings as augmentation of the real hands.}[
As seen by the user through the \AR headset during the two-finger grasping of a virtual cube.
][
\item No visual rendering \level{(None)}.
\item Cropped virtual content to enable hand-cube occlusion \level{(Occlusion, Occl)}.
\item Rings on the fingertips \level{(Tips)}.
\item Thin outline of the hand \level{(Contour, Cont)}.
\item Fingers' joints and phalanges \level{(Skeleton, Skel)}.
\item Semi-transparent \ThreeD hand model \level{(Mesh)}.
]
\subfig[0.22]{method/hands-none}
\subfig[0.22]{method/hands-occlusion}
\subfig[0.22]{method/hands-tips}
\par
\subfig[0.22]{method/hands-contour}
\subfig[0.22]{method/hands-skeleton}
\subfig[0.22]{method/hands-mesh}
\end{subfigs}

\section{Visual Hand Renderings}
\label{hands}
We compared a set of the most popular visual hand renderings found in the literature (\secref[related_work]{ar_visual_hands}).
Since we address hand-centered manipulation tasks, we only considered renderings that include the fingertips (\secref[related_work]{grasp_types}).
Moreover, to keep the focus on the hand rendering itself, we used neutral semi-transparent grey meshes, consistent with the choices made in \cite{yoon2020evaluating,vanveldhuizen2021effect}.
All considered hand renderings are drawn following the tracked pose of the user's real hand.
However, while the real hand can of course penetrate virtual objects, the visual hand is always constrained by the \VE (\secref[related_work]{ar_virtual_hands}).
The six renderings are shown in \figref{hands} and described below, with an abbreviation in parentheses when needed.
\paragraph{None}
As a reference, we considered no visual hand rendering (\figref{method/hands-none}), as is common in \AR \cite{hettiarachchi2016annexing,blaga2017usability,xiao2018mrtouch,teng2021touch}.
Users have no information about hand tracking and no feedback about contact with the virtual objects, other than their movement when touched.
As virtual content is rendered on top of the \RE, the hand of the user can be hidden by the virtual objects when manipulating them (\secref[related_work]{ar_displays}).
\paragraph{Occlusion (Occl)}
To avoid the abovementioned undesired occlusions due to the virtual content being rendered on top of the \RE, we can carefully crop the former whenever it hides real content that should be visible \cite{macedo2023occlusion}, \eg the thumb of the user in \figref{method/hands-occlusion}.
This approach is frequent in works using \VST-\AR headsets \cite{knorlein2009influence,ha2014wearhand,piumsomboon2014graspshell,suzuki2014grasping,al-kalbani2016analysis}.
\paragraph{Tips}
This rendering shows small visual rings around the fingertips of the user (\figref{method/hands-tips}), highlighting the most important parts of the hand and contact with virtual objects during fine manipulation (\secref[related_work]{grasp_types}).
Unlike work using small spheres \cite{maisto2017evaluation,meli2014wearable,grubert2018effects,normand2018enlarging,schwind2018touch}, this ring rendering also provides information about the orientation of the fingertips.
\paragraph{Contour (Cont)}
This rendering is a \qty{1}{\mm} thick outline contouring the user's hands, providing information about the whole hand while leaving its inside visible.
Unlike the other renderings, it is not occluded by the virtual objects, as shown in \figref{method/hands-contour}.
This rendering is less common in the literature than the previous ones \cite{kang2020comparative}.
\paragraph{Skeleton (Skel)}
This rendering schematically renders the joints and phalanges of the fingers with small spheres and cylinders, respectively, leaving the outside of the hand visible (\figref{method/hands-skeleton}).
It can be seen as an extension of the Tips rendering that includes the complete finger articulations.
It is widely used in \VR \cite{argelaguet2016role,schwind2018touch,chessa2019grasping} and \AR \cite{blaga2017usability,yoon2020evaluating}, as it is considered simple yet rich and comprehensive.
\paragraph{Mesh}
This rendering is a \ThreeD semi-transparent ($\alpha=0.2$) hand model (\figref{method/hands-mesh}), which is common in \VR \cite{prachyabrued2014visual,argelaguet2016role,schwind2018touch,chessa2019grasping,yoon2020evaluating,vanveldhuizen2021effect}.
It can be seen as a filled version of the Contour hand rendering, thus partially covering the view of the real hand.
\section{User Study}
\label{method}
We aim to investigate whether the chosen visual hand rendering affects the performance and user experience of manipulating virtual objects with free hands in \AR.
\subsection{Manipulation Tasks and Virtual Scene}
\label{tasks}
Following the guidelines of \textcite{bergstrom2021how} for designing object manipulation tasks, we considered two variations of a \ThreeD pick-and-place task, commonly found in interaction and manipulation studies \cite{prachyabrued2014visual,blaga2017usability,maisto2017evaluation,meli2018combining,vanveldhuizen2021effect}.
\subsubsection{Push Task}
\label{push-task}
The first manipulation task consists in pushing a virtual object along a real flat surface towards a target placed on the same plane (\figref{method/task-push}).
The virtual object to manipulate is a small opaque blue cube of \qty{50}{\mm} edge, while the target is a slightly bigger semi-transparent blue volume of \qty{70}{\mm} edge.
At every repetition of the task, the cube to manipulate always spawns at the same place, on top of a real table in front of the user.
On the other hand, the target volume can spawn in eight different locations on the same table, located on a \qty{20}{\cm} radius circle centred on the cube, at \qty{45}{\degree} from each other (see again \figref{method/task-push}).
Users are asked to push the cube towards the target volume using their fingertips in any way they prefer.
In this task, the cube cannot be lifted.
The task is considered completed when the cube is \emph{fully} inside the target volume.
\subsubsection{Grasp Task}
\label{grasp-task}
The second manipulation task consists in grasping, lifting, and placing a virtual object in a target placed on a different (higher) plane (\figref{method/task-grasp}).
The cube to manipulate and target volume are the same as in the previous task.
However, this time, the target volume can spawn in eight different locations on a plane \qty{10}{\cm} \emph{above} the table, still located on a \qty{20}{\cm} radius circle at \qty{45}{\degree} from each other.
Users are asked to grasp, lift, and move the cube towards the target volume using their fingertips in any way they prefer.
As before, the task is considered completed when the cube is \emph{fully} inside the volume.
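To make the target layout concrete, the eight spawn locations used in both tasks lie on a \qty{20}{\cm} radius circle at \qty{45}{\degree} intervals. The following Python sketch is purely illustrative: the function name, the axis convention, and the counter-clockwise naming order starting at the right target are our assumptions, not the study's implementation.

```python
import math

# Illustrative sketch (not the study's code): the eight target spawn
# positions, on a 20 cm radius circle centred on the cube, 45 degrees apart.
# The naming order and axis convention below are assumptions.
TARGET_NAMES = ["R", "RB", "B", "LB", "L", "LF", "F", "RF"]

def target_positions(radius_m=0.20):
    """Map each target name to an (x, z) position on the table, in metres."""
    return {
        name: (radius_m * math.cos(math.radians(45 * i)),
               radius_m * math.sin(math.radians(45 * i)))
        for i, name in enumerate(TARGET_NAMES)
    }
```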
\begin{subfigs}{tasks}{The two manipulation tasks of the user study.}[
The cube to manipulate is in the middle of the table (\qty{5}{\cm} edge, opaque) and the eight possible targets to reach are around it (\qty{7}{\cm} edge volume, semi-transparent).
Only one target at a time was shown during the experiments.
][
\item Push task: pushing the virtual cube along a table towards a target placed on the same surface.
\item Grasp task: grasping and lifting the virtual cube towards a target placed on a \qty{20}{\cm} higher plane.
]
\subfig[0.45]{method/task-push}
\subfig[0.45]{method/task-grasp}
\end{subfigs}
\subsection{Experimental Design}
\label{design}
We analyzed the two tasks separately.
For each of them, we considered two independent, within-subject, variables:
\begin{itemize}
\item \factor{Hand}, consisting of the six possible visual hand renderings discussed in \secref{hands}: \level{None}, \level{Occlusion} (Occl), \level{Tips}, \level{Contour} (Cont), \level{Skeleton} (Skel), and \level{Mesh}.
\item \factor{Target}, consisting of the eight possible locations of the target volume, named from the participant's point of view and as shown in \figref{tasks}: right (\level{R}), right-back (\level{RB}), back (\level{B}), left-back (\level{LB}), left (\level{L}), left-front (\level{LF}), front (\level{F}) and right-front (\level{RF}).
\end{itemize}
Each condition was repeated three times.
To control learning effects, we counter-balanced the order of the two manipulation tasks and of the visual hand renderings following a 6 \x 6 Latin square, leading to six blocks in which the position of the target volume was randomized.
This design led to a total of 2 manipulation tasks \x 6 visual hand renderings \x 8 targets \x 3 repetitions $=$ 288 trials per participant.
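The counterbalancing and trial count can be sketched as follows. The cyclic construction below is one simple way to build a 6 \x 6 Latin square; we do not claim it is the exact square used in the study.

```python
# Illustrative sketch: a cyclic 6x6 Latin square over the six renderings
# (a simple construction; the exact square used in the study may differ).
HANDS = ["None", "Occl", "Tips", "Cont", "Skel", "Mesh"]

def latin_square(items):
    """Row i is the item list rotated by i: each item appears once per row and column."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square(HANDS)

# 2 tasks x 6 renderings x 8 targets x 3 repetitions = 288 trials per participant
trials_per_participant = 2 * len(HANDS) * 8 * 3
```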
\subsection{Apparatus}
\label{apparatus}
We used the \OST-\AR headset HoloLens~2, as described in \secref[vhar_system]{virtual_real_alignment}.
%It is capable of rendering virtual content within an horizontal field of view of \qty{43}{\degree} and a vertical one of \qty{29}{\degree}. It is also able to track the environment as well as the user's fingers.
It is also able to track the user's fingers.
We measured the latency of the hand tracking at \qty{15}{\ms}, independent of the hand movement speed.
The implementation of our experiment was done using Unity 2022.1, PhysX 4.1, and the Mixed Reality Toolkit (MRTK) 2.8.
The compiled application ran directly on the HoloLens~2 at \qty{60}{FPS}.
The default \ThreeD hand model from MRTK was used for all visual hand renderings.
By changing the material properties of this hand model, we were able to achieve the six renderings shown in \figref{hands}.
A calibration was performed for every participant, to best adapt the size of the visual hand rendering to their real hand.
A set of empirical tests enabled us to choose the best rendering characteristics in terms of transparency and brightness for the virtual objects and hand renderings, which were applied throughout the experiment.
The hand tracking information provided by MRTK was used to construct a virtual articulated physics-enabled hand (\secref[related_work]{ar_virtual_hands}) using PhysX.
It featured 25 DoFs, including the fingers' proximal, middle, and distal phalanges.
To allow effective (and stable) physical interactions between the hand and the virtual cube to manipulate, we implemented an approach similar to that of \textcite{borst2006spring}, where a series of virtual springs with high stiffness are used to couple the physics-enabled hand with the tracked hand.
As before, a set of empirical tests was used to select the most effective physical characteristics in terms of mass, elastic constant, friction, damping, and collider size and shape for the (tracked) virtual hand interaction model.
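The spring coupling can be illustrated with a minimal one-dimensional sketch: the physics-enabled body is pulled towards the tracked pose by a stiff spring and stabilised by damping. The gains, mass, and time step below are hypothetical placeholders, not the values tuned in our tests.

```python
# Hypothetical 1D sketch of the spring-damper coupling between the tracked
# hand and the physics-enabled hand, in the spirit of Borst and Indugula.
# All numeric values are illustrative placeholders.
STIFFNESS = 300.0  # N/m: a stiff spring keeps the simulated hand close to tracking
DAMPING = 30.0     # N.s/m: damping stabilises the coupled simulation

def coupling_force(tracked_pos, sim_pos, sim_vel):
    """Force pulling the simulated hand body towards the tracked pose."""
    return STIFFNESS * (tracked_pos - sim_pos) - DAMPING * sim_vel

def step(tracked_pos, sim_pos, sim_vel, mass=1.0, dt=1.0 / 60.0):
    """One explicit Euler step of the coupled body (60 Hz, like the headset)."""
    sim_vel += coupling_force(tracked_pos, sim_pos, sim_vel) / mass * dt
    sim_pos += sim_vel * dt
    return sim_pos, sim_vel
```

Iterating `step` makes the simulated position converge to the tracked one, while the physics engine can still add contact forces from the virtual cube on top of the coupling force.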
The room where the experiment was held had no windows, with one light source of \qty{800}{\lumen} placed \qty{70}{\cm} above the table.
This setup enabled a good and consistent tracking of the user's fingers.
\subsection{Procedure}
\label{procedure}
First, participants were given a consent form that briefed them about the tasks and the procedure of the experiment.
Then, participants were asked to comfortably sit in front of a table and wear the HoloLens~2 headset as shown in~\figref{tasks}, perform the calibration of the visual hand size as described in~\secref{apparatus}, and complete a \qty{2}{min} training to familiarize themselves with the \AR rendering and the two considered tasks.
During this training, we did not use any of the six hand renderings under test, but rather a fully-opaque white hand rendering that completely occluded the real hand of the user.
Participants were asked to carry out the two tasks as naturally and as fast as possible.
Similarly to \cite{prachyabrued2014visual,maisto2017evaluation,blaga2017usability,vanveldhuizen2021effect}, we only allowed the use of the dominant hand.
The experiment took around 1 hour and 20 minutes to complete.
\subsection{Participants}
\label{participants}
Twenty-four subjects participated in the study (eight aged between 18 and 24, fourteen aged between 25 and 34, and two aged between 35 and 44; 22~males, 1~female, 1~preferred not to say).
None of the participants reported any deficiencies in their visual perception abilities.
Two subjects were left-handed, while the other twenty-two were right-handed; they all used their dominant hand during the trials.
Ten subjects had significant experience with \VR (\enquote{I use it every week}), while the other fourteen reported little to no experience with \VR.
Two subjects had significant experience with \AR (\enquote{I use it every week}), while the other twenty-two reported little to no experience with \AR.
Participants signed an informed consent, including the declaration of having no conflict of interest.
\subsection{Collected Data}
\label{metrics}
Inspired by \textcite[p.674]{laviolajr20173d}, we collected the following metrics during the experiment:
\begin{itemize}
\item \response{Completion Time}, defined as the time elapsed between the first contact with the virtual cube and its correct placement inside the target volume; as subjects were asked to complete the tasks as fast as possible, lower completion times mean better performance.
\item \response{Contacts}, defined as the number of separate times the user's hand makes contact with the virtual cube; in both tasks, a lower number of contacts means a smoother continuous interaction with the object.
\item \response{Time per Contact}, defined as the total time any part of the user's hand contacted the cube divided by the number of contacts; higher values mean that the user interacted with the object for longer non-interrupted periods of time.
\item \response{Grip Aperture} (solely for the grasp-and-place task), defined as the average distance between the thumb's fingertip and the other fingertips during the grasping of the cube; lower values indicate a greater finger interpenetration with the cube, resulting in a greater discrepancy between the real hand and the visual hand rendering constrained to the cube surfaces and showing how confident users are in their grasp \cite{prachyabrued2014visual, al-kalbani2016analysis, blaga2017usability, chessa2019grasping}.
\end{itemize}
Taken together, these measures provide an overview of the performance and usability of each of the visual hand renderings tested, as we hypothesized that they should influence the behavior and effectiveness of the participants.
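As an example of how such a metric can be derived from the logs, a sketch of the \response{Grip Aperture} computation is given below; the data layout and function name are ours, not those of the actual analysis scripts.

```python
import math

# Hypothetical sketch of the Grip Aperture metric: the average distance
# between the thumb's fingertip and the other fingertips while grasping.
# The frame layout below is an assumption, not the study's log format.
def grip_aperture(frames):
    """frames: list of {"thumb": (x, y, z), "others": [(x, y, z), ...]}."""
    samples = [math.dist(frame["thumb"], tip)
               for frame in frames
               for tip in frame["others"]]
    return sum(samples) / len(samples)
```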
At the end of each task, participants were asked to rank the visual hand renderings according to their preference with respect to the considered task.
Participants also rated each visual hand rendering individually on six questions using a 7-point Likert scale (1 = Not at all, 7 = Extremely):
\begin{itemize}
\item \response{Difficulty}: How difficult were the tasks?
\item \response{Fatigue}: How fatiguing (mentally and physically) were the tasks?
\item \response{Precision}: How precise were you in performing the tasks?
\item \response{Performance}: How successful were you in performing the tasks? %
\item \response{Efficiency}: How fast/efficient do you think you were in performing the tasks?
\item \response{Rating}: How much do you like each visual hand?
\end{itemize}
Finally, participants were encouraged to comment out loud on the conditions throughout the experiment, as well as in an open-ended question at its end, to gather additional qualitative information.

\section{Results}
\label{results}
Results of each trial metrics were analyzed with an \ANOVA on a \LMM model, with the order of the two manipulation tasks and the six visual hand renderings (\factor{Order}), the visual hand renderings (\factor{Hand}), the target volume position (\factor{Target}), and their interactions as fixed effects and the \factor{Participant} as random intercept.
For every \LMM, residuals were tested with a Q-Q plot to confirm normality.
On statistically significant effects, estimated marginal means of the \LMM were compared pairwise using Tukey's \HSD test.
Only significant results were reported.
Because \response{Completion Time}, \response{Contacts}, and \response{Time per Contact} measures were Gamma-distributed, they were first log-transformed to approximate a normal distribution.
Their analysis results are reported anti-logged, corresponding to geometric means of the measures.
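The anti-log reporting follows from a standard identity: exponentiating the mean of log-transformed values yields their geometric mean. A toy Python sketch (the data below are made up):

```python
import math

# Toy illustration: the mean of log-transformed values, exponentiated back,
# is the geometric mean of the original values.
def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

completion_times = [1.2, 2.5, 0.8, 3.1]  # made-up completion times (s)
gm = geometric_mean(completion_times)
```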

\subsection{Push Task}
\label{push}
\paragraph{Completion Time}
On the time to complete a trial, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{24.8}, \pinf{0.001}, see \figref{results/Push-CompletionTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{5.9}, \pinf{0.001}).
\level{Skeleton} was the fastest, outperforming \level{None} (\percent{+18}, \p{0.005}), \level{Occlusion} (\percent{+26}, \pinf{0.001}), \level{Tips} (\percent{+22}, \pinf{0.001}), and \level{Contour} (\percent{+20}, \p{0.001}).
Three groups of target volumes were identified:
(1) the side targets \level{R}, \level{L}, and \level{LF} were the fastest;
(2) the \level{RB}, \level{F}, and \level{RF} targets were slower (\p{0.003});
and (3) the back targets \level{B} and \level{LB} were the slowest (\p{0.04}).
\paragraph{Contacts}
On the number of contacts, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{6.7}, \pinf{0.001}, see \figref{results/Push-ContactsCount-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{27.8}, \pinf{0.001}).
Fewer contacts were made with \level{Skeleton} than with \level{None} (\percent{-23}, \pinf{0.001}), \level{Occlusion} (\percent{-26}, \pinf{0.001}), \level{Tips} (\percent{-18}, \p{0.004}), and \level{Contour} (\percent{-15}, \p{0.02});
and fewer with \level{Mesh} than with \level{Occlusion} (\percent{-14}, \p{0.04}).
This metric indicates how effective a visual hand rendering is: a lower number of contacts indicates a smoother ability to properly push and rotate the cube into the target, as one would probably do with a real cube.
Targets on the left (\level{L}, \level{LF}) and the right (\level{R}) were easier to reach than the back ones (\level{B}, \level{LB}, \pinf{0.001}).
\paragraph{Time per Contact}
On the mean time spent on each contact, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{8.4}, \pinf{0.001}, see \figref{results/Push-MeanContactTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{19.4}, \pinf{0.001}).
It was shorter with \level{None} than with \level{Skeleton} (\percent{-10}, \pinf{0.001}) and \level{Mesh} (\percent{-8}, \p{0.03});
and shorter with \level{Occlusion} than with \level{Tips} (\percent{-10}, \p{0.002}), \level{Contour} (\percent{-10}, \p{0.001}), \level{Skeleton} (\percent{-14}, \p{0.001}), and \level{Mesh} (\percent{-12}, \p{0.03}).
This result suggests that users pushed the virtual cube with more confidence with a visible visual hand rendering.
On the contrary, the lack of a visual hand forced the participants to pay more attention to the cube's reactions.
Targets on the left (\level{L}, \level{LF}) and the right (\level{R}) sides had a higher \response{Time per Contact} than all the other targets (\p{0.005}).
\begin{subfigs}{push_results}{Results of the push task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap \percent{95} \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Time spent on each contact.
]
\subfig[0.32]{results/Push-CompletionTime-Hand-Overall-Means}
\subfig[0.32]{results/Push-ContactsCount-Hand-Overall-Means}
\subfig[0.32]{results/Push-MeanContactTime-Hand-Overall-Means}
\end{subfigs}

\subsection{Grasp Task}
\label{grasp}
\paragraph{Completion Time}
On the time to complete a trial, there was one statistically significant effect %
of \factor{Target} (\anova{7}{2868}{37.2}, \pinf{0.001}) %
but not of \factor{Hand} (\anova{5}{2868}{1.8}, \p{0.1}, see \figref{results/Grasp-CompletionTime-Hand-Overall-Means}).
Targets on the back and the left (\level{B}, \level{LB}, and \level{L}) were slower to reach than targets on the front (\level{LF}, \level{F}, and \level{RF}, \p{0.003}), except for \level{RB} (right-back), which was also fast.
\paragraph{Contacts}
On the number of contacts, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{5.2}, \pinf{0.001}, see \figref{results/Grasp-ContactsCount-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{21.2}, \pinf{0.001}).
Fewer contacts were made with \level{Tips} than with \level{None} (\percent{-13}, \p{0.02}) and \level{Occlusion} (\percent{-15}, \p{0.004});
and fewer with \level{Mesh} than with \level{None} (\percent{-15}, \p{0.006}) and \level{Occlusion} (\percent{-17}, \p{0.001}).
This result suggests that having no visible visual hand increased the number of failed grasps or cube drops.
Surprisingly, however, only \level{Tips} and \level{Mesh} were statistically significantly better, and neither \level{Contour} nor \level{Skeleton} was.
Targets on the back and left were more difficult (\level{B}, \level{LB}, and \level{L}) than targets on the front (\level{LF}, \level{F}, and \level{RF}, \pinf{0.001}).
\paragraph{Time per Contact}
On the mean time spent on each contact, there were two statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{9.6}, \pinf{0.001}, see \figref{results/Grasp-MeanContactTime-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{5.6}, \pinf{0.001}).
It was shorter with \level{None} than with \level{Tips} (\percent{-15}, \pinf{0.001}), \level{Skeleton} (\percent{-11}, \p{0.001}) and \level{Mesh} (\percent{-11}, \p{0.001});
shorter with \level{Occlusion} than with \level{Tips} (\percent{-10}, \pinf{0.001}), \level{Skeleton} (\percent{-8}, \p{0.05}), and \level{Mesh} (\percent{-8}, \p{0.04});
and shorter with \level{Contour} than with \level{Tips} (\percent{-8}, \pinf{0.001}).
As for the \level{Push} task, the lack of visual hand increased the number of failed grasps or cube drops.
The \level{Tips} rendering seemed to provide some of the best feedback for grasping, possibly because it conveys both the position and orientation of the tracked fingertips.
This time was shorter on the front target \level{F} than on the other target volumes (\pinf{0.001}).
\paragraph{Grip Aperture}
On the average distance between the thumb's fingertip and the other fingertips during grasping, there were two
statistically significant effects: %
\factor{Hand} (\anova{5}{2868}{35.8}, \pinf{0.001}, see \figref{results/Grasp-GripAperture-Hand-Overall-Means}) %
and \factor{Target} (\anova{7}{2868}{3.7}, \pinf{0.001}).
It was smaller with \level{None} than with \level{Occlusion} (\pinf{0.001}), \level{Tips} (\pinf{0.001}), \level{Contour} (\pinf{0.001}), \level{Skeleton} (\pinf{0.001}), and \level{Mesh} (\pinf{0.001});
smaller with \level{Tips} than with \level{Occlusion} (\p{0.008}), \level{Contour} (\p{0.006}), and \level{Mesh} (\pinf{0.001});
and smaller with \level{Skeleton} than with \level{Mesh} (\pinf{0.001}).
This result evidences participants' lack of confidence when given no visual hand rendering: they squeezed the cube more to secure it.
The \level{Mesh} rendering seemed to provide the most confidence to participants, possibly because it was the closest to the real hand.
The \response{Grip Aperture} was larger on the right-front (\level{RF}) target volume, indicating a higher confidence, than on the back and side targets (\level{R}, \level{RB}, \level{B}, \level{L}, \p{0.03}).
\begin{subfigs}{grasp_results}{Results of the grasp task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap \percent{95} \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Time spent on each contact.
\item Distance between thumb and the other fingertips when grasping.
]
\subfig[0.4]{results/Grasp-CompletionTime-Hand-Overall-Means}
\subfig[0.4]{results/Grasp-ContactsCount-Hand-Overall-Means}
\par
\subfig[0.4]{results/Grasp-MeanContactTime-Hand-Overall-Means}
\subfig[0.4]{results/Grasp-GripAperture-Hand-Overall-Means}
\end{subfigs}

\subsection{Ranking}
\label{ranks}
\figref{results_ranks} shows the ranking of each visual \factor{Hand} rendering for the \level{Push} and \level{Grasp} tasks.
Friedman tests indicated that both rankings had statistically significant differences (\pinf{0.001}).
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment were then used on both ranking results (\secref{metrics}):
\begin{itemize}
\item \response{Push task ranking}: \level{Occlusion} was ranked lower than \level{Contour} (\p{0.005}), \level{Skeleton} (\p{0.02}), and \level{Mesh} (\p{0.03});
\level{Tips} was ranked lower than \level{Skeleton} (\p{0.02}).
This good ranking of the \level{Skeleton} rendering for the Push task is consistent with the Push trial results.
\item \response{Grasp task ranking}: \level{Occlusion} was ranked lower than \level{Contour} (\p{0.001}), \level{Skeleton} (\p{0.001}), and \level{Mesh} (\p{0.007});
\level{None} was ranked lower than \level{Skeleton} (\p{0.04}).
A complete visual hand rendering seemed to be preferred over no visual hand rendering when grasping.
\end{itemize}
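The Holm-Bonferroni step-down adjustment applied to the pairwise p-values can be sketched in pure Python; the Friedman and Wilcoxon tests themselves come from standard statistical packages, and the toy p-values below are ours.

```python
# Sketch of the Holm-Bonferroni step-down adjustment used on the pairwise
# Wilcoxon signed-rank p-values (toy input values, not the study's).
def holm_adjust(pvalues):
    """Return Holm-adjusted p-values, preserving the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k), enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```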
\begin{subfigs}{results_ranks}{Boxplots of the ranking for each visual hand rendering.}[
Lower is better.
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
][
\item Push task ranking.
\item Grasp task ranking.
]
\subfig[0.4]{results/Ranks-Push}
\subfig[0.4]{results/Ranks-Grasp}
\end{subfigs}

\subsection{Questionnaire}
\label{questions}
\figref{results_questions} presents the questionnaire results for each visual hand rendering.
Friedman tests indicated that all questions had statistically significant differences (\pinf{0.001}).
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment were then used on each question's results (\secref{metrics}):
\begin{itemize}
\item \response{Difficulty}: \level{Occlusion} was considered more difficult than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.01}), and \level{Mesh} (\p{0.03}).
\item \response{Fatigue}: \level{None} was found more fatiguing than \level{Mesh} (\p{0.04}); and \level{Occlusion} more fatiguing than \level{Skeleton} (\p{0.02}) and \level{Mesh} (\p{0.02}).
\item \response{Precision}: \level{None} was considered less precise than \level{Skeleton} (\p{0.02}) and \level{Mesh} (\p{0.02}); and \level{Occlusion} less precise than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.006}), and \level{Mesh} (\p{0.02}).
\item \response{Performance}: \level{Occlusion} was lower than \level{Contour} (\p{0.02}), \level{Skeleton} (\p{0.006}), and \level{Mesh} (\p{0.03}).
\item \response{Efficiency}: \level{Occlusion} was found less efficient than \level{Contour} (\p{0.01}), \level{Skeleton} (\p{0.02}), and \level{Mesh} (\p{0.02}).
\item \response{Rating}: \level{Occlusion} was rated lower than \level{Contour} (\p{0.02}) and \level{Skeleton} (\p{0.03}).
\end{itemize}
In summary, \level{Occlusion} was worse than \level{Skeleton} for all questions, and worse than \level{Contour} and \level{Mesh} on five of the six questions.
The results of the \response{Difficulty}, \response{Performance}, and \response{Precision} questions are consistent in that respect.
Moreover, having no visible visual \factor{Hand} rendering was felt by users to be more fatiguing and less precise than having one.
Surprisingly, no clear consensus was found on \response{Rating}.
Every visual hand rendering except \level{Occlusion} received both the minimum and maximum possible scores.
\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for each visual hand rendering.}[
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
Lower is better for \textbf{(a)} difficulty and \textbf{(b)} fatigue.
Higher is better for \textbf{(c)} precision, \textbf{(d)} performance, \textbf{(e)} efficiency, and \textbf{(f)} rating.
]
\subfig[0.4]{results/Question-Difficulty}
\subfig[0.4]{results/Question-Fatigue}
\par
\subfig[0.4]{results/Question-Precision}
\subfig[0.4]{results/Question-Performance}
\par
\subfig[0.4]{results/Question-Efficiency}
\subfig[0.4]{results/Question-Rating}
\end{subfigs}

\section{Discussion}
\label{discussion}
We evaluated six visual hand renderings, as described in \secref{hands}, displayed on top of the real hand, in two virtual object manipulation tasks in \AR.
During the \level{Push} task, the \level{Skeleton} hand rendering was the fastest (\figref{results/Push-CompletionTime-Hand-Overall-Means}), as participants employed fewer and longer contacts to adjust the cube inside the target volume (\figref{results/Push-ContactsCount-Hand-Overall-Means} and \figref{results/Push-MeanContactTime-Hand-Overall-Means}).
Participants consistently used few and continuous contacts for all visual hand renderings (\figref{results/Push-ContactsCount-Hand-Overall-Means}); fewer than ten trials, all carried out by two participants, were instead completed quickly with multiple discrete touches.
However, during the \level{Grasp} task, despite no difference in \response{Completion Time}, providing no visible hand rendering (\level{None} and \level{Occlusion} renderings) led to more failed grasps or cube drops (\figref{results/Grasp-CompletionTime-Hand-Overall-Means} and \figref{results/Grasp-MeanContactTime-Hand-Overall-Means}).
Indeed, participants found the \level{None} and \level{Occlusion} renderings less effective (\figref{results/Ranks-Grasp}) and less precise (\figref{results_questions}).
To understand whether participants' previous experience might have played a role, we carried out an additional statistical analysis considering \VR experience as a between-subjects factor, \ie \VR novices vs. \VR experts (\enquote{I use it every week}, see \secref{participants}).
We found no statistically significant differences when comparing the considered metrics between \VR novices and experts.
All visual hand renderings showed \response{Grip Apertures} close to the size of the virtual cube, except for the \level{None} rendering (\figref{results/Grasp-GripAperture-Hand-Overall-Means}), with which participants applied stronger grasps, \ie less distance between the fingertips.
Having no visual hand rendering, but only the reaction of the cube to the interaction as feedback, made participants less confident in their grip.
This result contrasts with the wrongly estimated grip apertures observed by \textcite{al-kalbani2016analysis} in an exocentric \VST-\AR setup.
Also, while some participants found the absence of visual hand rendering more natural, many of them commented on the importance of having feedback on the tracking of their hands, as observed by \textcite{xiao2018mrtouch} in a similar immersive \OST-\AR setup.
Yet, participants' opinions of the visual hand renderings were mixed on many questions, except for the \level{Occlusion} one, which was perceived as less effective than more \enquote{complete} visual hands such as the \level{Contour}, \level{Skeleton}, and \level{Mesh} hands (\figref{results_questions}).
However, due to the latency of the hand tracking and of the visual hand reacting to the cube, almost all participants thought the \level{Occlusion} rendering to be a \enquote{shadow} of the real hand on the cube.
The \level{Tips} rendering, which showed the contacts made on the virtual cube, was controversial as it received the minimum and the maximum score on every question.
Many participants reported difficulties in seeing the orientation of the visual fingers,
while others found that it gave them a better sense of the contact points and improved their concentration on the task.
This result is consistent with \textcite{saito2021contact}, who found that displaying the points of contact was more beneficial for grasping a virtual object than an opaque visual hand overlay.
To summarize, when employing a visual hand rendering overlaying the real hand, participants were more performant and confident in manipulating virtual objects with bare hands in \AR.
These results contrast with those of similar manipulation studies in non-immersive, on-screen \AR, where the presence of a visual hand rendering was found by participants to improve the usability of the interaction, but not their performance \cite{blaga2017usability,maisto2017evaluation,meli2018combining}.
Our results show the most effective visual hand rendering to be the \level{Skeleton} one.
Participants appreciated that it provided a detailed and precise view of the tracking of the real hand, without hiding or masking it.
Although the \level{Contour} and \level{Mesh} hand renderings were also highly rated, some participants felt that they were too visible and masked the real hand.
This result is in line with the results of virtual object manipulation in \VR of \textcite{prachyabrued2014visual}, who found that the most effective visual hand rendering was a double representation of both the real tracked hand and a visual hand physically constrained by the \VE.
This type of \level{Skeleton} rendering was also the one that provided the best sense of agency (control) in \VR \cite{argelaguet2016role,schwind2018touch}.

\section{Conclusion}
\label{conclusion}
In this chapter, we addressed the challenge of touching, grasping and manipulating virtual objects directly with the hand in immersive \OST-\AR by providing and evaluating visual renderings as augmentation of the real hand.
Superimposed on the user's hand, these visual renderings provide feedback on the virtual hand, which tracks the real hand and simulates the interaction with virtual objects as a proxy.
We first selected and compared the six most popular visual hand renderings used to interact with virtual objects in \AR.
Then, in a user study with 24 participants and an immersive \OST-\AR headset, we evaluated the effect of these six visual hand renderings on the user performance and experience in two representative manipulation tasks.
Our results showed that a visual hand augmentation improved the performance, perceived effectiveness and confidence of participants compared to no augmentation.
A skeleton rendering, which provided a detailed view of the tracked joints and phalanges while not hiding the real hand, was the most performant and effective.
The contour and mesh renderings were found to mask the real hand, while the tips rendering was controversial.
The occlusion rendering had too much tracking latency to be effective.
This is consistent with similar manipulation studies in \VR and in non-immersive \VST-\AR setups.
This study suggests that a \ThreeD visual hand augmentation is important in \AR when interacting with a virtual hand technique, particularly when it involves precise finger movements in relation to virtual content, \eg \ThreeD windows, buttons and sliders, or more complex tasks, such as stacking or assembly.
A minimal but detailed rendering of the virtual hand that does not hide the real hand, such as the skeleton rendering we evaluated, seems to be the best compromise between the richness and effectiveness of the feedback.
%Still, users should be able to choose and adapt the visual hand rendering to their preferences and needs.
\noindentskip This work was published in Transactions on Haptics:
Erwan Normand, Claudio Pacchierotti, Eric Marchand, and Maud Marchal.
\enquote{Visuo-Haptic Rendering of the Hand during 3D Manipulation in Augmented Reality}.
In: \textit{IEEE Transactions on Haptics}. 27.4 (2024), pp. 2481--2487.

\chapter{Visual Augmentation of the Hand for Manipulating Virtual Objects in AR}
\mainlabel{visual_hand}
\chaptertoc
\input{1-introduction}
\input{2-method}
\input{3-0-results}
\input{3-1-push}
\input{3-2-grasp}
\input{3-3-ranks}
\input{3-4-questions}
\input{4-discussion}
\input{5-conclusion}

\section{Introduction}
\label{intro}
Providing haptic feedback during free-hand manipulation in \AR is not trivial, as wearing haptic devices on the hand might affect the tracking capabilities of the system \cite{pacchierotti2016hring}.
Moreover, it is important to leave the user capable of interacting with both virtual and real objects, avoiding the use of haptic interfaces that cover the fingertips or palm.
For this reason, it is often considered beneficial to move the point of application of the haptic feedback elsewhere on the hand (\secref[related_work]{vhar_haptics}).
However, the impact of the positioning of the haptic feedback on the hand during direct hand manipulation in \AR has not been systematically studied.
In parallel, a few studies have explored and compared the effects of visual and haptic feedback in tasks involving the manipulation of virtual objects with the hand.
\textcite{sarac2022perceived} and \textcite{palmer2022haptic} studied the effects of providing haptic feedback about contacts at the fingertips using haptic devices worn at the wrist, testing different mappings.
Their results showed that moving the haptic feedback away from the point(s) of contact is possible and effective, and that its impact is more significant when the visual feedback is limited.
A final question is whether one of these two types of hand feedback (haptic or visual) should be preferred \cite{maisto2017evaluation,meli2018combining}, or whether a combined visuo-haptic feedback is beneficial for users.
However, these studies were conducted in non-immersive setups, with a screen displaying the view of the \VE.
In fact, either type of hand feedback might alone provide sufficient sensory information for efficient direct hand manipulation of virtual objects in \AR, or, conversely, the two might prove complementary.
In this chapter, we aim to investigate the role of \textbf{visuo-haptic feedback of the hand when manipulating virtual objects} in immersive \OST-\AR using wearable vibrotactile haptics.
We selected \textbf{four different delocalized positionings on the hand} that have been previously proposed in the literature for direct hand interaction in \AR using wearable haptic devices (\secref[related_work]{vhar_haptics}): on the nails, the proximal phalanges, the wrist, and the nails of the opposite hand.
We focused on vibrotactile feedback, as it is used in most of the wearable haptic devices and has the lowest encumbrance.
In a \textbf{user study}, using the \OST-\AR headset Microsoft HoloLens~2 and two \ERM vibrotactile motors, we evaluated the effect of the four positionings with \textbf{two contact vibration techniques} on the user performance and experience with the same two manipulation tasks as in \chapref{visual_hand}.
We additionally compared these vibrotactile renderings with the \textbf{skeleton-like visual hand augmentation} established in \chapref{visual_hand}, as a complementary visuo-haptic feedback of the hand interaction with the virtual objects.
\noindentskip The contributions of this chapter are:
\begin{itemize}
\item The evaluation in a user study with 20 participants of the effect of providing vibrotactile feedback of the fingertip contacts with virtual objects, during direct manipulation with the bare hand in \AR, at four different delocalized positionings of the haptic feedback on the hand and with two contact vibration techniques.
\item The comparison of these vibrotactile positionings and rendering techniques with the two most representative visual hand augmentations established in \chapref{visual_hand}.
\end{itemize}
\noindentskip In the next sections, we first describe the four delocalized positionings and the two contact vibration techniques we considered, based on previous work. We then present the experimental setup and design of the user study. Finally, we report the results and discuss them in the context of free-hand interaction with virtual content in \AR.
\bigskip
\fig[0.6]{method/locations}{Setup of the vibrotactile positionings on the hand.}[
To ensure minimal encumbrance, we used the same two motors throughout the experiment, moving them to the considered positioning before each new experimental block (in this case, on the co-located \level{Proximal} phalange).
Thin self-gripping straps were placed on the five considered positionings during the entirety of the experiment.
]

\section{Vibrotactile Renderings of the Hand-Object Contacts}
\label{vibration}
The vibrotactile hand rendering provided information about the contacts between the virtual object and the thumb and index fingers of the user, as they are the two fingers most used for grasping (\secref[related_work]{grasp_types}).
We evaluated both the delocalized positioning and the contact vibration technique of the vibrotactile hand rendering.
\subsection{Vibrotactile Positionings}
\label{positioning}
We considered five different positionings for providing the vibrotactile rendering as feedback of the contacts between the virtual hand and the virtual objects, as shown in \figref{method/locations}.
They are representative of the most common locations used by wearable haptic devices in \AR to place their end-effector, as found in the literature (\secref[related_work]{vhar_haptics}), as well as other positionings that have been employed for manipulation tasks.
For each positioning, we used two vibrating actuators, for the thumb and index finger, respectively.
They are described as follows, with the corresponding abbreviation in parentheses:
\begin{itemize}
\item \level{Fingertips} (Tips): Vibrating actuators were placed right above the nails, similarly to \cite{ando2007fingernailmounted}. This is the positioning closest to the fingertips.
\item \level{Proximal} Phalanges (Prox): Vibrating actuators were placed on the dorsal side of the proximal phalanges, similarly to \cite{maisto2017evaluation,meli2018combining,chinello2020modular}.
\item \level{Wrist} (Wris): Vibrating actuators providing contacts rendering for the index and thumb were placed on ulnar and radial sides of the wrist, similarly to \cite{pezent2019tasbi,palmer2022haptic,sarac2022perceived}.
\item \level{Opposite} Fingertips (Oppo): Vibrating actuators were placed on the fingertips of the contralateral hand, also above the nails, similarly to \cite{prattichizzo2012cutaneous,detinguy2018enhancing}.
\item \level{Nowhere} (Nowh): As a reference, we also considered the case where we provided no vibrotactile rendering, as in \chapref{visual_hand}.
\end{itemize}
\subsection{Contact Vibration Techniques}
\label{technique}
When a fingertip contacts the virtual cube, we activate the corresponding vibrating actuator.
We considered two representative contact vibration techniques, \ie two ways of rendering such contacts through vibrations:
\begin{itemize}
\item \level{Impact} (Impa): a \qty{200}{\ms}--long vibration burst is applied when the fingertip makes contact with the object.
The amplitude of the vibration is proportional to the speed of the fingertip at the moment of the contact.
This technique is inspired by the impact vibrations modelled by tapping on real surfaces, as described in \secref[related_work]{hardness_rendering}.
\item \level{Distance} (Dist): a continuous vibration is applied whenever the fingertip is in contact with the object.
The amplitude of the vibration is proportional to the interpenetration between the fingertip and the virtual cube surface.
\end{itemize}
The implementation of these two techniques has been tuned according to the results of a preliminary experiment.
Three participants were asked to carry out a series of push and grasp tasks similar to those used in the actual experiment.
Results showed that \percent{95} of the contacts between the fingertip and the virtual cube happened at speeds below \qty{1.5}{\m\per\s}.
We also measured the minimum perceivable amplitude to be \percent{15} (\qty{0.6}{\g}) of the maximum amplitude of the motors we used.
For this reason, we designed the \level{Impact} technique so that contact speeds from \qtyrange{0}{1.5}{\m\per\s} are linearly mapped to \qtyrange{15}{100}{\%} amplitude commands for the motors.
Similarly, we designed the \level{Distance} technique so that interpenetrations from \qtyrange{0}{2.5}{\cm} are linearly mapped to \qtyrange{15}{100}{\%} amplitude commands, recalling that the virtual cube has an edge of \qty{5}{\cm}.
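The two linear mappings above can be sketched as follows; the function and parameter names are illustrative, and the clamping of out-of-range inputs is an assumption of this sketch:

```python
def impact_amplitude(contact_speed, v_max=1.5, a_min=0.15):
    """Impact (Impa) technique: map the fingertip speed at contact
    (m/s, measured range 0--1.5) to a motor amplitude command in
    [a_min, 1.0], i.e. 15%--100% of the maximum amplitude."""
    t = min(max(contact_speed / v_max, 0.0), 1.0)  # normalize and clamp
    return a_min + t * (1.0 - a_min)

def distance_amplitude(penetration, d_max=0.025, a_min=0.15):
    """Distance (Dist) technique: map the fingertip--cube
    interpenetration (m, up to 2.5 cm) to a motor amplitude
    command in [a_min, 1.0]."""
    t = min(max(penetration / d_max, 0.0), 1.0)
    return a_min + t * (1.0 - a_min)
```

Both techniques share the same \percent{15} amplitude floor, corresponding to the minimum perceivable amplitude measured in the preliminary experiment.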
\section{User Study}
\label{method}
This user study aims to evaluate whether a visuo-haptic rendering of the hand affects the user performance and experience of manipulation of virtual objects with bare hands in \OST-\AR.
The chosen visuo-haptic hand renderings are the combination of the two most representative visual hand renderings established in \chapref{visual_hand}, \ie \level{Skeleton} and \level{No Hand}, described in \secref[visual_hand]{hands}, with the two contact vibration techniques provided at the four delocalized positionings on the hand described in \secref{vibration}.
\subsection{Experimental Design}
\label{design}
We considered the same two \level{Push} and \level{Grasp} tasks as described in \secref[visual_hand]{tasks}, that we analyzed separately, considering four independent, within-subject variables:
\begin{itemize}
\item \factor{Positioning}: the five positionings for providing vibrotactile hand rendering of the virtual contacts, as described in \secref{positioning}.
\item \factor{Vibration Technique}: the two contact vibration techniques, as described in \secref{technique}.
\item \factor{Hand}: the two visual hand renderings from \chapref{visual_hand}, \level{Skeleton} (Skel) and \level{No Hand}, as described in \secref[visual_hand]{hands}; we considered \level{Skeleton} as it performed the best in terms of performance and perceived effectiveness, and \level{No Hand} as a reference.
\item \factor{Target}: the target volumes (\figref{tasks}) that, from the participant's point of view, were located at:
\begin{itemize}
\item left-bottom (\level{LB}) and left-front (\level{LF}) during the \level{Push} task; and
\item right-bottom (\level{RB}), left-bottom (\level{LB}), left-front (\level{LF}) and right-front (\level{RF}) during the \level{Grasp} task.
\end{itemize}
We considered these targets because they presented different difficulties.
\end{itemize}
\begin{subfigs}{tasks}{The two manipulation tasks of the user study.}[
Both pictures show the cube to manipulate in the middle (\qty{5}{\cm} and opaque) and the eight possible targets to reach (\qty{7}{\cm} cube and semi-transparent).
Only one target at a time was shown during the experiments.
][
\item Push task: pushing the virtual cube along a table towards a target placed on the same surface.
\item Grasp task: grasping and lifting the virtual cube towards a target placed on a \qty{20}{\cm} higher plane.
]
\subfig[0.45]{method/task-push-2}
\subfig[0.45]{method/task-grasp-2}
\end{subfigs}
To account for learning and fatigue effects, the order of the \factor{Positioning} conditions was counter-balanced using a balanced \numproduct{10 x 10} Latin square.
In these ten blocks, all possible \factor{Vibration Technique} \x \factor{Hand} \x \factor{Target} combinations were repeated three times in a random order.
As we did not find any relevant effect of the order in which the tasks were performed in \chapref{visual_hand}, we fixed the order of the tasks: first the \level{Push} task, then the \level{Grasp} task.
This design led to a total of 5 vibrotactile positionings \x 2 contact vibration techniques \x 2 visual hand renderings \x (2 targets in the \level{Push} task + 4 targets in the \level{Grasp} task) \x 3 repetitions $=$ 360 trials per participant.
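The counter-balancing of the blocks can be illustrated with the standard construction of a balanced Latin square for an even number of conditions; this is a generic sketch rather than the exact implementation used in the experiment:

```python
def balanced_latin_square(n):
    """Balanced Latin square for an even number n of conditions:
    each condition appears once per row (participant order) and once
    per column (position in the session), and immediately follows
    every other condition equally often across rows."""
    assert n % 2 == 0, "this construction requires an even n"
    # Interleaved column pattern: 0, 1, n-1, 2, n-2, 3, n-3, ...
    pattern = [0]
    lo, hi = 1, n - 1
    while len(pattern) < n:
        pattern.append(lo)
        lo += 1
        if len(pattern) < n:
            pattern.append(hi)
            hi -= 1
    # Each row is a cyclic shift of the interleaved pattern.
    return [[(i + p) % n for p in pattern] for i in range(n)]
```

For the \numproduct{10 x 10} square, ten participants (or groups of participants) each receive one row as their block order.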
\subsection{Apparatus and Procedure}
\label{apparatus}
The apparatus and experimental procedure were similar to those of \chapref{visual_hand}, as described in \secref[visual_hand]{apparatus} and \secref[visual_hand]{procedure}, respectively.
We report here only the differences.
We employed the same vibrotactile device used by \cite{devigne2020power}.
It is composed of two encapsulated \ERM (\secref[related_work]{vibrotactile_actuators}) vibration motors (Pico-Vibe 304-116, Precision Microdrive, UK).
They are small and light (\qty{5}{\mm} \x \qty{20}{\mm}, \qty{1.2}{\g}) actuators capable of vibration frequencies from \qtyrange{120}{285}{\Hz} and
amplitudes from \qtyrange{0.2}{1.15}{\g}.
They have a latency of \qty{20}{\ms}, which we partially compensated for at the software level with slightly larger colliders, triggering the vibrations close to the moment the finger touched the cube.
Vibration frequency and amplitude vary linearly together with the applied voltage.
The motors were controlled by an Arduino Pro Mini (\qty{3.3}{\V}) and a custom board that delivered the voltage independently to each motor.
A small \qty{400}{mAh} Li-ion battery allowed for 4 hours of constant vibration at maximum intensity.
A Bluetooth module (RN42XV module, Microchip Technology Inc., USA) mounted on the Arduino ensured wireless communication with the HoloLens~2.
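The latency compensation mentioned above can be sketched as a speed-dependent collider margin, so that the vibration, delayed by the actuator latency, starts close to the actual touch instant. The exact compensation logic is not detailed in the text, so the function and its parameters are assumptions of this sketch:

```python
def collider_margin(finger_speed, actuator_latency=0.020):
    """Extra collider radius (m) around the virtual cube: a finger
    approaching at finger_speed (m/s) crosses this margin in roughly
    the actuator latency (s), so the vibration command sent at margin
    crossing is rendered near the moment of actual contact.
    Simple distance = speed x latency heuristic."""
    return max(finger_speed, 0.0) * actuator_latency
```

For instance, a finger approaching at \qty{0.5}{\m\per\s} would trigger the vibration \qty{1}{\cm} before the visual contact, compensating the \qty{20}{\ms} actuator latency.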
To ensure minimal encumbrance, we used the same two motors throughout the experiment, moving them to the considered positioning before each new block.
Thin self-gripping straps were placed on the five positionings, with an elastic strap stitched on top to place the motor, as shown in \figref{method/locations}.
The straps were fixed during the entirety of the experiment to ensure similar hand tracking conditions.
We confirmed that this setup ensured a good transmission of the rendering and guaranteed good hand tracking performance, which was measured to be constant (\qty{15}{\ms}) with and without the motors, regardless of their positioning.
The control board was fastened to the arm with an elastic strap.
Finally, participants wore headphones diffusing brown noise to mask the sound of the vibrotactile motors.
We improved the hand tracking performance of the system by placing on the table a black sheet that absorbs infrared light, and by placing the participants in front of a wall to ensure a more constant exposure to the light.
We also made grasping easier by adding a grasping helper, similar to UltraLeap's Physics Hands.\footnoteurl{https://docs.ultraleap.com/unity-api/Preview/physics-hands.html}
When a phalanx collider of the tracked hand contacts the virtual cube, a spring with a low stiffness is created and attached between the cube and the collider.
The spring gently pulls the cube toward the phalanges in contact with the object, helping to maintain a natural and stable grasp.
When the contact is lost, the spring is destroyed.
Preliminary tests confirmed the effectiveness of this approach.
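The grasping-helper logic above can be sketched as follows; the class and method names, the stiffness value, and the linear spring force model are illustrative assumptions, not the engine's actual implementation:

```python
class GraspHelper:
    """Sketch of the spring-based grasping helper: a weak spring is
    attached between the cube and each phalanx collider in contact,
    and removed when the contact is lost."""

    def __init__(self, stiffness=2.0):
        self.k = stiffness          # low stiffness: gentle pull only
        self.contacts = set()       # phalanx ids currently touching

    def on_contact_enter(self, phalanx_id):
        self.contacts.add(phalanx_id)       # spring created

    def on_contact_exit(self, phalanx_id):
        self.contacts.discard(phalanx_id)   # spring destroyed

    def force_on_cube(self, cube_pos, phalanx_positions):
        """Sum of linear spring forces pulling the cube toward the
        phalanges in contact; positions are (x, y, z) tuples in m."""
        fx = fy = fz = 0.0
        for pid in self.contacts:
            px, py, pz = phalanx_positions[pid]
            fx += self.k * (px - cube_pos[0])
            fy += self.k * (py - cube_pos[1])
            fz += self.k * (pz - cube_pos[2])
        return (fx, fy, fz)
```

In the actual system, the equivalent force would be applied by the physics engine's spring joint at each simulation step.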
\subsection{Participants}
\label{participants}
Twenty subjects participated in the study (mean age = 26.8, \sd{4.1}; 19~males, 1~female).
One was left-handed, while the other nineteen were right-handed. They all used their dominant hand during the trials.
They all had a normal or corrected-to-normal vision.
Thirteen subjects also participated in the previous experiment.
Participants rated their expertise (\enquote{I use it more than once a year}) with \VR, \AR, and haptics in a pre-experiment questionnaire.
Twelve were experienced with \VR, eight with \AR, and ten with haptics.
\VR and haptics expertise were highly correlated (\pearson{0.9}), as were \AR and haptics expertise (\pearson{0.6}).
Other expertise correlations were low ($r<0.35$).
\subsection{Collected Data}
\label{metrics}
During the experiment, we collected the same data as in \chapref{visual_hand} (see \secref[visual_hand]{metrics}).
At the end of the experiment, participants were asked if they recognized the different contact vibration techniques.
They then rated the ten combinations of \factor{Positioning} \x \factor{Vibration Technique} using a 7-item Likert scale (1=Not at all, 7=Extremely):
\begin{itemize}
\item \response{Vibration Rating}: How much do you like each vibrotactile rendering?
\item \response{Workload}: How demanding or frustrating was each vibrotactile rendering?
\item \response{Usefulness}: How useful was each vibrotactile rendering?
\item \response{Realism}: How realistic was each vibrotactile rendering?
\end{itemize}
Finally, they rated the ten combinations of \factor{Positioning} \x \factor{Hand} on a 7-item Likert scale (1=Not at all, 7=Extremely):
\response{Positioning \x Hand Rating}: How much do you like each combination of vibrotactile positioning for each visual hand rendering?

\section{Results}
\label{results}
Results were analyzed similarly to the user study of the visual hand renderings (\secref[visual_hand]{results}).
The \LMM were fitted with the order of the five vibrotactile positionings (\factor{Order}), the vibrotactile positioning (\factor{Positioning}), the visual hand rendering (\factor{Hand}), the contact vibration technique (\factor{Vibration Technique}), and the target volume position (\factor{Target}), and their interactions, as fixed effects, and Participant as random intercept.

\subsection{Push Task}
\label{push}
\paragraph{Completion Time}
On the time to complete a trial, there were two statistically significant effects:
\factor{Positioning} (\anova{4}{1990}{3.8}, \p{0.004}, see \figref{results/Push-CompletionTime-Location-Overall-Means}) %
and \factor{Target} (\anova{1}{1990}{3.9}, \p{0.05}).
\level{Fingertips} was slower than \level{Proximal} (\percent{+11}, \p{0.01}) or \level{Opposite} (\percent{+12}, \p{0.03}).
There was no evidence of an advantage of \level{Proximal} or \level{Opposite} over \level{Nowhere}, nor of a disadvantage of \level{Fingertips} compared to \level{Nowhere}.
Yet, there was a tendency toward faster trials with \level{Proximal} and \level{Opposite}.
Trials with the \level{LB} target volume were also faster than with \level{LF} (\p{0.05}).
\paragraph{Contacts}
On the number of contacts, there was one statistically significant effect of
\factor{Positioning} (\anova{4}{1990}{2.4}, \p{0.05}, see \figref{results/Push-Contacts-Location-Overall-Means}).
More contacts were made with \level{Fingertips} than with \level{Opposite} (\percent{+12}, \p{0.03}).
This could indicate more difficulties to adjust the virtual cube inside the target volume.
\paragraph{Time per Contact}
On the mean time spent on each contact, there were two statistically significant effects of
\factor{Positioning} (\anova{4}{1990}{11.5}, \pinf{0.001}, see \figref{results/Push-TimePerContact-Location-Overall-Means}) %
and of \factor{Hand} (\anova{1}{1990}{16.1}, \pinf{0.001}, see \figref{results/Push-TimePerContact-Hand-Overall-Means})%
but not of the \factor{Positioning} \x \factor{Hand} interaction.
It was shorter with \level{Fingertips} than with \level{Wrist} (\percent{-15}, \pinf{0.001}), \level{Opposite} (\percent{-11}, \p{0.01}), or \level{Nowhere} (\percent{-15}, \pinf{0.001});
and shorter with \level{Proximal} than with \level{Wrist} (\percent{-16}, \pinf{0.001}), \level{Opposite} (\percent{-12}, \p{0.005}), or \level{Nowhere} (\percent{-16}, \pinf{0.001}).
This showed different strategies to adjust the cube inside the target volume, with faster repeated pushes with the \level{Fingertips} and \level{Proximal} positionings.
It was also shorter with \level{None} than with \level{Skeleton} (\percent{-9}, \pinf{0.001}).
This indicates, as in \chapref{visual_hand}, more confidence with a visual hand rendering.
\begin{subfigs}{push_results}{Results of the push task performance metrics.}[
Geometric means with bootstrap \percent{95} \CI for each vibrotactile positioning (a, b and c) or visual hand rendering (d)
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Mean time spent on each contact, for each vibrotactile positioning.
\item Mean time spent on each contact, for each visual hand rendering.
]
\subfig[0.4]{results/Push-CompletionTime-Location-Overall-Means}
\subfig[0.4]{results/Push-Contacts-Location-Overall-Means}
\par
\subfig[0.4]{results/Push-TimePerContact-Location-Overall-Means}
\subfig[0.4]{results/Push-TimePerContact-Hand-Overall-Means}
\end{subfigs}
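The geometric means with bootstrap \percent{95} \CI reported in the figures can be computed along these lines; this is a generic percentile-bootstrap sketch, not the exact analysis code used for the study:

```python
import math
import random

def geometric_mean(xs):
    """Geometric mean, appropriate for positively skewed durations."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def bootstrap_ci_geomean(xs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) CI for the geometric mean:
    resample with replacement, compute the statistic, take the
    alpha/2 and 1 - alpha/2 empirical quantiles."""
    rng = random.Random(seed)
    stats = sorted(
        geometric_mean([rng.choice(xs) for _ in xs])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Geometric means are used here because completion times and contact durations are strictly positive and right-skewed, which an arithmetic mean would summarize poorly.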

\subsection{Grasp Task}
\label{grasp}
\paragraph{Completion Time}
On the time to complete a trial, there were two statistically significant effects:
\factor{Positioning} (\anova{4}{3990}{13.6}, \pinf{0.001}, see \figref{results/Grasp-CompletionTime-Location-Overall-Means})
and \factor{Target} (\anova{3}{3990}{18.8}, \pinf{0.001}).
\level{Opposite} was faster than \level{Fingertips} (\percent{+19}, \pinf{0.001}), \level{Proximal} (\percent{+13}, \pinf{0.001}), \level{Wrist} (\percent{+14}, \pinf{0.001}), and \level{Nowhere} (\percent{+8}, \p{0.03}).
\level{Nowhere} was faster than \level{Fingertips} (\percent{+11}, \pinf{0.001}).
\level{RF} was faster than \level{RB} (\pinf{0.001}), \level{LB} (\pinf{0.001}), and \level{LF} (\pinf{0.001});
and \level{LF} was faster than \level{RB} (\p{0.03}).
\paragraph{Contacts}
On the number of contacts, there were two statistically significant effects:
\factor{Positioning} (\anova{4}{3990}{15.1}, \pinf{0.001}, see \figref{results/Grasp-Contacts-Location-Overall-Means}) %
and \factor{Target} (\anova{3}{3990}{7.6}, \pinf{0.001}).
Fewer contacts were made with \level{Opposite} than with \level{Fingertips} (\percent{-26}, \pinf{0.001}), \level{Proximal} (\percent{-17}, \pinf{0.001}), or \level{Wrist} (\percent{-12}, \p{0.002});
but more with \level{Fingertips} than with \level{Wrist} (\percent{+13}, \p{0.002}) or \level{Nowhere} (\percent{+17}, \pinf{0.001}).
Fewer contacts were also made on \level{LF} than on \level{RB} (\pinf{0.001}), \level{LB} (\p{0.006}), or \level{RF} (\p{0.03}).
\paragraph{Time per Contact}
On the mean time spent on each contact, there were two statistically significant effects:
\factor{Positioning} (\anova{4}{3990}{2.9}, \p{0.02}, see \figref{results/Grasp-TimePerContact-Location-Overall-Means})
and \factor{Target} (\anova{3}{3990}{62.6}, \pinf{0.001}).
It was shorter with \level{Fingertips} than with \level{Opposite} (\percent{+7}, \p{0.01}).
It was also shorter on \level{RF} than on \level{RB}, \level{LB} or \level{LF} (\pinf{0.001});
but longer on \level{LF} than on \level{RB} or \level{LB} (\pinf{0.001}).
\paragraph{Grip Aperture}
On the average distance between the thumb's fingertip and the other fingertips during grasping, there were two
statistically significant effects:
\factor{Positioning} (\anova{4}{3990}{30.1}, \pinf{0.001}, see \figref{results/Grasp-GripAperture-Location-Overall-Means})
and \factor{Target} (\anova{3}{3990}{19.9}, \pinf{0.001}).
It was longer with \level{Fingertips} than with \level{Proximal} (\pinf{0.001}), \level{Wrist} (\pinf{0.001}), \level{Opposite} (\pinf{0.001}), or \level{Nowhere} (\pinf{0.001});
and longer with \level{Proximal} than with \level{Wrist} (\pinf{0.001}) or \level{Nowhere} (\pinf{0.001}).
However, it was shorter with \level{RB} than with \level{LB} or \level{LF} (\pinf{0.001});
and shorter with \level{RF} than with \level{LB} or \level{LF} (\pinf{0.001}).
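The \response{Grip Aperture} metric described above can be sketched as the mean distance between the thumb fingertip and the other fingertips during grasping; the function name and the tuple-based representation are illustrative:

```python
import math

def grip_aperture(thumb_tip, other_tips):
    """Mean Euclidean distance (m) between the thumb fingertip and
    the other tracked fingertips, averaged over the grasp."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sum(dist(thumb_tip, t) for t in other_tips) / len(other_tips)
```

A smaller aperture than the cube size indicates a stronger grasp, \ie the fingertips interpenetrating the virtual surface.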
\begin{subfigs}{grasp_results}{Results of the grasp task performance metrics for each vibrotactile positioning.}[
Geometric means with bootstrap \percent{95} \CI and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.
\item Number of contacts with the cube.
\item Time spent on each contact.
\item Distance between thumb and the other fingertips when grasping.
]
\subfig[0.4]{results/Grasp-CompletionTime-Location-Overall-Means}
\subfig[0.4]{results/Grasp-Contacts-Location-Overall-Means}
\par
\subfig[0.4]{results/Grasp-TimePerContact-Location-Overall-Means}
\subfig[0.4]{results/Grasp-GripAperture-Location-Overall-Means}
\end{subfigs}

\subsection{Discrimination of Vibration Techniques}
\label{technique_results}
Seven participants were able to correctly discriminate between the two vibration techniques, which they described as a contact vibration (the \level{Impact} technique) and a continuous vibration (the \level{Distance} technique), respectively.
Seven other participants said they only felt differences of intensity, with a weak vibration (the \level{Impact} technique) and a strong one (the \level{Distance} technique).
Six participants did not notice the difference between the two vibration techniques.
There was no evidence that the ability to discriminate the vibration techniques was correlated with the participants' haptic or \AR/\VR expertise (\pearson{0.4}), nor that it had a statistically significant effect on task performance.
As the tasks had to be completed as quickly as possible, we hypothesize that little attention was devoted to the different vibration techniques.
Indeed, some participants explained that the contact cues were sufficient to indicate whether the cube was being properly pushed or grasped.
Although the \level{Distance} technique provided additional feedback on the interpenetration of the finger with the cube, it was not strictly necessary to manipulate the cube quickly.
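The two contact vibration techniques can be summarized as amplitude mappings. The following sketch is purely illustrative: the function names, burst duration, and interpenetration-to-amplitude mapping are assumptions for exposition, not the parameters of the actual implementation.

```python
# Illustrative sketch (not the study's implementation) of the two contact
# vibration techniques: Impact (a short burst at contact) and Distance
# (continuous vibration scaled by finger interpenetration).
# All parameter values below are hypothetical.

def impact_amplitude(time_since_contact: float, burst_duration: float = 0.05) -> float:
    """Impact technique: full-amplitude burst when contact begins, then silence."""
    return 1.0 if 0.0 <= time_since_contact < burst_duration else 0.0

def distance_amplitude(penetration_depth: float, max_depth: float = 0.01) -> float:
    """Distance technique: amplitude grows with interpenetration, capped at 1."""
    if penetration_depth <= 0.0:
        return 0.0  # no contact, no vibration
    return min(penetration_depth / max_depth, 1.0)
```

Under this reading, both techniques signal the onset of contact identically, but only \level{Distance} keeps vibrating while the finger remains inside the object, which matches the participants' description of a continuous versus a contact vibration.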
\subsection{Questionnaire}
\label{questions}
\figref{results_questions} shows the questionnaire results for each vibrotactile positioning.
Questionnaire results were analyzed using \ART non-parametric \ANOVA (\secref{metrics}).
Statistically significant effects were further analyzed with post-hoc pairwise comparisons with Holm-Bonferroni adjustment.
Wilcoxon signed-rank tests were used for main effects and the \ART contrasts procedure for interaction effects.
Only significant results are reported.
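The Holm-Bonferroni adjustment used for the post-hoc comparisons is a standard step-down procedure; a minimal self-contained sketch (not the authors' analysis code, which presumably relied on a statistics package) is:

```python
# Holm-Bonferroni step-down adjustment of a family of p-values.
# The k-th smallest p-value is multiplied by (m - k + 1); adjusted values
# are forced to be monotonically non-decreasing and capped at 1.

def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(running_max, 1.0)
    return adjusted
```

For example, raw p-values [0.01, 0.04, 0.03] adjust to [0.03, 0.06, 0.06]: the smallest is tripled, and the monotonicity constraint propagates the second multiplier's result to the largest raw value.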
\paragraph{Vibrotactile Rendering Rating}
\label{vibration_ratings}
There was a main effect of \factor{Positioning} (\anova{4}{171}{27.0}, \pinf{0.001}, see \figref{results/Question-Vibration Rating-Positioning-Overall}).
Participants preferred \level{Fingertips} over \level{Wrist} (\p{0.01}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
\level{Proximal} over \level{Wrist} (\p{0.007}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
and \level{Wrist} over \level{Opposite} (\p{0.01}) and \level{Nowhere} (\pinf{0.001}).
\paragraph{Positioning \x Hand Rating}
\label{positioning_hand}
There were two main effects of \factor{Positioning} (\anova{4}{171}{20.6}, \pinf{0.001}) and of \factor{Hand} (\anova{1}{171}{12.2}, \pinf{0.001}).
Participants preferred \level{Fingertips} over \level{Wrist} (\p{0.03}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
\level{Proximal} over \level{Wrist} (\p{0.003}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
\level{Wrist} over \level{Opposite} (\p{0.03}) and \level{Nowhere} (\pinf{0.001});
and \level{Skeleton} over \level{No Hand} (\pinf{0.001}).
\paragraph{Workload}
\label{workload}
There was a main effect of \factor{Positioning} (\anova{4}{171}{3.9}, \p{0.004}, see \figref{results/Question-Workload-Positioning-Overall}).
Participants found \level{Opposite} more fatiguing than \level{Fingertips} (\p{0.01}), \level{Proximal} (\p{0.003}), and \level{Wrist} (\p{0.02}).
\paragraph{Usefulness}
\label{usefulness}
There was a main effect of \factor{Positioning} (\anova{4}{171}{38.0}, \p{0.041}, see \figref{results/Question-Usefulness-Positioning-Overall}).
Participants found \level{Fingertips} the most useful, more than \level{Proximal} (\p{0.02}), \level{Wrist} (\pinf{0.001}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
\level{Proximal} more than \level{Wrist} (\p{0.008}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
\level{Wrist} more than \level{Opposite} (\p{0.008}) and \level{Nowhere} (\pinf{0.001});
and \level{Opposite} more than \level{Nowhere} (\p{0.004}).
\paragraph{Realism}
\label{realism}
There was a main effect of \factor{Positioning} (\anova{4}{171}{28.8}, \pinf{0.001}, see \figref{results/Question-Realism-Positioning-Overall}).
Participants found \level{Fingertips} the most realistic, more than \level{Proximal} (\p{0.05}), \level{Wrist} (\p{0.004}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
\level{Proximal} more than \level{Wrist} (\p{0.03}), \level{Opposite} (\pinf{0.001}), and \level{Nowhere} (\pinf{0.001});
\level{Wrist} more than \level{Opposite} (\p{0.03}) and \level{Nowhere} (\pinf{0.001});
and \level{Opposite} more than \level{Nowhere} (\p{0.03}).
\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for each vibrotactile positioning.}[
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
Higher is better for \textbf{(a)} vibrotactile rendering rating, \textbf{(c)} usefulness, and \textbf{(d)} realism.
Lower is better for \textbf{(b)} workload.
]
\subfig[0.4]{results/Question-Vibration Rating-Positioning-Overall}
\subfig[0.4]{results/Question-Workload-Positioning-Overall}
\par
\subfig[0.4]{results/Question-Usefulness-Positioning-Overall}
\subfig[0.4]{results/Question-Realism-Positioning-Overall}
\end{subfigs}

\section{Discussion}
\label{discussion}
We evaluated twenty visuo-haptic hand renderings, combining two vibrotactile contact techniques, provided at five delocalized positions on the hand, with the two most representative visual hand renderings established in the \chapref{visual_hand}, using the same two virtual object manipulation tasks in \AR.
In the \level{Push} task, the vibrotactile haptic hand rendering proved beneficial with the \level{Proximal} positioning, which yielded a low completion time, but detrimental with the \level{Fingertips} positioning, which performed worse (\figref{results/Push-CompletionTime-Location-Overall-Means}) than the \level{Proximal} and \level{Opposite} (on the contralateral hand) positionings.
The cause might be the intensity of vibrations, which many participants found rather strong and possibly distracting when provided at the fingertips.
This result was also observed by \textcite{bermejo2021exploring}, who provided vibrotactile cues when pressing a virtual keypad.
Another reason could be the visual occlusion caused by the vibrotactile motors worn on the fingertips, which could have hindered the visualization of the virtual cube.
We observed different strategies than in the \chapref{visual_hand} for the two tasks.
During the \level{Push} task, participants made more and shorter contacts to adjust the cube inside the target volume (\figref{results/Push-Contacts-Location-Overall-Means} and \figref{results/Push-TimePerContact-Location-Overall-Means}).
During the \level{Grasp} task, participants pressed the cube \percent{25} harder on average (\figref{results/Grasp-GripAperture-Location-Overall-Means}).
The \level{Fingertips} and \level{Proximal} positionings led to a slightly larger grip aperture than the others.
We think that the proximity of the vibrotactile rendering to the point of contact led users to take more time to adjust their grip in a more realistic manner, \ie closer to the surface of the cube.
This could also be the cause of the higher number of failed grasps or cube drops: indeed, we observed that the larger the grip aperture, the higher the number of contacts.
Consequently, the \level{Fingertips} positioning was slower (\figref{results/Grasp-CompletionTime-Location-Overall-Means}) and more prone to error (\figref{results/Grasp-Contacts-Location-Overall-Means}) than the \level{Opposite} and \level{Nowhere} positionings.
In both tasks, the \level{Opposite} positioning also seemed to be faster (\figref{results/Push-CompletionTime-Location-Overall-Means}) than having no vibrotactile hand rendering (\level{Nowhere} positioning).
However, participants also reported a higher workload (\figref{results_questions}) with this positioning, opposite to the site of the interaction.
This result might mean that participants focused more on learning to interpret these sensations, which led to better performance in the long run.
Overall, many participants appreciated the vibrotactile hand renderings, commenting that they made the tasks more realistic and easier.
However, the closer to the contact point, the better the vibrotactile rendering was perceived (\figref{results_questions}).
This seemed inversely correlated with the performance, except for the \level{Nowhere} positioning, \eg both the \level{Fingertips} and \level{Proximal} positionings were perceived as more effective, useful, and realistic than the other positionings despite lower performance.
Considering the two tasks, no clear difference in performance or appreciation was found between the two contact vibration techniques.
While the majority of participants discriminated the two different techniques, only a minority identified them correctly (\secref{technique_results}).
As reported by participants, the \level{Impact} technique seemed sufficient to provide contact information, while the additional interpenetration feedback of the \level{Distance} technique was not strictly needed.
No difference in performance was found between the two visual hand renderings, except for the \level{Push} task, where the \level{Skeleton} hand rendering resulted again in longer contacts.
Additionally, the \level{Skeleton} rendering was appreciated and perceived as more effective than having no visual hand rendering, confirming the results of our \chapref{visual_hand}.
Participants reported that this visual hand rendering provided good feedback on the status of the hand tracking while being constrained to the cube, and helped with rotation adjustment in both tasks.
However, many also felt that it was a bit redundant with the vibrotactile hand rendering.
Indeed, participants found the vibrotactile hand rendering to be more accurate and reliable information about the contact with the cube than simply seeing the cube and the visual hand react to the manipulation.
This result suggests that providing a visual hand rendering may not be useful during the grasping phase, but may be beneficial prior to contact with the virtual object and during position and rotation adjustment, providing valuable information about the hand pose.
It is also worth noting that the improved hand tracking and the grasp helper made manipulation of the cube easier than in the \chapref{visual_hand}, as shown by the shorter completion time in the \level{Grasp} task.
This improvement could also be the reason for the smaller differences between the \level{Skeleton} and the \level{None} visual hand renderings in this second experiment.
In summary, the positioning of the vibrotactile haptic rendering of the hand affected the performance and experience of users manipulating virtual objects with their bare hands in \AR.
The closer the vibrotactile hand rendering was to the point of contact, the better it was perceived in terms of effectiveness, usefulness, and realism.
These subjective appreciations of wearable haptic hand rendering for manipulating virtual objects in \AR were also observed by \textcite{maisto2017evaluation} and \textcite{meli2018combining}.
However, the best performance was obtained with the farthest positioning on the contralateral hand (\level{Opposite}), which is somewhat surprising.
This apparent paradox could be explained in two ways.
On the one hand, participants behaved differently when the haptic rendering was provided on the fingers (\level{Fingertips} and \level{Proximal}), close to the contact point, with shorter pushes and larger grip apertures.
This behavior likely gave them a better experience of the tasks and more confidence in their actions, and led to a lower interpenetration and force applied to the cube \cite{pacchierotti2015cutaneous}.
On the other hand, the unfamiliarity of the contralateral hand positioning (\level{Opposite}) caused participants to spend more time understanding the haptic stimuli, which might have made them more focused on performing the task.
In terms of the contact vibration technique, the continuous vibration on the finger interpenetration (\level{Distance}) did not make a difference to performance, although it provided more information.
Participants felt that the vibration bursts of the \level{Impact} technique were sufficient to confirm contact with the virtual object.
Finally, it was interesting to note that the visual hand rendering was appreciated but felt less necessary when provided together with vibrotactile hand rendering, as the latter was deemed sufficient for acknowledging the contact.

\section{Conclusion}
\label{conclusion}
In this chapter, we investigated the visuo-haptic feedback of the hand when manipulating virtual objects in immersive \OST-\AR using wearable vibrotactile haptics.
To do so, we provided vibrotactile feedback of the fingertip contacts with virtual objects by moving the haptic actuators away from the inside of the hand, to positions that leave it uncovered: the nails, the proximal phalanges, the wrist, and the nails of the opposite hand.
We selected these four delocalized positions on the hand from the literature on direct hand interaction in \AR with wearable haptic devices.
In a user study, we compared twenty visuo-haptic hand feedback conditions, combining two vibrotactile contact techniques, provided at five different delocalized positions on the user's hand, with the two most representative visual hand augmentations established in the \chapref{visual_hand}, \ie the skeleton hand rendering and no hand rendering.
Results showed that delocalized vibrotactile haptic hand feedback improved the perceived effectiveness, realism, and usefulness when provided close to the contact point.
However, the farthest positioning, on the contralateral hand, gave the best performance even though it was disliked: the unfamiliarity of this positioning probably led participants to put more effort into interpreting the haptic stimuli and to focus more on the task.
The visual hand augmentation was perceived as less necessary than the vibrotactile haptic feedback, but still provided useful feedback on the hand tracking.
This study provides evidence that moving the feedback away from the inside of the hand is a simple but promising approach for wearable haptics in \AR.
If integration with the hand tracking system allows it, and if the task requires it, a haptic ring worn on the middle or proximal phalanx seems preferable.
However, a wrist-mounted haptic device can provide richer feedback by embedding more diverse haptic actuators with larger bandwidths and maximum amplitudes, while being less obtrusive than a ring.
Finally, we think that the visual hand augmentation complements the haptic contact rendering well by providing continuous feedback on the hand tracking, and that it can be disabled during the grasping phase to avoid redundancy with the haptic feedback of the contact with the virtual object.
\noindentskip This work was published in Transactions on Haptics:
Erwan Normand, Claudio Pacchierotti, Eric Marchand, and Maud Marchal.
\enquote{Visuo-Haptic Rendering of the Hand during 3D Manipulation in Augmented Reality}.
In: \textit{IEEE Transactions on Haptics}. 27.4 (2024), pp. 2481--2487.

\chapter{Visuo-Haptic Augmentation of Hand Manipulation with Virtual Objects in AR}
\mainlabel{visuo_haptic_hand}
\chaptertoc
\input{1-introduction}
\input{2-method}
\input{3-0-results}
\input{3-1-push}
\input{3-2-grasp}
\input{3-3-questions}
\input{4-discussion}
\input{5-conclusion}