2024-09-25 17:25:13 +02:00
parent d6c8184df8
commit 0a21557052
16 changed files with 103 additions and 96 deletions


@@ -1,47 +1,22 @@
Augmented reality (AR) integrates virtual content into our real-world surroundings, giving the illusion of a single, unified environment and promising natural and seamless interactions with real and virtual objects.
%
Virtual object manipulation is particularly critical for useful and effective \AR usage, such as in medical applications, training, or entertainment \cite{laviolajr20173d, kim2018revisiting}.
%
Hand tracking technologies \cite{xiao2018mrtouch}, grasping techniques \cite{holl2018efficient}, and real-time physics engines permit users to directly manipulate virtual objects with their bare hands as if they were real \cite{piumsomboon2014graspshell}, without requiring controllers \cite{krichenbauer2018augmented}, gloves \cite{prachyabrued2014visual}, or predefined gesture techniques \cite{piumsomboon2013userdefined, ha2014wearhand}.
%
Optical see-through \AR (OST-AR) head-mounted displays (HMDs), such as the Microsoft HoloLens 2 or the Magic Leap, are particularly suited for this type of direct hand interaction \cite{kim2018revisiting}.
\noindent Touching, grasping and manipulating \VOs are fundamental interactions in \AR (\secref[related_work]{ve_tasks}) and essential for many of its applications (\secref[related_work]{ar_applications}).
The most common current \AR systems, in the form of portable and immersive \OST-\AR headsets \cite{hertel2021taxonomy}, allow real-time hand tracking and direct bare-hand interaction with \VOs (\secref[related_work]{real_virtual_gap}).
Manipulation of \VOs is achieved using a virtual hand interaction technique that represents the user's hand in the \VE and simulates interaction with \VOs (\secref[related_work]{ar_virtual_hands}).
However, direct hand manipulation is still challenging due to the intangibility of the \VE, the lack of mutual occlusion between the hand and the \VO in \OST-\AR (\secref[related_work]{ar_displays}), and the inherent delays between the user's hand and the result of the interaction simulation (\secref[related_work]{ar_virtual_hands}).
However, there are still several haptic and visual limitations that affect manipulation in OST-AR, degrading the user experience.
%
For example, it is difficult to estimate the position of one's hand in relation to virtual content because mutual occlusion between the hand and the virtual object is often lacking \cite{macedo2023occlusion}, the depth of virtual content is underestimated \cite{diaz2017designing, peillard2019studying}, and hand tracking still has a noticeable latency \cite{xiao2018mrtouch}.
%
Similarly, it is challenging to ensure confident and realistic contact with a virtual object due to the lack of haptic feedback and the intangibility of the virtual environment, which of course cannot apply physical constraints on the hand \cite{maisto2017evaluation, meli2018combining, lopes2018adding, teng2021touch}.
%
These limitations also make it difficult to confidently move a grasped object towards a target \cite{maisto2017evaluation, meli2018combining}.
In this chapter, we investigate the \textbf{visual rendering as hand augmentation} for direct manipulation of \VOs in \OST-\AR.
To this end, we selected from the literature and compared the most popular visual hand renderings used to interact with \VOs in \AR.
With these visual renderings, the virtual hand is \textbf{displayed superimposed} on the user's hand, providing \textbf{feedback on the tracking} of the real hand, as shown in \figref{hands}.
The movement of the virtual hand is also \textbf{constrained to the surface} of the \VO, providing additional \textbf{feedback on the interaction} with the \VO.
We \textbf{evaluate in a user study}, using the Microsoft HoloLens~2 \OST-\AR headset, the effect of six visual hand renderings on user performance and experience in two representative manipulation tasks: push-and-slide and grasp-and-place of a \VO directly with the hand.
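To illustrate the kind of surface constraint involved, the sketch below shows one minimal way a rendered hand joint could be clamped to a virtual cube's surface. This is our own illustrative Python, not the implementation used in this chapter; all names are hypothetical.

import numpy as np

def constrain_to_cube(tracked: np.ndarray, center: np.ndarray, half_edge: float) -> np.ndarray:
    """Return the rendered joint position: identical to the tracked one
    outside the cube, projected onto the nearest face when inside."""
    local = tracked - center
    if np.all(np.abs(local) < half_edge):       # tracked joint penetrates the cube
        axis = int(np.argmax(np.abs(local)))    # nearest face is along this axis
        local[axis] = np.sign(local[axis]) * half_edge
    return center + local

# Example: a fingertip 1.5 cm from the center of a 5 cm cube (i.e., inside it)
# is rendered on the nearest face instead.
print(constrain_to_cube(np.array([0.015, 0.0, 0.0]),
                        np.array([0.0, 0.0, 0.0]), half_edge=0.025))  # [0.025 0. 0.]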
To address these haptic and visual limitations, we investigate two types of sensory feedback that are known to improve virtual interactions with hands, but have not been studied together in an \AR context: visual hand rendering and delocalized haptic rendering.
%
A few works explored the effect of a visual hand rendering on interactions in \AR by simulating mutual occlusion between the real hand and virtual objects \cite{ha2014wearhand, piumsomboon2014graspshell, al-kalbani2016analysis}, or displaying a 3D virtual hand model, semi-transparent \cite{ha2014wearhand, piumsomboon2014graspshell} or opaque \cite{blaga2017usability, yoon2020evaluating, saito2021contact}.
%
Indeed, some visual hand renderings are known to improve interactions or user experience in virtual reality (VR), where the real hand is not visible \cite{prachyabrued2014visual, argelaguet2016role, grubert2018effects, schwind2018touch, vanveldhuizen2021effect}.
%
However, the role of a visual hand rendering superimposed and seen above the real tracked hand has not yet been investigated in \AR.
%
Conjointly, several studies have demonstrated that wearable haptics can significantly improve interaction performance and user experience in \AR \cite{maisto2017evaluation, meli2018combining, sarac2022perceived}.
%
But haptic rendering for \AR remains a challenge as it is difficult to provide rich and realistic haptic sensations while limiting their negative impact on hand tracking \cite{pacchierotti2016hring} and keeping the fingertips and palm free to interact with the real environment \cite{lopes2018adding, teng2021touch, sarac2022perceived, palmer2022haptic}.
%
Therefore, the haptic feedback of the fingertip contact with the virtual environment needs to be rendered elsewhere on the hand, but it is unclear which positioning should be preferred or which type of haptic feedback is best suited for manipulating virtual objects in \AR.
%
A final question is whether one or the other of these (haptic or visual) hand renderings should be preferred \cite{maisto2017evaluation, meli2018combining}, or whether a combined visuo-haptic rendering is beneficial for users.
%
In fact, both hand renderings can provide sufficient sensory cues for efficient manipulation of virtual objects in \AR, or, conversely, they may prove to be complementary.
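As an illustration of such delocalized rendering, one simple scheme would map fingertip contact with the virtual environment to a vibration burst on an actuator worn elsewhere on the hand. The Python sketch below is hypothetical and only shows the idea, not a rendering evaluated in this work.

def vibration_amplitude(penetration_m: float, max_penetration_m: float = 0.01) -> float:
    """Map fingertip penetration depth into a virtual object (in meters) to a
    normalized amplitude in [0, 1] for a delocalized vibrotactile actuator."""
    if penetration_m <= 0.0:
        return 0.0                                  # no contact: actuator off
    return min(penetration_m / max_penetration_m, 1.0)

# Example: 2 mm of penetration drives, e.g., a wrist-worn actuator at 20 %.
print(vibration_amplitude(0.002))                   # -> 0.2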
In this paper, we investigate the role of the visuo-haptic rendering of the hand during 3D manipulation of virtual objects in OST-AR.
%
We consider two representative manipulation tasks: push-and-slide and grasp-and-place a virtual object.
%
\noindentskip
The main contributions of this chapter are:
\begin{itemize}
\item A comparison from the literature of the six most common visual hand renderings used to interact with \VOs in \AR.
\item A user study with 24 participants evaluating the performance and user experience of the six visual hand renderings superimposed on the real hand during free and direct hand manipulation of \VOs in \OST-\AR.
\end{itemize}
\noindentskip
In the next sections, we first present the six visual hand renderings considered in this study and gathered from the literature. We then describe the experimental setup and design, the two manipulation tasks, and the metrics used. Finally, we present the results of the user study and discuss their implications for the manipulation of \VOs directly with the hand in \AR.
\begin{subfigs}{hands}{The six visual hand renderings.}[


@@ -50,7 +50,7 @@ We aim to investigate whether the chosen visual hand rendering affects the perfo
\subsection{Manipulation Tasks and Virtual Scene}
\label{tasks}
Following the guidelines of \textcite{bergstrom2021how} for designing object manipulation tasks, we considered two variations of a 3D pick-and-place task, commonly found in interaction and manipulation studies \cite{prachyabrued2014visual,blaga2017usability,maisto2017evaluation,meli2018combining,vanveldhuizen2021effect}.
\subsubsection{Push Task}
\label{push-task}
@@ -72,15 +72,15 @@ However, this time, the target volume can spawn in eight different locations on
Users are asked to grasp, lift, and move the cube towards the target volume using their fingertips in any way they prefer.
As before, the task is considered completed when the cube is \emph{fully} inside the volume.
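For illustration, this completion test amounts to a simple containment check (a minimal Python sketch assuming axis-aligned cubes; the actual implementation is not described here): the cube counts as fully inside when all eight of its corners lie within the target volume.

import itertools
import numpy as np

def fully_inside(cube_center, cube_edge, target_center, target_edge) -> bool:
    """True when every corner of the manipulated cube lies inside the target volume."""
    ch, th = cube_edge / 2.0, target_edge / 2.0
    corners = (np.asarray(cube_center, dtype=float) + ch * np.array(signs)
               for signs in itertools.product((-1.0, 1.0), repeat=3))
    return all(np.all(np.abs(c - np.asarray(target_center)) <= th) for c in corners)

# Example with the study's dimensions: a 5 cm cube in a 7 cm target volume.
print(fully_inside([0, 0, 0], 0.05, [0, 0, 0], 0.07))     # True (centered)
print(fully_inside([0.02, 0, 0], 0.05, [0, 0, 0], 0.07))  # False (sticks out)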
\begin{subfigs}{tasks}{The two manipulation tasks of the user study.}[
The cube to manipulate is in the middle of the table (\qty{5}{\cm} edge and opaque) and the eight possible targets to reach are around it (\qty{7}{\cm} edge volume and semi-transparent).
Only one target at a time was shown during the experiments.
][
\item Push task: pushing the virtual cube along a table towards a target placed on the same surface.
\item Grasp task: grasping and lifting the virtual cube towards a target placed on a \qty{20}{\cm} higher plane.
]
\subfig[0.45]{method/task-push}
\subfig[0.45]{method/task-grasp}
\end{subfigs}
\subsection{Experimental Design}
@@ -128,7 +128,7 @@ First, participants were given a consent form that briefed them about the tasks
Then, participants were asked to comfortably sit in front of a table and wear the HoloLens~2 headset as shown in~\figref{tasks}, perform the calibration of the visual hand size as described in~\secref{apparatus}, and complete a \qty{2}{min} training to familiarize themselves with the \AR rendering and the two considered tasks.
During this training, we did not use any of the six hand renderings we wanted to test, but rather a fully opaque white hand rendering that completely occluded the real hand of the user.
Participants were asked to carry out the two tasks as naturally and as fast as possible.
Similarly to \cite{prachyabrued2014visual, maisto2017evaluation, blaga2017usability, vanveldhuizen2021effect}, we only allowed the use of the dominant hand.
Similarly to \cite{prachyabrued2014visual,maisto2017evaluation,blaga2017usability,vanveldhuizen2021effect}, we only allowed the use of the dominant hand.
The experiment took around 1 hour and 20 minutes to complete.
\subsection{Participants}


@@ -41,7 +41,7 @@ On the contrary, the lack of visual hand constrained the participants to give mo
Targets on the left (\level{L}, \level{LF}) and the right (\level{R}) sides had higher \response{Time per Contact} than all the other targets (\p{0.005}).
\begin{subfigs}{push_results}{Results of the push task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap 95~\% \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
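For reference, the statistics named in the caption above (geometric means with bootstrap 95~\% \CI and Tukey's \HSD pairwise comparisons) could be computed along the following lines. This is an illustrative Python sketch on synthetic data, not the analysis code of this study.

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
times = rng.lognormal(mean=1.0, sigma=0.3, size=90)       # synthetic trial times
rendering = np.repeat(["Skeleton", "Mesh", "None"], 30)   # synthetic conditions

def geometric_mean(x, axis=-1):
    return np.exp(np.mean(np.log(x), axis=axis))

# Geometric mean of completion times with a bootstrap 95 % confidence interval.
ci = stats.bootstrap((times,), geometric_mean, confidence_level=0.95)
print(geometric_mean(times), ci.confidence_interval)

# Tukey's HSD pairwise comparisons on log-times, i.e., between geometric means.
print(pairwise_tukeyhsd(np.log(times), rendering))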


@@ -54,7 +54,7 @@ The \level{Mesh} rendering seemed to have provided the most confidence to partic
The \response{Grip Aperture} was larger on the right-front (\level{RF}) target volume, indicating a higher confidence, than on back and side targets (\level{R}, \level{RB}, \level{B}, \level{L}, \p{0.03}).
\begin{subfigs}{grasp_results}{Results of the grasp task performance metrics for each visual hand rendering.}[
Geometric means with bootstrap 95~\% \CI
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][


@@ -14,7 +14,7 @@ Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment were then us
A complete visual hand rendering seemed to be preferred over no visual hand rendering when grasping.
\end{itemize}
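As an illustration, the pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment applied to these rankings could be run along the following lines. This Python sketch uses synthetic ranks and example condition names, not the study's data or code.

from itertools import combinations
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# Synthetic per-participant ranks (24 participants, three example renderings).
ranks = {name: rng.integers(1, 7, size=24) for name in ("Skeleton", "Mesh", "None")}

pairs = list(combinations(ranks, 2))
pvals = [wilcoxon(ranks[a], ranks[b]).pvalue for a, b in pairs]
# Holm-Bonferroni ("holm") adjustment of the pairwise p-values.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for (a, b), p, sig in zip(pairs, p_adj, reject):
    print(f"{a} vs {b}: adjusted p = {p:.3f}, significant = {sig}")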
\begin{subfigs}{results_ranks}{Boxplots of the ranking for each visual hand rendering.}[
Lower is better.
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
][


@@ -19,7 +19,7 @@ Moreover, having no visible visual \factor{Hand} rendering was felt by users fat
Surprisingly, no clear consensus was found on \response{Rating}.
Each visual hand rendering, except for \level{Occlusion}, received both the minimum and maximum possible ratings.
\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for each visual hand rendering.}[
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: ** is \pinf{0.01} and * is \pinf{0.05}.
Lower is better for \textbf{(a)} difficulty and \textbf{(b)} fatigue.
Higher is better for \textbf{(c)} performance, \textbf{(d)} precision, \textbf{(e)} efficiency, and \textbf{(f)} rating.


@@ -25,8 +25,8 @@ This result is consistent with \textcite{saito2021contact}, who found that displ
To summarize, when employing a visual hand rendering overlaying the real hand, participants performed better and were more confident in manipulating virtual objects with bare hands in \AR.
These results contrast with similar manipulation studies, but in non-immersive, on-screen \AR, where the presence of a visual hand rendering was found by participants to improve the usability of the interaction, but not their performance \cite{blaga2017usability,maisto2017evaluation,meli2018combining}.
Our results show the most effective visual hand rendering to be the \level{Skeleton} one.
Participants appreciated that it provided a detailed and precise view of the tracking of the real hand, without hiding or masking it.
Although the \level{Contour} and \level{Mesh} hand renderings were also highly rated, some participants felt that they were too visible and masked the real hand.
This result is in line with the results of virtual object manipulation in \VR of \textcite{prachyabrued2014visual}, who found that the most effective visual hand rendering was a double representation of both the real tracked hand and a visual hand physically constrained by the virtual environment.
This type of \level{Skeleton} rendering was also the one that provided the best sense of agency (control) in \VR \cite{argelaguet2016role,schwind2018touch}.


@@ -1,7 +1,18 @@
\section{Conclusion}
\label{conclusion}
This paper presented two human subject studies aimed at better understanding the role of visuo-haptic rendering of the hand during virtual object manipulation in OST-AR.
The first experiment compared six visual hand renderings in two representative manipulation tasks in \AR, \ie push-and-slide and grasp-and-place of a virtual object.
Results show that a visual hand rendering improved performance, perceived effectiveness, and user confidence.
In this chapter, we addressed the challenge of touching, grasping and manipulating \VOs directly with the hand in immersive \OST-\AR by providing and evaluating visual renderings as hand augmentations.
Superimposed on the user's hand, these visual renderings provide feedback on the virtual hand, which tracks the real hand and simulates the interaction with \VOs as a proxy.
We first selected and compared the six most popular visual hand renderings used to interact with \VOs in \AR.
Then, in a user study with 24 participants and an immersive \OST-\AR headset, we evaluated the effect of these six visual hand renderings on the user performance and experience in two representative manipulation tasks.
Our results showed that a visual hand rendering overlaying the real hand improved the performance, perceived effectiveness, and confidence of participants compared to no rendering.
A skeleton rendering, providing a detailed view of the tracked joints and phalanges while not hiding the real hand, was the most performant and effective.
The contour and mesh renderings were found to mask the real hand, while the tips rendering divided participants.
The occlusion rendering suffered from too much tracking latency to be effective.
This is consistent with similar manipulation studies in \VR and in non-immersive \VST-\AR setups.
This study suggests that a \ThreeD visual hand rendering is important in \AR when interacting through a virtual hand technique.
It seems particularly required for interaction tasks that involve precise movements of the fingers in relation to virtual content, such as \ThreeD windows, buttons and sliders, or stacking and assembly tasks.
A minimal but detailed rendering of the hand that does not hide the real hand, like the skeleton rendering we evaluated, seems to be the best compromise between provided feedback and effectiveness.
Still, users should be able to choose and adapt the visual hand rendering to their preferences and needs.


@@ -50,7 +50,7 @@ The chosen visuo-haptic hand renderings are the combination of the two most repr
\subsection{Experimental Design}
\label{design}
\begin{subfigs}{tasks}{The two manipulation tasks of the user study.}[
Both pictures show the cube to manipulate in the middle (\qty{5}{\cm} edge and opaque) and the eight possible targets to reach (\qty{7}{\cm} cube and semi-transparent).
Only one target at a time was shown during the experiments.
][
@@ -61,7 +61,7 @@ The chosen visuo-haptic hand renderings are the combination of the two most repr
\subfig[0.23]{method/task-grasp}
\end{subfigs}
\begin{subfigs}{push_results}{Results of the grasp task performance metrics.}[
Geometric means with bootstrap 95~\% \CI for each vibrotactile positioning (a, b and c) or visual hand rendering (d)
and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][


@@ -18,7 +18,7 @@ Although the \level{Distance} technique provided additional feedback on the inte
\subsection{Questionnaire}
\label{questions}
\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for each vibrotactile positioning.}[
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
Higher is better for \textbf{(a)} vibrotactile rendering rating, \textbf{(b)} usefulness and \textbf{(c)} fatigue.
Lower is better for \textbf{(d)} workload.


@@ -1,7 +1,7 @@
\section{Results}
\label{results}
\begin{subfigs}{grasp_results}{Results of the grasp task performance metrics for each vibrotactile positioning.}[
Geometric means with bootstrap 95~\% \CI and Tukey's \HSD pairwise comparisons: *** is \pinf{0.001}, ** is \pinf{0.01}, and * is \pinf{0.05}.
][
\item Time to complete a trial.