Complete comments
@@ -2,7 +2,7 @@
\label{intro}
One approach to rendering virtual haptic textures is to simulate the roughness of a periodic grating surface as a sinusoidal vibrotactile signal (\secref[related_work]{texture_rendering}).
The vibrations are rendered to a voice-coil actuator embedded in a hand-held tool or worn on the finger (\secref[related_work]{vhar_haptics}).
To create the illusion of touching a pattern with a fixed spatial period, the frequency of the signal must be modulated according to the finger movement.
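Assuming a sinusoidal grating of spatial period $\lambda$ explored at a tangential finger speed $v(t)$ (notation introduced here for illustration only), this modulation can be written as
\begin{equation*}
    f(t) = \frac{v(t)}{\lambda},
    \qquad
    s(t) = A \sin\!\left( 2\pi \int_{0}^{t} f(\tau)\, \mathrm{d}\tau \right),
\end{equation*}
where $f(t)$ is the instantaneous frequency of the vibrotactile signal $s(t)$ of amplitude $A$; accumulating the frequency into a phase keeps the signal continuous when the finger speed varies.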
Previous work either used mechanical systems to track the movement at high frequency \cite{strohmeier2017generating,friesen2024perceived}, or required the user to move at a constant speed to keep the signal frequency constant \cite{asano2015vibrotactile,ujitoko2019modulating}.
However, this method has not yet been integrated into an \AR headset context, where the user should be able to freely touch and explore the visuo-haptic texture augmentations.
@@ -10,10 +10,12 @@ However, this method has not yet been integrated in an \AR headset context, wher
%which either constrained hand to a constant speed to keep the signal frequency constant \cite{asano2015vibrotactile,friesen2024perceived}, or used mechanical sensors attached to the hand \cite{friesen2024perceived,strohmeier2017generating}
In this chapter, we propose a \textbf{system for rendering visual and haptic virtual textures that augment real surfaces}.
It is implemented with the \OST-\AR headset Microsoft HoloLens~2 and a wearable vibrotactile (voice-coil) device worn on the outside of the finger (not covering the fingertip).
The visuo-haptic augmentations rendered with this design allow a user to \textbf{see the textures from any angle} and \textbf{explore them freely with the bare finger}, as if they were real textures.
To ensure both real-time and reliable renderings, the hand and the real surfaces are tracked using a webcam and marker-based pose estimation.
The haptic textures are rendered as a vibrotactile signal representing a patterned grating texture that is synchronized with the finger movement on the augmented surface.
The goal of this design is to enable new \AR applications capable of augmenting real objects with virtual visuo-haptic textures in a portable, on-demand manner, and without impairing the user's interaction with the \RE.
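As an illustration of this speed-driven synthesis, the following minimal sketch accumulates the phase of a sinusoid whose instantaneous frequency is the finger speed divided by the spatial period of the grating (the names and parameter values are illustrative assumptions, not those of the actual implementation):
\begin{verbatim}
import numpy as np

def grating_signal(finger_speeds, dt, spatial_period=0.002, amplitude=1.0):
    """Vibrotactile samples for a grating of the given spatial period (m).

    finger_speeds: tangential finger speed (m/s), sampled every dt seconds.
    """
    phase = 0.0
    samples = []
    for v in finger_speeds:
        frequency = v / spatial_period          # bumps crossed per second (Hz)
        phase += 2.0 * np.pi * frequency * dt   # accumulate phase continuously
        samples.append(amplitude * np.sin(phase))
    return np.array(samples)

# Example: finger accelerating from 0 to 0.1 m/s over 1 s, sampled at 1 kHz.
signal = grating_signal(np.linspace(0.0, 0.1, 1000), dt=1.0 / 1000.0)
\end{verbatim}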
\comans{SJ}{The rationale behind the proposed design is not provided. Since there are multiple ways to implement mechanically transparent haptic devices, the thesis should at least clarify why this design is considered optimal for a specific purpose at this stage.}{This has been better explained in the introduction.}
\noindentskip The contributions of this chapter are:
\begin{itemize}
@@ -1,8 +1,6 @@
\section{Concept}
\label{principle}
The visuo-haptic texture rendering system is based on:
\begin{enumerate}[label=(\arabic*)]
\item a real-time interaction loop between the finger movements and coherent visuo-haptic feedback simulating the sensation of a touched texture,
@@ -49,7 +47,6 @@ Finally, this filtered finger velocity is transformed into the augmented surface
\subsection{Virtual Environment Registration}
\label{virtual_real_registration}
\comans{JG}{The registration process between the external camera, the finger, surface and HoloLens could have been described in more detail. Specifically, it could have been described clearer how the HoloLens coordinate system was aligned (e.g., by also tracking the fiducials on the surface and or finger).}{This has been better described.}
Before a user interacts with the system, it is necessary to design a \VE that will be registered with the \RE during the experiment.
Each real element tracked by a marker is modelled virtually, \eg the hand and the augmented surface (\figref{device}).
In addition, the pose and size of the virtual textures are defined on the virtual replicas.
@@ -58,12 +55,13 @@ First, the coordinate system of the headset is manually aligned with that of the
This resulted in a \qty{\pm .5}{\cm} spatial alignment error between the \RE and the \VE.
While this was sufficient for our use cases, other methods can achieve better accuracy if needed \cite{grubert2018survey}.
Registering the coordinate systems of the camera and the headset thus allows the marker poses estimated with the camera to be used to display, in the headset, the virtual models aligned with their real-world counterparts.
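Denoting the headset, camera, fiducial marker, and virtual model frames by $H$, $C$, $F$, and $M$ (notation introduced here for illustration), the pose of a virtual model in the headset frame is obtained by chaining these transformations:
\begin{equation*}
    {}^{H}\mathbf{T}_{M} = {}^{H}\mathbf{T}_{C} \; {}^{C}\mathbf{T}_{F} \; {}^{F}\mathbf{T}_{M},
\end{equation*}
where ${}^{H}\mathbf{T}_{C}$ results from the manual alignment of the headset with the camera, ${}^{C}\mathbf{T}_{F}$ is the marker pose estimated by the camera, and ${}^{F}\mathbf{T}_{M}$ is the fixed offset between the marker and the corresponding virtual replica defined in the \VE.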
\comans{JG}{A description if and how the offset between the lower side of the fingertip touching the surface and the fiducial mounted on the top of the finger was calibrated / compensated is missing}{This has been better described.}
An additional calibration is performed to compensate for the offset between the finger contact point and the estimated marker pose \cite{son2022effect}.
The current user then places the index finger on the origin point; the poses of both the finger and the origin point are known from the attached fiducial markers.
The transformation between the marker pose of the finger and the finger contact point can then be estimated and compensated for with an inverse transformation.
This makes it possible to detect whether the calibrated real finger touches a virtual texture, using a collision detection algorithm (Nvidia PhysX).
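This compensation can be summarised as follows (illustrative notation): with ${}^{W}\mathbf{T}_{F}$ the estimated pose of the finger marker in the world frame and ${}^{W}\mathbf{p}_{O}$ the known position of the origin point, the contact offset expressed in the finger-marker frame is
\begin{equation*}
    {}^{F}\mathbf{p}_{c} = \left( {}^{W}\mathbf{T}_{F} \right)^{-1} {}^{W}\mathbf{p}_{O},
\end{equation*}
and, during the interaction, the contact point ${}^{W}\mathbf{p}_{c} = {}^{W}\mathbf{T}_{F} \, {}^{F}\mathbf{p}_{c}$ is the point tested against the virtual textures.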
In our implementation, the \VE is designed with Unity (v2021.1) and the Mixed Reality Toolkit (v2.7)\footnoteurl{https://learn.microsoft.com/windows/mixed-reality/mrtk-unity}.
The visual rendering is achieved using the Microsoft HoloLens~2, an \OST-\AR headset with a \qtyproduct{43 x 29}{\degree} \FoV, a \qty{60}{\Hz} refresh rate, and self-localisation capabilities.
@@ -122,5 +122,5 @@ After each of the two tasks, participants answered to the following 7-item Liker
In an open question, participants also commented on their strategy for completing the \level{Matching} task (\enquote{How did you associate the tactile textures with the visual textures?}) and the \level{Ranking} task (\enquote{How did you rank the textures?}).
\comans{JG}{I suggest to also report on [...] the software packages used for statistical analysis (this holds also for the subsequent chapters).}{This has been added to all chapters where necessary.}
The results were analyzed using R (v4.4) and the packages \textit{afex} (v1.4), \textit{ARTool} (v0.11), \textit{corrr} (v0.4), \textit{FactoMineR} (v2.11), \textit{lme4} (v1.1), and \textit{performance} (v0.13).
@@ -7,10 +7,10 @@
\paragraph{Confusion Matrix}
\label{results_matching_confusion_matrix}
\comans{JG}{For the two-sample Chi-Squared tests in the matching task, the number of samples reported is 540 due to 20 participants conducting 3 trials for 9 textures each. However, this would only hold true if the repetitions per participant would be independent and not correlated (and then, one could theoretically also run 10 participants with 6 trials each, or 5 participants with 12 trials each). If they are not independent, this would lead to an artificial inflated sample size and Type I error. If the trials are not independent (please double check), I suggest either aggregating data on the participant level or to use alternative models that account for the within-subject correlation (as was done in other chapters).}{Data of the three confusion matrices have been aggregated on the participant level and analyzed using a Poisson regression.}
\figref{results/matching_confusion_matrix} shows the confusion matrix of the \level{Matching} task, \ie for each presented \factor{Visual Texture}, the proportion of times each corresponding \response{Haptic Texture} was selected in response.
To determine which haptic textures were selected most often, the repetitions of the trials were first aggregated by counting the number of selections per participant for each (\factor{Visual Texture}, \response{Haptic Texture}) pair.
An \ANOVA based on a Poisson regression (no overdispersion was detected) indicated a statistically significant effect of the \factor{Visual Texture} \x \response{Haptic Texture} interaction on the number of selections (\chisqr{64}{180}{414}, \pinf{0.001}).
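For illustration, this aggregation and model can be sketched in Python with \textit{statsmodels} (the analysis itself was performed in R with the packages reported above; the data and column names below are hypothetical):
\begin{verbatim}
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical trial-level data: one row per repetition, with the presented
# visual texture and the haptic texture selected by the participant.
rng = np.random.default_rng(0)
visuals = [f"V{i}" for i in range(1, 4)]
haptics = [f"H{i}" for i in range(1, 4)]
rows = [(p, v, rng.choice(haptics))
        for p in range(1, 6) for v in visuals for _ in range(3)]
trials = pd.DataFrame(rows, columns=["participant", "visual", "haptic"])

# Aggregate repetitions: number of selections per participant for each
# (visual, haptic) pair, keeping explicit zero counts for unselected pairs.
full_index = pd.MultiIndex.from_product(
    [sorted(trials["participant"].unique()), visuals, haptics],
    names=["participant", "visual", "haptic"])
counts = (trials.groupby(["participant", "visual", "haptic"]).size()
          .reindex(full_index, fill_value=0)
          .reset_index(name="n"))

# Poisson regression of the counts on the visual x haptic interaction
# (participant-level effects omitted here for brevity).
fit = smf.glm("n ~ C(visual) * C(haptic)", data=counts,
              family=sm.families.Poisson()).fit()
print(fit.summary())
\end{verbatim}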
Post-hoc pairwise comparisons using Tukey's \HSD test then indicated statistically significant differences for the following visual textures:
\begin{itemize}
\item With \level{Sandpaper~320}, \level{Coffee Filter} was selected more often than the other haptic textures (\ztest{3.4}, \pinf{0.05} each), except \level{Plastic Mesh~1} and \level{Terra Cotta}.
@@ -34,12 +34,12 @@ Participants rated the roughness of the paper (without any texture augmentation)
The visual rendering of the virtual hand and the \VE was achieved using the \OST-\AR headset Microsoft HoloLens~2, running a custom application made with Unity (v2021.1) and the Mixed Reality Toolkit (v2.7) at \qty{60}{FPS}.
An \OST-\AR headset was chosen over a \VST-\AR headset because the former only adds virtual content to the \RE, whereas the latter streams a real-time video capture of the \RE; one of our objectives was to directly compare the \VE with the real environment it replicates, not with a video feed that introduces many other visual limitations (\secref[related_work]{ar_displays}).
\comans{JG}{In addition, the lag between the real and virtual hand in the Mixed condition could have been quantified (e.g. using a camera filming through the headset) to shed more light on the reported differences, as also noted in Section 4.5, as well as the registration error between the real and the virtual hand (as visible in Figure 4.1, Mixed).}{This has been added.}
We carefully reproduced the \RE in the \VE, including the geometry of the box, textures, lighting, and shadows (\figref{renderings}, \level{Virtual}).
The virtual hand model was a gender-neutral human right hand with realistic skin texture, similar to that used by \textcite{schwind2017these}.
Prior to the experiment, the virtual hand and the \VE were registered to the real hand of the participant and the \RE, respectively, as described in \secref[vhar_system]{virtual_real_registration}.
The size of the virtual hand was also manually adjusted to match the real hand of the participant.
A \qty{\pm .5}{\cm} spatial alignment error and a \qty{160 \pm 30}{\ms} lag between the real hand and the virtual hand were measured (\secref[vhar_system]{virtual_real_registration}).
To ensure the same \FoV in all \factor{Visual Rendering} conditions, a cardboard mask was attached to the \AR headset (\figref{experiment/headset}).
In the \level{Virtual} rendering, the mask had holes only for the sensors, blocking the view of the \RE to simulate a \VR headset.