Update xr-perception chapter

2024-09-27 09:23:52 +02:00
parent c89299649e
commit 8a85b14d3b
7 changed files with 177 additions and 133 deletions


@@ -1,6 +1,6 @@
%With a vibrotactile actuator attached to a hand-held device or directly on the finger, it is possible to simulate virtual haptic sensations as vibrations, such as texture, friction or contact vibrations \cite{culbertson2018haptics}.
%
In this section, we describe a system for rendering vibrotactile roughness textures in real time, on any tangible surface, touched directly with the index fingertip, with no constraints on hand movement and using a simple camera to track the finger pose.
%
We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent, multimodal visuo-haptic augmentation of the real environment.
@@ -11,22 +11,22 @@ The visuo-haptic texture rendering system is based on
%
\begin{enumerate*}[label=(\arabic*)]
\item a real-time interaction loop between the finger movements and a coherent visuo-haptic feedback simulating the sensation of a touched texture,
\item a precise alignment of the virtual environment with its real counterpart, and
\item a modulation of the signal frequency by the estimated finger speed with a phase matching.
\end{enumerate*}
%
\figref{diagram} shows the interaction loop diagram and \eqref{signal} the definition of the vibrotactile signal.
%
The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the virtual environment, and the vibrotactile signal generation and rendering.
\figwide[1]{diagram}{Diagram of the visuo-haptic texture rendering system. }[
Fiducial markers, attached to the voice-coil actuator and to the tangible surfaces to be tracked, are captured by a camera.
The positions and rotations (the poses) ${}^c\mathbf{T}_i$, $i=1..n$ of the $n$ defined markers in the camera frame $\mathcal{F}_c$ are estimated, then filtered with an adaptive low-pass filter.
%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the real environment.
These poses are used to move and display the virtual model replicas aligned with the real environment.
A collision detection algorithm detects contacts of the virtual hand with the virtual textures.
If a contact occurs, the velocity of the finger marker ${}^c\dot{\mathbf{X}}_f$ is estimated using a discrete derivative of the position and adaptive low-pass filtering, then transformed into the texture frame $\mathcal{F}_t$.
The vibrotactile signal $s_k$ is generated by modulating the (scalar) finger velocity ${}^t\hat{\dot{X}}_f$ in the texture direction with the texture period $\lambda$ (\eqref{signal}).
The signal is sampled at 48~kHz and sent to the voice-coil actuator via an audio amplifier.
All computation steps except signal sampling are performed at 60~Hz and in separate threads to parallelize them.
]
@@ -47,19 +47,21 @@ The system is composed of three main components: the pose estimation of the trac
A fiducial marker (AprilTag) is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech) which is placed above the experimental setup and capturing \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
%
Other markers are placed on the tangible surfaces to augment (\figref{setup}). % to estimate the relative position of the finger with respect to the surfaces
%
Contrary to similar work, which either constrained the hand to a constant speed to keep the signal frequency constant \cite{asano2015vibrotactile,friesen2024perceived} or used mechanical sensors attached to the hand \cite{friesen2024perceived,strohmeier2017generating}, vision-based tracking both frees the hand movements and allows any tangible surface to be augmented.
%
A camera external to the \AR/\VR headset with a marker-based technique is employed to provide accurate and robust tracking with a constant view of the markers \cite{marchand2016pose}.
%
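For illustration, a minimal sketch of this detection and pose estimation step, assuming the pupil_apriltags Python binding and OpenCV for image capture (the text specifies AprilTag markers and a \qty{60}{\hertz} webcam, not these exact libraries; the intrinsics are hypothetical values from a prior calibration):
\begin{verbatim}
import cv2
from pupil_apriltags import Detector

CAMERA_PARAMS = (900.0, 900.0, 640.0, 360.0)  # fx, fy, cx, cy (hypothetical)
TAG_SIZE = 0.02                               # 2 cm markers

detector = Detector(families="tag36h11")
capture = cv2.VideoCapture(0)                 # the external webcam

while True:
    ok, frame = capture.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect all visible markers and estimate their 6-DoF pose in the
    # camera frame F_c: tag.pose_R (3x3) and tag.pose_t (3x1) together
    # form the homogeneous transform cT_i of marker i.
    tags = detector.detect(gray, estimate_tag_pose=True,
                           camera_params=CAMERA_PARAMS, tag_size=TAG_SIZE)
    poses = {tag.tag_id: (tag.pose_R, tag.pose_t) for tag in tags}
\end{verbatim}
The estimated poses would then be filtered as described below before any further use.
%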
We denote ${}^c\mathbf{T}_i$, $i=1..n$ the homogeneous transformation matrix that defines the position and rotation of the $i$-th marker out of the $n$ defined markers in the camera frame $\mathcal{F}_c$, \eg the finger pose ${}^c\mathbf{T}_f$ and the texture pose ${}^c\mathbf{T}_t$.
%
To reduce the noise in the pose estimation while maintaining good responsiveness, the 1€ filter \cite{casiez2012filter} is applied: a low-pass filter with an adaptive cutoff frequency, specifically designed for human motion tracking.
%
The filtered pose is denoted as ${}^c\hat{\mathbf{T}}_i$.
%
The optimal filter parameters were determined using the method of \textcite{casiez2012filter}, with a minimum cutoff frequency of \qty{10}{\hertz} and a slope of \num{0.01}.
%
The velocity (without angular velocity) of the marker, denoted as ${}^c\dot{\mathbf{X}}_i$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
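As the 1€ filter is central to both the pose and the velocity estimates, here is a compact sketch of it in Python, with the parameters above; note that in the actual system the velocity is smoothed by a second 1€ filter with the same parameters, whereas this sketch smooths its internal derivative with the fixed \qty{1}{\hertz} cutoff of the original description \cite{casiez2012filter}:
\begin{verbatim}
import math

class OneEuroFilter:
    """Low-pass filter whose cutoff frequency rises with the signal speed."""
    def __init__(self, rate=60.0, min_cutoff=10.0, beta=0.01, d_cutoff=1.0):
        self.rate = rate              # sampling rate of the tracker (Hz)
        self.min_cutoff = min_cutoff  # minimum cutoff frequency (Hz)
        self.beta = beta              # slope: cutoff increase per unit speed
        self.d_cutoff = d_cutoff      # fixed cutoff for the derivative (Hz)
        self.x_prev, self.dx_prev = None, 0.0

    def _alpha(self, cutoff):
        # Smoothing factor of an exponential filter for a given cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * self.rate)

    def __call__(self, x):
        if self.x_prev is None:       # first sample: nothing to filter yet
            self.x_prev = x
            return x
        dx = (x - self.x_prev) * self.rate          # discrete derivative
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)                     # adaptive smoothing
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
\end{verbatim}
Each position coordinate of each marker would be filtered independently at the \qty{60}{\hertz} camera rate: at low speed the filter smooths strongly, at high speed it favours responsiveness.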
To be able to compare virtual and augmented realities, we then create a virtual environment that closely replicates the real one.
%Before a user interacts with the system, it is necessary to design a virtual environment that will be registered with the real environment during the experiment.
@@ -74,11 +76,11 @@ This allows to detect if a finger touches a virtual texture using a collision de
In our implementation, the virtual hand and environment are designed with Unity and the Mixed Reality Toolkit (MRTK).
%
The visual rendering is achieved using the Microsoft HoloLens~2, an \OST-\AR headset with a \qtyproduct{43 x 29}{\degree} \FoV, a \qty{60}{\Hz} refresh rate, and self-localisation capabilities.
%
It was chosen over \VST-\AR because \OST-\AR only adds virtual content to the real environment, while \VST-\AR streams a real-time video capture of the real environment \cite{macedo2023occlusion}.
%
Indeed, one of our objectives (\secref{experiment}) is to directly compare a virtual environment that replicates a real one, rather than a video feed that introduces many supplementary visual limitations \cite{kim2018revisiting,macedo2023occlusion}.
%
To simulate a \VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the real environment (\figref{headset}).
@@ -89,49 +91,64 @@ A voice-coil actuator (HapCoil-One, Actronika) is used to display the vibrotacti
%
The voice-coil actuator is encased in a 3D printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
%
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil \cite{mcmahan2014dynamic}.
%
The amplifier is connected to the audio output of a computer that generates the signal using the WASAPI driver in exclusive mode and the NAudio library.
The represented haptic texture is a series of parallel virtual grooves and ridges, similar to real grating textures manufactured for psychophysical roughness perception studies \cite{friesen2024perceived,klatzky2003feeling,unger2011roughness}.
%
It is generated as a square wave audio signal $s_k$, sampled at \qty{48}{\kilo\hertz}, with a period $\lambda$ and an amplitude $A$.
%
Its frequency is the ratio of the absolute filtered (scalar) finger velocity $|{}^t\hat{\dot{X}}_f|$, transformed into the texture frame $\mathcal{F}_t$, and the texture period $\lambda$ \cite{friesen2024perceived}.
%
As the finger is moving horizontally on the texture, only the $x$ component of the velocity is used.
%
%This velocity modulation strategy is necessary as the finger position is estimated at a far lower rate (\qty{60}{\hertz}) than the audio signal.
%
%
%As the finger position is estimated at a far lower rate (\qty{60}{\hertz}), the filtered finger (scalar) position ${}^t\hat{X}_f$ in the texture frame $\mathcal{F}_t$ cannot be directly used. % to render the signal if the finger moves fast or if the texture period is small.
%
%The best strategy instead is to modulate the frequency of the signal as a ratio of the filtered finger velocity ${}^t\hat{\dot{\mathbf{X}}}_f$ and the texture period $\lambda$ \cite{friesen2024perceived}.
%
When a new finger velocity ${}^t\hat{\dot{X}}_{f,j}$ is estimated at time $t_j$, the phase $\phi_j$ of the signal $s$ also needs to be adjusted to ensure continuity of the signal.
%
In other words, the sampling of the audio signal runs at \qty{48}{\kilo\hertz}, while its frequency and phase are updated at a far lower rate of \qty{60}{\hertz}, when a new finger velocity is estimated.
%
A sample $s_k$ of the audio signal at sampling time $t_k$, with $t_k \geq t_j$, is thus given by:
%
\begin{subequations}
\label{eq:signal}
\begin{align}
s_k(x_{f,j}, t_k) & = A\, \text{sgn} ( \sin (2 \pi \frac{|\dot{X}_{f,j}|}{\lambda} t_k + \phi_j) ) & \label{eq:signal_speed} \\
\phi_j & = \phi_{j-1} + 2 \pi \frac{x_{f,j} - x_{f,{j-1}}}{\lambda} t_k & \label{eq:signal_phase}
\end{align}
\end{subequations}
%
%This is a common rendering method for vibrotactile textures, with well-defined parameters, that has been employed to modify perceived haptic roughness of a tangible surface \cite{asano2015vibrotactile,konyo2005tactile,ujitoko2019modulating}.
%
%As the finger position is estimated at a far lower rate (\qty{60}{\hertz}) than the audio signal, the finger position $x_f$ cannot be directly used to render the signal if the finger moves fast or if the texture period is small.
%
%The best strategy instead is to modulate the frequency of the signal $s$ as a ratio of the finger velocity $\dot{x}_f$ and the texture period $\lambda$ \cite{friesen2024perceived}.
%
This rendering preserves the sensation of a constant spatial frequency of the virtual texture while the finger moves at various speeds, which is crucial for the perception of roughness \cite{klatzky2003feeling,unger2011roughness}.
%
%Note that the finger position and velocity are transformed from the camera frame $\mathcal{F}_c$ to the texture frame $\mathcal{F}_t$, with the $x$ axis aligned with the texture direction.
%
%However, when a new finger position is estimated at time $t_j$, the phase $\phi_j$ needs to be adjusted as well with the frequency to ensure a continuity in the signal as described in \eqref{signal_phase}.
%
The phase matching avoids sudden changes in the actuator movement that would affect the texture perception in an uncontrolled way (\figref{phase_matching}) and, contrary to previous work \cite{asano2015vibrotactile,friesen2024perceived}, it enables a free exploration of the texture by the user, with no constraints on the finger speed.
%
Finally, a square wave is chosen to get a rendering closer to a real grating texture with the sensation of crossing edges \cite{ujitoko2019modulating}, and because the roughness perception of sine wave textures has been shown not to reproduce the roughness perception of real grating textures \cite{unger2011roughness}.
%
%And secondly, to be able to render low frequencies that occurs when the finger moves slowly or the texture period is large, as the actuator cannot render frequencies below \qty{\approx 20}{\Hz} with enough amplitude to be perceived with a pure sine wave signal.
%
The tactile texture is described and rendered in this work as a one-dimensional signal, by integrating the finger movement relative to the texture along a single direction, but it is easily extended to a two-dimensional texture by generating a second signal for the orthogonal direction and summing the two signals in the rendering.
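To make the interplay of the two update rates concrete, here is a sketch of the signal generation in Python, with hypothetical parameter values; the phase matching is written here directly as the continuity condition at the update time $t_j$, which is one way to realise \eqref{signal_phase}:
\begin{verbatim}
import math

SAMPLE_RATE = 48000   # audio sampling rate (Hz)
LAMBDA = 0.002        # texture period: 2 mm
A = 1.0               # amplitude (normalised)

freq, phase = 0.0, 0.0

def on_velocity_update(v_x, t_j):
    """Called at ~60 Hz with the filtered scalar finger velocity (m/s)
    in the texture direction."""
    global freq, phase
    new_freq = abs(v_x) / LAMBDA
    # Phase matching: keep 2*pi*f*t + phi continuous across the
    # frequency step, so the actuator position does not jump.
    phase += 2.0 * math.pi * (freq - new_freq) * t_j
    freq = new_freq

def sample(t_k):
    """One audio sample at 48 kHz: a square wave via sgn(sin(.))."""
    s = math.sin(2.0 * math.pi * freq * t_k + phase)
    return A * math.copysign(1.0, s)
\end{verbatim}
A second, orthogonal signal generated the same way and summed with this one would give the two-dimensional extension mentioned above.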
\fig[0.7]{phase_matching}{
Change in frequency of a sinusoidal signal with and without phase matching.
}[
Phase matching ensures continuity of the signal and avoids glitches in its rendering.
A sinusoidal signal is shown here for clarity, but a different waveform will give a similar effect.
]
\section{System Latency}
@@ -139,7 +156,7 @@ The tactile texture is described and rendered in this work as a one dimensional
%As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, 3D rendering and audio generation.
%
Because the chosen \AR headset is a standalone device (like most current \AR/\VR headsets) and cannot directly control the sound card and haptic actuator, the image capture, pose estimation and audio signal generation steps are performed on an external computer.
%
All computation steps run in separate threads to parallelize them and reduce latency, and are synchronised with the headset via a local network and the ZeroMQ library.
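For illustration, a minimal sketch of this synchronisation, assuming a ZeroMQ PUB/SUB pattern with hypothetical port, host and message format (the text does not specify them; both ends are shown in Python for brevity, while the headset side actually runs in the Unity application):
\begin{verbatim}
import zmq

context = zmq.Context()

# Computer side: publish the latest filtered poses at 60 Hz.
pub = context.socket(zmq.PUB)
pub.bind("tcp://*:5555")                      # hypothetical port
pub.send_json({"marker": "finger",
               "position": [0.10, 0.02, 0.30],
               "rotation": [0.0, 0.0, 0.0, 1.0]})

# Headset side: subscribe and apply the poses to the virtual replicas.
sub = context.socket(zmq.SUB)
sub.connect("tcp://192.168.0.2:5555")         # hypothetical host
sub.setsockopt_string(zmq.SUBSCRIBE, "")      # receive everything
pose = sub.recv_json()
\end{verbatim}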
%
@@ -166,4 +183,3 @@ With respect to the real hand position, it causes a distance error in the displa
This is proportional to the speed of the finger, \eg the distance error is \qty{12 +- 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
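Assuming the displacement error is dominated by the end-to-end latency $\Delta t$ (error $\approx$ finger speed $\times$ $\Delta t$), these numbers imply $\Delta t \approx \qty{12}{\mm} / \qty{75}{\mm\per\second} \approx \qty{160}{\milli\second}$.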
%
%and of the vibrotactile signal frequency with respect to the finger speed.%, that is proportional to the speed of the finger.
%


@@ -1,6 +1,6 @@
% Delivers the motivation for your paper. It explains why you did the work you did.
% Even before manipulating a visual representation to induce a haptic sensation, shifts and latencies between user input and co-localised visuo-haptic feedback can be experienced differently in \AR and \VR, which we aim to investigate in this work.
% Insist on the advantage of wearable : augment any surface see bau2012revel
%Imagine you're an archaeologist or in a museum, and you want to examine an ancient object.
%
@@ -12,7 +12,7 @@
%
%Such tactile augmentation is made possible by wearable haptic devices, which are worn directly on the finger or hand and can provide a variety of sensations on the skin, while being small, light and discreet \cite{pacchierotti2017wearable}.
%
Wearable haptic devices, worn directly on the finger or hand, have been used to render a variety of tactile sensations to virtual objects in \VR \cite{detinguy2018enhancing,pezent2019tasbi} and \AR \cite{maisto2017evaluation,teng2021touch}.
%
They have also been used to alter the perception of roughness, stiffness, friction, and local shape perception of real tangible objects \cite{asano2015vibrotactile,detinguy2018enhancing,salazar2020altering}.
%
@@ -28,9 +28,9 @@ Although \AR and \VR are closely related, they have significant differences that
%
%Current \AR systems also suffer from display and rendering limitations not present in \VR, affecting the user experience with virtual content that may be less realistic or inconsistent with the real augmented environment \cite{kim2018revisiting,macedo2023occlusion}.
%
Therefore, it seems necessary to investigate and understand the potential effect of these differences in visual rendering on the HAR perception.
%
For example, previous works have shown that the stiffness of a virtual piston rendered with a force feedback haptic system seen in \AR is perceived as less rigid than in \VR \cite{gaffary2017ar}, or when the visual rendering is ahead of the haptic rendering \cite{diluca2011effects,knorlein2009influence}.
%
%Taking our example from the beginning of this introduction, you now want to learn more about the context of the discovery of the ancient object or its use at the time of its creation by immersing yourself in a virtual environment in \VR.
%
@@ -39,16 +39,24 @@ Previous works have shown, for example, that the stiffness of a virtual piston r
The goal of this paper is to study the role of the visual rendering of the hand (real or virtual) and its environment (\AR or \VR) on the perception of a tangible surface whose texture is augmented with a wearable vibrotactile device worn on the finger.
%
We focus on the perception of roughness, one of the main tactile sensations of materials \cite{baumgartner2013visual,hollins1993perceptual,okamoto2013psychophysical} and one of the most studied haptic augmentations \cite{asano2015vibrotactile,culbertson2014modeling,friesen2024perceived,strohmeier2017generating,ujitoko2019modulating}.
%
By understanding how these visual factors influence the perception of haptically augmented tangible objects, the many wearable haptic systems that already exist but have not yet been fully explored with \AR can be better applied and new visuo-haptic renderings adapted to \AR can be designed.
Our contributions are:
%
\begin{itemize}
\item A system for rendering virtual vibrotactile roughness textures in real time on a tangible surface touched directly with the finger, integrated with an immersive visual \AR/\VR headset to provide a coherent multimodal visuo-haptic augmentation of the real environment; and %It is presented in \secref{method}.
\item A psychophysical study with 20 participants to evaluate the perception of these virtual roughness textures in three visual rendering conditions: without visual augmentation, with a realistic virtual hand rendering in \AR, and with the same virtual hand in \VR. %It is described in \secref{experiment} and those results are detailed in \secref{discussion}.
\end{itemize}
%In the remainder of this paper, we first present related work on wearable haptic texture augmentations and the haptic perception in \AR and \VR in \secref{related_work}.
%
%We then describe the visuo-haptic texture rendering system in \secref{method}.
%
%We present the experimental protocol and apparatus of the user study in \secref{experiment}, and the results obtained in \secref{results}.
%
%We discuss these results in \secref{discussion}, and conclude in \secref{conclusion}.
%In the remainder of this paper, we first present related work on perception in \VR and \AR in Section 2. Then, in Section 3, we describe the protocol and apparatus of our experimental study. The results obtained are presented in Section 4, followed by a discussion in Section 5. The paper ends with a general conclusion in Section 6.
%First, we present a system for rendering virtual vibrotactile textures in real time without constraints on hand movements and integrated with an immersive visual \AR/\VR headset to provide a coherent multimodal visuo-haptic augmentation of the real environment.
%
%An experimental setup is then presented to compare haptic roughness augmentation with an optical \AR headset (Microsoft HoloLens~2) that can be transformed into a \VR headset using a cardboard mask.
%


@@ -1,22 +1,6 @@
\section{User Study}
\label{experiment}
The visuo-haptic rendering system, described in \secref[vhar_system]{method}, allows free exploration of virtual vibrotactile textures on tangible surfaces directly touched with the bare finger to simulate roughness augmentation, while the visual rendering of the hand and environment can be controlled to be in \AR or \VR.
%
The user study aimed to investigate the effect of visual hand rendering in \AR or \VR on the perception of roughness texture augmentation. % of a touched tangible surface.
@@ -28,15 +12,15 @@ In order not to influence the perception, as vision is an important source of in
\subsection{Participants}
\label{participants}
Twenty participants were recruited for the study (16 males, 3 females, 1 preferred not to say), aged between 18 and 61 years (\median{26}{}, \iqr{6.8}{}).
%
All participants had normal or corrected-to-normal vision, and none had a known hand or finger impairment.
%
One was left-handed and the rest were right-handed; they all performed the task with their right index finger.
%
When rating their experience with haptics, \AR and \VR (\enquote{I use it several times a year}), 12 were experienced with haptics, 5 with \AR, and 10 with \VR.
%
Experience was correlated between haptics and \VR (\pearson{0.59}), and \AR and \VR (\pearson{0.67}), but not haptics and \AR (\pearson{0.20}), nor haptics, \AR, or \VR with age (\pearson{0.05} to \pearson{0.12}).
%
Participants were recruited at the university on a voluntary basis.
%
@@ -45,19 +29,19 @@ They all signed an informed consent form before the user study and were unaware
\subsection{Apparatus}
\label{apparatus}
An experimental environment was created to ensure a similar visual rendering in \AR and \VR (\figref{renderings}).
%
It consisted of a \qtyproduct{300 x 210 x 400}{\mm} medium-density fibreboard (MDF) box with a paper sheet glued inside and a \qtyproduct{50 x 15}{\mm} rectangle printed on the sheet to delimit the area where the tactile textures were rendered.
%
A single light source of \qty{800}{\lumen} placed \qty{70}{\cm} above the table fully illuminated the inside of the box.
%
Participants rated the roughness of the paper (without any texture augmentation) before the experiment on a 7-point Likert scale (1~=~Extremely smooth, 7~=~Extremely rough) as quite smooth (\mean{2.5}, \sd{1.3}).
%The visual rendering of the virtual hand and environment was achieved using the Microsoft HoloLens~2, an OST-AR headset with a \qtyproduct{43 x 29}{\degree} field of view (FoV) and a \qty{60}{\Hz} refresh rate, running a custom application made with Unity 2021.1.0f1 and Mixed Reality Toolkit (MRTK) 2.7.2.
The virtual environment carefully reproduced the real environment, including the geometry of the box, textures, lighting, and shadows (\figref{renderings}, \level{Virtual}).
%
The virtual hand model was a gender-neutral human right hand with realistic skin texture, similar to that used by \textcite{schwind2017these}.
%
Its size was adjusted to match the real hand of the participants before the experiment.
%
@@ -67,11 +51,11 @@ The visual rendering of the virtual hand and environment is described in \secref
%
%In the \level{Virtual} rendering, a cardboard mask (with holes for sensors) was attached to the headset to block the view of the real environment and simulate a \VR headset (\figref{method/headset}).
%
To ensure the same \FoV in all \factor{Visual Rendering} conditions, a cardboard mask was attached to the \AR headset (\figref{method/headset}).
%
In the \level{Virtual} rendering, the mask only had holes for sensors to block the view of the real environment and simulate a \VR headset.
%
In the \level{Mixed} and \level{Real} conditions, the mask had two additional holes for the eyes that matched the \FoV of the HoloLens~2 (\figref{method/headset}).
%
\figref{renderings} shows the resulting views in the three considered \factor{Visual Rendering} conditions.
@@ -79,7 +63,7 @@ In the \level{Mixed} and \level{Real} conditions, the mask had two additional ho
%
%This voice-coil was chosen for its wide frequency range (\qtyrange{10}{1000}{\Hz}) and its relatively low acceleration distortion, as specified by the manufacturer\footnotemark[1].
%
%It was driven by an audio amplifier (XY-502, not branded) connected to a computer that generated the audio signal of the textures as described in \secref{method}, using the NAudio library and the WASAPI driver in exclusive mode.
%
%The position of the finger relative to the sheet was estimated using a webcam placed on top of the box (StreamCam, Logitech) and the OpenCV library by tracking a \qty{2}{\cm} square fiducial marker (AprilTag) glued to top of the vibrotactile actuator.
%
@@ -91,11 +75,13 @@ Participants sat comfortably in front of the box at a distance of \qty{30}{\cm},
%
%A vibrotactile voice-coil actuator (HapCoil-One, Actronika) was encased in a 3D printed plastic shell with a \qty{2}{\cm} AprilTag glued to top, and firmly attached to the middle phalanx of the right index finger of the participants using a Velcro strap.
%
The generation of the virtual texture and the control of the virtual hand are described in \secref{method}.
%
They also wore headphones playing pink noise to mask the sound of the voice-coil.
%
The experimental setup was installed in a quiet room with no windows.
%
The user study took on average one hour to complete.
\subsection{Procedure}
\label{procedure}
@@ -130,7 +116,11 @@ The order of presentation was randomised and not revealed to the participants.
%
All textures were rendered as described in \secref{texture_generation} with period $\lambda$ of \qty{2}{\mm}, but with different amplitudes $A$ to create different levels of roughness.
%
Preliminary studies allowed us to determine a range of amplitudes that could be felt by the participants and were not too uncomfortable.
%
The reference texture was chosen to be the one with the middle amplitude to compare it with lower and higher roughness levels and to determine key perceptual variables such as the \PSE and the \JND of each \factor{Visual Rendering} condition.
%
The chosen \TIFC task is a common psychophysical method used in haptics to determine \PSE and \JND by testing comparison stimuli against a fixed reference stimulus and by fitting a psychometric function to the participant's responses \cite{jones2013application}.
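For illustration, here is a sketch in Python of such a psychometric fit on hypothetical response proportions, using a cumulative Gaussian; the actual analysis fits a \GLMM to the raw responses and bootstraps the estimates, as reported in the results:
\begin{verbatim}
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Amplitude differences (%) and hypothetical proportions of
# "comparison rougher" responses for one rendering condition.
amp_diff = np.array([-37.5, -25.0, -12.5, 12.5, 25.0, 37.5])
p_rougher = np.array([0.08, 0.22, 0.38, 0.62, 0.81, 0.93])

def psychometric(x, pse, sigma):
    # Cumulative Gaussian psychometric function.
    return norm.cdf(x, loc=pse, scale=sigma)

(pse, sigma), _ = curve_fit(psychometric, amp_diff, p_rougher,
                            p0=(0.0, 20.0))
jnd = sigma * norm.ppf(0.75)   # step from the 50% to the 75% point
print(f"PSE = {pse:.1f}%, JND = {jnd:.1f}%")
\end{verbatim}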
\subsection{Experimental Design}
\label{experimental_design}
@@ -138,17 +128,17 @@ Preliminary studies allowed us to determine a range of amplitudes that could be
The user study was a within-subjects design with two factors:
%
\begin{itemize}
\item \factor{Visual Rendering} consists of the augmented or virtual view of the environment, the hand and the wearable haptic device, with 3 levels: real environment and real hand view without any visual augmentation (\figref{renderings}, \level{Real}), real environment and hand view with the virtual hand (\figref{renderings}, \level{Mixed}) and virtual environment with the virtual hand (\figref{renderings}, \level{Virtual}).
\item \factor{Amplitude Difference} consists of the difference in amplitude of the comparison texture with the reference texture (which is identical for all visual renderings), with 6 levels: \qtylist{+-12.5; +-25.0; +-37.5}{\%}.
\end{itemize}
A trial consisted of a \TIFC task in which the participant touched two virtual vibrotactile textures one after the other and decided which one was rougher.
%
To avoid any order effect, the order of \factor{Visual Rendering} conditions was counterbalanced between participants using a balanced Latin square design.
%
Within each condition, the presentation order of the reference and comparison textures was also counterbalanced, and all possible texture pairs were presented in random order and repeated three times.
%
A total of 3 visual renderings \x 6 amplitude differences \x 2 texture presentation orders \x 3 repetitions = 108 trials were performed by each participant.
\subsection{Collected Data}
\label{collected_data}
@@ -157,22 +147,23 @@ For each trial, the \response{Texture Choice} by the participant as the roughest
%
The \response{Response Time} between the end of the trial and the choice of the participant was also measured as an indicator of the difficulty of the task.
%
At each frame, the \response{Finger Position} and \response{Finger Speed} were recorded to control for possible differences in texture exploration behaviour.
%
Participants also rated their experience after each \factor{Visual Rendering} block of trials using the questions shown in \tabref{questions}.
%After each \factor{Visual Rendering} block of trials, participants rated their experience with the vibrotactile textures (all blocks), the vibrotactile device (all blocks), the virtual hand rendering (all except \level{Mixed} block) and the virtual environment (\level{Virtual} block) using the questions shown in \tabref{questions}.
%
They also assessed their workload with the NASA Task Load Index (\response{NASA-TLX}) questionnaire after each block of trials.
%
For all questions, participants were shown only labels (\eg \enquote{Not at all} or \enquote{Extremely}) and not the actual scale values (\eg 1 or 5) \cite{muller2014survey}.
\newcommand{\scalegroup}[2]{\multirow{#1}{1\linewidth}{#2}}
\begin{tabwide}{questions}
{Questions asked to participants after each \factor{Visual Rendering} block of trials.}
[
Unipolar scale questions were 5-point Likert scales (1 = Not at all, 2 = Slightly, 3 = Moderately, 4 = Very and 5 = Extremely).
Bipolar scale questions were 7-point Likert scales (1 = Extremely A, 2 = Moderately A, 3 = Slightly A, 4 = Neither A nor B, 5 = Slightly B, 6 = Moderately B, 7 = Extremely B),
where A and B are the two poles of the scale (indicated in parentheses in the Scale column of the questions).
NASA-TLX questions were bipolar 100-point scales (0 = Very Low and 100 = Very High, except for Performance where 0 = Perfect and 100 = Failure).
Participants were shown only the labels for all questions.
]
\begin{tabularx}{\linewidth}{l X p{0.2\linewidth}}
@@ -198,13 +189,13 @@ For all questions, participants were shown only labels (\eg \enquote{Not at all}
\midrule
Virtual Realism & How realistic was the virtual environment? & \scalegroup{2}{Unipolar (1-5)} \\
Virtual Similarity & How similar was the virtual environment to the real one? & \\
\midrule
Mental Demand & How mentally demanding was the task? & \scalegroup{6}{Bipolar (0-100)} \\
Temporal Demand & How hurried or rushed was the pace of the task? & \\
Physical Demand & How physically demanding was the task? & \\
Performance & How successful were you in accomplishing what you were asked to do? & \\
Effort & How hard did you have to work to accomplish your level of performance? & \\
Frustration & How insecure, discouraged, irritated, stressed, and annoyed were you? & \\
\bottomrule
\end{tabularx}
\end{tabwide}


@@ -21,9 +21,9 @@ A \GLMM was adjusted to the \response{Texture Choice} in the \TIFC vibrotactile
%
The \PSEs (\figref{results/trial_pses}) and \JNDs (\figref{results/trial_jnds}) for each visual rendering and their respective differences were estimated from the model, along with their corresponding \percent{95} \CI, using a non-parametric bootstrap procedure (1000 samples).
%
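For illustration, a sketch of such a percentile bootstrap in Python, where stat is a hypothetical helper that refits the model on a resampled set of participants and returns the estimate of interest:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=1)

def bootstrap_ci(participants, stat, n_boot=1000, alpha=0.05):
    """Non-parametric percentile bootstrap: resample participants with
    replacement, re-estimate the statistic, and take its quantiles."""
    estimates = [stat(rng.choice(participants, size=len(participants),
                                 replace=True))
                 for _ in range(n_boot)]
    return np.percentile(estimates, [100 * alpha / 2,
                                     100 * (1 - alpha / 2)])
\end{verbatim}
%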
The \PSE represents the estimated amplitude difference at which the comparison texture was perceived as rougher than the reference texture \percent{50} of the time. %, \ie it is the accuracy of participants in discriminating vibrotactile roughness.
%
The \level{Real} rendering had the highest \PSE (\percent{7.9} \ci{1.2}{4.1}) and was statistically significantly different from the \level{Mixed} rendering (\percent{1.9} \ci{-2.4}{6.1}) and from the \level{Virtual} rendering (\percent{5.1} \ci{2.4}{7.6}).
%
The \JND represents the estimated minimum amplitude difference between the comparison and reference textures that participants could perceive,
% \ie the sensitivity to vibrotactile roughness differences,
@@ -50,22 +50,24 @@ All pairwise differences were statistically significant.
\subsubsection{Response Time}
\label{response_time}
A \LMM \ANOVA with by-participant random slopes for \factor{Visual Rendering}, and a log transformation (as \response{Response Time} measures were gamma distributed), indicated a statistically significant effect of \factor{Visual Rendering} on \response{Response Time} (\anova{2}{18}{6.2}, \p{0.009}, \figref{results/trial_response_times}).
%
Reported response times are \GM.
%
Participants took longer on average to respond with the \level{Virtual} rendering (\geomean{1.65}{\s} \ci{1.59}{1.72}) than with the \level{Real} rendering (\geomean{1.38}{\s} \ci{1.32}{1.43}), which is the only statistically significant difference (\ttest{19}{0.3}, \p{0.005}).
%
The \level{Mixed} rendering was in between (\geomean{1.56}{\s} \ci{1.49}{1.63}).
\subsubsection{Finger Position and Speed}
\label{finger_position_speed}
The frames analysed were those in which the participants actively touched the comparison textures with a finger speed greater than \qty{1}{\mm\per\second}.
%
A \LMM \ANOVA with by-participant random slopes for \factor{Visual Rendering} indicated only one statistically significant effect, of \factor{Visual Rendering} on the total distance traveled by the finger in a trial (\anova{2}{18}{3.9}, \p{0.04}, \figref{results/trial_distances}).
%
On average, participants explored a larger distance with the \level{Real} rendering (\geomean{20.0}{\cm} \ci{19.4}{20.7}) than with \level{Virtual} rendering (\geomean{16.5}{\cm} \ci{15.8}{17.1}), which is the only statistically significant difference (\ttest{19}{1.2}, \p{0.03}), with the \level{Mixed} rendering (\geomean{17.4}{\cm} \ci{16.8}{18.0}) in between.
%
Another \LMM \ANOVA with by-trial and by-participant random intercepts but no random slopes indicated only one statistically significant effect, of \factor{Visual Rendering} on \response{Finger Speed} (\anova{2}{2142}{2.0}, \pinf{0.001}, \figref{results/trial_speeds}).
%
On average, the textures were explored with the highest speed with the \level{Real} rendering (\geomean{5.12}{\cm\per\second} \ci{5.08}{5.17}), the lowest with the \level{Virtual} rendering (\geomean{4.40}{\cm\per\second} \ci{4.35}{4.45}), and the \level{Mixed} rendering (\geomean{4.67}{\cm\per\second} \ci{4.63}{4.71}) in between.
%
@@ -86,7 +88,7 @@ All pairwise differences were statistically significant: \level{Real} \vs \level
\end{subfigs}
\subsection{Questionnaires}
\label{results_questions}
%\figref{results/question_heatmaps} shows the median and interquartile range (IQR) ratings to the questions in \tabref{questions} and to the NASA-TLX questionnaire.
%
@@ -105,11 +107,11 @@ Overall, participants' sense of control over the virtual hand was very high (\re
%
The textures were also overall found to be very much caused by the finger movements (\response{Texture Agency}, \num{4.5 +- 1.0}) with a very low perceived latency (\response{Texture Latency}, \num{1.6 +- 0.8}), and to be quite realistic (\response{Texture Realism}, \num{3.6 +- 0.9}) and quite plausible (\response{Texture Plausibility}, \num{3.6 +- 1.0}).
%
Participants were mixed between feeling the vibrations on the surface or on the top of their finger (\response{Vibration Location}, \num{3.9 +- 1.7}); the distribution of scores was split between the two poles of the scale with the \level{Real} and \level{Mixed} renderings (\percent{42.5} more on surface or on finger top, \percent{15} neutral), whereas there was a trend towards the top of the finger with the \level{Virtual} rendering (\percent{65} \vs \percent{25} more on surface and \percent{10} neutral), but this difference was not statistically significant either.
%
The vibrations were felt as slightly weak overall (\response{Vibration Strength}, \num{4.2 +- 1.1}), and the vibrotactile device was perceived as neither distracting (\response{Device Distraction}, \num{1.2 +- 0.4}) nor uncomfortable (\response{Device Discomfort}, \num{1.3 +- 0.6}).
%
Finally, the overall workload (mean NASA-TLX score) was low (\num{21 +- 14}), with no statistically significant differences found between the visual renderings for any of the subscales or the overall score.
%\figwide{results/question_heatmaps}{%
%


@@ -4,64 +4,81 @@
%Interpret the findings in results, answer to the problem asked in the introduction, contrast with previous articles, draw possible implications. Give limitations of the study.
% But how different is the perception of the haptic augmentation in \AR compared to \VR, with a virtual hand instead of the real hand?
% The goal of this paper is to study the visual rendering of the hand (real or virtual) and its environment (\AR or \VR) on the perception of a tangible surface whose texture is augmented with a wearable vibrotactile device mounted on the finger.
The results showed a difference in vibrotactile roughness perception between the three visual rendering conditions.
%
Given the estimated \PSEs, the textures were on average perceived as \enquote{rougher} in the \level{Real} rendering than in the \level{Virtual} (\percent{-2.8}) and \level{Mixed} (\percent{-6.0}) renderings (\figref{results/trial_pses}).
%
A \PSE difference in the same range was found for perceived stiffness, with the \VR perceived as \enquote{stiffer} and the \AR as \enquote{softer} \cite{gaffary2017ar}.
%
%However, the difference between the \level{Virtual} and \level{Mixed} conditions was not significant.
%
Surprisingly, the \PSE of the \level{Real} rendering was shifted to the right (to be \enquote{rougher}, \percent{7.9}) compared to the reference texture, whereas the \PSEs of the \level{Virtual} (\percent{5.1}) and \level{Mixed} (\percent{1.9}) renderings were perceived as \enquote{smoother} and closer to the reference texture (\figref{results/trial_predictions}).
%
The sensitivity of participants to roughness differences also varied, with the \level{Real} rendering having the best \JND (\percent{26}), followed by the \level{Virtual} (\percent{30}) and \level{Mixed} (\percent{33}) renderings (\figref{results/trial_jnds}).
%
These \JND values are in line with and at the upper end of the range of previous studies \cite{choi2013vibrotactile}, which may be due to the location of the actuator on the top of the finger middle phalanx, being less sensitive to vibration than the fingertip.
%
Thus, compared to no visual rendering (\level{Real}), the addition of a visual rendering of the hand or environment reduced the roughness sensitivity (\JND) and the roughness perception (\PSE), as if the virtual vibrotactile textures felt \enquote{smoother}.
Differences in user behaviour were also observed between the visual renderings (but not between the haptic textures).
%
On average, participants responded faster (\percent{-16}), explored the textures over a greater distance (\percent{+21}) and at a higher speed (\percent{+16}) without visual augmentation (\level{Real} rendering) than in \VR (\level{Virtual} rendering) (\figref{results_finger}).
%
The \level{Mixed} rendering was always in between, with no significant difference from the other two.
%
This suggests that touching a virtual vibrotactile texture on a tangible surface with a virtual hand in \VR is different from touching it with one's own hand: users were more cautious or less confident in their exploration in \VR.
%
This does not seem to be due to the realism of the virtual hand or the environment, nor to the control of the virtual hand, all of which were rated high to very high by the participants (\secref{results_questions}) in both the \level{Mixed} and \level{Virtual} renderings.
%
Very interestingly, the evaluation of the vibrotactile device and the textures was also the same across the visual renderings, with a very high sense of control, a good realism and a very low perceived latency of the textures (\secref{results_questions}).
%
Conversely, the perceived latency of the virtual hand (\response{Hand Latency} question) seemed to be related to the perceived roughness of the textures (the \PSEs).
%
The \level{Mixed} rendering had the lowest \PSE and highest perceived latency, the \level{Virtual} rendering had a higher \PSE and lower perceived latency, and the \level{Real} rendering had the highest \PSE and no virtual hand latency (as it was not displayed).
Our visuo-haptic augmentation system aimed to provide a coherent multimodal virtual rendering integrated with the real environment.
%
Yet, it involves different sensory interaction loops between the user's movements and the visuo-haptic feedback (\figref{method/diagram}), which are subject to different latencies and may not feel synchronised with each other or with proprioception.
%
When a user runs their finger over a vibrotactile virtual texture, the haptic sensations, and the virtual hand if displayed, lag behind the visual displacement and proprioceptive sensations of the real hand.
%
Conversely, when interacting with a real texture, there is no lag between any of these sensory modalities.
We therefore hypothesise that the differences in the perception of vibrotactile roughness are due less to the visual rendering of the hand or environment, and the associated differences in exploration behaviour, than to the difference in \emph{perceived} latency between one's own hand (seen and felt through proprioception) and the virtual hand (seen and felt through the haptic rendering).
This perceived delay was largest in \AR, where the virtual hand visibly lags behind the real one, and smaller in \VR, where only proprioception can reveal the lag.
%
This delay was not perceived when touching the virtual haptic textures without visual augmentation, because only the finger velocity was used to render them, and, despite the varied finger movements and velocities while exploring the textures, the participants did not perceive any latency in the vibrotactile rendering (\secref{questions}). %, and the exploratory movements typically observed in our study had a fairly constant speed during a passage over the textures.
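To illustrate, here is a minimal sketch of such velocity-driven rendering (in Python, with placeholder parameter values; the actual signal is defined in \eqref{signal}): the vibration depends only on the estimated finger speed, and the phase is carried across blocks, so a constant tracking delay merely reuses a slightly stale speed estimate instead of creating a spatial misalignment.
%
\begin{verbatim}
import numpy as np

def render_block(speed, phase, wavelength=0.002,
                 amplitude=1.0, fs=48000, n=480):
    """Render one 10 ms block of a square-wave grating texture.

    speed:      estimated finger speed (m/s), assumed constant
                over the block.
    phase:      running phase (rad), carried across blocks so that
                frequency changes cause no discontinuity
                (the phase matching).
    wavelength: spatial period of the virtual grating (m),
                a placeholder value here.
    """
    f = speed / wavelength  # temporal frequency (Hz)
    t = np.arange(n) / fs
    s = amplitude * np.sign(np.sin(phase + 2 * np.pi * f * t))
    phase = (phase + 2 * np.pi * f * n / fs) % (2 * np.pi)
    return s, phase
\end{verbatim}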
%
Similarly, \textcite{diluca2011effects} demonstrated, in a \VST-\AR setup, how visual latency relative to proprioception increased the perceived stiffness of a virtual piston, while haptic latency decreased it.
%
A complementary explanation could be a pseudo-haptic effect of the virtual hand displacement, as already observed with this vibrotactile texture rendering method, albeit on a screen in a non-immersive context \cite{ujitoko2019modulating}.
%
Such hypotheses could be tested by manipulating the latency and tracking accuracy of the virtual hand or the vibrotactile feedback. % to observe their effects on the roughness perception of the virtual textures.
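For instance, a controlled latency could be injected into the virtual hand pose stream with a simple delay buffer; a minimal sketch, with hypothetical names and assuming a fixed update rate:
%
\begin{verbatim}
from collections import deque

class PoseDelay:
    """Delay a pose stream by a fixed duration."""

    def __init__(self, delay_s, rate_hz):
        # +1 so that, once full, buf[0] is delay_s seconds old
        self.buf = deque(maxlen=int(round(delay_s * rate_hz)) + 1)

    def __call__(self, pose):
        self.buf.append(pose)
        return self.buf[0]  # oldest pose available

# e.g., delay the virtual hand by 100 ms at a 90 Hz update rate:
# delay = PoseDelay(0.100, 90)
# hand_pose = delay(tracked_pose)  # called once per frame
\end{verbatim}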
We can outline recommendations for future \AR/\VR studies or applications using wearable haptics.
%
Attention should be paid to the respective latencies of the visual and haptic feedback inherent in such systems and, more importantly, to \emph{the perception of their possible asynchrony}.
%
%This is in line with embodiment studies in \VR that compared realism, latency and control \cite{waltemate2016impact,fribourg2020avatar}.
%
Latencies should be measured \cite{friston2014measuring}, minimised to an acceptable level for users and kept synchronised with each other \cite{diluca2019perceptual}.
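As an illustration of such a measurement, the end-to-end latency of the virtual hand can be estimated by cross-correlating its displayed trajectory with the tracked one; a sketch assuming both signals are logged at a common sampling rate (a generic approach, not the method of \cite{friston2014measuring}):
%
\begin{verbatim}
import numpy as np

def estimate_latency(reference, rendered, fs):
    """Estimate how much `rendered` lags `reference`, in seconds,
    from the peak of their cross-correlation.

    reference: real finger position over time (tracker log).
    rendered:  virtual hand position over time (display log).
    fs:        common sampling rate (Hz).
    """
    a = reference - np.mean(reference)
    b = rendered - np.mean(rendered)
    xcorr = np.correlate(b, a, mode="full")
    lag = np.argmax(xcorr) - (len(a) - 1)  # samples; > 0 means lag
    return lag / fs
\end{verbatim}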
%
It seems that the visual appearance of the hand or the environment has, in itself, little effect on the perception of haptic feedback, but the degree of visual reality-virtuality can affect how asynchronous the latencies feel, even when they remain identical.
%
%As we have shown, the visual representation of the hand or the environment can affect the experience of the unchanged latencies and thus the perception of haptic feedback.
%
Therefore, when designing wearable haptics or integrating them into \AR/\VR, it seems important to test their perception in real, augmented and virtual environments.
%Finally, a visual hand representation in OST-\AR together with wearable haptics should be avoided until acceptable tracking latencies are achieved, as was also observed for virtual object interaction with the bare hand \cite{normand2024visuohaptic}.
The main limitation of our study is the absence of a visual representation of the virtual texture.
%
This is indeed a source of information as important as haptic sensations for the perception of both real textures \cite{baumgartner2013visual,bergmanntiest2007haptic,vardar2019fingertip} and virtual textures \cite{degraen2019enhancing,gunther2022smooth}, and their interaction in the overall perception is complex.
%
%Specifically, it remains to be investigated how to visually represent vibrotactile textures in an immersive \AR or \VR context, as the visuo-haptic coupling of such grating textures is not trivial \cite{unger2011roughness} even with real textures \cite{klatzky2003feeling}.
Also, our study was conducted with an \OST-\AR headset, but the results may be different with a \VST-\AR headset.
%
Finally, we focused on the perception of roughness sensations rendered by wearable haptics in \AR \vs \VR with a square-wave vibrotactile signal, but other haptic texture rendering methods should also be considered.
%
More generally, many other types of haptic feedback could be investigated in \AR \vs \VR using the same system and methodology, such as stiffness, friction, local deformations, or temperature.

View File

@@ -3,13 +3,21 @@
%Summary of the research problem, method, main findings, and implications.
We investigated virtual textures that modify the roughness perception of real, tangible surfaces, using a wearable vibrotactile device worn on the finger.
%
%We studied how different such wearable haptic augmented textures are perceived when touched with a virtual hand instead of one's own hand, and when the hand and its environment are visually rendered in AR or VR.
%
To this end, we first designed and implemented a visuo-haptic texture rendering system that allows free exploration of the augmented surface with a visual \AR/\VR headset.
%to render virtual vibrotactile textures on any tangible surface, allowing free exploration of the surface, and integrated them with an immersive visual OST-AR headset, that could be switched to a VR view.
%
%Only the amplitude $A$ varied between the reference and comparison textures to create the different levels of roughness.
%This provided a coherent and synchronised multimodal visuo-haptic augmentation of the real environment, which could also be switched between an AR and a VR view.
%
%Participants were not informed there was a reference and comparison textures, and
No texture was represented visually, to avoid any influence on the perception \cite{bergmanntiest2007haptic,yanagisawa2015effects}.
We then conducted a psychophysical user study with 20 participants to assess the roughness perception of these virtual texture augmentations directly touched with the finger (1) without visual augmentation, (2) with a realistic virtual hand rendering in \AR, and (3) with the same virtual hand in \VR.
%
%The results showed that the visual rendering of the hand and environment had a significant effect on the perception of haptic textures and the exploration behaviour of the participants.
%
The textures were, on average, perceived as \enquote{rougher} and discriminated with a higher sensitivity when touched with the real hand alone than with a virtual hand in either \AR or \VR.
%
We hypothesised that this difference in perception was due to the \emph{perceived latency} between the finger movements and the visual, haptic and proprioceptive feedback: the latencies were identical in all visual renderings but more noticeable in \AR and \VR. % than without visual augmentation.
%
A better understanding of how visual factors influence the perception of haptically augmented tangible objects will help better apply the many wearable haptic systems that already exist but have not yet been fully explored with \AR, and design new visuo-haptic renderings adapted to \AR.