Correct vhar system equations
@@ -99,7 +99,7 @@ For example, (visual) \AR using a real object as a proxy to manipulate a virtual
|
|||||||
In this thesis we call \AR/\VR \emph{systems} the computational set of hardware (input devices, sensors, displays and haptic devices) and software (tracking, simulation and rendering) that allows the user to interact with the \VE. % by implementing the interaction loop we proposed in \figref{interaction-loop}.
|
In this thesis we call \AR/\VR \emph{systems} the computational set of hardware (input devices, sensors, displays and haptic devices) and software (tracking, simulation and rendering) that allows the user to interact with the \VE. % by implementing the interaction loop we proposed in \figref{interaction-loop}.
|
||||||
Many \AR displays have been explored, from projection systems to hand-held displays.
|
Many \AR displays have been explored, from projection systems to hand-held displays.
|
||||||
\textbf{\AR headsets are the most promising display technology as they are portable and provide the user with an immersive augmented environment} \cite{hertel2021taxonomy}.
|
\textbf{\AR headsets are the most promising display technology as they are portable and provide the user with an immersive augmented environment} \cite{hertel2021taxonomy}.
|
||||||
Commercial headsets also have integrated real-time self-location and mapping of the \RE and hand tracking of the user.
|
Commercial headsets also integrate real-time self-localization and mapping of the \RE and hand pose estimation of the user.
|
||||||
While \AR and \VR systems can address any of the human senses, most focus only on visual augmentation \cite[p.144]{billinghurst2015survey} and \cite{kim2018revisiting}.
|
While \AR and \VR systems can address any of the human senses, most focus only on visual augmentation \cite[p.144]{billinghurst2015survey} and \cite{kim2018revisiting}.
|
||||||
%but the most \textbf{promising devices are \AR headsets}, which are \textbf{portable displays worn directly on the head}, providing the user with an \textbf{immersive visual augmented environment}.
|
%but the most \textbf{promising devices are \AR headsets}, which are \textbf{portable displays worn directly on the head}, providing the user with an \textbf{immersive visual augmented environment}.
|
||||||
|
|
||||||
@@ -284,7 +284,7 @@ We evaluate how the visual feedback of the hand (real or virtual), the environme
|
|||||||
|
|
||||||
In \textbf{\chapref{vhar_system}}, we design and implement a system for rendering visuo-haptic virtual textures that augment real surfaces. %, using an immersive \OST-\AR headset and a wearable vibrotactile device.
|
In \textbf{\chapref{vhar_system}}, we design and implement a system for rendering visuo-haptic virtual textures that augment real surfaces. %, using an immersive \OST-\AR headset and a wearable vibrotactile device.
|
||||||
The haptic textures represent a periodic patterned texture rendered by a wearable vibrotactile actuator worn on the middle phalanx of the finger touching the surface.
|
The haptic textures represent a periodic patterned texture rendered by a wearable vibrotactile actuator worn on the middle phalanx of the finger touching the surface.
|
||||||
The tracking of the real hand and the environment is achieved using a marker-based technique.
|
The pose estimation of the real hand and the environment is achieved using a vision-based technique.
|
||||||
The visual rendering is done using the immersive \OST-\AR headset Microsoft HoloLens~2.
|
The visual rendering is done using the immersive \OST-\AR headset Microsoft HoloLens~2.
|
||||||
The system allows free visual and haptic exploration of the textures, as if they were real, and forms the basis of the next two chapters.
|
The system allows free visual and haptic exploration of the textures, as if they were real, and forms the basis of the next two chapters.
|
||||||
|
|
||||||
|
|||||||
@@ -379,7 +379,7 @@ That is, \textcite{bergmanntiest2009cues} showed the perception of hardness reli
|
|||||||
\label{haptic_sense_conclusion}
|
\label{haptic_sense_conclusion}
|
||||||
|
|
||||||
Haptic perception and manipulation of objects with the hand involve several simultaneous mechanisms with complex interactions.
|
Haptic perception and manipulation of objects with the hand involve several simultaneous mechanisms with complex interactions.
|
||||||
Exploratory movements of the hand are performed on contact with the object to obtain multiple sensory information from several cutaneous and kinaesthetic receptors.
|
Exploratory movements of the hand are performed on contact with the object to obtain multiple sources of sensory information from several cutaneous and kinesthetic receptors.
|
||||||
These sensations express physical parameters in the form of perceptual cues, which are then integrated to form a perception of the property being explored.
|
These sensations express physical parameters in the form of perceptual cues, which are then integrated to form a perception of the property being explored.
|
||||||
For the perception of roughness (texture) or hardness, one perceptual cue is particularly important, but perceptual constancy is possible by compensating for its absence with others.
|
For the perception of roughness (texture) or hardness, one perceptual cue is particularly important, but perceptual constancy is possible by compensating for its absence with others.
|
||||||
In turn, these perceptions help to guide the grasping and manipulation of the object by adapting the grasp type and the forces applied to the shape of the object and the task to be performed.
|
In turn, these perceptions help to guide the grasping and manipulation of the object by adapting the grasp type and the forces applied to the shape of the object and the task to be performed.
|
||||||
|
|||||||
@@ -339,7 +339,7 @@ Taken together, these results suggest that a visual augmentation of the hand in
|
|||||||
\label{ar_conclusion}
|
\label{ar_conclusion}
|
||||||
|
|
||||||
\AR systems integrate virtual content into the user's perception as if it were part of the \RE.
|
\AR systems integrate virtual content into the user's perception as if it were part of the \RE.
|
||||||
\AR headsets now enable real-time tracking of the head and hands, and high-quality display of virtual content, while being portable and mobile.
|
\AR headsets now enable real-time pose estimation of the head and hands, and high-quality display of virtual content, while being portable and mobile.
|
||||||
They enable highly immersive augmented environments that users can explore with a strong sense of the presence of the virtual content.
|
They enable highly immersive augmented environments that users can explore with a strong sense of the presence of the virtual content.
|
||||||
However, without direct and seamless interaction with the virtual objects using the hands, the coherence of the augmented environment experience is compromised.
|
However, without direct and seamless interaction with the virtual objects using the hands, the coherence of the augmented environment experience is compromised.
|
||||||
In particular, when manipulating virtual objects in \OST-\AR, there is a lack of mutual occlusion and interaction cues between the hands and the virtual content, which could be mitigated by a visual augmentation of the hand.
|
In particular, when manipulating virtual objects in \OST-\AR, there is a lack of mutual occlusion and interaction cues between the hands and the virtual content, which could be mitigated by a visual augmentation of the hand.
|
||||||
|
|||||||
@@ -26,11 +26,11 @@ The \MLE model then predicts that the integrated estimated property $\tilde{s}$
|
|||||||
\begin{equation}{MLE}
|
\begin{equation}{MLE}
|
||||||
\tilde{s} = \sum_i w_i \tilde{s}_i \quad \text{with} \quad \sum_i w_i = 1
|
\tilde{s} = \sum_i w_i \tilde{s}_i \quad \text{with} \quad \sum_i w_i = 1
|
||||||
\end{equation}
|
\end{equation}
|
||||||
Where the individual weights $w_i$ are proportional to their inverse variances:
|
where the individual weights $w_i$ are proportional to their inverse variances:
|
||||||
\begin{equation}{MLE_weights}
|
\begin{equation}{MLE_weights}
|
||||||
w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2} = \frac{\sigma^2}{\sigma_i^2}
|
w_i = \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2} = \frac{\sigma^2}{\sigma_i^2}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
And the integrated variance $\sigma^2$ is the inverse of the sum of the individual variances:
|
and the integrated variance $\sigma^2$ is the inverse of the sum of the individual inverse variances:
|
||||||
\begin{equation}{MLE_variance}
|
\begin{equation}{MLE_variance}
|
||||||
\sigma^2 = \left( \sum_i \frac{1}{\sigma_i^2} \right)^{-1}
|
\sigma^2 = \left( \sum_i \frac{1}{\sigma_i^2} \right)^{-1}
|
||||||
\end{equation}
|
\end{equation}
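For illustration only (the variances are arbitrary values chosen for this example, not taken from the cited studies), integrating a visual estimate $\tilde{s}_v$ with $\sigma_v = 1$ and a haptic estimate $\tilde{s}_h$ with $\sigma_h = 2$ gives:
\[
w_v = \frac{1/1}{1/1 + 1/4} = 0.8, \quad
w_h = \frac{1/4}{1/1 + 1/4} = 0.2, \quad
\tilde{s} = 0.8\,\tilde{s}_v + 0.2\,\tilde{s}_h, \quad
\sigma^2 = \left(\frac{1}{1} + \frac{1}{4}\right)^{-1} = 0.8,
\]
so the less variable (visual) cue dominates the integrated estimate, and the integrated variance is lower than that of either cue alone.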
|
||||||
|
|||||||
@@ -11,12 +11,12 @@ However, this method has not yet been integrated in an \AR context, where the us
|
|||||||
In this chapter, we propose a \textbf{system for rendering visual and haptic virtual textures that augment real surfaces}.
|
In this chapter, we propose a \textbf{system for rendering visual and haptic virtual textures that augment real surfaces}.
|
||||||
It is implemented with the immersive \OST-\AR headset Microsoft HoloLens~2 and a wearable vibrotactile (voice-coil) device worn on the outside of the finger (not covering the fingertip, \secref[related_work]{vhar_haptics}).
|
It is implemented with the immersive \OST-\AR headset Microsoft HoloLens~2 and a wearable vibrotactile (voice-coil) device worn on the outside of the finger (not covering the fingertip, \secref[related_work]{vhar_haptics}).
|
||||||
The visuo-haptic augmentations can be \textbf{viewed from any angle} and \textbf{explored freely with the bare finger}, as if they were real textures.
|
The visuo-haptic augmentations can be \textbf{viewed from any angle} and \textbf{explored freely with the bare finger}, as if they were real textures.
|
||||||
To ensure both real-time and reliable renderings, the hand and the real surfaces are tracked using a webcam and marker-based tracking.
|
To ensure both real-time and reliable renderings, the hand and the real surfaces are tracked using a webcam and marker-based pose estimation.
|
||||||
The haptic textures are rendered as a vibrotactile signal representing a patterned grating texture that is synchronized with the finger movement on the augmented surface.
|
The haptic textures are rendered as a vibrotactile signal representing a patterned grating texture that is synchronized with the finger movement on the augmented surface.
|
||||||
|
|
||||||
\noindentskip The contributions of this chapter are:
|
\noindentskip The contributions of this chapter are:
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item The rendering of virtual vibrotactile roughness textures representing a patterned grating texture in real time from free finger movements and using vision-based tracking.
|
\item The rendering of virtual vibrotactile roughness textures representing a patterned grating texture in real time from free finger movements and using vision-based finger pose estimation.
|
||||||
\item A system to provide coherent visuo-haptic texture augmentations of the \RE in a direct touch context using an immersive \AR headset and wearable haptics.
|
\item A system to provide coherent visuo-haptic texture augmentations of the \RE in a direct touch context using an immersive \AR headset and wearable haptics.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
@@ -26,7 +26,7 @@ The haptic textures are rendered as a vibrotactile signal representing a pattern
|
|||||||
|
|
||||||
\begin{subfigs}{setup}{Visuo-haptic texture rendering system setup.}[][
|
\begin{subfigs}{setup}{Visuo-haptic texture rendering system setup.}[][
|
||||||
\item HapCoil-One voice-coil actuator with a fiducial marker on top, attached to the middle phalanx of the user's index finger.
|
\item HapCoil-One voice-coil actuator with a fiducial marker on top, attached to the middle phalanx of the user's index finger.
|
||||||
\item Our implementation of the system using a Microsoft HoloLens~2, a webcam for tracking the hand and the real surfaces, and an external computer for processing the tracking data and rendering the haptic textures.
|
\item Our implementation of the system using a Microsoft HoloLens~2, a webcam for estimating the poses of the hand and the real surfaces, and an external computer for processing the tracking data and rendering the haptic textures.
|
||||||
]
|
]
|
||||||
\subfigsheight{60mm}
|
\subfigsheight{60mm}
|
||||||
\subfig{device}
|
\subfig{device}
|
||||||
|
|||||||
@@ -19,11 +19,11 @@ The system consists of three main components: the pose estimation of the tracked
|
|||||||
|
|
||||||
\figwide{diagram}{Diagram of the visuo-haptic texture rendering system. }[
|
\figwide{diagram}{Diagram of the visuo-haptic texture rendering system. }[
|
||||||
Fiducial markers, attached to the voice-coil actuator and to the augmented surfaces to be tracked, are captured by a camera.
|
Fiducial markers, attached to the voice-coil actuator and to the augmented surfaces to be tracked, are captured by a camera.
|
||||||
The positions and rotations (the poses) ${}^c\mathbf{T}_i$, $i=1..n$ of the $n$ defined markers in the camera frame $\mathcal{F}_c$ are estimated, then filtered with an adaptive low-pass filter.
|
The positions and rotations (the poses) ${}^c\mathbf{T}_i$, $i=1..n$ of the $n$ defined markers in the camera frame $\poseFrame{c}$ are estimated, then filtered with an adaptive low-pass filter.
|
||||||
%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the \RE.
|
%These poses are transformed to the \AR/\VR headset frame $\poseFrame{h}$ and applied to the virtual model replicas to display them superimposed and aligned with the \RE.
|
||||||
These poses are used to move and display the virtual model replicas aligned with the \RE.
|
These poses are used to move and display the virtual model replicas aligned with the \RE.
|
||||||
A collision detection algorithm detects a contact of the virtual hand with the virtual textures.
|
A collision detection algorithm detects a contact of the virtual hand with the virtual textures.
|
||||||
If so, the velocity of the finger marker ${}^c\dot{\mathbf{X}}_f$ is estimated using discrete derivative of position and adaptive low-pass filtering, then transformed onto the texture frame $\mathcal{F}_t$.
|
If so, the velocity of the finger marker ${}^c\dot{\mathbf{X}}_f$ is estimated using a discrete derivative of the position and adaptive low-pass filtering, then transformed into the texture frame $\poseFrame{t}$.
|
||||||
The vibrotactile signal $s_k$ is generated by modulating the (scalar) finger velocity ${}^t\hat{\dot{X}}_f$ in the texture direction with the texture period $\lambda$ (\eqref{signal}).
|
The vibrotactile signal $s_k$ is generated by modulating the (scalar) finger velocity ${}^t\hat{\dot{X}}_f$ in the texture direction with the texture period $\lambda$ (\eqref{signal}).
|
||||||
The signal is sampled at 48~kHz and sent to the voice-coil actuator via an audio amplifier.
|
The signal is sampled at 48~kHz and sent to the voice-coil actuator via an audio amplifier.
|
||||||
All computation steps except signal sampling are performed at 60~Hz and in separate threads to parallelize them.
|
All computation steps except signal sampling are performed at 60~Hz and in separate threads to parallelize them.
|
||||||
@@ -37,14 +37,18 @@ The system consists of three main components: the pose estimation of the tracked
|
|||||||
|
|
||||||
A \qty{2}{\cm} AprilTag fiducial marker \cite{wang2016apriltag} is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech) placed above the experimental setup and capturing \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
|
A \qty{2}{\cm} AprilTag fiducial marker \cite{wang2016apriltag} is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech) placed above the experimental setup and capturing \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
|
||||||
Other markers are placed on the real surfaces to be augmented (\figref{setup}), in order to estimate the relative position of the finger with respect to these surfaces.
|
Other markers are placed on the real surfaces to be augmented (\figref{setup}), in order to estimate the relative position of the finger with respect to these surfaces.
|
||||||
Contrary to similar work, using vision-based tracking allows both to free the hand movements and to augment any real surface.
|
Contrary to similar work, using vision-based pose estimation both frees the hand movements and allows any real surface to be augmented.
|
||||||
A camera external to the \AR headset with a marker-based technique is employed to provide accurate and robust tracking with a constant view of the markers \cite{marchand2016pose}.
|
|
||||||
We denote ${}^c\mathbf{T}_i$, $i=1..n$ the homogenous transformation matrix that defines the position and rotation of the $i$-th marker out of the $n$ defined markers in the camera frame $\mathcal{F}_c$, \eg the finger pose ${}^c\mathbf{T}_f$ and the texture pose ${}^c\mathbf{T}_t$.
|
A camera external to the \AR headset, combined with a marker-based technique, is employed to provide accurate and robust pose estimation with a constant view of the markers \cite{marchand2016pose}.
|
||||||
|
We denote $\pose{c}{T}{i}$, $i=1..n$ the homogeneous transformation matrix that defines the position and rotation of the $i$-th marker out of the $n$ defined markers in the camera frame $\poseFrame{c}$, \eg the finger pose $\pose{c}{T}{f}$ and the augmented surface pose $\pose{c}{T}{s}$.
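As a minimal illustration of this step (an OpenCV/numpy sketch with assumed detector output, corner ordering, calibration values and variable names; it is not the code of our implementation), a marker pose $\pose{c}{T}{i}$ can be estimated from its four detected image corners, and the finger pose can then be expressed relative to an augmented surface as $(\pose{c}{T}{s})^{-1} \, \pose{c}{T}{f}$:
\begin{verbatim}
import cv2
import numpy as np

TAG_SIZE = 0.02  # 2 cm marker edge length (m)

# Marker corners in the marker's own frame (z = 0 plane); the detected image
# corners must be provided in the same order.
OBJECT_POINTS = 0.5 * TAG_SIZE * np.array(
    [[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]], dtype=np.float64)

def marker_pose(corners_px, camera_matrix, dist_coeffs):
    """4x4 pose ^cT_i of one marker from its four detected corners (pixels)."""
    ok, rvec, tvec = cv2.solvePnP(OBJECT_POINTS, corners_px,
                                  camera_matrix, dist_coeffs)
    assert ok
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)  # rotation matrix from the axis-angle vector
    T[:3, 3] = tvec.ravel()
    return T

def finger_pose_in_surface_frame(T_c_s, T_c_f):
    """Relative pose ^sT_f = (^cT_s)^-1 @ ^cT_f of the finger marker."""
    return np.linalg.inv(T_c_s) @ T_c_f
\end{verbatim}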
|
||||||
|
|
||||||
To reduce the noise in the pose estimation while maintaining good responsiveness, the 1€ filter \cite{casiez2012filter} is applied: a low-pass filter with an adaptive cut-off frequency, specifically designed for human motion tracking.
|
To reduce the noise in the pose estimation while maintaining good responsiveness, the 1€ filter \cite{casiez2012filter} is applied: a low-pass filter with an adaptive cut-off frequency, specifically designed for human motion tracking.
|
||||||
The filtered pose is denoted as ${}^c\hat{\mathbf{T}}_i$.
|
The filtered pose is denoted as $\pose{c}{\hat{T}}{i}$.
|
||||||
The optimal filter parameters were determined using the method of \textcite{casiez2012filter}, with a minimum cut-off frequency of \qty{10}{\hertz} and a slope of \num{0.01}.
|
The optimal filter parameters were determined using the method of \textcite{casiez2012filter}, with a minimum cut-off frequency of \qty{10}{\hertz} and a slope of \num{0.01}.
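The following Python sketch illustrates the 1€ filter algorithm with the parameter values quoted above (the derivative cut-off frequency of \qty{1}{\hertz} is the default suggested by \textcite{casiez2012filter} and is an assumption here; this is not the code of our implementation):
\begin{verbatim}
import math

class OneEuroFilter:
    """Adaptive low-pass filter for noisy human-motion signals (Casiez et al., 2012)."""

    def __init__(self, min_cutoff=10.0, beta=0.01, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # minimum cut-off frequency (Hz)
        self.beta = beta              # slope: how much the cut-off grows with speed
        self.d_cutoff = d_cutoff      # cut-off used to smooth the derivative (Hz)
        self.x_prev = None
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff, dt):
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x, dt):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Smooth the discrete derivative of the signal.
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Increase the cut-off with the signal speed, then smooth the signal itself.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat

# Example: filter one coordinate of a marker position sampled at 60 Hz.
f = OneEuroFilter(min_cutoff=10.0, beta=0.01)
filtered = [f(x, dt=1.0 / 60.0) for x in (0.100, 0.103, 0.101, 0.104)]
\end{verbatim}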
|
||||||
The velocity (without angular velocity) of the marker, denoted as ${}^c\dot{\mathbf{X}}_i$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
|
|
||||||
|
The velocity (without angular velocity) of the finger marker, denoted as $\pose{c}{\dot{T}}{f}$, is estimated using the discrete derivative of the position.
|
||||||
|
It is then filtered with another 1€ filter with the same parameters, and denoted as $\pose{c}{\hat{\dot{T}}}{f}$.
|
||||||
|
Finally, this filtered finger velocity is transformed into the augmented surface frame $\poseFrame{s}$ to be used in the vibrotactile signal generation: $\pose{s}{\hat{\dot{T}}}{f} = (\pose{c}{T}{s})^{-1} \, \pose{c}{\hat{\dot{T}}}{f}$.
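A brief numpy sketch of this velocity step (illustrative only; variable names are not those of our implementation). Interpreting $\pose{c}{\hat{\dot{T}}}{f}$ as the time derivative of the pose matrix, its bottom row is zero, so only the rotation block of $(\pose{c}{T}{s})^{-1}$ affects the translational velocity:
\begin{verbatim}
import numpy as np

def finger_velocity_in_surface_frame(T_c_s, p_c_f_prev, p_c_f_curr, dt):
    """Discrete finger velocity, estimated in the camera frame and
    re-expressed in the surface frame.

    T_c_s     : 4x4 pose ^cT_s of the surface marker in the camera frame.
    p_c_f_*   : filtered finger positions in the camera frame (3-vectors).
    """
    v_c = (p_c_f_curr - p_c_f_prev) / dt  # discrete derivative (then 1-euro filtered)
    R_s_c = T_c_s[:3, :3].T               # rotation block of ^sT_c = (^cT_s)^-1
    return R_s_c @ v_c                    # translational part of ^sT_c @ ^cTdot_f
\end{verbatim}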
|
||||||
|
|
||||||
\subsection{Virtual Environment Alignment}
|
\subsection{Virtual Environment Alignment}
|
||||||
\label{virtual_real_alignment}
|
\label{virtual_real_alignment}
|
||||||
@@ -68,30 +72,30 @@ The voice-coil actuator is encased in a \ThreeD printed plastic shell and firmly
|
|||||||
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil \cite{mcmahan2014dynamic}.
|
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil \cite{mcmahan2014dynamic}.
|
||||||
The amplifier is connected to the audio output of a computer that generates the signal using the WASAPI driver in exclusive mode and the NAudio library\footnoteurl{https://github.com/naudio/NAudio}.
|
The amplifier is connected to the audio output of a computer that generates the signal using the WASAPI driver in exclusive mode and the NAudio library\footnoteurl{https://github.com/naudio/NAudio}.
|
||||||
|
|
||||||
The represented haptic texture is a series of parallels virtual grooves and ridges, similar to real grating textures manufactured for psychophysical roughness perception studies \secref[related_work]{roughness}. %\cite{friesen2024perceived,klatzky2003feeling,unger2011roughness}.
|
The represented haptic texture is a 1D series of parallel virtual grooves and ridges, similar to the real linear grating textures manufactured for psychophysical roughness perception studies \secref[related_work]{roughness}. %\cite{friesen2024perceived,klatzky2003feeling,unger2011roughness}.
|
||||||
It is generated as a square wave audio signal $s_k$, sampled at \qty{48}{\kilo\hertz}, with a period $\lambda$ and an amplitude $A$, similar to \eqref[related_work]{grating_rendering}.
|
It is generated as a square wave audio signal $r$, sampled at \qty{48}{\kilo\hertz}, with a texture period $\lambda$ and an amplitude $A$, similar to \eqref[related_work]{grating_rendering}.
|
||||||
Its frequency is a ratio of the absolute finger filtered (scalar) velocity ${}^t\hat{\dot{|X|}}_f$, transformed into the texture frame $\mathcal{F}_t$, and the texture period $\lambda$ \cite{friesen2024perceived}.
|
Its frequency is the ratio of the absolute filtered (scalar) finger velocity $x_f = \pose{s}{|\hat{\dot{T}}|}{f}$ and the texture period $\lambda$ \cite{friesen2024perceived}.
|
||||||
As the finger is moving horizontally on the texture, only the $x$ component of the velocity is used.
|
As the finger is moving horizontally on the texture, only the $X$ component of the velocity is used.
|
||||||
This velocity modulation strategy is necessary as the finger position is estimated at a far lower rate (\qty{60}{\hertz}) than the audio signal (unlike high-fidelity force-feedback devices \cite{unger2011roughness}).
|
This velocity modulation strategy is necessary as the finger position is estimated at a far lower rate (\qty{60}{\hertz}) than the audio signal (unlike high-fidelity force-feedback devices \cite{unger2011roughness}).
|
||||||
|
|
||||||
%As the finger position is estimated at a far lower rate (\qty{60}{\hertz}), the filtered finger (scalar) position ${}^t\hat{X}_f$ in the texture frame $\mathcal{F}_t$ cannot be directly used. % to render the signal if the finger moves fast or if the texture period is small.
|
%As the finger position is estimated at a far lower rate (\qty{60}{\hertz}), the filtered finger (scalar) position ${}^t\hat{X}_f$ in the texture frame $\poseFrame{t}$ cannot be directly used. % to render the signal if the finger moves fast or if the texture period is small.
|
||||||
%
|
%
|
||||||
%The best strategy instead is to modulate the frequency of the signal as a ratio of the filtered finger velocity ${}^t\hat{\dot{\mathbf{X}}}_f$ and the texture period $\lambda$ \cite{friesen2024perceived}.
|
%The best strategy instead is to modulate the frequency of the signal as a ratio of the filtered finger velocity ${}^t\hat{\dot{\mathbf{X}}}_f$ and the texture period $\lambda$ \cite{friesen2024perceived}.
|
||||||
%
|
%
|
||||||
When a new finger velocity ${}^t\hat{\dot{X}}_{f,j}$ is estimated at time $t_j$, the phase $\phi_j$ of the signal $s$ needs also to be adjusted to ensure a continuity in the signal.
|
When a new finger velocity $x_f\,(t_j)$ is estimated at time $t_j$, the phase $\phi$ of the signal $r$ also needs to be adjusted to ensure continuity of the signal.
|
||||||
In other words, the sampling of the audio signal runs at \qty{48}{\kilo\hertz}, and its frequency and phase are updated at a far lower rate of \qty{60}{\hertz} when a new finger velocity is estimated.
|
In other words, the sampling of the audio signal runs at \qty{48}{\kilo\hertz}, and its frequency and phase are updated at a far lower rate of \qty{60}{\hertz} when a new finger velocity is estimated.
|
||||||
A sample $s_k$ of the audio signal at sampling time $t_k$, with $t_k >= t_j$, is thus given by:
|
A sample $r(x_f, t_j, t_k)$ of the audio signal at sampling time $t_k$, with $t_k \geq t_j$, is thus given by:
|
||||||
\begin{subequations}
|
\begin{subequations}
|
||||||
\label{eq:signal}
|
\label{eq:signal}
|
||||||
\begin{align}
|
\begin{align}
|
||||||
s_k(x_{f,j}, t_k) & = A\, \text{sgn} ( \sin (2 \pi \frac{|\dot{X}_{f,j}|}{\lambda} t_k + \phi_j) ) & \label{eq:signal_speed} \\
|
r(x_f, t_j, t_k) & = A\, \text{sgn} ( \sin (2 \pi \frac{x_f\,(t_j)}{\lambda} t_k + \phi(t_j) ) ) & \label{eq:signal_speed} \\
|
||||||
\phi_j & = \phi_{j-1} + 2 \pi \frac{x_{f,j} - x_{f,{j-1}}}{\lambda} t_k & \label{eq:signal_phase}
|
\phi(t_j) & = \phi(t_{j-1}) + 2 \pi \frac{x_f\,(t_{j-1}) - x_f\,(t_j)}{\lambda}\, t_j & \label{eq:signal_phase}
|
||||||
\end{align}
|
\end{align}
|
||||||
\end{subequations}
|
\end{subequations}
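The following Python sketch mirrors this velocity-modulated rendering and the phase continuity of \eqref{signal} (the amplitude and texture period values are arbitrary; our implementation streams the signal through NAudio/WASAPI, so this is only an illustration):
\begin{verbatim}
import numpy as np

FS = 48_000      # audio sampling rate (Hz)
A = 0.5          # signal amplitude (arbitrary value)
LAMBDA = 0.002   # texture period lambda (m, arbitrary value)

class GratingSignal:
    """Square wave of frequency |x_f| / lambda, kept continuous across updates."""

    def __init__(self):
        self.freq = 0.0   # current signal frequency (Hz)
        self.phase = 0.0  # phase offset phi

    def update_velocity(self, speed, t_update):
        """Called at ~60 Hz with the filtered finger speed |x_f| (m/s) at t_update (s)."""
        new_freq = abs(speed) / LAMBDA
        # Keep the argument 2*pi*f*t + phi continuous at the update time.
        self.phase += 2.0 * np.pi * (self.freq - new_freq) * t_update
        self.freq = new_freq

    def samples(self, t_start, n):
        """Generate n square-wave samples starting at time t_start (s)."""
        t = t_start + np.arange(n) / FS
        return A * np.sign(np.sin(2.0 * np.pi * self.freq * t + self.phase))

# Example: one 60 Hz velocity update followed by one 800-sample audio block.
sig = GratingSignal()
sig.update_velocity(speed=0.05, t_update=0.0)
block = sig.samples(t_start=0.0, n=800)
\end{verbatim}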
|
||||||
|
|
||||||
This rendering preserves the sensation of a constant spatial frequency of the virtual texture while the finger moves at various speeds, which is crucial for the perception of roughness \cite{klatzky2003feeling,unger2011roughness}.
|
This rendering preserves the sensation of a constant spatial frequency of the virtual texture while the finger moves at various speeds, which is crucial for the perception of roughness \cite{klatzky2003feeling,unger2011roughness}.
|
||||||
%
|
%
|
||||||
%Note that the finger position and velocity are transformed from the camera frame $\mathcal{F}_c$ to the texture frame $\mathcal{F}_t$, with the $x$ axis aligned with the texture direction.
|
%Note that the finger position and velocity are transformed from the camera frame $\poseFrame{c}$ to the texture frame $\poseFrame{t}$, with the $x$ axis aligned with the texture direction.
|
||||||
%
|
%
|
||||||
%However, when a new finger position is estimated at time $t_j$, the phase $\phi_j$ needs to be adjusted as well with the frequency to ensure a continuity in the signal as described in \eqref{signal_phase}.
|
%However, when a new finger position is estimated at time $t_j$, the phase $\phi_j$ needs to be adjusted as well with the frequency to ensure a continuity in the signal as described in \eqref{signal_phase}.
|
||||||
%
|
%
|
||||||
@@ -119,10 +123,10 @@ This complex assembly inevitably introduces latency, which must be measured.
|
|||||||
The rendering system provides the user with two interaction loops between the movements of their hand and the visual (loop 1) and haptic (loop 2) feedback.
|
The rendering system provides the user with two interaction loops between the movements of their hand and the visual (loop 1) and haptic (loop 2) feedback.
|
||||||
Measures are reported as mean $\pm$ standard deviation, when known.
|
Measures are reported as mean $\pm$ standard deviation, when known.
|
||||||
The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
|
The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
|
||||||
Both are the result of latency in image capture \qty{16 \pm 1}{\ms}, markers tracking \qty{2 \pm 1}{\ms} and network communication \qty{4 \pm 1}{\ms}.
|
Both are the result of latency in image capture \qty{16 \pm 1}{\ms}, marker pose estimation \qty{2 \pm 1}{\ms} and network communication \qty{4 \pm 1}{\ms}.
|
||||||
The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in \ThreeD rendering \qty{16 \pm 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
|
The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in \ThreeD rendering \qty{16 \pm 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
|
||||||
The total haptic latency is below the \qty{60}{\ms} detection threshold in vibrotactile feedback \cite{okamoto2009detectability}.
|
The total haptic latency is below the \qty{60}{\ms} detection threshold in vibrotactile feedback \cite{okamoto2009detectability}.
|
||||||
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based tracking \cite{knorlein2009influence}.
|
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based pose estimation \cite{knorlein2009influence}.
|
||||||
|
|
||||||
The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
|
The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
|
||||||
With respect to the real hand position, this lag causes a distance error in the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal.
|
With respect to the real hand position, this lag causes a distance error in the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal.
|
||||||
|
|||||||
@@ -5,7 +5,7 @@
|
|||||||
|
|
||||||
In this chapter, we designed and implemented a system for rendering virtual visuo-haptic textures that augment a real surface.
|
In this chapter, we designed and implemented a system for rendering virtual visuo-haptic textures that augment a real surface.
|
||||||
Directly touched with the fingertip, the perceived roughness of the surface can be increased using a wearable vibrotactile voice-coil device mounted on the middle phalanx of the finger.
|
Directly touched with the fingertip, the perceived roughness of the surface can be increased using a wearable vibrotactile voice-coil device mounted on the middle phalanx of the finger.
|
||||||
We adapted the 1D sinusoidal grating rendering method, common in the literature but not yet integrated in a direct touch context, for use with vision-based tracking of the finger and paired it with an immersive \AR headset.
|
We adapted the 1D sinusoidal grating rendering method, common in the literature but not yet integrated in a direct touch context, for use with vision-based pose estimation of the finger and paired it with an immersive \AR headset.
|
||||||
|
|
||||||
Our wearable visuo-haptic augmentation system enables any real surface to be augmented with a minimal setup.
|
Our wearable visuo-haptic augmentation system enables any real surface to be augmented with a minimal setup.
|
||||||
It also allows a free exploration of the textures, as if they were real (\secref[related_work]{ar_presence}), by letting the user view them from different poses and touch them with the bare finger without constraints on hand movements.
|
It also allows a free exploration of the textures, as if they were real (\secref[related_work]{ar_presence}), by letting the user view them from different poses and touch them with the bare finger without constraints on hand movements.
|
||||||
|
|||||||
@@ -58,7 +58,7 @@ Participants were first given written instructions about the experimental setup,
|
|||||||
Then, after having signed an informed consent form, they were asked to sit in front of the table with the experimental setup and to wear the \AR headset.
|
Then, after having signed an informed consent form, they were asked to sit in front of the table with the experimental setup and to wear the \AR headset.
|
||||||
%The experimenter firmly attached the plastic shell encasing the vibrotactile actuator to the middle index phalanx of their dominant hand.
|
%The experimenter firmly attached the plastic shell encasing the vibrotactile actuator to the middle index phalanx of their dominant hand.
|
||||||
As the haptic textures generated no audible noise, participants did not wear any noise reduction headphones.
|
As the haptic textures generated no audible noise, participants did not wear any noise reduction headphones.
|
||||||
A calibration of both the HoloLens~2 and the hand tracking was performed to ensure the correct alignment of the visual and haptic textures on the real surfaces.
|
A calibration of both the HoloLens~2 and the finger pose estimation was performed to ensure the correct alignment of the visual and haptic textures on the real surfaces.
|
||||||
Finally, participants familiarized themselves with the augmented surface in a \qty{2}{min} training session with textures different from the ones used in the user study.
|
Finally, participants familiarized themselves with the augmented surface in a \qty{2}{min} training session with textures different from the ones used in the user study.
|
||||||
|
|
||||||
Participants started with the \level{Matching} task.
|
Participants started with the \level{Matching} task.
|
||||||
|
|||||||
@@ -30,4 +30,4 @@ The perceived delay was the most important in \AR, where the virtual hand visual
|
|||||||
This delay was not perceived when touching the virtual haptic textures without visual augmentation, because only the finger velocity was used to render them. Despite the varied finger movements and velocities while exploring the textures, the participants did not perceive any latency in the vibrotactile rendering (\secref{results_questions}).
|
This delay was not perceived when touching the virtual haptic textures without visual augmentation, because only the finger velocity was used to render them. Despite the varied finger movements and velocities while exploring the textures, the participants did not perceive any latency in the vibrotactile rendering (\secref{results_questions}).
|
||||||
\textcite{diluca2011effects} demonstrated similarly, in a \VST-\AR setup, how visual latency relative to proprioception increased the perception of stiffness of a virtual piston, while haptic latency decreased it (\secref[related_work]{ar_vr_haptic}).
|
\textcite{diluca2011effects} demonstrated similarly, in a \VST-\AR setup, how visual latency relative to proprioception increased the perception of stiffness of a virtual piston, while haptic latency decreased it (\secref[related_work]{ar_vr_haptic}).
|
||||||
Another complementary explanation could be a pseudo-haptic effect (\secref[related_work]{visual_haptic_influence}) of the displacement of the virtual hand, as already observed with this vibrotactile texture rendering, but seen on a screen in a non-immersive context \cite{ujitoko2019modulating}.
|
Another complementary explanation could be a pseudo-haptic effect (\secref[related_work]{visual_haptic_influence}) of the displacement of the virtual hand, as already observed with this vibrotactile texture rendering, but seen on a screen in a non-immersive context \cite{ujitoko2019modulating}.
|
||||||
Such hypotheses could be tested by manipulating the latency and tracking accuracy of the virtual hand or the vibrotactile feedback. % to observe their effects on the roughness perception of the virtual textures.
|
Such hypotheses could be tested by manipulating the latency and pose estimation accuracy of the virtual hand or the vibrotactile feedback. % to observe their effects on the roughness perception of the virtual textures.
|
||||||
|
|||||||
@@ -61,9 +61,9 @@ In addition, combination with pseudo-haptic rendering techniques \cite{ujitoko20
|
|||||||
\paragraph{Fully Integrated Tracking.}
|
\paragraph{Fully Integrated Tracking.}
|
||||||
|
|
||||||
In our system, we registered the real and virtual environments (\secref[related_work]{ar_definition}) using fiducial markers and a webcam external to the \AR headset.
|
In our system, we registered the real and virtual environments (\secref[related_work]{ar_definition}) using fiducial markers and a webcam external to the \AR headset.
|
||||||
This only allowed us to track the index finger and the surface to be augmented with the haptic texture, but the tracking was reliable and accurate enough for our needs.
|
This only allowed us to estimate the poses of the index finger and of the surface to be augmented with the haptic texture, but it was reliable and accurate enough for our needs.
|
||||||
In fact, preliminary tests we conducted showed that the built-in tracking capabilities of the Microsoft HoloLens~2 were not able to track hands wearing a vibrotactile voice-coil device.
|
In fact, preliminary tests we conducted showed that the built-in tracking capabilities of the Microsoft HoloLens~2 were not able to track hands wearing a vibrotactile voice-coil device.
|
||||||
A more robust hand tracking system would support wearing haptic devices on the hand as well as holding real objects.
|
A more robust hand pose estimation system would support wearing haptic devices on the hand as well as holding real objects.
|
||||||
A complementary solution would be to embed tracking sensors in the wearable haptic devices, such as an inertial measurement unit (IMU) or cameras \cite{preechayasomboon2021haplets}.
|
A complementary solution would be to embed tracking sensors in the wearable haptic devices, such as an inertial measurement unit (IMU) or cameras \cite{preechayasomboon2021haplets}.
|
||||||
Prediction of hand movements should also be considered \cite{klein2020predicting,gamage2021predictable}.
|
Prediction of hand movements should also be considered \cite{klein2020predicting,gamage2021predictable}.
|
||||||
This would allow a complete portable and wearable visuo-haptic system to be used in practical applications.
|
This would allow a complete portable and wearable visuo-haptic system to be used in practical applications.
|
||||||
@@ -117,7 +117,7 @@ The visual hand augmentations we evaluated were displayed on the Microsoft HoloL
|
|||||||
We purposely chose this type of display because in \OST-\AR the lack of mutual occlusion between the hand and the virtual object is the most challenging to solve \cite{macedo2023occlusion}.
|
We purposely chose this type of display because in \OST-\AR the lack of mutual occlusion between the hand and the virtual object is the most challenging to solve \cite{macedo2023occlusion}.
|
||||||
We therefore hypothesized that a visual hand augmentation would be more beneficial to users with this type of display.
|
We therefore hypothesized that a visual hand augmentation would be more beneficial to users with this type of display.
|
||||||
However, the user's visual perception and experience are different with other types of displays, such as \VST-\AR, where the \RE view is seen through cameras and screens (\secref[related_work]{ar_displays}).
|
However, the user's visual perception and experience are different with other types of displays, such as \VST-\AR, where the \RE view is seen through cameras and screens (\secref[related_work]{ar_displays}).
|
||||||
While the mutual occlusion problem and the hand tracking latency could be overcome with \VST-\AR, the visual hand augmentation could still be beneficial to users as it provides depth cues and feedback on the hand tracking, and should be evaluated as such.
|
While the mutual occlusion problem and the hand pose estimation latency could be overcome with \VST-\AR, the visual hand augmentation could still be beneficial to users as it provides depth cues and feedback on the hand tracking, and should be evaluated as such.
|
||||||
|
|
||||||
\paragraph{More Practical Usages.}
|
\paragraph{More Practical Usages.}
|
||||||
|
|
||||||
|
|||||||