WIP xr-perception

2024-09-27 22:11:28 +02:00
parent a9319210df
commit 344496cbef
19 changed files with 270 additions and 366 deletions


@@ -2,7 +2,7 @@
%
In this section, we describe a system for rendering vibrotactile roughness textures in real time, on any tangible surface, touched directly with the index fingertip, with no constraints on hand movement and using a simple camera to track the finger pose.
%
-We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent, multimodal visuo-haptic augmentation of the real environment.
+We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent, multimodal visuo-haptic augmentation of the \RE.
\section{Principle}
\label{principle}
@@ -11,19 +11,19 @@ The visuo-haptic texture rendering system is based on
%
\begin{enumerate*}[label=(\arabic*)]
\item a real-time interaction loop between the finger movements and a coherent visuo-haptic feedback simulating the sensation of a touched texture,
-\item a precise alignment of the virtual environment with its real counterpart, and
+\item a precise alignment of the \VE with its real counterpart, and
\item a modulation of the signal frequency by the estimated finger speed with phase matching.
\end{enumerate*}
%
\figref{diagram} shows the interaction loop diagram and \eqref{signal} the definition of the vibrotactile signal.
%
-The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the virtual environment, and the vibrotactile signal generation and rendering.
+The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the \VE, and the vibrotactile signal generation and rendering.
\figwide[1]{diagram}{Diagram of the visuo-haptic texture rendering system. }[
Fiducial markers attached to the voice-coil actuator and to tangible surfaces to track are captured by a camera.
The positions and rotations (the poses) ${}^c\mathbf{T}_i$, $i = 1, \dots, n$ of the $n$ defined markers in the camera frame $\mathcal{F}_c$ are estimated, then filtered with an adaptive low-pass filter.
-%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the real environment.
-These poses are used to move and display the virtual model replicas aligned with the real environment.
+%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the \RE.
+These poses are used to move and display the virtual model replicas aligned with the \RE.
A collision detection algorithm detects a contact of the virtual hand with the virtual textures.
If so, the velocity of the finger marker ${}^c\dot{\mathbf{X}}_f$ is estimated using a discrete derivative of the position and adaptive low-pass filtering, then transformed to the texture frame $\mathcal{F}_t$.
The vibrotactile signal $s_k$ is generated by modulating the (scalar) finger velocity ${}^t\hat{\dot{X}}_f$ in the texture direction with the texture period $\lambda$ (\eqref{signal}).
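A minimal sketch of this phase-matched modulation is given below (ours, not the authors' code): the phase is accumulated sample by sample so the instantaneous frequency $f_k = |{}^t\hat{\dot{X}}_f| / \lambda$ follows the finger speed without discontinuities. The sinusoidal waveform, amplitude handling, and all parameter values are illustrative assumptions; \eqref{signal} remains the authoritative definition.

```python
import numpy as np

def vibrotactile_signal(speeds, wavelength=0.002, amplitude=1.0, fs=48_000):
    """Phase-matched frequency modulation for a 1D grating texture.

    speeds     -- finger speed along the texture direction, one value per
                  audio sample (m/s); assumed already filtered
    wavelength -- texture spatial period lambda in metres (2 mm is an assumed value)
    fs         -- audio sampling rate in Hz
    """
    phase = 0.0
    out = np.empty(len(speeds))
    for k, v in enumerate(speeds):
        f_k = abs(v) / wavelength          # temporal frequency f = |v| / lambda
        phase += 2.0 * np.pi * f_k / fs    # integrate phase -> continuous across speed changes
        out[k] = amplitude * np.sin(phase)
    return out
```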
@@ -36,13 +36,13 @@ The system consists of three main components: the pose estimation of the tracked
\begin{subfigs}{setup}{Visuo-haptic texture rendering system setup. }[][
\item HapCoil-One voice-coil actuator with a fiducial marker on top attached to a participant's right index finger.
-\item HoloLens~2 \AR headset, the two cardboard masks to switch the real or virtual environments with the same field of view, and the 3D-printed piece for attaching the masks to the headset.
+\item HoloLens~2 \AR headset, the two cardboard masks to switch the real or virtual environments with the same \FoV, and the \ThreeD-printed piece for attaching the masks to the headset.
\item User exploring a virtual vibrotactile texture on a tangible sheet of paper.
]
\subfig[0.325]{device}
-\subfig[0.65]{headset}
-\par\vspace{2.5pt}
-\subfig[0.992]{apparatus}
+%\subfig[0.65]{headset}
+%\par\vspace{2.5pt}
+%\subfig[0.992]{apparatus}
\end{subfigs}
A fiducial marker (AprilTag) is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech) placed above the experimental setup, capturing \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
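For illustration, detecting the markers and estimating their poses ${}^c\mathbf{T}_i$ could look like the sketch below; the pupil-apriltags detector, the tag36h11 family, the 2 cm tag size, and the camera intrinsics are all our assumptions, since the section does not specify them.

```python
import cv2
from pupil_apriltags import Detector  # assumed library; the paper only specifies AprilTag markers

detector = Detector(families="tag36h11")       # assumed tag family
fx, fy, cx, cy = 900.0, 905.0, 640.0, 360.0    # placeholder intrinsics from camera calibration

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    tags = detector.detect(gray, estimate_tag_pose=True,
                           camera_params=(fx, fy, cx, cy),
                           tag_size=0.02)      # assumed 2 cm tag edge length
    for tag in tags:
        R, t = tag.pose_R, tag.pose_t          # marker pose ^cT_i in the camera frame F_c
```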
@@ -63,8 +63,8 @@ The optimal filter parameters were determined using the method of \textcite{casi
%
The velocity (without angular velocity) of the marker, denoted as ${}^c\dot{\mathbf{X}}_i$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
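For reference, a compact sketch of the 1€ filter's speed-adaptive smoothing is given below; the default parameters are illustrative, not the tuned values determined with the cited method.

```python
import math

class OneEuroFilter:
    """1-euro filter: low cutoff at rest (less jitter), higher cutoff when moving (less lag)."""

    def __init__(self, rate, min_cutoff=1.0, beta=0.01, d_cutoff=1.0):
        self.rate = rate              # sampling rate in Hz (e.g. 60 for the camera)
        self.min_cutoff = min_cutoff  # cutoff at zero speed
        self.beta = beta              # speed coefficient
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate
        self.x_prev = None
        self.dx_prev = 0.0

    def _smooth(self, cutoff, x, x_prev):
        tau = 1.0 / (2.0 * math.pi * cutoff)
        alpha = 1.0 / (1.0 + tau * self.rate)  # exponential smoothing factor
        return alpha * x + (1.0 - alpha) * x_prev

    def __call__(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        dx = (x - self.x_prev) * self.rate                         # discrete derivative
        self.dx_prev = self._smooth(self.d_cutoff, dx, self.dx_prev)
        cutoff = self.min_cutoff + self.beta * abs(self.dx_prev)   # speed-adaptive cutoff
        self.x_prev = self._smooth(cutoff, x, self.x_prev)
        return self.x_prev
```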
-To be able to compare virtual and augmented realities, we then create a virtual environment that closely replicate the real one.
-%Before a user interacts with the system, it is necessary to design a virtual environment that will be registered with the real environment during the experiment.
+To be able to compare virtual and augmented realities, we then create a \VE that closely replicates the real one.
+%Before a user interacts with the system, it is necessary to design a virtual environment that will be registered with the \RE during the experiment.
%
Each real element tracked by a marker is modelled virtually, \ie the hand and the augmented tangible surface (\figref{renderings}).
%
@@ -72,24 +72,24 @@ In addition, the pose and size of the virtual textures are defined on the virtua
%
During the experiment, the system uses marker pose estimates to align the virtual models with their real-world counterparts. %, according to the condition being tested.
%
-This allows to detect if a finger touches a virtual texture using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real-time, aligned with the real environment (\figref{renderings}), using the considered \AR or \VR headset.
+This makes it possible to detect whether a finger touches a virtual texture using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real time, aligned with the \RE (\figref{renderings}), using the considered \AR or \VR headset.
In our implementation, the virtual hand and environment are designed with Unity and the Mixed Reality Toolkit (MRTK).
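Purely as an illustration of the contact test that PhysX performs in the actual system, a simplified stand-in could treat the fingertip as a sphere and the texture patch as an axis-aligned box; the geometry and names below are hypothetical.

```python
import numpy as np

def finger_touches_texture(finger_pos, radius, tex_center, tex_half_extents):
    """Sphere vs. axis-aligned box test, a simplified stand-in for PhysX contact queries."""
    # Closest point of the box to the sphere centre, clamped per axis.
    closest = np.clip(finger_pos, tex_center - tex_half_extents, tex_center + tex_half_extents)
    return np.linalg.norm(finger_pos - closest) <= radius

# Example: 8 mm fingertip sphere against a thin 10 x 10 cm texture patch.
touching = finger_touches_texture(np.array([0.01, 0.005, 0.0]), 0.008,
                                  np.array([0.0, 0.0, 0.0]),
                                  np.array([0.05, 0.001, 0.05]))
```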
%
The visual rendering is achieved using the Microsoft HoloLens~2, an \OST-\AR headset with a \qtyproduct{43 x 29}{\degree} \FoV, a \qty{60}{\Hz} refresh rate, and self-localisation capabilities.
%
-It was chosen over \VST-\AR because \OST-\AR only adds virtual content to the real environment, while \VST-\AR streams a real-time video capture of the real environment \cite{macedo2023occlusion}.
+It was chosen over \VST-\AR because \OST-\AR only adds virtual content to the \RE, while \VST-\AR streams a real-time video capture of the \RE \cite{macedo2023occlusion}.
%
-Indeed, one of our objectives (\secref{experiment}) is to directly compare a virtual environment that replicates a real one, rather than a video feed that introduces many supplementary visual limitations \cite{kim2018revisiting,macedo2023occlusion}.
+Indeed, one of our objectives (\secref{experiment}) is to directly compare a \VE that replicates a real one, rather than a video feed, which introduces many additional visual limitations \cite{kim2018revisiting,macedo2023occlusion}.
%
-To simulate a \VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the real environment (\figref{headset}).
+To simulate a \VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the \RE (\figref{headset}).
\section{Vibrotactile Signal Generation and Rendering}
\label{texture_generation}
A voice-coil actuator (HapCoil-One, Actronika) is used to display the vibrotactile signal, as it allows the frequency and amplitude of the signal to be controlled independently over time, covers a wide frequency range (\qtyrange{10}{1000}{\Hz}), and outputs the signal accurately with relatively low acceleration distortion\footnote{HapCoil-One specific characteristics are described in its data sheet: \url{https://web.archive.org/web/20240228161416/https://tactilelabs.com/wp-content/uploads/2023/11/HapCoil_One_datasheet.pdf}}.
%
-The voice-coil actuator is encased in a 3D printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
+The voice-coil actuator is encased in a \ThreeD-printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
%
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil actuators \cite{mcmahan2014dynamic}.
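The path from generated samples to the amplifier is an ordinary audio output; for instance, with the sounddevice Python bindings (our example, not the paper's stated toolchain, and with an assumed 48 kHz output rate):

```python
import numpy as np
import sounddevice as sd

fs = 48_000                                    # assumed sound-card output rate
t = np.arange(fs) / fs
# One second of test signal: 75 mm/s over an assumed 2 mm period gives 37.5 Hz.
samples = 0.5 * np.sin(2 * np.pi * 37.5 * t)
sd.play(samples, samplerate=fs)                # stream to the sound card / amplifier
sd.wait()                                      # block until playback finishes
```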
%
@@ -154,7 +154,7 @@ The tactile texture is described and rendered in this work as a one dimensional
\section{System Latency}
\label{latency}
-%As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, 3D rendering and audio generation.
+%As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, \ThreeD rendering and audio generation.
%
Because the chosen \AR headset is a standalone device (like most current \AR/\VR headsets) and cannot directly control the sound card and haptic actuator, the image capture, pose estimation and audio signal generation steps are performed on an external computer.
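The external computer must therefore forward the filtered poses to the headset over the network. A hypothetical minimal transport is sketched below, since the section does not specify the protocol; the address, port, and message format are all assumptions.

```python
import json
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
HEADSET = ("192.168.0.42", 9000)   # hypothetical headset address and port

def send_pose(marker_id, position, rotation):
    """Forward one filtered marker pose to the headset for visual rendering."""
    msg = json.dumps({"id": marker_id, "p": position, "q": rotation})
    sock.sendto(msg.encode("utf-8"), HEADSET)

send_pose(1, [0.10, 0.02, 0.35], [0.0, 0.0, 0.0, 1.0])
```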
%
@@ -166,20 +166,20 @@ The rendering system provides a user with two interaction loops between the move
%
Measurements are reported as mean $\pm$ standard deviation (when known).
%
-The end-to-end latency from finger movement to feedback is measured at \qty{36 +- 4}{\ms} in the haptic loop and \qty{43 +- 9}{\ms} in the visual loop.
+The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
%
-Both are the result of latency in image capture \qty{16 +- 1}{\ms}, markers tracking \qty{2 +- 1}{\ms} and network communication \qty{4 +- 1}{\ms}.
+Both are the result of latency in image capture \qty{16 \pm 1}{\ms}, marker tracking \qty{2 \pm 1}{\ms} and network communication \qty{4 \pm 1}{\ms}.
%
-The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in 3D rendering \qty{16 +- 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
+The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in \ThreeD rendering \qty{16 \pm 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
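As a consistency check (our arithmetic from the figures above), the stated components approximately sum to the end-to-end measurements:
\[
\underbrace{16 + 2 + 4 + 15}_{\text{haptic loop}} = 37 \approx 36\,\mathrm{ms},
\qquad
\underbrace{16 + 2 + 4 + 16 + 5}_{\text{visual loop}} = 43\,\mathrm{ms}.
\]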
%
The total haptic latency is below the \qty{60}{\ms} detection threshold in vibrotactile feedback \cite{okamoto2009detectability}.
%
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based tracking \cite{knorlein2009influence}.
-The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 +- 30}{\ms}.
+The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
%
With respect to the real hand position, this lag causes a distance error in the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal.
%
-This is proportional to the speed of the finger, \eg distance error is \qty{12 +- 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
+This error is proportional to the speed of the finger, \eg the distance error is \qty{12 \pm 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
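The example follows directly from the lag (our arithmetic): the error is the finger speed multiplied by the lag,
\[
\qty{75}{\mm\per\second} \times \qty{0.160}{\second} = \qty{12}{\mm}.
\]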
%
%and of the vibrotactile signal frequency with respect to the finger speed.%, that is proportional to the speed of the finger.