WIP xr-perception

2024-09-27 22:11:28 +02:00
parent a9319210df
commit 344496cbef
19 changed files with 270 additions and 366 deletions


@@ -2,7 +2,7 @@
%
In this section, we describe a system for rendering vibrotactile roughness textures in real time, on any tangible surface, touched directly with the index fingertip, with no constraints on hand movement and using a simple camera to track the finger pose.
%
-We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent, multimodal visuo-haptic augmentation of the real environment.
+We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent, multimodal visuo-haptic augmentation of the \RE.
\section{Principle}
\label{principle}
@@ -11,19 +11,19 @@ The visuo-haptic texture rendering system is based on
%
\begin{enumerate*}[label=(\arabic*)]
\item a real-time interaction loop between the finger movements and a coherent visuo-haptic feedback simulating the sensation of a touched texture,
-\item a precise alignment of the virtual environment with its real counterpart, and
+\item a precise alignment of the \VE with its real counterpart, and
\item a modulation of the signal frequency by the estimated finger speed with phase matching.
\end{enumerate*}
%
\figref{diagram} shows the interaction loop diagram and \eqref{signal} the definition of the vibrotactile signal.
%
-The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the virtual environment, and the vibrotactile signal generation and rendering.
+The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the \VE, and the vibrotactile signal generation and rendering.
\figwide[1]{diagram}{Diagram of the visuo-haptic texture rendering system. }[
Fiducial markers attached to the voice-coil actuator and to tangible surfaces to track are captured by a camera.
The positions and rotations (the poses) ${}^c\mathbf{T}_i$, $i = 1, \dots, n$ of the $n$ defined markers in the camera frame $\mathcal{F}_c$ are estimated, then filtered with an adaptive low-pass filter.
-%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the real environment.
-These poses are used to move and display the virtual model replicas aligned with the real environment.
+%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the \RE.
+These poses are used to move and display the virtual model replicas aligned with the \RE.
A collision detection algorithm detects a contact of the virtual hand with the virtual textures.
If so, the velocity of the finger marker ${}^c\dot{\mathbf{X}}_f$ is estimated using a discrete derivative of the position and adaptive low-pass filtering, then transformed to the texture frame $\mathcal{F}_t$.
The vibrotactile signal $s_k$ is generated by modulating the (scalar) finger velocity ${}^t\hat{\dot{X}}_f$ in the texture direction with the texture period $\lambda$ (\eqref{signal}).
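A minimal sketch of this phase-matched modulation is given below (ours, not the authors' code): the phase is accumulated sample by sample so the instantaneous frequency $f_k = |{}^t\hat{\dot{X}}_f| / \lambda$ follows the finger speed without discontinuities. The sinusoidal waveform, amplitude handling, and all parameter values are illustrative assumptions; \eqref{signal} remains the authoritative definition.

```python
import numpy as np

def vibrotactile_signal(speeds, wavelength=0.002, amplitude=1.0, fs=48_000):
    """Phase-matched frequency modulation for a 1D grating texture.

    speeds     -- finger speed along the texture direction, one value per
                  audio sample (m/s); assumed already filtered
    wavelength -- texture spatial period lambda in metres (2 mm is an assumed value)
    fs         -- audio sampling rate in Hz
    """
    phase = 0.0
    out = np.empty(len(speeds))
    for k, v in enumerate(speeds):
        f_k = abs(v) / wavelength          # temporal frequency f = |v| / lambda
        phase += 2.0 * np.pi * f_k / fs    # integrate phase -> continuous across speed changes
        out[k] = amplitude * np.sin(phase)
    return out
```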
@@ -36,13 +36,13 @@ The system consists of three main components: the pose estimation of the tracked
\begin{subfigs}{setup}{Visuo-haptic texture rendering system setup. }[][
\item HapCoil-One voice-coil actuator with a fiducial marker on top attached to a participant's right index finger.
-\item HoloLens~2 \AR headset, the two cardboard masks to switch the real or virtual environments with the same field of view, and the 3D-printed piece for attaching the masks to the headset.
+\item HoloLens~2 \AR headset, the two cardboard masks to switch the real or virtual environments with the same \FoV, and the \ThreeD-printed piece for attaching the masks to the headset.
\item User exploring a virtual vibrotactile texture on a tangible sheet of paper.
]
\subfig[0.325]{device}
-\subfig[0.65]{headset}
-\par\vspace{2.5pt}
-\subfig[0.992]{apparatus}
+%\subfig[0.65]{headset}
+%\par\vspace{2.5pt}
+%\subfig[0.992]{apparatus}
\end{subfigs}
A fiducial marker (AprilTag) is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech) placed above the experimental setup, capturing \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
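For illustration, detecting the markers and estimating their poses ${}^c\mathbf{T}_i$ could look like the sketch below; the pupil-apriltags detector, the tag36h11 family, the 2 cm tag size, and the camera intrinsics are all our assumptions, since the section does not specify them.

```python
import cv2
from pupil_apriltags import Detector  # assumed library; the paper only specifies AprilTag markers

detector = Detector(families="tag36h11")       # assumed tag family
fx, fy, cx, cy = 900.0, 905.0, 640.0, 360.0    # placeholder intrinsics from camera calibration

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    tags = detector.detect(gray, estimate_tag_pose=True,
                           camera_params=(fx, fy, cx, cy),
                           tag_size=0.02)      # assumed 2 cm tag edge length
    for tag in tags:
        R, t = tag.pose_R, tag.pose_t          # marker pose ^cT_i in the camera frame F_c
```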
@@ -63,8 +63,8 @@ The optimal filter parameters were determined using the method of \textcite{casi
%
The velocity (without angular velocity) of the marker, denoted as ${}^c\dot{\mathbf{X}}_i$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
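For reference, a compact sketch of the 1€ filter's speed-adaptive smoothing is given below; the default parameters are illustrative, not the tuned values determined with the cited method.

```python
import math

class OneEuroFilter:
    """1-euro filter: low cutoff at rest (less jitter), higher cutoff when moving (less lag)."""

    def __init__(self, rate, min_cutoff=1.0, beta=0.01, d_cutoff=1.0):
        self.rate = rate              # sampling rate in Hz (e.g. 60 for the camera)
        self.min_cutoff = min_cutoff  # cutoff at zero speed
        self.beta = beta              # speed coefficient
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate
        self.x_prev = None
        self.dx_prev = 0.0

    def _smooth(self, cutoff, x, x_prev):
        tau = 1.0 / (2.0 * math.pi * cutoff)
        alpha = 1.0 / (1.0 + tau * self.rate)  # exponential smoothing factor
        return alpha * x + (1.0 - alpha) * x_prev

    def __call__(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        dx = (x - self.x_prev) * self.rate                         # discrete derivative
        self.dx_prev = self._smooth(self.d_cutoff, dx, self.dx_prev)
        cutoff = self.min_cutoff + self.beta * abs(self.dx_prev)   # speed-adaptive cutoff
        self.x_prev = self._smooth(cutoff, x, self.x_prev)
        return self.x_prev
```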
-To be able to compare virtual and augmented realities, we then create a virtual environment that closely replicate the real one.
-%Before a user interacts with the system, it is necessary to design a virtual environment that will be registered with the real environment during the experiment.
+To be able to compare virtual and augmented realities, we then create a \VE that closely replicates the real one.
+%Before a user interacts with the system, it is necessary to design a virtual environment that will be registered with the \RE during the experiment.
%
Each real element tracked by a marker is modelled virtually, \ie the hand and the augmented tangible surface (\figref{renderings}).
%
@@ -72,24 +72,24 @@ In addition, the pose and size of the virtual textures are defined on the virtua
%
During the experiment, the system uses marker pose estimates to align the virtual models with their real-world counterparts. %, according to the condition being tested.
%
-This allows to detect if a finger touches a virtual texture using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real-time, aligned with the real environment (\figref{renderings}), using the considered \AR or \VR headset.
+This makes it possible to detect whether a finger touches a virtual texture using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real time, aligned with the \RE (\figref{renderings}), using the considered \AR or \VR headset.
In our implementation, the virtual hand and environment are designed with Unity and the Mixed Reality Toolkit (MRTK).
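Purely as an illustration of the contact test that PhysX performs in the actual system, a simplified stand-in could treat the fingertip as a sphere and the texture patch as an axis-aligned box; the geometry and names below are hypothetical.

```python
import numpy as np

def finger_touches_texture(finger_pos, radius, tex_center, tex_half_extents):
    """Sphere vs. axis-aligned box test, a simplified stand-in for PhysX contact queries."""
    # Closest point of the box to the sphere centre, clamped per axis.
    closest = np.clip(finger_pos, tex_center - tex_half_extents, tex_center + tex_half_extents)
    return np.linalg.norm(finger_pos - closest) <= radius

# Example: 8 mm fingertip sphere against a thin 10 x 10 cm texture patch.
touching = finger_touches_texture(np.array([0.01, 0.005, 0.0]), 0.008,
                                  np.array([0.0, 0.0, 0.0]),
                                  np.array([0.05, 0.001, 0.05]))
```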
%
The visual rendering is achieved using the Microsoft HoloLens~2, an \OST-\AR headset with a \qtyproduct{43 x 29}{\degree} \FoV, a \qty{60}{\Hz} refresh rate, and self-localisation capabilities.
%
-It was chosen over \VST-\AR because \OST-\AR only adds virtual content to the real environment, while \VST-\AR streams a real-time video capture of the real environment \cite{macedo2023occlusion}.
+It was chosen over \VST-\AR because \OST-\AR only adds virtual content to the \RE, while \VST-\AR streams a real-time video capture of the \RE \cite{macedo2023occlusion}.
%
-Indeed, one of our objectives (\secref{experiment}) is to directly compare a virtual environment that replicates a real one, rather than a video feed that introduces many supplementary visual limitations \cite{kim2018revisiting,macedo2023occlusion}.
+Indeed, one of our objectives (\secref{experiment}) is to directly compare a \VE that replicates a real one, rather than a video feed, which introduces many additional visual limitations \cite{kim2018revisiting,macedo2023occlusion}.
%
-To simulate a \VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the real environment (\figref{headset}).
+To simulate a \VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the \RE (\figref{headset}).
\section{Vibrotactile Signal Generation and Rendering}
\label{texture_generation}
A voice-coil actuator (HapCoil-One, Actronika) is used to display the vibrotactile signal, as it allows the frequency and amplitude of the signal to be controlled independently over time, covers a wide frequency range (\qtyrange{10}{1000}{\Hz}), and outputs the signal accurately with relatively low acceleration distortion\footnote{HapCoil-One specific characteristics are described in its data sheet: \url{https://web.archive.org/web/20240228161416/https://tactilelabs.com/wp-content/uploads/2023/11/HapCoil_One_datasheet.pdf}}.
%
-The voice-coil actuator is encased in a 3D printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
+The voice-coil actuator is encased in a \ThreeD-printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
%
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil actuators \cite{mcmahan2014dynamic}.
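The path from generated samples to the amplifier is an ordinary audio output; for instance, with the sounddevice Python bindings (our example, not the paper's stated toolchain, and with an assumed 48 kHz output rate):

```python
import numpy as np
import sounddevice as sd

fs = 48_000                                    # assumed sound-card output rate
t = np.arange(fs) / fs
# One second of test signal: 75 mm/s over an assumed 2 mm period gives 37.5 Hz.
samples = 0.5 * np.sin(2 * np.pi * 37.5 * t)
sd.play(samples, samplerate=fs)                # stream to the sound card / amplifier
sd.wait()                                      # block until playback finishes
```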
%
@@ -154,7 +154,7 @@ The tactile texture is described and rendered in this work as a one dimensional
\section{System Latency}
\label{latency}
-%As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, 3D rendering and audio generation.
+%As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, \ThreeD rendering and audio generation.
%
Because the chosen \AR headset is a standalone device (like most current \AR/\VR headsets) and cannot directly control the sound card and haptic actuator, the image capture, pose estimation and audio signal generation steps are performed on an external computer.
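The external computer must therefore forward the filtered poses to the headset over the network. A hypothetical minimal transport is sketched below, since the section does not specify the protocol; the address, port, and message format are all assumptions.

```python
import json
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
HEADSET = ("192.168.0.42", 9000)   # hypothetical headset address and port

def send_pose(marker_id, position, rotation):
    """Forward one filtered marker pose to the headset for visual rendering."""
    msg = json.dumps({"id": marker_id, "p": position, "q": rotation})
    sock.sendto(msg.encode("utf-8"), HEADSET)

send_pose(1, [0.10, 0.02, 0.35], [0.0, 0.0, 0.0, 1.0])
```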
%
@@ -166,20 +166,20 @@ The rendering system provides a user with two interaction loops between the move
%
Measurements are reported as mean $\pm$ standard deviation (when known).
%
-The end-to-end latency from finger movement to feedback is measured at \qty{36 +- 4}{\ms} in the haptic loop and \qty{43 +- 9}{\ms} in the visual loop.
+The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
%
-Both are the result of latency in image capture \qty{16 +- 1}{\ms}, markers tracking \qty{2 +- 1}{\ms} and network communication \qty{4 +- 1}{\ms}.
+Both are the result of latency in image capture \qty{16 \pm 1}{\ms}, marker tracking \qty{2 \pm 1}{\ms} and network communication \qty{4 \pm 1}{\ms}.
%
-The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in 3D rendering \qty{16 +- 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
+The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in \ThreeD rendering \qty{16 \pm 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
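As a consistency check (our arithmetic from the figures above), the stated components approximately sum to the end-to-end measurements:
\[
\underbrace{16 + 2 + 4 + 15}_{\text{haptic loop}} = 37 \approx 36\,\mathrm{ms},
\qquad
\underbrace{16 + 2 + 4 + 16 + 5}_{\text{visual loop}} = 43\,\mathrm{ms}.
\]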
%
The total haptic latency is below the \qty{60}{\ms} detection threshold in vibrotactile feedback \cite{okamoto2009detectability}.
%
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based tracking \cite{knorlein2009influence}.
-The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 +- 30}{\ms}.
+The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
%
With respect to the real hand position, this lag causes a distance error in the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal.
%
-This is proportional to the speed of the finger, \eg distance error is \qty{12 +- 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
+This error is proportional to the speed of the finger, \eg the distance error is \qty{12 \pm 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
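The example follows directly from the lag (our arithmetic): the error is the finger speed multiplied by the lag,
\[
\qty{75}{\mm\per\second} \times \qty{0.160}{\second} = \qty{12}{\mm}.
\]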
%
%and of the vibrotactile signal frequency with respect to the finger speed.%, that is proportional to the speed of the finger.