131 lines
14 KiB
TeX
131 lines
14 KiB
TeX
%With a vibrotactile actuator attached to a hand-held device or directly on the finger, it is possible to simulate virtual haptic sensations as vibrations, such as texture, friction or contact vibrations \cite{culbertson2018haptics}.
|
|
%
|
|
%We describe a system for rendering vibrotactile roughness textures in real time, on any real surface, touched directly with the index fingertip, with no constraints on hand movement and using a simple camera to track the finger pose.
|
|
%
|
|
%We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent visuo-haptic augmentation of the \RE.
|
|
|
|
\section{Principle}
|
|
\label{principle}
|
|
|
|
The visuo-haptic texture rendering system is based on:
|
|
\begin{enumerate}[label=(\arabic*)]
|
|
\item a real-time interaction loop between the finger movements and a coherent visuo-haptic feedback simulating the sensation of a touched texture,
|
|
\item a precise alignment of the \VE with its real counterpart, and
|
|
\item a modulation of the signal frequency by the estimated finger speed with a phase matching.
|
|
\end{enumerate}
|
|
|
|
\figref{diagram} shows the interaction loop diagram and \eqref{signal} the definition of the vibrotactile signal.
|
|
The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the \VE, and the vibrotactile signal generation and rendering.
|
|
|
|
\figwide{diagram}{Diagram of the visuo-haptic texture rendering system. }[
|
|
Fiducial markers attached to the voice-coil actuator and to augmented surfaces to track are captured by a camera.
|
|
The positions and rotations (the poses) ${}^c\mathbf{T}_i$, $i=1..n$ of the $n$ defined markers in the camera frame $\mathcal{F}_c$ are estimated, then filtered with an adaptive low-pass filter.
|
|
%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the \RE.
|
|
These poses are used to move and display the virtual model replicas aligned with the \RE.
|
|
A collision detection algorithm detects a contact of the virtual hand with the virtual textures.
|
|
If so, the velocity of the finger marker ${}^c\dot{\mathbf{X}}_f$ is estimated using discrete derivative of position and adaptive low-pass filtering, then transformed onto the texture frame $\mathcal{F}_t$.
|
|
The vibrotactile signal $s_k$ is generated by modulating the (scalar) finger velocity ${}^t\hat{\dot{X}}_f$ in the texture direction with the texture period $\lambda$ (\eqref{signal}).
|
|
The signal is sampled at 48~kHz and sent to the voice-coil actuator via an audio amplifier.
|
|
All computation steps except signal sampling are performed at 60~Hz and in separate threads to parallelize them.
|
|
]
|
|
|
|
\section{Description of the System Components}
|
|
\label{system_components}
|
|
|
|
\subsection{Pose Estimation}
|
|
\label{pose_estimation}
|
|
|
|
A \qty{2}{\cm} AprilTag fiducial marker \cite{wang2016apriltag} is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech) which is placed above the experimental setup and capturing \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
|
|
Other markers are placed on the real surfaces to augment (\figref{setup}) to estimate the relative position of the finger with respect to the surfaces.
|
|
Contrary to similar work using vision-based tracking allows both to free the hand movements and to augment any real surface.
|
|
A camera external to the \AR headset with a marker-based technique is employed to provide accurate and robust tracking with a constant view of the markers \cite{marchand2016pose}.
|
|
We denote ${}^c\mathbf{T}_i$, $i=1..n$ the homogenous transformation matrix that defines the position and rotation of the $i$-th marker out of the $n$ defined markers in the camera frame $\mathcal{F}_c$, \eg the finger pose ${}^c\mathbf{T}_f$ and the texture pose ${}^c\mathbf{T}_t$.
|
|
|
|
To reduce the noise in the pose estimation while maintaining good responsiveness, the 1€ filter \cite{casiez2012filter} is applied; a low-pass filter with an adaptive cut-off frequency, specifically designed for human motion tracking.
|
|
The filtered pose is denoted as ${}^c\hat{\mathbf{T}}_i$.
|
|
The optimal filter parameters were determined using the method of \textcite{casiez2012filter}, with a minimum cut-off frequency of \qty{10}{\hertz} and a slope of \num{0.01}.
|
|
The velocity (without angular velocity) of the marker, denoted as ${}^c\dot{\mathbf{X}}_i$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
|
|
|
|
\subsection{Virtual Environment Alignment}
|
|
\label{virtual_real_alignment}
|
|
|
|
%To be able to compare virtual and augmented realities, we then create a \VE that closely replicate the real one.
|
|
Before a user interacts with the system, it is necessary to design a \VE that will be registered with the \RE during the experiment.
|
|
Each real element tracked by a marker is modelled virtually, \eg the hand and the augmented surface (\figref{device}).
|
|
In addition, the pose and size of the virtual textures were defined on the virtual replicas.
|
|
During the experiment, the system uses marker pose estimates to align the virtual models with their real-world counterparts. %, according to the condition being tested.
|
|
This allows to detect if a finger touches a virtual texture using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real-time, aligned with the \RE, using the considered \AR or \VR headset.
|
|
|
|
In our implementation, the \VE is designed with Unity and the Mixed Reality Toolkit (MRTK)\footnoteurl{https://learn.microsoft.com/windows/mixed-reality/mrtk-unity}.
|
|
The visual rendering is achieved using the Microsoft HoloLens~2, an \OST-\AR headset with a \qtyproduct{43 x 29}{\degree} \FoV, a \qty{60}{\Hz} refresh rate, and self-localisation capabilities.
|
|
A \VST-\AR or a \VR headset could have been used as well.
|
|
|
|
\subsection{Vibrotactile Signal Generation and Rendering}
|
|
\label{texture_generation}
|
|
|
|
A voice-coil actuator (HapCoil-One, Actronika) is used to display the vibrotactile signal, as it allows the frequency and amplitude of the signal to be controlled independently over time, covers a wide frequency range (\qtyrange{10}{1000}{\Hz}), and outputs the signal accurately with relatively low acceleration distortion\footnote{HapCoil-One specific characteristics are described in its data sheet: \url{https://tactilelabs.com/wp-content/uploads/2023/11/HapCoil_One_datasheet.pdf}}.
|
|
The voice-coil actuator is encased in a \ThreeD printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
|
|
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instrument). %, which has proven to be an effective type of amplifier for driving moving-coil \cite{mcmahan2014dynamic}.
|
|
The amplifier is connected to the audio output of a computer that generates the signal using the WASAPI driver in exclusive mode and the NAudio library\footnoteurl{https://github.com/naudio/NAudio}.
|
|
|
|
The represented haptic texture is a series of parallels virtual grooves and ridges, similar to real grating textures manufactured for psychophysical roughness perception studies \secref[related_work]{roughness}. %\cite{friesen2024perceived,klatzky2003feeling,unger2011roughness}.
|
|
It is generated as a square wave audio signal $s_k$, sampled at \qty{48}{\kilo\hertz}, with a period $\lambda$ and an amplitude $A$, similar to \eqref[related_work]{grating_rendering}.
|
|
Its frequency is a ratio of the absolute finger filtered (scalar) velocity ${}^t\hat{\dot{|X|}}_f$, transformed into the texture frame $\mathcal{F}_t$, and the texture period $\lambda$ \cite{friesen2024perceived}.
|
|
As the finger is moving horizontally on the texture, only the $x$ component of the velocity is used.
|
|
This velocity modulation strategy is necessary as the finger position is estimated at a far lower rate (\qty{60}{\hertz}) than the audio signal (unlike high-fidelity force-feedback devices \cite{unger2011roughness}).
|
|
|
|
%As the finger position is estimated at a far lower rate (\qty{60}{\hertz}), the filtered finger (scalar) position ${}^t\hat{X}_f$ in the texture frame $\mathcal{F}_t$ cannot be directly used. % to render the signal if the finger moves fast or if the texture period is small.
|
|
%
|
|
%The best strategy instead is to modulate the frequency of the signal as a ratio of the filtered finger velocity ${}^t\hat{\dot{\mathbf{X}}}_f$ and the texture period $\lambda$ \cite{friesen2024perceived}.
|
|
%
|
|
When a new finger velocity ${}^t\hat{\dot{X}}_{f,j}$ is estimated at time $t_j$, the phase $\phi_j$ of the signal $s$ needs also to be adjusted to ensure a continuity in the signal.
|
|
In other words, the sampling of the audio signal runs at \qty{48}{\kilo\hertz}, and its frequency and phase is updated at a far lower rate of \qty{60}{\hertz} when a new finger velocity is estimated.
|
|
A sample $s_k$ of the audio signal at sampling time $t_k$, with $t_k >= t_j$, is thus given by:
|
|
\begin{subequations}
|
|
\label{eq:signal}
|
|
\begin{align}
|
|
s_k(x_{f,j}, t_k) & = A\, \text{sgn} ( \sin (2 \pi \frac{|\dot{X}_{f,j}|}{\lambda} t_k + \phi_j) ) & \label{eq:signal_speed} \\
|
|
\phi_j & = \phi_{j-1} + 2 \pi \frac{x_{f,j} - x_{f,{j-1}}}{\lambda} t_k & \label{eq:signal_phase}
|
|
\end{align}
|
|
\end{subequations}
|
|
|
|
This rendering preserves the sensation of a constant spatial frequency of the virtual texture while the finger moves at various speeds, which is crucial for the perception of roughness \cite{klatzky2003feeling,unger2011roughness}.
|
|
%
|
|
%Note that the finger position and velocity are transformed from the camera frame $\mathcal{F}_c$ to the texture frame $\mathcal{F}_t$, with the $x$ axis aligned with the texture direction.
|
|
%
|
|
%However, when a new finger position is estimated at time $t_j$, the phase $\phi_j$ needs to be adjusted as well with the frequency to ensure a continuity in the signal as described in \eqref{signal_phase}.
|
|
%
|
|
The phase matching avoids sudden changes in the actuator movement thus affecting the texture perception in an uncontrolled way (\figref{phase_adjustment}) and, contrary to previous work \cite{asano2015vibrotactile,ujitoko2019modulating}, it enables a free exploration of the texture by the user with no constraints on the finger speed.
|
|
A square wave is chosen to get a rendering closer to a real grating texture with the sensation of crossing edges \cite{ujitoko2019modulating}, and because the roughness perception of sine wave textures has been shown not to reproduce the roughness perception of real grating textures \cite{unger2011roughness}.
|
|
A square wave also makes it possible to render low signal frequencies that occur when the finger moves slowly or the texture period is large, as the actuator cannot render a pure sine wave signal below \qty{\approx 20}{\Hz} with sufficient amplitude to be perceived.
|
|
|
|
The vibrotactile texture is described and rendered in this chapter as a 1D signal by integrating the relative finger movement to the texture on a single direction, but it is easily extended to a two-dimensional texture by simply generating a second signal for the orthogonal direction and summing the two signals in the rendering \cite{girard2016haptip}.
|
|
|
|
\fig[0.68]{phase_adjustment}{
|
|
Change in frequency of a sinusoidal signal with and without phase matching.
|
|
}[
|
|
Phase matching ensures a continuity and avoids glitches in the rendering of the signal.
|
|
A sinusoidal signal is shown here for clarity, but a different waveform will give a similar effect.
|
|
]
|
|
|
|
\section{System Latency}
|
|
\label{latency}
|
|
|
|
As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, \ThreeD rendering and audio generation.
|
|
Because the chosen \AR headset is a standalone device (like most current \AR/\VR headsets) and cannot directly control the sound card and haptic actuator, the image capture, pose estimation and audio signal generation steps are performed on an external computer.
|
|
All computation steps run in a separate thread to parallelize them and reduce latency, and are synchronized with the headset via a local network and the ZeroMQ library\footnoteurl{https://zeromq.org/}.
|
|
This complex assembly inevitably introduces latency, which must be measured.
|
|
|
|
The rendering system provides a user with two interaction loops between the movements of their hand and the visual (loop 1) and haptic (loop 2) feedbacks.
|
|
Measures are shown as (mean $\pm$ standard deviation), when it is known.
|
|
The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
|
|
Both are the result of latency in image capture \qty{16 \pm 1}{\ms}, markers tracking \qty{2 \pm 1}{\ms} and network communication \qty{4 \pm 1}{\ms}.
|
|
The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in \ThreeD rendering \qty{16 \pm 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
|
|
The total haptic latency is below the \qty{60}{\ms} detection threshold in vibrotactile feedback \cite{okamoto2009detectability}.
|
|
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based tracking \cite{knorlein2009influence}.
|
|
|
|
The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
|
|
With respect to the real hand position, it causes a distance error in the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal.
|
|
This is proportional to the speed of the finger, \eg distance error is \qty{12 \pm 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
|
|
%and of the vibrotactile signal frequency with respect to the finger speed.%, that is proportional to the speed of the finger.
|