%With a vibrotactile actuator attached to a hand-held device or directly on the finger, it is possible to simulate virtual haptic sensations as vibrations, such as texture, friction or contact vibrations \cite{culbertson2018haptics}.
%
In this section, we describe a system for rendering vibrotactile roughness textures in real time, on any tangible surface, touched directly with the index fingertip, with no constraints on hand movement and using a simple camera to track the finger pose.
%
We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent, multimodal visuo-haptic augmentation of the \RE.

\section{Principle}
\label{principle}

The visuo-haptic texture rendering system is based on:
\begin{enumerate}[label=(\arabic*)]
\item a real-time interaction loop between the finger movements and a coherent visuo-haptic feedback simulating the sensation of a touched texture,
\item a precise alignment of the \VE with its real counterpart, and
\item a modulation of the signal frequency by the estimated finger speed with phase matching.
\end{enumerate}

\figref{diagram} shows the interaction loop diagram and \eqref{signal} the definition of the vibrotactile signal.
%
The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the \VE, and the vibrotactile signal generation and rendering.

\figwide[1]{diagram}{Diagram of the visuo-haptic texture rendering system. }[
All computation steps except signal sampling are performed at \qty{60}{\hertz} and in separate threads to parallelize them.
]

\section{Description of the System Components}
\label{system_components}

\subsection{Pose Estimation}
\label{pose_estimation}

\begin{subfigs}{setup}{Visuo-haptic texture rendering system setup. }[][
\item HapCoil-One voice-coil actuator with a fiducial marker on top attached to the middle phalanx of the user's index finger.
\item HoloLens~2 \AR headset, the two cardboard masks to switch between the real and virtual environments with the same \FoV, and the \ThreeD-printed piece for attaching the masks to the headset.
\item User exploring a virtual vibrotactile texture on a tangible sheet of paper.
]
\subfig[0.325]{device}
%\subfig[0.65]{headset}
\end{subfigs}

A \qty{2}{\cm} AprilTag fiducial marker \cite{wang2016apriltag} is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech), which is placed above the experimental setup and captures \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
%
Other markers are placed on the tangible surfaces to be augmented (\figref{setup}) to estimate the relative position of the finger with respect to the surfaces.
%
Contrary to similar work, which either constrained the hand to a constant speed to keep the signal frequency constant \cite{asano2015vibrotactile,friesen2024perceived} or used mechanical sensors attached to the hand \cite{friesen2024perceived,strohmeier2017generating}, vision-based tracking both frees the hand movements and allows any tangible surface to be augmented.
%
A camera external to the \AR headset with a marker-based technique is employed to provide accurate and robust tracking with a constant view of the markers \cite{marchand2016pose}.
%
We denote ${}^c\mathbf{T}_i$, $i=1..n$, the homogeneous transformation matrix that defines the position and rotation of the $i$-th marker out of the $n$ defined markers in the camera frame $\mathcal{F}_c$, \eg the finger pose ${}^c\mathbf{T}_f$ and the texture pose ${}^c\mathbf{T}_t$.
%
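For illustration, the finger pose expressed in the texture frame $\mathcal{F}_t$, used below for the texture rendering, follows from composing the two camera-frame estimates:
\begin{equation*}
{}^t\mathbf{T}_f = \left({}^c\mathbf{T}_t\right)^{-1} \, {}^c\mathbf{T}_f .
\end{equation*}
%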
To reduce the noise in the pose estimation while maintaining good responsiveness, the 1€ filter \cite{casiez2012filter} is applied: a low-pass filter with an adaptive cut-off frequency, specifically designed for human motion tracking.
The filtered pose is denoted as ${}^c\hat{\mathbf{T}}_i$.
%
The optimal filter parameters were determined using the method of \textcite{casiez2012filter}, with a minimum cut-off frequency of \qty{10}{\hertz} and a slope of \num{0.01}.
%
The velocity (without angular velocity) of the marker, denoted as ${}^c\dot{\mathbf{X}}_i$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
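%
As an illustration, the following Python sketch follows the formulation of \textcite{casiez2012filter} with the parameters above; the derivative cut-off of \qty{1}{\hertz} is the filter's usual default (not specified here), and a single scalar coordinate is filtered rather than the full pose.
\begin{verbatim}
import math

class LowPass:
    def __init__(self):
        self.y = None
    def __call__(self, x, alpha):
        self.y = x if self.y is None else alpha * x + (1.0 - alpha) * self.y
        return self.y

def smoothing_factor(cutoff, dt):
    tau = 1.0 / (2.0 * math.pi * cutoff)
    return 1.0 / (1.0 + tau / dt)

class OneEuroFilter:
    """Low-pass filter whose cut-off frequency adapts to the signal speed."""
    def __init__(self, min_cutoff=10.0, beta=0.01, d_cutoff=1.0):
        self.min_cutoff, self.beta, self.d_cutoff = min_cutoff, beta, d_cutoff
        self.x_filter, self.dx_filter = LowPass(), LowPass()
        self.prev_x = None

    def __call__(self, x, dt):
        # Filtered derivative of the raw signal, used to adapt the cut-off.
        dx = 0.0 if self.prev_x is None else (x - self.prev_x) / dt
        self.prev_x = x
        dx_hat = self.dx_filter(dx, smoothing_factor(self.d_cutoff, dt))
        # Fast motion -> higher cut-off (less lag); slow motion -> more smoothing.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        return self.x_filter(x, smoothing_factor(cutoff, dt))

# Position and velocity of one marker coordinate, filtered at the 60 Hz rate.
dt = 1.0 / 60.0
position_filter = OneEuroFilter()
velocity_filter = OneEuroFilter()
previous_position = None
for raw_position in [0.0, 0.001, 0.003, 0.006, 0.010]:   # metres
    position = position_filter(raw_position, dt)
    velocity = 0.0 if previous_position is None \
        else (position - previous_position) / dt
    velocity = velocity_filter(velocity, dt)
    previous_position = position
\end{verbatim}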

\subsection{Virtual Environment Alignment}
\label{virtual_real_alignment}

%To be able to compare virtual and augmented realities, we then create a \VE that closely replicates the real one.
Before a user interacts with the system, it is necessary to design a \VE that will be registered with the \RE during the experiment.
Each real element tracked by a marker is modelled virtually, \eg the hand and the augmented tangible surface (\figref{device}).
In addition, the pose and size of the virtual textures are defined on the virtual replicas.
During the experiment, the system uses the marker pose estimates to align the virtual models with their real-world counterparts. %, according to the condition being tested.
%
This allows the system to detect whether a finger touches a virtual texture using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real time, aligned with the \RE, using the considered \AR or \VR headset.
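%
Independently of the PhysX colliders used in the implementation, the touch test essentially reduces to expressing the finger position in the texture frame and checking that it lies within the texture patch; a minimal sketch, assuming a thin rectangular patch and using the finger marker pose as a stand-in for the fingertip, is:
\begin{verbatim}
import numpy as np

def touches_texture(T_cf, T_ct, half_extents, contact_tol=0.002):
    """Is the finger inside the (thin) texture patch volume?

    T_cf, T_ct   -- 4x4 homogeneous poses of the finger and texture markers,
                    both expressed in the camera frame.
    half_extents -- half-size of the patch along its local x and y axes (m).
    contact_tol  -- tolerance on the distance to the patch plane (m).
    """
    # Finger position in the texture frame: inv(cTt) applied to the finger origin.
    finger_in_texture = np.linalg.inv(T_ct) @ T_cf[:, 3]
    x, y, z = finger_in_texture[:3]
    return abs(x) <= half_extents[0] and abs(y) <= half_extents[1] \
        and abs(z) <= contact_tol
\end{verbatim}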

In our implementation, the \VE is designed with Unity and the Mixed Reality Toolkit (MRTK)\footnoteurl{https://learn.microsoft.com/windows/mixed-reality/mrtk-unity}.
The visual rendering is achieved using the Microsoft HoloLens~2, an \OST-\AR headset with a \qtyproduct{43 x 29}{\degree} \FoV, a \qty{60}{\Hz} refresh rate, and self-localisation capabilities.
%
It was chosen over \VST-\AR because \OST-\AR only adds virtual content to the \RE, while \VST-\AR streams a real-time video capture of the \RE \cite{macedo2023occlusion}.
%
Indeed, one of our objectives (\secref{experiment}) is to directly compare a \VE that replicates a real one, rather than a video feed that introduces many supplementary visual limitations \cite{kim2018revisiting,macedo2023occlusion}.
%
To simulate a \VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the \RE (\figref{headset}).
A \VST-\AR or a \VR headset could have been used as well.

\subsection{Vibrotactile Signal Generation and Rendering}
\label{texture_generation}

A voice-coil actuator (HapCoil-One, Actronika) is used to display the vibrotactile signal, as it allows the frequency and amplitude of the signal to be controlled independently over time, covers a wide frequency range (\qtyrange{10}{1000}{\Hz}), and outputs the signal accurately with relatively low acceleration distortion\footnote{HapCoil-One specific characteristics are described in its data sheet: \url{https://web.archive.org/web/20240228161416/https://tactilelabs.com/wp-content/uploads/2023/11/HapCoil_One_datasheet.pdf}}.
%
The voice-coil actuator is encased in a \ThreeD-printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
%
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil actuators \cite{mcmahan2014dynamic}.
%
The amplifier is connected to the audio output of a computer that generates the signal using the WASAPI driver in exclusive mode and the NAudio library\footnoteurl{https://github.com/naudio/NAudio}.

The represented haptic texture is a series of parallel virtual grooves and ridges, similar to real grating textures manufactured for psychophysical roughness perception studies \secref[related_work]{roughness}. %\cite{friesen2024perceived,klatzky2003feeling,unger2011roughness}.
It is generated as a square wave audio signal $s_k$, sampled at \qty{48}{\kilo\hertz}, with a period $\lambda$ and an amplitude $A$, similar to \eqref[related_work]{grating_rendering}.
Its frequency is the ratio of the absolute value of the filtered (scalar) finger velocity ${}^t\hat{\dot{X}}_f$, transformed into the texture frame $\mathcal{F}_t$, to the texture period $\lambda$ \cite{friesen2024perceived}.
%
As the finger moves horizontally on the texture, only the $x$ component of the velocity is used.
%
This velocity modulation strategy is necessary as the finger position is estimated at a far lower rate (\qty{60}{\hertz}) than the audio signal (unlike high-fidelity force-feedback devices \cite{unger2011roughness}).
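%
For instance, with an illustrative texture period of $\lambda = \qty{2}{\mm}$, a finger moving at \qty{75}{\mm\per\second} yields a signal frequency of \qty{37.5}{\Hz}.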

%As the finger position is estimated at a far lower rate (\qty{60}{\hertz}), the filtered finger (scalar) position ${}^t\hat{X}_f$ in the texture frame $\mathcal{F}_t$ cannot be directly used. % to render the signal if the finger moves fast or if the texture period is small.
%
%The best strategy instead is to modulate the frequency of the signal as a ratio of the filtered finger velocity ${}^t\hat{\dot{\mathbf{X}}}_f$ and the texture period $\lambda$ \cite{friesen2024perceived}.
%
When a new finger velocity ${}^t\hat{\dot{X}}_{f,j}$ is estimated at time $t_j$, the phase $\phi_j$ of the signal $s$ also needs to be adjusted to ensure the continuity of the signal.
%
In other words, the sampling of the audio signal runs at \qty{48}{\kilo\hertz}, while its frequency and phase are updated at the far lower rate of \qty{60}{\hertz}, whenever a new finger velocity is estimated.
%
A sample $s_k$ of the audio signal at sampling time $t_k$, with $t_k \geq t_j$, is thus given by:
%
\begin{subequations}
\label{eq:signal}
\begin{align}
\phi_j & = \phi_{j-1} + 2 \pi \frac{x_{f,j} - x_{f,{j-1}}}{\lambda} t_k & \label{eq:signal_phase}
\end{align}
\end{subequations}
%
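To make the interplay of the two rates concrete, the following Python sketch (an illustration, not the NAudio/C\# implementation actually used) generates the square wave at \qty{48}{\kilo\hertz} while updating its frequency and phase at \qty{60}{\hertz}; the phase update is written here in terms of the frequency change, so that the argument of the wave stays continuous at the update time.
\begin{verbatim}
import numpy as np

SAMPLE_RATE = 48_000   # audio sampling rate (Hz)
UPDATE_RATE = 60       # finger tracking / parameter update rate (Hz)

class TextureSignal:
    """Square wave whose frequency follows the finger speed, with phase matching."""
    def __init__(self, period_m, amplitude=0.5):
        self.period_m = period_m   # texture spatial period lambda (m)
        self.amplitude = amplitude
        self.frequency = 0.0       # current temporal frequency (Hz)
        self.phase = 0.0           # current phase offset (rad)

    def on_velocity_update(self, finger_speed, t_j):
        """Called at 60 Hz with the filtered finger speed along the texture x axis."""
        new_frequency = abs(finger_speed) / self.period_m
        # Phase matching: keep 2*pi*f*t + phi continuous at the update time t_j.
        self.phase += 2.0 * np.pi * (self.frequency - new_frequency) * t_j
        self.frequency = new_frequency

    def samples(self, t_k):
        """Square wave samples for the sampling times t_k (s)."""
        return self.amplitude * np.sign(
            np.sin(2.0 * np.pi * self.frequency * t_k + self.phase))

# One update: a finger moving at 75 mm/s over a 2 mm period texture -> 37.5 Hz.
signal = TextureSignal(period_m=0.002)
signal.on_velocity_update(0.075, t_j=0.0)
block = signal.samples(np.arange(SAMPLE_RATE // UPDATE_RATE) / SAMPLE_RATE)
\end{verbatim}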
%This is a common rendering method for vibrotactile textures, with well-defined parameters, that has been employed to modify perceived haptic roughness of a tangible surface \cite{asano2015vibrotactile,konyo2005tactile,ujitoko2019modulating}.
%
This rendering preserves the sensation of a constant spatial frequency of the virtual texture while the finger moves at various speeds, which is crucial for the perception of roughness \cite{klatzky2003feeling,unger2011roughness}.
%
%Note that the finger position and velocity are transformed from the camera frame $\mathcal{F}_c$ to the texture frame $\mathcal{F}_t$, with the $x$ axis aligned with the texture direction.
%
%However, when a new finger position is estimated at time $t_j$, the phase $\phi_j$ needs to be adjusted as well with the frequency to ensure a continuity in the signal as described in \eqref{signal_phase}.
%
The phase matching avoids sudden changes in the actuator movement that would affect the texture perception in an uncontrolled way (\figref{phase_adjustment}) and, contrary to previous work \cite{asano2015vibrotactile,ujitoko2019modulating}, it enables a free exploration of the texture by the user with no constraints on the finger speed.
%
A square wave is chosen to get a rendering closer to a real grating texture with the sensation of crossing edges \cite{ujitoko2019modulating}, and because the roughness perception of sine wave textures has been shown not to reproduce the roughness perception of real grating textures \cite{unger2011roughness}.
%
A square wave also makes it possible to render the low signal frequencies that occur when the finger moves slowly or the texture period is large, as the actuator cannot render a pure sine wave signal below approximately \qty{20}{\Hz} with sufficient amplitude to be perceived.
%
The vibrotactile texture is described and rendered in this chapter as a 1D signal by integrating the finger movement relative to the texture along a single direction, but it is easily extended to a two-dimensional texture by simply generating a second signal for the orthogonal direction and summing the two signals in the rendering \cite{girard2016haptip}.

\fig[0.68]{phase_adjustment}{
Change in frequency of a sinusoidal signal with and without phase matching.
}[
Phase matching ensures a continuity and avoids glitches in the rendering of the signal.
]

\section{System Latency}
\label{latency}

As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, \ThreeD rendering and audio generation.
Because the chosen \AR headset is a standalone device (like most current \AR/\VR headsets) and cannot directly control the sound card and haptic actuator, the image capture, pose estimation and audio signal generation steps are performed on an external computer.
%
All computation steps run in separate threads to parallelize them and reduce latency, and are synchronized with the headset via a local network and the ZeroMQ library\footnoteurl{https://zeromq.org/}.
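The synchronization might look as follows with ZeroMQ's publish/subscribe pattern, sketched here in Python with pyzmq for readability (the actual headset client runs inside the Unity application); the port and message format are illustrative.
\begin{verbatim}
import zmq  # pyzmq

context = zmq.Context()

# Computer side: publish the latest filtered poses to the headset at 60 Hz.
publisher = context.socket(zmq.PUB)
publisher.bind("tcp://*:5555")

def publish_poses(poses):
    """poses: dict mapping marker names to 4x4 pose matrices (nested lists)."""
    publisher.send_json({"topic": "poses", "data": poses})

# Headset side (conceptually): receive and apply the poses to the virtual models.
subscriber = context.socket(zmq.SUB)
subscriber.connect("tcp://localhost:5555")   # replace with the computer address
subscriber.setsockopt_string(zmq.SUBSCRIBE, "")
\end{verbatim}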
This complex assembly inevitably introduces latency, which must be measured.

The rendering system provides a user with two interaction loops between the movements of their hand and the visual (loop 1) and haptic (loop 2) feedback.
%
Measures are shown as mean $\pm$ standard deviation, when known.
%
The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
%
Both result from the latency in image capture (\qty{16 \pm 1}{\ms}), marker tracking (\qty{2 \pm 1}{\ms}) and network communication (\qty{4 \pm 1}{\ms}).
%
The haptic loop also includes the voice-coil latency (\qty{15}{\ms}, as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency of the \ThreeD rendering (\qty{16 \pm 5}{\ms}, 60 frames per second) and of the display (\qty{5}{\ms}).
%
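These components sum to \qty{37}{\ms} for the haptic loop and \qty{43}{\ms} for the visual loop, consistent with the measured end-to-end latencies.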
%
The total haptic latency is below the \qty{60}{\ms} threshold for detecting delays in vibrotactile feedback \cite{okamoto2009detectability}.
%
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based tracking \cite{knorlein2009influence}.

The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
%
With respect to the real hand position, this lag causes a distance error in the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal.
%
This error is proportional to the speed of the finger (lag multiplied by speed), \eg the distance error is \qty{12 \pm 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
%
%and of the vibrotactile signal frequency with respect to the finger speed.%, that is proportional to the speed of the finger.