Correct vhar system equations
@@ -11,12 +11,12 @@ However, this method has not yet been integrated in an \AR context, where the us
In this chapter, we propose a \textbf{system for rendering visual and haptic virtual textures that augment real surfaces}.
It is implemented with an immersive \OST-\AR headset Microsoft HoloLens~2 and a wearable vibrotactile (voice-coil) device worn on the outside of the finger (not covering the fingertip, \secref[related_work]{vhar_haptics}).
The visuo-haptic augmentations can be \textbf{viewed from any angle} and \textbf{explored freely with the bare finger}, as if they were real textures.
To ensure both real-time and reliable rendering, the hand and the real surfaces are tracked using a webcam and marker-based pose estimation.
The haptic textures are rendered as a vibrotactile signal representing a patterned grating texture, synchronized with the finger movement on the augmented surface.

\noindentskip The contributions of this chapter are:
\begin{itemize}
\item The real-time rendering of virtual vibrotactile roughness textures, representing a patterned grating texture, from free finger movements using vision-based finger pose estimation.
\item A system providing coherent visuo-haptic texture augmentations of the \RE in a direct touch context, using an immersive \AR headset and wearable haptics.
\end{itemize}
@@ -26,7 +26,7 @@ The haptic textures are rendered as a vibrotactile signal representing a pattern
\begin{subfigs}{setup}{Visuo-haptic texture rendering system setup.}[][
\item HapCoil-One voice-coil actuator, with a fiducial marker on top, attached to the middle phalanx of the user's index finger.
\item Our implementation of the system using a Microsoft HoloLens~2, a webcam for estimating the poses of the hand and the real surfaces, and an external computer for processing the tracking data and rendering the haptic textures.
]
\subfigsheight{60mm}
\subfig{device}
@@ -19,11 +19,11 @@ The system consists of three main components: the pose estimation of the tracked
\figwide{diagram}{Diagram of the visuo-haptic texture rendering system.}[
Fiducial markers, attached to the voice-coil actuator and to the real surfaces to augment, are captured by a camera.
The positions and rotations (the poses) $\pose{c}{T}{i}$, $i=1..n$ of the $n$ defined markers in the camera frame $\poseFrame{c}$ are estimated, then filtered with an adaptive low-pass filter.
%These poses are transformed to the \AR/\VR headset frame $\poseFrame{h}$ and applied to the virtual model replicas to display them superimposed and aligned with the \RE.
These poses are used to move and display the virtual model replicas aligned with the \RE.
A collision detection algorithm detects contact of the virtual hand with the virtual textures.
If so, the velocity of the finger marker $\pose{c}{\dot{T}}{f}$ is estimated using a discrete derivative of the position and adaptive low-pass filtering, then transformed into the augmented surface frame $\poseFrame{s}$.
The vibrotactile signal $r$ is generated by modulating the filtered (scalar) finger velocity $x_f$ along the texture direction with the texture period $\lambda$ (\eqref{signal}).
The signal is sampled at \qty{48}{\kilo\hertz} and sent to the voice-coil actuator via an audio amplifier.
All computation steps except signal sampling are performed at \qty{60}{\hertz} and in separate threads to parallelize them.
@@ -37,14 +37,18 @@ The system consists of three main components: the pose estimation of the tracked
A \qty{2}{\cm} AprilTag fiducial marker \cite{wang2016apriltag} is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech), which is placed above the experimental setup and captures \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
Other markers are placed on the real surfaces to augment (\figref{setup}) in order to estimate the relative position of the finger with respect to the surfaces.
In contrast to similar work, using vision-based pose estimation both frees the hand movements and allows any real surface to be augmented.

A camera external to the \AR headset, combined with a marker-based technique, is employed to provide accurate and robust pose estimation with a constant view of the markers \cite{marchand2016pose}.
We denote $\pose{c}{T}{i}$, $i=1..n$ the homogeneous transformation matrix that defines the position and rotation of the $i$-th marker out of the $n$ defined markers in the camera frame $\poseFrame{c}$, \eg the finger pose $\pose{c}{T}{f}$ and the augmented surface pose $\pose{c}{T}{s}$.
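For illustration, a minimal sketch of this marker pose estimation step is given below; it assumes the pupil-apriltags Python bindings and OpenCV for image capture, with placeholder camera intrinsics, and does not correspond to the actual implementation of our system.
\begin{verbatim}
# Illustrative sketch: estimate the marker poses ^cT_i from a webcam stream.
# Assumes the pupil-apriltags bindings and OpenCV; intrinsics are placeholders.
import cv2
from pupil_apriltags import Detector

TAG_SIZE = 0.02                               # 2 cm AprilTag markers
CAMERA_PARAMS = (900.0, 900.0, 640.0, 360.0)  # hypothetical (fx, fy, cx, cy), px

detector = Detector(families="tag36h11")
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 60)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detections = detector.detect(gray, estimate_tag_pose=True,
                                 camera_params=CAMERA_PARAMS, tag_size=TAG_SIZE)
    for det in detections:
        # det.pose_R (3x3) and det.pose_t (3x1) give the pose of marker
        # det.tag_id in the camera frame, i.e. the finger or surface pose.
        print(det.tag_id, det.pose_t.ravel())
\end{verbatim}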
To reduce the noise in the pose estimation while maintaining good responsiveness, the 1€ filter \cite{casiez2012filter} is applied: a low-pass filter with an adaptive cut-off frequency, specifically designed for human motion tracking.
The filtered pose is denoted as $\pose{c}{\hat{T}}{i}$.
The optimal filter parameters were determined using the method of \textcite{casiez2012filter}, with a minimum cut-off frequency of \qty{10}{\hertz} and a slope of \num{0.01}.

The velocity (without angular velocity) of the finger marker, denoted as $\pose{c}{\dot{T}}{f}$, is estimated using a discrete derivative of the position.
It is then filtered with another 1€ filter with the same parameters, and denoted as $\pose{c}{\hat{\dot{T}}}{f}$.
Finally, this filtered finger velocity is transformed into the augmented surface frame $\poseFrame{s}$ to be used in the vibrotactile signal generation, such that $\pose{s}{\hat{\dot{T}}}{f} = {\pose{c}{T}{s}}^{-1} \, \pose{c}{\hat{\dot{T}}}{f}$.
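A compact scalar sketch of this filtering and velocity estimation step is given below (Python); it follows the published 1€ filter formulation with the parameters given above, applies it per position component, and derives the velocity from the filtered position, which is only one possible realization of our processing.
\begin{verbatim}
# Illustrative sketch: 1-euro filtering of one position component of the
# finger marker, followed by a discrete-derivative velocity estimate that is
# filtered with a second 1-euro filter using the same parameters.
import math

class OneEuroFilter:
    """Scalar 1-euro filter (Casiez et al., 2012): a low-pass filter whose
    cut-off frequency increases with the estimated speed of the signal."""
    def __init__(self, min_cutoff=10.0, beta=0.01, d_cutoff=1.0):
        self.min_cutoff, self.beta, self.d_cutoff = min_cutoff, beta, d_cutoff
        self.x_hat = self.dx_hat = self.t_prev = None

    @staticmethod
    def _alpha(cutoff, dt):
        r = 2.0 * math.pi * cutoff * dt
        return r / (r + 1.0)

    def __call__(self, x, t):
        if self.t_prev is None:
            self.x_hat, self.dx_hat, self.t_prev = x, 0.0, t
            return x
        dt = t - self.t_prev
        dx = (x - self.x_hat) / dt                      # raw derivative
        a_d = self._alpha(self.d_cutoff, dt)
        self.dx_hat = a_d * dx + (1.0 - a_d) * self.dx_hat
        cutoff = self.min_cutoff + self.beta * abs(self.dx_hat)
        a = self._alpha(cutoff, dt)
        self.x_hat = a * x + (1.0 - a) * self.x_hat     # adaptive low-pass
        self.t_prev = t
        return self.x_hat

pos_filter = OneEuroFilter(min_cutoff=10.0, beta=0.01)  # position component
vel_filter = OneEuroFilter(min_cutoff=10.0, beta=0.01)  # velocity component
p_prev = t_prev = None

def update(p, t):
    """Called at ~60 Hz with a position component p (m) at time t (s)."""
    global p_prev, t_prev
    p_hat, v_hat = pos_filter(p, t), 0.0
    if t_prev is not None:
        v_hat = vel_filter((p_hat - p_prev) / (t - t_prev), t)
    p_prev, t_prev = p_hat, t
    return p_hat, v_hat
\end{verbatim}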
\subsection{Virtual Environment Alignment}
\label{virtual_real_alignment}
@@ -68,30 +72,30 @@ The voice-coil actuator is encased in a \ThreeD printed plastic shell and firmly
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil actuators \cite{mcmahan2014dynamic}.
The amplifier is connected to the audio output of a computer that generates the signal using the WASAPI driver in exclusive mode and the NAudio library\footnoteurl{https://github.com/naudio/NAudio}.

The represented haptic texture is a 1D series of parallel virtual grooves and ridges, similar to the real linear grating textures manufactured for psychophysical roughness perception studies \secref[related_work]{roughness}. %\cite{friesen2024perceived,klatzky2003feeling,unger2011roughness}.
It is generated as a square wave audio signal $r$, sampled at \qty{48}{\kilo\hertz}, with a texture period $\lambda$ and an amplitude $A$, similar to \eqref[related_work]{grating_rendering}.
Its frequency is the ratio of the absolute filtered (scalar) finger velocity $x_f = \pose{s}{|\hat{\dot{T}}|}{f}$ to the texture period $\lambda$ \cite{friesen2024perceived}.
As the finger moves horizontally across the texture, only the $X$ component of the velocity is used.
This velocity modulation strategy is necessary because the finger position is estimated at a far lower rate (\qty{60}{\hertz}) than the audio signal is sampled (unlike with high-fidelity force-feedback devices \cite{unger2011roughness}).

%As the finger position is estimated at a far lower rate (\qty{60}{\hertz}), the filtered finger (scalar) position ${}^t\hat{X}_f$ in the texture frame $\poseFrame{t}$ cannot be directly used. % to render the signal if the finger moves fast or if the texture period is small.
%
%The best strategy instead is to modulate the frequency of the signal as a ratio of the filtered finger velocity ${}^t\hat{\dot{\mathbf{X}}}_f$ and the texture period $\lambda$ \cite{friesen2024perceived}.
%
When a new finger velocity $x_f\,(t_j)$ is estimated at time $t_j$, the phase $\phi$ of the signal $r$ also needs to be adjusted to ensure continuity of the signal.
In other words, the audio signal is sampled at \qty{48}{\kilo\hertz}, while its frequency and phase are updated at the far lower rate of \qty{60}{\hertz}, whenever a new finger velocity is estimated.
A sample $r(x_f, t_j, t_k)$ of the audio signal at sampling time $t_k$, with $t_k \geq t_j$, is thus given by:
\begin{subequations}
\label{eq:signal}
\begin{align}
r(x_f, t_j, t_k) & = A\, \text{sgn} ( \sin ( 2 \pi \frac{x_f\,(t_j)}{\lambda} t_k + \phi(t_j) ) ) & \label{eq:signal_speed} \\
\phi(t_j) & = \phi(t_{j-1}) + 2 \pi \frac{x_f\,(t_{j-1}) - x_f\,(t_j)}{\lambda} t_j & \label{eq:signal_phase}
\end{align}
\end{subequations}

This rendering preserves the sensation of a constant spatial frequency of the virtual texture while the finger moves at various speeds, which is crucial for the perception of roughness \cite{klatzky2003feeling,unger2011roughness}.
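For illustration, a minimal sketch of this signal generation is given below (Python). It uses per-sample phase accumulation, which is equivalent to the phase correction of \eqref{signal}; the amplitude and texture period values are placeholders, and in our system the generated samples are streamed through NAudio/WASAPI rather than collected in a list.
\begin{verbatim}
# Illustrative sketch: square-wave grating signal whose frequency is the
# finger speed divided by the texture period. Accumulating the phase per
# sample keeps the signal continuous when the speed is updated at ~60 Hz,
# equivalently to the phase correction of Eq. (signal).
import math

class GratingSignal:
    def __init__(self, wavelength=0.002, amplitude=0.5, fs=48_000):
        self.wavelength = wavelength   # texture period lambda (m), placeholder
        self.amplitude = amplitude     # amplitude A, placeholder
        self.fs = fs                   # audio sampling rate (Hz)
        self.speed = 0.0               # |x_f|: filtered finger speed (m/s)
        self.phase = 0.0

    def set_speed(self, x_f):
        # Called from the tracking loop when a new finger velocity is estimated.
        self.speed = abs(x_f)

    def next_block(self, n):
        """Generate the next n audio samples, e.g. for an audio callback."""
        out = []
        for _ in range(n):
            self.phase += 2.0 * math.pi * (self.speed / self.wavelength) / self.fs
            out.append(self.amplitude * math.copysign(1.0, math.sin(self.phase)))
        return out
\end{verbatim}
In this sketch, set_speed would be called at \qty{60}{\hertz} from the tracking thread, while next_block fills the \qty{48}{\kilo\hertz} audio buffer.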
%
%Note that the finger position and velocity are transformed from the camera frame $\poseFrame{c}$ to the texture frame $\poseFrame{t}$, with the $x$ axis aligned with the texture direction.
%
%However, when a new finger position is estimated at time $t_j$, the phase $\phi_j$ needs to be adjusted as well with the frequency to ensure a continuity in the signal as described in \eqref{signal_phase}.
%
@@ -119,10 +123,10 @@ This complex assembly inevitably introduces latency, which must be measured.
The rendering system provides the user with two interaction loops between the movements of their hand and the visual (loop 1) and haptic (loop 2) feedback.
Measurements are reported as mean $\pm$ standard deviation, when known.
The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
Both result from the latency of image capture \qty{16 \pm 1}{\ms}, marker pose estimation \qty{2 \pm 1}{\ms} and network communication \qty{4 \pm 1}{\ms}.
The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency of \ThreeD rendering \qty{16 \pm 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
The total haptic latency is below the \qty{60}{\ms} detection threshold in vibrotactile feedback \cite{okamoto2009detectability}.
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based pose estimation \cite{knorlein2009influence}.
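As a consistency check, summing the mean latencies of the individual stages reproduces the measured end-to-end values:
\begin{align*}
\text{haptic loop:}\quad & \qty{16}{\ms} + \qty{2}{\ms} + \qty{4}{\ms} + \qty{15}{\ms} = \qty{37}{\ms} \approx \qty{36}{\ms} \text{ measured,} \\
\text{visual loop:}\quad & \qty{16}{\ms} + \qty{2}{\ms} + \qty{4}{\ms} + \qty{16}{\ms} + \qty{5}{\ms} = \qty{43}{\ms} \text{ measured.}
\end{align*}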
The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
This lag causes a position error of the displayed virtual hand with respect to the real hand, and thus a delay in the triggering of the vibrotactile signal.
@@ -5,7 +5,7 @@
In this chapter, we designed and implemented a system for rendering virtual visuo-haptic textures that augment a real surface.
When the surface is touched directly with the fingertip, its perceived roughness can be increased using a wearable vibrotactile voice-coil device mounted on the middle phalanx of the finger.
We adapted the 1D sinusoidal grating rendering method, common in the literature but not yet integrated in a direct touch context, for use with vision-based pose estimation of the finger and paired it with an immersive \AR headset.

Our wearable visuo-haptic augmentation system enables any real surface to be augmented with a minimal setup.
It also allows free exploration of the textures, as if they were real (\secref[related_work]{ar_presence}), by letting the user view them from different viewpoints and touch them with the bare finger without constraints on hand movements.