%
In this section, we describe a system for rendering vibrotactile roughness textures in real time, on any tangible surface, touched directly with the index fingertip, with no constraints on hand movement, using a simple camera to track the finger pose.
%
We also describe how to pair this tactile rendering with an immersive \AR or \VR headset visual display to provide a coherent, multimodal visuo-haptic augmentation of the \RE.
\section{Principle}
\label{principle}
The visuo-haptic texture rendering system is based on
%
\begin{enumerate*}[label=(\arabic*)]
\item a real-time interaction loop between the finger movements and coherent visuo-haptic feedback simulating the sensation of a touched texture,
\item a precise alignment of the \VE with its real counterpart, and
\item a modulation of the signal frequency by the estimated finger speed with phase matching.
\end{enumerate*}
%
\figref{diagram} shows the interaction loop diagram and \eqref{signal} the definition of the vibrotactile signal.
%
The system consists of three main components: the pose estimation of the tracked real elements, the visual rendering of the \VE, and the vibrotactile signal generation and rendering.
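For illustration, assuming a sinusoidal texture profile of amplitude $A$ and spatial period $\lambda$, such a speed-modulated and phase-matched signal can be written at sample $k$, with sampling period $\Delta t$, as
%
\begin{equation*}
s_k = A \sin(\phi_k), \qquad \phi_k = \phi_{k-1} + 2\pi \, \frac{{}^t\hat{\dot{X}}_f}{\lambda} \, \Delta t,
\end{equation*}
%
where the instantaneous frequency ${}^t\hat{\dot{X}}_f / \lambda$ follows the estimated finger speed and the accumulated phase $\phi_k$ keeps the signal continuous when the speed changes; the exact definition used by the system is the one given in \eqref{signal}.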
\figwide[1]{diagram}{Diagram of the visuo-haptic texture rendering system. }[
Fiducial markers attached to the voice-coil actuator and to the tangible surfaces to be tracked are captured by a camera.
The positions and rotations (the poses) ${}^c\mathbf{T}_i$, $i = 1, \dots, n$, of the $n$ defined markers in the camera frame $\mathcal{F}_c$ are estimated, then filtered with an adaptive low-pass filter.
%These poses are transformed to the \AR/\VR headset frame $\mathcal{F}_h$ and applied to the virtual model replicas to display them superimposed and aligned with the \RE.
These poses are used to move and display the virtual model replicas aligned with the \RE.
A collision detection algorithm detects a contact of the virtual hand with the virtual textures.
If so, the velocity of the finger marker ${}^c\dot{\mathbf{X}}_f$ is estimated using a discrete derivative of the position and adaptive low-pass filtering, then transformed into the texture frame $\mathcal{F}_t$.
The vibrotactile signal $s_k$ is generated by modulating the (scalar) finger velocity ${}^t\hat{\dot{X}}_f$ in the texture direction with the texture period $\lambda$ (\eqref{signal}).
\begin{subfigs}{setup}{Visuo-haptic texture rendering system setup. }[][
\item HapCoil-One voice-coil actuator, with a fiducial marker on top, attached to a participant's right index finger.
\item HoloLens~2 \AR headset, the two cardboard masks used to switch between the real and virtual environments with the same \FoV, and the \ThreeD-printed piece for attaching the masks to the headset.
\item User exploring a virtual vibrotactile texture on a tangible sheet of paper.
]
\subfig[0.325]{device}
\subfig[0.65]{headset}
\par\vspace{2.5pt}
\subfig[0.992]{apparatus}
\end{subfigs}
A fiducial marker (AprilTag) is glued to the top of the actuator (\figref{device}) to track the finger pose with a camera (StreamCam, Logitech), which is placed above the experimental setup and captures \qtyproduct{1280 x 720}{px} images at \qty{60}{\hertz} (\figref{apparatus}).
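The tracking library is not prescribed by the method; as an illustration only, the marker detection and pose estimation step could be sketched as follows, assuming OpenCV's ArUco module (which provides AprilTag dictionaries), a calibrated camera (matrix \texttt{K}, distortion \texttt{dist}), and a placeholder marker size:
%
\begin{verbatim}
# Illustrative sketch: detect AprilTag markers and estimate their poses
# in the camera frame F_c (assumes OpenCV >= 4.7 and a calibrated camera).
import cv2
import numpy as np

TAG_SIZE = 0.02  # marker edge length in metres (placeholder value)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

# 3D corners of a square marker expressed in its own frame.
obj_pts = (TAG_SIZE / 2) * np.array(
    [[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]], dtype=np.float32)

def estimate_poses(gray, K, dist):
    """Return {marker_id: (rvec, tvec)}, the pose of each detected marker."""
    corners, ids, _ = detector.detectMarkers(gray)
    poses = {}
    if ids is None:
        return poses
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        ok, rvec, tvec = cv2.solvePnP(
            obj_pts, marker_corners.reshape(4, 2), K, dist,
            flags=cv2.SOLVEPNP_IPPE_SQUARE)
        if ok:
            poses[int(marker_id)] = (rvec, tvec)
    return poses
\end{verbatim}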
%
The velocity (without angular velocity) of the marker, denoted ${}^c\dot{\mathbf{X}}_i$, is estimated using the discrete derivative of the position and another 1€ filter with the same parameters.
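For reference, a compact sketch of a 1€ filter applied to one scalar component (of a pose or of a velocity estimate) is given below; the parameter values are placeholders, not the tuned values used in the system:
%
\begin{verbatim}
# Illustrative sketch of the 1-euro adaptive low-pass filter:
# the cutoff frequency increases with the (filtered) rate of change,
# trading jitter reduction at low speed for low lag at high speed.
import math

def smoothing_factor(cutoff, dt):
    r = 2 * math.pi * cutoff * dt
    return r / (r + 1)

class OneEuroFilter:
    def __init__(self, min_cutoff=1.0, beta=0.01, d_cutoff=1.0):
        self.min_cutoff, self.beta, self.d_cutoff = min_cutoff, beta, d_cutoff
        self.x_prev = None
        self.dx_prev = 0.0

    def __call__(self, x, dt):
        if self.x_prev is None:
            self.x_prev = x
            return x
        dx = (x - self.x_prev) / dt              # discrete derivative
        a_d = smoothing_factor(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = smoothing_factor(cutoff, dt)         # speed-adaptive cutoff
        x_hat = a * x + (1 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
\end{verbatim}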

To compare virtual and augmented realities, we then create a \VE that closely replicates the real one.
%Before a user interacts with the system, it is necessary to design a virtual environment that will be registered with the \RE during the experiment.
%
Each real element tracked by a marker is modelled virtually, \ie the hand and the augmented tangible surface (\figref{renderings}).
%
During the experiment, the system uses marker pose estimates to align the virtual models with their real-world counterparts. %, according to the condition being tested.
%
This makes it possible to detect whether a finger touches a virtual texture, using a collision detection algorithm (Nvidia PhysX), and to show the virtual elements and textures in real time, aligned with the \RE (\figref{renderings}), on the considered \AR or \VR headset.
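Setting the physics engine aside, the alignment and contact test conceptually amount to expressing the fingertip in the texture frame and checking that it lies on the texture patch; the sketch below is illustrative only (placeholder patch dimensions), since the actual system delegates this test to Nvidia PhysX:
%
\begin{verbatim}
# Conceptual sketch: is the fingertip on a rectangular texture patch?
# Poses are 4x4 homogeneous matrices expressed in the camera frame F_c
# (e.g. built from the rvec/tvec of the tracking step).
import numpy as np

def to_homogeneous(R, t):
    """Build a 4x4 pose matrix from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def finger_on_texture(T_c_finger, T_c_texture, half_extents, thickness=0.005):
    """True if the finger frame origin lies within the texture patch volume."""
    T_t_finger = np.linalg.inv(T_c_texture) @ T_c_finger
    p = T_t_finger[:3, 3]  # fingertip position in the texture frame F_t
    return (abs(p[0]) <= half_extents[0]
            and abs(p[1]) <= half_extents[1]
            and abs(p[2]) <= thickness)
\end{verbatim}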
In our implementation, the virtual hand and environment are designed with Unity and the Mixed Reality Toolkit (MRTK).
%
The visual rendering is achieved using the Microsoft HoloLens~2, an \OST-\AR headset with a \qtyproduct{43 x 29}{\degree} \FoV, a \qty{60}{\Hz} refresh rate, and self-localisation capabilities.
%
It was chosen over \VST-\AR because \OST-\AR only adds virtual content to the \RE, while \VST-\AR streams a real-time video capture of the \RE \cite{macedo2023occlusion}.
%
Indeed, one of our objectives (\secref{experiment}) is to directly compare the \RE with a \VE that replicates it, rather than with a video feed that introduces many supplementary visual limitations \cite{kim2018revisiting,macedo2023occlusion}.
%
To simulate a \VR headset, a cardboard mask (with holes for sensors) is attached to the headset to block the view of the \RE (\figref{headset}).
\section{Vibrotactile Signal Generation and Rendering}
\label{texture_generation}
A voice-coil actuator (HapCoil-One, Actronika) is used to display the vibrotactile signal, as it allows the frequency and amplitude of the signal to be controlled independently over time, covers a wide frequency range (\qtyrange{10}{1000}{\Hz}), and outputs the signal accurately with relatively low acceleration distortion\footnote{HapCoil-One specific characteristics are described in its data sheet: \url{https://web.archive.org/web/20240228161416/https://tactilelabs.com/wp-content/uploads/2023/11/HapCoil_One_datasheet.pdf}}.
%
The voice-coil actuator is encased in a \ThreeD-printed plastic shell and firmly attached to the middle phalanx of the user's index finger with a Velcro strap, to enable the fingertip to directly touch the environment (\figref{device}).
%
The actuator is driven by a class D audio amplifier (XY-502 / TPA3116D2, Texas Instruments). %, which has proven to be an effective type of amplifier for driving moving-coil \cite{mcmahan2014dynamic}.
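As an illustration of how such a signal can reach the actuator through the sound card, the sketch below streams a phase-continuous, speed-modulated sine using the \texttt{python-sounddevice} package; the sample rate, gain and texture period are placeholder values, and the signal actually rendered by the system is the one defined in \eqref{signal}:
%
\begin{verbatim}
# Illustrative sketch: synthesise the drive signal in an audio callback.
import numpy as np
import sounddevice as sd

FS = 48000          # audio sample rate (Hz), placeholder
LAMBDA = 0.002      # texture spatial period (m), placeholder
GAIN = 0.5          # output amplitude, placeholder
state = {"phase": 0.0, "speed": 0.0, "in_contact": False}  # set by tracking

def callback(outdata, frames, time, status):
    freq = state["speed"] / LAMBDA if state["in_contact"] else 0.0
    phases = state["phase"] + 2 * np.pi * freq * np.arange(frames) / FS
    outdata[:, 0] = GAIN * np.sin(phases) if freq > 0 else 0.0
    # Keep the phase for the next block so the signal stays continuous.
    state["phase"] = (phases[-1] + 2 * np.pi * freq / FS) % (2 * np.pi)

stream = sd.OutputStream(samplerate=FS, channels=1, callback=callback)
stream.start()  # the callback then runs in a background audio thread
\end{verbatim}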
\section{System Latency}
\label{latency}
%As shown in \figref{diagram} and described above, the system includes various haptic and visual sensors and rendering devices linked by software processes for image processing, \ThreeD rendering and audio generation.
%
Because the chosen \AR headset is a standalone device (like most current \AR/\VR headsets) and cannot directly control the sound card and haptic actuator, the image capture, pose estimation and audio signal generation steps are performed on an external computer.
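The pose estimates therefore have to be sent from the computer to the headset over the network; the transport and message format below (JSON over UDP, with a placeholder address) are illustrative assumptions, not the protocol actually used by the system:
%
\begin{verbatim}
# Illustrative sketch: stream the filtered marker poses to the headset.
import json
import socket
import time

HEADSET_ADDR = ("192.168.1.42", 9000)  # placeholder address and port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_poses(poses):
    """poses: {marker_id: (rvec, tvec)} as produced by the tracking step."""
    message = {
        "t": time.time(),
        "poses": {str(i): {"rvec": r.ravel().tolist(),
                           "tvec": t.ravel().tolist()}
                  for i, (r, t) in poses.items()},
    }
    sock.sendto(json.dumps(message).encode("utf-8"), HEADSET_ADDR)
\end{verbatim}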
%
The rendering system provides a user with two interaction loops between the finger movements and the rendered feedback: a haptic loop and a visual loop.
%
Measurements are reported as mean $\pm$ standard deviation (when known).
%
The end-to-end latency from finger movement to feedback is measured at \qty{36 \pm 4}{\ms} in the haptic loop and \qty{43 \pm 9}{\ms} in the visual loop.
%
Both include the latency of image capture \qty{16 \pm 1}{\ms}, marker tracking \qty{2 \pm 1}{\ms}, and network communication \qty{4 \pm 1}{\ms}.
%
The haptic loop also includes the voice-coil latency \qty{15}{\ms} (as specified by the manufacturer\footnotemark[1]), whereas the visual loop includes the latency in \ThreeD rendering \qty{16 \pm 5}{\ms} (60 frames per second) and display \qty{5}{\ms}.
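As a consistency check, these components sum to approximately the measured end-to-end values:
%
\begin{align*}
\text{haptic loop:} \quad & \qty{16}{\ms} + \qty{2}{\ms} + \qty{4}{\ms} + \qty{15}{\ms} = \qty{37}{\ms},\\
\text{visual loop:} \quad & \qty{16}{\ms} + \qty{2}{\ms} + \qty{4}{\ms} + \qty{16}{\ms} + \qty{5}{\ms} = \qty{43}{\ms}.
\end{align*}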
%
The total haptic latency is below the \qty{60}{\ms} detection threshold in vibrotactile feedback \cite{okamoto2009detectability}.
%
The total visual latency can be considered slightly high, yet it is typical for an \AR rendering involving vision-based tracking \cite{knorlein2009influence}.

The two filters also introduce a constant lag between the finger movement and the estimated position and velocity, measured at \qty{160 \pm 30}{\ms}.
%
This lag causes a distance error between the real hand position and the displayed virtual hand position, and thus a delay in the triggering of the vibrotactile signal.
%
This error is proportional to the finger speed, \eg the distance error is \qty{12 \pm 2.3}{\mm} when the finger moves at \qty{75}{\mm\per\second}.
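This value follows directly from the measured lag: writing $t_{\text{lag}}$ for the filter lag and $v$ for the finger speed, the expected error is
%
\begin{equation*}
e \approx t_{\text{lag}} \, v = \qty{0.160}{\s} \times \qty{75}{\mm\per\second} = \qty{12}{\mm}.
\end{equation*}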
%
%and of the vibrotactile signal frequency with respect to the finger speed.%, that is proportional to the speed of the finger.