\section{Results}
\label{results}

\subsection{Trial Measures}
\label{results_trials}
All measures from trials were analyzed using \LMM or \GLMM with \factor{Visual Rendering}, \factor{Amplitude Difference} and their interaction as within-participant factors, and by-participant random intercepts.
Depending on the data, different random effect structures were tested.
Only the converging models with the lowest Akaike Information Criterion (AIC) are reported.
Post-hoc pairwise comparisons were performed using Tukey's \HSD test.
Each estimate is reported with its \percent{95} \CI as follows: \ci{\textrm{lower limit}}{\textrm{upper limit}}.
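In generic notation (ours, not necessarily the exact model specification used in the analysis scripts), the structure of these models for a response $y_{ij}$ of participant $j$ in trial $i$ can be written as
\[
g\!\left(\mathrm{E}[y_{ij} \mid u_j]\right) = \beta_0 + \beta_1\,\mathit{VR}_{ij} + \beta_2\,\mathit{AD}_{ij} + \beta_3\,(\mathit{VR}_{ij} \times \mathit{AD}_{ij}) + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\]
where $g$ is the identity link for a \LMM and the probit link for the \GLMM, $\mathit{VR}$ and $\mathit{AD}$ encode \factor{Visual Rendering} and \factor{Amplitude Difference}, and $u_j$ is the by-participant random intercept (random slopes, when tested, add participant-specific coefficients for \factor{Visual Rendering}).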
\subsubsection{Discrimination Accuracy}
\label{discrimination_accuracy}
A \GLMM was fitted to the \response{Texture Choice} in the \TIFC vibrotactile texture roughness discrimination task, with by-participant random intercepts but no random slopes, and a probit link function (\figref{results/trial_predictions}).
The \PSEs (\figref{results/trial_pses}) and \JNDs (\figref{results/trial_jnds}) for each visual rendering and their respective differences were estimated from the model, along with their corresponding \percent{95} \CI, using a non-parametric bootstrap procedure (1000 samples).
The \PSE represents the estimated amplitude difference at which the comparison texture was perceived as rougher than the reference texture \percent{50} of the time, \ie it is the accuracy of participants in discriminating vibrotactile roughness.
The \level{Real} rendering had the highest \PSE (\percent{7.9} \ci{1.2}{4.1}) and was statistically significantly different from the \level{Mixed} rendering (\percent{1.9} \ci{-2.4}{6.1}) and from the \level{Virtual} rendering (\percent{5.1} \ci{2.4}{7.6}).
The \JND represents the estimated minimum amplitude difference between the comparison and reference textures that participants could perceive, \ie the sensitivity to vibrotactile roughness differences, calculated at the 84th percentile of the \GLMM predictions (\ie one standard deviation of the underlying normal distribution) \cite{ernst2002humans}.
The \level{Real} rendering had the lowest \JND (\percent{26} \ci{23}{29}), the \level{Mixed} rendering had the highest (\percent{33} \ci{30}{37}), and the \level{Virtual} rendering was in between (\percent{30} \ci{28}{32}).
All pairwise differences were statistically significant.
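For reference, the \PSE and \JND follow directly from the probit parameterization of the psychometric function (a generic reformulation in our notation, not the exact one used in the analysis): writing
\[
P(\textrm{comparison perceived rougher}) = \Phi(\alpha + \beta\,\Delta),
\]
where $\Phi$ is the standard normal cumulative distribution function, $\Delta$ the amplitude difference, and $\alpha$, $\beta$ the intercept and slope for a given visual rendering, the \PSE is the difference at which $P = 0.5$, \ie $\mathit{PSE} = -\alpha/\beta$, and the \JND is the additional difference needed to reach the 84th percentile, \ie $\mathit{JND} = \Phi^{-1}(0.84)/\beta \approx 1/\beta$.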
\fig[0.7]{results/trial_predictions}{Proportion of trials in which the comparison texture was perceived as rougher than the reference texture, as a function of the amplitude difference between the two textures and the visual rendering.}[
Curves represent predictions from the \GLMM (probit link function), and points are estimated marginal means with non-parametric bootstrap \percent{95} \CIs.
]
\begin{subfigs}{discrimination_accuracy}{Results of the vibrotactile texture roughness discrimination task.}[][
\item Estimated \PSE of each visual rendering, defined as the amplitude difference at which both reference and comparison textures are perceived to be equivalent. %, \ie the accuracy in discriminating vibrotactile roughness.
\item Estimated \JND of each visual rendering. %, defined as the minimum perceptual amplitude difference, \ie the sensitivity to vibrotactile roughness differences.
]
\subfig[0.35]{results/trial_pses}
\subfig[0.35]{results/trial_jnds}
\end{subfigs}
\subsubsection{Response Time}
\label{response_time}
A \LMM \ANOVA with by-participant random slopes for \factor{Visual Rendering}, and a log transformation (as \response{Response Time} measures were gamma distributed), indicated a statistically significant effect of \factor{Visual Rendering} on \response{Response Time} (\anova{2}{18}{6.2}, \p{0.009}, \figref{results/trial_response_times}).
Reported response times are \GM.
Participants took longer on average to respond with the \level{Virtual} rendering (\geomean{1.65}{\s} \ci{1.59}{1.72}) than with the \level{Real} rendering (\geomean{1.38}{\s} \ci{1.32}{1.43}), which was the only statistically significant pairwise difference (\ttest{19}{0.3}, \p{0.005}).
The \level{Mixed} rendering was in between (\geomean{1.56}{\s} \ci{1.49}{1.63}).
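As an illustration only (a sketch under an assumed data layout, not the authors' analysis code), such a log-transformed mixed model and the back-transformed geometric means could be obtained in Python with \texttt{statsmodels}, assuming a hypothetical long-format file \texttt{trials.csv} with columns \texttt{rt}, \texttt{rendering} and \texttt{participant}:
\begin{verbatim}
# Illustrative sketch, not the analysis code used in the paper.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trials.csv")  # hypothetical: one row per trial

# LMM on log response times, by-participant random slopes for rendering.
fit = smf.mixedlm("np.log(rt) ~ C(rendering)", data=df,
                  groups=df["participant"],
                  re_formula="~C(rendering)").fit()
print(fit.summary())

# Geometric mean per rendering: exponentiate the mean of the log values.
print(df.groupby("rendering")["rt"].apply(lambda x: np.exp(np.log(x).mean())))
\end{verbatim}
Because the model operates on the log scale, back-transformed estimates and comparisons are multiplicative, which is why geometric rather than arithmetic means are reported.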
\subsubsection{Finger Position and Speed}
\label{finger_position_speed}
The frames analyzed were those in which the participants actively touched the comparison textures with a finger speed greater than \SI{1}{\mm\per\second}.
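For illustration (a sketch under an assumed data layout, not the authors' processing code, and omitting the contact-detection step), this selection and the per-trial distance and speed summaries could be computed as follows, assuming a hypothetical per-frame file \texttt{frames.csv} with columns \texttt{trial}, \texttt{t} (seconds) and \texttt{x}, \texttt{y} (finger position in cm):
\begin{verbatim}
# Illustrative sketch, not the processing code used in the paper.
import numpy as np
import pandas as pd

frames = pd.read_csv("frames.csv").sort_values(["trial", "t"])

g = frames.groupby("trial")
step = np.hypot(g["x"].diff(), g["y"].diff())  # cm moved between frames
speed = step / g["t"].diff()                   # cm/s
frames = frames.assign(step=step, speed=speed)

# Keep frames where the finger is actively moving (> 1 mm/s = 0.1 cm/s).
active = frames[frames["speed"] > 0.1]

per_trial = active.groupby("trial").agg(
    distance=("step", "sum"),  # total distance traveled in the trial (cm)
    speed=("speed", "mean"),   # mean finger speed in the trial (cm/s)
)
print(per_trial)
\end{verbatim}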
A \LMM \ANOVA with by-participant random slopes for \factor{Visual Rendering} indicated that \factor{Visual Rendering} had the only statistically significant effect on the total distance traveled by the finger in a trial (\anova{2}{18}{3.9}, \p{0.04}, \figref{results/trial_distances}).
On average, participants explored a larger distance with the \level{Real} rendering (\geomean{20.0}{\cm} \ci{19.4}{20.7}) than with the \level{Virtual} rendering (\geomean{16.5}{\cm} \ci{15.8}{17.1}), which was the only statistically significant pairwise difference (\ttest{19}{1.2}, \p{0.03}), with the \level{Mixed} rendering (\geomean{17.4}{\cm} \ci{16.8}{18.0}) in between.

Another \LMM \ANOVA with by-trial and by-participant random intercepts but no random slopes indicated that \factor{Visual Rendering} had the only statistically significant effect on \response{Finger Speed} (\anova{2}{2142}{2.0}, \pinf{0.001}, \figref{results/trial_speeds}).
On average, the textures were explored with the highest speed with the \level{Real} rendering (\geomean{5.12}{\cm\per\second} \ci{5.08}{5.17}), the lowest with the \level{Virtual} rendering (\geomean{4.40}{\cm\per\second} \ci{4.35}{4.45}), and the \level{Mixed} rendering (\geomean{4.67}{\cm\per\second} \ci{4.63}{4.71}) in between.
All pairwise differences were statistically significant: \level{Real} \vs \level{Virtual} (\ttest{19}{1.17}, \pinf{0.001}), \level{Real} \vs \level{Mixed} (\ttest{19}{1.10}, \pinf{0.001}), and \level{Mixed} \vs \level{Virtual} (\ttest{19}{1.07}, \p{0.02}).

This means that, within the same time window on the same surface, participants explored the comparison texture on average over a greater distance and at a higher speed when in the \RE without a visual representation of the hand (\level{Real} condition) than when in \VR (\level{Virtual} condition).
\begin{subfigs}{results_finger}{Results of the performance measures for the visual rendering conditions.}[
Boxplots and geometric means with bootstrap \percent{95} \CIs, with Tukey's \HSD pairwise comparisons: * is \pinf{0.05}, ** is \pinf{0.01} and *** is \pinf{0.001}.
][
\item Response time at the end of a trial.
\item Distance traveled by the finger in a trial.
\item Speed of the finger in a trial.
]
\subfig[0.25]{results/trial_response_times}
\subfig[0.25]{results/trial_distances}
\subfig[0.25]{results/trial_speeds}
\end{subfigs}
\subsection{Questionnaires}
\label{results_questions}
%\figref{results/question_heatmaps} shows the median and interquartile range (IQR) ratings to the questions in \tabref{questions} and to the NASA-TLX questionnaire.
%
Friedman tests were employed to compare the ratings to the questions (\tabref{questions1} and \tabref{questions2}), with post-hoc Wilcoxon signed-rank tests and Holm-Bonferroni adjustment, except for the questions regarding the virtual hand, which were directly compared with Wilcoxon signed-rank tests.
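An illustrative sketch of this non-parametric pipeline (assuming a hypothetical file \texttt{ratings.csv} with one row per participant and rendering for a given question; not the authors' analysis code) is:
\begin{verbatim}
# Illustrative sketch, not the analysis code used in the paper.
from itertools import combinations
import pandas as pd
from scipy.stats import friedmanchisquare, wilcoxon
from statsmodels.stats.multitest import multipletests

wide = (pd.read_csv("ratings.csv")
          .pivot(index="participant", columns="rendering", values="rating"))

# Omnibus Friedman test across the renderings.
stat, p = friedmanchisquare(*[wide[c] for c in wide.columns])
print(f"Friedman: chi2 = {stat:.2f}, p = {p:.4f}")

# Post-hoc pairwise Wilcoxon signed-rank tests with Holm adjustment.
pairs = list(combinations(wide.columns, 2))
raw = [wilcoxon(wide[a], wide[b]).pvalue for a, b in pairs]
adjusted = multipletests(raw, method="holm")[1]
for (a, b), p_adj in zip(pairs, adjusted):
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}")
\end{verbatim}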
\figref{results_questions} shows these ratings for questions where statistically significant differences were found (results are shown as mean $\pm$ standard deviation):
\begin{itemize}
\item \response{Hand Ownership}: participants slightly felt the virtual hand as their own with the \level{Mixed} rendering (\num{2.3 \pm 1.0}) but quite so with the \level{Virtual} rendering (\num{3.5 \pm 0.9}, \pinf{0.001}).
\item \response{Hand Latency}: the virtual hand was found to have a moderate latency with the \level{Mixed} rendering (\num{2.8 \pm 1.2}) but a low one with the \level{Virtual} rendering (\num{1.9 \pm 0.7}, \pinf{0.001}).
\item \response{Hand Reference}: participants focused slightly more on their own hand with the \level{Mixed} rendering (\num{3.2 \pm 2.0}) but slightly more on the virtual hand with the \level{Virtual} rendering (\num{5.3 \pm 2.1}, \pinf{0.001}).
\item \response{Hand Distraction}: the virtual hand was slightly distracting with the \level{Mixed} rendering (\num{2.1 \pm 1.1}) but not at all with the \level{Virtual} rendering (\num{1.2 \pm 0.4}, \p{0.004}).
\end{itemize}
\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for the virtual hand renderings.}[
Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: * is \pinf{0.05}, ** is \pinf{0.01} and *** is \pinf{0.001}.
][
\item Hand ownership.
\item Hand latency.
\item Hand reference.
\item Hand distraction.
]
\subfig[0.18]{results/questions_hand_ownership}
\subfig[0.18]{results/questions_hand_latency}
\subfig[0.18]{results/questions_hand_reference}
\subfig[0.18]{results/questions_hand_distraction}
\end{subfigs}
Overall, participants reported a very high sense of control over the virtual hand (\response{Hand Agency}, \num{4.4 \pm 0.6}), felt that the virtual hand was quite similar to their own hand (\response{Hand Similarity}, \num{3.5 \pm 0.9}), and found the \VE very realistic (\response{Virtual Realism}, \num{4.2 \pm 0.7}) and very similar to the real one (\response{Virtual Similarity}, \num{4.5 \pm 0.7}).
The overall workload (mean NASA-TLX score) was low (\num{21 \pm 14}), with no statistically significant differences found between the visual renderings for any of the subscales or the overall score.

The textures were also overall found to be very much caused by the finger movements (\response{Texture Agency}, \num{4.5 \pm 1.0}), with a very low perceived latency (\response{Texture Latency}, \num{1.6 \pm 0.8}), and to be quite realistic (\response{Texture Realism}, \num{3.6 \pm 0.9}) and quite plausible (\response{Texture Plausibility}, \num{3.6 \pm 1.0}).
The vibrations were felt to be slightly weak overall (\response{Vibration Strength}, \num{4.2 \pm 1.1}), and the vibrotactile device was perceived as neither distracting (\response{Device Distraction}, \num{1.2 \pm 0.4}) nor uncomfortable (\response{Device Discomfort}, \num{1.3 \pm 0.6}).

Participants were split between feeling the vibrations on the surface and on the top of their finger (\response{Vibration Location}, \num{3.9 \pm 1.7}): with the \level{Real} and \level{Mixed} renderings, the scores were evenly distributed between the two poles of the scale (\percent{42.5} more on the surface or on the finger top, \percent{15} neutral), whereas with the \level{Virtual} rendering there was a trend towards the top of the finger (\percent{65} \vs \percent{25} more on the surface, \percent{10} neutral); this difference was not statistically significant either.
\begin{tab}{questions2}
{NASA-TLX questions asked to participants after each \factor{Visual Rendering} block of trials.}
[
Questions were bipolar 100-point scales (0~=~Very Low and 100~=~Very High, except for Performance where 0~=~Perfect and 100~=~Failure), with increments of 5.
%Participants were shown only the labels for all questions.
]
\begin{tabularx}{\linewidth}{l X}
\toprule
\textbf{Code} & \textbf{Question} \\
\midrule
Mental Demand & How mentally demanding was the task? \\
Temporal Demand & How hurried or rushed was the pace of the task? \\
Physical Demand & How physically demanding was the task? \\
Performance & How successful were you in accomplishing what you were asked to do? \\
Effort & How hard did you have to work to accomplish your level of performance? \\
Frustration & How insecure, discouraged, irritated, stressed, and annoyed were you? \\
\bottomrule
\end{tabularx}
\end{tab}
%\figwide{results/question_heatmaps}{%
%
% Heatmaps of the questionnaire responses, with the median rating and the interquartile range in brackets on each cell.
%
% (Left) 5-point Likert scale questions (1=Not at all, 2=Slightly, 3=Moderately, 4=Very, 5=Extremely).
%
% (Middle) 7-point Likert scale questions (1=Extremely A, 2=Moderately A, 3=Slightly A, 4=Neither A nor B, 5=Slightly B, 6=Moderately B, 7=Extremely B) with A and B being the two poles of the scale.
%
% (Right) NASA Task Load Index (NASA-TLX) questionnaire (lower values are better).
%}