\section{Results}
\label{results}

\subsection{Trial Measures}
\label{results_trials}

All measures from trials were analysed using linear mixed models (LMM) or generalised linear mixed models (GLMM) with \factor{Visual Rendering}, \factor{Amplitude Difference} and their interaction as within-participant factors, and by-participant random intercepts.
%
Depending on the data, different random effect structures were tested.
%
Only the best-fitting models that converged, \ie those with the lowest Akaike Information Criterion (AIC) values, are reported.
%
Post-hoc pairwise comparisons were performed using Tukey's Honest Significant Difference (HSD) test.
%
Each estimate is reported with its 95\% confidence interval (CI) as follows: \ci{\textrm{lower limit}}{\textrm{upper limit}}.
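%
Written schematically, for a trial measure $y_{pi}$ of participant $p$ in trial $i$, the simplest structure tested corresponds to
\begin{equation*}
y_{pi} = \beta_0 + \beta_{\mathrm{VR}} + \beta_{\mathrm{AD}} + \beta_{\mathrm{VR} \times \mathrm{AD}} + u_p + \varepsilon_{pi},
\qquad u_p \sim \mathcal{N}\!\left(0, \sigma^2_u\right),
\end{equation*}
where the $\beta$ terms denote the fixed effects of \factor{Visual Rendering}, \factor{Amplitude Difference} and their interaction, and $u_p$ is the by-participant random intercept (for GLMMs, the right-hand side models the response on the scale of the link function); richer structures add by-participant random slopes, and competing models are compared with $\mathrm{AIC} = 2k - 2\ln\hat{L}$, where $k$ is the number of estimated parameters and $\hat{L}$ the maximised likelihood.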

\subsubsection{Discrimination Accuracy}
\label{discrimination_accuracy}

A GLMM was fitted to the \response{Texture Choice} responses in the 2AFC vibrotactile texture roughness discrimination task, with by-participant random intercepts but no random slopes, and a probit link function (see \figref{results/trial_predictions}).
%
The points of subjective equality (PSEs, see \figref{results/trial_pses}) and just-noticeable differences (JNDs, see \figref{results/trial_jnds}) for each visual rendering and their respective differences were estimated from the model, along with their corresponding 95\% CI, using a non-parametric bootstrap procedure (1000 samples).
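%
Assuming the common percentile method, each bootstrap 95\% CI reported below corresponds to the 2.5th and 97.5th percentiles of the \num{1000} resampled estimates $\hat{\theta}^{*}$ of the quantity of interest:
\begin{equation*}
\mathrm{CI}_{95\%} = \left[\hat{\theta}^{*}_{(0.025)},\ \hat{\theta}^{*}_{(0.975)}\right].
\end{equation*}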
%
The PSE represents the estimated amplitude difference at which the comparison texture was perceived as rougher than the reference texture 50\% of the time. %, \ie it is the accuracy of participants in discriminating vibrotactile roughness.
%
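Under a two-parameter probit psychometric model, in which the probability of judging the comparison as rougher for an amplitude difference $\Delta$ is $\Phi(\beta_0 + \beta_1 \Delta)$ (the exact parameterisation of the fitted GLMM may differ), the PSE corresponds to
\begin{equation*}
\mathrm{PSE} = -\frac{\beta_0}{\beta_1},
\end{equation*}
the amplitude difference at which this probability equals 50\%.
%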
The \level{Real} rendering had the highest PSE (\percent{7.9} \ci{1.2}{4.1}) and was statistically significantly different from the \level{Mixed} rendering (\percent{1.9} \ci{-2.4}{6.1}) and from the \level{Virtual} rendering (\percent{5.1} \ci{2.4}{7.6}).
%
The JND represents the estimated minimum amplitude difference between the comparison and reference textures that participants could perceive,
% \ie the sensitivity to vibrotactile roughness differences,
calculated at the 84th percentile of the predictions of the GLMM (\ie one standard deviation of the normal distribution)~\autocite{ernst2002humans}.
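%
With the same parameterisation, this definition corresponds to
\begin{equation*}
\mathrm{JND} = \frac{\Phi^{-1}(0.84)}{\beta_1} \approx \frac{1}{\beta_1},
\end{equation*}
\ie the increase in amplitude difference needed to raise the predicted probability from 50\% to 84\%, one standard deviation of the underlying normal distribution.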
%
The \level{Real} rendering had the lowest JND (\percent{26} \ci{23}{29}), the \level{Mixed} rendering had the highest (\percent{33} \ci{30}{37}), and the \level{Virtual} rendering was in between (\percent{30} \ci{28}{32}).
%
All pairwise differences were statistically significant.

\begin{subfigs}{discrimination_accuracy}{%
Generalized Linear Mixed Model (GLMM) results in the vibrotactile texture roughness discrimination task, with non-parametric bootstrap 95\% confidence intervals.
}[%
\item Percentage of trials in which the comparison texture was perceived as rougher than the reference texture, as a function of the amplitude difference between the two textures and the visual rendering.
Curves represent predictions from the GLMM (probit link function) and points are estimated marginal means.
\item Estimated points of subjective equality (PSE) of each visual rendering.
%, defined as the amplitude difference at which both reference and comparison textures are perceived to be equivalent, \ie the accuracy in discriminating vibrotactile roughness.
\item Estimated just-noticeable difference (JND) of each visual rendering.
%, defined as the minimum perceptual amplitude difference, \ie the sensitivity to vibrotactile roughness differences.
]
\subfig[0.85]{results/trial_predictions}\\
\subfig[0.45]{results/trial_pses}
\subfig[0.45]{results/trial_jnds}
\end{subfigs}

\subsubsection{Response Time}
\label{response_time}

An LMM analysis of variance (AOV) with by-participant random slopes for \factor{Visual Rendering}, and a log transformation (as \response{Response Time} measures were gamma distributed), indicated a statistically significant effect of \factor{Visual Rendering} on \response{Response Time} (\anova{2}{18}{6.2}, \p{0.009}, see \figref{results/trial_response_times}).
%
Participants took longer on average to respond with the \level{Virtual} rendering (\geomean{1.65}{s} \ci{1.59}{1.72}) than with the \level{Real} rendering (\geomean{1.38}{s} \ci{1.32}{1.43}), which was the only statistically significant pairwise difference (\ttest{19}{0.3}, \p{0.005}).
%
The \level{Mixed} rendering was in between (\geomean{1.56}{s} \ci{1.49}{1.63}).
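%
Because the model was fitted on log-transformed response times, the means reported above are geometric means, obtained by back-transforming the mean of the log values (assuming a natural-log transformation, and likewise for the CIs):
\begin{equation*}
\bar{t}_{\mathrm{geom}} = \exp\!\left(\frac{1}{n}\sum_{i=1}^{n} \ln t_i\right).
\end{equation*}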

\subsubsection{Finger Position and Speed}
\label{finger_position_speed}

The frames analysed were those in which the participants actively touched the comparison textures with a finger speed greater than \SI{1}{\mm\per\second}.
%
An LMM AOV with by-participant random slopes for \factor{Visual Rendering} indicated only one statistically significant effect on the total distance traveled by the finger in a trial, that of \factor{Visual Rendering} (\anova{2}{18}{3.9}, \p{0.04}, see \figref{results/trial_distances}).
%
On average, participants explored a larger distance with the \level{Real} rendering (\geomean{20.0}{\cm} \ci{19.4}{20.7}) than with the \level{Virtual} rendering (\geomean{16.5}{\cm} \ci{15.8}{17.1}), which was the only statistically significant pairwise difference (\ttest{19}{1.2}, \p{0.03}), with the \level{Mixed} rendering (\geomean{17.4}{\cm} \ci{16.8}{18.0}) in between.
%
Another LMM AOV, with by-trial and by-participant random intercepts but no random slopes, indicated only one statistically significant effect on \response{Finger Speed}, that of \factor{Visual Rendering} (\anova{2}{2142}{2.0}, \pinf{0.001}, see \figref{results/trial_speeds}).
%
On average, the textures were explored with the highest speed with the \level{Real} rendering (\geomean{5.12}{\cm\per\second} \ci{5.08}{5.17}), the lowest with the \level{Virtual} rendering (\geomean{4.40}{\cm\per\second} \ci{4.35}{4.45}), and the \level{Mixed} rendering (\geomean{4.67}{\cm\per\second} \ci{4.63}{4.71}) in between.
%
All pairwise differences were statistically significant: \level{Real} \vs \level{Virtual} (\ttest{19}{1.17}, \pinf{0.001}), \level{Real} \vs \level{Mixed} (\ttest{19}{1.10}, \pinf{0.001}), and \level{Mixed} \vs \level{Virtual} (\ttest{19}{1.07}, \p{0.02}).
%
%This means that within the same time window on the same surface, participants explored the comparison texture on average at a greater distance and at a higher speed when in the real environment without visual representation of the hand (\level{Real} condition) than when in VR (\level{Virtual} condition).

\begin{subfigs}{results_finger}{%
Boxplots and geometric means of response time at the end of a trial, and finger position and finger speed measures when exploring the comparison texture, with pairwise Tukey's HSD tests: * is \pinf{0.05}, ** is \pinf{0.01} and *** is \pinf{0.001}.
}[%
\item Response time of a trial.
\item Distance traveled by the finger in a trial.
\item Speed of the finger in a trial.
]
\subfig[0.32]{results/trial_response_times}
\subfig[0.32]{results/trial_distances}
\subfig[0.32]{results/trial_speeds}
\end{subfigs}

\subsection{Questionnaires}
\label{questions}

%\figref{results/question_heatmaps} shows the median and interquartile range (IQR) ratings to the questions in \tabref{questions} and to the NASA-TLX questionnaire.
%
Friedman tests were employed to compare the ratings to the questions (see \tabref{questions}), with post-hoc Wilcoxon signed-rank tests and Holm-Bonferroni adjustment, except for the questions regarding the virtual hand, which were directly compared with Wilcoxon signed-rank tests.
%
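For $m$ post-hoc comparisons with ordered p-values $p_{(1)} \leq \dots \leq p_{(m)}$, the Holm-Bonferroni adjustment replaces each $p_{(i)}$ by
\begin{equation*}
\tilde{p}_{(i)} = \max_{j \leq i} \min\!\left(1, (m - j + 1)\, p_{(j)}\right),
\end{equation*}
the standard step-down correction controlling the family-wise error rate.
%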
\figref{question_plots} shows these ratings for questions where statistically significant differences were found (results are shown as mean $\pm$ standard deviation):
%
\begin{itemize}
\item \response{Hand Ownership}: participants felt the virtual hand as their own only slightly with the \level{Mixed} rendering (\num{2.3 +- 1.0}) but quite strongly with the \level{Virtual} rendering (\num{3.5 +- 0.9}, \pinf{0.001}).
\item \response{Hand Latency}: the virtual hand was found to have a moderate latency with the \level{Mixed} rendering (\num{2.8 +- 1.2}) but a low one with the \level{Virtual} rendering (\num{1.9 +- 0.7}, \pinf{0.001}).
\item \response{Hand Reference}: participants focused slightly more on their own hand with the \level{Mixed} rendering (\num{3.2 +- 2.0}) but slightly more on the virtual hand with the \level{Virtual} rendering (\num{5.3 +- 2.1}, \pinf{0.001}).
\item \response{Hand Distraction}: the virtual hand was slightly distracting with the \level{Mixed} rendering (\num{2.1 +- 1.1}) but not at all with the \level{Virtual} rendering (\num{1.2 +- 0.4}, \p{0.004}).
\end{itemize}
%
Overall, participants reported a very high sense of control over the virtual hand (\response{Hand Agency}, \num{4.4 +- 0.6}), felt that the virtual hand was quite similar to their own hand (\response{Hand Similarity}, \num{3.5 +- 0.9}), and found the virtual environment very realistic (\response{Virtual Realism}, \num{4.2 +- 0.7}) and very similar to the real one (\response{Virtual Similarity}, \num{4.5 +- 0.7}).
%
The textures were also overall perceived as very much caused by the finger movements (\response{Texture Agency}, \num{4.5 +- 1.0}), with a very low perceived latency (\response{Texture Latency}, \num{1.6 +- 0.8}), and were found to be quite realistic (\response{Texture Realism}, \num{3.6 +- 0.9}) and quite plausible (\response{Texture Plausibility}, \num{3.6 +- 1.0}).
%
Participants were mixed between feeling the vibrations on the surface or on the top of their finger (\response{Vibration Location}, \num{3.9 +- 1.7}): with the \level{Real} and \level{Mixed} renderings, the distribution of scores was split between the two poles of the scale (42.5\% more on the surface or on the finger top, 15\% neutral), whereas there was a trend towards the top of the finger in VR (65\% \vs 25\% more on the surface, 10\% neutral), although this difference was not statistically significant either.
%
The vibrations were overall felt to be slightly weak (\response{Vibration Strength}, \num{4.2 +- 1.1}), and the vibrotactile device was perceived as neither distracting (\response{Device Distraction}, \num{1.2 +- 0.4}) nor uncomfortable (\response{Device Discomfort}, \num{1.3 +- 0.6}).
%
%Finally, the overall workload (mean NASA-TLX score) was low (\num{21 +- 14}), with no statistically significant differences found between the visual renderings for any of the subscales or the overall score.

%\figwide{results/question_heatmaps}{%
%
% Heatmaps of the questionnaire responses, with the median rating and the interquartile range in parentheses on each cell.
%
% (Left) 5-point Likert scale questions (1=Not at all, 2=Slightly, 3=Moderately, 4=Very, 5=Extremely).
%
% (Middle) 7-point Likert scale questions (1=Extremely A, 2=Moderately A, 3=Slightly A, 4=Neither A nor B, 5=Slightly B, 6=Moderately B, 7=Extremely B) with A and B being the two poles of the scale.
%
% (Right) NASA Task Load Index (NASA-TLX) questionnaire (lower values are better).
%}

\begin{subfigs}{question_plots}{%
Boxplots of responses to questions with statistically significant differences, with pairwise Wilcoxon signed-rank tests and Holm-Bonferroni adjustment: * is \pinf{0.05}, ** is \pinf{0.01} and *** is \pinf{0.001}.
}
\subfig[0.24]{results/questions_hand_ownership}
\subfig[0.24]{results/questions_hand_latency}
\subfig[0.24]{results/questions_hand_reference}
\subfig[0.24]{results/questions_hand_distraction}
\end{subfigs}