phd-thesis/3-perception/vhar-textures/3-results.tex

\section{Results}
\label{results}

\subsection{Textures Matching}
\label{results_matching}

\paragraph{Confusion Matrix}
\label{results_matching_confusion_matrix}

\comans{JG}{For the two-sample Chi-Squared tests in the matching task, the number of samples reported is 540 due to 20 participants conducting 3 trials for 9 textures each. However, this would only hold true if the repetitions per participant would be independent and not correlated (and then, one could theoretically also run 10 participants with 6 trials each, or 5 participants with 12 trials each). If they are not independent, this would lead to an artificial inflated sample size and Type I error. If the trials are not independent (please double check), I suggest either aggregating data on the participant level or to use alternative models that account for the within-subject correlation (as was done in other chapters).}{Data of the three confusion matrices have been aggregated on the participant level and analyzed using a Poisson regression.}
\figref{results/matching_confusion_matrix} shows the confusion matrix of the \level{Matching} task with the visual textures and the proportion of haptic texture selected in response, \ie the proportion of times the corresponding \response{Haptic Texture} was selected in response to the presentation of the corresponding \factor{Visual Texture}.
To determine which haptic textures were selected most often, the repetitions of the trials were first aggregated by counting the number of selections per participant for each (\factor{Visual Texture}, \response{Haptic Texture}) pair.
An \ANOVA based on a Poisson regression (no overdispersion was detected) indicated a statistically significant effect on the number of selections of the interaction \factor{Visual Texture} \x \response{Haptic Texture} (\chisqr{64}{180}{414}, \pinf{0.001}).
Post-hoc pairwise comparisons using the Tukey's \HSD test then indicated there was statistically significant differences for the following visual textures:
\begin{itemize}
  \item With \level{Sandpaper~320}, \level{Coffee Filter} was more selected than the other haptic textures (\ztest{3.4}, \pinf{0.05} each) except \level{Plastic Mesh~1} and \level{Terra Cotta}.
  \item With \level{Terra Cotta}, \level{Coffee Filter} was more selected than the others (\ztest{3.4}, \pinf{0.05} each) except \level{Plastic Mesh~1} and \level{Terra Cotta}.
  \item With \level{Coffee Filter}, \level{Coffee Filter} was more selected than the others (\ztest{4.0}, \pinf{0.01} each) except \level{Terra Cotta}.
\end{itemize}

\fig[0.85]{results/matching_confusion_matrix}{Confusion matrix of the \level{Matching} task results.}[
  With the presented visual textures as columns and the selected haptic texture in proportion as rows.
  The number in a cell is the proportion of times the corresponding haptic texture was selected in response to the presentation of the corresponding visual texture.
  The diagonal represents the expected correct answers.
]

Many mistakes were made: the expected haptic texture was selected on average only \percent{20} of the time for five of the visual textures, and even around \percent{5} for (visual) \level{Sandpaper~100}, \level{Brick~2}, and \level{Sandpaper~320}.
Only haptic \level{Coffee Filter} was correctly selected \percent{57} of the time, and was also particularly matched with the visual \level{Sandpaper~320} and \level{Terra Cotta} (around \percent{44} each).
Similarly, the haptic textures \level{Sandpaper~320} and \level{Plastic Mesh~1} were also selected for four and three visual textures, respectively (around \percent{25} each).
Additionally, the Spearman correlations between the trials were computed for each participant and only 21 out of 60 were statistically significant (\pinf{0.05}), with a mean \spearman{0.52} \ci{0.43}{0.59}.

These results indicate that the participants hesitated between several haptic textures for a given visual texture, as also reported in several comments, some haptic textures being more favored while some others were almost not selected at all.
Another explanation could be that the participants had difficulties to estimate the roughness of the visual textures.
Indeed, many participants explained that they tried to identify or imagine the roughness of a given visual texture then to select the most plausible haptic texture, in terms of frequency and/or amplitude of vibrations.

\paragraph{Completion Time}

To verify that the difficulty with all the visual textures was the same on the \level{Matching} task, the \response{Completion Time} of a trial was analyzed.
As the \response{Completion Time} results were Gamma distributed, they were transformed with a log to approximate a normal distribution.
An \ANOVA based on a \LMM on the log \response{Completion Time} with the \factor{Visual Texture} as fixed effect and the participant as random intercept was performed.
Normality was verified with a QQ-plot of the model residuals.
No statistical significant effect of \factor{Visual Texture} was found (\anova{8}{512}{1.9}, \p{0.06}) on \response{Completion Time} (\geomean{44}{\s} \ci{42}{46}), indicating an equal difficulty and participant behaviour for all the visual textures.

\subsection{Textures Ranking}
\label{results_ranking}

\figref{results/rankings_modality} presents the results of the three rankings of the haptic textures alone, the visual textures alone, and the visuo-haptic texture pairs.
For each ranking, a Friedman test was performed with post-hoc Wilcoxon signed-rank tests and Holm-Bonferroni adjustment.

\fig[1]{results/rankings_modality}{Means with bootstrap \percent{95} \CI of the \level{Ranking} task results for each \factor{Modality}.}[
  Shown for the haptic textures alone (left), the visual textures alone (center) and the visuo-haptic textures pairs (right).
  The order of the visual textures on the x-axis differs between modalities.
  A lower rank means that the texture was considered rougher, a higher rank means smoother.
  Wilcoxon signed-rank tests and Holm-Bonferroni adjustment: all comparisons were statistically significantly different (\pinf{0.05}) except when marked with an \enquote{X}.
]

\paragraph{Haptic Textures Ranking}

Almost all the texture pairs in the haptic textures ranking results were statistically significantly different (\chisqr{8}{20}{146}, \pinf{0.001}; \pinf{0.05} for each comparison; see \figref{results/rankings_modality}, left).
However, no difference was found between the pairs (\level{Metal Mesh}, \level{Sandpaper~100}), (\level{Cork}, \level{Brick~2}), (\level{Cork}, \level{Sandpaper~320}), (\level{Plastic Mesh~1}, \level{Velcro Hooks}), and (\level{Plastic Mesh~1}, \level{Terra Cotta}).
Average Kendall's Tau correlations between the participants indicated a high consensus (\kendall{0.82} \ci{0.81}{0.84}) showing that participants perceived similarly the roughness of the haptic textures.

\paragraph{Visual Textures Ranking}

Most of the texture pairs in the visual textures ranking results were also statistically significantly different (\chisqr{8}{20}{119}, \pinf{0.001}; \pinf{0.05} for each comparison; see \figref{results/rankings_modality}, center), except for the following.
No difference was found between \level{Plastic Mesh~1} and \level{Metal Mesh}, \level{Brick 2}, \level{Sandpaper 100}, \level{Cork}, \level{Velcro Hooks};
nor between \level{Velcro Hooks} and \level{Sandpaper 100}, \level{Cork}, \level{Brick 2}.
No difference was also found between the pairs (\level{Metal Mesh}, \level{Cork}), (\level{Sandpaper~100}, \level{Brick~2}), (\level{Sandpaper~320}, \level{Terra Cotta}) and (\level{Sandpaper~320}, \level{Coffee Filter}).
Even though the consensus was high (\kendall{0.61} \ci{0.58}{0.64}), the roughness of the visual textures were more difficult to estimate, in particular for \level{Plastic Mesh~1} and \level{Velcro Hooks}.

\paragraph{Visuo-Haptic Textures Ranking}

Also, almost all the texture pairs in the visuo-haptic textures ranking results were statistically significantly different (\chisqr{8}{20}{140}, \pinf{0.001}; \pinf{0.05} for each comparison; see \figref{results/rankings_modality}, right).
However, no difference was found between the textures for each of the following groups: \{\level{Sandpaper~100}, \level{Cork}\}; \{\level{Cork}, \level{Brick~2}\}; and \{\level{Plastic Mesh~1}, \level{Velcro Hooks}, \level{Sandpaper~320}\}.
The consensus between the participants was also high \kendall{0.77} \ci{0.74}{0.79}.

Finally, the similarity of the three rankings of each participant was calculated (\figref{results/rankings_texture}).
The \textit{Visuo-Haptic Textures Ranking} was on average highly similar to the \textit{Haptic Textures Ranking} (\kendall{0.79} \ci{0.72}{0.86}) and moderately to the \textit{Visual Textures Ranking} (\kendall{0.48} \ci{0.39}{0.56}).
A Wilcoxon signed-rank test indicated that this difference was statistically significant (\wilcoxon{190}, \p{0.002}).
These results indicate that the two haptic and visual modalities were integrated together, the resulting roughness ranking being between the two rankings of the modalities alone, but with haptics predominating.

\fig[1]{results/rankings_texture}{Means with bootstrap \percent{95} \CI of the \level{Ranking} task results for each \factor{Visual Texture}.}[
  A lower rank means that the texture was considered rougher, a higher rank means smoother.
]

\subsection{Perceived Similarity of Visual and Haptic Textures}
\label{results_clusters}

The high level of agreement between participants on the three haptic, visual and visuo-haptic rankings in the \level{Ranking} task (\secref{results_ranking}), as well as the similarity of the within-participant rankings, suggest that participants perceived the roughness of the textures similarly, but differed in their strategies for matching the haptic and visual textures in the \level{Matching} task (\secref{results_matching}).

To further investigate the perceived similarity of the haptic and visual textures, and to identify groups of textures that were perceived as similar on the \level{Matching} task, a correspondence analysis and a hierarchical clustering were performed on the matching task confusion matrix (\figref{results/matching_confusion_matrix}).

\paragraph{Correspondence Analysis}

The correspondence analysis captured \percent{60} and \percent{29} of the variance in the first and second dimensions, respectively, with the remaining dimensions each accounting for less than \percent{5} each.
\figref{results/matching_correspondence_analysis} shows the first two dimensions with the 18 haptic and visual textures.
The first dimension was similar to the rankings (\figref{results/rankings_texture}), distributing the textures according to their perceived roughness.
It seems that the second dimension opposed textures that were perceived as hard with those perceived as softer, as also reported by participants.
Stiffness is indeed an important perceptual dimension of a material (\secref[related_work]{hardness}).

\fig[1]{results/matching_correspondence_analysis}{Correspondence analysis of the confusion matrix of the \level{Matching} task.}[
  The closer the haptic and visual textures are, the more similar they were judged.
  The first dimension (horizontal axis) explains \percent{60} of the variance, the second dimension (vertical axis) explains \percent{29} of the variance.
  The confusion matrix is shown in \figref{results/matching_confusion_matrix}.
]

\paragraph{Hierarchical Clustering}

\figref{results_clusters} shows the dendrograms of the two hierarchical clusterings of the haptic and visual textures, constructed using the Euclidean distance and the Ward's method on squared distance.

The four identified haptic texture clusters were: \enquote{Roughest} \{\level{Metal Mesh}, \level{Sandpaper~100}, \level{Brick~2}, \level{Cork}\}; \enquote{Rougher} \{\level{Sandpaper~320}, \level{Velcro Hooks}\}; \enquote{Smoother} \{\level{Plastic Mesh~1}, \level{Terra Cotta}\}; \enquote{Smoothest} \{\level{Coffee Filter}\} (\figref{results/clusters_haptic}).
Similar to the haptic ranks (\figref{results/rankings_modality}, left), the clusters could have been named according to their perceived roughness.
It also shows that the participants compared and ranked the haptic textures during the \level{Matching} task to select the one that best matched the given visual texture.

The five identified visual texture clusters were: \enquote{Roughest} \{\level{Metal Mesh}\}; \enquote{Rougher} \{\level{Sandpaper~100}, \level{Brick~2}, \level{Velcro Hooks}\}; \enquote{Medium} \{\level{Cork}, \level{Plastic Mesh~1}\}; \enquote{Smoother} \{\level{Sandpaper~320}, \level{Terra Cotta}\}; \enquote{Smoothest} \{\level{Coffee Filter}\} (\figref{results/clusters_visual}).
They are also easily identifiable on the visual ranking results, which also made it possible to name them.

\begin{subfigs}{results_clusters}{Dendrograms of the hierarchical clusterings of the confusion matrix of the \level{Matching} task.}[
    Done with the Euclidean distance and the Ward's method on squared distance.
    The height of the dendrograms represents the distance between the clusters.
  ][
  \item For the haptic textures.
  \item For the visual textures.
  ]
  \subfig[0.48]{results/clusters_haptic}
  \subfig[0.48]{results/clusters_visual}
\end{subfigs}

\paragraph{Confusion Matrices of Clusters}

Based on these results, two alternative confusion matrices were constructed.
Similarly to \secref{results_matching}, an \ANOVA based on a Poisson regression was performed for each confusion matrix on the number of selections, followed by post-hoc pairwise comparisons using the Tukey's \HSD test. No overdispersion was detected on the Poisson regressions.

\figref{results/haptic_visual_clusters_confusion_matrices} (left) shows the confusion matrix of the \level{Matching} task with visual texture clusters and the proportion of haptic texture clusters selected in response.
There was a statistically significant effect on the number of selections of the interaction visual texture cluster \x haptic texture cluster (\chisqr{12}{180}{324}, \pinf{0.001}), and statistically significant differences for the following visual clusters:
\begin{itemize}
  \item With \enquote{Roughest}, the haptic cluster \enquote{Roughest} was the most selected (\ztest{4.6}, \pinf{0.001}).
  \item With \enquote{Rougher}, \enquote{Smoothest} was the least selected (\ztest{-4.0}, \pinf{0.001}) and \enquote{Rougher} more than \enquote{Smoother} (\ztest{-3.4}, \pinf{0.001}).
  \item With \enquote{Medium}, \enquote{Rougher} and \enquote{Smoother} were both (\ztest{4.5}, \pinf{0.001}) more selected than \enquote{Roughest} and \enquote{Smoothest}.
  \item With \enquote{Smoother}, \enquote{Smoother} (\ztest{4.2}, \pinf{0.001}) and \enquote{Smoothest} (\ztest{4.7}, \pinf{0.001}) were both more selected than \enquote{Roughest} and \enquote{Rougher}.
  \item With \enquote{Smoothest}, \enquote{Smoother} (\ztest{2.6}, \p{0.05}) and \enquote{Smoothest} (\ztest{3.9}, \pinf{0.001}) were both more selected than \enquote{Roughest} and \enquote{Rougher}.
\end{itemize}

\figref{results/haptic_visual_clusters_confusion_matrices} (right) shows the confusion matrix of the \level{Matching} task with visual texture ranks and the proportion of haptic texture clusters selected in response.
There was a statistically significant effect on the number of selections of the visual texture rank \x haptic texture cluster interaction (\chisqr{24}{180}{340}, \pinf{0.001}), and statistically significant differences for the following visual texture ranks:
\begin{itemize}
  \item Rank 0: the haptic cluster \enquote{Roughest} was the most selected (\ztest{4.5}, \pinf{0.001}).
  \item Ranks 1, 2 and 3: \enquote{Smoothest} was the least selected (\ztest{-3.0}, \p{0.04}).
  \item Rank 4: \enquote{Rougher} was more selected than \enquote{Roughest} and \enquote{Smoothest} (\ztest{3.0}, \p{0.03}).
  \item Rank 5: \enquote{Rougher} and \enquote{Smoother} were both (\ztest{4.5}, \p{0.01}) more selected than \enquote{Roughest} and \enquote{Smoothest}.
  \item Rank 6: \enquote{Smoother} was more selected than \enquote{Roughest} (\ztest{3.2}, \p{0.006}).
  \item Rank 7: \enquote{Smoother} and \enquote{Smoothest} were both (\ztest{3.4}, \p{0.04}) more selected than \enquote{Roughest} and \enquote{Rougher}.
  \item Rank 7: \enquote{Smoother} and \enquote{Smoothest} were both (\ztest{3.2}, \p{0.04}) more selected than \enquote{Roughest} and \enquote{Rougher}.
\end{itemize}

\fig{results/haptic_visual_clusters_confusion_matrices}{
  Confusion matrices of the visual texture (left) or rank (right) with the corresponding haptic texture clusters selected in proportion.
}[]

\subsection{Questionnaire}
\label{results_questions}

\figref{results_questions} presents the questionnaire results of the \level{Matching} and \level{Ranking} tasks.
A non-parametric \ANOVA on \ART models were used for the \response{Difficulty} and \response{Realism} question results.
The other question results were analyzed using Wilcoxon signed-rank tests, with Holm-Bonferroni adjustment.
The results are shown as mean $\pm$ standard deviation.

On \response{Difficulty}, there were statistically significant effects of \factor{Task} (\anova{1}{57}{13}, \pinf{0.001}) and of \factor{Modality} (\anova{1}{57}{8}, \p{0.007}), but no interaction effect \factor{Task} \x \factor{Modality} (\anova{1}{57}{2}, \ns).
The \level{Ranking} task was found easier (\num{2.9 \pm 1.2}) than the \level{Matching} task (\num{3.9 \pm 1.5}), and the Haptic textures were found easier to discriminate (\num{3.0 \pm 1.3}) than the Visual ones (\num{3.8 \pm 1.5}).

Both haptic and visual textures were judged moderately realistic for both tasks (\num{4.2 \pm 1.3}), with no statistically significant effect of \factor{Task}, \factor{Modality} or their interaction on \response{Realism}.
No statistically significant effects of \factor{Task} on \response{Textures Match} and \response{Uncomfort} were found either.
The coherence of the texture pairs was considered moderate (\num{4.6 \pm 1.2}) and the haptic device was not felt uncomfortable (\num{2.4 \pm 1.4}).

\begin{subfigs}{results_questions}{Boxplots of the questionnaire results for each visual hand rendering.}[
    Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: * is \pinf{0.05}, ** is \pinf{0.01} and *** is \pinf{0.001}.
    Lower is better for Difficulty and Uncomfortable; higher is better for Realism and Textures Match.
  ][
  \item By \factor{Modality}.
  \item By \factor{Task}.
  ]
  \subfigsheight{70mm}
  \subfig{results/questions_modalities}
  \subfig{results/questions_tasks}
\end{subfigs}