
\section{Results}
\label{results}

\subsection{Textures Matching}
\label{results_matching}

\paragraph{Confusion Matrix}
\label{results_matching_confusion_matrix}

\figref{results/matching_confusion_matrix} shows the confusion matrix of the \level{Matching} task, \ie the proportion of times each haptic texture was selected in response to the presentation of each visual texture.
A two-sample Pearson Chi-Squared test (\chisqr{64}{540}{420}, \pinf{0.001}) and Holm-Bonferroni adjusted binomial tests indicated that the following (\factor{Visual Texture}, \response{Haptic Texture}) pairs were selected in proportions statistically significantly higher than chance (\ie \percent{11} each):
\begin{itemize}
  \item (\level{Sandpaper~320}, \level{Coffee Filter}), (\level{Terra Cotta}, \level{Coffee Filter}), and (\level{Coffee Filter}, \level{Coffee Filter}) (\pinf{0.001} each);
  \item (\level{Cork}, \level{Sandpaper~320}), (\level{Brick~2}, \level{Plastic Mesh~1}), (\level{Brick~2}, \level{Sandpaper~320}), (\level{Plastic Mesh~1}, \level{Sandpaper~320}), and (\level{Sandpaper~320}, \level{Plastic Mesh~1}) (\pinf{0.01} each); and
  \item (\level{Metal Mesh}, \level{Cork}), (\level{Cork}, \level{Velcro Hooks}), (\level{Velcro Hooks}, \level{Plastic Mesh~1}), (\level{Velcro Hooks}, \level{Sandpaper~320}), and (\level{Coffee Filter}, \level{Terra Cotta}) (\pinf{0.05} each).
\end{itemize}
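
As a rough guide to the statistics above, and assuming that each trial was a forced choice among the nine haptic textures and that the 540 reported observations are spread evenly over the nine visual textures (both are assumptions made here for illustration), the chance level, the degrees of freedom of the Chi-Squared test and the number $n$ of presentations per visual texture follow from
\[
  p_0 = \frac{1}{9} \approx 0.111, \qquad \mathit{df} = (9 - 1) \times (9 - 1) = 64, \qquad n = \frac{540}{9} = 60 .
\]
A binomial test for a cell with $k$ selections out of $n$, presumably one-sided since only proportions above chance are of interest, would then evaluate
\[
  P(X \geq k) = \sum_{j=k}^{n} \frac{n!}{j!\,(n-j)!} \; p_0^{\,j} \, (1 - p_0)^{\,n-j} ,
\]
and the Holm-Bonferroni adjustment would compare the $i$-th smallest of the $m$ resulting $p$-values to $\alpha / (m - i + 1)$, stopping at the first comparison that fails.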

Except for one visual texture (\level{Sandpaper~100}) and four haptic textures (\level{Metal Mesh}, \level{Sandpaper~100}, \level{Brick~2}, and \level{Terra Cotta}), all haptic and visual textures were matched statistically significantly higher than chance with at least one visual and haptic texture, respectively.
However, many mistakes were made: the expected haptic texture was selected on average only \percent{20} of the time for five of the visual textures, and only around \percent{5} of the time for (visual) \level{Sandpaper~100}, \level{Brick~2}, and \level{Sandpaper~320}.
Only the haptic \level{Coffee Filter} was correctly selected most of the time (\percent{59}), but it was also frequently matched with the visual \level{Sandpaper~320} and \level{Terra Cotta} (around \percent{45} each).
Similarly, the haptic textures \level{Sandpaper~320} and \level{Plastic Mesh~1} were also selected for four and three visual textures, respectively (around \percent{25} each).
Additionally, the Spearman correlations between the trials were computed for each participant and only 21 out of 60 were statistically significant (\pinf{0.05}), with a mean \spearman{0.52} (\ci{0.43}{0.59}).

\fig[0.82]{results/matching_confusion_matrix}{Confusion matrix of the \level{Matching} task.}[
  The presented visual textures are shown as columns and the selected haptic textures as rows.
  The number in a cell is the proportion of times the corresponding haptic texture was selected in response to the presentation of the corresponding visual texture.
  The diagonal represents the expected correct answers.
  Holm-Bonferroni adjusted binomial test results are marked in bold when the proportion is higher than chance (\ie more than \percent{11}, \pinf{0.05}).
]

These results indicate that the participants hesitated between several haptic textures for a given visual texture, as also reported in several comments: some haptic textures were favored, while others were almost never selected.
Another explanation could be that the participants had difficulty estimating the roughness of the visual textures.
Indeed, many participants explained that they tried to identify or imagine the roughness of a given visual texture and then to select the most plausible haptic texture, in terms of frequency and/or amplitude of vibrations.

\paragraph{Completion Time}

To verify that the difficulty with all the visual textures was the same on the \level{Matching} task, the \response{Completion Time} of a trial was analyzed.
As the \response{Completion Time} results were Gamma distributed, they were transformed with a log to approximate a normal distribution.
A \LMM on the log \response{Completion Time} with the \factor{Visual Texture} as fixed effect and the participant as random intercept was performed.
Normality was verified with a QQ-plot of the model residuals.
No statistically significant effect of \factor{Visual Texture} was found (\anova{8}{512}{1.9}, \p{0.06}) on \response{Completion Time} (\geomean{44}{\s} \ci{42}{46}), suggesting a comparable difficulty and participant behaviour for all the visual textures.
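
A minimal sketch of the model described above, with notation introduced here for illustration only ($T_{ij}$ is the completion time of a trial of participant $i$ with visual texture $j$, $\beta_j$ the fixed effect of that texture, and $u_i$ the random intercept of the participant):
\[
  \log T_{ij} = \beta_0 + \beta_j + u_i + \varepsilon_{ij}, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) .
\]
Averaging on the log scale and transforming back yields a geometric rather than an arithmetic mean, which is why the completion time is summarized as such:
\[
  \exp\Bigl( \frac{1}{N} \sum_{t=1}^{N} \log T_t \Bigr) = \Bigl( \prod_{t=1}^{N} T_t \Bigr)^{1/N} ,
\]
where $T_t$ denotes the completion time of trial $t$ and $N$ the number of trials.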

\subsection{Textures Ranking}
\label{results_ranking}

\figref{results/ranking_mean_ci} presents the results of the three rankings of the haptic textures alone, the visual textures alone, and the visuo-haptic texture pairs.
For each ranking, a Friedman test was performed with post-hoc Wilcoxon signed-rank tests and Holm-Bonferroni adjustment.
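
For reference, the Friedman statistic in its standard form without tie correction (whether a correction was applied is not stated here), for $n$ participants each ranking $k$ textures, with $R_j$ the sum of the ranks given to texture $j$, is
\[
  \chi^2_F = \frac{12}{n \, k \, (k+1)} \sum_{j=1}^{k} R_j^2 \; - \; 3 \, n \, (k+1), \qquad \mathit{df} = k - 1 .
\]
With the nine textures, $k = 9$ gives $\mathit{df} = 8$, which matches the degrees of freedom of the three tests reported in the following paragraphs.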

\paragraph{Haptic Textures Ranking}

Almost all the texture pairs in the haptic textures ranking results were statistically significantly different (\chisqr{8}{20}{146}, \pinf{0.001}; \pinf{0.05} for each comparison), except between (\level{Metal Mesh}, \level{Sandpaper~100}), (\level{Cork}, \level{Brick~2}), (\level{Cork}, \level{Sandpaper~320}), (\level{Plastic Mesh~1}, \level{Velcro Hooks}), and (\level{Plastic Mesh~1}, \level{Terra Cotta}).
Average Kendall's Tau correlations between the participants indicated a high consensus (\kendall{0.82} \ci{0.81}{0.84}), showing that participants perceived the roughness of the haptic textures similarly.
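
To give a sense of scale, and assuming rankings without ties (an assumption made here for illustration), Kendall's Tau between two rankings of the $n = 9$ textures is
\[
  \tau = \frac{C - D}{n (n - 1) / 2}, \qquad \frac{n (n - 1)}{2} = \frac{9 \times 8}{2} = 36 ,
\]
where $C$ and $D$ are the numbers of texture pairs ordered identically and differently by the two rankings.
Since $C + D = 36$, it follows that $D = 18 \, (1 - \tau)$, so that an average $\tau$ of $0.82$ corresponds to $D \approx 3.2$: two participants disagreed, on average, on the order of about 3 of the 36 texture pairs.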

\paragraph{Visual Textures Ranking}

Most of the texture pairs in the visual textures ranking results were also statistically significantly different (\chisqr{8}{20}{119}, \pinf{0.001}; \pinf{0.05} for each comparison), except for the following groups: \{\level{Metal Mesh}, \level{Cork}, \level{Plastic Mesh~1}\}; \{\level{Sandpaper~100}, \level{Brick~2}, \level{Plastic Mesh~1}, \level{Velcro Hooks}\}; \{\level{Cork}, \level{Velcro Hooks}\}; \{\level{Sandpaper~320}, \level{Terra Cotta}\}; and \{\level{Sandpaper~320}, \level{Coffee Filter}\}.
Even though the consensus was high (\kendall{0.61} \ci{0.58}{0.64}), the roughness of the visual textures was more difficult to estimate, in particular for \level{Plastic Mesh~1} and \level{Velcro Hooks}.

\paragraph{Visuo-Haptic Textures Ranking}

Also, almost all the texture pairs in the visuo-haptic textures ranking results were statistically significantly different (\chisqr{8}{20}{140}, \pinf{0.001}; \pinf{0.05} for each comparison), except for the following groups: \{\level{Sandpaper~100}, \level{Cork}\}; \{\level{Cork}, \level{Brick~2}\}; and \{\level{Plastic Mesh~1}, \level{Velcro Hooks}, \level{Sandpaper~320}\}.
The consensus between the participants was also high (\kendall{0.77} \ci{0.74}{0.79}).
Finally, comparing the three rankings of each participant, the \textit{Visuo-Haptic Textures Ranking} was on average highly similar to the \textit{Haptic Textures Ranking} (\kendall{0.79} \ci{0.72}{0.86}) and moderately similar to the \textit{Visual Textures Ranking} (\kendall{0.48} \ci{0.39}{0.56}).
A Wilcoxon signed-rank test indicated that this difference was statistically significant (\wilcoxon{190}, \p{0.002}).
These results indicate that the haptic and visual modalities were integrated together: the resulting roughness ranking lay between the rankings of the two modalities alone, with haptics predominating.
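
As a plausibility check of the Wilcoxon signed-rank test above, assume that the reported statistic $W = 190$ is the sum of the positive signed ranks over $n = 20$ participants, with no ties or zero differences ($n$ is inferred here from the Friedman tests, and the normal approximation is used for illustration only, as the original analysis may have relied on an exact test):
\[
  W_{\max} = \frac{n (n+1)}{2} = 210, \qquad \mathrm{E}[W] = \frac{n (n+1)}{4} = 105, \qquad \sigma_W = \sqrt{\frac{n (n+1) (2n+1)}{24}} = \sqrt{717.5} \approx 26.8 ,
\]
\[
  z = \frac{190 - 105}{26.8} \approx 3.17 ,
\]
which corresponds to a two-sided $p$-value of about $0.002$, in line with the reported one.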

\fig[0.7]{results/ranking_mean_ci}{Means with bootstrap \percent{95} \CI of the three rankings of the haptic textures alone, the visual textures alone, and the visuo-haptic texture pairs. }[
  A lower rank means that the texture was considered rougher, a higher rank means smoother.
]

\subsection{Perceived Similarity of Visual and Haptic Textures}
\label{results_clusters}

The high level of agreement between participants on the haptic, visual and visuo-haptic rankings in the \level{Ranking} task (\secref{results_ranking}), as well as the similarity of the within-participant rankings, suggests that participants perceived the roughness of the textures similarly, but differed in their strategies for matching the haptic and visual textures in the \level{Matching} task (\secref{results_matching}).

To further investigate the perceived similarity of the haptic and visual textures and to identify groups of textures that were perceived as similar on the \level{Matching} task, a correspondence analysis and a hierarchical clustering were performed on the matching task confusion matrix (\figref{results/matching_confusion_matrix}).

\paragraph{Correspondence Analysis}

The correspondence analysis captured \percent{60} and \percent{29} of the variance in the first and second dimensions, respectively, with the remaining dimensions each accounting for less than \percent{5}.
\figref{results/matching_correspondence_analysis} shows the first two dimensions with the 18 haptic and visual textures.
The first dimension was similar to the rankings (\figref{results/ranking_mean_ci}), distributing the textures according to their perceived roughness.
The second dimension seems to oppose textures that were perceived as hard to those perceived as soft, as also reported by participants.
Stiffness is indeed an important perceptual dimension of a material (\secref[related_work]{hardness}).% \cite{okamoto2013psychophysical,culbertson2014modeling}.
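
For reference, the variance captured by a dimension in a correspondence analysis is its share of the total inertia, which equals the Pearson Chi-Squared statistic of the analysed table divided by its total count $N$ (with $\lambda_k$ the principal inertia of dimension $k$, notation introduced here for illustration):
\[
  \phi^2 = \frac{\chi^2}{N} = \sum_k \lambda_k, \qquad \mathrm{share}_k = \frac{\lambda_k}{\phi^2} .
\]
Assuming that the analysis was run on the same table of counts as the Chi-Squared test of the \level{Matching} task ($\chi^2 = 420$, $N = 540$), the total inertia would be about $420 / 540 \approx 0.78$.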

\fig[0.6]{results/matching_correspondence_analysis}{
  Correspondence analysis of the confusion matrix of the \level{Matching} task.
}[
  The closer the haptic and visual textures are, the more similar they were judged. %
  The first dimension (horizontal axis) explains \percent{60} of the variance, the second dimension (vertical axis) explains \percent{29} of the variance.
  The confusion matrix is shown in \figref{results/matching_confusion_matrix}.
]

\paragraph{Hierarchical Clustering}

\figref{results_clusters} shows the dendrograms of the two hierarchical clusterings of the haptic and visual textures, constructed using the Euclidean distance and Ward's method on squared distances.
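
One common formulation of this procedure is sketched below, with notation introduced here for illustration and assuming that each texture $i$ is described by its profile $x_i$ in the confusion matrix (its row for a haptic texture, its column for a visual texture):
\[
  d(i, j) = \sqrt{\sum_k (x_{ik} - x_{jk})^2}, \qquad \Delta(A, B) = \frac{|A| \, |B|}{|A| + |B|} \, \| \mu_A - \mu_B \|^2 ,
\]
where $\mu_A$ and $\mu_B$ are the mean profiles of clusters $A$ and $B$.
At each step, Ward's method merges the two clusters with the smallest $\Delta(A, B)$, \ie the merge that least increases the within-cluster sum of squares.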

The four identified haptic texture clusters were: ``Roughest'' \{\level{Metal Mesh}, \level{Sandpaper~100}, \level{Brick~2}, \level{Cork}\}; ``Rougher'' \{\level{Sandpaper~320}, \level{Velcro Hooks}\}; ``Smoother'' \{\level{Plastic Mesh~1}, \level{Terra Cotta}\}; ``Smoothest'' \{\level{Coffee Filter}\} (\figref{results/clusters_haptic}).
As they were similar to the haptic ranks (\figref{results/ranking_mean_ci}), the clusters could be named according to their perceived roughness.
This also suggests that the participants compared and ranked the haptic textures during the \level{Matching} task to select the one that best matched the given visual texture.

The five identified visual texture clusters were: ``Roughest'' \{\level{Metal Mesh}\}; ``Rougher'' \{\level{Sandpaper~100}, \level{Brick~2}, \level{Velcro Hooks}\}; ``Medium'' \{\level{Cork}, \level{Plastic Mesh~1}\}; ``Smoother'' \{\level{Sandpaper~320}, \level{Terra Cotta}\}; ``Smoothest'' \{\level{Coffee Filter}\} (\figref{results/clusters_visual}).
They are also easily identifiable in the visual ranking results, which made it possible to name them as well.

\begin{subfigs}{results_clusters}{Dendrograms of the hierarchical clusterings of the \level{Matching} task confusion matrix.}[
    Computed with the Euclidean distance and Ward's method on squared distances.
    The height of the dendrograms represents the distance between the clusters.
  ][%
  \item For the haptic textures.
  \item For the visual textures.
  ]
  \subfig[0.45]{results/clusters_haptic}
  \subfig[0.45]{results/clusters_visual}
\end{subfigs}

\paragraph{Confusion Matrices of Clusters}

Based on these results, two alternative confusion matrices were constructed.

\figref{results/haptic_visual_clusters_confusion_matrices} (left) shows the confusion matrix of the \level{Matching} task with visual texture clusters and the proportion of haptic texture clusters selected in response.
A two-sample Pearson Chi-Squared test (\chisqr{16}{540}{353}, \pinf{0.001}) and Holm-Bonferroni adjusted binomial tests indicated that the following (Visual Cluster, Haptic Cluster) pairs were selected in proportions statistically significantly higher than chance (\ie \percent{20} each): %
(Roughest, Roughest), (Rougher, Rougher), (Medium, Rougher), (Medium, Smoother), (Smoother, Smoother), (Smoother, Smoothest), and (Smoothest, Smoothest) (\pinf{0.005} each).

\figref{results/haptic_visual_clusters_confusion_matrices} (right) shows the confusion matrix of the \level{Matching} task with visual texture ranks and the proportion of haptic texture clusters selected in response.
A two-sample Pearson Chi-Squared test (\chisqr{24}{540}{342}, \pinf{0.001}) and Holm-Bonferroni adjusted binomial tests indicated that the following (Visual Texture Rank, Haptic Cluster) pairs were selected in proportions statistically significantly higher than chance: %
(0, Roughest); (1, Rougher); (2, Rougher); (3, Rougher); (4, Rougher); (5, Smoother); (6, Smoother); (7, Smoothest); and (8, Smoothest) (\pinf{0.05} each).
This shows that the participants consistently identified the roughness of each visual texture and selected the corresponding haptic texture cluster.

\fig{results/haptic_visual_clusters_confusion_matrices}{
  Confusion matrices of the visual texture clusters (left) or ranks (right) with the proportion of haptic texture clusters selected in response.
}[
  Holm-Bonferroni adjusted binomial test results are marked in bold when the proportion is higher than chance (\ie more than \percent{20}, \pinf{0.05}).
]

\subsection{Questionnaire}
\label{results_questions}

\figref{results_questions} presents the questionnaire results of the \level{Matching} and \level{Ranking} tasks.
Non-parametric \ANOVA on \ART models were used for the \response{Difficulty} and \response{Realism} question results.
The other question results were analyzed using Wilcoxon signed-rank tests, with Holm-Bonferroni adjustment.
The results are shown as mean $\pm$ standard deviation.

On \response{Difficulty}, there were statistically significant effects of \factor{Task} (\anova{1}{57}{13}, \pinf{0.001}) and of \factor{Modality} (\anova{1}{57}{8}, \p{0.007}), but no interaction effect. % \factor{Task} \x \factor{Modality} (\anova{1}{57}{2}, \ns).
The \level{Ranking} task was found easier (\num{2.9 \pm 1.2}) than the \level{Matching} task (\num{3.9 \pm 1.5}), and the Haptic textures were found easier to discriminate (\num{3.0 \pm 1.3}) than the Visual ones (\num{3.8 \pm 1.5}).

Both haptic and visual textures were judged moderately realistic for both tasks (\num{4.2 \pm 1.3}), with no statistically significant effect of \factor{Task}, \factor{Modality} or their interaction on \response{Realism}.
No statistically significant effects of \factor{Task} on \response{Textures Match} and \response{Uncomfort} were found either.
The coherence of the texture pairs was considered moderate (\num{4.6 \pm 1.2}) and the haptic device was not perceived as uncomfortable (\num{2.4 \pm 1.4}).

\begin{subfigs}{results_questions}{Boxplots of the questionnaire results of the \level{Matching} and \level{Ranking} tasks.}[
    Pairwise Wilcoxon signed-rank tests with Holm-Bonferroni adjustment: * is \pinf{0.05}, ** is \pinf{0.01} and *** is \pinf{0.001}.
    Lower is better for Difficulty and Uncomfortable; higher is better for Realism and Textures Match.
  ][
  \item By modality.
  \item By task.
  ]
  \subfigsheight{70mm}
  \subfig{results/questions_modalities}%
  \subfig{results/questions_tasks}%
\end{subfigs}