\section{Principles and Capabilities of AR}
\label{augmented_reality}
The first \AR headset was invented by \textcite{sutherland1968headmounted}: with the technology available at the time, it was already capable of displaying virtual objects at a fixed point in space in real time, giving the user the illusion that the content was present in the room (see \figref{sutherland1968headmounted}).
Suspended from the ceiling, the headset displayed a stereoscopic (one image per eye) perspective projection of the virtual content on a transparent screen, taking the user's head position into account, and thus already followed the interaction loop presented in \figref[introduction]{interaction-loop}.
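As an illustration of this principle (a simplified formulation, not Sutherland's exact one), consider a virtual point $\mathbf{p}_w$ expressed in room coordinates and the tracked pose of one eye, with orientation $\mathbf{R}$ and position $\mathbf{t}$: the point in eye coordinates is $\mathbf{p}_e = \mathbf{R}^\top (\mathbf{p}_w - \mathbf{t}) = (x_e, y_e, z_e)^\top$, and its perspective projection on a screen at focal distance $f$ is
\[
(u, v) = \left( f\,\frac{x_e}{z_e},\; f\,\frac{y_e}{z_e} \right).
\]
Evaluating this projection once per eye, each time the head moves, yields the stereoscopic image pair that makes the virtual content appear fixed in the room.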
\begin{subfigs}{sutherland1968headmounted}{Photos of the first \AR system~\cite{sutherland1968headmounted}. }[
\item The \AR headset.
\item Wireframe \ThreeD virtual objects were displayed registered in the real environment (as if they were part of it).
]
\subfigsheight{45mm}
\subfig{sutherland1970computer3}
\subfig{sutherland1970computer2}
\end{subfigs}
\subsection{What is Augmented Reality?}
\label{ar_definition}
\paragraph{A Definition}
The system of \textcite{sutherland1968headmounted} already fulfilled the first formal definition of \AR, proposed by \textcite{azuma1997survey} in the first survey of the field, according to which an \AR system must:
\begin{enumerate}[label=(\arabic*)]
\item combine real and virtual,
\item be interactive in real time, and
\item register real and virtual\footnotemark.
\end{enumerate}
%\footnotetext{There is considerable confusion in the literature and in (because of) the industry about the terms \AR and \MR. The term \MR is very often used as a synonym of \AR, or as a version of \AR that enables interaction with the virtual content. The title of this section refers to the title of the highly cited paper by \textcite{speicher2019what} that examines this debate.}
\footnotetext{This third characteristic has been slightly adapted to use the formulation of \textcite{marchand2016pose}; the original definition was: \enquote{registered in \ThreeD}.}
Each of these characteristics is essential: the real-virtual combination distinguishes \AR from \VR; a movie with integrated digital content is not interactive; and a \TwoD overlay, such as an image filter, is not registered.
There are also two key aspects to this definition: it focuses not on technology or method but on the user's experience of the system, and it does not specify a particular human sense, \ie the augmentation can be auditory~\cite{yang2022audio}, haptic~\cite{bhatia2024augmenting}, or even olfactory~\cite{brooks2021stereosmell} or gustatory~\cite{brooks2023taste}.
Yet, most research has focused on visual augmentations, and the term \AR (without a prefix) is almost always understood as \v-\AR.
%For example, \textcite{milgram1994taxonomy} proposed a taxonomy of \MR experiences based on the degree of mixing real and virtual environments, and \textcite{skarbez2021revisiting} revisited this taxonomy to include the user's perception of the experience.
\paragraph{Applications}
Advances in technology, research, and development have enabled many uses of \AR, including medical, educational, industrial, navigation, collaboration, and entertainment applications~\cite{dey2018systematic}.
For example, \AR can help surgeons visualize \ThreeD images of the brain overlaid on the patient's head prior to or during surgery, \eg in \figref{watanabe2016transvisible}~\cite{watanabe2016transvisible}, or help students learn complex concepts and phenomena such as optics or chemistry~\cite{bousquet2024reconfigurable}.
It can also guide workers in complex tasks, such as assembly, maintenance, or verification, \eg in \figref{hartl2013mobile}~\cite{hartl2013mobile}, reinvent the way we interact with desktop computers, \eg in \figref{lee2013spacetop}~\cite{lee2013spacetop}, or create entirely new forms of gaming and tourism experiences, \eg in \figref{roo2017inner}~\cite{roo2017inner}.
Most (visual) \AR/\VR experiences can now be implemented with commercially available hardware and software solutions, in particular for tracking, rendering, and display.
Yet, the user experience in \AR still depends heavily on the display used.
\begin{subfigs}{ar_applications}{Examples of \AR applications. }[
\item Neurosurgery \AR visualization of the brain on a patient's head~\cite{watanabe2016transvisible}.
\item SpaceTop is a transparent \AR desktop computer featuring direct hand manipulation of \ThreeD content~\cite{lee2013spacetop}.
\item \AR can interactively guide users in document verification tasks by recognizing documents and comparing them with virtual references~\cite{hartl2013mobile}.
\item Inner Garden is a tangible, spatial \AR zen garden for relaxation and meditation~\cite{roo2017inner}.
]
\subfigsheight{47mm}
\subfig{watanabe2016transvisible}
\subfig{lee2013spacetop}
\subfig{hartl2013mobile}
\subfig{roo2017inner}
\end{subfigs}
\subsection{AR Displays and Perception}
\label{ar_displays}
A comprehensive overview and classification of \AR display technologies is given by \textcite{bimber2005spatial}.
\paragraph{Spatial Augmented Reality}
\paragraph{Window on World Displays}
\paragraph{Video See-Through Headsets}
Stereoscopic headsets also suffer from the vergence-accommodation conflict: the eyes converge at the apparent depth of the virtual content while accommodating to the fixed focal distance of the display, which can cause visual discomfort.
Using a video see-through (VST) \AR headset has further notable consequences: the \enquote{real} view of the environment and of the hands is actually a video stream from a camera, which has a noticeable delay and a lower quality (\eg resolution, frame rate, field of view) compared to the direct view of the real environment with optical see-through (OST) \AR~\cite{macedo2023occlusion}.
\paragraph{Optical See-Through Headsets}
%Distances are underestimated~\cite{adams2022depth,peillard2019studying}.
% billinghurst2021grand
\subsection{Presence and Embodiment in AR}
\label{ar_presence}
Despite the clear and acknowledged definition presented in \secref{ar_definition}, and the viewpoint of this thesis that \AR and \VR are two types of \MR experiences with different levels of mixing between real and virtual environments, as presented in \secref[introduction]{visuo_haptic_augmentations}, there is still a debate on how to define \AR and \MR, as well as how to characterize and categorize such experiences~\cite{speicher2019what,skarbez2021revisiting}.
\paragraph{Presence}
Presence is one of the key concepts used to characterize a \VR experience.
\AR and \VR are both essentially illusions: the virtual content does not physically exist but is digitally simulated and rendered to the user's senses through a user interface.
This suspension of disbelief in \VR is what is called presence, and it can be decomposed into two dimensions: \PI and \PSI~\cite{slater2009place}.
\PI is the user's sense of \enquote{being there} in the \VE (see \figref{presence-vr}).
It emerges from the real-time rendering of the \VE from the user's perspective: being able to move around inside the \VE and look at it from different points of view.
\PSI is the illusion that the virtual events are really happening, even though the user knows they are not real.
This does not mean that the virtual events are realistic, but that they are plausible and coherent with the user's expectations.
A third strong illusion in \VR is the \SoE, which is the illusion that the virtual body is one's own~\cite{slater2022separate,guy2023sense}.
Presence in \AR is far less defined and studied than in \VR~\cite{tran2024survey}, but it will be useful to design, evaluate, and discuss our contributions in the next chapters.
To this end, \textcite{slater2022separate} proposed to invert \PI into what we can call \enquote{object illusion}, \ie the sense that the virtual object \enquote{feels here} in the \RE (see \figref{presence-ar}).
As in \VR, \VOs must be viewable from different angles as the head moves, but they must also, and this is more difficult, be consistent with the \RE, \eg occlude or be occluded by real objects~\cite{macedo2023occlusion}, cast shadows, or reflect light.
The \PSI can be applied to \AR as is, but the \VOs must additionally have knowledge of the \RE and react to it accordingly.
\textcite{skarbez2021revisiting} also refer to \PI for \AR as \enquote{immersion} and to \PSI as \enquote{coherence}, and these terms will be used in the remainder of this thesis.
\begin{subfigs}{presence}{The sense of immersion in virtual and augmented environments. Adapted from \textcite{stevens2002putting}. }[
\item Place Illusion (PI) is the user's sense of \enquote{being there} in the \VE.
\item Object illusion is the sense that the virtual object \enquote{feels here} in the \RE.
]
\subfigsheight{35mm}
\subfig{presence-vr}
\subfig{presence-ar}
\end{subfigs}
\paragraph{Embodiment}
Like presence, \SoE in \AR is a recent topic, and little is known about its influence on the user experience~\cite{genay2021virtual}.
\subsection{Direct Hand Manipulation in AR}
Both \AR/\VR and haptic systems are able to render virtual objects and environments as sensations displayed to the user's senses.
However, as presented in \figref[introduction]{interaction-loop}, the user must be able to manipulate the virtual objects and environments to complete the loop, \eg through a hand-held controller, a tangible object, or even directly with the hands.
An interaction technique is then required to map user inputs to actions on the \VE~\cite{laviola20173d}.
\subsubsection{Interaction Techniques}
For a user to interact with a computer system, they first perceive the state of the system and then act on it using an input interface.
An input interface can be either an active sensing device that is physically held or worn, such as a mouse, a touchscreen, or a hand-held controller, or a passive sensing system that requires no physical contact, such as eye tracking, voice recognition, or hand tracking.
The sensor data gathered by the input interface are then translated into actions within the computer system by an interaction technique.
For example, a cursor on a screen can be moved either with a mouse or with the arrow keys of a keyboard, and a two-finger swipe on a touchscreen can be used to scroll or zoom an image.
Choosing useful and efficient input interfaces and interaction techniques is crucial for the user experience and the tasks that can be performed within the system~\cite{laviola20173d}.
\fig[0.5]{interaction-technique}{An interaction technique maps user inputs to actions within a computer system. Adapted from \textcite{billinghurst2005designing}.}
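As a minimal sketch of this mapping (with a hypothetical \texttt{ImageView} widget, not any particular toolkit's API), the same abstract action, scrolling an image, can be bound to two different input interfaces:
\begin{verbatim}
class ImageView:
    # Hypothetical widget: only the state needed for scrolling.
    def __init__(self):
        self.offset_y = 0.0

    def scroll_by(self, dy):
        self.offset_y += dy

def scroll_with_mouse_wheel(view, wheel_delta):
    # Technique 1: one wheel step scrolls a fixed number of pixels.
    view.scroll_by(dy=wheel_delta * 40)

def scroll_with_two_finger_swipe(view, touches_y, previous_touches_y):
    # Technique 2: a two-finger swipe scrolls by the average
    # vertical displacement of the fingers (y pixel coordinates).
    if len(touches_y) == 2 and len(previous_touches_y) == 2:
        dy = sum(t - p for t, p in zip(touches_y, previous_touches_y)) / 2
        view.scroll_by(dy=dy)
\end{verbatim}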
\paragraph{Tasks}
\textcite{laviola20173d} classify interaction techniques into three categories based on the tasks they enable users to perform: manipulation, navigation, and system control.
\textcite{hertel2021taxonomy} proposed a revised taxonomy of interaction techniques specifically for immersive \AR.
The \emph{manipulation tasks} are the most fundamental tasks in \AR and \VR systems and the basic building blocks for more complex interactions.
\emph{Selection} is the identification or acquisition of a specific virtual object, \eg pointing at a target as in \figref{grubert2015multifi}, touching a button with a finger, or grasping an object with a hand.
\emph{Positioning} and \emph{rotation} of a selected object are respectively the change of its position and orientation in \ThreeD space.
It is also common to \emph{resize} a virtual object to change its size.
These three tasks are geometric manipulations of the object: they do not change its shape.
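For illustration, these manipulations amount to applying a similarity transform to every point $\mathbf{p}$ of the object:
\[
\mathbf{p}' = s\,\mathbf{R}\,\mathbf{p} + \mathbf{t},
\]
where $\mathbf{t}$ is the translation (positioning), $\mathbf{R}$ the rotation, and $s$ a uniform scale factor (resize); none of these alters the object's shape.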
The \emph{navigation tasks} are the movements of the user within the \VE.
\emph{Travel} is the control of the position and orientation of the viewpoint in the \VE, \eg physical walking, velocity control, or teleportation.
\emph{Wayfinding} is the cognitive planning of the movement, such as pathfinding or route following (see \figref{grubert2017pervasive}).
The \emph{system control tasks} are changes of the system state through commands or menus, such as the creation, deletion, or modification of objects, \eg as in \figref{roo2017onea}, as well as the input of text, numbers, or symbols.
\paragraph{Reducing the Physical-Virtual Gap}
In \AR and \VR, the state of the system is displayed to the user as a \VE seen spatially in \ThreeD.
Within an immersive and portable \AR system, this \VE is experienced at a 1:1 scale and as an integral part of the \RE.
The rendering gap between the physical and virtual elements, as described on the interaction loop in \figref[introduction]{interaction-loop}, is thus experienced as very narrow or even not consciously perceived by the user.
This manifests as a sense of presence of the virtual, as presented in \secref{ar_presence}.
As the physical-virtual rendering gap is reduced, we could expect interaction with the \VE to become as seamless as with a physical environment, which \textcite{jacob2008realitybased} called \emph{reality-based interaction}.
As of today, an immersive \AR system tracks itself and the user in \ThreeD, using tracking sensors and pose estimation algorithms~\cite{marchand2016pose}, \eg as in \figref{newcombe2011kinectfusion}.
This enables the \VE to be registered with the \RE, and the user simply moves to navigate within the virtual content.
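As a minimal sketch of this registration (assuming, hypothetically, that the tracking system provides the headset pose as a homogeneous $4 \times 4$ matrix \texttt{T\_world\_headset} each frame), keeping a virtual object anchored in the \RE reduces to composing its fixed world pose with the inverse of the tracked headset pose:
\begin{verbatim}
import numpy as np

def view_matrix(T_world_headset):
    # The view matrix is the inverse of the tracked headset pose.
    R, t = T_world_headset[:3, :3], T_world_headset[:3, 3]
    T = np.eye(4)
    T[:3, :3] = R.T
    T[:3, 3] = -R.T @ t
    return T

def model_view(T_world_object, T_world_headset):
    # A world-anchored object keeps T_world_object constant;
    # only the headset pose changes as the user moves.
    return view_matrix(T_world_headset) @ T_world_object
\end{verbatim}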
%This tracking and mapping of the user and \RE into the \VE is named the \enquote{extent of world knowledge} by \textcite{skarbez2021revisiting}, \ie to what extent the \AR system knows about the \RE and is able to respond to changes in it.
However, direct hand manipulation of the virtual content is a challenge that requires specific interaction techniques~\cite{billinghurst2021grand}.
This is often achieved using two interaction techniques: \emph{tangible objects} and \emph{virtual hands}~\cite{hertel2021taxonomy}.
\begin{subfigs}{interaction-techniques}{Interaction techniques in \AR. }[
\item Spatial selection of a virtual item of an extended display using a hand-held smartphone~\cite{grubert2015multifi}.
\item Displaying the route to follow as an overlay registered on the \RE~\cite{grubert2017pervasive}.
\item Virtual drawing on a tangible object with a hand-held pen~\cite{roo2017onea}.
\item Simultaneous Localization and Mapping (SLAM) techniques such as KinectFusion~\cite{newcombe2011kinectfusion} reconstruct the \RE in real time and enable the \VE to be registered in it.
]
\subfigsheight{36mm}
\subfig{grubert2015multifi}
\subfig{grubert2017pervasive}
\subfig{roo2017onea}
\subfig{newcombe2011kinectfusion}
\end{subfigs}
\paragraph{Manipulating with Virtual Hands}
In the case of immersive \AR with \enquote{natural} interaction (see \cite{billinghurst2005designing}), selection consists in touching the virtual object with the hands, and manipulation in grasping and moving it with the hands.
This is what is called \emph{virtual hands}: the user's virtual hands in the \VE.
The input device is not a controller, as is often the case in \VR, but directly the hands, which are therefore tracked and reproduced in the \VE.
Still, the main problem of natural hand interaction in a \VE, besides hand tracking itself, is the lack of physical constraints on the movements of the hand and fingers, which makes actions tiring~\cite{hincapie-ramos2014consumed}, imprecise (without haptic feedback, one cannot tell whether one is touching the virtual object), and difficult (likewise, without haptic feedback one does not feel the object slip and gets no confirmation that it is firmly in hand).
Interaction techniques thus remain necessary, and haptic feedback adapted to the interaction constraints of \AR is essential for a good user experience.
The state of the interaction can also be difficult to understand: \textcite{chan2010touching} proposed combining continuous feedback, so that the user can situate the tracking of their body, with discrete feedback to confirm their actions.
A visual rendering of the hands is a continuous feedback; a brief change of color or a haptic cue is a discrete feedback.
However, this combination has not been evaluated.
\textcite{hilliges2012holodesk} enabled direct hand manipulation of virtual objects through a situated see-through display.
\textcite{piumsomboon2013userdefined} elicited user-defined gestures for the manipulation of virtual objects in \AR, and \textcite{piumsomboon2014graspshell} compared direct hand manipulation of virtual objects in immersive \AR with vocal commands.
\textcite{chan2010touching} studied visual cues for touching (selecting) virtual objects.
Occlusion is also an issue: virtual objects must always remain visible, either by using a transparent rather than opaque virtual hand, or by displaying their outline when the hand hides them~\cite{piumsomboon2014graspshell}.
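Without haptic feedback, the system must decide from tracking data alone when a fingertip \enquote{touches} a virtual object. A minimal sketch of such a selection test (with hypothetical geometry: the fingertip as a sphere and the object as an axis-aligned box):
\begin{verbatim}
import numpy as np

def fingertip_touches(fingertip_pos, fingertip_radius,
                      box_center, box_half_extents):
    # Sphere vs. axis-aligned box: clamp the fingertip position
    # to the box, then compare the distance with the radius.
    closest = np.clip(fingertip_pos,
                      box_center - box_half_extents,
                      box_center + box_half_extents)
    return np.linalg.norm(fingertip_pos - closest) <= fingertip_radius
\end{verbatim}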
\paragraph{Manipulating with Tangibles}
Tangible objects can act as physical proxies that are grasped and moved in place of the virtual objects~\cite{issartel2016tangible,englmeier2020tangible}, including in OST-AR~\cite{kahl2021investigation,kahl2022influence,kahl2023using}.
This approach raises three problems: a tangible is required for each virtual object, which raises the issue of how many tangibles one can have at hand; the association between tangibles and virtual objects does not always work~\cite{hettiarachchi2016annexing}; and the visual appearance of the virtual object may not match the haptic sensations of the manipulated tangible~\cite{tinguy2019how}.
This is why wearable devices that modify the cutaneous sensations felt on the tangible are a solution that works in \VR~\cite{detinguy2018enhancing,salazar2020altering} and could be adapted to \AR.
However, and this is specific to \AR as opposed to \VR, the tangible and the hand remain at least partially visible, even when hidden by a virtual object: how will such haptic augmentation work in \AR compared to \VR? Will there be perceptual biases? Seeing one's own hand touch the tangible, as opposed to \VR where it is hidden, might make the illusion stronger in \VR.
\subsection{Visual Rendering of Hands in AR}
In \VR, as the user is fully immersed in the \VE and cannot see their real hands, it is necessary to represent them virtually.
The virtual hand representation is known to have an impact on the perception, interaction performance, and preference of users~\cite{prachyabrued2014visual, argelaguet2016role, grubert2018effects, schwind2018touch}.
It influences how an object is grasped in \VR~\cite{prachyabrued2014visual,blaga2020too} and \AR, and even how real bumps and holes are perceived in \VR~\cite{schwind2018touch}, but its effect on the perception of a haptic texture augmentation has not yet been investigated.
In a pick-and-place task in \VR, \textcite{prachyabrued2014visual} found that the virtual hand representation whose motion was constrained to the surface of the virtual objects performed the worst, while the representation following the tracked human hand (thus penetrating the virtual objects) performed the best, even though it was rather disliked.
The authors also observed that the best compromise was a double rendering, showing both the tracked hand and a hand rendering constrained by the virtual environment.
It has also been shown that, compared to a realistic avatar, a skeleton rendering can provide a stronger sense of being in control~\cite{argelaguet2016role}, and that a minimalistic fingertip rendering can be more effective in a typing task~\cite{grubert2018effects}.
\fig{prachyabrued2014visual}{Effect of different hand renderings on a pick-and-place task in \VR~\cite{prachyabrued2014visual}.}
Mutual visual occlusion between a virtual object and the real hand, \ie hiding the virtual object when the real hand is in front of it and hiding the real hand when it is behind the virtual object, is often presented as natural and realistic, enhancing the blending of real and virtual environments~\cite{piumsomboon2014graspshell, al-kalbani2016analysis}.
In VST-AR, this can be addressed as a masking problem, combining the image of the real world captured by the camera with the generated virtual image~\cite{macedo2023occlusion}.
In OST-AR, this is more difficult because the virtual environment is displayed as a transparent \TwoD image on top of the \ThreeD real world, which cannot easily be masked~\cite{macedo2023occlusion}.
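A minimal sketch of this per-pixel compositing in VST-AR (assuming a depth map of the real scene, \eg from a depth camera, and the rendered virtual color and depth buffers; the virtual depth is taken as $+\infty$ where nothing virtual is drawn):
\begin{verbatim}
import numpy as np

def composite_vst(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    # Show a virtual pixel only where virtual content exists and
    # is closer to the camera than the real surface (e.g., the hand).
    virtual_in_front = virtual_depth < real_depth
    mask = virtual_in_front[..., None]  # broadcast over color channels
    return np.where(mask, virtual_rgb, camera_rgb)
\end{verbatim}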
Moreover, in VST-AR, the grip aperture and depth positioning of virtual objects are often misestimated by users~\cite{al-kalbani2016analysis, maisto2017evaluation}.
However, this effect has yet to be verified in an OST-AR setup.
An alternative is to render the virtual objects and the hand semi-transparent, so that each remains partially visible even when one occludes the other, \eg the real hand behind the virtual cube is still visible.
Although perceived as less natural, this seems to be preferred over mutual visual occlusion in VST-AR~\cite{buchmann2005interaction,ha2014wearhand,piumsomboon2014graspshell} and \VR~\cite{vanveldhuizen2021effect}, but has not yet been evaluated in OST-AR.
However, this rendering still causes depth conflicts that make it difficult to determine whether one's hand is behind or in front of a virtual object, \eg the thumb is in front of the virtual cube but appears to be behind it.
In \AR, as the real hand of the user is visible but not physically constrained by the virtual environment, adding a visual hand rendering that can physically interact with virtual objects would achieve a result similar to the promising double rendering of \textcite{prachyabrued2014visual}.
Additionally, \textcite{kahl2021investigation} showed that a virtual object overlaying a tangible object in OST-AR can vary in size without worsening the users' experience nor the performance.
This suggests that a visual hand rendering superimposed on the real hand could be helpful, but should not impair users.
Few works have explored the effect of visual hand rendering in AR~\cite{blaga2017usability, maisto2017evaluation, krichenbauer2018augmented, yoon2020evaluating, saito2021contact}.
For example, \textcite{blaga2017usability} evaluated a skeleton rendering in several virtual object manipulations against no visual hand overlay.
Performance did not improve, but participants felt more confident with the virtual hand.
However, the experiment was carried out on a screen, in a non-immersive AR scenario.
\textcite{saito2021contact} found that masking the real hand with a textured 3D opaque virtual hand did not improve performance in a reach-to-grasp task but displaying the points of contact on the virtual object did.
To the best of our knowledge, the role of a visual rendering of the hand displayed (and seen) directly over the real tracked hands in immersive OST-AR has not been explored, particularly in the context of virtual object manipulation.
Moreover, this raises the question of the hand representation, which has been shown to affect performance and user experience in \VR but remains little studied in \AR.
\subsection{Conclusion}
\label{ar_conclusion}
\AR systems integrate virtual objects into the visual perception as if they were part of the \RE.
\AR headsets now enable real-time tracking of the head and hands, and high-quality display of virtual content, while being portable and mobile.
They enable highly immersive \AEs that users can explore with a strong sense of the presence of the virtual content.
However, without a direct and seamless interaction with the virtual objects using the hands, the coherence of the \AE experience is compromised.
In particular, there is a lack of mutual occlusion and interaction cues between hands and virtual objects in \OST-\AR that could be mitigated by visual rendering of the hand.
A common alternative approach is to use tangible objects as proxies for interaction with virtual objects, but this raises concerns about their number and their association with virtual objects, as well as their consistency with the visual rendering.
In this context, the use of wearable haptic systems worn on the hand seems to be a promising solution both for improving direct hand manipulation of virtual objects and for coherent visuo-haptic augmentation of touched tangible objects.