We address the task of monocular visual head tracking in the context of applications involving human-robot interaction, where both near-field and far-field tracking settings can occur and real-time constraints are imposed. The original contribution of this paper is a real-time multi-person tracking model that combines a priori texture and colour models for different head poses with face detectors for different face orientations. We show that such a combination improves tracker performance significantly. At the same time, the proposed model takes into account the major difficulties related to real-time data processing (non-uniform observations, processing-time restrictions). The model is evaluated on a set of realistic scenarios recorded on a humanoid robot that involve interactions between the robot and the participants, with robot motion, unconstrained displacement of the participants, lighting variations, etc. The algorithm runs in real time and shows significant improveme...
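The combination of appearance cues described above can be illustrated by fusing a face-detector confidence with a colour-model similarity into one observation likelihood. The sketch below is a minimal, hypothetical illustration in numpy (the Bhattacharyya coefficient and the convex-combination weight `alpha` are common choices, not details confirmed by the abstract):

```python
import numpy as np

def color_likelihood(hist_obs, hist_model):
    # Bhattacharyya coefficient between two normalized colour histograms;
    # equals 1.0 when the histograms are identical.
    return float(np.sum(np.sqrt(hist_obs * hist_model)))

def fused_likelihood(detector_score, hist_obs, hist_model, alpha=0.5):
    # Convex combination of a face-detector confidence and the colour
    # similarity; alpha is an illustrative weighting, not the paper's value.
    return alpha * detector_score + (1.0 - alpha) * color_likelihood(hist_obs, hist_model)
```

In a tracker, such a fused score would weight each head hypothesis before the state update; here it merely shows how texture/colour models and detectors can be combined into a single measurement.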
Gaze estimation methods usually regress gaze directions directly from a single face or eye image. However, due to important variabilities in eye shapes and inner eye structures amongst individuals, universal models obtain limited accuracies, and their outputs usually exhibit high variance as well as subject-dependent biases. Therefore, accuracy is usually increased through calibration, allowing gaze predictions for a subject to be mapped to his/her specific gaze. In this paper, we introduce a novel image-differential method for gaze estimation. We propose to directly train a convolutional neural network to predict the gaze differences between two eye input images of the same subject. Then, given a set of subject-specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by allowing the comparison between two eye images, annoyance factors (alignment, eyelid closing, illumination perturbati...
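The calibration-based inference step described above can be sketched generically: for each calibration image with known gaze, the network predicts the gaze difference to the query image, and the per-calibration estimates are averaged. The snippet below is a minimal numpy sketch of that combination rule only; `diff_net` stands in for the trained CNN and is an assumed interface, not the paper's model:

```python
import numpy as np

def predict_gaze(diff_net, calib_feats, calib_gazes, query_feat):
    # diff_net(query, calib) returns the predicted gaze difference
    # (query gaze minus calibration gaze) for a pair of eye images.
    # Adding it to each known calibration gaze yields one estimate per
    # calibration sample; averaging reduces the per-pair noise.
    estimates = [gaze + diff_net(query_feat, feat)
                 for feat, gaze in zip(calib_feats, calib_gazes)]
    return np.mean(estimates, axis=0)
```

With a perfect difference predictor, every calibration sample yields the same answer; in practice the averaging over several subject-specific samples is what removes the subject-dependent bias of a universal regressor.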
We address the problem of 3D gaze estimation within a 3D environment from remote sensors, which is highly valuable for applications in human-human and human-robot interaction. Contrary to most previous works, which are limited to screen-gazing applications, we propose to leverage the depth data of RGB-D cameras to perform accurate head pose tracking, acquire head pose invariance through a 3D rectification process that renders head-pose-dependent eye images into a canonical viewpoint, and compute the line of sight in 3D space. To address the low-resolution issue of the eye images resulting from the use of remote sensors, we rely on the appearance-based gaze estimation paradigm, which has demonstrated robustness against this factor. In this context, we conduct a comparative study of recent appearance-based strategies within our framework, study the generalization of these methods to unseen individuals, and propose a cross-user eye image alignment technique relying on the direct reg...
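The idea of rendering head-pose-dependent eye images into a canonical viewpoint can be illustrated with the standard rotation-induced homography H = K R^T K^{-1} (valid for a pure camera/head rotation). This is a generic geometric sketch under that assumption, not the paper's 3D mesh-based rectification:

```python
import numpy as np

def rectifying_homography(K, R_head):
    # Homography mapping pixels observed under head rotation R_head back to
    # a canonical (frontal) viewpoint, assuming a pure rotation model:
    # H = K R_head^T K^{-1}, with K the camera intrinsics.
    return K @ R_head.T @ np.linalg.inv(K)

def warp_nearest(img, H):
    # Backward warping with nearest-neighbour sampling (illustrative only).
    h, w = img.shape
    out = np.zeros_like(img)
    H_inv = np.linalg.inv(H)
    for y in range(h):
        for x in range(w):
            src = H_inv @ np.array([x, y, 1.0])
            xi = int(round(src[0] / src[2]))
            yi = int(round(src[1] / src[2]))
            if 0 <= xi < w and 0 <= yi < h:
                out[y, x] = img[yi, xi]
    return out
```

When the head is already frontal (R_head = I), the homography is the identity and the eye image is unchanged; for a rotated head, the warp brings the eye region back to the canonical frame before appearance-based gaze regression.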
Thanks to the digital preservation of cultural heritage materials, multimedia tools (e.g., based on automatic visual processing) considerably ease the work of scholars in the humanities and help them perform quantitative analyses of their data. In this context, this article assesses three different Convolutional Neural Network (CNN) architectures along with three learning approaches to train them for hieroglyph classification, a very challenging task due to the limited availability of segmented ancient Maya glyphs. More precisely, the first approach, the baseline, relies on pretrained networks as feature extractors. The second investigates a transfer learning method by fine-tuning a pretrained network for our glyph classification task. The third considers directly training networks from scratch with our glyph data. The merits of three different network architectures are compared: a generic sequential model (i.e., LeNet), a sketch-specific sequential network (...
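The baseline approach (a pretrained network used as a frozen feature extractor, with a simple classifier on top) can be sketched abstractly. Below, a fixed projection with a ReLU stands in for the frozen pretrained CNN, and a nearest-centroid rule stands in for the classifier; both are illustrative placeholders, not the architectures compared in the article:

```python
import numpy as np

def extract_features(x, W_frozen):
    # Stand-in for a pretrained network used as a frozen feature extractor:
    # a fixed projection followed by ReLU; W_frozen is never updated.
    return np.maximum(0.0, x @ W_frozen)

def fit_centroids(feats, labels):
    # Train only the classifier on top of the frozen features:
    # here, one mean feature vector (centroid) per class.
    classes = np.unique(labels)
    return classes, np.stack([feats[labels == c].mean(axis=0) for c in classes])

def predict(feats, classes, centroids):
    # Assign each sample to the class of its nearest centroid.
    d = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]
```

Fine-tuning (the second approach) differs only in that the extractor weights would also be updated on the glyph data, which matters precisely when labeled glyphs are scarce.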
Shape representations are critical for visual analysis of cultural heritage materials. This article studies two types of shape representations in a bag-of-words-based pipeline to recognize Maya glyphs. The first is a knowledge-driven Histogram of Orientation Shape Context (HOOSC) representation, and the second is a data-driven representation obtained by applying an unsupervised Sparse Autoencoder (SA). In addition to the glyph data, the generalization ability of the descriptors is investigated on a larger-scale sketch dataset. The contributions of this article are four-fold: (1) an evaluation of the performance of a data-driven autoencoder approach for shape representation; (2) a comparative study of the hand-designed HOOSC and the data-driven SA; (3) an experimental protocol to assess the effect of the different parameters of both representations; and (4) bridging the humanities and computer vision/machine learning for Maya studies, specifically for the visual analysis of glyphs. From our experi...
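The data-driven SA descriptor comes from the hidden layer of a sparse autoencoder: reconstruction error plus a KL-divergence penalty that pushes the average hidden activation towards a small target. The snippet below sketches that objective in numpy for a single hidden layer; the hyperparameters `rho` and `beta` are conventional illustrative values, not the article's settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sa_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0):
    # Forward pass of a one-hidden-layer sparse autoencoder.
    H = sigmoid(X @ W1 + b1)        # hidden code: the learned shape descriptor
    X_hat = sigmoid(H @ W2 + b2)    # reconstruction of the input patch
    recon = np.mean((X_hat - X) ** 2)
    # Sparsity penalty: KL divergence between the target activation rho
    # and the empirical mean activation of each hidden unit.
    rho_hat = H.mean(axis=0)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return recon + beta * kl, H
```

Training would minimize this loss over glyph patches (e.g., by gradient descent); the resulting hidden codes H then replace the hand-designed HOOSC descriptor in the bag-of-words pipeline.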