Sparse LIDAR Measurement Fusion with Joint Updating Cost for Fast Stereo Matching
The complementary virtues of active and passive depth sensors inspire LIDAR-stereo fusion for enhancing the accuracy of stereo matching. However, most fusion-based stereo matching algorithms have exploited dense LIDAR priors with single fusion ...
Online Learning for Adaptive Video Streaming in Mobile Networks
- Theodoros Karagkioules,
- Georgios S. Paschos,
- Nikolaos Liakopoulos,
- Attilio Fiandrotti,
- Dimitrios Tsilimantos,
- Marco Cagnazzo
In this paper, we propose a novel algorithm for video bitrate adaptation in HTTP Adaptive Streaming (HAS), based on online learning. The proposed algorithm, named Learn2Adapt (L2A), is shown to provide a robust bitrate adaptation strategy which, unlike ...
Modeling the User Experience of Watching 360° Videos with Head-Mounted Displays
Conducting user studies to quantify the Quality of Experience (QoE) of watching increasingly popular 360° videos in Head-Mounted Displays (HMDs) is time-consuming, tedious, and expensive. Deriving QoE models, however, is very challenging because ...
TTV Regularized LRTA Technique for the Estimation of Haze Model Parameters in Video Dehazing
Nowadays, intelligent transport systems play a major role in providing safe and secure traffic for passengers, pedestrians, and vehicles. However, adverse weather conditions such as haze or fog can degrade the visual clarity of video footage ...
MMSUM Digital Twins: A Multi-view Multi-modality Summarization Framework for Sporting Events
Sporting events generate a massive amount of traffic on social media, with live moment-to-moment accounts as any given situation unfolds. The generated data are intensified by fans' feelings, reactions, and subjective opinions towards what happens during ...
Multi-feature Fusion VoteNet for 3D Object Detection
In this article, we propose a Multi-feature Fusion VoteNet (MFFVoteNet) framework for improving 3D object detection performance in cluttered and heavily occluded scenes. Our method takes the point cloud and the synchronized RGB image as inputs to ...
A Novel Multi-Modal Network-Based Dynamic Scene Understanding
In recent years, dynamic scene understanding has gained attention from researchers because of its widespread applications. The key to successfully understanding dynamic scenes lies in jointly representing the appearance and motion ...
Facial-expression-aware Emotional Color Transfer Based on Convolutional Neural Network
Emotional color transfer aims to change the evoked emotion of a source image to that of a target image by adjusting its color distribution. Most existing emotional color transfer methods only consider the low-level visual features of an image and ignore ...
The Impact of Artificial Intelligence on the Creativity of Videos
This study explored the impact Artificial Intelligence (AI) has on the evaluation of creative elements in artistic videos. The aim was to verify to what extent the use of an AI algorithm (Style Transfer) contributes to changes in the perceived creativity ...
Learning Hierarchical Video Graph Networks for One-Stop Video Delivery
The explosive growth of video data has brought great challenges to video retrieval, which aims to find related videos in a video collection. Most users are not interested in all the content of retrieved videos but have a more fine-grained ...
Mask-Guided Deformation Adaptive Network for Human Parsing
Due to the challenges of densely compacted body parts, nonrigid clothing items, and severe overlap in crowd scenes, human parsing needs to focus more on multilevel feature representations compared to general scene parsing tasks. Based on this observation, ...
Mimicking Individual Media Quality Perception with Neural Network based Artificial Observers
The media quality assessment research community has traditionally focused on developing objective algorithms to predict the result of a typical subjective experiment in terms of Mean Opinion Score (MOS) value. However, the MOS, being a single value, ...
Diversely-Supervised Visual Product Search
This article strives for a diversely supervised visual product search, where queries specify a diverse set of labels to search for. Where previous works have focused on representing attribute, instance, or category labels individually, we consider them ...
CAPTAIN: Comprehensive Composition Assistance for Photo Taking
Many people are interested in taking astonishing photos and sharing them with others. Emerging high-tech hardware and software facilitate the ubiquity and functionality of digital photography. Because composition matters in photography, researchers ...
Defining Scents: A Systematic Literature Review of Olfactory-based Computing Systems
The human sense of smell is a primal ability that has the potential to reveal unexplored relationships between user behaviors and technology. Humans use millions of olfactory receptor cells to observe the environment around them. Olfaction studies are ...
Hyperspectral Image Reconstruction Using Multi-scale Fusion Learning
Hyperspectral imaging is a promising imaging modality that simultaneously captures several images of the same scene in narrow spectral bands, and it has made considerable progress in different fields, such as agriculture, astronomy, and surveillance. ...
An Empirical Method for Causal Inference of Constructs for QoE in Haptic–Audiovisual Communications
This article proposes an empirical method for inferring causal directions in multidimensional Quality of Experience (QoE) in multimedia communications, noting that causation in QoE is perceptual. As an example modeling framework, we adopt a Bayesian ...
RD-IOD: Two-Level Residual-Distillation-Based Triple-Network for Incremental Object Detection
As a basic component in multimedia applications, object detectors are generally trained on a fixed set of predefined classes. In practice, however, new object classes often emerge after the models are trained. Modern object detectors based on ...
Optimizing Immersive Video Coding Configurations Using Deep Learning: A Case Study on TMIV
Immersive video streaming technologies improve Virtual Reality (VR) user experience by providing users with more intuitive ways to move in simulated worlds, e.g., with the 6 Degree-of-Freedom (6DoF) interaction mode. A naive method to achieve 6DoF is deploying ...
Robust Unsupervised Gaze Calibration Using Conversation and Manipulation Attention Priors
Gaze estimation is a difficult task, even for humans. However, as humans, we are good at understanding a situation and exploiting it to guess the expected visual focus of attention of people, and we usually use this information to retrieve people’s gaze. ...
LogoDet-3K: A Large-scale Image Dataset for Logo Detection
Logo detection has been gaining considerable attention because of its wide range of applications in the multimedia field, such as copyright infringement detection, brand visibility monitoring, and product brand management on social media. In this article, ...
Authentication of LINE Chat History Files by Information Hiding
With the prevalence of smartphones, message exchanges via mobile chat programs like LINE have become popular. The messages in the form of chat records in a LINE chat history, after being downloaded for legal uses, might be tampered with illicitly. A ...