Abstract
Banding is manifested as false contours in otherwise smooth regions of an image or a video. Banding artifacts can arise for many different reasons, but one of the most prominent causes is quantization inside a video encoder. Compared to other types of artifacts common in video processing, e.g., blur, ringing, or blockiness, a relatively small change in the original pixel values is enough to produce an easily noticeable and visually very annoying case of banding. This property makes banding very difficult to capture with generic objective quality metrics such as PSNR or VMAF [10], which creates the need for a distortion-specific detector targeted directly at banding artifacts.
Most previous attempts to solve this problem treated it as detection of false segments or false edges. Both block-based [7, 14] and pixel-based [2, 3, 15] segmentation methods have been tried in the first category, while the edge-based methods exploit various local statistics such as gradients, contrast, or entropy [4, 6, 9, 13]. The main difficulty for all of these approaches is distinguishing between real and false edges or segments. Recently, banding detection has also been addressed with deep neural networks [8].
The above-mentioned approaches have been developed for 8-bit content and are mostly tuned towards the banding artifacts occurring in user-generated images and videos. Moreover, they do not address the potential presence of dithering, i.e., intentionally inserted noise used to randomize the error caused by quantization. Dithering is commonly used during bit-depth conversion and is often enabled by default in popular image and video processing tools, such as ffmpeg [12]. Despite being highly effective in reducing the perceived banding, dithering does not suppress the false contours completely, and thus needs to be factored into a reliable banding detector.
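To illustrate why dithering matters, the following minimal Python sketch requantizes a smooth 10-bit gradient to 8 bits with and without added noise. It is only a conceptual example, assuming a simple uniform-noise dither before rounding; it does not reproduce the dithering algorithms used by ffmpeg or other tools.

```python
import numpy as np

def requantize(luma_10bit: np.ndarray, dither: bool = False) -> np.ndarray:
    """Reduce 10-bit luma to 8 bits, optionally adding dithering noise.

    Without dithering, the quantization error is spatially correlated and
    produces visible false contours (banding) in smooth gradients. With
    dithering, a small random offset decorrelates the error, trading
    banding for low-amplitude noise.
    """
    scaled = luma_10bit.astype(np.float64) / 4.0  # 10-bit -> 8-bit range
    if dither:
        scaled = scaled + np.random.uniform(-0.5, 0.5, size=scaled.shape)
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

# A smooth 10-bit gradient: plain rounding collapses it into wide uniform
# bands, while dithered rounding breaks the band boundaries up into noise.
gradient = np.tile(np.linspace(400, 440, 1920), (1080, 1)).astype(np.uint16)
banded = requantize(gradient, dither=False)
dithered = requantize(gradient, dither=True)
print(len(np.unique(banded[0])), len(np.unique(dithered[0])))
```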
Our goal was, therefore, to develop an algorithm capable of evaluating perceived banding in professionally generated videos processed in ways relevant to the adaptive streaming scenario (i.e., video compression and scaling). The requirements also included the ability to capture the effect of dithering and to work on both 8-bit and 10-bit content.
We hereby present CAMBI, a Contrast Aware Multiscale Banding Index. CAMBI is a white-box solution to the problem described above, derived from basic principles of human vision, with just a few perceptually motivated parameters. The first version was introduced at PCS 2021 [11]. Here, we also present several improvements made since then.
CAMBI consists of three main steps: input preprocessing, multiscale banding confidence calculation, and spatio-temporal pooling. Although it has been shown that chromatic banding exists [5], like most past works we assume that most of the banding can be captured in the luma channel. The preprocessing step therefore consists of luma channel extraction, followed by filtering to account for dithering, and computation of a spatial mask to exclude textured regions. Banding confidence is calculated for 4 brightness level differences on 5 scales, taking into account the contrast perception of the human visual system. This yields 20 banding confidence maps per frame, which are pooled spatially by considering only a certain percentage of the highest banding confidence values. This mechanism ensures that banding appearing even in a relatively small area of the frame is captured in proportion to its perceptual importance. Finally, the scores from different video frames are pooled into a single banding index.
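For concreteness, the following Python sketch mimics the structure of the multiscale confidence computation and pooling described above under simplifying assumptions. The window size, the per-pixel confidence measure, the pooling percentile, and the texture mask handling are illustrative placeholders only and do not reproduce the actual CAMBI computation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

# Illustrative parameters; the values used in CAMBI are perceptually
# motivated and may differ.
NUM_DIFFS = 4          # brightness level differences considered
NUM_SCALES = 5         # spatial scales (successive 2x downsampling)
POOL_PERCENTILE = 99   # spatial pooling keeps only the top confidences

def frame_banding_score(luma: np.ndarray, texture_mask: np.ndarray) -> float:
    """Toy per-frame banding score: 20 confidence maps, spatially pooled."""
    maps = []
    for scale in range(NUM_SCALES):
        step = 2 ** scale
        ds = luma[::step, ::step].astype(np.int32)
        mask = texture_mask[::step, ::step]
        local_mean = uniform_filter(ds, size=5)
        for diff in range(1, NUM_DIFFS + 1):
            # Crude evidence of a false contour: pixels whose value deviates
            # from the local mean by exactly `diff` luma steps.
            evidence = (np.abs(ds - local_mean) == diff).astype(np.float64)
            maps.append(np.where(mask, 0.0, evidence))  # ignore textured areas
    # Spatial pooling over only the highest-confidence fraction of each map,
    # so banding confined to a small region is not averaged away.
    pooled = [m[m >= np.percentile(m, POOL_PERCENTILE)].mean() for m in maps]
    return float(max(pooled))

def video_banding_index(frames, masks) -> float:
    # Temporal pooling of per-frame scores into a single banding index
    # (a plain average here; the actual temporal pooling may differ).
    return float(np.mean([frame_banding_score(f, m)
                          for f, m in zip(frames, masks)]))
```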
To test the accuracy of CAMBI, we conducted a subjective test on 86 video clips created from 9 different sources from the Netflix catalog using different levels of compression and scaling, with and without dithering. The ground-truth mean opinion scores (MOS) were obtained from 26 observers who were asked to rate the annoyance of the banding in the scene on a continuous impairment scale annotated with 5 equidistant labels (imperceptible, perceptible but not annoying, slightly annoying, annoying, very annoying) [1]. CAMBI achieved a correlation exceeding 0.94 in terms of both the Pearson Linear Correlation Coefficient (PLCC) and the Spearman Rank Order Correlation Coefficient (SROCC), significantly outperforming state-of-the-art banding detectors for our use case.
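For reference, PLCC and SROCC between MOS and objective scores can be computed as in the sketch below; the score lists are hypothetical placeholders. Depending on the polarity of the scores (a banding index rises as quality drops), the raw correlation may be negative, in which case its magnitude is what is reported, and PLCC is often computed after a monotonic regression.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical placeholder values: ground-truth MOS from a subjective test
# and the corresponding objective banding scores for the same clips.
mos = [92.0, 75.5, 61.0, 43.2, 88.7, 30.1]
banding_scores = [0.2, 1.1, 2.3, 3.8, 0.4, 4.6]

plcc, _ = pearsonr(mos, banding_scores)    # Pearson linear correlation
srocc, _ = spearmanr(mos, banding_scores)  # Spearman rank-order correlation
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```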
CAMBI is currently used alongside VMAF in our production pipeline to improve the quality of encodes prone to banding. In the future, we plan to integrate it into VMAF as one of its features, making VMAF capable of accurately evaluating video quality in the presence of banding as well as other artifacts.