A two-stage algorithm for the early detection of zero-quantized discrete cosine transform coefficients in High Efficiency Video Coding
EURASIP Journal on Image and Video Processing volume 2017, Article number: 56 (2017)
Abstract
For High Efficiency Video Coding (HEVC) using block-based prediction, the discrete cosine transform (DCT), and quantization, a large number of DCT coefficients in a transform block (TB) are commonly found to be quantized to zero. A two-dimensional transform in HEVC is usually implemented by first applying a butterfly-based one-dimensional (1D) DCT to each row of the residual block, followed by a 1D DCT to each column. Accordingly, we propose a two-stage method for the early detection of zero-quantized DCT coefficients in this paper. In the first stage, a distribution-based method, which uses the sum of absolute differences as the threshold criterion and compares it to threshold values obtained for all columns, is employed to detect zero-quantized columns in the pixel domain prior to conducting the actual DCT. After conducting the 1D row DCT, the second detection stage, which uses the intermediate matrix resulting from the row transform as the input, is applied in the transform domain only to those columns predicted to contain non-zero coefficients. As an orthogonal transform, the 1D row DCT tends to pack a large fraction of the signal energy into relatively few coefficients. Therefore, the second stage of our algorithm is more effective than methods that conduct detection in the pixel domain, particularly for TBs greater than 8×8. Experimental results demonstrate that an HEVC encoder with our proposed two-stage algorithm can dramatically reduce unnecessary 1D row and column DCT operations compared with a standard encoder and is therefore applicable to practical HEVC encoder implementations.
1 Introduction
High Efficiency Video Coding (HEVC) design follows the classic block-based hybrid video coding approach [1]. For HEVC using block-based intra- or inter-prediction, the discrete cosine transform (DCT), and quantization, a substantial number of DCT coefficients of the residual block are commonly found to be quantized to zero, particularly when the quantization parameter (QP) is large. Therefore, considerable computation can be avoided if the transform blocks (TBs) containing DCT coefficients that are uniformly zero (i.e., all-zero blocks (AZBs)) and TBs where all the DCT coefficients in a specified sub-block are zeros (i.e., partial-zero blocks (PZBs)) [2] can be detected prior to conducting the DCT operations and quantization.
The problems involved with the early detection of AZBs and zero sub-blocks for 4×4 DCT in H.264/AVC [3–6] and 8×8 DCT in MPEG-2/4 and H.263 [2, 7–10] have been extensively studied. However, the methods developed for these standards are not well suited for HEVC. The main reason is that HEVC supports larger TB sizes (i.e., 16×16 and 32×32) than those employed in prior standards, and the detection efficiency will be degraded when applying methods developed for 8×8 or 4×4 blocks to larger TBs. More recently, various schemes have been developed for detecting variable sized AZBs in HEVC [11–14]. These methods typically predict AZBs by setting the thresholds according to the sum of absolute differences (SAD) [11, 13, 14], sum of absolute transformed differences (SATD) [13], and sum of squared differences (SSD) [13]. However, we have observed in experiments that the number of AZBs relative to the total number of TBs decreases with increasing TB size, as would be expected. Therefore, even though AZB detection algorithms are efficient, their capacity to reduce the computational burden is limited.
In HEVC, the two-dimensional (2D) forward DCT of a TB is implemented by first applying a butterfly-based one-dimensional (1D) DCT [15] to each row of the residual block, followed by applying a 1D DCT to each column. As an orthogonal transform, the 1D DCT tends to pack a large fraction of the signal energy into relatively few elements of the intermediate matrix. In this paper, we propose a two-stage method that capitalizes on this property for the early detection of zero-quantized columns in HEVC. In the first stage, a distribution-based method, which uses the SAD as the threshold criterion and compares it to threshold values obtained for all columns, is employed to detect zero-quantized columns in the pixel domain before conducting the actual DCT. After conducting the 1D row DCT, the second detection stage, which uses the intermediate matrix resulting from the row transform as the input, is applied in the transform domain only to those columns predicted to contain non-zero coefficients. Therefore, in contrast to most existing techniques, the proposed algorithm performs detection not only in the pixel domain but also in the transform domain after conducting the 1D row DCT. Furthermore, the proposed method regards AZB detection as a special case of zero-quantized column detection, where an AZB is a TB whose columns are all zero-quantized columns. Experimental results demonstrate that the proposed two-stage algorithm can detect zero-quantized columns more effectively than existing methods, particularly for TBs greater than 8×8.
The remainder of this paper is organized as follows. Section 2 discusses previous studies related to the early detection of AZBs and PZBs for 4×4 and 8×8 DCT. Section 3 presents our proposed two-stage method for detecting zero-quantized columns in HEVC. Experimental results are given in Section 4. We conclude our work in Section 5.
2 Related work
Existing methods for the early detection of zero-quantized DCT coefficients typically aim at detecting the entire residual block as an AZB or as a PZB. In addition, most existing detection methods are threshold-based, and the SAD of the motion compensation block, which can be obtained during motion estimation, has been the most widely used threshold selection criterion. Some variants of the SAD have also been employed. Examples include row-based SAD [2] and the maximum value of the sum of positive residuals and the absolute value of negative residuals [10]. Methods for deriving the thresholds can be roughly classified into three categories: upper bound-based [7, 9, 10, 16], distribution-based [8, 17], and hybrid methods that employ multiple thresholds derived using methods in the first two categories [5, 18].
Upper bound-based methods first derive sufficient conditions for judging the existence of AZBs and PZBs. These methods estimate the upper bound of a DCT coefficient F(u,v) as a product of a scalar α and the SAD [7, 9, 10]:
\[ \left| F(u,v) \right| \le \alpha \cdot \mathrm{SAD} \qquad (1) \]
where α is determined by a theoretical analysis of the properties of the DCT. Among these methods, Zhou et al. [7] provided a larger value α (i.e., \(\frac {1}{4}\) for an 8×8 DCT), which provided a more stringent condition for detecting AZBs. Sousa [9] refined the model proposed by Zhou et al. and derived a more precise α (i.e., \(\alpha = \frac {1}{4}\cos ^{2} \left ({\frac {\pi }{{16}}} \right)\)). Ji et al. [2] refined Sousa’s model using a combination of SAD and row-based SAD as the threshold criterion. Their method not only derived a more precise sufficient condition for detecting AZBs than Sousa’s but also provided conditions for the early detection of PZBs with 34 and 16 zero-quantized coefficients in an 8×8 block. After the H.264 standard was launched to address a new generation of multimedia applications, the implications of the employment by the standard of an integer 4×4 DCT and scaling multiplication to avoid division for quantization were studied [3, 4], and an upper bound-based method was extended to AZB detection in H.264/AVC. The advantage of upper bound-based methods is that they do not incur a false positive error. However, these methods determine the thresholds by approximating the individual elements of a transform matrix by a theoretical maximum value at different frequency positions. The deviation between the upper bound and the actual maximum value of the DCT coefficients increases as the TB size increases. Therefore, extending these methods to TBs greater than 8×8 reduces the detection ratio. Taking a 16×16 TB, for example, the SAD value is on average four times the SAD value of any of its four quadrants (i.e., 8×8 blocks). Using the method proposed in [9], the sufficient condition for AZB detection is about twice that for an 8×8 block.
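The upper bound-based idea can be checked numerically. The following is a minimal sketch, assuming a floating-point orthonormal 8×8 DCT rather than any particular codec's integer transform; the names `dct2`, `is_all_zero_block`, and `zero_bound` (the magnitude below which a coefficient is taken to quantize to zero) are illustrative and do not come from the cited works.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix: a[u][x] = k_u * cos((2x + 1) * u * pi / (2n))."""
    rows = []
    for u in range(n):
        k = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([k * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def dct2(block):
    """2D DCT F = A X A^T by direct summation (clear rather than fast)."""
    n = len(block)
    a = dct_matrix(n)
    return [[sum(a[u][x] * a[v][y] * block[x][y]
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def sad(block):
    """Sum of absolute values of the residual samples."""
    return sum(abs(p) for row in block for p in row)


ALPHA_ZHOU = 0.25                                  # alpha of [7] for an 8x8 DCT
ALPHA_SOUSA = 0.25 * math.cos(math.pi / 16) ** 2   # refined alpha of [9]


def is_all_zero_block(block, zero_bound, alpha=ALPHA_SOUSA):
    """Sufficient test: if alpha * SAD < zero_bound, every |F(u,v)| < zero_bound."""
    return alpha * sad(block) < zero_bound


random.seed(0)
residual = [[random.randint(-8, 8) for _ in range(8)] for _ in range(8)]
largest = max(abs(c) for row in dct2(residual) for c in row)
# The largest coefficient magnitude never exceeds alpha * SAD.
print(largest <= ALPHA_SOUSA * sad(residual))
```

Because the test is only a sufficient condition, it never flags a block that contains a non-zero quantized coefficient, but it can miss blocks that are in fact all-zero.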
Distribution-based methods employ Gaussian or Laplacian distributions to model the residual pixels and derive multiple thresholds based on a theoretical analysis of the relationship between the distribution of the residual pixels and the distribution of the DCT coefficients [8, 17]. These methods derive AZB and PZB thresholds under various assumptions, e.g., the residual pixels in a motion-compensated frame follow a Laplacian distribution [8] or a Gaussian distribution [17], and the distribution has a zero mean and a separable covariance. Furthermore, distribution-based methods discriminate DCT coefficients using a probability framework based on three-sigma limits, where sigma (σ) denotes the standard deviation of the distribution. Based on the assumptions of the method, the residual pixels follow a zero-mean Laplacian or Gaussian distribution and the distribution of the DCT coefficients at a given frequency position will fall within the range (−3σ,3σ) with a very high probability. Because these assumptions may fail, distribution-based methods may induce false positive detection and result in video quality degradation.
Let X denote an N×N residual block. In HEVC, the approximated N×N integer DCT is
\[ F = C_N X C_N^{T} \qquad (2) \]
where \(C_N\) is an N×N core transform matrix. The elements of \(C_N\) are derived by approximating the scaled DCT basis functions. Details regarding the core transform matrix for different transform sizes have been presented in a previous study [19]. The 2D forward DCT in (2) is implemented by first applying a butterfly-based 1D DCT [15] to each row of the residual block, that is, \(Y = C_N X^{T}\), followed by the 1D column DCT \(F = C_N Y^{T}\). In such an implementation, even if only a few coefficients are predicted as non-zero, the 1D row and column transforms cannot be skipped. As a result, the capacity of these methods to reduce the computational burden is limited (a detailed illustration of this phenomenon and its cause can be found in [17]). Moreover, HEVC supports larger TB sizes (up to 32×32) than those supported in previous standards.
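The equivalence between the two 1D passes and the matrix form in (2) can be illustrated with the 4×4 HEVC core transform matrix. This is a sketch only: the intermediate right shifts and rounding applied by a real encoder are deliberately omitted, and the function names are hypothetical.

```python
# 4x4 HEVC core transform matrix (integer approximation of the scaled DCT).
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_2d_two_pass(x, c):
    """Row pass Y = C X^T, then column pass F = C Y^T (no shifts applied)."""
    y = matmul(c, transpose(x))
    return matmul(c, transpose(y))


def forward_2d_direct(x, c):
    """Reference form F = C X C^T."""
    return matmul(matmul(c, x), transpose(c))


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(forward_2d_two_pass(x, C4) == forward_2d_direct(x, C4))
```

Each column of the result depends on exactly one row of the intermediate matrix Y, which is why a column can be skipped independently of the others once it is known to quantize to zero.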
The above analysis of past work indicates that it is necessary to develop a method of AZB and PZB detection that conforms well to the separated 1D row and column transform structure and larger TB sizes employed in HEVC.
3 The proposed method
In the first stage, the distribution-based methods presented in [8, 20], which were developed for an 8×8 DCT, are extended to zero-quantized column detection for variable TB sizes ranging from 4×4 to 32×32. After conducting the first-stage detection process, a butterfly-based 1D DCT is applied to each row of the residual block, and an intermediate matrix is obtained, which serves as the input of the second stage of the detection process. The second stage seeks to detect zero-quantized columns in the transform domain and applies 1D DCT only to those columns predicted to contain non-zero coefficients.
3.1 First-stage detection
In block-based video coding, the pixels of an N×N residual block are often statistically modeled by a Laplacian distribution with a zero mean, and as having a separable covariance [20]. Because the pixels within a residual block are often assumed to be identically distributed, the weighted summation in (2) for a given F(u,v) can be approximated as having a Gaussian distribution according to the central limit theorem [21].
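For F(u,v), the variance \( \sigma _{F}^{2} \left ({u,v} \right)\) can be statistically calculated as
\[ \sigma_{F}^{2}(u,v) = \sigma_{f}^{2} \left[ C_N R C_N^{T} \right]_{u,u} \left[ C_N R C_N^{T} \right]_{v,v} \qquad (3) \]
where \(\sigma _{f}^{2}\) is the variance of the residual signals; \([\cdot]_{u,u}\) and \([\cdot]_{v,v}\) denote the (u,u)th and (v,v)th components of the matrix, respectively; and R is the correlation matrix defined as below [22].
\[ R = \begin{bmatrix} 1 & \rho & \cdots & \rho^{N-1} \\ \rho & 1 & \cdots & \rho^{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \cdots & 1 \end{bmatrix} \qquad (4) \]
Here, 0<|ρ|<1 is the one-step correlation coefficient of the residual pixels and is typically assumed as an a priori value [8, 20].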
For a given F(u,v), different values of ρ yield different \(\sigma _{F}^{2} \left ({u,v} \right)\). However, some common conclusions can be obtained from (3) and (4). First, the variance of the DC coefficient (i.e., \(\sigma _{F}^{2} \left ({0,0} \right)\)) is greater than the variances of the AC coefficients [8]. Second, for a given column, say the vth column, \(\sigma _{F}^{2} \left ({0,v} \right)\) is always greater than the variances of the other coefficients in the column, i.e.,
\[ \sigma_{F}^{2}(0,v) > \sigma_{F}^{2}(u,v), \quad u = 1, \ldots, N-1 \qquad (5) \]
Third, for any two columns \(v_1\) and \(v_2\), if \(v_1 < v_2\), then
\[ \sigma_{F}^{2}(0,v_1) > \sigma_{F}^{2}(0,v_2) \qquad (6) \]
Because the distribution of the DCT coefficients at each frequency can be modeled well by a Gaussian distribution with zero mean and variance \(\sigma _{F}^{2} \left ({u,v} \right)\), the coefficients will, with a very high probability, fall within the range \([-\beta \sigma_{F}(u,v), \beta \sigma_{F}(u,v)]\) for β≥3.0. In [8, 20], \(\sigma_f\) was approximated by
\[ \sigma_{f} \approx \frac{\sqrt{2}\,\mathrm{SAD}}{N^{2}} \qquad (7) \]
Thus, the distribution of the DCT coefficients can be directly estimated without actually applying the DCT.
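The separable variance model can be explored with a short sketch. It assumes an orthonormal floating-point DCT matrix in place of the integer core matrix, the correlation model \(R_{i,j} = \rho^{|i-j|}\), and the Laplacian relation E|x| = σ/√2 for estimating the residual standard deviation from the SAD; all function names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    rows = []
    for u in range(n):
        k = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([k * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def variance_gains(n, rho):
    """Diagonal of A R A^T, with R[i][j] = rho ** |i - j|."""
    a = dct_matrix(n)
    return [sum(a[k][i] * rho ** abs(i - j) * a[k][j]
                for i in range(n) for j in range(n))
            for k in range(n)]


def estimate_sigma_f(sad_value, n):
    """Residual standard deviation from the SAD of an n x n block,
    using E|x| = sigma / sqrt(2) for a zero-mean Laplacian."""
    return math.sqrt(2.0) * sad_value / (n * n)


def coefficient_bound(u, v, sad_value, n, gains, beta=3.0):
    """beta * sigma_F(u, v) under the separable model."""
    sigma_f = estimate_sigma_f(sad_value, n)
    return beta * sigma_f * math.sqrt(gains[u] * gains[v])


gains = variance_gains(4, 0.6)
print([round(g, 4) for g in gains])
print(coefficient_bound(0, 0, 16, 4, gains), coefficient_bound(0, 3, 16, 4, gains))
```

For ρ=0.6 and N=4 the gains decrease with frequency, so the predicted range of the coefficients in a column shrinks as the column index grows, which is the property the first-stage thresholds rely on.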
Using \(\beta \sigma_{F}(0,v)\) as a maximum bound for the DCT coefficients in the vth column, we can derive thresholds for detecting zero-quantized columns in the first stage. Accordingly, N−i columns, running from the ith to the (N−1)th column, where i=0,…,N−1, can be predicted to be zero-quantized columns if the following condition is met:
\[ \mathrm{SAD} < TH_{i} \cdot qStep \qquad (8) \]
where \(TH_i\) is the threshold corresponding to the ith column and qStep is the quantization step corresponding to the value of the QP. For an N×N block, we can obtain N thresholds, each of which corresponds to a column. However, the comparison in (8) also imposes some computational burden on the encoder. Therefore, rather than employing all N thresholds, we employ only about N/2 thresholds in our experiments. Table 1 presents examples of the thresholds obtained for a 16×16 block.
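One way to turn the bound \(\beta \sigma_{F}(0,i)\) into per-column SAD thresholds is sketched below. The sketch rests on assumptions that are not taken from the paper: an orthonormal floating-point DCT, the Laplacian-based estimate of the residual standard deviation from the SAD, and a coefficient being treated as zero-quantized when its magnitude is below `q_step` (the dead zone of a real quantizer depends on its rounding offset). The resulting values are therefore not expected to match Table 1.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    rows = []
    for u in range(n):
        k = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([k * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def variance_gains(n, rho):
    """Diagonal of A R A^T, with R[i][j] = rho ** |i - j|."""
    a = dct_matrix(n)
    return [sum(a[k][i] * rho ** abs(i - j) * a[k][j]
                for i in range(n) for j in range(n))
            for k in range(n)]


def column_thresholds(n, rho=0.6, beta=3.0):
    """TH_i chosen so that beta * sigma_F(0, i) < q_step
    is equivalent to SAD < TH_i * q_step (assumed zero bound: q_step)."""
    g = variance_gains(n, rho)
    return [n * n / (math.sqrt(2.0) * beta * math.sqrt(g[0] * g[i]))
            for i in range(n)]


def first_zero_column(sad_value, q_step, thresholds):
    """Smallest i with SAD < TH_i * q_step; columns i..N-1 are then predicted
    to be zero-quantized. Returns N when no column qualifies."""
    for i, th in enumerate(thresholds):
        if sad_value < th * q_step:
            return i
    return len(thresholds)


th = column_thresholds(4)
print([round(t, 3) for t in th])
print(first_zero_column(20, 10, th))
```

A return value of 0 corresponds to an AZB prediction. Checking only a subset of the thresholds, as done in the experiments, amounts to passing a shorter threshold list and mapping the returned position back to its column index.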
3.2 Second-stage detection
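As discussed, second-stage detection is conducted using the intermediate coefficient matrix Y obtained from the 1D row DCT that follows the first stage. Let \(y_v\) denote the vth column of \(Y^{T}\), with elements \(y_{i,v}\), i=0,…,N−1. Therefore, the vth column of the coefficient matrix F is
\[ f_{v} = C_N y_{v} \qquad (9) \]
The element \(f_{0,v}\) can be calculated as the following column sum:
\[ f_{0,v} = \sqrt{\frac{1}{N}} \sum_{i=0}^{N-1} y_{i,v} \qquad (10) \]
As an orthogonal transform, Eq. (9) preserves the signal energy or, equivalently, preserves the length of the vector \(y_v\) [22], i.e.,
\[ \sum_{u=0}^{N-1} f_{u,v}^{2} = \sum_{i=0}^{N-1} y_{i,v}^{2} \qquad (11) \]
From Eqs. (10) and (11), we can derive the following:
\[ \sum_{u=1}^{N-1} f_{u,v}^{2} = \sum_{i=0}^{N-1} y_{i,v}^{2} - f_{0,v}^{2} \qquad (12) \]
Furthermore, we rewrite Eq. (9) as a weighted sum:
\[ f_{u,v} = k_{u} \sum_{i=0}^{N-1} y_{i,v} \cos \left( \frac{(2i+1)u\pi}{2N} \right) \qquad (13) \]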
where \(k_{u} = \sqrt {{1}/{N}}\) for u=0 and \( k_{u} = \sqrt {{2}/{N}}\) otherwise. An upper bound for the coefficients in \(f_v\), except \(f_{0,v}\), is
From Eqs. (10), (12), and (14), we obtain the following criterion for determining whether \(f_v\) is a zero-quantized column:
where
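The energy argument behind the second stage can be sketched for a single column. The test below is a conservative criterion that follows from the DC sum and energy preservation alone; it is an illustration under the assumption of an orthonormal floating-point DCT and is not necessarily identical to the criterion in (15). The names `dc_and_ac_bound`, `is_zero_column`, and `zero_bound` are hypothetical.

```python
import math


def dct_1d(y):
    """Orthonormal 1D DCT-II of a column vector y."""
    n = len(y)
    out = []
    for u in range(n):
        k = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        out.append(k * sum(y[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                           for i in range(n)))
    return out


def dc_and_ac_bound(y):
    """Return f_0 (scaled column sum) and an upper bound on |f_u| for u >= 1.

    The bound is the square root of the energy left after removing f_0,
    which no single AC coefficient can exceed.
    """
    n = len(y)
    f0 = sum(y) / math.sqrt(n)
    ac_energy = sum(t * t for t in y) - f0 * f0
    return f0, math.sqrt(max(ac_energy, 0.0))


def is_zero_column(y, zero_bound):
    """Sufficient test that every coefficient of the column is below zero_bound."""
    f0, ac_bound = dc_and_ac_bound(y)
    return abs(f0) < zero_bound and ac_bound < zero_bound


y_v = [3.0, -1.0, 2.0, 0.0, 1.0, -2.0, 0.0, 1.0]
f_v = dct_1d(y_v)
f0, ac_bound = dc_and_ac_bound(y_v)
print(abs(f_v[0] - f0) < 1e-9, all(abs(c) <= ac_bound + 1e-9 for c in f_v[1:]))
```

Both quantities need only one pass over the column (a sum and a sum of squares), which is far cheaper than the butterfly transform they allow the encoder to skip.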
3.3 Summary of the algorithm
We summarize the entire two-stage scheme as follows, where steps 1 and 2 represent the first stage and steps 3 and 4 represent the second stage.
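The overall flow of a two-stage scheme of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: an orthonormal floating-point DCT, a first-stage result supplied by the caller as `first_stage_start` (columns from that index onward are flagged), and a conservative energy-based test in the second stage. The function `two_stage_dct` and its parameters are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    rows = []
    for u in range(n):
        k = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([k * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def dct_1d(vec, a):
    """Apply the transform matrix a to one vector."""
    return [sum(a[u][i] * vec[i] for i in range(len(vec)))
            for u in range(len(vec))]


def two_stage_dct(x, zero_bound, first_stage_start):
    """Return (coefficients, flag vector, number of skipped 1D transforms)."""
    n = len(x)
    a = dct_matrix(n)
    # Stage 1 result: columns first_stage_start..n-1 are flagged as zero.
    flag = [v >= first_stage_start for v in range(n)]
    coeffs = [[0.0] * n for _ in range(n)]
    if all(flag):
        return coeffs, flag, 2 * n      # all-zero block: skip rows and columns
    # Row pass: rows_t[r][v] is element (r, v) of Y^T.
    rows_t = [dct_1d(row, a) for row in x]
    for v in range(n):
        if flag[v]:
            continue
        y_v = [rows_t[r][v] for r in range(n)]
        # Stage 2: DC term from the column sum, AC bound from leftover energy.
        f0 = sum(y_v) / math.sqrt(n)
        ac = math.sqrt(max(sum(t * t for t in y_v) - f0 * f0, 0.0))
        if abs(f0) < zero_bound and ac < zero_bound:
            flag[v] = True
            continue
        col = dct_1d(y_v, a)            # column pass for surviving columns only
        for u in range(n):
            coeffs[u][v] = col[u]
    return coeffs, flag, sum(flag)


flat = [[10.0] * 4 for _ in range(4)]
coeffs, flag, skipped = two_stage_dct(flat, zero_bound=1.0, first_stage_start=4)
print(flag, skipped)
```

For a flat block the row pass concentrates all energy in the first element of each row, so only the first column survives the second stage and three of the four column transforms are skipped.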
4 Experimental results and discussion
To evaluate the performance of the proposed two-stage algorithm, the algorithm was implemented on the HEVC Test Model (HM) 12.0 reference software [23]. However, we must first provide some commentary regarding the implementation of the algorithm on this platform. First, because only four 1D column transforms are employed for a 4×4 TB and the second-stage detection requires additional computations for \(S_v\) and \(f_{0,v}\), the performance gain of the second stage is limited for 4×4 TBs. In practice, the second stage is bypassed under this condition, and experimental results for 4×4 TBs are therefore not presented. Second, we employ a FLAG vector, each element of which indicates whether or not a column in the coefficient matrix is a zero-quantized column. The 1D DCT operation in the HM 12.0 software (i.e., partialButterflyXX()) was slightly modified to skip the 1D DCT of a column if it is labeled by the FLAG vector as a zero-quantized column. Third, HEVC makes use of an integer DCT. The core transform matrices are finite approximations of the original real-valued DCT matrices, and all the matrix elements are scaled by a factor of \(2^{6+M/2}\), where \(M = \log_{2} N\). After application of the 1D row DCT, elements in Y are right shifted by M−9+B, where B is the bit depth of the video [19]. Therefore, \(f_{0,v}\) and \(S_v\) are scaled by a factor of \(2^{-(15-B-M/2)}\).
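The scaling bookkeeping described above can be made concrete with a small sketch. The helper names are hypothetical; the arithmetic simply combines the matrix scale \(2^{6+M/2}\) with the right shift of M−9+B bits.

```python
def first_stage_shift(n, bit_depth):
    """Right shift applied after the 1D row transform: M - 9 + B."""
    m = n.bit_length() - 1          # M = log2(N) for power-of-two N
    return m - 9 + bit_depth


def intermediate_scale(n, bit_depth):
    """Net scale of the elements of Y relative to an orthonormal DCT:
    2^(6 + M/2) / 2^(M - 9 + B) = 2^(15 - B - M/2)."""
    m = n.bit_length() - 1
    return 2.0 ** (15 - bit_depth - m / 2)


def to_orthonormal(value, n, bit_depth):
    """Rescale a quantity computed from Y by 2^-(15 - B - M/2)."""
    return value / intermediate_scale(n, bit_depth)


for size in (4, 8, 16, 32):
    print(size, first_stage_shift(size, 8), intermediate_scale(size, 8))
```

For 8-bit video and a 4×4 TB, the shift is 1 bit and the net scale is 64, so quantities derived from Y must be divided by 64 before being compared with thresholds expressed for an orthonormal transform.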
The detailed test conditions employed are as follows:
- Maximum and minimum coding unit (CU) sizes were respectively 64×64 and 8×8.
- Maximum and minimum transform unit (TU) sizes were respectively 32×32 and 4×4, and maximum TU depth was 3. That is, for 64×64 and 32×32 luma CUs, the supported TU sizes were 32×32, 16×16, 8×8, and 4×4. For 16×16 and 8×8 CUs, the TU tree structure had its root at the CU level, and the minimum TU size was 4×4.
- Fast motion estimation was enabled, and the search range was 64 in the horizontal and vertical directions.
- The QP value was set to 22, 27, 32, or 37.
Three Class B (BasketballDrive, BQTerrace, ParkScene), four Class C (BasketballDrill, BQMall, Keiba, PartyScene), three Class D (BasketballPass, BQSquare, BlowingBubbles), and three Class E (FourPeople, KristenAndSara, Stockholm) image sequences were tested. For each sequence, 200 frames were coded. Simulations were run on a personal computer with an Intel Core i5-4430 CPU and 8 GB of RAM. The operating system was Microsoft Windows 7 64-bit Enterprise edition.
Let \(Z_o\) denote the total number of 1D row and column DCT operations that were required by the original HM 12.0 encoder and \(Z_s\) denote the number of 1D DCT operations skipped by the proposed algorithm. For example, if an N×N TB was detected as an AZB in the first stage, then \(Z_s = 2N\); otherwise, if, for example, n columns were detected as zero-quantized columns in the first and second stages, then \(Z_s = n\). The computation reduction was therefore calculated as
The results of the two stages are listed in Table 2, both separately and in total. The parameters in the first stage were β=3.0 and ρ=0.6. Taking QP=32 as an example, for 8×8, 16×16, and 32×32 TBs, the average performance gains of the second stage over the distribution-based method employed in the first stage were about 23.5, 19.8, and 17.1%, respectively, for the four image sequence classes. For detecting zero-quantized columns in the second stage, the condition in (15) requires additional computation. Therefore, the complexity reduction was also evaluated in terms of the reduced time consumption for computing the overall 2D DCT. We recorded the time consumption required for conducting the partial butterfly-based DCT operations and accumulated the totals for both the original HM 12.0 encoder (\(T_{ref}\)) and the proposed two-stage method (\(T_{test}\)). The time consumption reduction was therefore evaluated as
\[ \Delta T = \frac{T_{ref} - T_{test}}{T_{ref}} \times 100\% \qquad (18) \]
The average values of time consumption reduction for the test image sequences are also listed in Table 2. Again, taking QP=32 as an example, the average time consumption reduction for all image sequences was about 17.0%. In addition, we evaluated the detection ratio η for our method based on the number of detected zero-quantized columns \(Z_{detected}\) and the actual number of zero-quantized columns \(Z_{actual}\):
\[ \eta = \frac{Z_{detected}}{Z_{actual}} \times 100\% \qquad (19) \]
Table 3 lists the η values obtained using the proposed method for the tested image sequences. It can be observed that the detection ratio is high even for 16×16 and 32×32 blocks. The higher the ratio of correctly detected columns, the higher the detection efficiency.
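The three evaluation measures can be computed as in the following sketch. The percentage forms are assumed here as the natural definitions (skipped operations over required operations, saved time over reference time, detected columns over actual zero-quantized columns); the function names and the sample numbers are purely illustrative and are not results from the experiments.

```python
def skipped_operations(n, is_azb, zero_columns):
    """1D transforms skipped for one N x N TB: 2N for an AZB, else n columns."""
    return 2 * n if is_azb else zero_columns


def computation_reduction(z_skipped, z_original):
    """Share of 1D DCT operations skipped, in percent."""
    return 100.0 * z_skipped / z_original


def time_reduction(t_ref, t_test):
    """Relative saving in accumulated DCT time, in percent."""
    return 100.0 * (t_ref - t_test) / t_ref


def detection_ratio(z_detected, z_actual):
    """Share of truly zero-quantized columns that were detected, in percent."""
    return 100.0 * z_detected / z_actual


# Illustrative numbers only: two 16x16 TBs, one AZB and one with 5 zero columns.
z_s = skipped_operations(16, True, 0) + skipped_operations(16, False, 5)
z_o = 2 * (2 * 16)
print(computation_reduction(z_s, z_o))
print(time_reduction(t_ref=2.0, t_test=1.5))
print(detection_ratio(z_detected=45, z_actual=50))
```

Reporting the operation count and the measured time separately matters because the second-stage test itself costs computation that the operation count does not capture.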
It is noted that the proposed algorithm may induce false detection because a distribution-based method was employed in the first stage. Larger thresholds in (8) produce more false positive results, while smaller thresholds increase the miss rate. We performed simulations for two different threshold sets. One was calculated by setting β=3.0 and the other by setting β=3.5. The coding performance of our algorithm was compared with that of the original HM 12.0 encoder in terms of the Bjøntegaard delta rate (BD-Rate) [24]. The experimental results are summarized in Tables 4 (β=3.0) and 5 (β=3.5). The results show that when β=3.0, the proposed algorithm resulted in an acceptable BD-Rate loss compared with the original HM encoder. When β=3.5, the BD-Rate loss was negligible. The ΔT values with QP=22 and QP=37 (denoted as \(\Delta T_{QP=22}\) and \(\Delta T_{QP=37}\), respectively) for each test sequence are also reported. The results suggest that smaller thresholds slightly decrease the computational efficiency (by about 10%).
Because most existing methods for 8×8 AZB detection in the pixel domain can be extended to AZB detection for variable TB sizes in the first stage of our algorithm, we do not establish a comparison between the results obtained by these algorithms and our algorithm. Instead, we regard the zero-quantized columns detected in the second stage as the performance gain over existing methods [9, 16] and separately list the results of the two stages in Tables 6 and 7. From the results, it can be observed that applying these methods to TBs larger than 8×8 reduces the detection ratio. The results also demonstrate the efficiency of our two-stage strategy. Taking QP=32 as an example, the average performance gain of the second stage over the methods in [9, 16] employed in the first stage was at least 26.1% for all image sequences.
5 Conclusions
A two-stage method for the early detection of zero-quantized DCT coefficients in HEVC was proposed in this paper. In the first stage, a distribution-based method is employed in the pixel domain to detect zero-quantized columns prior to conducting 1D row DCT operations. Using the intermediate matrix resulting from the row transform as the input, the second stage performs zero-quantized column detection in the transform domain only for the non-zero columns determined in the first stage. As an orthogonal transform, the 1D DCT tends to pack a large fraction of the signal energy into relatively few coefficients. Therefore, the second stage of our algorithm is more effective than existing methods that conduct PZB detection in the pixel domain, particularly for TBs greater than 8×8.
References
1. GJ Sullivan, J-R Ohm, W-J Han, T Wiegand, Overview of the High Efficiency Video Coding (HEVC) standard. IEEE Trans. Circ. Syst. Video Technol. 22(12), 1649–1668 (2012).
2. X Ji, S Kwong, D Zhao, H Wang, C-CJ Kuo, Q Dai, Early determination of zero-quantized 8×8 DCT coefficients. IEEE Trans. Circ. Syst. Video Technol. 19(12), 1755–1765 (2009).
3. YH Moon, GY Kim, JH Kim, An improved early detection algorithm for all-zero blocks in H.264 video encoding. IEEE Trans. Circ. Syst. Video Technol. 15(8), 1053–1057 (2005).
4. H Wang, S Kwong, C-W Kok, Efficient prediction algorithm of integer DCT coefficients for H.264/AVC optimization. IEEE Trans. Circ. Syst. Video Technol. 16(4), 547–552 (2006).
5. H Wang, S Kwong, Hybrid model to detect zero quantized DCT coefficients in H.264. IEEE Trans. Multimed. 9(4), 728–735 (2007).
6. M Zhang, T Zhou, W Wang, Adaptive method for early detecting zero quantized DCT coefficients in H.264/AVC video encoding. IEEE Trans. Circ. Syst. Video Technol. 19(1), 103–107 (2009).
7. X Zhou, Z Yu, S Yu, Method for detecting all-zero DCT coefficients ahead of discrete cosine transformation and quantisation. Electron. Lett. 34(19), 1839–1840 (1998).
8. I-M Pao, M-T Sun, Modeling DCT coefficients for fast video encoding. IEEE Trans. Circ. Syst. Video Technol. 9(4), 608–616 (1999).
9. LA Sousa, General method for eliminating redundant computations in video coding. Electron. Lett. 36(4), 306–307 (2000).
10. S Jun, S Yu, Efficient method for early detection of all-zero DCT coefficients. Electron. Lett. 37(3), 160–161 (2001).
11. H Wang, H Du, W Lin, S Kwong, OC Au, J Wu, Z Wei, Early detection of all-zero 4×4 blocks in High Efficiency Video Coding. J. Vis. Commun. Image R. 25(7), 1784–1790 (2014).
12. K Lee, H-J Lee, J Kim, Y Choi, A novel algorithm for zero block detection in High Efficiency Video Coding. IEEE J. Sel. Topics Signal Process. 7(6), 1124–1134 (2013).
13. J Su, Y Huang, L Sun, S Sakaida, T Ikenaga, in Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011), Online Proceeding. Low complexity quadtree based all zero block detection algorithm for HEVC (APSIPA, Xi’an, 2011).
14. H Wang, H Du, J Wu, in Proc. 2014 IEEE Conf. Multimedia and Expo. Predicting zero coefficients for high efficiency video coding (IEEE, Chengdu, 2014).
15. WH Chen, CH Smith, SC Fralick, A fast computational algorithm for the discrete cosine transform. IEEE Trans. Commun. 25(9), 1004–1009 (1977).
16. Z Xie, Y Liu, J Liu, T Yang, A general method for detecting all-zero blocks prior to DCT and quantization. IEEE Trans. Circ. Syst. Video Technol. 17(2), 237–241 (2007).
17. J Li, M Gabbouj, J Takala, Zero-quantized inter DCT coefficient prediction for real-time video coding. IEEE Trans. Circ. Syst. Video Technol. 22(2), 249–259 (2012).
18. H Wang, S Kwong, C-W Kok, Efficient predictive model of zero quantized DCT coefficients for fast video encoding. Image Vis. Comput. 25(6), 922–933 (2007).
19. M Budagavi, A Fuldseth, G Bjøntegaard, V Sze, M Sadafale, Core transform design in the High Efficiency Video Coding (HEVC) standard. IEEE J. Sel. Topics Signal Process. 7(6), 1029–1041 (2013).
20. M-T Sun, I-M Pao, Statistical computation of discrete cosine transform in video encoders. J. Vis. Commun. Image R. 9(2), 163–170 (1998).
21. EY Lam, JW Goodman, A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. Image Process. 9(10), 1661–1666 (2000).
22. AK Jain, Fundamentals of digital image processing (Prentice Hall, New Jersey, 1988).
23. HM 12 Software. https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware. Accessed 5 June 2017.
24. G Bjøntegaard, in Technical report VCEG-M33, ITU-T SG16/Q6. Calculation of average PSNR differences between RD-Curves, (2001).
Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful comments, and we thank LetPub (www.letpub.com) for the linguistic assistance during the preparation of this manuscript.
Funding
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61672460, 61602407, and U1609215) and the Natural Science Foundation of Zhejiang Province (Grant No. LY14F020001).
Author information
Contributions
WC carried out the main part of this manuscript. XW is a supervisor of this research. YT has assisted in the experimental part of the work. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Chen, WG., Wang, X. & Tian, Y. A two-stage algorithm for the early detection of zero-quantized discrete cosine transform coefficients in High Efficiency Video Coding. J Image Video Proc. 2017, 56 (2017). https://doi.org/10.1186/s13640-017-0205-2