Stochastic analysis of frequency-domain adaptive filters

Yang, Feiran

doi:10.1186/s13634-024-01193-5

Research
Open access
Published: 18 December 2024

Stochastic analysis of frequency-domain adaptive filters

Feiran Yang¹

EURASIP Journal on Advances in Signal Processing volume 2024, Article number: 100 (2024) Cite this article

248 Accesses
Metrics details

Abstract

This study investigates the convergence behaviors of a family of frequency-domain adaptive filters (FDAFs) under both exact- and under-modeling situations. The stochastic analysis is conducted by transforming the frequency-domain equations into their time-domain counterparts. We discuss the transient and steady-state convergence behaviors of four FDAF versions, i.e., the constrained FDAFs with and without step-normalization, the unconstrained FDAFs with and without step-normalization, and we also present the upper bounds of step size for mean stability and mean-square stability. Starting from the expression for the steady-state mean weight vector, this study investigates whether the FDAFs can converge to unknown system impulse responses and optimum Wiener solutions. Moreover, we provide the closed-form minimum mean-square error (MMSE) that each FDAF can achieve. The difference between the current work and our previous one is threefold. First, the presented time-domain analysis is much easier to handle and has a more explicit physical meaning than that in the frequency domain. Second, we here consider an arbitrary overlap factor between consecutive blocks, while our previous analysis only focuses on 50% overlap. Third, the presented MMSE expressions and excess mean-square error (EMSE) approximations have not been given before. Simulations reveal high consistency between the experimental and theoretical results.

1 Introduction

Frequency-domain adaptive filter (FDAFs) is originally proposed as a fast but exact realization of the block least mean-square (BLMS) algorithm [1, 2], and thus, they exhibit the same convergence characteristics. Subsequently, two strategies are presented to improve the convergence or gain computational efficiency. The first method is known as the step-normalization or self-orthogonalization procedure, in which the frequency-wise step size related to the corresponding signal power is adopted [3]. The second strategy is to remove the constraint on the weight vector in the time domain, and the corresponding algorithm is called the unconstrained FDAF [4, 5]. Thus, four variants of the FDAF algorithm are obtained, i.e., the constrained FDAFs with and without step-normalization, the unconstrained FDAFs with and without step-normalization, which have found applications in a wide range of areas [6,7,8,9,10,11,12,13]. Recently, deep neural networks have been incorporated into the estimation of the gradient or the key parameters of the FDAFs [14, 15].

Besides the algorithm design, it is desirable to characterize the convergence of the adaptive filtering algorithms [16,17,18,19,20,21,22, 22, 24,25,26,27]. In [18], the second-order statistics for the BLMS are analyzed for Gaussian inputs. In [19], the influence of the windowing function on the FDAF convergence is analyzed. In [20], the mean convergence performance of several FDAF versions is extensively analyzed via inverse transformation of frequency-domain formulas into time-domain counterparts, but the steady-state mean-square error (MSE) expressions are provided without detailed derivations. In [21], the optimum Wiener solution of FDAFs is given in the frequency domain, and it is shown that the unconstrained FDAFs achieve a reduced steady-state MSE compared to constrained algorithms under the under-modeling scenario. In [22], the analysis of FDAFs is conducted in the frequency domain, and the eigenvalue spread of the matrix that controls the mean convergence is well studied. The steady-state performance of constrained FDAFs is analyzed under the under-modeling scenario in [23], but the input is assumed to be Gaussian. In [24], a full second-order statistical framework for unconstrained FDAFs is presented for noncircular Gaussian signals. Recently, we have conducted an extensive statistical analysis of the FDAFs [25,26,27]. We derive a unifying update equation for four versions of FDAF, which enables us to conduct the performance analysis in a unifying framework. In [28] and [29], we provide a detailed performance assessment of the partitioned-block FDAFs in the time domain. However, the analysis results cannot be applied to the FDAFs straightforwardly.

We comprehensively study the convergence behavior of FDAF algorithms in this paper. The frequency-domain formulas are transformed into time-domain ones, which are utilized to deduce the evolution of the weight-error vector. We then analyze the mean convergence and mean-square convergence of the FDAFs in detail. This study goes further than our previous work [25,26,27] in several aspects. First, the presented analysis is conducted completely in the time domain, while our previous work is in the frequency domain. We have found that compared to the frequency-domain analysis, the time-domain one is easier to follow, and the time-domain variables have an explicit physical meaning. Second, our previous work only focuses on 50% overlap, but we deal with an arbitrary overlap here. Third, we provide the analytical expressions for the attainable minimum MSE (MMSE) of four variants of the FDAF. Also, we derive the excess MSE (EMSE) approximations in the time domain, and we point out that the EMSE of the constrained FDAF algorithms with step-normalization given in [20] and [22] is biased even for small step sizes. The presented stochastic model is built in the under-modeling scenario, but the exact modeling scenario can be treated as a special case. Simulations are presented to validate the theoretical results.

The contributions of this paper are as follows:

We describe the transient and steady-state convergence behavior of a family of the FDAFs with an arbitrary overlap in both exact- and under-modeling scenarios.
We present a comprehensive performance comparison of the constrained and unconstrained FDAFs in terms of the convergence rate, the modeling ability, the steady-state MSE, and the attainable minimum MSE.
We derive the approximated expressions of steady-state EMSE of four types of the FDAFs in exact modeling scenario, which are easier to follow than before.

2 FDAF

We begin our treatment by introducing the FDAF algorithms. Consider the linear time-invariant model in the framework of system identification

$$\begin{aligned} d(n) = {{{\textbf {x}}}^T}(n){{{\textbf {w}}}_{\mathrm{{opt}}}} + v(n), \end{aligned}$$

(1)

where d(n) represents the desired response, n is the discrete-time index, T represents a transposition, ${{\textbf {x}}}(n) = {[x(n), \cdots ,x(n - M + 1)]^T}$ denotes the input vector, ${{{\textbf {w}}}_{\mathrm{{opt}}}} = {[{w_0}, \cdots ,{w_{M - 1}}]^T}$ signifies the impulse response of an unknown system with a length of M, and v(n) accounts for the measurement noise signal that is independent with x(n). The input and noise signals are the zero-mean and stationary random processes, respectively, with a variance of $E[{x^2}(n)] = \sigma _x^2$ and $E[{v^2}(n)] = \sigma _v^2$ where $E[ \cdot ]$ is the expectation. The linear model in (1) has been applied to various problems despite its simplicity.

The FDAF algorithms typically rely on a block-processing approach, i.e., the filtering out and the adaptive weights are calculated block by block. The weight vector remains the same in one block in FDAF algorithms, which is different from the sample-by-sample-based least mean-square (LMS) algorithm. In this paper, R and k denote the block length and the block index, respectively. We consider an adaptive transversal filter $\hat{{\textbf {w}}}(k) = {[{\hat{w}_0}(k), \cdots ,{\hat{w}_{L - 1}}(k)]^T}$ that tries to imitate the unknown system response ${{{\textbf {w}}}_{\mathrm{{opt}}}}$, where L is the adaptive filter’s length. The relations $L = M$ and $L<M$ correspond to the exact and under-modeling situations, respectively. In this paper, we consider the under-modeling situation with $Q = M - L$ denoting the length of the under-modeling filter. However, we treat the exact modeling situation as a special case of the under-modeling situation, i.e., $Q = 0$, and hence, the presented analysis can cover both the exact and under-modeling situations.

We pad the estimated weight vector $\hat{{\textbf {W}}}(k)$ with R zeros and obtain its frequency-domain representation

$$\begin{aligned} \hat{{\textbf {W}}}(k) = {{\textbf {F}}}\left[ {\begin{array}{*{20}{c}} {\hat{{\textbf {w}}}(k)}\\ {{{{\textbf {0}}}_{R \times 1}}} \end{array}} \right] = {{\textbf {F}}}\Upsilon _{10}^T\hat{{\textbf {w}}}(k), \end{aligned}$$

(2)

where $N = L + R$ represents the DFT length, ${{\textbf {F}}}$ denotes the $N \times N$ DFT matrix, ${\Upsilon _{10}} = [{{{\textbf {I}}}_{L}}\ {{{\textbf {0}}}_{L \times R}}]$, ${{{\textbf {I}}}_L}$ represents an identity matrix with a size of $L \times L$, ${{{\textbf {0}}}_{L \times R}}$ denotes a zero matrix with a size of $L \times R$, and ${{{\textbf {0}}}_{R \times 1}}$ stands for the all-zero vector of length R. In our previous work [23,24,25], we focused on the special case with 50% overlap, i.e., $L=R$, while we now consider a general case for an arbitrary overlap.

The frequency-domain diagonal matrix ${\varvec{{\mathcal {X}}}}(k)$ can be attained by converting an input signal block into the frequency domain

$$\begin{aligned} {\varvec{{\mathcal {X}}}}(k) = \mathrm{{diag}}\{ {{\textbf {Fx}}}(k)\}, \end{aligned}$$

(3)

where ${{\textbf {x}}}(k) = {[x(kR - L), \cdots ,x(kR + R - 1)]^T}$ is the input vector with a length of N, and $\mathrm{{diag}}\{ \cdot \}$ forms a diagonal matrix from inputs. This input vector contains R elements in the current block and L elements in the previous one.

The time-domain block error signal vector is expressed by [5]

$$\begin{aligned} {{\textbf {e}}}(k) = {{\textbf {d}}}(k) - \hat{{\textbf {y}}}(k) = {{\textbf {d}}}(k) - {\Upsilon _{01}}{{{\textbf {F}}}^{ - 1}}{\varvec{{\mathcal {X}}}}(k)\hat{{\textbf {W}}}(k), \end{aligned}$$

(4)

where ${{\textbf {d}}}(k) = {[d(kR), \cdots ,d(kR + R - 1)]^T}$ is the block desired signal vector with a length of R, the block filtering out vector $\hat{{\textbf {y}}}(k)$ and the block error vector ${{\textbf {e}}}(k)$ are defined similarly, and the $R \times N$ matrix ${\Upsilon _{01}} = [{{{\textbf {0}}}_{R \times L}}\ {{{\textbf {I}}}_{R \times R}}]$ is utilized to retain the last R terms of the inverse DFT of ${\varvec{{\mathcal {X}}}}(k)\hat{{\textbf {W}}}(k)$ that corresponds exactly to the linear convolution. The time-domain error vector ${{\textbf {e}}}(k)$ is initially padded with L zeros and then turned into the frequency-domain expression

$$\begin{aligned} {{\textbf {E}}}(k) = {{\textbf {F}}}\left[ {\begin{array}{*{20}{c}} {{{{\textbf {0}}}_{L \times 1}}}\\ {{{\textbf {e}}}(k)} \end{array}} \right] = {{\textbf {F}}}\Upsilon _{01}^T{{\textbf {e}}}(k). \end{aligned}$$

(5)

The constrained FDAF is characterized by [22]

$$\begin{aligned} \hat{{\textbf {W}}}(k + 1) = {{{\textbf {G}}}_{10}}[\hat{{\textbf {W}}}(k) + \mu {{\varvec{\Lambda }}^{ - 1}}{{\varvec{{\mathcal {X}}}}^H}(k){{\textbf {E}}}(k)], \end{aligned}$$

(6)

where superscript H means a complex-conjugate transpose, the constraining matrix ${{{\textbf {G}}}_{10}} = {{\textbf {F}}}\Upsilon _{10}^T{\Upsilon _{10}}{{{\textbf {F}}}^{ - 1}}$ forces the last R components of the time-domain gradient ${{{\textbf {F}}}^{ - 1}}{{\varvec{\Lambda }}^{ - 1}}{{\varvec{{\mathcal {X}}}^H}}(k){{\textbf {E}}}(k)$ to zero, and the $N \times N$ diagonal matrix ${\varvec{\Lambda }}$ determines how the step size is chosen. If we use ${\varvec{\Lambda }} = {{{\textbf {I}}}_N}$, a common step size parameter $\mu$ is then employed for all the frequencies, namely, the FDAF algorithm without step-normalization. This algorithm is nothing but a faithful frequency-domain representation of the BLMS algorithm. If we choose the diagonal matrix ${\varvec{\Lambda }} = E[{\varvec{\mathcal X}^H}(k){\varvec{{\mathcal {X}}}}(k)]$, the step size in each frequency bin is scaled by the power spectral density (PSD). The algorithm is referred to as the FDAF with step-normalization, which can greatly speed up the convergence [3,4,5].

The update equation of the unconstrained FDAF is [3]

$$\begin{aligned} \hat{{\textbf {W}}}(k + 1) = \hat{{\textbf {W}}}(k) + \mu {{\varvec{\Lambda }}^{ - 1}}{{\varvec{{\mathcal {X}}}^H}}(k){{\textbf {E}}}(k). \end{aligned}$$

(7)

Similarly, we obtain the unconstrained FDAF algorithm with or without step-normalization by using ${\varvec{\Lambda }} = E[{\varvec{{\mathcal {X}}}^H}(k){\varvec{{\mathcal {X}}}}(k)]$ or ${\varvec{\Lambda }} = {{{\textbf {I}}}_N}$. Unlike constrained FDAF algorithms, the gradient constraint matrix ${{{\textbf {G}}}_{10}}$ is dropped out from (7), and hence (2) does not hold, which makes (4) implement a circular convolution for the unconstrained FDAF [30]. However, dropping the gradient constraint may have advantages for the under-modeling case, as the discussion will reveal.

3 Analysis of unconstrained FDAFs

This section analyzes the time-domain convergence behavior of unconstrained FDAF algorithms. The frequency-domain Eqs. (4) and (7) are changed to the time-domain representations, which are in turn used to evaluate the mean and mean-square performance and establish the stability bound.

3.1 Signal model

To pave the way for the performance analysis, we present an important result of the circulant matrix [4]

$$\begin{aligned} {{\varvec{{\mathcal {X}}}}^H}(k) = {{\textbf {FX}}}_{\mathrm{{c}}}^T(k){{{\textbf {F}}}^{ - 1}},{\varvec{{\mathcal {X}}}}(k) = {{\textbf {F}}}{{{\textbf {X}}}_{\mathrm{{c}}}}(k){{{\textbf {F}}}^{ - 1}}, \end{aligned}$$

(8)

where ${{{\textbf {X}}}_{\mathrm{{c}}}}(k)$ represents a circulant matrix with a size of $N \times N$ and the first column of ${{\textbf {x}}}(k)$. We can decompose ${{{\textbf {X}}}_{\mathrm{{c}}}}(k)$ as

$$\begin{aligned} {{{\textbf {X}}}_{\mathrm{{c}}}}(k) = \left[ {\begin{array}{*{20}{c}} {{{\hat{{\textbf {X}}}}_1}(k)}& {{{\hat{{\textbf {X}}}}_2}(k)}\\ {{{{\textbf {X}}}_1}(k)}& {{{{\textbf {X}}}_2}(k)} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\hat{{\textbf {X}}}(k)}\\ {\bar{{\textbf {X}}}(k)} \end{array}} \right] , \end{aligned}$$

(9)

where the matrices ${{{\textbf {X}}}_1}(k)$, ${{{\textbf {X}}}_2}(k)$, ${\hat{{\textbf {X}}}_1}(k)$ , and ${\hat{{\textbf {X}}}_2}(k)$ have a size of $R \times L$, $R \times R$, $L \times L$ , and $R \times R$, respectively, and the matrices $\bar{{\textbf {X}}}(k) = [{{{\textbf {X}}}_1}(k)\ {{{\textbf {X}}}_2}(k)]$ and ${\hat{\textbf{X}}}(k) = [{\hat{\textbf{X}}}_{1}(k){\hat{\textbf{X}}}_{2}(k)]$ have a dimension of $R \times N$ and $L \times N$. For convenience, we define the matrix

$$\begin{aligned} {{\textbf {P}}} = {({{{\textbf {F}}}^{ - 1}}{\varvec{\Lambda F}})^{ - 1}} = {{\textbf {R}}}_{\mathrm{{c}}}^{ - 1}, \end{aligned}$$

(10)

where ${{{\textbf {R}}}_{\mathrm{{c}}}} = E[{{\textbf {X}}}_{\mathrm{{c}}}^T(k){{{\textbf {X}}}_{\mathrm{{c}}}}(k)]$. Because ${{{\textbf {X}}}_{\mathrm{{c}}}}(k)$ is circulant, the matrices ${{{\textbf {R}}}_{\mathrm{{c}}}}$ and ${{\textbf {P}}}$ are both circulant and symmetric. We then represent ${{\textbf {P}}}$ in the block matrix form

$$\begin{aligned} {{\textbf {P}}} = \left[ {\begin{array}{*{20}{c}} {{{{\textbf {P}}}_1}}& {{{{\textbf {P}}}_2}}\\ {{{\textbf {P}}}_2^T}& {{{{\textbf {P}}}_3}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\bar{{\textbf {P}}}}\\ {\hat{{\textbf {P}}}} \end{array}} \right] , \end{aligned}$$

(11)

where ${{{\textbf {P}}}_1}$, ${{{\textbf {P}}}_2}$ and ${{{\textbf {P}}}_3}$ are $L \times L$, $L \times R$ , and $R \times R$ submatrices, respectively, and $\bar{{\textbf {P}}} = [{{{\textbf {P}}}_1}\ {{{\textbf {P}}}_2}]$ and ${\hat{\textbf{P}}} = [\begin{array}{*{20}{c}}{{\textbf{P}}}_{2}^{T} &{{{\textbf {P}}}_3}\end{array}]$ have a dimension of $L \times N$ and $R \times N$.

For the unconstrained algorithm, we define the time-domain weight vector ${\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k) = {{{\textbf {F}}}^{ - 1}}\hat{{\textbf {W}}}(k) \buildrel \Delta \over = {[{\hat{{\textbf {w}}}}_{\mathrm{{un,}}L}^T(k),\hat{{\textbf {w}}}_{\mathrm{{un,}}R}^T(k)]^T}$, where the subvectors ${\hat{{\textbf {w}}}_{\mathrm{{un}},L}}(k)$ and ${\hat{{\textbf {w}}}_{\mathrm{{un}},R}}(k)$ are of length L and R, respectively. Note that ${\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k)$ has a length of N and the last R elements of ${\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k)$ may be nonzero since the gradient constraint is removed. Using (8), we represent the block error vector as

$$\begin{aligned} {{\textbf {e}}}(k) = {{\textbf {d}}}(k) - \bar{{\textbf {X}}}(k){\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k) \end{aligned}$$

(12)

Calculating the filtering out signal may require future inputs due to the existence of ${\hat{{\textbf {w}}}_{\mathrm{{un}},R}}(k)$ [20], and we hence call ${\hat{{\textbf {w}}}_{\mathrm{{un}},L}}(k)$ and ${{\hat{{\textbf {w}}}}_{\mathrm{{un}},R}}(k)$ the non-causal and causal parts, respectively.

Pre-multiplying both sides of (7) by ${{{\textbf {F}}}^{ - 1}}$ and considering (8) and (10), we obtain the recursion of ${{\hat{{\textbf {w}}}}_{\mathrm{{un}}}}(k)$

$$\begin{aligned} {\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k + 1) = {\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k) + \mu {{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k){{\textbf {e}}}(k). \end{aligned}$$

(13)

The true weight vector ${{{\textbf {w}}}_{\mathrm{{opt}}}}$ that we wish to predict can be split into two parts

$$\begin{aligned} {{{\textbf {w}}}_{\mathrm{{opt}}}} = \left[ {\begin{array}{*{20}{c}} {{{{\textbf {w}}}_\dag }}\\ {{{{\textbf {w}}}_ * }} \end{array}} \right] , \end{aligned}$$

(14)

where the column vectors ${{{\textbf {w}}}_\dag }$ and ${{{\textbf {w}}}_ * }$ have a length of L and Q. The desired response vector ${{\textbf {d}}}(k)$ can be expressed by

$$\begin{aligned} {{\textbf {d}}}(k) = \bar{{\textbf {X}}}(k)\bar{{\textbf {w}}} + {{{\textbf {X}}}_3}(k){{{\textbf {w}}}_ * } + {{\textbf {v}}}(k), \end{aligned}$$

(15)

where ${{\textbf {v}}}(k) = {[v(kR), \cdots ,v(kR + R - 1)]^T}$ denotes the noise vector of length R, ${{{\textbf {X}}}_3}(k) = {[{{{\textbf {x}}}_3}(kR - L) \cdots {{{\textbf {x}}}_3}(kR - L + R - 1)]^T}$ represents an $R \times Q$ input matrix with ${{{\textbf {x}}}_3}(n) = {[x(n), \cdots ,x(n - Q + 1)]^T}$, and $\bar{{\textbf {w}}} = {[{{\textbf {w}}}_\dag ^T,{{{\textbf {0}}}_{1 \times R}}]^T}$ has a length of N.

3.2 Mean convergence

This section investigates initially the mean behavior of the unconstrained FDAF algorithms. For that, we introduce the time-domain weight-error vector ${{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k) = \bar{{\textbf {w}}} - {\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k) \buildrel \Delta \over = {[{\tilde{{\textbf {w}}}}_{\mathrm{{un,}}L}^T(k),{\tilde{{\textbf {w}}}}_{\mathrm{{un,}}R}^T(k)]^T}$, where ${{\tilde{{\textbf {w}}}}_{\mathrm{{un,}}L}}(k) = {{{\textbf {w}}}_\dag } - {{\hat{{\textbf {w}}}}_{\mathrm{{un}},L}}(k)$ and ${{\tilde{{\textbf {w}}}}_{\mathrm{{un,}}R}}(k) = - {\hat{{\textbf {w}}}_{\mathrm{{un}},R}}(k)$. We subtract $\bar{{\textbf {w}}}$ from both sides of (13) and derive the update equation of ${{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k)$

$$\begin{aligned} {{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k + 1) = {{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k) - \mu {{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k){{\textbf {e}}}(k). \end{aligned}$$

(16)

We incorporate (15) with (12) and have

$$\begin{aligned} {{\textbf {e}}}(k) = \bar{{\textbf {X}}}(k){{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k) + {{{\textbf {X}}}_3}(k){{{\textbf {w}}}_ * } + {{\textbf {v}}}(k). \end{aligned}$$

(17)

We then substitute (17) into (16) and obtain

$$\begin{aligned} {{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k + 1) = \big ( {{{{\textbf {I}}}_N} - \mu {{{\textbf {A}}}_{\mathrm{{un}}}}(k)} \big ){{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k) - \mu {{{\textbf {B}}}_{\mathrm{{un}}}}(k){{{\textbf {w}}}_ * } - \mu {{\textbf {P}}}{{\bar{{\textbf {X}}}}^T}(k){{\textbf {v}}}(k). \end{aligned}$$

(18)

where ${{{\textbf {A}}}_{\mathrm{{un}}}}(k) = {{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k)\bar{{\textbf {X}}}(k)$ and ${{{\textbf {B}}}_{\mathrm{{un}}}}(k) = {{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k){{{\textbf {X}}}_3}(k)$.

Taking the expectation of each side of (18) and invoking the independence assumption [4] yield

$$\begin{aligned} E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k + 1)] = \big ( {{{{\textbf {I}}}_N} - \mu E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]} \big )E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k)] - \mu E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k)]{{{\textbf {w}}}_ * }, \end{aligned}$$

(19)

where $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] = {{\textbf {P}}\bar{{\textbf {R}}}}$ and $\bar{{\textbf {R}}} = E[{\bar{{\textbf {X}}}^T}(k)\bar{{\textbf {X}}}(k)]$. The condition number of $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]$ has a significant influence on the convergence speed of the unconstrained FDAFs. It is shown in [20] that the approximation ${{{\textbf {R}}}_{\mathrm{{c}}}} \approx \frac{N}{R}\bar{{\textbf {R}}}$ holds for a large N. For unconstrained FDAFs with step-normalization, the matrix controlling the mean convergence is thus proportional to the identity matrix, $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] \approx \frac{R}{N}{{{\textbf {I}}}_N}$. It turns out that the unconstrained FDAF with step-normalization only has one mode of convergence thanks to the self-orthogonalizing method, while that without step-normalization may have a rather slow convergence speed, particularly for a highly correlated input.

At steady state, (19) leads to

$$\begin{aligned} E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )] = - {\bar{{\textbf {R}}}^{ - 1}}{\bar{{\textbf {R}}}_3}{{{\textbf {w}}}_ * } \end{aligned}$$

(20)

where ${\bar{{\textbf {R}}}_3} = E[{\bar{{\textbf {X}}}^T}(k){{{\textbf {X}}}_3}(k)]$. We now investigate whether the unconstrained FDAFs converge to the Wiener solution using (20). Considering (17) and replacing ${{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k)$ with a time-invariant vector, the Wiener solution is given by minimizing the MSE $E[{\left\| {{{\textbf {e}}}(k)} \right\| ^2}]$

$$\begin{aligned} {{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}} = - {\bar{{\textbf {R}}}^{ - 1}}{\bar{{\textbf {R}}}_3}{{{\textbf {w}}}_ * }. \end{aligned}$$

(21)

In the exact modeling situation, it has $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )] = {{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}} = {{{\textbf {0}}}_{N \times 1}}$, and hence, the unconstrained FDAF algorithm converges to the optimum Wiener solution that is also the true system impulse response. The non-causal part can converge to zero in mean, i.e., $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un,}}R}}(\infty )] = {{{\textbf {0}}}_{R \times 1}}$, and hence, (4) implements an (approximate) linear convolution for sufficiently small step sizes.

In the under-modeling situation, it is observed from (20) and (21) that the unconstrained FDAF converges to the optimum Wiener solution in the mean sense regardless of input signal characteristics. In this case, it has ${\bar{{\textbf {R}}}_3} \ne {{{\textbf {0}}}_{N \times Q}}$ for any inputs, and hence $E[{\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k)] = \bar{{\textbf {w}}} - E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )] = \bar{{\textbf {w}}} + {\bar{{\textbf {R}}}^{ - 1}}{\bar{{\textbf {R}}}_3}{{{\textbf {w}}}_ * } \ne \bar{{\textbf {w}}}$, which indicates that the unconstrained FDAF algorithm approaches a biased estimate of the first L elements of the unknown impulse response. However, we will immediately show that we can indeed obtain more information about the coefficients of the unknown system from unconstrained FDAFs, which is not revealed in previous publications.

To proceed, we define ${\bar{{\textbf {R}}}^{ - 1}}{\bar{{\textbf {R}}}_3} = {[ {{\varvec{\beta }}_1^T}\ {{\varvec{\beta }}_2^T} ]^T}$, where the matrices ${{\varvec{\beta }}_1}$ and ${{\varvec{\beta }}_2}$ are of size $L \times Q$ and $R \times Q$, respectively. We then rewrite (20) as

$$\begin{aligned} E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )] = - \left[ {\begin{array}{*{20}{c}} {{{\varvec{\beta }}_1}{{{\textbf {w}}}_ * }}\\ {{{\varvec{\beta }}_2}{{{\textbf {w}}}_ * }} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {{{{\textbf {w}}}_\dag } - E[{{\hat{{\textbf {w}}}}_{\mathrm{{un}},L}}(\infty )]}\\ { - E[{{\hat{{\textbf {w}}}}_{\mathrm{{un}},R}}(\infty )]} \end{array}} \right] . \end{aligned}$$

(22)

We define ${{\varvec{\beta }}_2} = [{{\varvec{\beta }}_3} \ {{\varvec{\beta }}_4}]$, where the matrices ${{\varvec{\beta }}_3}$ and ${{\varvec{\beta }}_4}$ are of size $R \times R$ and $R \times (Q - R)$. We then split ${{{\textbf {w}}}_ * }$ into two parts ${{{\textbf {w}}}_ * } = {[{{{\textbf {w}}}_{ * 1}^T}\ {{{\textbf {w}}}_{ * 2}^T}]^T}$, where lengths of the column vectors ${{{\textbf {w}}}_{ * 1}}$ and ${{{\textbf {w}}}_{ * 2}}$ are R and $Q-R$, respectively.

For white noise as input, we have ${{\varvec{\beta }}_1} = {{{\textbf {0}}}_{L \times Q}}$, ${{\varvec{\beta }}_4} = {{{\textbf {0}}}_{R \times (Q - R)}}$ , and ${{\varvec{\beta }}_3} = \frac{1}{R}\mathrm{{diag}}\{ {[R,R - 1, \cdots ,1]^T}\}$. Using (22), it has

$$\begin{aligned} & {{{\textbf {w}}}_\dag } = E[{\hat{{\textbf {w}}}_{\mathrm{{un}},L}}(\infty )], \end{aligned}$$

(23)

$$\begin{aligned} & {{{\textbf {w}}}_{ * 1}} = {\varvec{\beta }}_3^{ - 1}E[{\hat{{\textbf {w}}}_{\mathrm{{un}},R}}(\infty )]. \end{aligned}$$

(24)

From (23) and (24), we observe that ${{\hat{{\textbf {w}}}}_{\mathrm{{un}},L}}(k)$ can converge to the first L elements of the unknown plant in mean, and the first element of ${{\hat{{\textbf {w}}}}_{\mathrm{{un}},R}}(k)$ is equal to ${w_L}$. The other $R-1$ components cannot approach to any part of the unknown plant. That is, the first $L+1$ coefficients of $E[{{\hat{{\textbf {w}}}}_{\mathrm{{un}}}}(k)]$ converge to the first $L+1$ coefficients of the unknown plant. As a consequence, the unconstrained FDAF algorithm cannot directly model an unknown system having more than $L+1$ coefficients.^{Footnote 1} However, we can calculate ${{{\textbf {w}}}_{ * 1}}$ using (24), and hence, a total of $L+R$ coefficients of the unknown system can be restored from the steady-state vector $E[{\hat{{\textbf {w}}}_{\mathrm{{un}}}}(\infty )]$ of the unconstrained FDAF. This result holds without any restriction on the length of the system impulse response.

For correlated inputs, we have ${{\varvec{\beta }}_1} \ne {{{\textbf {0}}}_{L \times Q}}$, and hence $E[{\hat{{\textbf {w}}}_{\mathrm{{un}},L}}(\infty )] \ne {{{\textbf {w}}}_\dag }$, which indicates that the causal part of ${\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k)$ cannot converge to the first L coefficients of the unknown system in mean. When $Q>R$, we should solve ${{\varvec{\beta }}_2}{{{\textbf {w}}}_ * } = E[{{\hat{{\textbf {w}}}}_{\mathrm{{un}},R}}(\infty )]$ to obtain ${{{\textbf {w}}}_ * }$, which is however an underdetermined linear system of equations and a precise solution is not available. For $Q \le R$, we can obtain ${{{\textbf {w}}}_ * }$ by solving the (over)-determined equation ${{\varvec{\beta }}_2}{{{\textbf {w}}}_ * } = E[{\hat{{\textbf {w}}}_{\mathrm{{un}},R}}(\infty )]$ and then, we estimate the modeling part ${{{\textbf {w}}}_\dag }$ as

$$\begin{aligned} {{{\textbf {w}}}_\dag } = E[{\hat{{\textbf {w}}}_{\mathrm{{un}},L}}(\infty )] - {{\varvec{\beta }}_1}{{{\textbf {w}}}_ * }. \end{aligned}$$

(25)

Hence, all the coefficients of unknown plant can be recovered in this case.

Also, notice that the Wiener solution in the under-modeling case is not the first L coefficients of the system impulse response. In echo cancelation, we aim at canceling the echo signal and hence, we expect that the filter converges to the Wiener solution and an accurate estimation of the echo path is not the main concern. In applications of room impulse response identification, on the contrary, a precise estimation of the room impulse response is the objective. We can thus choose the proper variant of FDAFs according to the task at hand.

3.3 Mean-square convergence

This section studies the second-order convergence of the unconstrained FDAF. Using (17), we formulate the instantaneous MSE as

$$\begin{aligned} \begin{aligned} {J_{\mathrm{{un}}}}(k)&= \frac{1}{R}E[{{{\textbf {e}}}^T}(k){{\textbf {e}}}(k)]\\&= \frac{1}{R}\big \{ \mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]\bar{{\textbf {R}}}) + 2E[{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]{{\bar{{\textbf {R}}}}_3}{{{\textbf {w}}}_ * } + {{\textbf {w}}}_ * ^T{{{\textbf {R}}}_3}{{{\textbf {w}}}_ * }\big \} + \sigma _v^2, \end{aligned} \end{aligned}$$

(26)

where ${{{\textbf {R}}}_3} = E[{{\textbf {X}}}_3^T(k){{{\textbf {X}}}_3}(k)]$, ${J_{\mathrm{{un, min}}}} = \sigma _v^2$ is the MMSE which can be achieved only in the exact modeling case, i.e., $M=L$, and ${J_{\mathrm{{un, ex}}}}(k) = \frac{1}{R}\left\{ {\mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]\bar{{\textbf {R}}}) + 2E[{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]{{\bar{{\textbf {R}}}}_3}{{{\textbf {w}}}_ * } + {{\textbf {w}}}_ * ^T{{{\textbf {R}}}_3}{{{\textbf {w}}}_ * }} \right\}$ is the EMSE. Note that in the under-modeling case, the MMSE is not ${J_{\mathrm{{un, min}}}} = \sigma _v^2$ given a fixed L as will be shown later. However, we still use the provided EMSE expression for convenience, which is reasonable if we treat the adaptive filter length as an independent variable.

Note that the calculation of the MSE ${J_{\mathrm{{un}}}}(k)$ requires evaluating $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k)]$ and $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]$. Since the former has been presented in (19), we now examine the evolution of $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]$. By post-multiplying (18) by its transpose and calculating its expectation, it holds that

$$\begin{aligned} &E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k + 1){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k + 1)]\\ & \quad = E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad - \mu E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k){{\textbf {A}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad - \mu E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k){{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad + {\mu ^2}E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k){{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T{{\textbf {A}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad - \mu E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){{\textbf {w}}}_ * ^T{{\textbf {B}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad - \mu E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k){{{\textbf {w}}}_ * }{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad + {\mu ^2}E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k){{{\textbf {w}}}_ * }{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k){{\textbf {A}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad + {\mu ^2}E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k){{{\textbf {w}}}_ * }{{\textbf {w}}}_ * ^T{{\textbf {B}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad + {\mu ^2}E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k){{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){{\textbf {w}}}_ * ^T{{\textbf {B}}}_{\mathrm{{un}}}^T(k)]\\ & \quad\quad + {\mu ^2}\delta _v^2{{\textbf {P}}\bar{{\textbf {RP}}}}. \end{aligned}$$

(27)

We introduce the vector ${{{\textbf {z}}}_{\mathrm{{un}}}}(k) = \mathrm{{vec}}\left\{ {E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)]} \right\}$ of length ${N^2}$, where $\mathrm{{vec}}\left\{ \cdot \right\}$ obtains column vectors by stacking all columns of the matrix argument [31]. Applying vectorization to (27), we set up a difference equation for ${{{\textbf {z}}}_{\mathrm{{un}}}}(k)$

$$\begin{aligned} {{{\textbf {z}}}_{\mathrm{{un}}}}(k + 1) = {{{\textbf {H}}}_{\mathrm{{un}}}}{{{\textbf {z}}}_{\mathrm{{un}}}}(k) + {{\varvec{\Theta }}_{\mathrm{{un}}}}(k), \end{aligned}$$

(28)

where

$$\begin{aligned} & \begin{array}{l} {{{\textbf {H}}}_{\mathrm{{un}}}} = {{{\textbf {I}}}_{{N^2}}} - \mu {{{\textbf {C}}}_{\mathrm{{un}}}} + {\mu ^2}{{{\textbf {J}}}_{\mathrm{{un}}}},\\ {{{\textbf {C}}}_{\mathrm{{un}}}} = E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] \otimes {{{\textbf {I}}}_N} + {{{\textbf {I}}}_N} \otimes E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)],\\ {{{\textbf {J}}}_{\mathrm{{un}}}} = E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k) \otimes {{{\textbf {A}}}_{\mathrm{{un}}}}(k)], \end{array} \end{aligned}$$

(29)

$$\begin{aligned} & \begin{aligned} {{\varvec{\Theta }}_{\mathrm{{un}}}}(k)&= {\mu ^2}\delta _v^2\mathrm{{vec}}\left( {{{\textbf {P}}\bar{{\textbf {RP}}}}} \right) - \mu \left( {E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k)] \otimes {{{\textbf {I}}}_N}} \right) \mathrm{{vec}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k)]{{\textbf {w}}}_ * ^T)\\&\ \ \ + {\mu ^2}\left( {E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k)] \otimes {{{\textbf {A}}}_{\mathrm{{un}}}}(k)} \right) \mathrm{{vec}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k)]{{\textbf {w}}}_ * ^T)\\&\ \ \ - \mu \left( {{{{\textbf {I}}}_N} \otimes E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k)]} \right) \mathrm{{vec}}({{{\textbf {w}}}_ * }E[{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)])\\&\ \ \ + {\mu ^2}\left( {E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] \otimes {{{\textbf {B}}}_{\mathrm{{un}}}}(k)} \right) \mathrm{{vec}}({{{\textbf {w}}}_ * }E[{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(k)])\\&\ \ \ + {\mu ^2}\left( {E[{{{\textbf {B}}}_{\mathrm{{un}}}}(k)] \otimes {{{\textbf {B}}}_{\mathrm{{un}}}}(k)} \right) \mathrm{{vec}}({{{\textbf {w}}}_ * }{{\textbf {w}}}_ * ^T). \end{aligned} \end{aligned}$$

(30)

The mean-square convergence behavior of the unconstrained FDAF is governed by the eigenvalues of ${{{\textbf {H}}}_{\mathrm{{un}}}}$. It should be mentioned that (28) does not resort to the Gaussian input assumption, and hence, the analysis in this paper is valid for arbitrary data distributions. We will use the same procedure to analyze the constrained FDAFs in the next section. In [32], the energy-conservation approach is used to derive the time evolution of the MSD and EMSE. Indeed, we can use the energy-conservation approach for analysis of the FDAFs and can obtain the same results. However, the method used here is easier to follow and understand than the energy-conservation approach.

The state-space model (30) depicts the mean-square behavior of the unconstrained FDAF. The learning curve can be evaluated by iterating recursion (30). The mean-square deviation (MSD), i.e., the system distance, is commonly used to measure the Euclidean distance between the estimated and true values. The instantaneous MSD of the unconstrained FDAFs is calculated by ${\delta _{\mathrm{{un}}}}(k) = E({\left\| {{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(k)} \right\| ^2}) = \mathrm{{tr}}(\mathrm{{ve}}{\mathrm{{c}}^{ - 1}}({{{\textbf {z}}}_{\mathrm{{un}}}}(k))).$ Notice that only the first L weights are involved in the MSD evaluation of the unknown system. At steady state, (28) leads to ${{{\textbf {z}}}_{\mathrm{{un}}}}(\infty ) = {\left( {{{{\textbf {I}}}_{{N^2}}} - {{{\textbf {H}}}_{\mathrm{{un}}}}} \right) ^{ - 1}}{{\varvec{\Theta }}_{\mathrm{{un}}}}(\infty )$, and hence, the steady-state MSD can be expressed by ${\delta _{\mathrm{{un}}}}(k) = E({\left\| {{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(\infty )} \right\| ^2}) = \mathrm{{tr}}(\mathrm{{ve}}{\mathrm{{c}}^{ - 1}}({{{\textbf {z}}}_{\mathrm{{un}}}}(\infty )))$. The steady-state EMSE then follows

$$\begin{aligned} {J_{\mathrm{{un, ex}}}}(\infty ) = \frac{1}{R}\big \{ \mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(\infty )]\bar{{\textbf {R}}}) + 2E[{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(\infty )]{{\bar{{\textbf {R}}}}_3}{{{\textbf {w}}}_ * } + {{\textbf {w}}}_ * ^T{{{\textbf {R}}}_3}{{{\textbf {w}}}_ * }\big \}. \end{aligned}$$

(31)

At this point, we would like to identify the factors that influence the steady-state MSE of the deficient-length unconstrained FDAF, where we have assumed that M and L are fixed. For that, we define the difference between $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )]$ and the optimum Wiener solution as ${\varvec{\Delta }} = {{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}} - E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )]$. We then write the difference between ${\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k)$ and its mean steady-state value $E[{\hat{{\textbf {w}}}_{\mathrm{{un}}}}(\infty )]$ as

$$\begin{aligned} {{{\textbf {q}}}_{\mathrm{{un}}}}(k) = {\hat{{\textbf {w}}}_{\mathrm{{un}}}}(k) - E[{\hat{{\textbf {w}}}_{\mathrm{{un}}}}(\infty )] = E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )] - {{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k) = {{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}} - {\varvec{\Delta }} - {{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k). \end{aligned}$$

(32)

Substituting (32) into (26), we may represent the steady-state MSE in terms of ${{{\textbf {q}}}_{\mathrm{{un}}}}(k)$

$$\begin{aligned} \begin{aligned} {J_{\mathrm{{un}}}}(\infty )&= \frac{1}{R}\mathrm{{tr}}(E[{{{\textbf {q}}}_{\mathrm{{un}}}}(\infty ){{\textbf {q}}}_{\mathrm{{un}}}^T(\infty )]\bar{{\textbf {R}}})\\&\ \ \ + \frac{1}{R}\big \{ {\mathrm{{tr}}(E{\varvec{\Delta }}{{\varvec{\Delta }}^T}\bar{{\textbf {R}}}) - 2\mathrm{{tr}}({\varvec{\Delta \tilde{w}}}_{\mathrm{{un,opt}}}^T\bar{{\textbf {R}}}) + 2{{\varvec{\Delta }}^T}{{\bar{{\textbf {R}}}}_3}{{{\textbf {w}}}_ * }} \big \}\\&\ \ \ + \frac{1}{R}\big \{ {\mathrm{{tr}}({{{\tilde{{\textbf {w}}}}}_{\mathrm{{un,opt}}}}{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}^T\bar{{\textbf {R}}}) + 2{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}^T{{\bar{{\textbf {R}}}}_3}{{{\textbf {w}}}_ * } + {{\textbf {w}}}_ * ^T{{{\textbf {R}}}_3}{{{\textbf {w}}}_ * }} \big \}+ \sigma _v^2. \end{aligned} \end{aligned}$$

(33)

The first term on the right-hand side of (33) is directly related to ${{{\textbf {q}}}_{\mathrm{{un}}}}(\infty )$, i.e., the fluctuations of ${{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )$ around its mean $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )]$, and the second term is introduced by the bias ${\varvec{\Delta }}$ between $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )]$ and its optimal solution ${{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}}$. Because the two kinds of unconstrained FDAFs can converge to the optimal Wiener solution, i.e., ${\varvec{\Delta }} = {{{\textbf {0}}}_{N \times 1}}$, the MMSE of the unconstrained FDAFs with and without step-normalization can be expressed by

$$\begin{aligned} {J_{\min ,1}} = {J_{\min ,2}} = \frac{1}{R}\big \{ \mathrm{{tr}}({{{\tilde{{\textbf {w}}}}}_{\mathrm{{un,opt}}}}{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}^T\bar{{\textbf {R}}}) + 2{\tilde{{\textbf {w}}}}_{\mathrm{{un,opt}}}^T{{\bar{{\textbf {R}}}}_3}{{{\textbf {w}}}_ * } + {{\textbf {w}}}_ * ^T{{{\textbf {R}}}_3}{{{\textbf {w}}}_ * }\big \} + \delta _v^2. \end{aligned}$$

(34)

3.4 Stability bound

We investigate the stability condition for the unconstrained FDAFs. Note that (19) is convergent if all eigenvalues of ${{{\textbf {I}}}_N} - \mu E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]$ are within the unit circle, i.e., $\rho \left( {{{{\textbf {I}}}_N} - \mu E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]} \right) < 1$ with $\rho ( \cdot )$ denoting the spectral radius. The matrices ${{\textbf {P}}}$ and $\bar{{\textbf {R}}}$ are positive definite, so the eigenvalues of $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] = {{\textbf {P}}\bar{{\textbf {R}}}}$ are positive and real [31]. The condition $\rho \left( {{{{\textbf {I}}}_N} - \mu E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]} \right) < 1$ that guarantees the unconstrained FDAF’s mean convergence is then equivalent to

$$\begin{aligned} \mu < \frac{2}{{{\lambda _{\max }}\left( {E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]} \right) }}, \end{aligned}$$

(35)

where ${\lambda _{\max }}( \cdot )$ represents the maximum eigenvalue of the input matrix.

In light of (28), one observes that for the unconstrained FDAF, the mean-square convergence holds if and only if $\rho ({{{\textbf {H}}}_{\mathrm{{un}}}}) < 1$ holds. Because ${{{\textbf {C}}}_{\mathrm{{un}}}}$ may not be positive definite particularly in the step-normalization version, the method in [32] cannot be directly applicable to solve this inequality. We resort to the approach in [33] to deal with this problem. We decompose ${{\textbf {P}}}$ as ${{\textbf {P}}} = {{\textbf {QS}}}{{{\textbf {Q}}}^T}$, where the diagonal matrix ${{\textbf {S}}}$ consists of the eigenvalues of ${{\textbf {P}}}$ and the column vectors of the orthogonal matrix ${{\textbf {Q}}}$ are the corresponding eigenvectors. We introduce a matrix ${\varvec{\alpha }} = {{\textbf {Q}}}{{{\textbf {S}}}^{\frac{1}{2}}}{{{\textbf {Q}}}^T}$. Pre- and post-multiplying (29) by ${({\varvec{\alpha }} \otimes {\varvec{\alpha }})^{ - 1}}$ and ${\varvec{\alpha }} \otimes {\varvec{\alpha }}$, we arrive at

$$\begin{aligned} {\bar{{\textbf {H}}}_{\mathrm{{un}}}} = {{{\textbf {I}}}_{{N^2}}} - \mu {\bar{{\textbf {C}}}_{\mathrm{{un}}}} + {\mu ^2}{\bar{{\textbf {J}}}_{\mathrm{{un}}}}, \end{aligned}$$

(36)

where ${\bar{{\textbf {H}}}_{\mathrm{{un}}}} = {({\varvec{\alpha }} \otimes {\varvec{\alpha }})^{ - 1}}{{{\textbf {H}}}_{\mathrm{{un}}}}({\varvec{\alpha }} \otimes {\varvec{\alpha }})$, ${\bar{{\textbf {C}}}_{\mathrm{{un}}}} = ({\varvec{\alpha } \bar{{\textbf {R}}} \varvec{\alpha }}) \otimes {{{\textbf {I}}}_N} + {{{\textbf {I}}}_N} \otimes ({\varvec{\alpha } \bar{{\textbf {R}}} \varvec{\alpha }})$, and ${\bar{{\textbf {J}}}_{\mathrm{{un}}}} = E[({\varvec{\alpha }}{\bar{{\textbf {X}}}^T}(k)\bar{{\textbf {X}}}(k){\varvec{\alpha }}) \otimes ({\varvec{\alpha }}{\bar{{\textbf {X}}}^T}(k)\bar{{\textbf {X}}}(k){\varvec{\alpha }})]$. The matrices ${{{\textbf {H}}}_{\mathrm{{un}}}}$ and ${\bar{{\textbf {H}}}_{\mathrm{{un}}}}$ have the same eigenvalues, which means that we can solve $\rho ({\bar{{\textbf {H}}}_{\mathrm{{un}}}}) < 1$ instead. We can easily infer that ${\bar{{\textbf {C}}}_{\mathrm{{un}}}}$ is positive definite and ${\bar{{\textbf {J}}}_{\mathrm{{un}}}}$ is non-negative definite. As a consequence, the condition on $\mu$ to guarantee $\rho ({\bar{{\textbf {H}}}_{\mathrm{{un}}}}) < 1$ can be given by [32]

$$\begin{aligned} 0< \mu < \min \left\{ {\frac{1}{{{\lambda _{\max }}\left( {{{\bar{{\textbf {C}}}}_{\mathrm{{un}}}}^{ - 1}{{\bar{{\textbf {J}}}}_{\mathrm{{un}}}}} \right) }},\frac{1}{{\max \left( {\lambda ({{{\varvec{{\bar{\eta }} }}}_{\mathrm{{cn}}}}) \in {R^ + }} \right) }}} \right\} , \end{aligned}$$

(37)

where ${{\varvec{{\bar{\eta }} }}_{\mathrm{{un}}}} = \left( {\begin{array}{*{20}{c}} {\frac{1}{2}{{\bar{{\textbf {C}}}}_{\mathrm{{un}}}}}& { - \frac{1}{2}{{\bar{{\textbf {J}}}}_{\mathrm{{un}}}}}\\ {{{{\textbf {I}}}_{{N^2}}}}& {{{{\textbf {0}}}_{{N^2}}}} \end{array}} \right)$.

4 Analysis of constrained FDAFs

We are now ready to study the statistical behavior of constrained FDAFs. The time-domain approach used above is adopted for the performance evaluation. However, as will be shown, the constrained algorithm exhibits some different convergence properties.

4.1 Mean convergence

We start by examining the mean convergence in the constrained case. For that, we define the time-domain weight vector as ${{\hat{{\textbf {w}}}}_{\mathrm{{cn}}}}(k) = {\Upsilon _{10}}{{{\textbf {F}}}^{ - 1}}{\hat{{\textbf {W}}}}(k)$ of length L, where ${\Upsilon _{10}}$ extracts the first L components of the inverse DFT of $\hat{{\textbf {W}}}(k)$. Using (4), we obtain the error vector

$$\begin{aligned} {{\textbf {e}}}(k) = {{\textbf {d}}}(k) - {{{\textbf {X}}}_1}(k){{\hat{{\textbf {w}}}}_{\mathrm{{cn}}}}(k). \end{aligned}$$

(38)

Pre-multiply (6) by ${\Upsilon _{10}}{{{\textbf {F}}}^{ - 1}}$, we obtain the time-domain update equation

$$\begin{aligned} {\hat{{\textbf {w}}}_{\mathrm{{cn}}}}(k + 1) = {\hat{{\textbf {w}}}_{\mathrm{{cn}}}}(k) + \mu \bar{{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k){{\textbf {e}}}(k). \end{aligned}$$

(39)

Subtracting ${{{\textbf {w}}}_\dag }$ from (39) yields

$$\begin{aligned} {{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(k + 1) = {{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(k) - \mu \bar{{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k){{\textbf {e}}}(k), \end{aligned}$$

(40)

where ${{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(k) = {{{\textbf {w}}}_\dag } - {\hat{{\textbf {w}}}_{\mathrm{{cn}}}}(k)$ denotes the weight-error vector of the constrained FDAF. We rewrite ${{\textbf {d}}}(k)$ in terms of ${{{\textbf {w}}}_\dag }$ as

$$\begin{aligned} {{\textbf {d}}}(k) = {{{\textbf {X}}}_1}(k){{{\textbf {w}}}_\dag } + {{{\textbf {X}}}_3}(k){{{\textbf {w}}}_ * } + {{\textbf {v}}}(k). \end{aligned}$$

(41)

Substituting (41) into (38) leads to

$$\begin{aligned} {{\textbf {e}}}(k) = {{{\textbf {X}}}_1}(k){{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(k) + {{{\textbf {X}}}_3}(k){{{\textbf {w}}}_ * } + {{\textbf {v}}}(k). \end{aligned}$$

(42)

Inserting (42) into (40), we obtain the recursion

$$\begin{aligned} {{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k + 1) = {{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k) - \mu {{{\textbf {A}}}_{\mathrm{{cn}}}}(k){{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k)- \mu {{{\textbf {B}}}_{\mathrm{{cn}}}}(k){{{\textbf {w}}}_ * } - \mu \bar{{\textbf {P}}}{{\bar{{\textbf {X}}}}^T}(k){{\textbf {v}}}(k), \end{aligned}$$

(43)

where ${{{\textbf {A}}}_{\mathrm{{cn}}}}(k) = \bar{{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k){{{\textbf {X}}}_1}(k)$ and ${{{\textbf {B}}}_{\mathrm{{cn}}}}(k) = \bar{{\textbf {P}}}{\bar{{\textbf {X}}}^T}(k){{{\textbf {X}}}_3}(k)$.

Taking the mathematical expectation of (43) and using the independence assumption

$$\begin{aligned} E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k + 1)] = \big ( {{{{\textbf {I}}}_L} - \mu E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]} \big )E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k)] -\mu E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k)]{{{\textbf {w}}}_ * }. \end{aligned}$$

(44)

This recursion describes the time evolution of $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(k)]$. The last term in (44) disappears in the exact modeling case. It is shown in [20] that for the constrained FDAF with step-normalization, we have the approximation $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] \approx \frac{R}{N}{{{\textbf {I}}}_L}$ when N is large enough, and hence the algorithm only has a single convergence mode. However, the convergence speed of the constrained FDAF without step-normalization is linked to the eigenvalue spread of $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] = E[{{\textbf {X}}}_1^T(k){{{\textbf {X}}}_1}(k)] = {{{\textbf {R}}}_{11}}$, which may be significantly higher than that of the step-normalization version.

Using (9) and (11), the matrix governing the mean convergence of unconstrained FDAFs is represented as

$$\begin{aligned} E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] = \left[ {\begin{array}{*{20}{c}} {{{{\textbf {P}}}_1}{{{\textbf {R}}}_{11}} + {{{\textbf {P}}}_2}{{{\textbf {R}}}_{21}}}& {{{{\textbf {P}}}_1}{{\textbf {R}}}_{21}^T + {{{\textbf {P}}}_2}{{{\textbf {R}}}_{22}}}\\ {{{\textbf {P}}}_2^T{{{\textbf {R}}}_{11}} + {{{\textbf {P}}}_3}{{{\textbf {R}}}_{21}}}& {{{\textbf {P}}}_2^T{{\textbf {R}}}_{21}^T + {{{\textbf {P}}}_3}{{{\textbf {R}}}_{22}}} \end{array}} \right] . \end{aligned}$$

(45)

where ${{{\textbf {R}}}_{21}} = E[{{\textbf {X}}}_2^T(k){{{\textbf {X}}}_1}(k)]$. It is then observed that the matrix that controls the mean convergence of the constrained version $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] = \bar{{\textbf {P}}}E[{\bar{{\textbf {X}}}^T}(k){{{\textbf {X}}}_1}(k)] = {{{\textbf {P}}}_1}{{{\textbf {R}}}_{11}} + {{{\textbf {P}}}_2}{{{\textbf {R}}}_{21}}$ is a submatrix of $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]$. For the FDAF without step-normalization, it has been shown in [22] that

$$\begin{aligned} \frac{{{\lambda _{\max }}(E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)])}}{{{\lambda _{\min }}(E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)])}} \ge \frac{{{\lambda _{\max }}(E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)])}}{{{\lambda _{\min }}(E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)])}}. \end{aligned}$$

(46)

Eq. (46) states that eigenvalue spread of $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)]$ is always larger or equal to that of $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]$, and hence, the constrained FDAF may converge faster than the unconstrained version. However, it is very challenging to verify if the relation (46) holds for the FDAF algorithms with step-normalization, which remains an open problem.

At steady state, the constrained FDAF converges to

$$\begin{aligned} \begin{aligned} E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(\infty )]&= - {\left( {E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]} \right) ^{ - 1}}E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k)]{{{\textbf {w}}}_ * }\\&= - {({{{\textbf {P}}}_1}{{{\textbf {R}}}_{11}} + {{{\textbf {P}}}_2}{{{\textbf {R}}}_{21}})^{ - 1}}({{{\textbf {P}}}_1}{{{\textbf {R}}}_{13}} + {{{\textbf {P}}}_2}{{{\textbf {R}}}_{23}}){{{\textbf {w}}}_ * }, \end{aligned} \end{aligned}$$

(47)

where ${{{\textbf {R}}}_{13}} = E[{{\textbf {X}}}_1^T(k){{{\textbf {X}}}_3}(k)]$, ${{{\textbf {R}}}_{21}} = E[{{\textbf {X}}}_2^T(k){{{\textbf {X}}}_1}(k)]$ and ${{{\textbf {R}}}_{23}} = E[{{\textbf {X}}}_2^T(k){{{\textbf {X}}}_3}(k)]$.

Let us investigate whether the constrained FDAF converges to the optimum Wiener solution. We present the optimum Wiener solution by minimizing the quadratic cost function $E[{\left\| {{{\textbf {e}}}(k)} \right\| ^2}]$ in (38)

$$\begin{aligned} {{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}} = - {{\textbf {R}}}_{11}^{ - 1}{{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * }. \end{aligned}$$

(48)

The optimum Wiener solution of the unconstrained FDAF in (21) utilizes the future information, and hence, it is non-causal. However, the optimum Wiener solution of the constrained FDAF in (48) only requires the past inputs, which is causal. Also, Eq. (21) is an unconstrained solution since we do not impose any constraint on ${{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(k)$. However, Eq. (48) is a constrained solution since the last R components of the inverse DFT of $\hat{{\textbf {W}}}(k)$ are always forced to be zero. Due to the additional constraint, the MMSE from (48) is larger than that from (21), which is already pointed out in [21] but the explicit MMSE expression is not given in [21].

In the exact modeling scenario, the optimum Wiener solution (48) is the true system impulse response, and hence, the constrained FDAF converges to the optimum Wiener solution, i.e., the true plant.

We now pay our attention to the under-modeling scenario.

For constrained FDAFs without step-normalization, we have ${{{\textbf {P}}}_1} = {{{\textbf {I}}}_L},{{{\textbf {P}}}_2} = {{{\textbf {0}}}_{L \times R}}$ and obtain the relation $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )] = {{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(k) = - {{\textbf {R}}}_{11}^{ - 1}{{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * }$. The constrained FDAF without step-normalization converges to the causal Wiener solution. For the white noise input, i.e., ${{{\textbf {R}}}_{13}} = {{{\textbf {0}}}_{L \times Q}}$, we have $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )] = {{{\textbf {0}}}_{L \times 1}}$, which means that the constrained FDAF without step-normalization converges to the first L coefficients of the system impulse response. When correlated signals are used as input, however, it has ${{{\textbf {R}}}_{13}} \ne {{{\textbf {0}}}_{L \times Q}}$, and then $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )] \ne {{{\textbf {0}}}_{L \times 1}}$, which means that the constrained FDAF algorithm without step-normalization cannot converge to the first L coefficients of the system impulse response.
For constrained FDAFs with step-normalization, we have ${{{\textbf {P}}}_1} = N\sigma _x^2{{{\textbf {I}}}_L},{{{\textbf {P}}}_2} = {{{\textbf {0}}}_{L \times R}},{{{\textbf {R}}}_{13}} = {{{\textbf {0}}}_{L \times Q}}$ for white noise input, and then $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )] = {{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}} = {{{\textbf {0}}}_{L \times 1}}$, which means that it can converge to the causal Wiener solution and the first L coefficients of the system impulse response. When correlated signals are used as input, we can verify that $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )] \ne {{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}}$ and $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )] \ne {{{\textbf {0}}}_{L \times 1}}$, which means that it cannot converge to the causal Wiener solution or the first L coefficients of the system impulse response.

4.2 Mean-square convergence

Next, we perform a detailed second-order analysis of the constrained FDAFs. Using (42), we obtain the instantaneous MSE of the constrained FDAFs

$$\begin{aligned} \begin{aligned} {J_{\mathrm{{cn}}}}(k)&= \frac{1}{R}E[{{{\textbf {e}}}^T}(k){{\textbf {e}}}(k)]\\&= \frac{1}{R}\{ \mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)]{{{\textbf {R}}}_{11}}) + 2{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k){{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * } + {{{\textbf {w}}}_ * }{{{\textbf {R}}}_3}{{\textbf {w}}}_ * ^T\} + \sigma _v^2. \end{aligned} \end{aligned}$$

(49)

where ${J_{\mathrm{{cn, min}}}} = \sigma _v^2$ is the MMSE that can be realized only when the adaptive coefficients converge to the system impulse response, and ${J_{\mathrm{{cn, ex}}}}(k) = \frac{1}{R}\left\{ {\mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)]{{{\textbf {R}}}_{11}}) + 2{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k){{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * } + {{{\textbf {w}}}_ * }{{{\textbf {R}}}_3}{{\textbf {w}}}_ * ^T} \right\}$ is the EMSE.

We study the evaluation of the covariance $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)]$ to obtain the MSE. We post-multiply (43) by its transpose and take the expectation

$$\begin{aligned} &E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k + 1){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k + 1)]\\&= E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ - \mu E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k){{\textbf {A}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ - \mu E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k){{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ + {\mu ^2}E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k){{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k){{\textbf {A}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ - \mu E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){{\textbf {w}}}_ * ^T{{\textbf {B}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ - \mu E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k){{{\textbf {w}}}_ * }{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ + {\mu ^2}E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k){{{\textbf {w}}}_ * }{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T{{\textbf {A}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ + {\mu ^2}E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k){{{\textbf {w}}}_ * }{{\textbf {w}}}_ * ^T{{\textbf {B}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ + {\mu ^2}E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k){{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){{\textbf {w}}}_ * ^T{{\textbf {B}}}_{\mathrm{{cn}}}^T(k)]\\&\ \ + {\mu ^2}\delta _v^2\bar{{\textbf {P}}\bar{{\textbf {R}}}}{{\bar{{\textbf {P}}}}^T}. \end{aligned}$$

(50)

Following the same argument used in the unconstrained FDAFs, we define the ${L^2} \times 1$ vector ${{{\textbf {z}}}_{\mathrm{{cn}}}}(k) = \mathrm{{vec}}\left\{ {E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)]} \right\}$. Applying the operation of vectorization to (50), we obtain the following difference equation

$$\begin{aligned} {{{\textbf {z}}}_{\mathrm{{cn}}}}(k + 1) = {{{\textbf {H}}}_{\mathrm{{cn}}}}{{{\textbf {z}}}_{\mathrm{{cn}}}}(k) + {{\varvec{\Theta }}_{\mathrm{{cn}}}}(k), \end{aligned}$$

(51)

where

$$\begin{aligned} & \begin{array}{l} {{{\textbf {H}}}_{\mathrm{{cn}}}} = {{{\textbf {I}}}_{{L^2}}} - \mu {{{\textbf {C}}}_{\mathrm{{cn}}}} + {\mu ^2}{{{\textbf {J}}}_{\mathrm{{cn}}}}\\ {{{\textbf {C}}}_{\mathrm{{cn}}}} = E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] \otimes {{{\textbf {I}}}_L} + {{{\textbf {I}}}_L} \otimes E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]\\ {{{\textbf {J}}}_{\mathrm{{cn}}}} = E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k) \otimes {{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] \end{array} \end{aligned}$$

(52)

$$\begin{aligned} & \begin{aligned} {{\varvec{\Theta }}_{\mathrm{{cn}}}}(k)&= {\mu ^2}\delta _v^2\mathrm{{vec}}(\bar{{\textbf {P}}\bar{{\textbf {R}}}}{{\bar{{\textbf {P}}}}^T})\\&- \mu \left( {E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k)] \otimes {{{\textbf {I}}}_L}} \right) \mathrm{{vec}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k)]{{\textbf {w}}}_ * ^T)\\&+ {\mu ^2}\left( {E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k)] \otimes {{{\textbf {A}}}_{\mathrm{{cn}}}}(k)} \right) \mathrm{{vec}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k)]{{\textbf {w}}}_ * ^T)\\&- \mu \left( {{{{\textbf {I}}}_L} \otimes E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k)]} \right) \mathrm{{vec}}({{{\textbf {w}}}_ * }E[{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)])\\&+ {\mu ^2}\left( {E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] \otimes {{{\textbf {B}}}_{\mathrm{{cn}}}}(k)} \right) \mathrm{{vec}}({{{\textbf {w}}}_ * }E[{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(k)])\\&+ {\mu ^2}\left( {E[{{{\textbf {B}}}_{\mathrm{{cn}}}}(k)] \otimes {{{\textbf {B}}}_{\mathrm{{cn}}}}(k)} \right) \mathrm{{vec}}({{{\textbf {w}}}_ * }{{\textbf {w}}}_ * ^T). \end{aligned} \end{aligned}$$

(53)

Note that the eigenvalue spread of ${{{\textbf {H}}}_{\mathrm{{cn}}}}$ determines the mean-square convergence behavior of constrained FDAFs. Once ${{{\textbf {z}}}_{\mathrm{{cn}}}}(k)$ is recursively generated using (51), the instantaneous MSD of the constrained FDAFs is then given by ${\delta _{\mathrm{{cn}}}}(k) = E({\left\| {{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(k)} \right\| ^2}) = \mathrm{{tr}}(\mathrm{{ve}}{\mathrm{{c}}^{ - 1}}({{{\textbf {z}}}_{\mathrm{{cn}}}}(k))).$

In the steady state, we have ${{{\textbf {z}}}_{\mathrm{{cn}}}}(\infty ) = {({{{\textbf {I}}}_{{L^2}}} - {{{\textbf {H}}}_{\mathrm{{cn}}}})^{ - 1}}{{\varvec{\Theta }}_{\mathrm{{cn}}}}(\infty )$ according to (51), and we immediately obtain the steady-state MSD ${\delta _{\mathrm{{cn}}}}(\infty ) = E\big ( {{{\left\| {{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(\infty )} \right\| }^2}} \big ) = \mathrm{{tr}}\left( {\mathrm{{ve}}{\mathrm{{c}}^{ - 1}}({{{\textbf {z}}}_{\mathrm{{cn}}}}(\infty ))} \right)$. Also, the steady-state EMSE follows:

$$\begin{aligned} {J_{\mathrm{{cn, ex}}}}(\infty ) = \frac{1}{R}\{ \mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{11}}) + 2E[{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * } + {{{\textbf {w}}}_ * }{{{\textbf {R}}}_3}{{\textbf {w}}}_ * ^T\}. \end{aligned}$$

(54)

We have derived the steady-state EMSE expressions of four variants of the FDAF algorithm, which are accurate but somewhat complex. Some simplifications for the EMSE expressions in the exact modeling case can be found in the literature. Closed-form EMSE expressions are given in [20] without detailed derivation, and the same results are derived in the frequency domain in [4]. In Appendix A, we adopt a different approach to derive the approximate EMSE expressions in the time domain. As shown, our derivation is easy to understand and also we provide a more accurate solution for the constrained FDAF algorithm..

Next, we investigate how the MSE of the constrained FDAFs is generated and derive the attainable MMSE for the two versions. To this end, we define ${\bar{{\varvec{\Delta }} }} = {{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}} - E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )]$. The difference between ${\hat{{\textbf {w}}}_{\mathrm{{cn}}}}(k)$ and its steady-state mean value is

$${\mathbf{q}}_{{{\text{cn}}}} (k) = {\mathbf{\hat{w}}}_{{{\text{cn}}}} (k) - E[{\mathbf{\hat{w}}}_{{{\text{cn}}}} (\infty )] = E[{\mathbf{\tilde{w}}}_{{{\text{cn}}}} (\infty )] - {\mathbf{\tilde{w}}}_{{{\text{cn}}}} (k) = {\mathbf{\tilde{w}}}_{{{\text{cn}},{\text{opt}}}} - \bar{\Delta } - {\mathbf{\tilde{w}}}_{{{\text{cn}}}} (k).$$

(55)

We rewrite the steady-state MSE terms of ${{\textbf {q}}_{\mathrm{{cn}}}}(k)$

$$\begin{aligned} {J_{\mathrm{{cn}}}}(\infty )&= \frac{1}{R}\mathrm{{tr}}(E[{{{\textbf {q}}}_{\mathrm{{cn}}}}(\infty ){{\textbf {q}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{11}})\\&\ \ \ + \frac{1}{R}\left\{ {\mathrm{{tr}}(E{\bar{{\varvec{\Delta }} }}{{{\bar{{\varvec{\Delta }} }}}^T}{{{\textbf {R}}}_{11}}) - 2\mathrm{{tr}}(\bar{\varvec{\Delta}}{ {\tilde{w}}}_{\mathrm{{cn,opt}}}^T{{{\textbf {R}}}_{11}}) + 2{{\bar{\varvec{\Delta }}}^T}{{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * }} \right\} \\&\ \ \ + \frac{1}{R}\left \{ {\mathrm{{tr}}({{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn,opt}}}}{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}^T{{{\textbf {R}}}_{11}}) + 2{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}^T{{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * } + {{{\textbf {w}}}_ * }{{{\textbf {R}}}_3}{{\textbf {w}}}_ * ^T} \right \}+ \sigma _v^2 \end{aligned}$$

(56)

The first term in the right-hand side of (56) is related to the fluctuations of ${{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )$ around its mean $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )]$, and the second term is attributed to the bias between $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )]$ and its optimum solution ${{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}}$.

Because the constrained FDAF algorithm without step-normalization for any inputs and the constrained FDAF algorithm with step-normalization for white noise input can converge to the optimum Wiener solution, i.e., ${\bar{{\varvec{\Delta }} }} = {{{\textbf {0}}}_{L \times 1}}$, the second term in (56) then becomes zero and the attainable MMSE is

$$\begin{aligned} J_{\min ,3 } = \frac{1}{R}\{ \mathrm{{tr}}({{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn,opt}}}}{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}^T{{{\textbf {R}}}_{11}}) + 2{\tilde{{\textbf {w}}}}_{\mathrm{{cn,opt}}}^T{{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * } + {{{\textbf {w}}}_ * }{{{\textbf {R}}}_3}{{\textbf {w}}}_ * ^T\} + \sigma _v^2. \end{aligned}$$

(57)

which is the same as the LMS [34]. When the input is correlated, the constrained FDAF algorithm with step-normalization cannot converge to the Wiener solution, i.e., ${\bar{{\varvec{\Delta }} }} \ne {{{\textbf {0}}}_{L \times 1}}$. The attainable MMSE in this case is given by

$$\begin{aligned} J_{\min ,4 } = \frac{1}{R}\{ \mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(\infty )]E[{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{11}}) + 2E[{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{13}}{{{\textbf {w}}}_ * } + {{{\textbf {w}}}_ * }{{{\textbf {R}}}_3}{{\textbf {w}}}_ * ^T\} + \sigma _v^2, \end{aligned}$$

(58)

where $E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty )] = - {({{{\textbf {P}}}_1}{{{\textbf {R}}}_{11}} + {{{\textbf {P}}}_2}{{{\textbf {R}}}_{21}})^{ - 1}}({{{\textbf {P}}}_1}{{{\textbf {R}}}_{13}} + {{{\textbf {P}}}_2}{{{\textbf {R}}}_{23}}){{{\textbf {w}}}_ * }$.

We now compare the MMSE performance of four variants of FDAF algorithms. In the exact modeling scenario, all the variants can achieve the same MMSE, i.e., ${J_{\min }}(\infty ) = \sigma _v^2$. The case is different for the under-modeling scenario. For white noise input, it has

$$\begin{aligned} {J_{\min ,1}} = {J_{\min ,2}} \le {J_{\min ,3}} = {J_{\min ,4}}. \end{aligned}$$

(59)

For correlated input, we have

$$\begin{aligned} {J_{\min ,1}} = {J_{\min ,2}} \le {J_{\min ,3}} < {J_{\min ,4}}. \end{aligned}$$

(60)

4.3 Stability bound

We now present the condition in which the constrained FDAF algorithm can be stable. For the constrained FDAF algorithm without step-normalization, we have ${{{\textbf {P}}}_1} = {{{\textbf {I}}}_L},{{{\textbf {P}}}_2} = {{{\textbf {0}}}_{L \times R}}$, and hence, $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] = {{{\textbf {R}}}_{11}}$ is positive definite with real-valued eigenvalues. To guarantee the mean stability, we should have $\rho \left( {{{{\textbf {I}}}_L} - \mu E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]} \right) < 1$ such that all eigenvalues of ${{{\textbf {I}}}_L} - \mu E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]$ must be less than one in magnitude, and then, the step size satisfies

$$\begin{aligned} \mu < \frac{2}{{{\lambda _{\max }}\left( {E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]} \right) }}. \end{aligned}$$

(61)

According to (51), we should have $\rho ({{{\textbf {H}}}_{\mathrm{{cn}}}}) < 1$ such that the algorithm can be mean-square stable. We can infer that ${{{\textbf {C}}}_{\mathrm{{cn}}}}$ is positive definite and ${{{\textbf {J}}}_{\mathrm{{cn}}}}$ is non-negative definite, and then, the range of step size is given by [32]

$$\begin{aligned} 0< \mu < \min \left\{ {\frac{1}{{{\lambda _{\max }}\left( {{{\textbf {C}}}_{\mathrm{{cn}}}^{^{ - 1}}{{{\textbf {J}}}_{\mathrm{{cn}}}}} \right) }},\frac{1}{{\max \left( {\lambda ({{\varvec{\eta }}_{\mathrm{{cn}}}}) \in {R^ + }} \right) }}} \right\} , \end{aligned}$$

(62)

where ${{\varvec{\eta }}_{\mathrm{{cn}}}} = \left( {\begin{array}{*{20}{c}} {\frac{1}{2}{{{\textbf {C}}}_{\mathrm{{cn}}}}}& { - \frac{1}{2}{{{\textbf {J}}}_{\mathrm{{cn}}}}}\\ {{{{\textbf {I}}}_{{L^2}}}}& {{{{\textbf {0}}}_{{L^2}}}} \end{array}} \right)$.

For the constrained FDAF algorithm with step-normalization, $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]$ is not a Hermitian matrix, and hence, the eigenvalues of $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]$ may not be real. The condition that guarantees mean convergence is expressed as

$$\begin{aligned} 0< \mu < \frac{{2{\Re e}({\lambda _i})}}{{{{\left| {{\lambda _i}} \right| }^2}}}, i = 0,1, \cdots ,L - 1, \end{aligned}$$

(63)

where ${\lambda _i}$ is the i-th eigenvalue of $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)]$. Similarly, the constrained FDAF is mean-square stable, if and only if

$$\begin{aligned} \rho \left( {{{{\textbf {I}}}_{{L^2}}} - \mu {{{\textbf {C}}}_{\mathrm{{cn}}}} + {\mu ^2}{{{\textbf {J}}}_{\mathrm{{cn}}}}} \right) < 1. \end{aligned}$$

(64)

However, the matrices ${{{\textbf {C}}}_{\mathrm{{cn}}}}$ and ${{{\textbf {J}}}_{\mathrm{{cn}}}}$ are not Hermitian in this case, and hence, we cannot use (62) to solve the inequality (64). In practice, a numerical solution may be used to handle this problem, see [25] for more details.

Given the complexity of the mathematical developments, some variables and matrices used in the formulations are summarized in Appendix B to enhance the readability, although they are explicitly defined.

5 Results and discussion

We conduct computer simulations to demonstrate and support our analysis. The unknown system impulse response has a length of $M=16$ and is with the coefficients [0.01 0.02 $-$0.04 $-$0.08 0.15 $-$0.3 0.45 0.6 0.6 0.45 $-$0.3 0.15 $-$0.08 $-$0.04 0.02 0.01]. The adaptive filter has a length of $L = 16$ or $L=14$, corresponding to the exact- and under-modeling situations, respectively. The input x(n) is generated by filtering the white Gaussian noise with $H(z) = \sqrt{1 - {a^2}} /(1 - a{z^{ - 1}})$, which is white noise for $a=0$ and an AR(1) process for $0< \left| a \right| < 1$. The signal-to-noise ratio (SNR) is defined as $10{\log _{10}}\left( {E[{{({{{\textbf {x}}}^T}(n){{{\textbf {w}}}_{\mathrm{{opt}}}})}^2}]/E[{v^2}(n)]} \right)$. We estimate the required statistical moments by ensemble averaging.

Figure 1 presents the MSD and EMSE learning curves of the step-normalization FDAF algorithm. SNR = 10 dB and SNR = $\infty$ are used in this example. We here consider an exact modeling case, i.e., $L=16$ and $R=16$. The step size is $\mu =0.1$. The theoretical learning curves of the unconstrained FDAF algorithm are generated recursively using (28) and (26), while that of the constrained version are obtained from (51) and (49). High consistency between theory and experimental results is witnessed in Fig. 1. Also, we find that the constrained and unconstrained FDAFs exhibit a similar initial convergence rate in this case. This is because the eigenvalue spread of ${{{\textbf {A}}}_{\mathrm{{un}}}}(k)$ is only slightly larger than that of ${{{\textbf {A}}}_{\mathrm{{cn}}}}(k)$, i.e., $\frac{{{\lambda _{\max }}(E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)])}}{{{\lambda _{\min }}(E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)])}} = 1.63$ and $\frac{{{\lambda _{\max }}(E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)])}}{{{\lambda _{\min }}(E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)])}} = 1.52$. The step-normalization procedure greatly reduces the eigenvalue spread of ${{{\textbf {A}}}_{\mathrm{{un}}}}(k)$ and ${{{\textbf {A}}}_{\mathrm{{cn}}}}(k)$, and the constraining matrix ${{{\textbf {G}}}_{10}}$ further diagonalizes ${{{\textbf {A}}}_{\mathrm{{cn}}}}(k)$. The experimental MSD and EMSE for SNR = $\infty$ both achieve a floor due to the roundoff error. We conduct simulations in the under-modeling situation in Fig. 2, where we set $L=14$ and $R=28$ and other conditions are the same as that in Fig. 1, and we observe that the presented model is capable of predicting the EMSE and MSD learning curves well.

Figure 3 presents the steady-state mean-square performance of the step-normalization FDAF algorithms for SNR = 30 dB. We here consider an exact modeling case with $L=16$ and $R=16$. An AR(1) process with $a=0.9$ is used as input, and $\mu$ varies from 0.01 to 1.9. The stability bounds are also illustrated in Fig. 3. Note that the simulations are consistent with the theoretical predictions, i.e., Eqs. (31) and (54), even at large step sizes. We then investigate the accuracy of several approximations shown in Appendix A. To give a more intuitive observation, we present some theoretical and measured results in Table 1. Our approximated EMSE solutions of the unconstrained and constrained FDAFs, i.e., Eqs. (A5) and (A11), match the experimental values well for smaller step sizes, but the previous theory in [4] and [20] (i.e., Eq. (A12)) present the largest prediction errors. It is apparent that all the approximated results deviate from the simulations when the step size is large. Given the same step size, the constrained FDAF algorithm has lower steady-state MSD and EMSE values than the unconstrained algorithm.

Table 1 Measured and theoretical steady-state EMSEs of constrained FDAF with step-normalization in exact modeling case

Full size table

We investigate the steady-state MSD and EMSE performance of the FDAFs in under-modeling in Fig. 4 for SNR = 30 dB. The experimental conditions are the same as Fig. 3 except that $L=14$ and $R=28$ are used. It is observed in this example that the theoretical predictions are similar to the simulations. The constrained FDAF achieves smaller steady-state MSD values than the unconstrained one in under-modeling case. However, the steady-state MSE in under-modeling case is quite different from that in the exact modeling case. When the step size is small enough, the EMSE related to the step size can be ignored. Hence, the unconstrained FDAF has a lower steady-state EMSE when $\mu <0.5$ since it converges to the non-causal Wiener solution. As the step size increases, the first term in (33) and (56) may dominate. Consequently, we found that for $\mu >0.5$ the unconstrained FDAF algorithm has a larger EMSE compared to its constrained counterpart.

Figure 5 displays the MMSE of four FDAF algorithms with different a. The filter has a length of $L=14$, and the block has a length of $R=14$. Notice that an under-modeling situation has been considered here, whereas four variants of the FDAF have the same MMSE for a sufficient long filter length and the results are not shown. Because the unconstrained FDAF algorithm can always converge to the non-causal Wiener solution, it has a smaller MMSE than the constrained version. An intuitive explanation is that the unconstrained FDAF yields superior MSE performance by utilizing more data than the constrained counterpart. This finding has already been investigated in [20] and [21], but they did not provide the analytical MMSE expressions. Also note that with or without step-normalization, the constrained FDAF attains similar MMSE values for a close to zero, i.e., the input is close to white noise. However, as the parameter a increases, the MMSE of the step-normalization FDAF turns out to be worse than that without step-normalization. The observation is consistent with our analysis, i.e., the constrained FDAF algorithm without step-normalization converges to the causal Wiener solution for an arbitrary input, while for the constrained FDAF algorithm with step-normalization, this can only happen for a white noise input.

6 Conclusion

We carried out the performance evaluation of a family of FDAF algorithms in the time domain. In particular, we have derived the recursive models for their mean and mean-square properties. We extensively discussed whether each variant of the FDAF converges to the system impulse response or the optimum Wiener solution. Closed-form expressions that characterize the MSD and MSE performance of the FDAFs were derived, and accurate and approximate steady-state EMSEs were given. The MMSEs of the four versions of the FDAF were compared. This work is more general and easier to follow than our previous study [25,26,27], which offers a thorough insight into the underlying learning theory of FDAFs. Computer simulations supported the theoretical model.

Data availability

No datasets were generated or analyzed during the current study.

Notes

Only several papers have discussed the modeling capability of the unconstrained FDAF algorithm with a white noise input. In [20], it is observed from experiment that the first L weights of the unconstrained and constrained FDAFs are about the same in under-modeling case. In [30], it is pointed out that the unconstrained FDAF algorithm cannot accurately model an unknown system with greater than L coefficients. In our previous work [26], it is shown that the first L elements of $E[{{\hat{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty )]$ equal those of the true system response, but we did not point out that the $L+1$-th element of $E[{\hat{{\textbf {w}}}_{\mathrm{{un}}}}(\infty )]$ is equal to that of true weight vector. The aforementioned discussion is limited to the case $N=L+R$. It is easy to verify that the unconstrained FDAF algorithm can directly model an unknown plant with $N-R+1$ coefficients for any $N \ge L + R - 1$ for white noise input and more coefficients can be recovered using (22).

References

E.R. Ferrara, Fast implementation of LMS adaptive filters. IEEE Trans. Acoust. Speech Signal Process. 28(4), 474–475 (1980)
Article Google Scholar
G.A. Clark, S.K. Mitra, S.R. Parker, Block implementation of adaptive digital filters. IEEE Trans. Circuits Syst. 28(3), 584–592 (1981)
Article Google Scholar
D. Mansour Jr., Unconstrained frequency-domain adaptive filter. IEEE Trans. Acoust. Speech Signal Process. 30(5), 726–734 (1982)
Article Google Scholar
B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications (John Wiley & Sons, Hoboken, NJ, USA, 2013)
Book Google Scholar
J.J. Shynk, Frequency-domain and multirate adaptive filtering. IEEE Signal Process. Mag. 9(1), 14–37 (1992)
Article Google Scholar
D. Comminiello, A. Nezamdoust, S. Scardapane, M. Scarpiniti, A. Hussain, A. Uncini, A new class of efficient adaptive filters for online nonlinear modeling. IEEE Trans. Syst. Man Cybern. Syst. 52, 1384–1396 (2022)
Google Scholar
J. Franzen, T. Fingscheidt, Improved measurement noise covariance estimation for N-channel feedback cancellation based on the frequency domain adaptive kalman filter. In: Proc. IEEE ICASSP, pp. 965–969 (2019)
J. Lorente, M. Ferrer, M.D. Diego, A. González, GPU implementation of multichannel adaptive algorithms for local active noise control. IEEE/ACM Trans. Audio Speech Lang. Process. 22(11), 1624–1635 (2014)
Article Google Scholar
M. Schneider, W. Kellermann, The generalized frequency-domain adaptive filtering algorithm as an approximation of the block recursive least-squares algorithm. EURASIP J. Adv. Signal Process. 2016, 1–15 (2016)
Article Google Scholar
J. Lu, K. Chen, X. Qiu, Convergence analysis of the modified frequency-domain block LMS algorithm with guaranteed optimal steady state performance. Signal Process. 132, 165–169 (2017)
Article Google Scholar
F. Yang, J. Yang, Mean-square performance of the modified frequency-domain block LMS algorithm. Signal Process. 163(18–25), 596–604 (2019)
Google Scholar
K. Mayyas, Performance analysis of the deficient length LMS adaptive algorithm. IEEE Trans. Signal Process. 53(8), 2727–2734 (2005)
Article MathSciNet Google Scholar
G. Enzner, S. Voit, Hybrid-frequency-resolution adaptive Kalman filter for online identification of long acoustic responses with low input-output latency. IEEE/ACM Trans. Audio Speech Lang. Process. 31, 3550–3563 (2023)
Article Google Scholar
T. Haubner, A. Brendel, W. Kellermann, End-to-end deep learning based adaptation control for frequency-domain adaptive system identification. In: Proc. IEEE ICASSP, pp. 766–770 (2022)
J. Casebeer, N. Bryan, P. Smaragdis, Meta-AF: meta-learning for adaptive filters. IEEE/ACM Trans. Audio Speech Lang. Process. 31, 355–370 (2023)
Article Google Scholar
A. Carini, G.L. Sicuranza, Transient and steady-state analysis of filtered-x affine projection algorithms. IEEE Trans. Signal Process. 54(2), 665–678 (2006)
Article Google Scholar
E. Eweda, N.J. Bershad, J.C.M. Bermudez, Stochastic analysis of the LMS and NLMS algorithms for cyclostationary white Gaussian and non-Gaussian inputs. IEEE Trans. Signal Process. 66(18), 4753–4765 (2018)
Article MathSciNet Google Scholar
A. Feuer, Performance analysis of the block least mean square algorithm. IEEE Trans. Circuits Syst. 32(9), 960–963 (1985)
Article MathSciNet Google Scholar
P.C.W. Sommen, P.J.V. Gerwen, H.J. Kotmans, A.J.E.M. Janssen, Convergence analysis of a frequency domain adaptive filter with exponential power averaging and generalized window function. IEEE Trans. Circuits Syst. 34(7), 788–798 (1987)
Article Google Scholar
J.C. Lee, C.K. Un, Performance analysis of frequency domain block LMS adaptive digital filters. IEEE Trans. Circuits Syst. 36(2), 173–189 (1989)
Article MathSciNet Google Scholar
A. Feuer, R. Cristi, On the steady state performance of frequency domain LMS algorithms. IEEE Trans. Signal Process. 41(1), 419–423 (1993)
Article Google Scholar
B. Farhang-Boroujeny, K.S. Chan, Analysis of the frequency-domain block LMS algorithm. IEEE Trans. Signal Process. 48(8), 2332–2342 (2000)
Article Google Scholar
M. Wu, J. Yang, Y. Xu, X. Qiu, Steady-state solution of the deficient length constrained FBLMS algorithm. IEEE Trans. Signal Process. 60(12), 6681–6687 (2012)
Article MathSciNet Google Scholar
X. Zhang, Y. Xia, C. Li, L. Yang, D.P. Mandic, Analysis of the unconstrained frequency-domain block LMS for second-order noncircular inputs. IEEE Trans. Signal Process. 67(15), 3970–3984 (2019)
Article MathSciNet Google Scholar
F. Yang, G. Enzner, J. Yang, A unified approach to the statistical convergence analysis of frequency-domain adaptive filters. IEEE Trans. Signal Process. 67(7), 1785–1796 (2019)
Article MathSciNet Google Scholar
F. Yang, J. Yang, Convergence analysis of deficient-length frequency-domain adaptive filters. IEEE Trans. Circuits Syst. I 66(11), 4242–4255 (2019)
Article MathSciNet Google Scholar
F. Yang, G. Enzner, J. Yang, New insights into convergence theory of constrained frequency-domain adaptive filters. Circuits Syst. Signal Process. 40, 2076–2090 (2021)
Article Google Scholar
F. Yang, G. Enzner, J. Yang, On the convergence behavior of partitioned-block frequency-domain adaptive filters. IEEE Trans. Signal Process. 69, 4906–4920 (2021)
Article MathSciNet Google Scholar
F. Yang, Analysis of deficient-length partitioned-block frequency-domain adaptive filters. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 456–467 (2022)
Article Google Scholar
X. Li, W.K. Jenkins, The comparison of the constrained and unconstrained frequency-domain block-LMS adaptive algorithms. IEEE Signal Process. Lett. 44(7), 1813–1816 (1996)
Article Google Scholar
D.S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas (Princeton University Press, Princeton, NJ, 2009)
Book Google Scholar
Y. Al-Naffouri, A.H. Sayed, Transient analysis of data-normalized adaptive filters. IEEE Trans. Signal Process. 51(3), 639–652 (2003)
Article Google Scholar
F. Yang, J. Yang, Mean-square performance of the modified frequency-domain block LMS algorithm. Signal Process. 163(18–25), 596–604 (2019)
Google Scholar
K. Mayyas, Performance analysis of the deficient length LMS adaptive algorithm. IEEE Trans. Signal Process. 53(8), 2727–2734 (2005)
Article MathSciNet Google Scholar

Download references

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62171438, and by the Beijing Natural Science Foundation under Grant 4242013.

Author information

Authors and Affiliations

State Key Laboratory of Acoustics, Institute of Acoustics, Chinese Academy of Sciences, Beijing, 100190, China
Feiran Yang

Authors

Feiran Yang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Feiran Yang was involved in conceptualization, methodology, software, and writing.

Corresponding author

Correspondence to Feiran Yang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: A

This appendix examines the steady-state mean-square behavior of FDAFs that is easy to follow. Particularly, the derivations will rely on some simplifications.

For the unconstrained FDAF algorithm without step-normalization, it has $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] = \bar{{\textbf {R}}}$. Using (27) and ignoring the terms related to ${\mu ^2}$, we obtain

$$\begin{aligned} E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(\infty )]\bar{{\textbf {R}}} + \bar{{\textbf {R}}}E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(\infty )] = \mu \delta _v^2\bar{{\textbf {R}}}. \end{aligned}$$

(A1)

Taking the trace of both sides of (A1), it has

$${\mathop{\rm tr}\nolimits} \left( {E[{{{\bf{\tilde w}}}_{{\rm{un}}}}(\infty ){\bf{\tilde w}}_{{\rm{un}}}^T(\infty )]{\bf{\bar R}}} \right) = \frac{1}{2}\mu \delta _v^2{\mathop{\rm tr}\nolimits} ({\bf{\bar R}}) = \frac{1}{2}\mu NR\delta _v^2\delta _x^2.$$

(A2)

Considering (31) and (A2), we then obtain the steady-state EMSE of the unconstrained FDAF algorithm without step-normalization

$$\begin{aligned} {J_{\mathrm{{un, ex}}}}(\infty ) = \frac{1}{R}\mathrm{{tr}}(E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(\infty )]\bar{{\textbf {R}}}) = \frac{1}{2}\mu N\delta _v^2\delta _x^2. \end{aligned}$$

(A3)

For the step-normalization version, using the relation $E[{{{\textbf {A}}}_{\mathrm{{un}}}}(k)] = \frac{R}{N}{{{\textbf {I}}}_N}$ and (27), we have

$$\begin{aligned} E[{{\tilde{{\textbf {w}}}}_{\mathrm{{un}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(\infty )] = \frac{1}{2}\mu \delta _v^2{{\textbf {P}}}. \end{aligned}$$

(A4)

Using (31) and (A.4), the steady-state EMSE of the unconstrained FDAF algorithm with step-normalization then follows

$$\begin{aligned} {J_{\mathrm{{un, ex}}}}(\infty ) = \frac{1}{R}\mathrm{{tr}}(E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{un}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{un}}}^T(\infty )]\bar{{\textbf {R}}})= \frac{1}{{2R}}\mu \delta _v^2\mathrm{{tr}}({{\textbf {P}}\bar{{\textbf {R}}}})= \frac{1}{2}\mu \delta _v^2. \end{aligned}$$

(A5)

For the constrained FDAF without step-normalization, it has $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] = {{{\textbf {R}}}_{11}}$. Using (50) and ignoring the term related to ${\mu ^2}$, we get

$$\begin{aligned} E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{11}} + {{{\textbf {R}}}_{11}}E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )] = \mu \delta _v^2{\bar{\textbf {P}}\bar{{\textbf {R}}}}{\bar{{\textbf {P}}}^T}. \end{aligned}$$

(A6)

Taking the trace of both sides of (A6) and considering the relation ${\bar{\textbf {P}}\bar{{\textbf {R}}}}{\bar{{\textbf {P}}}^T} = {{{\textbf {R}}}_{11}}$, one immediately obtains

$$\begin{aligned} \mathrm{{tr}}\left( {E[{{{\tilde{{\textbf {w}}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{11}}} \right) = \frac{1}{2}\mu \delta _v^2\mathrm{{tr}}({{{\textbf {R}}}_{11}}). \end{aligned}$$

(A7)

Considering $\mathrm{{tr}}({{{\textbf {R}}}_{11}}) = LR\delta _x^2$, we then obtain the EMSE of the constrained FDAF algorithm without step-normalization from (54)

$$\begin{aligned} {J_{\mathrm{{cn, ex}}}}(\infty ) = \frac{1}{R}\mathrm{{tr}}(E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{11}}) = \frac{1}{2}\mu L\delta _v^2\delta _x^2. \end{aligned}$$

(A8)

Note that Eqs. (A3), (A5), and (A8) agree with the results in in [4], [20], but our derivation is easier to understand.

For the constrained FDAF algorithm with step-normalization, it has an approximation $E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] = \frac{R}{N}{{{\textbf {I}}}_L}$. Using (50) and ignoring the ${\mu ^2}$ terms, we get

$$\begin{aligned} E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )] = \frac{1}{2}\frac{N}{R}\mu \delta _v^2{\bar{\textbf {P}}\bar{{\textbf {R}}}}{\bar{{\textbf {P}}}^T}. \end{aligned}$$

(A9)

Using the approximation ${{\textbf {P}}\bar{{\textbf {R}}}} = \frac{R}{N}{{{\textbf {I}}}_L}$, we get ${\bar{\textbf {P}}\bar{{\textbf {R}}}} = [{{{\textbf {I}}}_L}\ {{{\textbf {0}}}_{L \times R}}]$, and hence ${\bar{\textbf {P}}\bar{{\textbf {R}}}}{\bar{{\textbf {P}}}^T} \approx \frac{R}{N}{{{\textbf {P}}}_1}$. Equation (A9) then becomes

$$\begin{aligned} E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )] = \frac{1}{2}\mu \delta _v^2{{{\textbf {P}}}_1}. \end{aligned}$$

(A10)

The steady-state EMSE of the constrained FDAF algorithm with step-normalization can be expressed by

$$\begin{aligned} {J_{\mathrm{{cn, ex}}}}(\infty ) = \frac{1}{R}\mathrm{{tr}}(E[{{\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}}(\infty ){\tilde{{\textbf {w}}}}_{\mathrm{{cn}}}^T(\infty )]{{{\textbf {R}}}_{11}}) = \frac{1}{{2R}}\mu \delta _v^2\mathrm{{tr}}({{{\textbf {P}}}_1}{{{\textbf {R}}}_{11}}). \end{aligned}$$

(A11)

For white noise input, we have ${{{\textbf {P}}}_2}{{{\textbf {R}}}_{21}} = {{{\textbf {0}}}_L}$, and hence ${{{\textbf {P}}}_1}{{{\textbf {R}}}_{11}} = E[{{{\textbf {A}}}_{\mathrm{{cn}}}}(k)] - {{{\textbf {P}}}_2}{{{\textbf {R}}}_{21}} \approx \frac{R}{N}{{{\textbf {I}}}_L}$. Equation (A.11) then becomes

$$\begin{aligned} {J_{\mathrm{{cn, ex}}}}(\infty ) = \frac{1}{2}\frac{L}{N}\mu \delta _v^2. \end{aligned}$$

(A12)

The approximation (A12) is the same as that in [4] and [20]. Note that (A12) has a good approximation when the input is white noise or close to white noise, but (A12) deviates from the simulated result for highly correlated inputs. However, the presented theoretical EMSE (A11) holds well for both white noise and correlated input.

It is observed that the sufficient-length FDAF algorithms with and without step-normalization can achieve the same steady-state EMSE result by step size adjustments, while their difference lies in the convergence speed. The EMSE gap between the two FDAF variants is determined by a factor of L/R, which becomes significant as the block length R increases.

Appendix: B

In this section, we show some variables and matrices used in the formulation for the convenience of the reader (Table 2).

Table 2 The symbol matrices and variables

Full size table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Yang, F. Stochastic analysis of frequency-domain adaptive filters. EURASIP J. Adv. Signal Process. 2024, 100 (2024). https://doi.org/10.1186/s13634-024-01193-5

Download citation

Received: 09 September 2024
Accepted: 19 November 2024
Published: 18 December 2024
DOI: https://doi.org/10.1186/s13634-024-01193-5

Stochastic analysis of frequency-domain adaptive filters

Abstract

1 Introduction

2 FDAF

3 Analysis of unconstrained FDAFs

3.1 Signal model

3.2 Mean convergence

3.3 Mean-square convergence

3.4 Stability bound

4 Analysis of constrained FDAFs

4.1 Mean convergence

4.2 Mean-square convergence

4.3 Stability bound

5 Results and discussion

6 Conclusion

Data availability

Notes

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Appendices

Appendix: A

Appendix: B

Rights and permissions

About this article

Cite this article

Share this article

Keywords