Topical Review

Randomized algorithms for fast computation of low rank tensor ring model


Published 1 December 2020 © 2020 The Author(s). Published by IOP Publishing Ltd
Citation: Salman Ahmadi-Asl et al 2021 Mach. Learn.: Sci. Technol. 2 011001. DOI: 10.1088/2632-2153/abad87


Abstract

Randomized algorithms are efficient techniques for big data tensor analysis. In this tutorial paper, we review and extend a variety of randomized algorithms for decomposing large-scale data tensors in the Tensor Ring (TR) format. We discuss both adaptive and nonadaptive randomized algorithms for this task. Our main focus is on the random projection technique as an efficient randomized framework and on how it can be used to decompose large-scale data tensors in the TR format. Simulations are provided to support the presentation, and the efficiency and performance of the presented algorithms are compared.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Tensor decompositions have found numerous applications in signal processing, machine learning and scientific computing [1–9]. Handling very large-scale data tensors is a challenging task due to the high computational complexity and memory requirements. The Tucker decomposition [10–12] can alleviate this problem by compressing a given data tensor into a smaller core tensor and factor matrices. However, the core tensor suffers from the phenomenon known as the curse of dimensionality, which means that the number of parameters needed for its representation grows exponentially with its order [13, 14]. To tackle this difficulty, alternative tensor representations have been introduced. Tensor networks are effective tools for coping with this difficulty by approximating a given data tensor by a series of inter-connected smaller low-order core tensors [2, 3, 15]. For example, the Tensor Train/Tensor Ring (TT/TR) decompositions [16–20] are special cases of the Hierarchical Tucker decomposition [21, 22]; these two simple but powerful tensor networks represent the original tensor as a train and a ring (chain) of inter-connected third-order tensors, respectively. The TT decomposition is known as Matrix Product States (MPS) in quantum physics [23–26]. The TT and TR decompositions have found many applications in the scientific computing and machine learning communities, such as computing extreme singular values and the pseudoinverse of very large matrices [27, 28], reducing the number of parameters in deep neural networks (DNNs) [29–33], tensor completion [34–39], machine learning [40–44], quantum chemistry [45], solving high-dimensional PDEs [46] and low rank approximation of large sparse matrices [47]. Other related tensor networks are Projected Entangled Pair States (PEPS) [48], the Multi-scale Entanglement Renormalization Ansatz (MERA) [49] and the Tree Tensor Network (TTN) [50]. While the Tucker decomposition suffers from the curse of dimensionality, an efficient algorithm has recently been proposed in [51] to compute a Tucker decomposition whose core tensor is stored in the TT format in order to avoid this difficulty. The idea is to decompose a tensor into the TT format and then convert it into a Tucker format whose core tensor is stored in the TT format. Deterministic algorithms for computing the TR decomposition involve computing a sequence of SVDs of unfolding matrices. Clearly, this is computationally prohibitive for large-scale data tensors and requires substantial memory. Randomized algorithms are effective techniques for coping with this problem: they reduce the computational complexity of the deterministic algorithms and also the communication between different levels of the memory hierarchy. In this paper, we review a variety of randomized algorithms for computing the TR decomposition of large-scale data tensors.

Our main contributions in this paper are as follows:

  • Extending the random projection technique for fast TR decomposition (Algorithms 7 and 10),
  • Extending randomized algorithms for fast TR decomposition with permutation of core tensors (Algorithms 13 and 14),
  • Extending a randomized rank-revealing algorithm for fast TR decomposition (Algorithm 11),
  • Applying randomized algorithms to the fast TR completion task (Example 3 in section 5).

Before starting the next section, we introduce some concepts and notations used throughout the paper. Tensors, matrices and vectors are denoted by underlined bold upper case, bold upper case and bold lower case letters, e.g. $\underline{\mathbf{X}}$, ${\mathbf{X}}$, ${\mathbf{x}}$, respectively. The notations 'T' and 'Tr' denote the transpose and the trace of a matrix. The Frobenius norm of a tensor is denoted by ${\left\| . \right\|_F}$. Slices are matrices obtained from a tensor by fixing all but two indices. The slices $\underline{\mathbf{X}}(:,i_2,:)$ of a 3rd-order tensor $\underline{\mathbf{X}}\in\mathbb{R}^{I_1\times I_2\times I_3}$ are called lateral slices. Tensors can be multiplied by matrices. For example, the tensor-matrix multiplication of a tensor $\underline{\mathbf{X}}\in\mathbb{R}^{I_1\times I_2 \times\cdots\times I_N}$ and a matrix ${\mathbf{Q}}\in\mathbb{R}^{K\times I_n}$ along mode n is denoted by $\underline{\mathbf{X}}\times_{n}{\mathbf{Q}}\in \mathbb{R}^{I_1\times \cdots \times I_{n-1}\times K\times I_{n+1}\times \cdots \times I_N}$ and defined as follows

$\left(\underline{\mathbf{X}}\times_{n}{\mathbf{Q}}\right)\left(i_1,\ldots,i_{n-1},k,i_{n+1},\ldots,i_N\right) = \sum_{i_n = 1}^{I_n}\underline{\mathbf{X}}\left(i_1,i_2,\ldots,i_N\right)\,{\mathbf{Q}}\left(k,i_n\right). \qquad (1)$

The same definition can be presented for tensor-vector multiplication. Based on the definition of the tensor-matrix multiplication, when a tensor is multiplied by a vector, the resulting tensor has a mode of size 1. In order to remove this mode and reduce the order of the resulting tensor, we use the special notation $\bar\times$. For example, for $\underline{\mathbf{X}}\in\mathbb{R}^{I_1\times I_2\times \cdots\times I_N}$ and ${\mathbf{y}}\in\mathbb{R}^{I_n}$, we have $\underline{\mathbf{X}}{\bar\times}_n{\mathbf{y}}\in\mathbb{R}^{I_1\times\cdots\times I_{n-1}\times I_{n+1}\times\cdots\times I_N}$. A tensor can be reshaped into a matrix and vice versa. These procedures are called matricization (unfolding or flattening) and tensorization, respectively. For a tensor $\underline{\mathbf{X}}\in\mathbb{R}^{{I_1}\times{I_2}\times\cdots\times {I_N}}$, the n-unfolding of the tensor $\underline{\mathbf{X}}$ is denoted by ${\mathbf{X}}_{\left\langle n \right\rangle}\in\mathbb{R}^{{I_1\cdots I_{n}\times I_{n+1}\cdots I_N}}$, and its components are defined as follows
${\mathbf{X}}_{\left\langle n \right\rangle}\left(\overline{{i_1}{i_2}\cdots{i_n}},\ \overline{{i_{n+1}}\cdots{i_N}}\right) = \underline{\mathbf{X}}\left(i_1,i_2,\ldots,i_N\right),$
where $\overline{{i_1}{i_2} \ldots {i_N}} = \sum\limits_{k = 1}^N {\left( {{i_k} - 1} \right){J_k}} ,\,\,{J_k} = \prod\limits_{m = 1}^{k - 1} {{I_m}}.$ A special case of the n-unfolding with only one index for the first coordinate is called the mode-n unfolding and is denoted by ${{\mathbf{X}}_{\left( n \right)}} \in {\mathbb{R}^{{I_n} \times \prod\limits_{i \ne n} {{I_i}} }}$.
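
For readers who prefer working code, the following is a minimal NumPy sketch (our own illustrative code, not from the paper) of the n-unfolding, the mode-n unfolding and the mode-n product; the helper names are our own, and the column ordering follows NumPy's row-major (C-order) convention rather than the column-major (MATLAB-style) one.

import numpy as np

def n_unfold(X, n):
    # X_<n>: group the first n modes as rows and the remaining modes as columns.
    return X.reshape(int(np.prod(X.shape[:n])), -1)

def mode_n_unfold(X, n):
    # X_(n): mode n indexes the rows, all other modes are flattened into columns.
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, Q, n):
    # X x_n Q: contract mode n of X with the columns of Q.
    return np.moveaxis(np.tensordot(Q, X, axes=(1, n)), 0, n)

# Sanity check of the identity (X x_n Q)_(n) = Q X_(n)
X = np.random.randn(3, 4, 5)
Q = np.random.randn(6, 4)
lhs = mode_n_unfold(mode_n_product(X, Q, 1), 1)
rhs = Q @ mode_n_unfold(X, 1)
print(np.allclose(lhs, rhs))   # True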

If a tensor is multiplied by a matrix along a specific mode, then its mode-n unfolding can be computed as follows
$\left(\underline{\mathbf{X}}\times_n {\mathbf{Q}}\right)_{(n)} = {\mathbf{Q}}\,{\mathbf{X}}_{(n)}.$
For a given data matrix ${\mathbf{X}}$, the operator $\textrm{SVD}_{\delta}({\mathbf{X}})$ denotes the truncated SVD of ${\mathbf{X}}$, i.e.
${\mathbf{X}} \cong {\mathbf{U}}\,{\mathbf{S}}\,{\mathbf{V}}^T,\qquad \left\|{\mathbf{X}} - {\mathbf{U}}\,{\mathbf{S}}\,{\mathbf{V}}^T\right\|_F \le \delta,$
and the corresponding minimal rank is denoted by $\textrm{rank}{_\delta }(\textbf{X})$.

The Tucker decomposition of a tensor $\underline{\mathbf X}\in{\mathbb{R}^{{I_1} \times I_2\times \cdots \times {I_N}}}$ admits the following model [10–12]

$\underline{\mathbf X} \cong \underline{\mathbf S}\times_1{{\mathbf Q}^{(1)}}\times_2{{\mathbf Q}^{(2)}}\cdots\times_N{{\mathbf Q}^{(N)}}, \qquad (2)$

where $\underline{\mathbf S} \in {\mathbb{R}^{{K_1} \times K_2\times \cdots \times {K_N}}}$ is the core tensor and the matrices ${\mathbf Q}^{(n)} \in {\mathbb{R}^{{I_n} \times {K_n}}},$ $\,{K_n} \le {I_n},\,\,n = 1,2,\ldots,N$ are factor matrices. A shorthand notation for the Tucker decomposition is $\underline{\mathbf X} \cong \left[\kern-0.15em\left[\,{\underline{\mathbf S};\,{{\mathbf Q}^{(1)}},\,{{\mathbf Q}^{(2)}},\,\ldots,\,{{\mathbf Q}^{(N)}}} \right]\kern-0.15em\right].$

The N-tuple $(K_1,K_2,\ldots,K_N)$ is called multilinear or Tucker rank. Higher order SVD (HOSVD) [52] is a special Tucker decomposition in which the factor matrices are orthogonal.

The organization of this paper is as follows. In section 2, basic randomized algorithms for low rank matrix approximation are introduced. The TR model, its properties and basic algorithms are described in section 3. Adaptive and non-adaptive randomized variants of the algorithms presented in section 3 are elaborated in section 4. Simulations are provided in section 5 to support the presentation, and the conclusion is given in section 6.

2. Basic randomized algorithms for large-scale matrices

Randomized algorithms are efficient techniques for computing low rank matrix approximations. The principal idea behind this framework is capturing the range (column space) of a matrix and reducing the size of the data matrix accordingly. Note that if the number of rows of a matrix is larger than the number of its columns, then we should capture its row space instead. The randomized approaches can be categorized into the following three groups:

  • Random projection. In the random projection approach, a matrix is multiplied by a random matrix, such as a Gaussian, uniform or Bernoulli matrix, on the right-hand side [53]. This procedure may be expensive. In order to accelerate the computation, we can use structured random matrices such as sparse random matrices [54, 55], the subsampled random Fourier transform [56, 57], subsampled Hadamard transforms, or sequences of Givens rotations [57, 58], which are established techniques and have been used extensively in the literature [59].
  • Sampling. In the sampling techniques [60], a subset of the columns of the original matrix is selected, and the data matrix is reduced accordingly.
  • Count-Sketch. The count-sketch approach [61–63] first labels the columns of the data matrix uniformly at random; this is also called hashing. Then, the columns with the same labels are grouped together. Afterwards, the columns of each group are multiplied by ±1 (chosen uniformly at random) and summed to form representative columns.

In this paper, we only consider the random projection technique although the sampling and the count-sketch techniques are also applicable. We mainly focus on standard and distributed data tensors which means that they can be stored in Random Access Memory (RAM) on a single workstation or distributed among several disks, respectively. We only make a few comments on streaming data sets.

2.1. Fast SVD

After the data matrix has been sufficiently reduced via one of the above-mentioned randomized dimensionality reduction techniques, the original data matrix is projected onto the subspace captured in the first step. To be more precise, let ${\mathbf{X}}\in\mathbb{R}^{I\times J}$ be a given data matrix with $\textrm{rank}({\mathbf{X}}) = R\ll{\mathrm{min}}(I,J)$. In the first step, the matrix ${\mathbf{X}}$ is multiplied by a random matrix ${\boldsymbol{\Omega}} \in {\mathbb{R}^{J \times R}}$ as
${\mathbf{Y}} = {\mathbf{X}}\,{\boldsymbol{\Omega}}\in\mathbb{R}^{I\times R}.$
In the next step, an orthogonal projector onto the column space of the matrix ${\mathbf{X}}$, i.e. ${\mathbf{Q}}{\mathbf{Q}}^T$, is computed where ${\mathbf{Q}}$ is an orthonormal basis for the range of ${\mathbf{Y}}$. The orthonormal basis ${\mathbf{Q}}$ can be computed through the QR decomposition ${\mathbf{Y}} = {\mathbf{Q}}{\mathbf{R}},$ where ${\mathbf{Q}}\in\mathbb{R}^{I\times R},$ and ${\mathbf{R}}\in\mathbb{R}^{R\times R}$. Here, we have
${\mathbf{X}} \cong {\mathbf{Q}}\,{\mathbf{Q}}^T\,{\mathbf{X}} = {\mathbf{Q}}\,{\mathbf{B}},$
and the compressed matrix ${\mathbf{B}} = {{\mathbf{Q}}^T}\,{\mathbf{X}}\in\mathbb{R}^{R\times J}$ is of smaller size than ${\mathbf{X}}$. The SVD of the original matrix ${\mathbf{X}}$ is recovered from the SVD of ${\mathbf{B}} = {\mathbf{U}}{\mathbf{S}}{\mathbf{V}}^T$ as follows
${\mathbf{X}} \cong {\mathbf{Q}}\,{\mathbf{B}} = \left({\mathbf{Q}}\,{\mathbf{U}}\right){\mathbf{S}}\,{\mathbf{V}}^T.$
This procedure is summarized as follows:

  • Reduction. Replacing the extremely large-scale data matrix with a new, smaller one that captures either the column or the row space of the original data matrix as well as possible.
  • SVD computation. Applying a deterministic algorithm, e.g. the truncated SVD, to the reduced data matrix and finding its low rank approximation.
  • Recovery. Recovering the SVD of the original data matrix from the SVD of the compressed one.

Remark 1.  (Oversampling technique) The oversampling technique can be used to improve the accuracy of the randomized algorithms. In the oversampling technique, we use additional random vectors (for example, R + P random vectors instead of R) in the first step, i.e. the dimensionality reduction step. In practice, P = 5 or P = 10 is typically enough to achieve reasonable solutions [53].

Remark 2.  (Power iteration technique) The power iteration technique is used when the singular values of a data matrix do not decay very fast. Here, we exploit the matrix ${\mathbf{Z}} = {\left({\mathbf{X}}\,{{\mathbf{X}}^T}\right)^q}\,{\mathbf{X}}$ (q is a non-negative integer) instead of the matrix ${\mathbf{X}}$, and the randomized algorithms are applied to this new matrix. Considering the SVD ${\mathbf{X}} = {\mathbf{U}}{\mathbf{S}}{\mathbf{V}}^T$, we have ${\left({\mathbf{X}}\,{{\mathbf{X}}^T}\right)^q}\,{\mathbf{X}} = {\mathbf{U}}\,{{\mathbf{S}}^{2q + 1}}\,{{\mathbf{V}}^T}$, so the left and right singular vectors of the new matrix ${\mathbf{Z}}$ are the same as those of ${\mathbf{X}}$, but the singular values of ${\mathbf{Z}}$ decay faster. This can improve the accuracy of the randomized algorithms. One should avoid forming the matrix ${\mathbf{Z}}$ explicitly because of numerical instability; instead, it should be applied iteratively, interleaved with QR decompositions [53]. It has been experimentally confirmed that q = 1 or q = 2 power iterations are often enough in practice to achieve accurate solutions [53].

The basic form of the randomized SVD algorithm equipped with the oversampling and the power iteration strategies is outlined in Algorithm 1.

Algorithm 1: Randomized SVD algorithm with oversampling and power iteration [53]
Input: A data matrix ${\mathbf{X}}\in \mathbb{R}^{I\times J}$, target rank R, oversampling P and power iteration q
Output: SVD factor matrices ${\mathbf{U}}\in\mathbb{R}^{I\times R},{\mathbf{S}}\in\mathbb{R}^{R\times R}$ and ${\mathbf{V}}\in\mathbb{R}^{J\times R}$
1 Generate a random matrix ${\boldsymbol{\Omega}}\in\mathbb{R}^{J\times {(P+R)}}$ with a prescribed probability distribution
2 Form ${\mathbf{Y}} = {{\left( {{\mathbf{X}}\,{{\mathbf{X}}^T}} \right)^q}\,{\mathbf{X}}}\,{\boldsymbol{\Omega}}\in\mathbb{R}^{I\times (R+P)}$
3 Compute QR decomposition: ${\mathbf{Y}} = \mathbf{QR}$
4 Compute: ${\mathbf{B}} = {\mathbf{Q}}^T{\mathbf{X}}\in\mathbb{R}^{(R+P)\times J}$
5 Compute an SVD, ${\mathbf{B}} = \overline{\mathbf{U}}\,\overline{{\mathbf{S}}}\,\overline{\mathbf{V}}^T$
6 $\widetilde{\mathbf{U}} = {\mathbf{Q}}\,\overline{\mathbf{U}}$
7 ${\mathbf{U}} = \widetilde{\mathbf{U}}(:,1:R),\,{\mathbf{S}} = \overline{{\mathbf{S}}}(1:R,1:R),\,\,{\mathbf{V}} = \overline{\mathbf{V}}(:,1:R)$
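
For concreteness, here is a minimal NumPy sketch of Algorithm 1 (our own illustrative code, not the authors' implementation). The power iterations are applied after the initial sketch, which is algebraically equivalent to forming $(\mathbf{X}\mathbf{X}^T)^q\mathbf{X}\boldsymbol{\Omega}$; the intermediate re-orthonormalizations recommended in [53] for numerical stability are omitted for brevity.

import numpy as np

def rand_svd(X, R, P=5, q=1):
    # Randomized SVD with oversampling P and q power iterations (Algorithm 1).
    Omega = np.random.randn(X.shape[1], R + P)
    Y = X @ Omega
    for _ in range(q):                       # power iterations for slowly decaying spectra
        Y = X @ (X.T @ Y)
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the sketched range
    B = Q.T @ X                              # small (R+P) x J matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = (Q @ Ub)[:, :R]
    return U, s[:R], Vt[:R, :]

# Usage on a synthetic low rank matrix
X = np.random.randn(2000, 50) @ np.random.randn(50, 1000)
U, s, Vt = rand_svd(X, R=50)
print(np.linalg.norm(X - (U * s) @ Vt) / np.linalg.norm(X))   # close to machine precision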

2.2. Two-sided randomized algorithm

It is also possible to reduce both dimensions of a data matrix when both of them are large. This can be done by simultaneously multiplying a given data matrix by two random matrices from the left- and right-hand sides. Algorithm 2 outlines the structure of such a randomized algorithm. Both Algorithms 1 and 2 are randomized multi-pass SVD algorithms because both of them need to access (pass over) the original data matrix ${\mathbf{X}}$ twice, in Lines 2 and 4. These algorithms can be modified to become single-pass algorithms. To this end, Line 4 in both algorithms can be replaced by alternative representations. For Algorithm 1, we can consider [66]

Equation (3)

and for Algorithm 2, we can consider

Equation (4)

The benefit of these approaches is that they avoid computing the terms ${\mathbf{Q}}^T\,{\mathbf{X}}$ and ${\mathbf{Q}}^{(1)\,T}\,{\mathbf{X}}\,{\mathbf{Q}}^{(2)}$, which may be computationally expensive, especially when the data matrix is stored out-of-core and the cost of communication may exceed that of the main computations. Instead, in formulations (3) and (4), the original data matrix ${\mathbf{X}}$ is sketched using the random projection technique and the corresponding matrix ${\mathbf{B}}$ is obtained by solving well-conditioned overdetermined linear least-squares problems [66]. The multiplication by a random matrix can be performed relatively fast by employing structured random matrices. We should note that this strategy passes over the original data matrix only once, because all sketching procedures can be done in the first pass over the raw data. Other types of single-pass techniques can be found in [53, 67–70].

Algorithm 2: Two-Sided Matrix Randomized SVD [53, 65]
Input: A data matrix ${\mathbf{X}}\in \mathbb{R}^{I\times J}$, and target rank R
Output: SVD factor matrices ${\mathbf{U}}\in\mathbb{R}^{I\times R},{\mathbf{S}}\in\mathbb{R}^{R\times R}$ and ${\mathbf{V}}\in\mathbb{R}^{J\times R}$
1 Draw random matrices of prescribed sizes ${{\boldsymbol{\Omega}}_1} \in {\mathbb{R}^{J \times R}},\,{{\boldsymbol{\Omega}}_2} \in {\mathbb{R}^{I \times R}}$
2 Compute ${{\mathbf{Y}}_1} = {\mathbf{X}}\,{{\boldsymbol{\Omega}}_1}\in\mathbb{R}^{I\times R}$ and ${{\mathbf{Y}}_2} = {\mathbf{X}}^T\,{{\boldsymbol{\Omega}}_2}\in\mathbb{R}^{J\times R}$
3 Compute QR decompositions ${{\mathbf{Y}}_1} = {{\mathbf{Q}}^{(1)}}\,{{\mathbf{R}}_1},\,{{\mathbf{Y}}_2} = {{\mathbf{Q}}^{(2)}}\,{{\mathbf{R}}_2}$
4 Compute ${\mathbf{B}} = {{\mathbf{Q}}^{(1)}}^T\,{\mathbf{X}}\,{{\mathbf{Q}}^{(2)}}\in\mathbb{R}^{R\times R}$
5 Compute an SVD, ${\mathbf{B}} = {\overline{\mathbf{U}}}\,{\overline{\mathbf{S}}}\,{\overline{\mathbf{V}}^T}$
6 Set ${\mathbf{U}} = {\mathbf{Q}}^{(1)}\,{\overline{\mathbf{U}}},\,{\mathbf{S}} = {\overline{\mathbf{S}}}$ and ${\mathbf{V}} = {{\mathbf{Q}}^{(2)}\,\overline{\mathbf{V}}}$
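
A corresponding NumPy sketch of the two-sided scheme of Algorithm 2 (again our own illustrative code) is given below; it sketches both the column and the row space and lifts the SVD of the small R x R core back to factors of X, so that X ≈ U diag(s) V^T.

import numpy as np

def rand_svd_two_sided(X, R):
    # Two-sided randomized SVD (Algorithm 2).
    I, J = X.shape
    Q1, _ = np.linalg.qr(X @ np.random.randn(J, R))      # column-space sketch
    Q2, _ = np.linalg.qr(X.T @ np.random.randn(I, R))    # row-space sketch
    B = Q1.T @ X @ Q2                                    # small R x R core
    Ub, s, Vt = np.linalg.svd(B)
    return Q1 @ Ub, s, Q2 @ Vt.T                         # U (I x R), singular values, V (J x R)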

2.3. Randomized matrix rank-revealing (RR) algorithms

Algorithm 3: Randomized Matrix Rank-Revealing Algorithm [71, 72]
Input: A data matrix ${\mathbf{X}}\in\mathbb{R}^{I\times J}$, approximation error bound epsilon, block size b and power iteration q
Output: ${\mathbf{Q}}\in\mathbb{R}^{I\times R},\,{\mathbf{B}}\in\mathbb{R}^{R\times J}$ such that ${\left\| {{\mathbf{X}} - \textbf{QB}} \right\|_F} \lt \varepsilon$
1 ${\mathbf{Q}} = [],\,\,{\mathbf{B}} = []$
2 $E = \left\| {\mathbf{X}} \right\|_F^2$
3 i = 0
4 while $E\gt\varepsilon^2$ do
5 ${{\boldsymbol{\Omega}}_i} = \textrm{randn}\left( {J,b} \right)$
6 ${{\mathbf{Q}}_i} = \textrm{orth}\left( {{\mathbf{X}}\,{{\boldsymbol{\Omega}} _i} - {\mathbf{Q}}\,\left( {{\mathbf{B}}\,{{\boldsymbol{\Omega}}_i}} \right)} \right)$
7 for l = 1, 2, ..., q do
8 ${{\mathbf{Q}}_i} = \textrm{orth}\left( {{{\mathbf{X}}^T}\,{{\mathbf{Q}}_i}} \right)$
9 ${{\mathbf{Q}}_i} = \textrm{orth}\left( {{\mathbf{X}}\,{{\mathbf{Q}}_i}} \right)$
10 end
11 ${{\mathbf{Q}}_i} = \textrm{orth}\left( {{{\mathbf{Q}}_i} - {\mathbf{Q}}\,\left( {{{\mathbf{Q}}^T}\,{{\mathbf{Q}}_i}} \right)} \right)$
12 ${{\mathbf{B}}_i} = {\mathbf{Q}}_i^T\,{\mathbf{X}}$
13 ${\mathbf{Q}} = \left[ {{\mathbf{Q}},{{\mathbf{Q}}_i}} \right]$
14 ${\mathbf{B}} = \left[ \begin{array}{l} {\mathbf{B}}\\ {{\mathbf{B}}_i} \end{array} \right]$
15 $E = E - \left\| {{{\mathbf{B}}_i}} \right\|_F^2$
16 i = i + 1
17 end

In randomized Algorithms 1 and 2, we need an estimate of the matrix rank in advance, which may be a difficult task. Randomized rank-revealing (RR), or equivalently randomized fixed-precision, algorithms are able to retrieve the rank of a given data matrix and the corresponding low-rank matrix approximation automatically. In practice, we use the randomized RR Algorithm 3 proposed in [71], which is a modification of the RR algorithm developed in [72]. The operator 'orth' in Lines 6, 8, 9 and 11 computes an orthonormal basis for the range of a given data matrix. Initially, the matrices ${\mathbf{Q}}$ and ${\mathbf{B}}$ are empty; they are updated sequentially. The algorithm requires an approximation error bound epsilon, a block size b and a power iteration parameter q. For more details on the theoretical results of this algorithm, we refer to [71, 72].
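
For illustration, the following is a minimal NumPy sketch of the blocked rank-revealing scheme of Algorithm 3 (our own code); edge cases such as the block size exceeding the remaining numerical rank are not handled.

import numpy as np

def rand_rank_revealing(X, eps, b=10, q=1):
    # Build Q, B block by block until ||X - Q B||_F < eps (Algorithm 3).
    I, J = X.shape
    Q = np.zeros((I, 0))
    B = np.zeros((0, J))
    E = np.linalg.norm(X, 'fro') ** 2
    while E > eps ** 2:
        Omega = np.random.randn(J, b)
        Qi, _ = np.linalg.qr(X @ Omega - Q @ (B @ Omega))
        for _ in range(q):                           # power iterations
            Qi, _ = np.linalg.qr(X.T @ Qi)
            Qi, _ = np.linalg.qr(X @ Qi)
        Qi, _ = np.linalg.qr(Qi - Q @ (Q.T @ Qi))    # re-orthogonalize against current Q
        Bi = Qi.T @ X
        Q = np.hstack([Q, Qi])
        B = np.vstack([B, Bi])
        E -= np.linalg.norm(Bi, 'fro') ** 2
    return Q, B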

3. Basic tensor ring (TR) decomposition

The Tensor Chain (TC) or Tensor Ring (TR) decomposition [17–20] is a tensor network representing a tensor as a ring (chain) of 3rd-order tensors (see figure 1). The special case of the TR decomposition with the condition $R_0 = R_N = 1$ is called the Tensor Train (TT) decomposition [16], because it represents a tensor as a train of 3rd-order tensors. For simplicity of presentation, throughout the paper we only focus on the TR decomposition; all the material naturally carries over to the TT decomposition.

Figure 1. Graphical illustrations of the TR decomposition: tensor network representation (top) and slice-wise representation (bottom). For $R_0 = R_N = 1$, the TR decomposition simplifies to the TT decomposition.

Let $\underline{\mathbf{X}}\in\mathbb{R}^{I_1\times I_2 \times \cdots \times I_N}$, then the TR decomposition of the tensor $\underline{\mathbf{X}}$ admits the following model

$\underline{\mathbf{X}}\left(i_1,i_2,\ldots,i_N\right) = \sum_{r_0 = 1}^{R_0}\sum_{r_1 = 1}^{R_1}\cdots\sum_{r_{N-1} = 1}^{R_{N-1}}\widehat{\underline{\mathbf{X}}}^{(1)}\left(r_0,i_1,r_1\right)\,\widehat{\underline{\mathbf{X}}}^{(2)}\left(r_1,i_2,r_2\right)\cdots\widehat{\underline{\mathbf{X}}}^{(N)}\left(r_{N-1},i_N,r_0\right), \qquad (5)$

where $\widehat{\underline{\mathbf{X}}}^{(n)}\in\mathbb{R}^{R_{n-1}\times I_n\times R_n},\,\,n = 1,2,\ldots,N$ are called core tensors and the N-tuple $(R_0,R_1,\ldots,R_{N-1})$ is called the TR-ranks. Note that in the TR decomposition we have $R_0 = R_N$, and it is also shown in [20] that the TR-ranks satisfy ${R_0}{R_n} \le {\mathrm{rank}}\left( {{{\mathbf{X}}_{\left\langle n \right\rangle }}} \right)$ for n = 1, 2, ..., N. Equation (5) is called the component-wise TR representation, and an equivalent slice-wise representation is

$\underline{\mathbf{X}}\left(i_1,i_2,\ldots,i_N\right) = \textrm{Tr}\left(\widehat{\mathbf{X}}^{(1)}\left(i_1\right)\,\widehat{\mathbf{X}}^{(2)}\left(i_2\right)\cdots\widehat{\mathbf{X}}^{(N)}\left(i_N\right)\right). \qquad (6)$

Here, ${{\widehat{\mathbf{X}}}^{(n)}\left( {{i_n}} \right)}$ are the $R_{n-1}\times R_n$ lateral slices of the core tensors $\widehat{\underline{\mathbf{X}}}^{(n)}$ for n = 1, 2, ..., N. In view of (5), introducing an auxiliary index r0 makes it possible to view the TR decomposition as a sum of R0 TT decompositions with partially shared cores. In general, the topological structure of a tensor network can be changed; for example, the TT and TR decompositions can be converted into each other [73, 74].
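
To make the slice-wise formula concrete, the following NumPy sketch (our own illustrative code, with the hypothetical helper name tr_to_full) reconstructs a tensor entry-by-entry from its TR cores; it is meant for small examples only, not as an efficient implementation.

import numpy as np

def tr_to_full(cores):
    # Reconstruct a tensor from TR cores G_n of shape (R_{n-1}, I_n, R_n) using
    # the slice-wise formula X(i_1,...,i_N) = Tr(G_1(i_1) ... G_N(i_N)).
    shape = [G.shape[1] for G in cores]
    X = np.zeros(shape)
    for idx in np.ndindex(*shape):
        M = np.eye(cores[0].shape[0])
        for G, i in zip(cores, idx):
            M = M @ G[:, i, :]
        X[idx] = np.trace(M)
    return X

# Example: random TR cores with ranks (2, 3, 4, 2) and mode sizes (5, 6, 7)
ranks = [2, 3, 4, 2]        # note R_0 = R_N = 2
sizes = [5, 6, 7]
cores = [np.random.randn(ranks[n], sizes[n], ranks[n + 1]) for n in range(3)]
print(tr_to_full(cores).shape)   # (5, 6, 7)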

There are several efficient algorithms for computing the TR decomposition, such as the Sequential SVDs algorithm, the Alternating Least-Squares (ALS) algorithm, ALS with adaptive ranks and the Block-wise ALS algorithm [20, 73]. Note that in ALS-type algorithms, at each iteration all core tensors are kept fixed except one, and that core tensor is updated; the core tensor being updated alternates over the iterations, which justifies the name ALS. A closely related algorithm is the modified ALS (MALS) or Density Matrix Renormalization Group (DMRG) algorithm [18, 73, 75–77], in which two consecutive core tensors are updated simultaneously.

Here, we introduce the TR-SVD algorithm [20] for computing the TR decomposition; it is summarized in Algorithm 4. This algorithm is robust because it relies on the SVD: at each iteration, the truncated SVD of an unfolding matrix is computed. It is worth mentioning that the TR-SVD algorithm with initial rank R0 = 1 is equivalent to the TT-SVD algorithm [16]. The idea of the cross/skeleton or, equivalently, CUR decomposition [78–81] has been used for the TT decomposition [82–84] and can naturally be used for the TR decomposition [19]. For ALS- and MALS-type algorithms, see [18, 20, 85, 86]. The TT decomposition can also be computed via a Tucker-2 model, which is likewise applicable to the TR decomposition (see [2, 85]).

Remark 3.  The TR-ranks obtained by Algorithm 4 may not be optimal, and rounding algorithms [16, 87] are often used to find a decomposition with lower TR-ranks. Unlike in the TT format, the rounding algorithms for mathematical operations in the TR format are more complicated [88]. The TR decomposition, as a tensor network, contains a loop. Hence, it is in general not closed in the Zariski topology ([73], section 3; [14, 89, 90]). This means that a sequence of tensors in the TR format may not necessarily converge to a tensor in the TR format; for a detailed theoretical justification see [14, 89, 90]. This may cause instability issues when one wants to find the best TR approximation [18, 73, 91, 92]. That is, the best TR approximation of a given tensor with predefined TR-ranks may not exist, and it can be approximated arbitrarily well by TR decompositions with lower TR-ranks. In contrast, the best low rank TT approximation is a well-posed problem [16].

Algorithm 4: TR-SVD algorithm [20]
Input: A data tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times {I_2} \times \cdots \times {I_N}}},$ a prescribed approximation error bound epsilon, and initial rank R0 as a divisor of $\textrm{rank}{_\delta }\left( {{{\mathbf{X}}_{\left\langle 1 \right\rangle }}} \right)$
Output: Approximative representation of the tensor $\underline{\mathbf{X}}$ in the TR format ${\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,}$ such that ${\left\| {\underline{\mathbf{X}} - \widehat{\underline{\mathbf{X}}}} \right\|_F} \le \varepsilon {\left\| \underline{\mathbf{X}} \right\|_F}$ and the TR-ranks $(R_0,R_1,\ldots,R_{N-1})$
1 Compute $\delta = \frac{{\varepsilon {{\left\| \underline{\mathbf{X}} \right\|}_F}}}{{\sqrt N }}$
2 ${\mathbf{C}} = {\mathrm{reshape}}\left( {\underline{\mathbf{X}},\left[ {{I_1},I_2I_3\ldots I_N} \right]} \right)$
3 $\left[ {\mathbf{U}},{\mathbf{S}} ,{\mathbf{V}} \right] = \textrm{SVD}_{\delta }\left( {\mathbf{C}} \right)$
4 Set ${R_0 R_1} = \textrm{rank}\left( {\mathbf{S}} \right)$
5 ${\widehat{\underline{\mathbf{X}}}^{(1)}} = \textrm{permute}\left({\textrm{reshape}\left( {{\mathbf{U}},\left[ {{I_1},{R_0},{R_1}} \right]} \right)},[2,1,3]\right)$
6 $\underline{\mathbf{C}} = \textrm{permute}\left({\textrm{reshape}\left( {{\mathbf{S}} {{\mathbf{V}}^T},\left[ {{R_0},{R_1},\prod\nolimits_{j = 2}^N {{I_j}} } \right]} \right)},[2,3,1]\right)$
7 forn = 2, ..., N − 1 do
8 ${\mathbf{C}} = \textrm{reshape}\left( {\underline{\mathbf{C}},\left[ {{R_{n - 1}}{I_n},\frac{{\textrm{numel}\left( \underline{\mathbf{C}} \right)}}{{{{R_{n - 1}}{I_n}} }}} \right]} \right)$
9 $\left[ {\mathbf{U}},{\mathbf{S}},{\mathbf{V}} \right] = \textrm{SVD}_\delta \left( \textbf{C} \right)$
10 ${R_n} = \textrm{rank}\left( {\mathbf{S}} \right)$
11 ${\widehat{\underline{\mathbf{X}}}^{(n)}} = \textrm{reshape}\left( {{\mathbf{U}},\left[ {{R_{n - 1}},{I_n},{R_n}} \right]} \right)$
12 $\underline{\mathbf{C}} = {\mathrm{reshape}}\left( {{\mathbf{S}} {{\mathbf{V}}^T},\left[ {{R_{n}},\prod\nolimits_{j = {n+1} }^N {{I_j}} ,{R_0}} \right]} \right)$
13 end
14 ${\widehat{\underline{\mathbf{X}}}^{(N)}} = \textrm{reshape}\left( {\underline{\mathbf{C}},\left[ {{R_{N - 1}},{I_N},{R_0}} \right]} \right)$
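
A compact NumPy sketch of the TR-SVD procedure of Algorithm 4 is given below (our own illustrative code); it uses NumPy's row-major reshapes, so the cores differ from a MATLAB implementation by an index reordering, and it assumes R0 divides the rank found for the first unfolding.

import numpy as np

def svd_delta(C, delta):
    # delta-truncated SVD: keep the smallest rank r with ||C - C_r||_F <= delta.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]        # tail[k] = norm of s[k:]
    r = max(1, int(np.sum(tail > delta)))
    return U[:, :r], s[:r], Vt[:r, :]

def tr_svd(X, eps, R0=1):
    # Sketch of TR-SVD (Algorithm 4); index splits use NumPy's C-order convention.
    N, I = X.ndim, X.shape
    delta = eps * np.linalg.norm(X) / np.sqrt(N)
    U, s, Vt = svd_delta(X.reshape(I[0], -1), delta)
    R1 = max(1, len(s) // R0)
    cores = [U[:, :R0 * R1].reshape(I[0], R0, R1).transpose(1, 0, 2)]
    C = (s[:R0 * R1, None] * Vt[:R0 * R1]).reshape(R0, R1, -1).transpose(1, 2, 0)
    Rprev = R1
    for n in range(1, N - 1):
        U, s, Vt = svd_delta(C.reshape(Rprev * I[n], -1), delta)
        Rn = len(s)
        cores.append(U.reshape(Rprev, I[n], Rn))
        C = (s[:, None] * Vt).reshape(Rn, -1, R0)
        Rprev = Rn
    cores.append(C.reshape(Rprev, I[-1], R0))
    return cores   # cores[k] has shape (R_k, I_{k+1}, R_{k+1}), with R_N identified with R_0

Combined with the tr_to_full sketch given earlier, one can check numerically that the relative reconstruction error is of the order of eps.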

Let τ be a cyclic permutation of the dimensions of a data tensor $\underline{\mathbf{X}}\in\mathbb{R}^{I_1\times I_2\times \cdots\times I_N}$, producing a new permuted data tensor $\underline{\mathbf{X}}^{\tau}\in\mathbb{R}^{I_{\tau(1)}\times I_{\tau(2)}\times\cdots\times I_{\tau(N)}}$, which is equivalent to
$\underline{\mathbf{X}}^{\tau}\left(i_{\tau(1)},i_{\tau(2)},\ldots,i_{\tau(N)}\right) = \underline{\mathbf{X}}\left(i_1,i_2,\ldots,i_N\right).$
Assume that the TR representation of the tensor $\underline{\mathbf{X}}^{\tau}$ is as follows

Equation (7)

where τ−1 is the inverse of the cyclic permutation τ. Since there are no boundary conditions on the end core tensors, the decomposition is invariant under cyclic permutations. However, in practice, the TR-ranks of the permuted tensor ${\underline{\mathbf{X}}^\tau }$ may differ from the TR-ranks of the tensor $\underline{\mathbf{X}}$, and each cyclic permutation of the indices may lead to different TR-ranks.

It turns out that two main issues underlie the compression performance of the TR decomposition:

  • The cyclic shift, a poor choice of which leads to a suboptimal model,
  • The initial rank R0 of the first core tensor.

More precisely, choosing different cyclic shifts and initial ranks R0 may lead to different TR decompositions with different numbers of parameters. In particular, it is shown that for different initial ranks it is impossible to find a common minimal rank, see ([87], Proposition 2.2). These facts imply that, to find a TR decomposition with (near-)optimal TR-ranks, it is necessary to check all cyclic shifts and all possible initial ranks. This procedure is called the Reduced storage TR-SVD [87] and is summarized in Algorithm 5. Clearly, this algorithm is computationally expensive for high order tensors; because of this, a heuristic algorithm called Heuristic TR-SVD was developed in [87], in which the initial rank and the cyclic permutation are selected heuristically. This procedure is summarized in Algorithm 6 and essentially consists of two parts:

  • A preprocessing part to find a sub-optimal cyclic shift and an initial rank R0 (in a heuristic manner),
  • Applying the TR-SVD with the parameters obtained from the first step.

Assume $\tau = \left( {1,N,N - 1, \ldots ,2} \right)$ and consider all cyclic permutations produced by the generator τ as ${\gamma _n} = {\tau ^{n-1}}$ for n = 1, 2, ..., N.

In the heuristic algorithm, first a cyclic permutation $\gamma_{n^*}$ is chosen by solving the following minimization problem

$\gamma_{n^*} = \mathop{\textrm{arg}\,\textrm{min}}\limits_{n = 1,2,\ldots,N}\ \textrm{rank}\left({\mathbf{X}}_{\left\langle 2 \right\rangle}^{\gamma_n}\right). \qquad (8)$

Afterwards, the initial rank R0 corresponding to the cyclic permutation $\gamma_{n^*}$ selected in the first step is found by solving the following minimization problem [87]

Equation (9)

where

Equation (10)

Algorithm 5: TR-SVD with all possible cyclic permutations (see also [87])
Input: A data tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times I_2\times \cdots \times {I_N}}}$ and a prescribed approximation error bound epsilon
Output: Approximative representation of the tensor $\underline{\mathbf{X}}$ in the TR format $\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,$ such that $\left\| {\underline{\mathbf{X}}-\widehat{\underline{\mathbf{X}}}} \right\| \le \varepsilon {\left\| \underline{\mathbf{X}} \right\|_F}$ and the TR-ranks $(R_0,R_1,\ldots,R_{N-1})$
1 Set $\tau = \left( {1,N,N - 1, \ldots ,2} \right)$
2 for n = 1, 2, ..., N do
3 Set ${\gamma _n} = {\tau ^{n-1}}$
4 En  = Compute all divisors of $\textrm{rank}_\delta\left( {{\mathbf{X}}_{\left\langle 1 \right\rangle }^{{\gamma _n}}} \right)$
5 for All r ∈ En do
6 Set R0 = r
7 Apply Algorithm 4 to the permuted tensor ${\underline{\mathbf{X}}}^{{\gamma _n}}$ with initial rank R0 and obtain core tensors $\widetilde{{\underline{\mathbf{X}}}}^{(k)}\left( {{i_{{{\gamma^{-1}_n} }\left( k \right)}}} \right),$ k = 1, 2, ..., N
8 end
9 end
10 Find the optimal TR decomposition with the least number of parameters corresponding to the cyclic permutation $\gamma_{n^*}$ and the initial rank R0, i.e. core tensors $\widetilde{\underline{\mathbf{X}}}^{(k)}\left( {{i_{{\gamma^{-1}_{n^*}}\left( k \right)}}} \right)$ for k = 1, 2, ..., N
11 Set core tensors ${{\widehat {\underline{\mathbf{X}}}}^{(k)}}\left( {{i_{k}}} \right)\equiv \widetilde{{\underline{\mathbf{X}}}}^{({\gamma_{n^*}}(k))}\left( {{i_k}} \right),\,k = 1,2,\ldots,N$
Algorithm 6: Simplified TR-SVD (see also [87])
Input: A tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times I_2 \times \cdots \times {I_N}}}$ and a prescribed approximation error bound epsilon
Output: Approximative representation of the tensor $\underline{\mathbf{X}}$ in the TR format $\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,$ such that $\left\| {\underline{\mathbf{X}}-\widehat{\underline{\mathbf{X}}}} \right\|_F \le \varepsilon \left\|\underline{\mathbf{X}} \right\|_F$ and the TR-ranks $\{R_0,R_1,\ldots,R_{N-1}\}$
1 Set $\tau = \left( {1,N,N - 1, \ldots ,2} \right)$
2 for n = 1, 2, ..., N do
3 Set ${\gamma _n} = {\tau ^{n-1}}$
4 $\widehat{R}_{\gamma_n} = \textrm{rank}\left( {\mathbf{X}}_{\left\langle 2 \right\rangle }^{{\gamma_n}} \right)$
5 end
6 $\widehat{R}_{\gamma_{n^*}} = \mathop {\text{arg}\,\text{min}\,}\limits_{n = 1,2, \ldots ,N} \,\,\,\widehat{R}_{\gamma_n}$
7 Compute $R_0^*$ by solving optimization problem (9)
8 Apply Algorithm 4 to the permuted tensor ${{\underline{\mathbf{X}}}^{{\gamma_{{n^*}}}}}$ with the initial rank $R_0^*$ and obtain core tensors ${{\widetilde{\underline{\mathbf{X}}}}}^{(n)}\left( {{i_{\gamma^{-1}_{n^*}\left( n \right)}}} \right),$ n = 1, 2, ..., N
9 Set core tensors ${{\widehat {\underline{\mathbf{X}}}}^{(n)}}\left( {{i_{n}}} \right)\equiv \widetilde{{\underline{\mathbf{X}}}}^{({\gamma_{n^*}}(n))}\left( {{i_n}} \right),\,n = 1,2,\ldots,N$
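
As a small illustration of the permutation search in Algorithms 5 and 6 (our own sketch; the candidate permutations are simply taken as the N cyclic rotations of the mode ordering), the rank of the 2-unfolding of every cyclic shift, used in Line 4 of Algorithm 6, can be inspected as follows.

import numpy as np

def cyclic_shifts(N):
    # The N cyclic rotations of the mode ordering (0-based), used as the
    # candidate permutations gamma_n.
    return [tuple((i + k) % N for i in range(N)) for k in range(N)]

X = np.random.randn(4, 5, 6, 7)
for perm in cyclic_shifts(X.ndim):
    Xp = np.transpose(X, perm)                                      # permuted tensor
    R2 = np.linalg.matrix_rank(Xp.reshape(Xp.shape[0] * Xp.shape[1], -1))
    print(perm, R2)    # rank of the 2-unfolding of the permuted tensor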

The algorithms presented so far are deterministic and rely on the truncated or economy-size SVD, which is quite expensive for big data matrices. Next, we present randomized variants of these algorithms.

4. Randomized tensor ring (TR) decomposition

The main computationally expensive part of Algorithms 4–6 is the computation of truncated SVDs of unfolding matrices. Exploiting randomized algorithms can speed up these computations. Following this strategy, in this section randomized algorithms for the TR decomposition are introduced.

As mentioned, we only consider the random projection technique, although the sampling and count-sketch strategies can be applied straightforwardly. For example, in [19] a cross decomposition was used instead of the SVD, where columns and rows are sampled in heuristic ways. Also, in [93] a sampling technique is used within the ALS algorithm, which scales linearly in the tensor order. The problem is treated as a tensor with missing components (only $\mathcal{O}(N)$ known components), and the ALS-TR algorithm is applied to this incomplete data tensor to simultaneously recover the data tensor and decompose it into the TR format.

Recently, in order to reduce the high computational cost of the standard TT-SVD algorithm, two types of randomized algorithms have been proposed in [94–96]:

  • Random projection TT algorithm,
  • Adaptive randomized TT algorithm.

The random projection TT algorithm is a variant of the TT-SVD algorithm in which, at each iteration, the random projection technique is used to find low rank approximations of the unfolding matrices. This procedure is outlined in Algorithm 7.

We discuss this idea in the general setting of the TR decomposition. To make this clearer, we explain one iteration of Algorithm 7. In the first iteration of Algorithm 7, ${\mathbf{C}}_{(1) }\in{\mathbb{R}^{{I_1} \times {I_2}{I_3} \cdots {I_N}}}$ is the mode-1 unfolding matrix of the Nth-order tensor $\underline{\mathbf{C}}\in{\mathbb{R}^{{I_1} \times {I_2}\times \cdots \times {I_N}}}$. Considering a random Gaussian matrix ${\boldsymbol{\Omega}}$ of conforming dimensions and taking into account the oversampling parameter P, we have
${\mathbf{Z}} = {\mathbf{C}}_{(1)}\,{\boldsymbol{\Omega}}\in\mathbb{R}^{I_1\times(R_0R_1+P)},\qquad {\boldsymbol{\Omega}}\in\mathbb{R}^{I_2I_3\cdots I_N\times (R_0R_1+P)}.$
In order to find an orthonormal basis for the range of the matrix ${\mathbf{C}}_{(1) }{\boldsymbol{\Omega}}$, the QR decomposition of this matrix is computed as follows
${\mathbf{Z}} = {\mathbf{C}}_{(1)}\,{\boldsymbol{\Omega}} = {\mathbf{Q}}\,{\mathbf{R}}.$
The first $R_0R_1$ columns of the matrix $\textbf{Q}$ are taken as an orthonormal basis for the range of ${\mathbf{C}}_{(1)}$, i.e.
${\mathbf{Q}}^{(1)} = {\mathbf{Q}}\left(:,1:R_0R_1\right).$
Since ${\mathbf{Q}}^{(1)}\,{\mathbf{Q}}^{(1)\,T}$ is an orthogonal projection onto the (approximate) range of ${\mathbf{C}}_{(1)}$, we have
${\mathbf{C}}_{(1)} \cong {\mathbf{Q}}^{(1)}\,{\mathbf{Q}}^{(1)\,T}\,{\mathbf{C}}_{(1)}.$
The two terms ${\mathbf{Q}}^{(1)}$ and ${{\mathbf{Q}}^{(1)\,T}}\,{\mathbf{C}}_{(1)}$ are reshaped into tensors of conforming dimensions as follows (see figure 2 for a graphical illustration)
$\widehat{\underline{\mathbf{X}}}^{(1)} = \textrm{permute}\left(\textrm{reshape}\left({\mathbf{Q}}^{(1)},\left[I_1,R_0,R_1\right]\right),[2,1,3]\right),\qquad \underline{\mathbf{C}} = \textrm{permute}\left(\textrm{reshape}\left({\mathbf{Q}}^{(1)\,T}\,{\mathbf{C}}_{(1)},\left[R_0,R_1,I_2\cdots I_N\right]\right),[2,3,1]\right).$
In the next step, the tensor $\underline{\mathbf{C}}{\times _1}{{\mathbf{Q}}^{(1)\,T}}$ is reshaped to a 3rd order tensor as
$\underline{\mathbf{C}} = \textrm{reshape}\left(\underline{\mathbf{C}},\left[R_1I_2,\ I_3\cdots I_N,\ R_0\right]\right),$
and the procedure is continued with the tensor ${\underline{\mathbf{C}}}$. In general, in the nth iteration of Algorithm 7, the following reshaped 3rd order tensor is considered
$\underline{\mathbf{C}}\in\mathbb{R}^{R_{n-1}I_n\times (I_{n+1}\cdots I_N)\times R_0}.$
Similar to the procedure discussed above, first its mode-1 unfolding, i.e. ${\mathbf{C}}_{(1)}\in\mathbb{R}^{R_{n-1}I_n\times I_{n+1}\ldots I_N R_0}$ is computed. Then from the following randomized low rank matrix approximation

${\mathbf{C}}_{(1)} \cong {\mathbf{Q}}^{(n)}\,{\mathbf{Q}}^{(n)\,T}\,{\mathbf{C}}_{(1)},\qquad {\mathbf{Q}}^{(n)}\in\mathbb{R}^{R_{n-1}I_n\times R_n}, \qquad (11)$

the nth core tensor $\widehat{\underline{\mathbf{X}}}^{(n)}$ can be computed as
$\widehat{\underline{\mathbf{X}}}^{(n)} = \textrm{reshape}\left({\mathbf{Q}}^{(n)},\left[R_{n-1},I_n,R_n\right]\right).$
Remark 4.  Algorithm 7 can be equipped with the power iteration technique when the data tensor is corrupted by noise or, equivalently, the singular values of the unfolding matrices do not decay very fast, so that the oversampling technique alone may not provide satisfactory approximations. To that end, in Algorithm 7 we replace ${\mathbf{Z}} = {\mathbf{C}}\,{\boldsymbol{\Omega}}$ in Lines 2 and 11 by ${\mathbf{Z}} = {\left( {{{\mathbf{C}}}\,{\mathbf{C}}^T} \right)^q}\,{\mathbf{C}}\,{\boldsymbol{\Omega}}$.

Algorithm 7: Random projection TR-SVD algorithm (see also [94])
Input: A data tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times {I_2} \times \cdots \times {I_N}}},$ a prescribed approximation error bound epsilon, and TR-ranks $(R_0,R_1,\ldots,R_{N-1})$, oversampling P
Output: Approximative representation of the tensor $\underline{\mathbf{X}}$ in the TR format ${\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,}$ such that ${\left\| {\underline{\mathbf{X}} - \widehat{\underline{\mathbf{X}}}} \right\|_F} \le \varepsilon {\left\| \underline{\mathbf{X}} \right\|_F}$
1 ${\mathbf{C}} = {\mathrm{reshape}}\left( {\underline{\mathbf{X}},\left[ {{I_1},I_2I_3\ldots I_N} \right]} \right)$
2 Compute ${\mathbf{Z}} = {\mathbf{C}}\,{\boldsymbol{\Omega}}$ where ${\boldsymbol{\Omega}} \in {\mathbb{R}^{{I_{2}I_{3}} \cdots {I_N} \times \left( {{R_0R_1} + P} \right)}}$
3 Compute ${\mathbf{Q}}$ as a columnwise orthogonal basis of $\textbf{Z}$ by using the QR decomposition
4 Let ${\mathbf{Q}}^{(1)} = {\mathbf{Q}}\left( {:,1:{R_0R_1}} \right)$
5 ${\widehat{\underline{\mathbf{X}}}^{(1)}} = \textrm{permute}\left({\textrm{reshape}\left( {{\mathbf{Q}}^{(1)},\left[ {{I_1},{R_0},{R_1}} \right]} \right)},[2,1,3]\right)$
6 Compute ${\underline{\mathbf{C}}} = {\underline{\mathbf{X}}}\,{\times _1}{{\mathbf{Q}}^{(1)T}}$
7 $\underline{\mathbf{C}} = \textrm{permute}\left({\mathrm{reshape}}\left( {\underline{\mathbf{C}},\left[ {{R_0},{R_1},{I_2} \ldots {I_N}} \right]} \right),[2,3,1]\right)$
8 $\underline{\mathbf{C}} = \textrm{reshape}\left(\underline{\mathbf{C}},[R_1I_2,I_3\ldots I_N,R_0]\right)$
9 for n = 2, ..., N − 1 do
10 ${\mathbf{C}} = {\mathrm{reshape}}\left( {\underline{\mathbf{C}},\left[ {{R_{n - 1}}{I_n},{I_{n + 1}} \ldots {I_N}{R_0}} \right]} \right)$
11 Compute ${\mathbf{Z}} = {\mathbf{C}}\,{\boldsymbol{\Omega}}$ where ${\boldsymbol{\Omega}} \in {\mathbb{R}^{{I_{n+1}I_{n+2}} \cdots {I_N}R_0 \times \left( {{R_{n}} + P} \right)}}$
12 Compute ${\mathbf{Q}}$ as a columnwise orthogonal basis of $\textbf{Z}$ by using the QR decomposition;
13 Let ${\mathbf{Q}}^{(n)} = {\mathbf{Q}}\left( {:,1:{R_{n}}} \right)$
14 ${\widehat{\underline{\mathbf{X}}}^{(n)}} = \textrm{reshape}\left( {{\mathbf{Q}}^{(n)},\left[ {{R_{n - 1}},{I_n},{R_n}} \right]} \right)$
15 Compute ${\underline{\mathbf{C}}} = {\underline{\mathbf{C}}}\,{\times _1}{{\mathbf{Q}}^{(n)T}}$
16 $\underline{\mathbf{C}} = {\mathrm{reshape}}\left( {\underline{\mathbf{C}},\left[ {{R_{n}}{I_{n+1}},\,\,{I_{n+2}} \ldots {I_N},\,\,{R_0}} \right]} \right)$
17 end
18 ${\widehat{\underline{\mathbf{X}}}^{(N)}} = \textrm{reshape}\left( {{\mathbf{C}},\left[ {{R_{N - 1}},{I_N},{R_0}} \right]} \right)$
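
The loop just described can be written compactly in NumPy; the sketch below is our own illustrative implementation of the random projection TR-SVD of Algorithm 7, with C-order reshapes, the optional power iteration of Remark 4, and the assumption that the requested ranks plus oversampling do not exceed the corresponding matrix dimensions.

import numpy as np

def rand_tr(X, ranks, P=5, q=0):
    # Random projection TR-SVD (Algorithm 7). ranks = (R0, R1, ..., R_{N-1}).
    N, I = X.ndim, X.shape
    R = list(ranks) + [ranks[0]]              # R[N] identified with R[0]

    def sketch_basis(C, r):
        # Gaussian sketch + QR to obtain an orthonormal range basis (oversampling P).
        Z = C @ np.random.randn(C.shape[1], r + P)
        for _ in range(q):                    # optional power iterations (Remark 4)
            Z = C @ (C.T @ Z)
        Q, _ = np.linalg.qr(Z)
        return Q[:, :r]

    cores = []
    C = X.reshape(I[0], -1)
    Q1 = sketch_basis(C, R[0] * R[1])
    cores.append(Q1.reshape(I[0], R[0], R[1]).transpose(1, 0, 2))
    C = (Q1.T @ C).reshape(R[0], R[1], -1).transpose(1, 2, 0)
    for n in range(1, N - 1):
        Cmat = C.reshape(R[n] * I[n], -1)
        Qn = sketch_basis(Cmat, R[n + 1])
        cores.append(Qn.reshape(R[n], I[n], R[n + 1]))
        C = (Qn.T @ Cmat).reshape(R[n + 1], -1, R[0])
    cores.append(C.reshape(R[N - 1], I[N - 1], R[0]))
    return cores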

Figure 2. The first step of the randomized projection TR-SVD algorithm.

Algorithm 7 needs an estimate of the TR-ranks in advance, because the projection steps in Lines 2 and 11 require estimates of the ranks of the unfolding matrices. This imposes a restriction, since we may not have any information about the TR-ranks. It is therefore of interest to choose the TR-ranks adaptively while running the algorithm.

Estimating the TR-rank Rn in (11) is equivalent to finding a columnwise orthogonal matrix ${\mathbf{Q}}^{(n)} \in {\mathbb{R}^{{R_{n-1}{I}_{n}} \times {R_n}}}$ satisfying
$\left\|{\mathbf{C}}_{(1)} - {\mathbf{Q}}^{(n)}\,{\mathbf{Q}}^{(n)\,T}\,{\mathbf{C}}_{(1)}\right\|_F \le \varepsilon.$
This can be equivalently reformulated as the following problem:

Problem 1. Suppose that $\underline{\mathbf{C}} \in {\mathbb{R}^{{I_n R_{n-1}} \times {I_{n+1}} \times \cdots \times {I_N}}}$ and epsilon is a prescribed approximation error bound. The objective is to find a columnwise orthogonal matrix ${{\mathbf{Q}}^{(n)}} \in {\mathbb{R}^{{I_n R_{n-1}} \times {R_n}}}$ with ${R _n} \le I_n R_{n-1}$, such that

$\left\|\underline{\mathbf{C}}\times_1\left({\mathbf{I}} - {\mathbf{Q}}^{(n)}\,{\mathbf{Q}}^{(n)\,T}\right)\right\|_F \le \varepsilon, \qquad (12)$

where ${\mathbf{I}}$ is the identity matrix of size $R_{n-1}I_n\times R_{n-1}I_n$.

Problem 1 can be solved by Algorithm 8 or 9; for a detailed study of these algorithms see [53, 94]. Note that in Algorithm 8 the stopping criterion can be either a predefined maximum number of iterations or a predefined approximation error bound. An adaptive randomized algorithm based on this idea is summarized in Algorithm 10. At each iteration of Algorithm 10, Problem 1 is solved numerically, and both an estimate of the rank Rn and the corresponding columnwise orthogonal matrix ${\mathbf{Q}}^{(n)}$ are computed.

It is also possible to use the randomized RR Algorithm 3 (see also Algorithm 5 in [97]) within the TR-SVD. This procedure is summarized in Algorithm 11, and we refer to it as the Randomized RRTR-SVD algorithm. Please note that in Lines 3 and 9 of Algorithm 11, by Rank-Revealing Algorithm we mean Algorithm 3. Recently, this idea has been used for the Tucker decomposition [98], and here we utilize it for the TR decomposition.

Algorithm 8: Pseudocode for solving Problem 1 [53, 94]
Input: A data tensor $\underline{\mathbf{C}} \in {\mathbb{R}^{{I_nR_{n-1}} \times I_{n+1}\times \cdots \times {I_N}}}$
Output: A columnwise orthogonal matrix ${\mathbf{Q}}^{(n)} \in {\mathbb{R}^{{I_nR_{n-1}} \times {R _n}}}$ satisfying (12)
1 $k = 1,\,\,{\mathbf{Q}}^{(0)} = 0$
2 while a stopping criterion is not satisfied do
3 Draw N − n standard Gaussian vectors ${{\boldsymbol{\omega}}_m} \in {\mathbb{R}^{{I_m}}}$ with $m = n+1,n+2,\ldots ,N$
4 Compute ${\mathbf{y}}_k = \underline{\mathbf{C}}\,\,{{\bar\times }_{n+1}}\,\,{{\boldsymbol{\omega}}_{n+1}}\,{\bar\times}_{n+2} \cdots {{\bar\times }_{N}}\,\,{{\boldsymbol{\omega}}_{N}}$
5 if k > 1 then
6 Compute ${{{\mathbf{y}}}_k} = \left( {{{\mathbf{I}}} - {{\mathbf{Q}}^{(k-1)}}\,{\mathbf{Q}}^{(k - 1)\,T}} \right){\mathbf{y}}_k$
7 end
8 Normalize ${{\mathbf{q}}_k} = \frac{{{{{\mathbf{y}}}_k}}}{{{{\left\| {{{{\mathbf{y}}}_k}} \right\|}_2}}}$ and form ${{\mathbf{Q}}^{(k)}} = \left[ {{\mathbf{Q}}^{(k-1)},{{\mathbf{q}}_k}} \right]$
9 k = k + 1
10 end
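
A stripped-down NumPy sketch of the adaptive basis construction of Algorithm 8 is given below (our own code); it uses a simple residual-norm threshold together with a maximum number of columns as the stopping criterion, whereas Algorithm 9 monitors a block of P probe vectors.

import numpy as np

def adaptive_basis(C, tol=1e-8, max_iter=200):
    # Grow an orthonormal basis Q for the mode-1 fibers of C by contracting C
    # with random Gaussian vectors along all remaining modes (Algorithm 8).
    m0 = C.shape[0]
    Q = np.zeros((m0, 0))
    for _ in range(max_iter):
        y = C
        for size in C.shape[:0:-1]:          # contract trailing modes, last one first
            y = y @ np.random.randn(size)
        y = y - Q @ (Q.T @ y)                # orthogonalize against the current basis
        ny = np.linalg.norm(y)
        if ny < tol:
            break
        Q = np.hstack([Q, (y / ny)[:, None]])
    return Q                                  # R_n is estimated as Q.shape[1]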

Following the idea of computing the CANDECOMP/PARAFAC decomposition (CPD) [99–101] with a prior fast randomized HOSVD compression [102], a randomized algorithm was proposed in [103] for computing the TR decomposition based on a prior Tucker compression. The idea is to utilize a randomized Tucker decomposition in the first step, as a preprocessing step, after which deterministic algorithms such as Algorithm 4, Algorithm 5, Algorithm 6 or the TR-ALS algorithm [20] can be applied to the smaller Tucker core tensor.

The randomized Tucker compression is summarized in Algorithm 12, and the TR decomposition with a prior Tucker compression is outlined in Algorithm 13 (see figure 3 for a graphical illustration). Please note that Algorithm 12 is also called the randomized Sequentially Truncated HOSVD (r-STHOSVD) [98, 104]. A main drawback of Algorithm 13 is that it needs an estimate of the multilinear rank of the original data tensor, which may be difficult to obtain. However, an adaptive algorithm, e.g. Algorithm 3, can be used in Algorithm 12 to find the factor matrices and their corresponding multilinear rank automatically [98]. This procedure is summarized in Algorithm 14.

Figure 3. Randomized TR decomposition with a prior Tucker compression.

It has been shown in [105] that the computational complexity of the TT-SVD algorithm for decomposing an Nth-order tensor of size ${I \times I \times \cdots \times I}$ with TT-ranks $\left( {R,R, \ldots ,R} \right)$ is $\mathcal{O}\left( {{I^N}{R^2}} \right)$, due to the computation of N SVDs of unfolding matrices (see theorem 2.1, page 2136, in [105]). Since the computational complexity of the TR-SVD algorithm is the same as that of the TT-SVD algorithm, the TR-SVD algorithm has the same complexity. The idea of decomposing tensors in the TT format with a prior Tucker compression was first proposed in [16]. The computational complexity of the TR-SVD (with a prior Tucker decomposition) of an Nth-order tensor of size ${I \times I \times \cdots \times I}$, with Tucker rank $\left({\tilde{R},\tilde{R}, \ldots ,\tilde{R}} \right)$ and TR-ranks $\left( {R,R, \ldots ,R} \right)$, is $\mathcal{O}\left( {{I^N}\tilde{R}+{\tilde{R}^N}{R^2}} \right).$

The first term is the cost of multiplying the unfolding matrices by random matrices, which is the most expensive operation. The second term is the cost of the TR decomposition of the Tucker core tensor. So, if $\tilde{R}\lt R^2$, the approach of the TR decomposition with a prior Tucker compression is cheaper.
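
As a rough numerical illustration (our own example values, not from the paper): for $N = 4$, $I = 70$, $R = 5$ and $\tilde{R} = 20$ (so $\tilde{R} \lt R^2 = 25$),

$I^N R^2 = 70^4\cdot 25 \approx 6.0\times 10^{8}, \qquad I^N\tilde{R} + \tilde{R}^N R^2 = 70^4\cdot 20 + 20^4\cdot 25 \approx 4.8\times 10^{8},$

so the prior Tucker compression reduces the dominant cost.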

We should emphasize that the TR decomposition with a prior Tucker compression is applicable when

  • The underlying data tensor is of small order (up to 5); otherwise the curse of dimensionality occurs.
  • The Tucker core tensor admits a low multilinear rank.

Concerning Algorithms 5 and 6, the complexity analysis is more involved because they require several permutations of the modes. They are not directly applicable to very large-scale tensors, and a prior Tucker compression can somewhat reduce their computational complexity. Please note that all the algorithms discussed in this paper achieve a suboptimal compression ratio, and developing algorithms for finding the optimal model is a challenging topic that remains to be investigated. For example, in [85], novel algorithms are developed for the TT decomposition.

Algorithm 9: Pseudocode for solving Problem 1 (See also [53, 94])
Input: A data tensor $\underline{\mathbf{C}} \in {\mathbb{R}^{I_nR_{n-1}\times I_{n+1}\times \cdots \times {I_N}}}$, an integer P, an approximation error bound epsilon, a Boolean flag "take max" and maximum number of iterations $Iter_{\textrm{max}}$
Output: A columnwise orthogonal matrix ${\mathbf{Q}}^{(n)} \in {\mathbb{R}^{{I_nR_{n-1}} \times {R _n}}}$ satisfying (12)
1 Draw N − n independent families $\left\{{{\boldsymbol{\omega}}_m^{(p)} \in {\mathbb{R}^{{I_m}}}:\,p = 1,2, \ldots ,P} \right\}$ of standard Gaussian vectors where $m = n+1,n+2,\ldots ,N$
2 Compute ${{\mathbf{y}}_p} = \underline{\mathbf{C}}\,{\bar\times _{n+1}}\,\,{\boldsymbol{\omega}}_{n+1}^{(p)}{\bar\times _{n+2}} \cdots {\bar\times _{N}}\,\,{\boldsymbol{\omega}}_{N}^{(p)}$ with p = 1, 2, ..., P
3 Start with an empty basis matrix ${\mathbf{Q}}^{(0)}$ and set k = 0
4 while $\text{max}\, \left\{{{{\left\| {{{\mathbf{y}}_{k + 1}}} \right\|}_2},{{\left\| {{{\mathbf{y}}_{k + 2}}} \right\|}_2}, \ldots ,{{\left\| {{{\mathbf{y}}_{k + P}}} \right\|}_2}} \right\}\gt \varepsilon \,\,and\,\,k \lt {Iter_{\text{max}\, }}$ do
5 Set k = k + 1
6 if the value of 'take max' is 'True' then
7 Choose ${k_0} \in \left\{{k + 1, \ldots ,k + P} \right\}$ such that ${\left\| {{{\mathbf{y}}_{k_0}}} \right\|_2} = \text{max}\, \left\{{\,{{\left\| {{{\mathbf{y}}_{k + 1}}} \right\|}_2},\,{{\left\| {{{\mathbf{y}}_{k + 2}}} \right\|}_2}, \ldots ,\,{{\left\| {{{\mathbf{y}}_{k + P}}} \right\|}_2}} \right\}$ and set ${{\mathbf{y}}_k} = \left( {{{\mathbf{I}}} - {{\mathbf{Q}}^{(k-1)}}{\mathbf{Q}}^{(k-1)\,T}} \right){{\mathbf{y}}_{k_0}}$ if $k\gt 1$
8 else
9 ${{\mathbf{y}}_k} = \left( {{{\mathbf{I}}} - {{\mathbf{Q}}^{(k-1)}}{\mathbf{Q}}^{(k-1)\,T}} \right){{\mathbf{y}}_k}\,\,if\,k\gt 1$
10 end
11 Compute ${{\mathbf{q}}_k} = \frac{{{{\mathbf{y}}_k}}}{{{{\left\| {{{\mathbf{y}}_k}} \right\|}_2}}}$ and form ${{\mathbf{Q}}^{(k)}} = \left[ {{{\mathbf{Q}}^{(k-1)}},{{\mathbf{q}}_k}} \right]$
12 Draw N − n standard Gaussian vectors ${{\boldsymbol{\omega}}_m} \in {\mathbb{R}^{{I_m}}}$ where $m = n+1,n+2,\ldots ,N$
13 Compute $\begin{array}{l} {{\mathbf{y}}_{k + P}} = \left({{\mathbf{I}} - {{\mathbf{Q}}^{(k)}}{\mathbf{Q}}^{(k)\,T}} \right) \left( \underline{\mathbf{C}}\,{\bar\times _{n+1}}\,\,{{\boldsymbol{\omega}}_{n+1}} {\bar\times}_{n+2} \cdots {\bar\times _{N}}\,\,{{\boldsymbol{\omega}}_{N}} \right) \end{array}$
14 for $i = k + 1,k + 2, \ldots ,k + P - 1$ do
15 ${{\mathbf{y}}_i} = {{\mathbf{y}}_i} - \left( {{\mathbf{q}}_k^T{{\mathbf{y}}_i}} \right){{\mathbf{q}}_k}$
16 end
17 end
18 Set ${\mathbf{Q}}^{(n)} = {\mathbf{Q}}^{(k)}$ and Rn as the number of columns of ${\mathbf{Q}}^{(n)}$.
Algorithm 10: Adaptive Random projection TR-SVD algorithm (see also [94])
Input: A data tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times {I_2} \times \cdots \times {I_N}}},$ a prescribed approximation error bound epsilon, a positive number P and an initial rank R0 as a divisor of $\textrm{rank}{_\delta }\left( {{{\mathbf{X}}_{\left\langle 1 \right\rangle }}} \right)$
Output: Approximative representation of the tensor $\underline{\mathbf{X}}$ in the TR format ${\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,}$ such that ${\left\| {\underline{\mathbf{X}} - \widehat{\underline{\mathbf{X}}}} \right\|_F} \le \varepsilon {\left\| \underline{\mathbf{X}} \right\|_F}$
1 Apply Algorithm 3 to ${{\mathbf{X}}}_{\left\langle 1 \right\rangle }$ and generate the columnwise orthogonal matrix ${{\mathbf{Q}}^{(1)}} \in {\mathbb{R}^{{I_1} \times {R_{0}}R_1}}$
2 ${\widehat{\underline{\mathbf{X}}}^{(1)}} = \textrm{permute}\left({\textrm{reshape}\left( {{\mathbf{Q}}^{(1)},\left[ {{I_1},{R_0},{R_1}} \right]} \right)},[2,1,3]\right)$
3 Compute ${\underline{\mathbf{C}}} = {\underline{\mathbf{X}}}\,{\times _1}{{\mathbf{Q}}^{(1)\,T}}$
4 $\underline{\mathbf{C}} = \textrm{permute}\left({\mathrm{reshape}}\left( {\underline{\mathbf{C}},\left[ {{R_0},{R_1},{I_2} \ldots {I_N}} \right]} \right),[2,3,1]\right)$
5 $\underline{\mathbf{C}} = \textrm{reshape}\left(\underline{\mathbf{C}},[R_1I_2,I_3\ldots I_N,R_0]\right)$
6 for n = 2, ..., N − 1 do
7 $\underline{\mathbf{C}} = {\mathrm{reshape}}\left( {\underline{\mathbf{C}},\left[ {{R_{n-1}}{I_n},\,\,{I_{n+1}} \ldots {I_N}{R_0}} \right]} \right)$
8 Apply Algorithm 8 or 9 to the tensor ${\underline{\mathbf{C}}}$ for solving Problem 1 and generate the columnwise orthogonal matrix ${{\mathbf{Q}}^{(n)}} \in {\mathbb{R}^{{I_n}{R_{n-1}} \times R_n}}$
9 ${\widehat{\underline{\mathbf{X}}}^{(n)}} = \textrm{reshape}\left( {{\mathbf{Q}}^{(n)},\left[ {{R_{n - 1}},{I_n},{R_n}} \right]} \right)$
10 Compute ${\underline{\mathbf{C}}} = {\underline{\mathbf{C}}}\,{\times _1}\,{{\mathbf{Q}}^{(n)\,T}}$
11 $\underline{\mathbf{C}} = {\mathrm{reshape}}\left( {\underline{\mathbf{C}},\left[ {{R_{n}}{I_{n+1}},\,\,{I_{n+2}} \ldots {I_N},\,\,{R_0}} \right]} \right)$
12 end
13 ${\widehat{\underline{\mathbf{X}}}^{(N)}} = \textrm{reshape}\left( {{\mathbf{C}},\left[ {{R_{N - 1}},{I_N},{R_0}} \right]} \right)$
Algorithm 11: Randomized RRTR-SVD Algorithm
Input: A data tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times I_2\times \cdots \times {I_N}}},$ a prescribed approximation error bound epsilon, a block size b, a power iteration q and an initial rank R0 as a divisor of $\textrm{rank}{_\delta }\left( {{{\mathbf{X}}_{\left\langle 1 \right\rangle }}} \right)$
Output: Approximative representation of the tensor $\underline{\mathbf{X}}$ in the TR format $\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,$ such that ${\left\| {\underline{\mathbf{X}} - \widehat{\underline{\mathbf{X}}}} \right\|_F} \le \varepsilon {\left\| \underline{\mathbf{X}} \right\|_F}$ and the TR-ranks are $\{R_0,R_1,\ldots,R_{N-1}\}$
1 Compute $\delta = \frac{{\varepsilon {{\left\| \underline{\mathbf{X}} \right\|}_F}}}{{\sqrt N }}$
2 ${\mathbf{C}} = {\mathrm{reshape}}\left( {\underline{\mathbf{X}},\left[ {{I_1},I_2I_3\ldots I_N} \right]} \right)$
3 $[{\mathbf{Q}}, {\mathbf{B}}, \widetilde{R}_1]$ = Rank-Revealing Algorithm(${\mathbf{C}},\,\delta,\,b,\,q$)
4 $R_1 = {\widetilde{R}_1}/{R_0}$
5 ${\widehat{\underline{\mathbf{X}}}^{(1)}} = \textrm{permute}\left( {\textrm{reshape}\left( {{\mathbf{Q}}\left( {:,1:{R_0}{R_1}} \right),\left[ {{I_1},{R_0},{R_1}} \right]} \right)},\left[ {2,1,3} \right] \right)$
6 $\underline{\mathbf{C}} = \textrm{permute}\left( \textrm{reshape}\left( {\mathbf{B}},\left[ {{R_0},{R_1},\prod\limits_{j = 2}^N {{I_j}} } \right] \right),\left[ {2,3,1} \right] \right) $
7 for n = 2, ..., N − 1do
8 ${\mathbf{C}} = \textrm{reshape}\left( {\underline{\mathbf{C}},\left[ {{R_{n - 1}}{I_n},\frac{{\textrm{numel}\left( \underline{\mathbf{C}} \right)}}{{{{R_{n - 1}}{I_n}} }}} \right]} \right)$
9 $[{\mathbf{Q}}, {\mathbf{B}}, R_n]$ = Rank-Revealing Algorithm(${\mathbf{C}},\,\delta,\,b,\,q$)
10 ${\widehat{\underline{\mathbf{X}}}^{(n)}} = \textrm{reshape}\left( {{{\mathbf{Q}}(:,1:R_n)},\left[ {{R_{n - 1}},{I_n},{R_n}} \right]} \right)$
11 $\underline{\mathbf{C}} = {\mathrm{reshape}}\left( {{{\mathbf{B}}(1:R_n,:)},\left[ {{R_{n}},\prod\nolimits_{j = {n+1} }^N {{I_j}} ,{R_0}} \right]} \right)$
12 end
13 ${\widehat{\underline{\mathbf{X}}}^{(N)}} = \textrm{reshape}\left( {\underline{\mathbf{C}},\left[ {{R_{N - 1}},{I_N},{R_0}} \right]} \right)$
Algorithm 12: Pseudocode for the Tucker compression
Input: A data tensor $\underline{\mathbf{X}}\in \mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ and a Tucker rank $(K_1,K_2,\ldots,K_N)$
Output: Approximate Tucker decomposition $\underline{\mathbf{X}}\cong\left[\kern-0.15em\left[\underline{\mathbf{S}};{\mathbf{Q}}^{(1)},{\mathbf{Q}}^{(2)},\ldots,{\mathbf{Q}}^{(N)}\right]\kern-0.15em\right]$
1 $\underline{\mathbf{S}} = \underline{\mathbf{X}}$
2 for n = 1, 2, ..., N do
3 $\left[\mathtt{\sim},H_n\right] = \textrm{Size}({{\mathbf{S}}_{\left( n \right)}})$
4 Draw a random matrix ${{\boldsymbol{\Omega}}_n}\in\mathbb{R}^{H_n\times K_n}$
5 ${\mathbf{Y}} = {{\mathbf{S}}_{\left( n \right)}}{{\boldsymbol{\Omega}}_n}$
6 $\left[ {{\mathbf{Q}}^{(n)},\mathtt{\sim}} \right] = \textrm{qr}\left( \textbf{Y} \right)$
7 $\underline{\mathbf{S}} = \underline{\mathbf{S}}{\times _n}{{\mathbf{Q}}^{\left( n \right)\,\,T}}$
8 end
Algorithm 13: Randomized TR decomposition with a prior Tucker compression (see also [103])
Input: A data tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times I_2\times \cdots \times {I_N}}}$, an approximate multilinear rank $(K_1,K_2,\ldots,K_{N})$ and approximate TR-ranks $\{R_0,R_1,\ldots,R_{N-1}\}$
Output: Approximative representation of the tensor $\underline{\mathbf{X}}$ in the TR format $\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,$ such that $\left\| {\underline{\mathbf{X}}-\widehat{\underline{\mathbf{X}}}} \right\| \le \varepsilon \left\| \underline{\mathbf{X}}\right\|$
1 Apply Algorithm 12 to the data tensor $\underline{\mathbf{X}}$ to compress it in the Tucker format with multilinear (Tucker) rank $(K_1,K_2,\ldots,K_{N})$ and obtain $\left[\kern-0.15em\left[\underline{\mathbf{S}};{\mathbf{Q}}^{(1)},{\mathbf{Q}}^{(2)},\ldots,{\mathbf{Q}}^{(N)}\right]\kern-0.15em\right]$
2 Apply Algorithm 4, Algorithm 6, Algorithm 5 or TR-ALS Algorithm [20] to the compressed data tensor $\underline{\mathbf{S}}$ and obtain $\ll \widehat{\underline{\mathbf{S}}}^{(1)},\widehat{\underline{\mathbf{S}}}^{(2)},\ldots,\widehat{\underline{\mathbf{S}}}^{(N)}\gg$
3 Recover the TR cores of the original data tensor from the TR cores of the compressed data tensor, $\widehat{\underline{\mathbf{X}}}^{(n)} = \widehat{\underline{\mathbf{S}}}^{(n)}\times_{2}{\mathbf{Q}}^{(n)},\,n = 1,2,\ldots,N$
4 Return $\ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg$
Algorithm 14: Adaptive randomized TR decomposition with a prior Tucker compression (see also [103])
Input: A data tensor $\underline{\mathbf{X}} \in {\mathbb{R}^{{I_1} \times I_2\times \cdots \times {I_N}}}$ and a prescribed approximation error bound $\varepsilon$
Output: Approximate representation of the tensor $\underline{\mathbf{X}}$ in the TR format $\widehat{\underline{\mathbf{X}}} = \ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg,$ such that $\left\| {\underline{\mathbf{X}}-\widehat{\underline{\mathbf{X}}}} \right\| \le \varepsilon \left\| \underline{\mathbf{X}}\right\|$, and the TR-ranks $\{R_0,R_1,\ldots,R_{N-1}\}$
1 Apply the r-STHOSVD algorithm [98] (also see Algorithm 5 in [97]) to compress the data tensor in the Tucker format and obtain $\left[\kern-0.15em\left[\underline{\mathbf{S}};{\mathbf{Q}}^{(1)},{\mathbf{Q}}^{(2)},\ldots,{\mathbf{Q}}^{(N)}\right]\kern-0.15em\right]$
2 Apply Algorithm 4, Algorithm 5, Algorithm 6 or the TR-ALS algorithm [20] to the compressed data tensor $\underline{\mathbf{S}}$ and obtain $\ll \widehat{\underline{\mathbf{S}}}^{(1)},\widehat{\underline{\mathbf{S}}}^{(2)},\ldots,\widehat{\underline{\mathbf{S}}}^{(N)}\gg$
3 Recover the TR cores of the original data tensor from the TR cores of the compressed data tensor, $\widehat{\underline{\mathbf{X}}}^{(n)} = \widehat{\underline{\mathbf{S}}}^{(n)} \times_{2} {\mathbf{Q}}^{(n)},\,n = 1,2,\ldots,N$
4 Return $\ll \widehat{\underline{\mathbf{X}}}^{(1)},\widehat{\underline{\mathbf{X}}}^{(2)},\ldots,\widehat{\underline{\mathbf{X}}}^{(N)}\gg$
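The core-recovery step shared by Algorithms 13 and 14 (step 3) is a mode-2 product of each compressed TR core with the corresponding Tucker factor. Below is a minimal NumPy sketch of this step; the function name and the assumed core shapes are illustrative and not taken from the paper's code.

```python
import numpy as np

def recover_tr_cores(S_cores, Q_factors):
    """S_cores[n] has shape (R_{n-1}, K_n, R_n) (cores of the compressed tensor);
    Q_factors[n] has shape (I_n, K_n) (Tucker factors).
    Returns the TR cores of the original tensor, of shape (R_{n-1}, I_n, R_n)."""
    # mode-2 product: contract the middle index K_n of each core with Q^{(n)}
    return [np.einsum('rkq,ik->riq', G, Q) for G, Q in zip(S_cores, Q_factors)]
```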

Remark 5.  For decomposing a streaming data tensor in the TT format, specialized algorithms should be used. Recently, a streaming TT decomposition was developed in [106] with applications to cyber-physical big data. Further applications of streaming TT decompositions in DNNs can be found in [107].

Remark 6.  Recently, an efficient technique called TT-HSVD was proposed in [108, 109] for decomposing tensors in the TT format; its core tensors can be computed in parallel, in contrast to the TT-SVD, which computes only one core tensor at each step. This technique can be generalized to the TR decomposition, employing randomization for further acceleration of the computations.

Remark 7.  Consider Algorithm 1, which is the basic form of randomized algorithms for the low rank approximation of matrices [53]. It has recently been shown that, for large sparse matrices, Algorithm 1 and its variants can be substantially accelerated by first transforming the given data matrix into the TT matrix format (equivalently, the Matrix Product Operator (MPO) format) [105] and performing all computations in that format. The resulting algorithm is called the tensor train randomized SVD (TTrSVD), and simulations have shown that, in some experiments, it achieves more than a 10-fold speed-up for the TT decomposition compared with the tensor-based alternating least squares SVD (ALS-SVD) and modified alternating least squares SVD (MALS-SVD) matrix approximation methods [27], with even better accuracy.

5. Simulations

In this section, we evaluate the presented randomized algorithms for computing the TR decomposition of synthetic and real data tensors. All numerical simulations were conducted on a laptop computer with a 2.60 GHz Intel(R) Core(TM) i7-5600U processor and 8 GB of memory. The evaluation measures are the compression ratio and the relative error. The compression ratio is defined as

$\textrm{Compression ratio} = \frac{C_2}{C_1},$

where $C_1$ and $C_2$ are the number of components of the original data tensor and of its approximation in the TR format, respectively. The relative error is defined as

$\textrm{Relative error} = \frac{\left\| \underline{\mathbf{X}}-\widetilde{\underline{\mathbf{X}}} \right\|_F}{\left\| \underline{\mathbf{X}} \right\|_F},$

where $\widetilde{\underline{\mathbf{X}}}$ and $\underline{\mathbf{X}}$ are the approximate and the exact data tensors, respectively.
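For concreteness, the following NumPy sketch (with function names of our own choosing) computes both measures; as a sanity check, it reproduces the "optimal compression" value 2.9155e-04 quoted for the 70 × 70 × 70 × 70 tensor with TR-ranks (5, 3, 5, 7) used in Example 1 below.

```python
import numpy as np

def tr_num_params(dims, ranks):
    """Total number of entries in the TR cores: sum_n R_{n-1} I_n R_n (with R_N = R_0)."""
    N = len(dims)
    return sum(ranks[n] * dims[n] * ranks[(n + 1) % N] for n in range(N))

def compression_ratio(dims, ranks):
    """C_2 / C_1: TR parameters divided by the number of entries of the full tensor."""
    return tr_num_params(dims, ranks) / float(np.prod(dims, dtype=np.float64))

def relative_error(X, X_hat):
    """Frobenius-norm relative error between the exact and approximate tensors."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Example 1 below: a 70 x 70 x 70 x 70 tensor with TR-ranks (5, 3, 5, 7)
print(compression_ratio([70, 70, 70, 70], [5, 3, 5, 7]))   # approx. 2.9155e-04
```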

Example 1. In the first experiment, the performance and accuracy of the randomized TR algorithms are compared on synthetic data. We set the power iteration parameter to q = 1 and the oversampling parameter to P = 5 in the randomized algorithms. Consider a 4th-order random tensor $\underline{\mathbf{X}}\in\mathbb{R}^{70 \times 70 \times 70 \times 70}$ with exact TR-ranks (5, 3, 5, 7). We applied the deterministic and randomized TR algorithms (Algorithms 4–7, 10, 13 and 14) to the tensor $\underline{\mathbf{X}}$. The initial ranks R0 considered for the TR decomposition were R0 = 1, 3, 5, 15. The compression ratios achieved by the TR-SVD algorithm for the different initial ranks R0 are reported in table 1; the best compression ratio was achieved with the initial rank R0 = 15. The same compression was achieved by the randomized TR-SVD, but with a better running time, as reported in table 2.

Table 1. Example 1. Compression ratio of the TR-SVD algorithm for different possible initial ranks R0 for a random tensor of size 70 × 70 × 70 × 70 and TR-rank {5, 3, 5, 7} with optimal compression 2.9155e-04.

Initial rank R0    TR-ranks           Compression ratio
1                  {1, 15, 25, 35}    0.0038
3                  {3, 5, 75, 105}    0.0250
5                  {5, 3, 45, 63}     0.0096
15                 {15, 1, 15, 21}    0.0019

Table 2. Example 1. Comparison results for the data tensor of size 70 × 70 × 70 × 70 and TR-ranks (5, 3, 5, 7) with optimal compression 2.9155e-04.

Algorithm                                                      Relative error    CPU time    Compression ratio
Algorithm 4 (with initial R0 = 15)                             7.3475e-06        4.887479    0.0019
Algorithm 7 (with initial R0 = 15)                             5.0265e-06        1.443426    0.0019
Algorithm 7 with a prior Tucker compression (Algorithm 14)     1.2578e-06        2.4633      0.0019
Algorithm 5                                                    4.2254e-06        265.4523    0.0019
Algorithm 5 with a prior Tucker compression (Algorithm 14)     5.4346e-06        2.6749      0.0019
Algorithm 6                                                    6.6723e-06        72.3477     0.0019
Algorithm 6 with a prior Tucker compression (Algorithm 14)     1.2578e-06        2.4633      0.0019
Algorithm 10                                                   2.1165e-06        3.3244      0.0019
Algorithm 7                                                    2.0875e-06        3.2679      2.9155e-04
Algorithm 13                                                   2.5686e-06        2.7645      2.9155e-04

Algorithms 5 and 6 and their variants with a prior Tucker compression (Algorithms 13 and 14) were able to find the TR-ranks {15, 1, 15, 21}, but at higher computational cost. The running times and relative errors of the algorithms are reported in table 2. For the prior reduction in the Tucker format, the tensor was compressed to a tensor of size 15 × 15 × 35 × 35 using the r-STHOSVD algorithm. From table 2, it is seen that the randomized algorithms have better running times than their deterministic counterparts. Note that the last two algorithms in table 2 are non-adaptive and were given the true TR-ranks; this is why they achieved a better compression ratio.

In a second simulation, we generated a 5th-order tensor of size 30 × 30 × 30 × 30 × 30 with TR-ranks (5, 3, 5, 3, 5). Here, we wanted to highlight the performance of the TR decomposition with a prior Tucker compression when it is combined with Algorithms 5 and 6. For the Tucker compression step, we used the multilinear rank (15, 15, 15, 15, 25). The results of this experiment are reported in table 3 and show a significant speed-up of the deterministic algorithms when the randomization technique is used.

Table 3. Example 1. Comparison results for the data tensor of size 30 × 30 × 30 × 30 × 30 and TR-ranks (5, 3, 5, 3, 5) with optimal compression 1.0494e-04.

Algorithm                                                      Relative error    CPU time    Compression ratio
Algorithm 5                                                    7.6743e-04        139.6735    6.5598e-04
Algorithm 5 with a prior Tucker compression (Algorithm 14)     4.3423e-04        11.7935     6.5598e-04
Algorithm 6                                                    2.2911e-04        19.3457     6.5598e-04
Algorithm 6 with a prior Tucker compression (Algorithm 14)     5.5467e-04        5.3598      6.5598e-04

Example 2. In this example, we consider the highly oscillating function $f\left( x \right) = \left( {x + 1} \right)\sin\left( {100{{\left( {x + 1} \right)}^2}} \right)$ depicted in figure 4. It was shown in [110] and [111] that the discretized form of such functions can, after tensorization, be compressed in the TR format. Discretizing this function over the interval [−1, 1] with step-size $\frac{1}{2^{24}}$ yields a long vector of length 33554433. We removed the first component of this vector so that it could be reshaped into a 4th-order tensor of size 128 × 256 × 32 × 32. This tensor has true multilinear rank (3, 7, 21, 21), which can be computed via the Matlab function 'mlrank.m' included in the tensorlab toolbox [112]. The Matlab function 'mlrankest.m' estimated the numerical multilinear rank as (2, 4, 16, 16), but we used the true multilinear rank (3, 7, 21, 21) in the Tucker compression step, again performed with Algorithm 12. In this experiment, we highlight the importance of a prior Tucker compression of the data tensor for reducing the running time of the TR algorithms based on cyclic permutations of the cores. The results of this simulation are reported in table 4. The superiority of the randomized algorithms over their deterministic counterparts is evident, and all algorithms achieved the same compression ratio.
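The tensorization described above can be reproduced in a few lines of NumPy, as in the sketch below. The column-major (Matlab-style) reshape ordering is an assumption on our side, since the exact ordering is not specified in the text.

```python
import numpy as np

# Highly oscillating test function from Example 2
f = lambda x: (x + 1.0) * np.sin(100.0 * (x + 1.0) ** 2)

h = 2.0 ** (-24)                                     # step-size 1/2^24
x = -1.0 + h * np.arange(2 ** 25 + 1)                # 33 554 433 samples of [-1, 1]
v = f(x)[1:]                                         # drop the first entry -> 2^25 values
X = np.reshape(v, (128, 256, 32, 32), order='F')     # 4th-order tensor, 128*256*32*32 = 2^25
```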

Figure 4. The highly oscillating function $f\left( x \right) = \left( {x + 1} \right)\sin\left( {100{{\left( {x + 1} \right)}^2}} \right)$ over [−1, 1]. Discretization of this function with step-size $\frac{1}{2^{24}}$ followed by a tensorization has low TR-ranks [110, 111].

Table 4. Example 2. Comparison results for the highly oscillating function $f\left( x \right) = \left( {x + 1} \right)\text{sin}\,\left( {100{{\left( {x + 1} \right)}^2}} \right)$.

Algorithm                                                      Relative error    CPU time    Compression ratio
Algorithm 5                                                    5.3849e-06        803.9073    1.4782e-04
Algorithm 5 with a prior Tucker compression (Algorithm 14)     5.8412e-06        10.1231     1.4782e-04
Algorithm 6                                                    5.3859e-06        360.2503    1.4782e-04
Algorithm 6 with a prior Tucker compression (Algorithm 14)     7.1662e-06        8.4494      1.4782e-04

Example 3. In this simulation, we deal with the tensor completion problem, i.e. the problem of recovering a given data tensor from its partially observed components [113, 114]. The TR model has recently been used for recovering the missing elements of incomplete higher-order data tensors in [34, 115–117]. Here, we used the TR low rank factors (TRLRF) algorithm proposed in [116, 117] because, as reported there, it was the most stable and efficient algorithm compared to others in terms of sensitivity to the TR-ranks and reconstruction error. The algorithm is based on a nuclear norm minimization formulation and uses the ADMM algorithm [118] to solve the underlying optimization problem. It requires computing the SVD of several unfolding matrices (Lines 86–88 of the Matlab code 'TRLRF.m'). We modified it into a randomized algorithm by replacing these SVDs with Algorithm 3, using block size b = 100 and power iteration parameter q = 1, and we call the resulting method the randomized TRLRF (R-TRLRF) algorithm.
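For reference, the sketch below shows a generic randomized truncated SVD with power iteration of the kind used to replace the exact SVDs in 'TRLRF.m'. It is a hedged illustration rather than a verbatim transcription of Algorithm 3; only the parameter values b = 100 and q = 1 are taken from the text, and the function name is ours.

```python
import numpy as np

def randomized_svd(A, b=100, q=1, seed=None):
    """Approximate rank-b SVD of A from a Gaussian range sketch with q power iterations."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    b = min(b, m, n)
    Y = A @ rng.standard_normal((n, b))      # range sketch of A
    for _ in range(q):                       # power iterations sharpen the captured subspace
        Y, _ = np.linalg.qr(Y)               # re-orthonormalize for numerical stability
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the approximate range
    B = Q.T @ A                              # small b x n projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub, s, Vt                     # U (m x b), singular values, V^T (b x n)
```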

Consider the three benchmark images ('Giant', 'Kate', 'Peppers') depicted in the first column of figure 7. The 'Giant' image was of size 256 × 256 × 3, while 'Kate' and 'Peppers' were of size 512 × 512 × 3. We reshaped 'Giant', 'Kate' and 'Peppers' into 7th-order tensors of sizes 8 × 8 × 4 × 8 × 8 × 4 × 3, 16 × 8 × 4 × 16 × 8 × 4 × 3 and 16 × 8 × 4 × 16 × 8 × 4 × 3, respectively. We randomly removed 80% of the pixels of these images and applied the TRLRF and R-TRLRF algorithms to the images with missing pixels, using TR-ranks ${R_1} = {R_2} = \cdots = {R_7} = R$ for R = 12, 13, ..., 20. The running-time behavior of the TRLRF and R-TRLRF algorithms was almost the same for the different TR-ranks and for all images, so we report it only for the 'Giant' image with TR-rank 20 in figure 5. From figure 5, the better scalability of the R-TRLRF algorithm compared to the TRLRF algorithm is visible. Our simulations confirmed that the R-TRLRF algorithm achieves roughly the same accuracy as the TRLRF algorithm while being computationally much cheaper. The PSNR and the RSE of the images recovered by the TRLRF and R-TRLRF algorithms are reported in figure 6, where $\textrm{RSE} = \frac{{{{\left\| {\underline{\mathbf{X}} - \widehat{\underline{\mathbf{X}}}} \right\|}_F}}}{{{{\left\| \underline{\mathbf{X}} \right\|}_F}}}$, ${\mathrm{PSNR}} = 10\,{\mathrm{log}}_{10}\left( {{255}^2}/{\mathrm{MSE}} \right)$ and ${\mathrm{MSE}} = \left\| {\underline{\mathbf{X}} - \widehat{\underline{\mathbf{X}}}} \right\|_F^2/{\mathrm{num}}\left( \underline{\mathbf{X}} \right)$, with 'num' denoting the number of entries of a given data tensor. Here again, it is seen that the R-TRLRF algorithm outperforms the TRLRF algorithm. The images recovered by the TRLRF and R-TRLRF algorithms for TR-rank 20 are visualized in figure 7.
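The quality measures used in this example can be computed as in the short NumPy sketch below (illustrative function names of our own; RSE coincides with the relative error defined at the beginning of this section).

```python
import numpy as np

def rse(X, X_hat):
    """Relative error (RSE) in the Frobenius norm."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def psnr(X, X_hat):
    """Peak signal-to-noise ratio for 8-bit images."""
    mse = np.sum((X - X_hat) ** 2) / X.size    # num(X) = number of entries
    return 10.0 * np.log10(255.0 ** 2 / mse)
```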

Figure 5. The running time of the TRLRF and the R-TRLRF algorithms for the 'Giant' image and TR-rank 20.

Figure 6. PSNR comparison results of the TRLRF and the R-TRLRF algorithms.

Figure 7. The reconstructed images using the TRLRF algorithm and its randomized variant for TR-rank 20.

6. Conclusion

In this paper, we reviewed and extended a variety of randomized algorithms for the computation of the Tensor Ring (TR) decomposition, which to some extent can also be applied to the TT decomposition by setting R0 = 1. We discussed both adaptive and non-adaptive randomized algorithms for this task. Our main focus was on the random projection technique, which we used to speed up the decomposition of a tensor in the TT/TR format. Simulations were performed on synthetic and real datasets to support the presentation.

Acknowledgment

The authors would like to thank two anonymous reviewers for their constructive comments which have greatly improved the quality of the paper. This research was also partially supported by the Ministry of Education and Science of the Russian Federation (grant 14.756.31.0001).

Data Availability

Data sharing is not applicable to this article as no new data were created or analysed in this study. The analysed data are available online.

The data of images are available at https://github.com/wangwenqi1990/TensorRingCompletion and https://drive.google.com/file/d/15xk67wYZ9GI2Kn93g_aaA4CsmgnqGlHE/view.

The synthetic data were generated using functions of the TR toolbox https://github.com/oscarmickelin/tensor-ring-decomposition.
