Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition

P Drineas, R Kannan, MW Mahoney - SIAM Journal on Computing, 2006 - SIAM
In many applications, the data consist of (or may be naturally formulated as) an $m \times n$ matrix $A$ which may be stored on disk but which is too large to be read into random access memory (RAM) or to practically perform superlinear polynomial time computations on it. Two algorithms are presented which, when given an $m \times n$ matrix $A$, compute approximations to $A$ which are the product of three smaller matrices, $C$, $U$, and $R$, each of which may be computed rapidly. Let $A' = CUR$ be the computed approximate decomposition; both algorithms have provable bounds for the error matrix $A - A'$. In the first algorithm, $c$ columns of $A$ and $r$ rows of $A$ are randomly chosen. If the matrix $C$ consists of those $c$ columns of $A$ (after appropriate rescaling) and the matrix $R$ consists of those $r$ rows of $A$ (also after appropriate rescaling), then the matrix $U$ may be calculated from $C$ and $R$. For any matrix $X$, let $\|X\|_F$ and $\|X\|_2$ denote its Frobenius norm and its spectral norm, respectively. It is proven that $$ \left\|A-A'\right\|_\xi \le \min_{D:\mathrm{rank}(D)\le k} \left\|A-D\right\|_\xi + \mathrm{poly}(k,1/c) \left\|A\right\|_F $$ holds in expectation and with high probability for both $\xi = 2, F$ and for all $k = 1, \ldots, \mathrm{rank}(A)$; thus, by appropriate choice of $k$, $$ \left\|A-A'\right\|_2 \le \epsilon \left\|A\right\|_F $$ also holds in expectation and with high probability. This algorithm may be implemented without storing the matrix $A$ in RAM, provided it can make two passes over the matrix stored in external memory and use $O(m+n)$ additional RAM (assuming that $c$ and $r$ are constants, independent of the size of the input). The second algorithm is similar except that it approximates the matrix $C$ by randomly sampling a constant number of rows of $C$. Thus, it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM. To achieve an additional error (beyond the best rank-$k$ approximation) that is at most $\epsilon \|A\|_F$, both algorithms take time which is a low-degree polynomial in $k$, $1/\epsilon$, and $1/\delta$, where $\delta > 0$ is a failure probability; the first takes time linear in $\max(m,n)$ and the second takes time independent of $m$ and $n$. The proofs for the error bounds make important use of matrix perturbation theory and previous work on approximating matrix multiplication and computing low-rank approximations to a matrix. The probability distribution over columns and rows and the rescaling are crucial features of the algorithms and must be chosen judiciously.
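The column-and-row sampling idea behind the first algorithm can be illustrated with a minimal in-memory sketch. Several choices here are assumptions that the abstract does not spell out: sampling with replacement with probability proportional to squared Euclidean norms, the rescaling by 1/sqrt(s * p), and the particular way U is formed from C and the sampled rows (a rank-k pseudo-inverse of C^T C applied to the rescaled sampled rows of C). The names `sample_and_rescale` and `cur_sketch` are hypothetical, and the sketch holds A in RAM rather than making two passes over external memory.

```python
import numpy as np

def sample_and_rescale(M, s, rng):
    """Pick s row indices of M with probability proportional to the
    squared row norm (an assumed distribution), and return the indices
    together with the rescaling factors 1/sqrt(s * p_i)."""
    p = np.sum(M * M, axis=1)
    p = p / p.sum()                      # assumes M is not all zeros
    idx = rng.choice(M.shape[0], size=s, replace=True, p=p)
    scale = 1.0 / np.sqrt(s * p[idx])    # p[idx] > 0 for sampled rows
    return idx, scale

def cur_sketch(A, c, r, k, seed=0):
    """Return (C, U, R) with C: m x c, U: c x r, R: r x n."""
    rng = np.random.default_rng(seed)

    # C: c rescaled columns of A (sample rows of A^T).
    col_idx, col_scale = sample_and_rescale(A.T, c, rng)
    C = A[:, col_idx] * col_scale

    # R: r rescaled rows of A.
    row_idx, row_scale = sample_and_rescale(A, r, rng)
    R = A[row_idx, :] * row_scale[:, None]

    # Psi: the same sampled rows, with the same rescaling, taken from C.
    Psi = C[row_idx, :] * row_scale[:, None]          # r x c

    # Phi: rank-k pseudo-inverse of C^T C via its eigendecomposition.
    w, V = np.linalg.eigh(C.T @ C)                    # ascending order
    top = np.argsort(w)[::-1][:k]                     # k largest
    keep = top[w[top] > 1e-12 * w.max()]              # drop numerical zeros
    Phi = (V[:, keep] / w[keep]) @ V[:, keep].T       # c x c

    U = Phi @ Psi.T                                   # c x r
    return C, U, R
```

Calling `C, U, R = cur_sketch(A, c, r, k)` and comparing `np.linalg.norm(A - C @ U @ R)` with `np.linalg.norm(A)` gives a rough feel for how the error behaves as `c` and `r` grow; the numbers obtained this way are illustrative only and are not the bounds proven in the paper.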
Society for Industrial and Applied Mathematics