Search | arXiv e-print repository

VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization

Authors: Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen

Abstract: Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in multi-class anomaly detection, wherein the normal data is compounded with multiple classes without providing class labels. Through the integration of… ▽ More Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in multi-class anomaly detection, wherein the normal data is compounded with multiple classes without providing class labels. Through the integration of vector quantization (VQ), we empower the flow models to distinguish different concepts of multi-class normal data in an unsupervised manner, resulting in a novel flow-based unified method, named VQ-Flow. Specifically, our VQ-Flow leverages hierarchical vector quantization to estimate two relative codebooks: a Conceptual Prototype Codebook (CPC) for concept distinction and its concomitant Concept-Specific Pattern Codebook (CSPC) to capture concept-specific normal patterns. The flow models in VQ-Flow are conditioned on the concept-specific patterns captured in CSPC, capable of modeling specific normal patterns associated with different concepts. Moreover, CPC further enables our VQ-Flow for concept-aware distribution modeling, faithfully mimicking the intricate multi-class normal distribution through a mixed Gaussian distribution reparametrized on the conceptual prototypes. Through the introduction of vector quantization, the proposed VQ-Flow advances the state-of-the-art in multi-class anomaly detection within a unified training scheme, yielding the Det./Loc. AUROC of 99.5%/98.3% on MVTec AD. The codebase is publicly available at https://github.com/cool-xuan/vqflow. △ Less

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2406.00655 [pdf, other]

Generalized Exponentiated Gradient Algorithms and Their Application to On-Line Portfolio Selection

Authors: Andrzej Cichocki, Sergio Cruces, Auxiliadora Sarmiento, Toshihisa Tanaka

Abstract: This paper introduces a novel family of generalized exponentiated gradient (EG) updates derived from an Alpha-Beta divergence regularization function. Collectively referred to as EGAB, the proposed updates belong to the category of multiplicative gradient algorithms for positive data and demonstrate considerable flexibility by controlling iteration behavior and performance through three hyperparam… ▽ More This paper introduces a novel family of generalized exponentiated gradient (EG) updates derived from an Alpha-Beta divergence regularization function. Collectively referred to as EGAB, the proposed updates belong to the category of multiplicative gradient algorithms for positive data and demonstrate considerable flexibility by controlling iteration behavior and performance through three hyperparameters: $α$, $β$, and the learning rate $η$. To enforce a unit $l_1$ norm constraint for nonnegative weight vectors within generalized EGAB algorithms, we develop two slightly distinct approaches. One method exploits scale-invariant loss functions, while the other relies on gradient projections onto the feasible domain. As an illustration of their applicability, we evaluate the proposed updates in addressing the online portfolio selection problem (OLPS) using gradient-based methods. Here, they not only offer a unified perspective on the search directions of various OLPS algorithms (including the standard exponentiated gradient and diverse mean-reversion strategies), but also facilitate smooth interpolation and extension of these updates due to the flexibility in hyperparameter selection. Simulation results confirm that the adaptability of these generalized gradient updates can effectively enhance the performance for some portfolios, particularly in scenarios involving transaction costs. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2308.04595 [pdf, other]

Quantization Aware Factorization for Deep Neural Network Compression

Authors: Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak

Abstract: Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a d… ▽ More Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds tensor approximation directly with quantized factors and thus benefit from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with a devised algorithm and evaluate it's prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achiving a desirable quality-performance tradeoff. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2306.11750 [pdf, other]

Optical Coherence Tomography Image Enhancement via Block Hankelization and Low Rank Tensor Network Approximation

Authors: Farnaz Sedighin, Andrzej Cichocki, Hossein Rabbani

Abstract: In this paper, the problem of image super-resolution for Optical Coherence Tomography (OCT) has been addressed. Due to the motion artifacts, OCT imaging is usually done with a low sampling rate and the resulting images are often noisy and have low resolution. Therefore, reconstruction of high resolution OCT images from the low resolution versions is an essential step for better OCT based diagnosis… ▽ More In this paper, the problem of image super-resolution for Optical Coherence Tomography (OCT) has been addressed. Due to the motion artifacts, OCT imaging is usually done with a low sampling rate and the resulting images are often noisy and have low resolution. Therefore, reconstruction of high resolution OCT images from the low resolution versions is an essential step for better OCT based diagnosis. In this paper, we propose a novel OCT super-resolution technique using Tensor Ring decomposition in the embedded space. A new tensorization method based on a block Hankelization approach with overlapped patches, called overlapped patch Hankelization, has been proposed which allows us to employ Tensor Ring decomposition. The Hankelization method enables us to better exploit the inter connection of pixels and consequently achieve better super-resolution of images. The low resolution image was first patch Hankelized and then its Tensor Ring decomposition with rank incremental has been computed. Simulation results confirm that the proposed approach is effective in OCT super-resolution. △ Less

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.09822 [pdf, other]

Lightweight Attribute Localizing Models for Pedestrian Attribute Recognition

Authors: Ashish Jha, Dimitrii Ermilov, Konstantin Sobolev, Anh Huy Phan, Salman Ahmadi-Asl, Naveed Ahmed, Imran Junejo, Zaher AL Aghbari, Thar Baker, Ahmed Mohamed Khedr, Andrzej Cichocki

Abstract: Pedestrian Attribute Recognition (PAR) deals with the problem of identifying features in a pedestrian image. It has found interesting applications in person retrieval, suspect re-identification and soft biometrics. In the past few years, several Deep Neural Networks (DNNs) have been designed to solve the task; however, the developed DNNs predominantly suffer from over-parameterization and high com… ▽ More Pedestrian Attribute Recognition (PAR) deals with the problem of identifying features in a pedestrian image. It has found interesting applications in person retrieval, suspect re-identification and soft biometrics. In the past few years, several Deep Neural Networks (DNNs) have been designed to solve the task; however, the developed DNNs predominantly suffer from over-parameterization and high computational complexity. These problems hinder them from being exploited in resource-constrained embedded devices with limited memory and computational capacity. By reducing a network's layers using effective compression techniques, such as tensor decomposition, neural network compression is an effective method to tackle these problems. We propose novel Lightweight Attribute Localizing Models (LWALM) for Pedestrian Attribute Recognition (PAR). LWALM is a compressed neural network obtained after effective layer-wise compression of the Attribute Localization Model (ALM) using the Canonical Polyadic Decomposition with Error Preserving Correction (CPD-EPC) algorithm. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.18798 [pdf, other]

AnoOnly: Semi-Supervised Anomaly Detection with the Only Loss on Anomalies

Authors: Yixuan Zhou, Peiyu Yang, Yi Qu, Xing Xu, Zhe Sun, Andrzej Cichocki

Abstract: Semi-supervised anomaly detection (SSAD) methods have demonstrated their effectiveness in enhancing unsupervised anomaly detection (UAD) by leveraging few-shot but instructive abnormal instances. However, the dominance of homogeneous normal data over anomalies biases the SSAD models against effectively perceiving anomalies. To address this issue and achieve balanced supervision between heavily imb… ▽ More Semi-supervised anomaly detection (SSAD) methods have demonstrated their effectiveness in enhancing unsupervised anomaly detection (UAD) by leveraging few-shot but instructive abnormal instances. However, the dominance of homogeneous normal data over anomalies biases the SSAD models against effectively perceiving anomalies. To address this issue and achieve balanced supervision between heavily imbalanced normal and abnormal data, we develop a novel framework called AnoOnly (Anomaly Only). Unlike existing SSAD methods that resort to strict loss supervision, AnoOnly suspends it and introduces a form of weak supervision for normal data. This weak supervision is instantiated through the utilization of batch normalization, which implicitly performs cluster learning on normal data. When integrated into existing SSAD methods, the proposed AnoOnly demonstrates remarkable performance enhancements across various models and datasets, achieving new state-of-the-art performance. Additionally, our AnoOnly is natively robust to label noise when suffering from data contamination. Our code is publicly available at https://github.com/cool-xuan/AnoOnly. △ Less

Submitted 6 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.09564 [pdf, other]

Image Reconstruction using Superpixel Clustering and Tensor Completion

Authors: Maame G. Asante-Mensah, Anh Huy Phan, Salman Ahmadi-Asl, Zaher Al Aghbari, Andrzej Cichocki

Abstract: This paper presents a pixel selection method for compact image representation based on superpixel segmentation and tensor completion. Our method divides the image into several regions that capture important textures or semantics and selects a representative pixel from each region to store. We experiment with different criteria for choosing the representative pixel and find that the centroid pixel… ▽ More This paper presents a pixel selection method for compact image representation based on superpixel segmentation and tensor completion. Our method divides the image into several regions that capture important textures or semantics and selects a representative pixel from each region to store. We experiment with different criteria for choosing the representative pixel and find that the centroid pixel performs the best. We also propose two smooth tensor completion algorithms that can effectively reconstruct different types of images from the selected pixels. Our experiments show that our superpixel-based method achieves better results than uniform sampling for various missing ratios. △ Less

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2302.09019 [pdf, other]

Tensor Networks Meet Neural Networks: A Survey and Future Perspectives

Authors: Maolin Wang, Yu Pan, Zenglin Xu, Xiangli Yang, Guangxi Li, Andrzej Cichocki

Abstract: Tensor networks (TNs) and neural networks (NNs) are two fundamental data modeling approaches. TNs were introduced to solve the curse of dimensionality in large-scale tensors by converting an exponential number of dimensions to polynomial complexity. As a result, they have attracted significant attention in the fields of quantum physics and machine learning. Meanwhile, NNs have displayed exceptiona… ▽ More Tensor networks (TNs) and neural networks (NNs) are two fundamental data modeling approaches. TNs were introduced to solve the curse of dimensionality in large-scale tensors by converting an exponential number of dimensions to polynomial complexity. As a result, they have attracted significant attention in the fields of quantum physics and machine learning. Meanwhile, NNs have displayed exceptional performance in various applications, e.g., computer vision, natural language processing, and robotics research. Interestingly, although these two types of networks originate from different observations, they are inherently linked through the common multilinearity structure underlying both TNs and NNs, thereby motivating a significant number of intellectual developments regarding combinations of TNs and NNs. In this paper, we refer to these combinations as tensorial neural networks (TNNs), and present an introduction to TNNs in three primary aspects: network compression, information fusion, and quantum circuit simulation. Furthermore, this survey also explores methods for improving TNNs, examines flexible toolboxes for implementing TNNs, and documents TNN development while highlighting potential future directions. To the best of our knowledge, this is the first comprehensive survey that bridges the connections among NNs, TNs, and quantum circuits. We provide a curated list of TNNs at \url{https://github.com/tnbar/awesome-tensorial-neural-networks}. △ Less

Submitted 8 May, 2023; v1 submitted 22 January, 2023; originally announced February 2023.

arXiv:2205.00293 [pdf, other]

TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning

Authors: Konstantin Sozykin, Andrei Chertkov, Roman Schutski, Anh-Huy Phan, Andrzej Cichocki, Ivan Oseledets

Abstract: We present a novel procedure for optimization based on the combination of efficient quantized tensor train representation and a generalized maximum matrix volume principle. We demonstrate the applicability of the new Tensor Train Optimizer (TTOpt) method for various tasks, ranging from minimization of multidimensional functions to reinforcement learning. Our algorithm compares favorably to popular… ▽ More We present a novel procedure for optimization based on the combination of efficient quantized tensor train representation and a generalized maximum matrix volume principle. We demonstrate the applicability of the new Tensor Train Optimizer (TTOpt) method for various tasks, ranging from minimization of multidimensional functions to reinforcement learning. Our algorithm compares favorably to popular evolutionary-based methods and outperforms them by the number of function evaluations or execution time, often by a significant margin. △ Less

Submitted 28 September, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

Comments: 26 pages, 8 figures, accepted to Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022). Pre camera-ready version

arXiv:2203.02617 [pdf, other]

How to Train Unstable Looped Tensor Network

Authors: Anh-Huy Phan, Konstantin Sobolev, Dmitry Ermilov, Igor Vorona, Nikolay Kozyrskiy, Petr Tichavsky, Andrzej Cichocki

Abstract: A rising problem in the compression of Deep Neural Networks is how to reduce the number of parameters in convolutional kernels and the complexity of these layers by low-rank tensor approximation. Canonical polyadic tensor decomposition (CPD) and Tucker tensor decomposition (TKD) are two solutions to this problem and provide promising results. However, CPD often fails due to degeneracy, making the… ▽ More A rising problem in the compression of Deep Neural Networks is how to reduce the number of parameters in convolutional kernels and the complexity of these layers by low-rank tensor approximation. Canonical polyadic tensor decomposition (CPD) and Tucker tensor decomposition (TKD) are two solutions to this problem and provide promising results. However, CPD often fails due to degeneracy, making the networks unstable and hard to fine-tune. TKD does not provide much compression if the core tensor is big. This motivates using a hybrid model of CPD and TKD, a decomposition with multiple Tucker models with small core tensor, known as block term decomposition (BTD). This paper proposes a more compact model that further compresses the BTD by enforcing core tensors in BTD identical. We establish a link between the BTD with shared parameters and a looped chain tensor network (TC). Unfortunately, such strongly constrained tensor networks (with loop) encounter severe numerical instability, as proved by y (Landsberg, 2012) and (Handschuh, 2015a). We study perturbation of chain tensor networks, provide interpretation of instability in TC, demonstrate the problem. We propose novel methods to gain the stability of the decomposition results, keep the network robust and attain better approximation. Experimental results will confirm the superiority of the proposed methods in compression of well-known CNNs, and TC decomposition under challenging scenarios △ Less

Submitted 4 March, 2022; originally announced March 2022.

MSC Class: 65K05; 49M27

arXiv:2108.09422 [pdf, other]

L3C-Stereo: Lossless Compression for Stereo Images

Authors: Zihao Huang, Zhe Sun, Feng Duan, Andrzej Cichocki, Peiying Ruan, Chao Li

Abstract: A large number of autonomous driving tasks need high-definition stereo images, which requires a large amount of storage space. Efficiently executing lossless compression has become a practical problem. Commonly, it is hard to make accurate probability estimates for each pixel. To tackle this, we propose L3C-Stereo, a multi-scale lossless compression model consisting of two main modules: the warpin… ▽ More A large number of autonomous driving tasks need high-definition stereo images, which requires a large amount of storage space. Efficiently executing lossless compression has become a practical problem. Commonly, it is hard to make accurate probability estimates for each pixel. To tackle this, we propose L3C-Stereo, a multi-scale lossless compression model consisting of two main modules: the warping module and the probability estimation module. The warping module takes advantage of two view feature maps from the same domain to generate a disparity map, which is used to reconstruct the right view so as to improve the confidence of the probability estimate of the right view. The probability estimation module provides pixel-wise logistic mixture distributions for adaptive arithmetic coding. In the experiments, our method outperforms the hand-crafted compression methods and the learning-based method on all three datasets used. Then, we show that a better maximum disparity can lead to a better compression effect. Furthermore, thanks to a compression property of our model, it naturally generates a disparity map of an acceptable quality for the subsequent stereo tasks. △ Less

Submitted 20 August, 2021; originally announced August 2021.

arXiv:2103.08561 [pdf, other]

Meta-Solver for Neural Ordinary Differential Equations

Authors: Julia Gusak, Alexandr Katrutsa, Talgat Daulbaev, Andrzej Cichocki, Ivan Oseledets

Abstract: A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization method and its properties, which may not be optimal for the selected application and yield the overfitting to the given solver. In our paper, we inve… ▽ More A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization method and its properties, which may not be optimal for the selected application and yield the overfitting to the given solver. In our paper, we investigate how the variability in solvers' space can improve neural ODEs performance. We consider a family of Runge-Kutta methods that are parameterized by no more than two scalar variables. Based on the solvers' properties, we propose an approach to decrease neural ODEs overfitting to the pre-defined solver, along with a criterion to evaluate such behaviour. Moreover, we show that the right choice of solver parameterization can significantly affect neural ODEs models in terms of robustness to adversarial attacks. Recently it was shown that neural ODEs demonstrate superiority over conventional CNNs in terms of robustness. Our work demonstrates that the model robustness can be further improved by optimizing solver choice for a given task. The source code to reproduce our experiments is available at https://github.com/juliagusak/neural-ode-metasolver. △ Less

Submitted 15 March, 2021; originally announced March 2021.

arXiv:2101.09184 [pdf, other]

Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data

Authors: M. Nazareth da Costa, R. Attux, A. Cichocki, J. M. T. Romano

Abstract: In this work, we firstly apply the Train-Tensor (TT) networks to construct a compact representation of the classical Multilayer Perceptron, representing a reduction of up to 95% of the coefficients. A comparative analysis between tensor model and standard multilayer neural networks is also carried out in the context of prediction of the Mackey-Glass noisy chaotic time series and NASDAQ index. We s… ▽ More In this work, we firstly apply the Train-Tensor (TT) networks to construct a compact representation of the classical Multilayer Perceptron, representing a reduction of up to 95% of the coefficients. A comparative analysis between tensor model and standard multilayer neural networks is also carried out in the context of prediction of the Mackey-Glass noisy chaotic time series and NASDAQ index. We show that the weights of a multidimensional regression model can be learned by means of TT network and the optimization of TT weights is a more robust to the impact of coefficient initialization and hyper-parameter setting. Furthermore, an efficient algorithm based on alternating least squares has been proposed for approximating the weights in TT-format with a reduction of computational calculus, providing a much faster convergence than the well-known adaptive learning-method algorithms, widely applied for optimizing neural networks. △ Less

Submitted 30 March, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

Comments: 34 pages, 16 figures

arXiv:2012.06813 [pdf, other]

Improving EEG Decoding via Clustering-based Multi-task Feature Learning

Authors: Yu Zhang, Tao Zhou, Wei Wu, Hua Xie, Hongru Zhu, Guoxu Zhou, Andrzej Cichocki

Abstract: Accurate electroencephalogram (EEG) pattern decoding for specific mental tasks is one of the key steps for the development of brain-computer interface (BCI), which is quite challenging due to the considerably low signal-to-noise ratio of EEG collected at the brain scalp. Machine learning provides a promising technique to optimize EEG patterns toward better decoding accuracy. However, existing algo… ▽ More Accurate electroencephalogram (EEG) pattern decoding for specific mental tasks is one of the key steps for the development of brain-computer interface (BCI), which is quite challenging due to the considerably low signal-to-noise ratio of EEG collected at the brain scalp. Machine learning provides a promising technique to optimize EEG patterns toward better decoding accuracy. However, existing algorithms do not effectively explore the underlying data structure capturing the true EEG sample distribution, and hence can only yield a suboptimal decoding accuracy. To uncover the intrinsic distribution structure of EEG data, we propose a clustering-based multi-task feature learning algorithm for improved EEG pattern decoding. Specifically, we perform affinity propagation-based clustering to explore the subclasses (i.e., clusters) in each of the original classes, and then assign each subclass a unique label based on a one-versus-all encoding strategy. With the encoded label matrix, we devise a novel multi-task learning algorithm by exploiting the subclass relationship to jointly optimize the EEG pattern features from the uncovered subclasses. We then train a linear support vector machine with the optimized features for EEG pattern decoding. Extensive experimental studies are conducted on three EEG datasets to validate the effectiveness of our algorithm in comparison with other state-of-the-art approaches. The improved experimental results demonstrate the outstanding superiority of our algorithm, suggesting its prominent performance for EEG pattern decoding in BCI applications. △ Less

Submitted 12 December, 2020; originally announced December 2020.

arXiv:2011.11128 [pdf, other]

doi 10.1109/TCDS.2021.3079712

Deep Learning in EEG: Advance of the Last Ten-Year Critical Period

Authors: Shu Gong, Kaibo Xing, Andrzej Cichocki, Junhua Li

Abstract: Deep learning has achieved excellent performance in a wide range of domains, especially in speech recognition and computer vision. Relatively less work has been done for EEG, but there is still significant progress attained in the last decade. Due to the lack of a comprehensive and topic widely covered survey for deep learning in EEG, we attempt to summarize recent progress to provide an overview,… ▽ More Deep learning has achieved excellent performance in a wide range of domains, especially in speech recognition and computer vision. Relatively less work has been done for EEG, but there is still significant progress attained in the last decade. Due to the lack of a comprehensive and topic widely covered survey for deep learning in EEG, we attempt to summarize recent progress to provide an overview, as well as perspectives for future developments. We first briefly mention the artifacts removal for EEG signal and then introduce deep learning models that have been utilized in EEG processing and classification. Subsequently, the applications of deep learning in EEG are reviewed by categorizing them into groups such as brain-computer interface, disease detection, and emotion recognition. They are followed by the discussion, in which the pros and cons of deep learning are presented and future directions and challenges for deep learning in EEG are proposed. We hope that this paper could serve as a summary of past work for deep learning in EEG and the beginning of further developments and achievements of EEG studies based on deep learning. △ Less

Submitted 20 May, 2021; v1 submitted 22 November, 2020; originally announced November 2020.

Comments: Accepted for publication in the IEEE Transactions on Cognitive and Developmental Systems

Journal ref: IEEE Transactions on Cognitive and Developmental Systems, 2021

arXiv:2008.05441 [pdf, other]

Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network

Authors: Anh-Huy Phan, Konstantin Sobolev, Konstantin Sozykin, Dmitry Ermilov, Julia Gusak, Petr Tichavsky, Valeriy Glukhov, Ivan Oseledets, Andrzej Cichocki

Abstract: Most state of the art deep neural networks are overparameterized and exhibit a high computational cost. A straightforward approach to this problem is to replace convolutional kernels with its low-rank tensor approximations, whereas the Canonical Polyadic tensor Decomposition is one of the most suited models. However, fitting the convolutional tensors by numerical optimization algorithms often enco… ▽ More Most state of the art deep neural networks are overparameterized and exhibit a high computational cost. A straightforward approach to this problem is to replace convolutional kernels with its low-rank tensor approximations, whereas the Canonical Polyadic tensor Decomposition is one of the most suited models. However, fitting the convolutional tensors by numerical optimization algorithms often encounters diverging components, i.e., extremely large rank-one tensors but canceling each other. Such degeneracy often causes the non-interpretable result and numerical instability for the neural network fine-tuning. This paper is the first study on degeneracy in the tensor decomposition of convolutional kernels. We present a novel method, which can stabilize the low-rank approximation of convolutional kernels and ensure efficient compression while preserving the high-quality performance of the neural networks. We evaluate our approach on popular CNN architectures for image classification and show that our method results in much lower accuracy degradation and provides consistent performance. △ Less

Submitted 12 August, 2020; originally announced August 2020.

Comments: This paper is accepted to ECCV2020

arXiv:2008.04793 [pdf]

Future Trends for Human-AI Collaboration: A Comprehensive Taxonomy of AI/AGI Using Multiple Intelligences and Learning Styles

Authors: Andrzej Cichocki, Alexander P. Kuleshov

Abstract: This article discusses some trends and concepts in developing new generation of future Artificial General Intelligence (AGI) systems which relate to complex facets and different types of human intelligence, especially social, emotional, attentional and ethical intelligence. We describe various aspects of multiple human intelligences and learning styles, which may impact on a variety of AI problem… ▽ More This article discusses some trends and concepts in developing new generation of future Artificial General Intelligence (AGI) systems which relate to complex facets and different types of human intelligence, especially social, emotional, attentional and ethical intelligence. We describe various aspects of multiple human intelligences and learning styles, which may impact on a variety of AI problem domains. Using the concept of 'multiple intelligences' rather than a single type of intelligence, we categorize and provide working definitions of various AGI depending on their cognitive skills or capacities. Future AI systems will be able not only to communicate with human users and each other, but also to efficiently exchange knowledge and wisdom with abilities of cooperation, collaboration and even co-creating something new and valuable and have meta-learning capacities. Multi-agent systems such as these can be used to solve problems that would be difficult to solve by any individual intelligent agent. Key words: Artificial General Intelligence (AGI), multiple intelligences, learning styles, physical intelligence, emotional intelligence, social intelligence, attentional intelligence, moral-ethical intelligence, responsible decision making, creative-innovative intelligence, cognitive functions, meta-learning of AI systems. △ Less

Submitted 11 December, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

Comments: 19 Figures, 27 pages

Journal ref: Computational Intelligence and Neuroscience (2020)

arXiv:2004.09222 [pdf, other]

Towards Understanding Normalization in Neural ODEs

Authors: Julia Gusak, Larisa Markeeva, Talgat Daulbaev, Alexandr Katrutsa, Andrzej Cichocki, Ivan Oseledets

Abstract: Normalization is an important and vastly investigated technique in deep learning. However, its role for Ordinary Differential Equation based networks (neural ODEs) is still poorly understood. This paper investigates how different normalization techniques affect the performance of neural ODEs. Particularly, we show that it is possible to achieve 93% accuracy in the CIFAR-10 classification task, and… ▽ More Normalization is an important and vastly investigated technique in deep learning. However, its role for Ordinary Differential Equation based networks (neural ODEs) is still poorly understood. This paper investigates how different normalization techniques affect the performance of neural ODEs. Particularly, we show that it is possible to achieve 93% accuracy in the CIFAR-10 classification task, and to the best of our knowledge, this is the highest reported accuracy among neural ODEs tested on this problem. △ Less

Submitted 27 April, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

arXiv:2003.05271 [pdf, other]

Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

Authors: Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets

Abstract: We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as "adjoint method") to train neural ODEs on classification, density estimation, and inference approximation tasks. We also propose a theoretical justification of our approach using logarithmic norm formalism. As a… ▽ More We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as "adjoint method") to train neural ODEs on classification, density estimation, and inference approximation tasks. We also propose a theoretical justification of our approach using logarithmic norm formalism. As a result, our method allows faster model training than the reverse dynamic method that was confirmed and validated by extensive numerical experiments for several standard benchmarks. △ Less

Submitted 30 October, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

arXiv:2002.12135 [pdf, other]

Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting

Authors: Qiquan Shi, Jiaming Yin, Jiajun Cai, Andrzej Cichocki, Tatsuya Yokota, Lei Chen, Mingxuan Yuan, Jia Zeng

Abstract: This work proposes a novel approach for multiple time series forecasting. At first, multi-way delay embedding transform (MDT) is employed to represent time series as low-rank block Hankel tensors (BHT). Then, the higher-order tensors are projected to compressed core tensors by applying Tucker decomposition. At the same time, the generalized tensor Autoregressive Integrated Moving Average (ARIMA) i… ▽ More This work proposes a novel approach for multiple time series forecasting. At first, multi-way delay embedding transform (MDT) is employed to represent time series as low-rank block Hankel tensors (BHT). Then, the higher-order tensors are projected to compressed core tensors by applying Tucker decomposition. At the same time, the generalized tensor Autoregressive Integrated Moving Average (ARIMA) is explicitly used on consecutive core tensors to predict future samples. In this manner, the proposed approach tactically incorporates the unique advantages of MDT tensorization (to exploit mutual correlations) and tensor ARIMA coupled with low-rank Tucker decomposition into a unified framework. This framework exploits the low-rank structure of block Hankel tensors in the embedded space and captures the intrinsic correlations among multiple TS, which thus can improve the forecasting results, especially for multiple short time series. Experiments conducted on three public datasets and two industrial datasets verify that the proposed BHT-ARIMA effectively improves forecasting accuracy and reduces computational cost compared with the state-of-the-art methods. △ Less

Submitted 25 February, 2020; originally announced February 2020.

Comments: Accepted by AAAI 2020

arXiv:2002.11523 [pdf]

doi 10.1134/S1064226919120131

Using Reinforcement Learning in the Algorithmic Trading Problem

Authors: Evgeny Ponomarev, Ivan Oseledets, Andrzej Cichocki

Abstract: The development of reinforced learning methods has extended application to many areas including algorithmic trading. In this paper trading on the stock exchange is interpreted into a game with a Markov property consisting of states, actions, and rewards. A system for trading the fixed volume of a financial instrument is proposed and experimentally tested; this is based on the asynchronous advantag… ▽ More The development of reinforced learning methods has extended application to many areas including algorithmic trading. In this paper trading on the stock exchange is interpreted into a game with a Markov property consisting of states, actions, and rewards. A system for trading the fixed volume of a financial instrument is proposed and experimentally tested; this is based on the asynchronous advantage actor-critic method with the use of several neural network architectures. The application of recurrent layers in this approach is investigated. The experiments were performed on real anonymized data. The best architecture demonstrated a trading strategy for the RTS Index futures (MOEX:RTSI) with a profitability of 66% per annum accounting for commission. The project source code is available via the following link: http://github.com/evgps/a3c_trading. △ Less

Submitted 26 February, 2020; originally announced February 2020.

Journal ref: ISSN 1064-2269, Journal of Communications Technology and Electronics, 2019, Vol. 64, No. 12, pp. 1450-1457

arXiv:1910.06995 [pdf, other]

Reduced-Order Modeling of Deep Neural Networks

Authors: Julia Gusak, Talgat Daulbaev, Evgeny Ponomarev, Andrzej Cichocki, Ivan Oseledets

Abstract: We introduce a new method for speeding up the inference of deep neural networks. It is somewhat inspired by the reduced-order modeling techniques for dynamical systems.The cornerstone of the proposed method is the maximum volume algorithm. We demonstrate efficiency on neural networks pre-trained on different datasets. We show that in many practical cases it is possible to replace convolutional lay… ▽ More We introduce a new method for speeding up the inference of deep neural networks. It is somewhat inspired by the reduced-order modeling techniques for dynamical systems.The cornerstone of the proposed method is the maximum volume algorithm. We demonstrate efficiency on neural networks pre-trained on different datasets. We show that in many practical cases it is possible to replace convolutional layers with much smaller fully-connected layers with a relatively small drop in accuracy. △ Less

Submitted 25 November, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

arXiv:1908.02995 [pdf, other]

Manifold Modeling in Embedded Space: A Perspective for Interpreting Deep Image Prior

Authors: Tatsuya Yokota, Hidekata Hontani, Qibin Zhao, Andrzej Cichocki

Abstract: Deep image prior (DIP), which utilizes a deep convolutional network (ConvNet) structure itself as an image prior, has attracted attentions in computer vision and machine learning communities. It empirically shows the effectiveness of ConvNet structure for various image restoration applications. However, why the DIP works so well is still unknown, and why convolution operation is useful for image r… ▽ More Deep image prior (DIP), which utilizes a deep convolutional network (ConvNet) structure itself as an image prior, has attracted attentions in computer vision and machine learning communities. It empirically shows the effectiveness of ConvNet structure for various image restoration applications. However, why the DIP works so well is still unknown, and why convolution operation is useful for image reconstruction or enhancement is not very clear. In this study, we tackle these questions. The proposed approach is dividing the convolution into ``delay-embedding'' and ``transformation (\ie encoder-decoder)'', and proposing a simple, but essential, image/tensor modeling method which is closely related to dynamical systems and self-similarity. The proposed method named as manifold modeling in embedded space (MMES) is implemented by using a novel denoising-auto-encoder in combination with multi-way delay-embedding transform. In spite of its simplicity, the image/tensor completion, super-resolution, deconvolution, and denoising results of MMES are quite similar even competitive to DIP in our extensive experiments, and these results would help us for reinterpreting/characterizing the DIP from a perspective of ``low-dimensional patch-manifold prior''. △ Less

Submitted 21 January, 2020; v1 submitted 8 August, 2019; originally announced August 2019.

arXiv:1907.12827 [pdf]

doi 10.1109/TCYB.2020.3035282

Multi-Kernel Capsule Network for Schizophrenia Identification

Authors: Tian Wang, Anastasios Bezerianos, Andrzej Cichocki, Junhua Li

Abstract: Objective: Schizophrenia seriously affects the quality of life. To date, both simple (linear discriminant analysis) and complex (deep neural network) machine learning methods have been utilized to identify schizophrenia based on functional connectivity features. The existing simple methods need two separate steps (i.e., feature extraction and classification) to achieve the identification, which di… ▽ More Objective: Schizophrenia seriously affects the quality of life. To date, both simple (linear discriminant analysis) and complex (deep neural network) machine learning methods have been utilized to identify schizophrenia based on functional connectivity features. The existing simple methods need two separate steps (i.e., feature extraction and classification) to achieve the identification, which disables simultaneous tuning for the best feature extraction and classifier training. The complex methods integrate two steps and can be simultaneously tuned to achieve optimal performance, but these methods require a much larger amount of data for model training. Methods: To overcome the aforementioned drawbacks, we proposed a multi-kernel capsule network (MKCapsnet), which was developed by considering the brain anatomical structure. Kernels were set to match with partition sizes of brain anatomical structure in order to capture interregional connectivities at the varying scales. With the inspiration of widely-used dropout strategy in deep learning, we developed vector dropout in the capsule layer to prevent overfitting of the model. Results: The comparison results showed that the proposed method outperformed the state-of-the-art methods. Besides, we compared performances using different parameters and illustrated the routing process to reveal characteristics of the proposed method. Conclusion: MKCapsnet is promising for schizophrenia identification. Significance: Our study not only proposed a multi-kernel capsule network but also provided useful information in the parameter setting, which is informative for further studies using a capsule network for neurophysiological signal classification. △ Less

Submitted 30 July, 2019; originally announced July 2019.

Comments: IEEE Transactions on Cybernetics (2020)

arXiv:1903.09973 [pdf, other]

MUSCO: Multi-Stage Compression of neural networks

Authors: Julia Gusak, Maksym Kholiavchenko, Evgeny Ponomarev, Larisa Markeeva, Ivan Oseledets, Andrzej Cichocki

Abstract: The low-rank tensor approximation is very promising for the compression of deep neural networks. We propose a new simple and efficient iterative approach, which alternates low-rank factorization with a smart rank selection and fine-tuning. We demonstrate the efficiency of our method comparing to non-iterative ones. Our approach improves the compression rate while maintaining the accuracy for a var… ▽ More The low-rank tensor approximation is very promising for the compression of deep neural networks. We propose a new simple and efficient iterative approach, which alternates low-rank factorization with a smart rank selection and fine-tuning. We demonstrate the efficiency of our method comparing to non-iterative ones. Our approach improves the compression rate while maintaining the accuracy for a variety of tasks. △ Less

Submitted 15 November, 2019; v1 submitted 24 March, 2019; originally announced March 2019.

arXiv:1803.07226 [pdf, other]

Learning the Hierarchical Parts of Objects by Deep Non-Smooth Nonnegative Matrix Factorization

Authors: Jinshi Yu, Guoxu Zhou, Andrzej Cichocki, Shengli Xie

Abstract: Nonsmooth Nonnegative Matrix Factorization (nsNMF) is capable of producing more localized, less overlapped feature representations than other variants of NMF while keeping satisfactory fit to data. However, nsNMF as well as other existing NMF methods is incompetent to learn hierarchical features of complex data due to its shallow structure. To fill this gap, we propose a deep nsNMF method coined b… ▽ More Nonsmooth Nonnegative Matrix Factorization (nsNMF) is capable of producing more localized, less overlapped feature representations than other variants of NMF while keeping satisfactory fit to data. However, nsNMF as well as other existing NMF methods is incompetent to learn hierarchical features of complex data due to its shallow structure. To fill this gap, we propose a deep nsNMF method coined by the fact that it possesses a deeper architecture compared with standard nsNMF. The deep nsNMF not only gives parts-based features due to the nonnegativity constraints, but also creates higher-level, more abstract features by combing lower-level ones. The in-depth description of how deep architecture can help to efficiently discover abstract features in dnsNMF is presented. And we also show that the deep nsNMF has close relationship with the deep autoencoder, suggesting that the proposed model inherits the major advantages from both deep learning and NMF. Extensive experiments demonstrate the standout performance of the proposed method in clustering analysis. △ Less

Submitted 19 March, 2018; originally announced March 2018.

arXiv:1708.09165 [pdf, other]

doi 10.1561/2200000067

Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

Authors: A. Cichocki, A-H. Phan, Q. Zhao, N. Lee, I. V. Oseledets, M. Sugiyama, D. Mandic

Abstract: Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical… ▽ More Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. △ Less

Submitted 30 August, 2017; originally announced August 2017.

Comments: 232 pages

Journal ref: Foundations and Trends in Machine Learning: Vol. 9: No. 6, pp 431-673, 2017

arXiv:1609.09578 [pdf]

Optimized motor imagery paradigm based on imagining Chinese characters writing movement

Authors: Zhaoyang Qiu, Brendan Z. Allison, Jing Jin, Yu Zhang, Xingyu Wang, Wei Li, Andrzej Cichocki

Abstract: Motor imagery (MI) is a mental representation of motor behavior that has been widely used as a control method for a brain-computer interface (BCI), allowing communication for the physically impaired. The performance of MI based BCI mainly depends on the subject's ability to self-modulate EEG signals. Proper training can help naive subjects learn to modulate brain activity proficiently. However, tr… ▽ More Motor imagery (MI) is a mental representation of motor behavior that has been widely used as a control method for a brain-computer interface (BCI), allowing communication for the physically impaired. The performance of MI based BCI mainly depends on the subject's ability to self-modulate EEG signals. Proper training can help naive subjects learn to modulate brain activity proficiently. However, training subjects typically involves abstract motor tasks and is time-consuming. To improve the performance of naive subjects during motor imagery, a novel paradigm was presented that would guide naive subjects to modulate brain activity effectively. In this new paradigm, pictures of the left or right hand were used as cues for subjects to finish the motor imagery task. Fourteen healthy subjects (11 male, aged 22-25 years, mean 23.6+/-1.16) participated in this study. The task was to imagine writing a Chinese character. Specifically, subjects could imagine hand movements following the sequence of writing strokes in the Chinese character. This paradigm was meant to find an effective and familiar action for most Chinese people, to provide them with a specific, extensively practiced task and help them modulate brain activity. Results showed that the writing task paradigm yielded significantly better performance than the traditional arrow paradigm (p<0.001). Questionnaire replies indicated that most subjects thought the new paradigm was easier and more comfortable. The proposed new motor imagery paradigm could guide subjects to help them modulate brain activity effectively. Results showed that there were significant improvements using new paradigm, both in classification accuracy and usability. △ Less

Submitted 29 September, 2016; originally announced September 2016.

arXiv:1606.05535 [pdf, other]

Tensor Ring Decomposition

Authors: Qibin Zhao, Guoxu Zhou, Shengli Xie, Liqing Zhang, Andrzej Cichocki

Abstract: Tensor networks have in recent years emerged as the powerful tools for solving the large-scale optimization problems. One of the most popular tensor network is tensor train (TT) decomposition that acts as the building blocks for the complicated tensor networks. However, the TT decomposition highly depends on permutations of tensor dimensions, due to its strictly sequential multilinear products ove… ▽ More Tensor networks have in recent years emerged as the powerful tools for solving the large-scale optimization problems. One of the most popular tensor network is tensor train (TT) decomposition that acts as the building blocks for the complicated tensor networks. However, the TT decomposition highly depends on permutations of tensor dimensions, due to its strictly sequential multilinear products over latent cores, which leads to difficulties in finding the optimal TT representation. In this paper, we introduce a fundamental tensor decomposition model to represent a large dimensional tensor by a circular multilinear products over a sequence of low dimensional cores, which can be graphically interpreted as a cyclic interconnection of 3rd-order tensors, and thus termed as tensor ring (TR) decomposition. The key advantage of TR model is the circular dimensional permutation invariance which is gained by employing the trace operation and treating the latent cores equivalently. TR model can be viewed as a linear combination of TT decompositions, thus obtaining the powerful and generalized representation abilities. For optimization of latent cores, we present four different algorithms based on the sequential SVDs, ALS scheme, and block-wise ALS techniques. Furthermore, the mathematical properties of TR model are investigated, which shows that the basic multilinear algebra can be performed efficiently by using TR representaions and the classical tensor decompositions can be conveniently transformed into the TR representation. Finally, the experiments on both synthetic signals and real-world datasets were conducted to evaluate the performance of different algorithms. △ Less

Submitted 17 June, 2016; originally announced June 2016.

arXiv:1508.07416 [pdf, other]

doi 10.1109/JPROC.2015.2474704

Linked Component Analysis from Matrices to High Order Tensors: Applications to Biomedical Data

Authors: Guoxu Zhou, Qibin Zhao, Yu Zhang, Tülay Adalı, Shengli Xie, Andrzej Cichocki

Abstract: With the increasing availability of various sensor technologies, we now have access to large amounts of multi-block (also called multi-set, multi-relational, or multi-view) data that need to be jointly analyzed to explore their latent connections. Various component analysis methods have played an increasingly important role for the analysis of such coupled data. In this paper, we first provide a b… ▽ More With the increasing availability of various sensor technologies, we now have access to large amounts of multi-block (also called multi-set, multi-relational, or multi-view) data that need to be jointly analyzed to explore their latent connections. Various component analysis methods have played an increasingly important role for the analysis of such coupled data. In this paper, we first provide a brief review of existing matrix-based (two-way) component analysis methods for the joint analysis of such data with a focus on biomedical applications. Then, we discuss their important extensions and generalization to multi-block multiway (tensor) data. We show how constrained multi-block tensor decomposition methods are able to extract similar or statistically dependent common features that are shared by all blocks, by incorporating the multiway nature of data. Special emphasis is given to the flexible common and individual feature analysis of multi-block data with the aim to simultaneously extract common and individual latent components with desired properties and types of diversity. Illustrative examples are given to demonstrate their effectiveness for biomedical data analysis. △ Less

Submitted 29 August, 2015; originally announced August 2015.

Comments: 20 pages, 11 figures, Proceedings of the IEEE, 2015

arXiv:1505.06611 [pdf, other]

doi 10.1109/TSP.2016.2586759

Smooth PARAFAC Decomposition for Tensor Completion

Authors: Tatsuya Yokota, Qibin Zhao, Andrzej Cichocki

Abstract: In recent years, low-rank based tensor completion, which is a higher-order extension of matrix completion, has received considerable attention. However, the low-rank assumption is not sufficient for the recovery of visual data, such as color and 3D images, where the ratio of missing data is extremely high. In this paper, we consider "smoothness" constraints as well as low-rank approximations, and… ▽ More In recent years, low-rank based tensor completion, which is a higher-order extension of matrix completion, has received considerable attention. However, the low-rank assumption is not sufficient for the recovery of visual data, such as color and 3D images, where the ratio of missing data is extremely high. In this paper, we consider "smoothness" constraints as well as low-rank approximations, and propose an efficient algorithm for performing tensor completion that is particularly powerful regarding visual data. The proposed method admits significant advantages, owing to the integration of smooth PARAFAC decomposition for incomplete tensors and the efficient selection of models in order to minimize the tensor rank. Thus, our proposed method is termed as "smooth PARAFAC tensor completion (SPC)." In order to impose the smoothness constraints, we employ two strategies, total variation (SPC-TV) and quadratic variation (SPC-QV), and invoke the corresponding algorithms for model learning. Extensive experimental evaluations on both synthetic and real-world visual data illustrate the significant improvements of our method, in terms of both prediction performance and efficiency, compared with many state-of-the-art tensor completion methods. △ Less

Submitted 25 January, 2016; v1 submitted 25 May, 2015; originally announced May 2015.

Comments: 13 pages, 9 figures

arXiv:1505.02343 [pdf, other]

Bayesian Sparse Tucker Models for Dimension Reduction and Tensor Completion

Authors: Qibin Zhao, Liqing Zhang, Andrzej Cichocki

Abstract: Tucker decomposition is the cornerstone of modern machine learning on tensorial data analysis, which have attracted considerable attention for multiway feature extraction, compressive sensing, and tensor completion. The most challenging problem is related to determination of model complexity (i.e., multilinear rank), especially when noise and missing data are present. In addition, existing methods… ▽ More Tucker decomposition is the cornerstone of modern machine learning on tensorial data analysis, which have attracted considerable attention for multiway feature extraction, compressive sensing, and tensor completion. The most challenging problem is related to determination of model complexity (i.e., multilinear rank), especially when noise and missing data are present. In addition, existing methods cannot take into account uncertainty information of latent factors, resulting in low generalization performance. To address these issues, we present a class of probabilistic generative Tucker models for tensor decomposition and completion with structural sparsity over multilinear latent space. To exploit structural sparse modeling, we introduce two group sparsity inducing priors by hierarchial representation of Laplace and Student-t distributions, which facilitates fully posterior inference. For model learning, we derived variational Bayesian inferences over all model (hyper)parameters, and developed efficient and scalable algorithms based on multilinear operations. Our methods can automatically adapt model complexity and infer an optimal multilinear rank by the principle of maximum lower bound of model evidence. Experimental results and comparisons on synthetic, chemometrics and neuroimaging data demonstrate remarkable performance of our models for recovering ground-truth of multilinear rank and missing entries. △ Less

Submitted 10 May, 2015; originally announced May 2015.

arXiv:1503.01868 [pdf, other]

doi 10.1109/TIP.2016.2579262

Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements

Authors: Wenfei Cao, Yao Wang, Jian Sun, Deyu Meng, Can Yang, Andrzej Cichocki, Zongben Xu

Abstract: Background subtraction has been a fundamental and widely studied task in video analysis, with a wide range of applications in video surveillance, teleconferencing and 3D modeling. Recently, motivated by compressive imaging, background subtraction from compressive measurements (BSCM) is becoming an active research task in video surveillance. In this paper, we propose a novel tensor-based robust PCA… ▽ More Background subtraction has been a fundamental and widely studied task in video analysis, with a wide range of applications in video surveillance, teleconferencing and 3D modeling. Recently, motivated by compressive imaging, background subtraction from compressive measurements (BSCM) is becoming an active research task in video surveillance. In this paper, we propose a novel tensor-based robust PCA (TenRPCA) approach for BSCM by decomposing video frames into backgrounds with spatial-temporal correlations and foregrounds with spatio-temporal continuity in a tensor framework. In this approach, we use 3D total variation (TV) to enhance the spatio-temporal continuity of foregrounds, and Tucker decomposition to model the spatio-temporal correlations of video background. Based on this idea, we design a basic tensor RPCA model over the video frames, dubbed as the holistic TenRPCA model (H-TenRPCA). To characterize the correlations among the groups of similar 3D patches of video background, we further design a patch-group-based tensor RPCA model (PG-TenRPCA) by joint tensor Tucker decompositions of 3D patch groups for modeling the video background. Efficient algorithms using alternating direction method of multipliers (ADMM) are developed to solve the proposed models. Extensive experiments on simulated and real-world videos demonstrate the superiority of the proposed approaches over the existing state-of-the-art approaches. △ Less

Submitted 5 June, 2016; v1 submitted 6 March, 2015; originally announced March 2015.

Comments: To appear in IEEE TIP

arXiv:1412.7146 [pdf, other]

doi 10.3390/e17052988

Log-Determinant Divergences Revisited: Alpha--Beta and Gamma Log-Det Divergences

Authors: Andrzej Cichocki, Sergio Cruces, Shun-Ichi Amari

Abstract: In this paper, we review and extend a family of log-det divergences for symmetric positive definite (SPD) matrices and discuss their fundamental properties. We show how to generate from parameterized Alpha-Beta (AB) and Gamma Log-det divergences many well known divergences, for example, the Stein's loss, S-divergence, called also Jensen-Bregman LogDet (JBLD) divergence, the Logdet Zero (Bhattachar… ▽ More In this paper, we review and extend a family of log-det divergences for symmetric positive definite (SPD) matrices and discuss their fundamental properties. We show how to generate from parameterized Alpha-Beta (AB) and Gamma Log-det divergences many well known divergences, for example, the Stein's loss, S-divergence, called also Jensen-Bregman LogDet (JBLD) divergence, the Logdet Zero (Bhattacharryya) divergence, Affine Invariant Riemannian Metric (AIRM) as well as some new divergences. Moreover, we establish links and correspondences among many log-det divergences and display them on alpha-beta plain for various set of parameters. Furthermore, this paper bridges these divergences and shows also their links to divergences of multivariate and multiway Gaussian distributions. Closed form formulas are derived for gamma divergences of two multivariate Gaussian densities including as special cases the Kullback-Leibler, Bhattacharryya, Rényi and Cauchy-Schwartz divergences. Symmetrized versions of the log-det divergences are also discussed and reviewed. A class of divergences is extended to multiway divergences for separable covariance (precision) matrices. △ Less

Submitted 23 December, 2014; v1 submitted 18 December, 2014; originally announced December 2014.

Comments: 35 pages, 4 figures

arXiv:1412.1885 [pdf, other]

Decomposition of Big Tensors With Low Multilinear Rank

Authors: Guoxu Zhou, Andrzej Cichocki, Shengli Xie

Abstract: Tensor decompositions are promising tools for big data analytics as they bring multiple modes and aspects of data to a unified framework, which allows us to discover complex internal structures and correlations of data. Unfortunately most existing approaches are not designed to meet the major challenges posed by big data analytics. This paper attempts to improve the scalability of tensor decomposi… ▽ More Tensor decompositions are promising tools for big data analytics as they bring multiple modes and aspects of data to a unified framework, which allows us to discover complex internal structures and correlations of data. Unfortunately most existing approaches are not designed to meet the major challenges posed by big data analytics. This paper attempts to improve the scalability of tensor decompositions and provides two contributions: A flexible and fast algorithm for the CP decomposition (FFCP) of tensors based on their Tucker compression; A distributed randomized Tucker decomposition approach for arbitrarily big tensors but with relatively low multilinear rank. These two algorithms can deal with huge tensors, even if they are dense. Extensive simulations provide empirical evidence of the validity and efficiency of the proposed algorithms. △ Less

Submitted 28 December, 2014; v1 submitted 4 December, 2014; originally announced December 2014.

arXiv:1410.6313 [pdf, ps, other]

doi 10.1109/JBHI.2015.2491645

Canonical Polyadic Decomposition with Auxiliary Information for Brain Computer Interface

Authors: Junhua Li, Chao Li, Andrzej Cichocki

Abstract: Physiological signals are often organized in the form of multiple dimensions (e.g., channel, time, task, and 3D voxel), so it is better to preserve original organization structure when processing. Unlike vector-based methods that destroy data structure, Canonical Polyadic Decomposition (CPD) aims to process physiological signals in the form of multi-way array, which considers relationships between… ▽ More Physiological signals are often organized in the form of multiple dimensions (e.g., channel, time, task, and 3D voxel), so it is better to preserve original organization structure when processing. Unlike vector-based methods that destroy data structure, Canonical Polyadic Decomposition (CPD) aims to process physiological signals in the form of multi-way array, which considers relationships between dimensions and preserves structure information contained by the physiological signal. Nowadays, CPD is utilized as an unsupervised method for feature extraction in a classification problem. After that, a classifier, such as support vector machine, is required to classify those features. In this manner, classification task is achieved in two isolated steps. We proposed supervised Canonical Polyadic Decomposition by directly incorporating auxiliary label information during decomposition, by which a classification task can be achieved without an extra step of classifier training. The proposed method merges the decomposition and classifier learning together, so it reduces procedure of classification task compared with that of respective decomposition and classification. In order to evaluate the performance of the proposed method, three different kinds of signals, synthetic signal, EEG signal, and MEG signal, were used. The results based on evaluations of synthetic and real signals demonstrated that the proposed method is effective and efficient. △ Less

Submitted 5 November, 2015; v1 submitted 23 October, 2014; originally announced October 2014.

Journal ref: IEEE journal of biomedical and health informatics, 2015, 21(1): 263-271

arXiv:1410.2386 [pdf, other]

doi 10.1109/TNNLS.2015.2423694

Bayesian Robust Tensor Factorization for Incomplete Multiway Data

Authors: Qibin Zhao, Guoxu Zhou, Liqing Zhang, Andrzej Cichocki, Shun-ichi Amari

Abstract: We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing the robust predictive distribution over missing entries. The low-CP-rank tensor is mo… ▽ More We propose a generative model for robust tensor factorization in the presence of both missing data and outliers. The objective is to explicitly infer the underlying low-CP-rank tensor capturing the global information and a sparse tensor capturing the local information (also considered as outliers), thus providing the robust predictive distribution over missing entries. The low-CP-rank tensor is modeled by multilinear interactions between multiple latent factors on which the column sparsity is enforced by a hierarchical prior, while the sparse tensor is modeled by a hierarchical view of Student-$t$ distribution that associates an individual hyperparameter with each element independently. For model learning, we develop an efficient closed-form variational inference under a fully Bayesian treatment, which can effectively prevent the overfitting problem and scales linearly with data size. In contrast to existing related works, our method can perform model selection automatically and implicitly without need of tuning parameters. More specifically, it can discover the groundtruth of CP rank and automatically adapt the sparsity inducing priors to various types of outliers. In addition, the tradeoff between the low-rank approximation and the sparse representation can be optimized in the sense of maximum model evidence. The extensive experiments and comparisons with many state-of-the-art algorithms on both synthetic and real-world datasets demonstrate the superiorities of our method from several perspectives. △ Less

Submitted 16 April, 2015; v1 submitted 9 October, 2014; originally announced October 2014.

Comments: in IEEE Transactions on Neural Networks and Learning Systems, 2015

arXiv:1410.0818 [pdf, other]

doi 10.1016/j.neucom.2014.08.092

Feature Learning from Incomplete EEG with Denoising Autoencoder

Authors: Junhua Li, Zbigniew Struzik, Liqing Zhang, Andrzej Cichocki

Abstract: An alternative pathway for the human brain to communicate with the outside world is by means of a brain computer interface (BCI). A BCI can decode electroencephalogram (EEG) signals of brain activities, and then send a command or an intent to an external interactive device, such as a wheelchair. The effectiveness of the BCI depends on the performance in decoding the EEG. Usually, the EEG is contam… ▽ More An alternative pathway for the human brain to communicate with the outside world is by means of a brain computer interface (BCI). A BCI can decode electroencephalogram (EEG) signals of brain activities, and then send a command or an intent to an external interactive device, such as a wheelchair. The effectiveness of the BCI depends on the performance in decoding the EEG. Usually, the EEG is contaminated by different kinds of artefacts (e.g., electromyogram (EMG), background activity), which leads to a low decoding performance. A number of filtering methods can be utilized to remove or weaken the effects of artefacts, but they generally fail when the EEG contains extreme artefacts. In such cases, the most common approach is to discard the whole data segment containing extreme artefacts. This causes the fatal drawback that the BCI cannot output decoding results during that time. In order to solve this problem, we employ the Lomb-Scargle periodogram to estimate the spectral power from incomplete EEG (after removing only parts contaminated by artefacts), and Denoising Autoencoder (DAE) for learning. The proposed method is evaluated with motor imagery EEG data. The results show that our method can successfully decode incomplete EEG to good effect. △ Less

Submitted 3 October, 2014; originally announced October 2014.

Comments: The paper was accepted for publication by Neurocomputing

Journal ref: Neurocomputing, 2015, 165: 23-31

arXiv:1409.0347 [pdf, other]

Multi-tensor Completion for Estimating Missing Values in Video Data

Authors: Chao Li, Lili Guo, Andrzej Cichocki

Abstract: Many tensor-based data completion methods aim to solve image and video in-painting problems. But, all methods were only developed for a single dataset. In most of real applications, we can usually obtain more than one dataset to reflect one phenomenon, and all the datasets are mutually related in some sense. Thus one question raised whether such the relationship can improve the performance of data… ▽ More Many tensor-based data completion methods aim to solve image and video in-painting problems. But, all methods were only developed for a single dataset. In most of real applications, we can usually obtain more than one dataset to reflect one phenomenon, and all the datasets are mutually related in some sense. Thus one question raised whether such the relationship can improve the performance of data completion or not? In the paper, we proposed a novel and efficient method by exploiting the relationship among datasets for multi-video data completion. Numerical results show that the proposed method significantly improve the performance of video in-painting, particularly in the case of very high missing percentage. △ Less

Submitted 1 September, 2014; originally announced September 2014.

arXiv:1406.3295 [pdf, ps, other]

doi 10.1109/TSP.2014.2385040

Stable, Robust and Super Fast Reconstruction of Tensors Using Multi-Way Projections

Authors: Cesar F. Caiafa, Andrzej Cichocki

Abstract: In the framework of multidimensional Compressed Sensing (CS), we introduce an analytical reconstruction formula that allows one to recover an $N$th-order $(I_1\times I_2\times \cdots \times I_N)$ data tensor $\underline{\mathbf{X}}$ from a reduced set of multi-way compressive measurements by exploiting its low multilinear-rank structure. Moreover, we show that, an interesting property of multi-way… ▽ More In the framework of multidimensional Compressed Sensing (CS), we introduce an analytical reconstruction formula that allows one to recover an $N$th-order $(I_1\times I_2\times \cdots \times I_N)$ data tensor $\underline{\mathbf{X}}$ from a reduced set of multi-way compressive measurements by exploiting its low multilinear-rank structure. Moreover, we show that, an interesting property of multi-way measurements allows us to build the reconstruction based on compressive linear measurements taken only in two selected modes, independently of the tensor order $N$. In addition, it is proved that, in the matrix case and in a particular case with $3$rd-order tensors where the same 2D sensor operator is applied to all mode-3 slices, the proposed reconstruction $\underline{\mathbf{X}}_τ$ is stable in the sense that the approximation error is comparable to the one provided by the best low-multilinear-rank approximation, where $τ$ is a threshold parameter that controls the approximation error. Through the analysis of the upper bound of the approximation error we show that, in the 2D case, an optimal value for the threshold parameter $τ=τ_0 > 0$ exists, which is confirmed by our simulation results. On the other hand, our experiments on 3D datasets show that very good reconstructions are obtained using $τ=0$, which means that this parameter does not need to be tuned. Our extensive simulation results demonstrate the stability and robustness of the method when it is applied to real-world 2D and 3D signals. A comparison with state-of-the-arts sparsity based CS methods specialized for multidimensional signals is also included. A very attractive characteristic of the proposed method is that it provides a direct computation, i.e. it is non-iterative in contrast to all existing sparsity based CS algorithms, thus providing super fast computations, even for large datasets. △ Less

Submitted 30 June, 2014; v1 submitted 22 May, 2014; originally announced June 2014.

Comments: Submitted to IEEE Transactions on Signal Processing

arXiv:1405.7786 [pdf, ps, other]

Fundamental Tensor Operations for Large-Scale Data Analysis in Tensor Train Formats

Authors: Namgil Lee, Andrzej Cichocki

Abstract: We discuss extended definitions of linear and multilinear operations such as Kronecker, Hadamard, and contracted products, and establish links between them for tensor calculus. Then we introduce effective low-rank tensor approximation techniques including Candecomp/Parafac (CP), Tucker, and tensor train (TT) decompositions with a number of mathematical and graphical representations. We also provid… ▽ More We discuss extended definitions of linear and multilinear operations such as Kronecker, Hadamard, and contracted products, and establish links between them for tensor calculus. Then we introduce effective low-rank tensor approximation techniques including Candecomp/Parafac (CP), Tucker, and tensor train (TT) decompositions with a number of mathematical and graphical representations. We also provide a brief review of mathematical properties of the TT decomposition as a low-rank approximation technique. With the aim of breaking the curse-of-dimensionality in large-scale numerical analysis, we describe basic operations on large-scale vectors, matrices, and high-order tensors represented by TT decomposition. The proposed representations can be used for describing numerical methods based on TT decomposition for solving large-scale optimization problems such as systems of linear equations and symmetric eigenvalue problems. △ Less

Submitted 24 February, 2016; v1 submitted 30 May, 2014; originally announced May 2014.

Comments: 36 pages; Several improvements and corrected references

MSC Class: 15A63; 15A69; 65F25; 65F30

arXiv:1404.4412 [pdf, other]

doi 10.1109/TIP.2015.2478396

Efficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness

Authors: Guoxu Zhou, Andrzej Cichocki, Qibin Zhao, Shengli Xie

Abstract: Nonnegative Tucker decomposition (NTD) is a powerful tool for the extraction of nonnegative parts-based and physically meaningful latent components from high-dimensional tensor data while preserving the natural multilinear structure of data. However, as the data tensor often has multiple modes and is large-scale, existing NTD algorithms suffer from a very high computational complexity in terms of… ▽ More Nonnegative Tucker decomposition (NTD) is a powerful tool for the extraction of nonnegative parts-based and physically meaningful latent components from high-dimensional tensor data while preserving the natural multilinear structure of data. However, as the data tensor often has multiple modes and is large-scale, existing NTD algorithms suffer from a very high computational complexity in terms of both storage and computation time, which has been one major obstacle for practical applications of NTD. To overcome these disadvantages, we show how low (multilinear) rank approximation (LRA) of tensors is able to significantly simplify the computation of the gradients of the cost function, upon which a family of efficient first-order NTD algorithms are developed. Besides dramatically reducing the storage complexity and running time, the new algorithms are quite flexible and robust to noise because any well-established LRA approaches can be applied. We also show how nonnegativity incorporating sparsity substantially improves the uniqueness property and partially alleviates the curse of dimensionality of the Tucker decompositions. Simulation results on synthetic and real-world data justify the validity and high efficiency of the proposed NTD algorithms. △ Less

Submitted 16 September, 2015; v1 submitted 16 April, 2014; originally announced April 2014.

Comments: appears in IEEE Transactions on Image Processing, 2015

arXiv:1403.2048 [pdf, other]

Era of Big Data Processing: A New Approach via Tensor Networks and Tensor Decompositions

Authors: Andrzej Cichocki

Abstract: Many problems in computational neuroscience, neuroinformatics, pattern/image recognition, signal processing and machine learning generate massive amounts of multidimensional data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) provide often a natural and compact representation for such massive multidimensional data via suitable low-rank approximations. Big data anal… ▽ More Many problems in computational neuroscience, neuroinformatics, pattern/image recognition, signal processing and machine learning generate massive amounts of multidimensional data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) provide often a natural and compact representation for such massive multidimensional data via suitable low-rank approximations. Big data analytics require novel technologies to efficiently process huge datasets within tolerable elapsed times. Such a new emerging technology for multidimensional big data is a multiway analysis via tensor networks (TNs) and tensor decompositions (TDs) which represent tensors by sets of factor (component) matrices and lower-order (core) tensors. Dynamic tensor analysis allows us to discover meaningful hidden structures of complex data and to perform generalizations by capturing multi-linear and multi-aspect relationships. We will discuss some fundamental TN models, their mathematical and graphical descriptions and associated learning algorithms for large-scale TDs and TNs, with many potential applications including: Anomaly detection, feature extraction, classification, cluster analysis, data fusion and integration, pattern recognition, predictive modeling, regression, time series analysis and multiway component analysis. Keywords: Large-scale HOSVD, Tensor decompositions, CPD, Tucker models, Hierarchical Tucker (HT) decomposition, low-rank tensor approximations (LRA), Tensorization/Quantization, tensor train (TT/QTT) - Matrix Product States (MPS), Matrix Product Operator (MPO), DMRG, Strong Kronecker Product (SKP). △ Less

Submitted 24 August, 2014; v1 submitted 9 March, 2014; originally announced March 2014.

Comments: Part of this work was presented on the International Workshop on Smart Info-Media Systems in Asia,(invited talk - SISA-2013) Sept.30--Oct.2, 2013, Nagoya, JAPAN

arXiv:1401.6497 [pdf, other]

doi 10.1109/TPAMI.2015.2392756

Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination

Authors: Qibin Zhao, Liqing Zhang, Andrzej Cichocki

Abstract: CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. The existing CP algorithms require the tensor rank to be manually specified, however, the determination of tensor rank remains a challenging problem especially for CP rank. In addition, existing approaches do not take into account… ▽ More CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. The existing CP algorithms require the tensor rank to be manually specified, however, the determination of tensor rank remains a challenging problem especially for CP rank. In addition, existing approaches do not take into account uncertainty information of latent factors, as well as missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and the appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm, which scales linearly with data size. Our method is characterized as a tuning parameter-free approach, which can effectively infer underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth of CP rank and prevent the overfitting problem, even when a large amount of entries are missing. Moreover, the results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance. △ Less

Submitted 9 October, 2014; v1 submitted 25 January, 2014; originally announced January 2014.

arXiv:1305.0395 [pdf, ps, other]

Tensor Decompositions: A New Concept in Brain Data Analysis?

Authors: Andrzej Cichocki

Abstract: Matrix factorizations and their extensions to tensor factorizations and decompositions have become prominent techniques for linear and multilinear blind source separation (BSS), especially multiway Independent Component Analysis (ICA), NonnegativeMatrix and Tensor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Component Analysis (SCA). Moreover, tensor decompositions have ma… ▽ More Matrix factorizations and their extensions to tensor factorizations and decompositions have become prominent techniques for linear and multilinear blind source separation (BSS), especially multiway Independent Component Analysis (ICA), NonnegativeMatrix and Tensor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Component Analysis (SCA). Moreover, tensor decompositions have many other potential applications beyond multilinear BSS, especially feature extraction, classification, dimensionality reduction and multiway clustering. In this paper, we briefly overview new and emerging models and approaches for tensor decompositions in applications to group and linked multiway BSS/ICA, feature extraction, classification andMultiway Partial Least Squares (MPLS) regression problems. Keywords: Multilinear BSS, linked multiway BSS/ICA, tensor factorizations and decompositions, constrained Tucker and CP models, Penalized Tensor Decompositions (PTD), feature extraction, classification, multiway PLS and CCA. △ Less

Submitted 2 May, 2013; originally announced May 2013.

Journal ref: Control Measurement, and System Integration (SICE), special issue; Measurement of Brain Functions and Bio-Signals, 7, 507-517, (2011)

arXiv:1212.3913 [pdf, other]

doi 10.1109/TNNLS.2015.2487364

Group Component Analysis for Multiblock Data: Common and Individual Feature Extraction

Authors: Guoxu Zhou, Andrzej Cichocki, Yu Zhang, Danilo Mandic

Abstract: Very often data we encounter in practice is a collection of matrices rather than a single matrix. These multi-block data are naturally linked and hence often share some common features and at the same time they have their own individual features, due to the background in which they are measured and collected. In this study we proposed a new scheme of common and individual feature analysis (CIFA) t… ▽ More Very often data we encounter in practice is a collection of matrices rather than a single matrix. These multi-block data are naturally linked and hence often share some common features and at the same time they have their own individual features, due to the background in which they are measured and collected. In this study we proposed a new scheme of common and individual feature analysis (CIFA) that processes multi-block data in a linked way aiming at discovering and separating their common and individual features. According to whether the number of common features is given or not, two efficient algorithms were proposed to extract the common basis which is shared by all data. Then feature extraction is performed on the common and the individual spaces separately by incorporating the techniques such as dimensionality reduction and blind source separation. We also discussed how the proposed CIFA can significantly improve the performance of classification and clustering tasks by exploiting common and individual features of samples respectively. Our experimental results show some encouraging features of the proposed methods in comparison to the state-of-the-art methods on synthetic and real data. △ Less

Submitted 12 March, 2017; v1 submitted 17 December, 2012; originally announced December 2012.

Comments: 13 pages,11 figures

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, Volume: 27, Issue: 11, Nov. 2016

arXiv:1211.3500 [pdf, other]

doi 10.1109/TNNLS.2013.2271507

Accelerated Canonical Polyadic Decomposition by Using Mode Reduction

Authors: Guoxu Zhou, Andrzej Cichocki, Shengli Xie

Abstract: Canonical Polyadic (or CANDECOMP/PARAFAC, CP) decompositions (CPD) are widely applied to analyze high order tensors. Existing CPD methods use alternating least square (ALS) iterations and hence need to unfold tensors to each of the $N$ modes frequently, which is one major bottleneck of efficiency for large-scale data and especially when $N$ is large. To overcome this problem, in this paper we prop… ▽ More Canonical Polyadic (or CANDECOMP/PARAFAC, CP) decompositions (CPD) are widely applied to analyze high order tensors. Existing CPD methods use alternating least square (ALS) iterations and hence need to unfold tensors to each of the $N$ modes frequently, which is one major bottleneck of efficiency for large-scale data and especially when $N$ is large. To overcome this problem, in this paper we proposed a new CPD method which converts the original $N$th ($N>3$) order tensor to a 3rd-order tensor first. Then the full CPD is realized by decomposing this mode reduced tensor followed by a Khatri-Rao product projection procedure. This way is quite efficient as unfolding to each of the $N$ modes are avoided, and dimensionality reduction can also be easily incorporated to further improve the efficiency. We show that, under mild conditions, any $N$th-order CPD can be converted into a 3rd-order case but without destroying the essential uniqueness, and theoretically gives the same results as direct $N$-way CPD methods. Simulations show that, compared with state-of-the-art CPD methods, the proposed method is more efficient and escape from local solutions more easily. △ Less

Submitted 24 June, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

Comments: 12 pages. Accepted by TNNLS

arXiv:1207.1230 [pdf, other]

doi 10.1109/TPAMI.2012.254.

Higher-Order Partial Least Squares (HOPLS): A Generalized Multi-Linear Regression Method

Authors: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Zenas C. Chao, Yasuo Nagasaka, Naotaka Fujii, Liqing Zhang, Andrzej Cichocki

Abstract: A new generalized multilinear regression model, termed the Higher-Order Partial Least Squares (HOPLS), is introduced with the aim to predict a tensor (multiway array) $\tensor{Y}$ from a tensor $\tensor{X}$ through projecting the data onto the latent space and performing regression on the corresponding latent variables. HOPLS differs substantially from other regression models in that it explains t… ▽ More A new generalized multilinear regression model, termed the Higher-Order Partial Least Squares (HOPLS), is introduced with the aim to predict a tensor (multiway array) $\tensor{Y}$ from a tensor $\tensor{X}$ through projecting the data onto the latent space and performing regression on the corresponding latent variables. HOPLS differs substantially from other regression models in that it explains the data by a sum of orthogonal Tucker tensors, while the number of orthogonal loadings serves as a parameter to control model complexity and prevent overfitting. The low dimensional latent space is optimized sequentially via a deflation operation, yielding the best joint subspace approximation for both $\tensor{X}$ and $\tensor{Y}$. Instead of decomposing $\tensor{X}$ and $\tensor{Y}$ individually, higher order singular value decomposition on a newly defined generalized cross-covariance tensor is employed to optimize the orthogonal loadings. A systematic comparison on both synthetic data and real-world decoding of 3D movement trajectories from electrocorticogram (ECoG) signals demonstrate the advantages of HOPLS over the existing methods in terms of better predictive ability, suitability to handle small sample sizes, and robustness to noise. △ Less

Submitted 5 July, 2012; originally announced July 2012.

Journal ref: Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no.7, July, 2013

arXiv:1205.2584 [pdf, ps, other]

Low Complexity Damped Gauss-Newton Algorithms for CANDECOMP/PARAFAC

Authors: Anh Huy Phan, Petr Tichavský, Andrzej Cichocki

Abstract: The damped Gauss-Newton (dGN) algorithm for CANDECOMP/PARAFAC (CP) decomposition can handle the challenges of collinearity of factors and different magnitudes of factors; nevertheless, for factorization of an $N$-D tensor of size $I_1\times I_N$ with rank $R$, the algorithm is computationally demanding due to construction of large approximate Hessian of size $(RT \times RT)$ and its inversion wher… ▽ More The damped Gauss-Newton (dGN) algorithm for CANDECOMP/PARAFAC (CP) decomposition can handle the challenges of collinearity of factors and different magnitudes of factors; nevertheless, for factorization of an $N$-D tensor of size $I_1\times I_N$ with rank $R$, the algorithm is computationally demanding due to construction of large approximate Hessian of size $(RT \times RT)$ and its inversion where $T = \sum_n I_n$. In this paper, we propose a fast implementation of the dGN algorithm which is based on novel expressions of the inverse approximate Hessian in block form. The new implementation has lower computational complexity, besides computation of the gradient (this part is common to both methods), requiring the inversion of a matrix of size $NR^2\times NR^2$, which is much smaller than the whole approximate Hessian, if $T \gg NR$. In addition, the implementation has lower memory requirements, because neither the Hessian nor its inverse never need to be stored in their entirety. A variant of the algorithm working with complex valued data is proposed as well. Complexity and performance of the proposed algorithm is compared with those of dGN and ALS with line search on examples of difficult benchmark tensors. △ Less

Submitted 12 September, 2012; v1 submitted 11 May, 2012; originally announced May 2012.

arXiv:1202.5470

Convergence analysis of the FOCUSS algorithm

Authors: Zhaoshui He, Shengli Xie, Andrzej Cichocki

Abstract: FOCal Underdetermined System Solver (FOCUSS) is a powerful tool for sparse representation and underdetermined inverse problems, which is extremely easy to implement. In this paper, we give a comprehensive convergence analysis on the FOCUSS algorithm towards establishing a systematic convergence theory by providing three primary contributions as follows. First, we give a rigorous derivation for thi… ▽ More FOCal Underdetermined System Solver (FOCUSS) is a powerful tool for sparse representation and underdetermined inverse problems, which is extremely easy to implement. In this paper, we give a comprehensive convergence analysis on the FOCUSS algorithm towards establishing a systematic convergence theory by providing three primary contributions as follows. First, we give a rigorous derivation for this algorithm exploiting the auxiliary function. Then, we prove its convergence. Third, we systematically study its convergence rate with respect to the sparsity parameter p and demonstrate its convergence rate by numerical experiments. △ Less

Submitted 17 April, 2013; v1 submitted 24 February, 2012; originally announced February 2012.

Comments: This paper has been withdrawn by the author due to a crucial error

Showing 1–50 of 50 results for author: Cichocki, A