-
Mobile Vision Transformer-based Visual Object Tracking
Authors:
Goutam Yelluru Gopal,
Maria A. Amer
Abstract:
The introduction of robust backbones, such as Vision Transformers, has improved the performance of object tracking algorithms in recent years. However, these state-of-the-art trackers are computationally expensive since they have a large number of model parameters and rely on specialized hardware (e.g., GPU) for faster inference. On the other hand, recent lightweight trackers are fast but are less accurate, especially on large-scale datasets. We propose a lightweight, accurate, and fast tracking algorithm using Mobile Vision Transformers (MobileViT) as the backbone for the first time. We also present a novel approach of fusing the template and search region representations in the MobileViT backbone, thereby generating superior feature encoding for target localization. The experimental results show that our MobileViT-based Tracker, MVT, surpasses the performance of recent lightweight trackers on the large-scale datasets GOT10k and TrackingNet, with a high inference speed. In addition, our method outperforms the popular DiMP-50 tracker despite having 4.7 times fewer model parameters and running at 2.8 times its speed on a GPU. The tracker code and models are available at https://github.com/goutamyg/MVT
Submitted 11 September, 2023;
originally announced September 2023.
-
Separable Self and Mixed Attention Transformers for Efficient Object Tracking
Authors:
Goutam Yelluru Gopal,
Maria A. Amer
Abstract:
The deployment of transformers for visual object tracking has shown state-of-the-art results on several benchmarks. However, the transformer-based models are under-utilized for Siamese lightweight tracking due to the computational complexity of their attention blocks. This paper proposes an efficient self and mixed attention transformer-based architecture for lightweight tracking. The proposed backbone utilizes the separable mixed attention transformers to fuse the template and search regions during feature extraction to generate superior feature encoding. Our prediction head performs global contextual modeling of the encoded features by leveraging efficient self-attention blocks for robust target state estimation. With these contributions, the proposed lightweight tracker deploys a transformer-based backbone and head module concurrently for the first time. Our ablation study testifies to the effectiveness of the proposed combination of backbone and head modules. Simulations show that our Separable Self and Mixed Attention-based Tracker, SMAT, surpasses the performance of related lightweight trackers on GOT10k, TrackingNet, LaSOT, NfS30, UAV123, and AVisT datasets, while running at 37 fps on CPU, 158 fps on GPU, and having 3.8M parameters. For example, it significantly surpasses the closely related trackers E.T.Track and MixFormerV2-S on GOT10k-test by a margin of 7.9% and 5.8%, respectively, in the AO metric. The tracker code and model are available at https://github.com/goutamyg/SMAT
Submitted 7 September, 2023;
originally announced September 2023.
-
Balancing Accuracy and Latency in Multipath Neural Networks
Authors:
Mohammed Amer,
Tomás Maul,
Iman Yi Liao
Abstract:
The growing capacity of neural networks has strongly contributed to their success at complex machine learning tasks and the computational demand of such large models has, in turn, stimulated a significant improvement in the hardware necessary to accelerate their computations. However, models with high latency aren't suitable for limited-resource environments such as hand-held and IoT devices. Hence, many deep learning techniques aim to address this problem by developing models with reasonable accuracy without violating the limited-resource constraint. In this work, we use a one-shot neural architecture search model to implicitly evaluate the performance of an intractable number of multipath neural networks. Combining this architecture search with a pruning technique and architecture sample evaluation, we can model the relation between the accuracy and the latency of a spectrum of models with graded complexity. We show that our method can accurately model the relative performance between models with different latencies and predict the performance of unseen models with good precision across different datasets.
Submitted 24 April, 2021;
originally announced April 2021.
-
Reducing Catastrophic Forgetting in Modular Neural Networks by Dynamic Information Balancing
Authors:
Mohammed Amer,
Tomás Maul
Abstract:
Lifelong learning is a very important step toward realizing robust autonomous artificial agents. Neural networks are the main engine of deep learning, which is the current state-of-the-art technique in formulating adaptive artificial intelligent systems. However, neural networks suffer from catastrophic forgetting when stressed with the challenge of continual learning. We investigate how to exploit modular topology in neural networks in order to dynamically balance the information load between different modules by routing inputs based on the information content in each module so that information interference is minimized. Our dynamic information balancing (DIB) technique adapts a reinforcement learning technique to guide the routing of different inputs based on a reward signal derived from a measure of the information load in each module. Our empirical results show that DIB combined with elastic weight consolidation (EWC) regularization outperforms models with similar capacity and EWC regularization across different task formulations and datasets.
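For reference, the EWC regularization mentioned in this abstract is usually written as a quadratic penalty that anchors each parameter to its value after an earlier task, weighted by an estimate of the Fisher information. The form below is the commonly cited one and is not taken from this abstract; the symbols ($\mathcal{L}_B$ for the new-task loss, $\theta^{*}_{A}$ for the parameters learned on the earlier task, $F_i$ for the diagonal Fisher information, $\lambda$ for the penalty strength) are notation assumed here for illustration.

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_B(\theta) \;+\; \sum_i \frac{\lambda}{2}\, F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2
```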
Submitted 10 December, 2019;
originally announced December 2019.
-
Performance of Two-Way Relaying over $α$-$μ$ Fading Channels in Hybrid RF/FSO Wireless Networks
Authors:
Mohammed A. Amer,
Suhail Al-Dharrab
Abstract:
In this paper, the performance of two-way relaying in a mixed RF/FSO communication system with a backup RF link is investigated. Uplink RF channels are used to send data of $K$ users to a two-way relay, $R$, whereas the FSO link is mainly used to exchange data between a base station $S$ and $R$. We propose to have a backup RF link between the relay $R$ and the node $S$ to improve reliability under certain conditions. All uplink RF channels follow the Rayleigh fading model, while the $α$-$μ$ distribution is adopted to model both the backup RF and FSO links. We approximate the widely used Gamma-Gamma fading model using the $α$-$μ$ distribution based on a moments-based estimator technique, assuming perfect alignment between transmitter and receiver antennas. This approximation shows good agreement under certain atmospheric turbulence conditions. Then, we derive exact closed-form expressions for the outage and average symbol error probabilities and derive approximations at high signal-to-noise ratio (SNR). We corroborate our analytical expressions with extensive Monte-Carlo simulations and demonstrate an exact match. Furthermore, we analyze the effect of the number of nodes, opportunistic scheduling among $K$ nodes, and the $α$-$μ$ parameters on the overall performance of mixed RF/FSO and backup RF systems. Our numerical results illustrate an achievable coding gain when increasing $K$; however, performance degradation occurs as the relay applies selection that favors the domination of specific links in the system.
Submitted 14 November, 2019;
originally announced November 2019.
-
Underwater Optical Communication System Relayed by $α-μ$ Fading Channel: Outage, Capacity and Asymptotic Analysis
Authors:
Mohammed Amer,
Yasser Al-Eryani
Abstract:
We investigate an underwater optical communication system that is relayed by a single decode-and-forward (DF) relay through an exponential-generalized Gamma (EGG) distribution to a final destination. Specifically, a terminal device sends data through an underwater wireless optical (UWO) link that utilizes the so-called blue laser technology to a nearby relay, which in turn sends a decoded (and modulated) version of the received signal to a remote destination. The RF link is assumed to follow the generalized $α-μ$ distribution, which includes many distributions as special cases, e.g., Rayleigh. On the other hand, the UWO link is presumed to follow the state-of-the-art Exponential-Generalized Gamma (EGG) distribution, which was recently proposed to model underwater optical turbulence. Closed-form expressions of the outage probability, average error rate and ergodic capacity are derived assuming the heterodyne detection (HD) technique. Also, an asymptotic outage expression is obtained for more performance insights. Results show that a high achievable rate is obtained for high-speed underwater communication systems when underwater turbulence conditions are relatively weak. In addition, the RF link dominates the outage performance in weak optical turbulence, while the UWO link dominates the outage performance in severe optical turbulence.
Submitted 11 November, 2019;
originally announced November 2019.
-
Deep Complex Networks for Protocol-Agnostic Radio Frequency Device Fingerprinting in the Wild
Authors:
Ioannis Agadakos,
Nikolaos Agadakos,
Jason Polakis,
Mohamed R. Amer
Abstract:
Researchers have demonstrated various techniques for fingerprinting and identifying devices. Previous approaches have identified devices from their network traffic or transmitted signals while relying on software or operating system specific artifacts (e.g., predictability of protocol header fields) or characteristics of the underlying protocol (e.g., frequency offset). As these constraints can be a hindrance in real-world settings, we introduce a practical, generalizable approach that offers significant operational value for a variety of scenarios, including as an additional factor of authentication for preventing impersonation attacks. Our goal is to identify artifacts in transmitted signals that are caused by a device's unique hardware "imperfections" without any knowledge about the nature of the signal. We develop RF-DCN, a novel Deep Complex-valued Neural Network (DCN) that operates on raw RF signals and is completely agnostic of the underlying applications and protocols. We present two DCN variations: (i) Convolutional DCN (CDCN) for modeling full signals, and (ii) Recurrent DCN (RDCN) for modeling time series. Our system handles raw I/Q data from open air captures within a given spectrum window, without knowledge of the modulation scheme or even the carrier frequencies. While our experiments demonstrate the effectiveness of our system, especially under challenging conditions where other neural network architectures break down, we identify additional challenges in signal-based fingerprinting and provide guidelines for future explorations. Our work lays the foundation for more research within this vast and challenging space by establishing fundamental directions for using raw RF I/Q data in novel complex-valued networks.
Submitted 18 September, 2019;
originally announced September 2019.
-
Image Classification with Hierarchical Multigraph Networks
Authors:
Boris Knyazev,
Xiao Lin,
Mohamed R. Amer,
Graham W. Taylor
Abstract:
Graph Convolutional Networks (GCNs) are a class of general models that can learn from graph structured data. Despite being general, GCNs are admittedly inferior to convolutional neural networks (CNNs) when applied to vision tasks, mainly due to the lack of domain knowledge that is hardcoded into CNNs, such as spatially oriented translation invariant filters. However, a great advantage of GCNs is the ability to work on irregular inputs, such as superpixels of images. This could significantly reduce the computational cost of image reasoning tasks. Another key advantage inherent to GCNs is the natural ability to model multirelational data. Building upon these two promising properties, in this work, we show best practices for designing GCNs for image classification; in some cases even outperforming CNNs on the MNIST, CIFAR-10 and PASCAL image datasets.
Submitted 21 July, 2019;
originally announced July 2019.
-
Container Density Improvements with Dynamic Memory Extension using NAND Flash
Authors:
Jan S. Rellermeyer,
Maher Amer,
Richard Smutzer,
Karthick Rajamani
Abstract:
While containers efficiently implement the idea of operating-system-level application virtualization, they are often insufficient to increase the server utilization to a desirable level. The reason is that in practice many containerized applications experience a limited amount of load while there are few containers with a high load. In such a scenario, the virtual memory management system can become the limiting factor to container density even though the working set of active containers would fit into main memory. In this paper, we describe and evaluate a system for transparently moving memory pages in and out of DRAM and to a NAND Flash medium which is attached through the memory bus. This technique, called Diablo Memory Expansion (DMX), operates on a prediction model and is able to relieve the pressure on the memory system. We present a benchmark for container density and show that even under an overall constant workload, adding additional containers adversely affects performance-critical applications in Docker. When using the DMX technology of the Memory1 system, however, the performance of the critical workload remains stable.
Submitted 24 June, 2019;
originally announced June 2019.
-
Data-Efficient Mutual Information Neural Estimator
Authors:
Xiao Lin,
Indranil Sur,
Samuel A. Nastase,
Ajay Divakaran,
Uri Hasson,
Mohamed R. Amer
Abstract:
Measuring Mutual Information (MI) between high-dimensional, continuous, random variables from observed samples has wide theoretical and practical applications. Recent work, MINE (Belghazi et al. 2018), focused on estimating tight variational lower bounds of MI using neural networks, but assumed an unlimited supply of samples to prevent overfitting. In real-world applications, data is not always available at a surplus. In this work, we focus on improving data efficiency and propose a Data-Efficient MINE Estimator (DEMINE), by developing a relaxed predictive MI lower bound that can be estimated at higher data efficiency by orders of magnitude. The predictive MI lower bound also enables us to develop a new meta-learning approach using task augmentation, Meta-DEMINE, to improve generalization of the network and further boost estimation accuracy empirically. With improved data efficiency, our estimators enable statistical testing of dependency at practical dataset sizes. We demonstrate the effectiveness of our estimators on synthetic benchmarks and real-world fMRI data, with application to inter-subject correlation analysis.
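As background for the variational lower bounds mentioned in this abstract, MINE-style estimators are commonly described in terms of the Donsker-Varadhan representation of the KL divergence. The bound below is that standard representation, not the relaxed predictive bound proposed in this paper; the critic $T_\theta$ and the variable names $X$, $Z$ are notation assumed here for illustration.

```latex
I(X;Z) \;\ge\; \sup_{\theta}\; \mathbb{E}_{P_{XZ}}\!\left[T_\theta(x,z)\right] \;-\; \log \mathbb{E}_{P_X \otimes P_Z}\!\left[e^{T_\theta(x,z)}\right]
```

The first expectation is taken over paired samples and the second over samples drawn independently from the two marginals, which is why a limited supply of paired data makes the estimate prone to overfitting.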
Submitted 24 May, 2019; v1 submitted 8 May, 2019;
originally announced May 2019.
-
Understanding Attention and Generalization in Graph Neural Networks
Authors:
Boris Knyazev,
Graham W. Taylor,
Mohamed R. Amer
Abstract:
We aim to better understand attention over nodes in graph neural networks (GNNs) and identify factors influencing its effectiveness. We particularly focus on the ability of attention GNNs to generalize to larger, more complex or noisy graphs. Motivated by insights from the work on Graph Isomorphism Networks, we design simple graph reasoning tasks that allow us to study attention in a controlled environment. We find that under typical conditions the effect of attention is negligible or even harmful, but under certain conditions it provides an exceptional gain in performance of more than 60% in some of our classification tasks. Satisfying these conditions in practice is challenging and often requires optimal initialization or supervised training of attention. We propose an alternative recipe and train attention in a weakly-supervised fashion that approaches the performance of supervised models, and, compared to unsupervised models, improves results on several synthetic as well as real datasets. Source code and datasets are available at https://github.com/bknyaz/graph_attention_pool.
Submitted 28 October, 2019; v1 submitted 7 May, 2019;
originally announced May 2019.
-
Weight Map Layer for Noise and Adversarial Attack Robustness
Authors:
Mohammed Amer,
Tomás Maul
Abstract:
Convolutional neural networks (CNNs) are known for their good performance and generalization in vision-related tasks and have become state-of-the-art in both application and research-based domains. However, just like other neural network models, they suffer from a susceptibility to noise and adversarial attacks. An adversarial defence aims at reducing a neural network's susceptibility to adversarial attacks through learning or architectural modifications. We propose the weight map layer (WM) as a generic architectural addition to CNNs and show that it can increase their robustness to noise and adversarial attacks. We further explain that the enhanced robustness of the two WM variants results from the adaptive activation-variance amplification exhibited by the layer. We show that the WM layer can be integrated into scaled up models to increase their noise and adversarial attack robustness, while achieving comparable accuracy levels across different datasets.
Submitted 2 December, 2020; v1 submitted 2 May, 2019;
originally announced May 2019.
-
A Review of Modularization Techniques in Artificial Neural Networks
Authors:
Mohammed Amer,
Tomás Maul
Abstract:
Artificial neural networks (ANNs) have achieved significant success in tackling classical and modern machine learning problems. As learning problems grow in scale and complexity, and expand into multi-disciplinary territory, a more modular approach for scaling ANNs will be needed. Modular neural networks (MNNs) are neural networks that embody the concepts and principles of modularity. MNNs adopt a large number of different techniques for achieving modularization. Previous surveys of modularization techniques are relatively scarce in their systematic analysis of MNNs, focusing mostly on empirical comparisons and lacking an extensive taxonomical framework. In this review, we aim to establish a solid taxonomy that captures the essential properties and relationships of the different variants of MNNs. Based on an investigation of the different levels at which modularization techniques act, we attempt to provide a universal and systematic framework for theorists studying MNNs, also trying along the way to emphasise the strengths and weaknesses of different modularization approaches in order to highlight good practices for neural network practitioners.
Submitted 29 April, 2019;
originally announced April 2019.
-
Path Capsule Networks
Authors:
Mohammed Amer,
Tomás Maul
Abstract:
The capsule network (CapsNet) was introduced as an enhancement over convolutional neural networks, supplementing the latter's invariance properties with equivariance through pose estimation. CapsNet achieved good performance with a shallow architecture and a significant reduction in parameter count. However, the width of the first layer in CapsNet still contributes a significant number of its parameters, and the shallowness may be limiting the representational power of the capsules. To address these limitations, we introduce Path Capsule Network (PathCapsNet), a deep parallel multi-path version of CapsNet. We show that a judicious coordination of depth, max-pooling, regularization by DropCircuit and a new fan-in routing by agreement technique can achieve better or comparable results to CapsNet, while further reducing the parameter count significantly.
Submitted 26 October, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Spectral Multigraph Networks for Discovering and Fusing Relationships in Molecules
Authors:
Boris Knyazev,
Xiao Lin,
Mohamed R. Amer,
Graham W. Taylor
Abstract:
Spectral Graph Convolutional Networks (GCNs) are a generalization of convolutional networks to learning on graph-structured data. Applications of spectral GCNs have been successful, but limited to a few problems where the graph is fixed, such as shape correspondence and node classification. In this work, we address this limitation by revisiting a particular family of spectral graph networks, Chebyshev GCNs, showing its efficacy in solving graph classification tasks with a variable graph structure and size. Chebyshev GCNs restrict graphs to have at most one edge between any pair of nodes. To this end, we propose a novel multigraph network that learns from multi-relational graphs. We model learned edges with abstract meaning and experiment with different ways to fuse the representations extracted from annotated and learned edges, achieving competitive results on a variety of chemical classification benchmarks.
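To illustrate what a Chebyshev graph filter computes, the following minimal sketch builds the Chebyshev basis of a node signal using the standard recurrence T_k = 2 L T_{k-1} - T_{k-2} on a rescaled Laplacian. It is a generic illustration, not the authors' multigraph model; the helper names `matvec` and `chebyshev_basis` and the two-node example graph are assumptions introduced here.

```python
def matvec(matrix, vector):
    """Multiply a dense square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def chebyshev_basis(scaled_laplacian, signal, order):
    """Return [T_0(L)x, ..., T_{order-1}(L)x] for a Laplacian rescaled to [-1, 1].

    Uses T_0 x = x, T_1 x = L x, and T_k x = 2 L T_{k-1} x - T_{k-2} x.
    """
    basis = [list(signal)]
    if order > 1:
        basis.append(matvec(scaled_laplacian, signal))
    for _ in range(2, order):
        lx = matvec(scaled_laplacian, basis[-1])
        basis.append([2 * a - b for a, b in zip(lx, basis[-2])])
    return basis[:order]


# Hypothetical two-node graph with a single edge. Its normalized Laplacian is
# [[1, -1], [-1, 1]] with largest eigenvalue 2, so the rescaled Laplacian
# 2L/lambda_max - I equals [[0, -1], [-1, 0]].
scaled_laplacian = [[0, -1], [-1, 0]]
signal = [1, 0]
basis = chebyshev_basis(scaled_laplacian, signal, order=3)
```

A Chebyshev graph convolution layer would then take a learned linear combination of these basis vectors; a multi-relational variant would need one such basis per edge type, which is the limitation the abstract refers to.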
Submitted 23 November, 2018;
originally announced November 2018.
-
Human Motion Modeling using DVGANs
Authors:
Xiao Lin,
Mohamed R. Amer
Abstract:
We present a novel generative model for human motion modeling using Generative Adversarial Networks (GANs). We formulate the GAN discriminator using dense validation at each time-scale and perturb the discriminator input to make it translation invariant. Our model is capable of motion generation and completion. We show through our evaluations the resiliency to noise, generalization over actions, and generation of long diverse sequences. We evaluate our approach on Human 3.6M and CMU motion capture datasets using inception scores.
Submitted 18 May, 2018; v1 submitted 27 April, 2018;
originally announced April 2018.
-
BitNet: Bit-Regularized Deep Neural Networks
Authors:
Aswin Raghavan,
Mohamed Amer,
Sek Chai,
Graham Taylor
Abstract:
We present a novel optimization strategy for training neural networks which we call "BitNet". The parameters of neural networks are usually unconstrained and have a dynamic range dispersed over all real values. Our key idea is to limit the expressive power of the network by dynamically controlling the range and set of values that the parameters can take. We formulate this idea using a novel end-to-end approach that circumvents the discrete parameter space by optimizing a relaxed continuous and differentiable upper bound of the typical classification loss function. The approach can be interpreted as a regularization inspired by the Minimum Description Length (MDL) principle. For each layer of the network, our approach optimizes real-valued translation and scaling factors and arbitrary precision integer-valued parameters (weights). We empirically compare BitNet to an equivalent unregularized model on the MNIST and CIFAR-10 datasets. We show that BitNet converges faster to a superior quality solution. Additionally, the resulting model has significant savings in memory due to the use of integer-valued parameters.
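The idea of storing each layer as integer-valued parameters plus a real-valued translation and scaling factor can be sketched with a plain uniform affine quantizer. This is only an illustration under that assumption and not necessarily the formulation optimized in BitNet; the function names `quantize` and `dequantize` and the example weights are hypothetical.

```python
def quantize(weights, bits):
    """Map real weights to integer codes in [0, 2**bits - 1].

    Returns the codes together with the real-valued offset (translation)
    and scale needed to reconstruct approximate weights.
    """
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    # Guard against a constant layer, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, offset, scale):
    """Reconstruct approximate real weights from integer codes."""
    return [offset + scale * c for c in codes]


# Example: five weights of one hypothetical layer stored with 2 bits each.
weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, offset, scale = quantize(weights, bits=2)
approx = dequantize(codes, offset, scale)
```

With `bits` integer codes per weight instead of a 32-bit float, memory per parameter shrinks accordingly, at the cost of a reconstruction error of at most half the scale per weight.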
Submitted 16 November, 2018; v1 submitted 16 August, 2017;
originally announced August 2017.
-
Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels
Authors:
Dhanesh Ramachandram,
Michal Lisicki,
Timothy J. Shields,
Mohamed R. Amer,
Graham W. Taylor
Abstract:
A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.
Submitted 3 July, 2017;
originally announced July 2017.
-
GPU Activity Prediction using Representation Learning
Authors:
Aswin Raghavan,
Mohamed Amer,
Timothy Shields,
David Zhang,
Sek Chai
Abstract:
GPU activity prediction is an important and complex problem, owing to the high level of contention among thousands of parallel threads. This problem has mostly been addressed using heuristics. We propose a representation learning approach to address this problem. We model any performance metric as a temporal function of the executed instructions with the intuition that the flow of instructions can be identified as distinct activities of the code. Our experiments show high accuracy and non-trivial predictive power of representation learning on a benchmark.
Submitted 27 March, 2017;
originally announced March 2017.
-
Low Precision Neural Networks using Subband Decomposition
Authors:
Sek Chai,
Aswin Raghavan,
David Zhang,
Mohamed Amer,
Tim Shields
Abstract:
Large-scale deep neural networks (DNNs) have been successfully used in a number of tasks from image recognition to natural language processing. They are trained using large training sets on large models, making them computationally and memory intensive. As such, there is much interest in research development for faster training and test time. In this paper, we present a unique approach using lower precision weights for a more efficient and faster training phase. We separate imagery into different frequency bands (e.g. with different information content) such that the neural net can better learn using fewer bits. We present this approach as a complement to existing methods such as pruning network connections and encoding learning weights. We show results where this approach supports more stable learning with a 2-4X reduction in precision and a 17X reduction in DNN parameters.
Submitted 24 March, 2017;
originally announced March 2017.
-
Action-Affect Classification and Morphing using Multi-Task Representation Learning
Authors:
Timothy J. Shields,
Mohamed R. Amer,
Max Ehrlich,
Amir Tamrakar
Abstract:
Most recent work has focused on affect from facial expressions, and less on the body. This work focuses on body affect analysis. Affect does not occur in isolation. Humans usually couple affect with an action in natural interactions; for example, a person could be talking and smiling. Recognizing body affect in sequences requires efficient algorithms to capture both the micro movements that differentiate between happy and sad and the macro variations between different actions. We depart from traditional approaches for time-series data analytics by proposing a multi-task learning model that learns a shared representation that is well-suited for action-affect classification as well as generation. For this paper we choose Conditional Restricted Boltzmann Machines (CRBMs) as our building block. We propose a new model that enhances the CRBM with a factored multi-task component to become Multi-Task Conditional Restricted Boltzmann Machines (MTCRBMs). We evaluate our approach on two publicly available datasets, the Body Affect dataset and the Tower Game dataset, and show improved classification performance over the state of the art, as well as the generative abilities of our model.
Submitted 21 March, 2016;
originally announced March 2016.
-
Human Social Interaction Modeling Using Temporal Deep Networks
Authors:
Mohamed R. Amer,
Behjat Siddiquie,
Amir Tamrakar,
David A. Salter,
Brian Lande,
Darius Mehri,
Ajay Divakaran
Abstract:
We present a novel approach to computational modeling of social interactions based on modeling of essential social interaction predicates (ESIPs) such as joint attention and entrainment. Based on sound social psychological theory and methodology, we collect a new "Tower Game" dataset consisting of audio-visual capture of dyadic interactions labeled with the ESIPs. We expect this dataset to provide a new avenue for research in computational social interaction modeling. We propose a novel joint Discriminative Conditional Restricted Boltzmann Machine (DCRBM) model that combines a discriminative component with the generative power of CRBMs. Such a combination enables us to uncover actionable constituents of the ESIPs in two steps. First, we train the DCRBM model on the labeled data and get accurate (76%-49% across various ESIPs) detection of the predicates. Second, we exploit the generative capability of DCRBMs to activate the trained model so as to generate the lower-level data corresponding to the specific ESIP that closely matches the actual training data (with mean square error 0.01-0.1 for generating 100 frames). We are thus able to decompose the ESIPs into their constituent actionable behaviors. Such a purely computational determination of how to establish an ESIP such as engagement is unprecedented.
Submitted 28 May, 2015; v1 submitted 6 May, 2015;
originally announced May 2015.