Search | arXiv e-print repository

Sensor-Aware Classifiers for Energy-Efficient Time Series Applications on IoT Devices

Authors: Dina Hussein, Lubah Nelson, Ganapati Bhat

Abstract: Time-series data processing is an important component of many real-world applications, such as health monitoring, environmental monitoring, and digital agriculture. These applications collect distinct windows of sensor data (e.g., a few seconds) and process them to assess the environment. Machine learning (ML) models are being employed in time-series applications due to their generalization abilities for classification. State-of-the-art time-series applications wait for the entire sensor data window to become available before processing the data using ML algorithms, resulting in high sensor energy consumption. However, not all situations require processing the full sensor window to make an accurate inference. For instance, in activity recognition, sitting and standing activities can be inferred with partial windows. Using this insight, we propose to employ early exit classifiers with partial sensor windows to minimize energy consumption while maintaining accuracy. Specifically, we first utilize multiple early exits with successively increasing amounts of data as they become available in a window. If early exits provide inference with high confidence, we return the label and enter low power mode for sensors. The proposed approach has the potential to enable significant energy savings in time series applications. We utilize neural networks and random forest classifiers to evaluate our approach. Our evaluations with six datasets show that the proposed approach enables up to 50-60% energy savings on average without any impact on accuracy. The energy savings can enable time-series applications in remote locations with limited energy availability.
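
The early-exit idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `early_exit_predict`, the `toy_classifier` stand-in, the exit fractions, and the 0.9 confidence threshold are all assumptions chosen for illustration. Each exit sees a larger prefix of the sensor window, and the loop stops at the first exit whose top class probability reaches the threshold.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a classifier maps a partial window to class probabilities.
Classifier = Callable[[Sequence[float]], Dict[str, float]]


def early_exit_predict(
    window: Sequence[float],
    exits: List[Tuple[float, Classifier]],
    threshold: float = 0.9,
) -> Tuple[str, float]:
    """Return (label, fraction_of_window_used).

    `exits` holds (fraction, classifier) pairs sorted by increasing fraction.
    The last pair should use fraction 1.0 so that a label is always produced.
    """
    label, fraction = "", 0.0
    for fraction, classify in exits:
        n_samples = max(1, int(len(window) * fraction))
        probs = classify(window[:n_samples])
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            break  # confident enough: the sensors could enter low-power mode here
    return label, fraction


def toy_classifier(samples: Sequence[float]) -> Dict[str, float]:
    """Toy stand-in for a trained model: a larger signal spread means 'moving'."""
    mean = sum(samples) / len(samples)
    spread = sum(abs(s - mean) for s in samples) / len(samples)
    p_moving = min(1.0, spread)
    return {"moving": p_moving, "still": 1.0 - p_moving}


# Assumed exit points: 25%, 50%, and 100% of the window.
EXITS: List[Tuple[float, Classifier]] = [
    (0.25, toy_classifier),
    (0.5, toy_classifier),
    (1.0, toy_classifier),
]

if __name__ == "__main__":
    flat_window = [0.0] * 80
    print(early_exit_predict(flat_window, EXITS))
```

In this sketch the returned fraction serves as a rough proxy for how long the sensors had to stay active; a real system would measure energy directly and would use trained models at each exit.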

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2211.15175 [pdf, ps, other]

Automating and Mechanizing Cutoff-based Verification of Distributed Protocols

Authors: Shreesha G. Bhat, Kartik Nagar

Abstract: Distributed protocols are generally parametric and can be executed on a system with any number of nodes, and hence proving their correctness becomes an infinite state verification problem. The most popular approach for verifying distributed protocols is to find an inductive invariant which is strong enough to prove the required safety property. However, finding inductive invariants is known to be notoriously hard, and is even harder in the context of distributed protocols, which are quite complex due to their asynchronous nature. In this work, we investigate an orthogonal cutoff-based approach to verifying distributed protocols which sidesteps the problem of finding an inductive invariant, and instead reduces checking correctness to a finite state verification problem. The main idea is to find a finite, fixed protocol instance called the cutoff instance, such that if the cutoff instance is safe, then any protocol instance would also be safe. Previous cutoff-based approaches have only been applied to a restricted class of protocols and specifications. We formalize the cutoff approach in the context of a general protocol modeling language (RML), and identify sufficient conditions which can be efficiently encoded in SMT to check whether a given protocol instance is a cutoff instance. Further, we propose a simple static analysis-based algorithm to automatically synthesize a cutoff instance. We have applied our approach successfully on a number of complex distributed protocols, providing the first known cutoff results for many of them.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 27 pages

arXiv:2210.05008 [pdf, other]

Fast Hierarchical Learning for Few-Shot Object Detection

Authors: Yihang She, Goutam Bhat, Martin Danelljan, Fisher Yu

Abstract: Transfer learning based approaches have recently achieved promising results on the few-shot detection task. These approaches, however, suffer from the "catastrophic forgetting" issue due to finetuning of the base detector, leading to sub-optimal performance on the base classes. Furthermore, the slow convergence rate of stochastic gradient descent (SGD) results in high latency and consequently restricts real-time applications. We tackle the aforementioned issues in this work. We pose few-shot detection as a hierarchical learning problem, where the novel classes are treated as the child classes of existing base classes and the background class. The detection heads for the novel classes are then trained using a specialized optimization strategy, leading to significantly lower training times compared to SGD. Our approach obtains competitive novel class performance on the few-shot MS-COCO benchmark, while completely retaining the performance of the initial model on the base classes. We further demonstrate the application of our approach to a new class-refined few-shot detection task.

Submitted 10 October, 2022; originally announced October 2022.

Comments: 8 pages, 5 figures, accepted by IROS2022

arXiv:2203.12692 [pdf, other]

Affective Feedback Synthesis Towards Multimodal Text and Image Data

Authors: Puneet Kumar, Gaurav Bhat, Omkar Ingle, Daksh Goyal, Balasubramanian Raman

Abstract: In this paper, we have defined a novel task of affective feedback synthesis that deals with generating feedback for input text and the corresponding image in a similar way as humans respond towards the multimodal data. A feedback synthesis system has been proposed and trained using ground-truth human comments along with image-text input. We have also constructed a large-scale dataset consisting of image, text, Twitter user comments, and the number of likes for the comments by crawling the news articles through Twitter feeds. The proposed system extracts textual features using a transformer-based textual encoder while the visual features have been extracted using a Faster region-based convolutional neural network model. The textual and visual features have been concatenated to construct the multimodal features, which the decoder uses to synthesize the feedback. We have compared the results of the proposed system with the baseline models using quantitative and qualitative measures. The generated feedback has been analyzed using automatic and human evaluation. It has been found to be semantically similar to the ground-truth comments and relevant to the given text-image input.

Submitted 31 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: Submitted to ACM Transactions on Multimedia Computing, Communications, and Applications

arXiv:2203.11192 [pdf, other]

Transforming Model Prediction for Tracking

Authors: Christoph Mayer, Martin Danelljan, Goutam Bhat, Matthieu Paul, Danda Pani Paudel, Fisher Yu, Luc Van Gool

Abstract: Optimization based tracking methods have been widely successful by integrating a target model prediction module, providing effective global reasoning by minimizing an objective function. While this inductive bias integrates valuable domain knowledge, it limits the expressivity of the tracking network. In this work, we therefore propose a tracker architecture employing a Transformer-based model prediction module. Transformers capture global relations with little inductive bias, allowing it to learn the prediction of more powerful target models. We further extend the model predictor to estimate a second set of weights that are applied for accurate bounding box regression. The resulting tracker relies on training and on test frame information in order to predict all weights transductively. We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets. Our tracker sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT dataset.

Submitted 21 March, 2022; originally announced March 2022.

Comments: Accepted at CVPR 2022. The code and trained models are available at https://github.com/visionml/pytracking

arXiv:2201.07888 [pdf, other]

Adaptive Energy Management for Self-Sustainable Wearables in Mobile Health

Authors: Dina Hussein, Ganapati Bhat, Janardhan Rao Doppa

Abstract: Wearable devices that integrate multiple sensors, processors, and communication technologies have the potential to transform mobile health for remote monitoring of health parameters. However, the small form factor of the wearable devices limits the battery size and operating lifetime. As a result, the devices require frequent recharging, which has limited their widespread adoption. Energy harvesting has emerged as an effective method towards sustainable operation of wearable devices. Unfortunately, energy harvesting alone is not sufficient to fulfill the energy requirements of wearable devices. This paper studies the novel problem of adaptive energy management towards the goal of self-sustainable wearables by using harvested energy to supplement the battery energy and to reduce manual recharging by users. To solve this problem, we propose a principled algorithm referred to as AdaEM. There are two key ideas behind AdaEM. First, it uses machine learning (ML) methods to learn predictive models of user activity and energy usage patterns. These models allow us to estimate the potential of energy harvesting in a day as a function of the user activities. Second, it reasons about the uncertainty in predictions and estimations from the ML models to optimize the energy management decisions using a dynamic robust optimization (DyRO) formulation. We propose a light-weight solution for DyRO to meet the practical needs of deployment. We validate the AdaEM approach on a wearable device prototype consisting of solar and motion energy harvesting using real-world data of user activities. Experiments show that AdaEM achieves solutions that are within 5% of the optimal with less than 0.005% execution time and energy overhead.

Submitted 16 January, 2022; originally announced January 2022.

Comments: To be presented at AAAI 2022

arXiv:2108.08286 [pdf, other]

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Authors: Goutam Bhat, Martin Danelljan, Fisher Yu, Luc Van Gool, Radu Timofte

Abstract: We propose a deep reparametrization of the maximum a posteriori formulation commonly employed in multi-frame image restoration tasks. Our approach is derived by introducing a learned error metric and a latent representation of the target image, which transforms the MAP objective to a deep feature space. The deep reparametrization allows us to directly model the image formation process in the latent space, and to integrate learned image priors into the prediction. Our approach thereby leverages the advantages of deep learning, while also benefiting from the principled multi-frame fusion provided by the classical MAP formulation. We validate our approach through comprehensive experiments on burst denoising and burst super-resolution datasets. Our approach sets a new state-of-the-art for both tasks, demonstrating the generality and effectiveness of the proposed formulation.

Submitted 18 August, 2021; originally announced August 2021.

Comments: ICCV 2021 Oral

arXiv:2106.03839 [pdf, other]

NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results

Authors: Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li, Shuaicheng Liu, Ziluan Liu, Ziwei Luo, Takahiro Maeda, Julien Mairal, Christian Micheloni, Xuan Mo, Takeru Oba, Pavel Ostyakov, Jean Ponce, Sanghyeok Son, Jian Sun, Norimichi Ukita, Rao Muhammad Umer, Youliang Yan, et al. (3 additional authors not shown)

Abstract: This paper reviews the NTIRE 2021 challenge on burst super-resolution. Given a RAW noisy burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks: Track 1 evaluated on synthetically generated data, and Track 2 used real-world bursts from a mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state-of-the-art for the burst super-resolution task.

Submitted 7 June, 2021; originally announced June 2021.

Comments: NTIRE 2021 Burst Super-Resolution challenge report

arXiv:2105.09282 [pdf, other]

Learning Pareto-Frontier Resource Management Policies for Heterogeneous SoCs: An Information-Theoretic Approach

Authors: Aryan Deshwal, Syrine Belakaria, Ganapati Bhat, Janardhan Rao Doppa, Partha Pratim Pande

Abstract: Mobile system-on-chips (SoCs) are growing in their complexity and heterogeneity (e.g., Arm's Big-Little architecture) to meet the needs of emerging applications, including games and artificial intelligence. This makes it very challenging to optimally manage the resources (e.g., controlling the number and frequency of different types of cores) at runtime to meet the desired trade-offs among multiple objectives such as performance and energy. This paper proposes a novel information-theoretic framework referred to as PaRMIS to create Pareto-optimal resource management policies for given target applications and design objectives. PaRMIS specifies parametric policies to manage resources and learns statistical models from candidate policy evaluation data in the form of target design objective values. The key idea is to select a candidate policy for evaluation in each iteration guided by statistical models that maximize the information gain about the true Pareto front. Experiments on a commercial heterogeneous SoC show that PaRMIS achieves better Pareto fronts and is easily usable to optimize complex objectives (e.g., performance per Watt) when compared to prior methods.
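
For readers unfamiliar with the term, the Pareto front mentioned in this abstract can be sketched in a few lines. This only illustrates the dominance relation between evaluated policies; it is not PaRMIS itself, which additionally selects candidates by information gain. The names `Point`, `dominates`, and `pareto_front` are hypothetical, and both objectives (execution time and energy) are assumed to be minimized.

```python
from typing import List, Tuple

# Hypothetical representation: (execution_time, energy), both to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` in both objectives and differs from it."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up objective values for five candidate policies.
    evaluated = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 6.0)]
    print(pareto_front(evaluated))
```

The quadratic scan is fine for a handful of candidate policies; larger sets would usually be sorted by one objective first.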

Submitted 14 April, 2021; originally announced May 2021.

Comments: To be published in proceedings DAC

arXiv:2102.13605 [pdf, other]

ECO: Enabling Energy-Neutral IoT Devices through Runtime Allocation of Harvested Energy

Authors: Yigit Tuncel, Ganapati Bhat, Jaehyun Park, Umit Ogras

Abstract: Energy harvesting offers an attractive and promising mechanism to power low-energy devices. However, it alone is insufficient to enable an energy-neutral operation, which can eliminate tedious battery charging and replacement requirements. Achieving an energy-neutral operation is challenging since the uncertainties in harvested energy undermine the quality of service requirements. To address this challenge, we present a runtime energy-allocation framework that optimizes the utility of the target device under energy constraints using a rollout algorithm, which is a sequential approach to solve dynamic optimization problems. The proposed framework uses an efficient iterative algorithm to compute initial energy allocations at the beginning of a day. The initial allocations are then corrected at every interval to compensate for the deviations from the expected energy harvesting pattern. We evaluate this framework using solar and motion energy harvesting modalities and American Time Use Survey data from 4772 different users. Compared to prior techniques, the proposed framework achieves up to 35% higher utility even under energy-limited scenarios. Moreover, measurements on a wearable device prototype show that the proposed framework has 1000x smaller energy overhead than iterative approaches with a negligible loss in utility.

Submitted 10 September, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

arXiv:2101.10997 [pdf, other]

Deep Burst Super-Resolution

Authors: Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract: While single-image super-resolution (SISR) has attracted substantial interest in recent years, the proposed approaches are limited to learning image priors in order to add high frequency details. In contrast, multi-frame super-resolution (MFSR) offers the possibility of reconstructing rich details by combining signal information from multiple shifted images. This key advantage, along with the increasing popularity of burst photography, has made MFSR an important problem for real-world applications. We propose a novel architecture for the burst super-resolution task. Our network takes multiple noisy RAW images as input, and generates a denoised, super-resolved RGB image as output. This is achieved by explicitly aligning deep embeddings of the input frames using pixel-wise optical flow. The information from all frames is then adaptively merged using an attention-based fusion module. In order to enable training and evaluation on real-world data, we additionally introduce the BurstSR dataset, consisting of smartphone bursts and high-resolution DSLR ground-truth. We perform comprehensive experimental analysis, demonstrating the effectiveness of the proposed architecture.

Submitted 6 April, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

arXiv:2101.02196 [pdf, other]

Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos

Authors: Bin Zhao, Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract: Segmenting objects in videos is a fundamental computer vision task. The current deep learning based paradigm offers a powerful, but data-hungry solution. However, current datasets are limited by the cost and human effort of annotating object masks in videos. This effectively limits the performance and generalization capabilities of existing video segmentation methods. To address this issue, we explore the weaker form of bounding box annotations. We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos. To this end, we propose a spatio-temporal aggregation module that effectively mines consistencies in the object and background appearance across multiple frames. We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks. We generate segmentation masks for large scale tracking datasets, using only their bounding box annotations. The additional data provides substantially better generalization performance, leading to state-of-the-art results in both the VOS and the more challenging tracking domain.

Submitted 6 January, 2021; originally announced January 2021.

arXiv:2012.06310 [pdf]

Artificial Intelligence for COVID-19 Detection -- A state-of-the-art review

Authors: Parsa Sarosh, Shabir A. Parah, Romany F Mansur, G. M. Bhat

Abstract: The emergence of COVID-19 has necessitated many efforts by the scientific community for its proper management. An urgent clinical reaction is required in the face of the unending devastation being caused by the pandemic. These efforts include technological innovations for improvement in screening, treatment, vaccine development, contact tracing, and survival prediction. The use of Deep Learning (DL) and Artificial Intelligence (AI) can be sought in all of the above-mentioned spheres. This paper aims to review the role of Deep Learning and Artificial Intelligence in various aspects of the overall COVID-19 management and particularly for COVID-19 detection and classification. The DL models are developed to analyze clinical modalities like CT scans and X-Ray images of patients and predict their pathological condition. A DL model aims to detect the COVID-19 pneumonia, classify and distinguish between COVID-19, Community-Acquired Pneumonia (CAP), Viral and Bacterial pneumonia, and normal conditions. Furthermore, sophisticated models can be built to segment the affected area in the lungs and quantify the infection volume for a better understanding of the extent of damage. Many models have been developed either independently or with the help of pre-trained models like VGG19, ResNet50, and AlexNet, leveraging the concept of transfer learning. Apart from model development, data preprocessing and augmentation are also performed to cope with the challenge of insufficient data samples often encountered in medical applications. It can be concluded that DL and AI can be effectively implemented to withstand the challenges posed by the global emergency.

Submitted 25 November, 2020; originally announced December 2020.

arXiv:2012.04479 [pdf, other]

Transfer Learning for Human Activity Recognition using Representational Analysis of Neural Networks

Authors: Sizhe An, Ganapati Bhat, Suat Gumussoy, Umit Ogras

Abstract: Human activity recognition (HAR) research has increased in recent years due to its applications in mobile health monitoring, activity recognition, and patient rehabilitation. The typical approach is training a HAR classifier offline with known users and then using the same classifier for new users. However, the accuracy for new users can be low with this approach if their activity patterns are different than those in the training data. At the same time, training from scratch for new users is not feasible for mobile applications due to the high computational cost and training time. To address this issue, we propose a HAR transfer learning framework with two components. First, a representational analysis reveals common features that can transfer across users and user-specific features that need to be customized. Using this insight, we transfer the reusable portion of the offline classifier to new users and fine-tune only the rest. Our experiments with five datasets show up to 43% accuracy improvement and 66% training time reduction when compared to the baseline without using transfer learning. Furthermore, measurements on the Nvidia Jetson Xavier-NX hardware platform reveal that the power and energy consumption decrease by 43% and 68%, respectively, while achieving the same or higher accuracy as training from scratch.

Submitted 23 February, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2008.04109 [pdf]

Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents

Authors: Abdul Mueed Hafiz, Ghulam Mohiuddin Bhat

Abstract: Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement learning (RL) use various schemes wherein the agents have to learn and communicate. The learning is however specific to each agent and communication may be satisfactorily designed for the agents. As more complex Deep Q-Networks come to the fore, the overall complexity of the multi-agent system increases, leading to issues like difficulty in training, need for higher resources and more training time, difficulty in fine-tuning, etc. To address these issues we propose a simple but efficient DQN based MAS for RL which uses shared state and rewards, but agent-specific actions, for updating the experience replay pool of the DQNs, where each agent is a DQN. The benefits of the approach are overall simplicity, faster convergence and better performance as compared to conventional DQN based approaches. It should be noted that the method can be extended to any DQN. As such we use simple DQN and DDQN (Double Q-learning) respectively on three separate tasks, i.e., Cartpole-v1 (OpenAI Gym environment), LunarLander-v2 (OpenAI Gym environment) and Maze Traversal (customized environment). The proposed approach outperforms the baseline on these tasks by decent margins respectively.

Submitted 6 August, 2020; originally announced August 2020.

arXiv:2008.00829 [pdf]

Deep Network Ensemble Learning applied to Image Classification using CNN Trees

Authors: Abdul Mueed Hafiz, Ghulam Mohiuddin Bhat

Abstract: Traditional machine learning approaches may fail to perform satisfactorily when dealing with complex data. In this context, the importance of data mining evolves w.r.t. building an efficient knowledge discovery and mining framework. Ensemble learning is aimed at integration of fusion, modeling and mining of data into a unified model. However, traditional ensemble learning methods are complex and have optimization or tuning problems. In this paper, we propose a simple, sequential, efficient, ensemble learning approach using multiple deep networks. The deep network used in the ensembles is ResNet50. The model draws inspiration from binary decision/classification trees. The proposed approach is compared against the baseline, viz. the single classifier approach, i.e., using a single multiclass ResNet50 on the ImageNet and Natural Images datasets. Our approach outperforms the baseline on all experiments on the ImageNet dataset. Code is available at https://github.com/mueedhafiz1982/CNNTreeEnsemble.git

Submitted 23 July, 2020; originally announced August 2020.

arXiv:2007.01193 [pdf]

Reinforcement Learning Based Handwritten Digit Recognition with Two-State Q-Learning

Authors: Abdul Mueed Hafiz, Ghulam Mohiuddin Bhat

Abstract: We present a simple yet efficient Hybrid Classifier based on Deep Learning and Reinforcement Learning. Q-Learning is used with two Q-states and four actions. Conventional techniques use feature maps extracted from Convolutional Neural Networks (CNNs) and include them in the Q-states along with past history. This leads to difficulties with these approaches, as the number of states is very large due to the high dimensions of the feature maps. Since our method uses only two Q-states, it is simple, has far fewer parameters to optimize, and thus has a straightforward reward function. Also, the approach uses unexplored actions for image processing vis-a-vis other contemporary techniques. Three datasets have been used for benchmarking of the approach. These are the MNIST Digit Image Dataset, the USPS Digit Image Dataset and the MATLAB Digit Image Dataset. The performance of the proposed hybrid classifier has been compared with other contemporary techniques like a well-established Reinforcement Learning Technique, AlexNet, CNN-Nearest Neighbor Classifier and CNN-Support Vector Machine Classifier. Our approach outperforms these contemporary hybrid classifiers on all the three datasets used.

Submitted 10 August, 2020; v1 submitted 28 June, 2020; originally announced July 2020.
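
The abstract above mentions Q-learning with two Q-states and four actions but does not spell out the update rule. The following is a minimal sketch of a standard tabular Q-learning update at that table size. The learning rate, discount factor, epsilon-greedy policy, and the reward value used in the example are illustrative assumptions, not details taken from the paper.

```python
import random

N_STATES = 2   # two Q-states, as stated in the abstract
N_ACTIONS = 4  # four actions, as stated in the abstract


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place.

    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q


def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over the four actions (an assumed policy)."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q[state]
    return row.index(max(row))


# A 2 x 4 table initialised to zero, followed by one hypothetical transition.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q_update(q_table, state=0, action=2, reward=1.0, next_state=1)
```

With only eight table entries, the whole value function fits in a small nested list, which is consistent with the abstract's point that two Q-states leave far fewer parameters to optimize than state spaces built from CNN feature maps.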

arXiv:2007.00047 [pdf]

doi 10.1007/s13735-020-00195-x

A Survey on Instance Segmentation: State of the art

Authors: Abdul Mueed Hafiz, Ghulam Mohiuddin Bhat

Abstract: Object detection or localization is an incremental step in progression from coarse to fine digital image inference. It not only provides the classes of the image objects, but also provides the location of the image objects which have been classified. The location is given in the form of bounding boxes or centroids. Semantic segmentation gives fine inference by predicting labels for every pixel in the input image. Each pixel is labelled according to the object class within which it is enclosed. Furthering this evolution, instance segmentation gives different labels for separate instances of objects belonging to the same class. Hence, instance segmentation may be defined as the technique of simultaneously solving the problem of object detection as well as that of semantic segmentation. In this survey paper on instance segmentation, its background, issues, techniques, evolution, popular datasets, related work up to the state of the art and future scope are discussed. The paper provides valuable information for those who want to do research in the field of instance segmentation.

Submitted 28 June, 2020; originally announced July 2020.

Comments: Int J Multimed Info Retr (2020)

arXiv:2007.00046 [pdf]

Fast Training of Deep Networks with One-Class CNNs

Authors: Abdul Mueed Hafiz, Ghulam Mohiuddin Bhat

Abstract: One-class CNNs have shown promise in novelty detection. However, very little work has been done on extending them to multiclass classification. The proposed approach is a viable effort in this direction. It uses one-class CNNs, i.e., it trains one CNN per class, for multiclass classification. An ensemble of such one-class CNNs is used for multiclass classification. The benefits of the approach are generally better recognition accuracy while taking about half to two-thirds of the training time of a conventional multi-class deep network. The proposed approach has been applied successfully to face recognition and object recognition tasks. For face recognition, a 1000-frame RGB video, featuring many faces together, has been used for benchmarking of the proposed approach. Its database is available on request via e-mail. For object recognition, the Caltech-101 Image Database and 17Flowers Dataset have also been used. The experimental results support the claims made.

Submitted 22 July, 2020; v1 submitted 28 June, 2020; originally announced July 2020.

Comments: Camera Ready: 2nd International Conference on Cybernetics, Cognition and Machine Learning Applications(ICCCMLA), 2020, India

arXiv:2004.14532 [pdf, other]

Hierarchical Encoders for Modeling and Interpreting Screenplays

Authors: Gayatri Bhat, Avneesh Saluja, Melody Dye, Jan Florjanczyk

Abstract: While natural language understanding of long-form documents is still an open challenge, such documents often contain structural information that can inform the design of models for encoding them. Movie scripts are an example of such richly structured text - scripts are segmented into scenes, which are further decomposed into dialogue and descriptive components. In this work, we propose a neural architecture for encoding this structure, which performs robustly on a pair of multi-label tag classification datasets, without the need for handcrafted features. We add a layer of insight by augmenting an unsupervised "interpretability" module to the encoder, allowing for the extraction and visualization of narrative trajectories. Though this work specifically tackles screenplays, we discuss how the underlying approach can be generalized to a range of structured documents.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 12 pages, including references and appendix

arXiv:2003.11540 [pdf, other]

Learning What to Learn for Video Object Segmentation

Authors: Goutam Bhat, Felix Järemo Lawin, Martin Danelljan, Andreas Robinson, Michael Felsberg, Luc Van Gool, Radu Timofte

Abstract: Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined during inference with a given first-frame reference mask. The problem of how to capture and utilize this limited target information remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module. This internal learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond standard few-shot learning techniques by learning what the few-shot learner should learn. This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach. We perform extensive experiments on multiple benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result.

Submitted 1 May, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

Comments: First two authors contributed equally

arXiv:2003.11014 [pdf, other]

Know Your Surroundings: Exploiting Scene Information for Object Tracking

Authors: Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract: Current state-of-the-art trackers only rely on a target appearance model in order to localize the object in each frame. Such approaches are however prone to fail in case of e.g. fast appearance changes or presence of distractor objects, where a target appearance model alone is insufficient for robust tracking. Having the knowledge about the presence and locations of other objects in the surrounding scene can be highly beneficial in such cases. This scene information can be propagated through the sequence and used to, for instance, explicitly avoid distractor objects and eliminate target candidate regions. In this work, we propose a novel tracking architecture which can utilize scene information for tracking. Our tracker represents such information as dense localized state vectors, which can encode, for example, if the local region is target, background, or distractor. These state vectors are propagated through the sequence and combined with the appearance model output to localize the target. Our network is learned to effectively utilize the scene information by directly maximizing tracking performance on video segments. The proposed approach sets a new state-of-the-art on 3 tracking benchmarks, achieving an AO score of 63.6% on the recent GOT-10k dataset.

Submitted 1 May, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

arXiv:2003.09526 [pdf, other]

An Energy-Aware Online Learning Framework for Resource Management in Heterogeneous Platforms

Authors: Sumit K. Mandal, Ganapati Bhat, Janardhan Rao Doppa, Partha Pratim Pande, Umit Y. Ogras

Abstract: Mobile platforms must satisfy the contradictory requirements of fast response time and minimum energy consumption as a function of dynamically changing applications. To address this need, system-on-chips (SoC) that are at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime is challenging for two reasons. First, the large configuration space prohibits exhaustive solutions. Second, control policies designed offline are at best sub-optimal since many potential new applications are unknown at design-time. We address these challenges by proposing an online imitation learning approach. Our key idea is to construct an offline policy and adapt it online to new applications to optimize a given metric (e.g., energy). The proposed methodology leverages the supervision enabled by power-performance models learned at runtime. We demonstrate its effectiveness on a commercial mobile platform with 16 diverse benchmarks. Our approach successfully adapts the control policy to an unknown application after executing less than 25% of its instructions.

Submitted 20 March, 2020; originally announced March 2020.

Comments: This paper has been accepted to be published in a future issue of ACM TODAES

arXiv:1909.12297 [pdf, other]

Energy-Based Models for Deep Probabilistic Regression

Authors: Fredrik K. Gustafsson, Martin Danelljan, Goutam Bhat, Thomas B. Schön

Abstract: While deep learning-based classification is generally tackled using standardized approaches, a wide variety of techniques are employed for regression. In computer vision, one particularly popular such technique is that of confidence-based regression, which entails predicting a confidence value for each input-target pair (x,y). While this approach has demonstrated impressive results, it requires important task-dependent design choices, and the predicted confidences lack a natural probabilistic meaning. We address these issues by proposing a general and conceptually simple regression method with a clear probabilistic interpretation. In our proposed approach, we create an energy-based model of the conditional target density p(y|x), using a deep neural network to predict the un-normalized density from (x,y). This model of p(y|x) is trained by directly minimizing the associated negative log-likelihood, approximated using Monte Carlo sampling. We perform comprehensive experiments on four computer vision regression tasks. Our approach outperforms direct regression, as well as other probabilistic and confidence-based methods. Notably, our model achieves a 2.2% AP improvement over Faster-RCNN for object detection on the COCO dataset, and sets a new state-of-the-art on visual tracking when applied for bounding box estimation. In contrast to confidence-based methods, our approach is also shown to be directly applicable to more general tasks such as age and head-pose estimation. Code is available at https://github.com/fregu856/ebms_regression.

Submitted 19 July, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: ECCV 2020. Code is available at https://github.com/fregu856/ebms_regression
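
The abstract above says the negative log-likelihood of the energy-based model p(y|x) is approximated using Monte Carlo sampling. One common way to do this, sketched below, is to estimate the normalizing constant with importance sampling. The Gaussian proposal centred on the label, the sample count, and the toy score function in the example are assumptions made for illustration. They are not taken from the abstract, and a real model would use a neural network for the score.

```python
import math
import random


def gaussian_pdf(y, mean, std):
    """Density of a 1-D Gaussian, used as the (assumed) proposal q(y)."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def ebm_nll(score_fn, x, y_true, num_samples=1024, proposal_std=1.0, rng=None):
    """Monte Carlo estimate of -log p(y_true | x) for p(y|x) = exp(f(x,y)) / Z(x).

    Z(x) is estimated by importance sampling:
        Z(x) ~= (1/M) * sum_m exp(f(x, y_m)) / q(y_m),  with y_m drawn from q.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(num_samples):
        y = rng.gauss(y_true, proposal_std)
        total += math.exp(score_fn(x, y)) / gaussian_pdf(y, y_true, proposal_std)
    log_z = math.log(total / num_samples)
    return log_z - score_fn(x, y_true)


def toy_score(x, y):
    """Hypothetical un-normalized log-density: a unit Gaussian centred at x."""
    return -0.5 * (y - x) ** 2


nll = ebm_nll(toy_score, x=2.0, y_true=2.0)
```

For this toy score the normalized density is a unit Gaussian, so the estimate can be checked against the closed form 0.5 * log(2 * pi) at y = x.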

arXiv:1907.10129 [pdf, other]

CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology

Authors: Aditi Chaudhary, Elizabeth Salesky, Gayatri Bhat, David R. Mortensen, Jaime G. Carbonell, Yulia Tsvetkov

Abstract: This paper presents the submission by the CMU-01 team to the SIGMORPHON 2019 task 2 of Morphological Analysis and Lemmatization in Context. This task requires us to produce the lemma and morpho-syntactic description of each token in a sequence, for 107 treebanks. We approach this task with a hierarchical neural conditional random field (CRF) model which predicts each coarse-grained feature (e.g., POS, Case, etc.) independently. However, most treebanks are under-resourced, thus making it challenging to train deep neural models for them. Hence, we propose a multi-lingual transfer training regime where we transfer from multiple related languages that share similar typology.

Submitted 23 July, 2019; originally announced July 2019.

Comments: In Proceedings of the ACL-SIGMORPHON 2019 Shared Task: Crosslinguality and Context in Morphology

arXiv:1904.09814 [pdf, other]

Power and Thermal Analysis of Commercial Mobile Platforms: Experiments and Case Studies

Authors: Ganapati Bhat, Suat Gumussoy, Umit Y. Ogras

Abstract: State-of-the-art mobile processors can deliver fast response time and high throughput to maximize the user experience. However, high performance comes at the expense of larger power density, which leads to higher skin temperatures. Since this can degrade the user experience, there is a strong need for power consumption and thermal analysis in mobile processors. In this paper, we first perform experiments on the Nexus 6P phone to study the power, performance and thermal behavior of modern smartphones. Using the insight from these experiments, we propose a control algorithm that throttles select applications without affecting other apps. We demonstrate our governor on the Exynos 5422 processor employed in the Odroid-XU3 board.

Submitted 19 March, 2019; originally announced April 2019.

Comments: To appear in proceedings of IEEE DATE 2019

arXiv:1904.07220 [pdf, other]

Learning Discriminative Model Prediction for Tracking

Authors: Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract: The current strive towards end-to-end trainable computer vision systems imposes major challenges for the task of visual tracking. In contrast to most other vision problems, tracking requires the learning of a robust target-specific appearance model online, during the inference stage. To be end-to-end trainable, the online learning of the target model thus needs to be embedded in the tracking architecture itself. Due to the imposed challenges, the popular Siamese paradigm simply predicts a target feature template, while ignoring the background appearance information during inference. Consequently, the predicted model possesses limited target-background discriminability. We develop an end-to-end tracking architecture, capable of fully exploiting both target and background appearance information for target model prediction. Our architecture is derived from a discriminative learning loss by designing a dedicated optimization process that is capable of predicting a powerful model in only a few iterations. Furthermore, our approach is able to learn key aspects of the discriminative loss itself. The proposed tracker sets a new state-of-the-art on 6 tracking benchmarks, achieving an EAO score of 0.440 on VOT2018, while running at over 40 FPS. The code and models are available at https://github.com/visionml/pytracking.

Submitted 8 June, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

arXiv:1904.04164 [pdf, other]

Contextual Affective Analysis: A Case Study of People Portrayals in Online #MeToo Stories

Authors: Anjalie Field, Gayatri Bhat, Yulia Tsvetkov

Abstract: In October 2017, numerous women accused producer Harvey Weinstein of sexual harassment. Their stories encouraged other women to voice allegations of sexual harassment against many high profile men, including politicians, actors, and producers. These events are broadly referred to as the #MeToo movement, named for the use of the hashtag "#metoo" on social media platforms like Twitter and Facebook. The movement has widely been referred to as "empowering" because it has amplified the voices of previously unheard women over those of traditionally powerful men. In this work, we investigate dynamics of sentiment, power and agency in online media coverage of these events. Using a corpus of online media articles about the #MeToo movement, we present a contextual affective analysis, an entity-centric approach that uses contextualized lexicons to examine how people are portrayed in media articles. We show that while these articles are sympathetic towards women who have experienced sexual harassment, they consistently present men as most powerful, even after sexual assault allegations. While we focus on media coverage of the #MeToo movement, our method for contextual affective analysis readily generalizes to other domains.

Submitted 8 April, 2019; originally announced April 2019.

Comments: Accepted to ICWSM 2019

arXiv:1903.03168 [pdf, other]

OpenHealth: Open Source Platform for Wearable Health Monitoring

Authors: Ganapati Bhat, Ranadeep Deb, Umit Y. Ogras

Abstract: Movement disorders are becoming one of the leading causes of functional disability due to aging populations and extended life expectancy. Wearable health monitoring is emerging as an effective way to augment clinical care for movement disorders. However, wearable devices face a number of adaptation and technical challenges that hinder their widespread adoption. To address these challenges, we introduce OpenHealth, an open source platform for wearable health monitoring. OpenHealth aims to design a standard set of hardware/software and wearable devices that can enable autonomous collection of clinically relevant data. The OpenHealth platform includes a wearable device, standard software interfaces and reference implementations of human activity and gesture recognition applications.

Submitted 16 March, 2019; v1 submitted 18 February, 2019; originally announced March 2019.

Comments: To appear in a future issue of IEEE Design & Test

arXiv:1812.03916 [pdf]

doi 10.1109/LSP.2017.2750979

An individualized super Gaussian single microphone Speech Enhancement for hearing aid users with smartphone as an assistive device

Authors: Chandan K A Reddy, Nikhil Shankar, Gautam Bhat, Ram Charan, Issa Panahi

Abstract: In this letter, we derive a new super Gaussian Joint Maximum a Posteriori based single microphone speech enhancement gain function. The developed Speech Enhancement method is implemented on a smartphone, and this arrangement functions as an assistive device to hearing aids. We introduce a tradeoff parameter in the derived gain function that allows the smartphone user to customize their listening preference, by controlling the amount of noise suppression and speech distortion in real-time based on their level of hearing comfort perceived in noisy real world acoustic environment. Objective quality and intelligibility measures show the effectiveness of the proposed method in comparison to benchmark techniques considered in this paper. Subjective results reflect the usefulness of the developed Speech Enhancement application in real-world noisy conditions at signal to noise ratio levels of 0 dB and 5 dB.

Submitted 10 December, 2018; originally announced December 2018.

Comments: 5 pages

arXiv:1812.03914 [pdf]

A Computationally Efficient and Practically Feasible Two Microphones Blind Speech Separation Method

Authors: Chandan K A Reddy, Gautam Bhat, Nikhil Shankar, Issa Panahi

Abstract: Traditionally, Blind Speech Separation techniques are computationally expensive as they update the demixing matrix at every time frame index, making them impractical to use in many Real-Time applications. In this paper, a robust data-driven two-microphone sound source localization method is used as a criterion to reduce the computational complexity of the Independent Vector Analysis (IVA) Blind Speech Separation (BSS) method. IVA is used to separate convolutedly mixed speech and noise sources. The practical feasibility of the proposed method is proved by implementing it on a smartphone device to separate speech and noise in Real-World scenarios for Hearing-Aid applications. The experimental results with objective and subjective tests reveal the practical usability of the developed method in many real-world applications.

Submitted 10 December, 2018; originally announced December 2018.

Comments: 5 pages

arXiv:1811.07628 [pdf, other]

ATOM: Accurate Tracking by Overlap Maximization

Authors: Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg

Abstract: While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. In fact, most trackers resort to a simple multi-scale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object. We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. High level knowledge is incorporated into the target estimation through extensive offline learning. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating target-specific information, our approach achieves previously unseen bounding box accuracy. We further introduce a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework sets a new state-of-the-art on five challenging benchmarks. On the new large-scale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15% over the previous best approach, while running at over 30 FPS. Code and models are available at https://github.com/visionml/pytracking.

Submitted 11 April, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: CVPR 2019 (Oral). Complete code and models are available at https://github.com/visionml/pytracking
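
The ATOM abstract above describes a target estimation component trained to predict the overlap between the target object and an estimated bounding box. Assuming "overlap" means the usual intersection-over-union (IoU) of two axis-aligned boxes, the quantity being predicted can be sketched as follows. The (x_min, y_min, x_max, y_max) box layout is an assumption chosen for this example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the intersection rectangle, clamped at zero
    # so that disjoint boxes give an empty intersection.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 corner: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A network trained to regress this value for candidate boxes can then rank or refine box proposals at inference time without access to the ground truth.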

arXiv:1808.08615 [pdf, other]

doi 10.1145/3240765.3240833

Online Human Activity Recognition using Low-Power Wearable Devices

Authors: Ganapati Bhat, Ranadeep Deb, Vatika Vardhan Chaurasia, Holly Shill, Umit Y. Ogras

Abstract: Human activity recognition (HAR) has attracted significant research interest due to its applications in health monitoring and patient rehabilitation. Recent research on HAR focuses on using smartphones due to their widespread use. However, this leads to inconvenient use, limited choice of sensors and inefficient use of resources, since smartphones are not designed for HAR. This paper presents the first HAR framework that can perform both online training and inference. The proposed framework starts with a novel technique that generates features using the fast Fourier and discrete wavelet transforms of a textile-based stretch sensor and accelerometer. Using these features, we design an artificial neural network classifier which is trained online using the policy gradient algorithm. Experiments on a low power IoT device (TI-CC2650 MCU) with nine users show 97.7% accuracy in identifying six activities and their transitions with less than 12.5 mW power consumption.

Submitted 4 February, 2019; v1 submitted 26 August, 2018; originally announced August 2018.

Comments: This is in proceedings of ICCAD 2018. The datasets are available at https://github.com/gmbhat/human-activity-recognition

arXiv:1804.06833 [pdf, other]

Unveiling the Power of Deep Tracking

Authors: Goutam Bhat, Joakim Johnander, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg

Abstract: In the field of generic object tracking numerous attempts have been made to exploit deep features. Despite all expectations, deep trackers are yet to reach an outstanding level of performance compared to methods solely based on handcrafted features. In this paper, we investigate this key issue and propose an approach to unlock the true potential of deep features for tracking. We systematically study the characteristics of both deep and shallow features, and their relation to tracking accuracy and robustness. We identify the limited data and low spatial resolution as the main challenges, and propose strategies to counter these issues when integrating deep features for tracking. Furthermore, we propose a novel adaptive fusion approach that leverages the complementary properties of deep and shallow features to improve both robustness and accuracy. Extensive experiments are performed on four challenging datasets. On VOT2017, our approach significantly outperforms the top performing tracker from the challenge with a relative gain of 17% in EAO.

Submitted 18 April, 2018; originally announced April 2018.

arXiv:1705.03428 [pdf, other]

Deep Projective 3D Semantic Segmentation

Authors: Felix Järemo Lawin, Martin Danelljan, Patrik Tosteberg, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg

Abstract: Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelizations of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets. In this paper, we propose an alternative framework that avoids the limitations of 3D-CNNs. Instead of directly solving the problem in 3D, we first project the point cloud onto a set of synthetic 2D-images. These images are then used as input to a 2D-CNN, designed for semantic segmentation. Finally, the obtained prediction scores are re-projected to the point cloud to obtain the segmentation results. We further investigate the impact of multiple modalities, such as color, depth and surface normals, in a multi-stream network architecture. Experiments are performed on the recent Semantic3D dataset. Our approach sets a new state-of-the-art by achieving a relative gain of 7.9%, compared to the previous best approach.

Submitted 9 May, 2017; originally announced May 2017.

Comments: Submitted to CAIP 2017

arXiv:1612.04538 [pdf, ps, other]

Grammatical Constraints on Intra-sentential Code-Switching: From Theories to Working Models

Authors: Gayatri Bhat, Monojit Choudhury, Kalika Bali

Abstract: We make one of the first attempts to build working models for intra-sentential code-switching based on the Equivalence-Constraint (Poplack 1980) and Matrix-Language (Myers-Scotton 1993) theories. We conduct a detailed theoretical analysis, and a small-scale empirical study of the two models for Hindi-English CS. Our analyses show that the models are neither sound nor complete. Taking insights from the errors made by the models, we propose a new model that combines features of both the theories.

Submitted 14 December, 2016; originally announced December 2016.

Comments: 13 pages

arXiv:1611.09224 [pdf, other]

ECO: Efficient Convolution Operators for Tracking

Authors: Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg

Abstract: In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model; (ii) a compact generative model of the training sample distribution, that significantly reduces memory and time complexity, while providing better diversity of samples; (iii) a conservative model update strategy with improved robustness and reduced complexity. We perform comprehensive experiments on four benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor. When using expensive deep features, our tracker provides a 20-fold speedup and achieves a 13.0% relative gain in Expected Average Overlap compared to the top ranked method in the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.

Submitted 10 April, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

Comments: Accepted at CVPR 2017. Includes supplementary material
