
HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities

Authors:

Devanshu Arya,

Stevan Rudinac,

Marcel Worring

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

Pages 2245 - 2253

https://doi.org/10.1145/3343031.3350572

Published: 15 October 2019

Abstract

Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels. These channels can encode both (a) inter-relations between items of different modalities and (b) intra-relations between items of the same modality. Encoding multimedia items into a continuous low-dimensional semantic space such that both types of relations are captured and preserved is extremely challenging, especially if the goal is a unified end-to-end learning framework. Two key challenges need to be addressed: 1) the framework must be able to merge complex intra- and inter-relations without losing valuable information, and 2) the learning model should be invariant to the addition of new and potentially very different modalities. In this paper, we propose a flexible framework that can scale to data streams from many modalities. To that end, we introduce a hypergraph-based model for data representation and deploy Graph Convolutional Networks to fuse relational information within and across modalities. Our approach provides an efficient solution for distributing otherwise extremely computationally expensive or even infeasible training processes across multiple GPUs, without any sacrifice in accuracy. Moreover, adding a new modality to our model requires only an additional GPU unit while keeping the computational time unchanged, which brings representation learning to truly multimodal datasets. We demonstrate the feasibility of our approach in experiments on multimedia datasets featuring second-, third-, and fourth-order relations.
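The abstract does not spell out the hypergraph formulation, so the following is only a generic illustration of the idea, not the paper's architecture. It assumes a hyperedge links one item from each participating modality (for example a user, an image, and a tag form a third-order relation) and applies one parameter-free smoothing step, X' = Dv^-1 H De^-1 H^T X, of the kind commonly used in hypergraph convolutions. The function names `build_incidence` and `hypergraph_propagate`, the node numbering, and the toy features are all hypothetical.

```python
def build_incidence(num_nodes, hyperedges):
    """Return the |V| x |E| incidence matrix H as nested lists.

    H[v][e] is 1 when node v belongs to hyperedge e, else 0.
    """
    H = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def hypergraph_propagate(H, X):
    """One smoothing step, X' = Dv^-1 H De^-1 H^T X (no learned weights).

    Assumption: this is a textbook-style normalisation chosen for
    illustration; the paper may use a different operator.
    """
    num_nodes, num_edges = len(H), len(H[0])
    dim = len(X[0])

    # Stage 1: every hyperedge averages the features of its member nodes.
    edge_feats = []
    for e in range(num_edges):
        members = [v for v in range(num_nodes) if H[v][e]]
        if not members:
            edge_feats.append([0.0] * dim)
            continue
        edge_feats.append(
            [sum(X[v][d] for v in members) / len(members) for d in range(dim)]
        )

    # Stage 2: every node averages the features of its incident hyperedges.
    out = []
    for v in range(num_nodes):
        incident = [e for e in range(num_edges) if H[v][e]]
        if not incident:
            out.append(list(X[v]))  # isolated nodes keep their features
            continue
        out.append(
            [sum(edge_feats[e][d] for e in incident) / len(incident)
             for d in range(dim)]
        )
    return out


# Toy data (hypothetical): nodes 0-1 are users, 2-3 are images, 4 is a tag.
# Each hyperedge is a third-order (user, image, tag) relation.
hyperedges = [(0, 2, 4), (1, 3, 4)]
H = build_incidence(5, hyperedges)
X = [[1.0], [0.0], [1.0], [0.0], [0.5]]
X_new = hypergraph_propagate(H, X)
```

In this toy example the shared tag (node 4) belongs to both hyperedges, so its updated feature mixes information from both user-image pairs. A fourth modality would simply add one more member to each hyperedge without changing the propagation code.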

