DOI: 10.1145/3664647.3681011
Research Article

AesMamba: Universal Image Aesthetic Assessment with State Space Models

Published: 28 October 2024

Abstract

Image Aesthetic Assessment (IAA) aims to objectively predict generic or personalized evaluations of overall aesthetics or fine-grained multi-attributes, based on visual or multimodal inputs. Previously, researchers have designed diverse, specialized methods for specific IAA tasks, each tailored to a particular input-output setting. Is it possible to design a universal IAA framework applicable to the whole IAA task taxonomy? In this paper, we explore this issue and propose a modular IAA framework, dubbed AesMamba. Specifically, we use the Visual State Space Model (VMamba), instead of CNNs or ViTs, to learn comprehensive representations of aesthetics-related attributes, because VMamba can efficiently achieve both global and local effective receptive fields. A modal-adaptive module then automatically produces integrated representations, conditioned on the type of input. In the prediction module, we propose a Multitask Balanced Adaptation (MBA) module to boost task-specific features, with emphasis on tail instances. Finally, we formulate the personalized IAA task as a multimodal learning problem by converting a user's anonymous subjective characteristics into a text prompt. This prompting strategy effectively exploits the semantics of flexibly selected characteristics to infer individual preferences. AesMamba can be applied to diverse IAA tasks through flexible combination of these modules. Extensive experiments on numerous datasets demonstrate that AesMamba consistently achieves superior or competitive performance on all IAA tasks, in comparison with previous SOTA methods. The code has been released at https://github.com/AiArt-Gao/AesMamba.
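
To make the modular design described above concrete, the sketch below wires together a visual feature branch, an optional text branch, a modal-adaptive fusion step, a multitask prediction head, and the prompt construction used for personalized IAA. It is a minimal PyTorch-style illustration of the abstract only, not the released AesMamba code: the VMamba backbone is stubbed with random features, and the names build_user_prompt, ModalAdaptiveFusion, and MBAHead are hypothetical placeholders.

```python
# Minimal, hypothetical sketch of the modular IAA pipeline described in the
# abstract. NOT the released AesMamba code: the VMamba backbone is stubbed
# with random features, and every class/function name below is a placeholder.
from typing import Dict, List, Optional

import torch
import torch.nn as nn


def build_user_prompt(traits: Dict[str, str]) -> str:
    """Serialize a user's anonymous subjective characteristics into a text
    prompt, so personalized IAA can be handled as a multimodal problem."""
    described = ", ".join(f"{k}: {v}" for k, v in traits.items())
    return f"A user with {described} rates the aesthetics of this photo."


class ModalAdaptiveFusion(nn.Module):
    """Produce an integrated representation conditioned on the input type:
    visual-only, or visual plus text."""

    def __init__(self, dim: int):
        super().__init__()
        self.visual_only = nn.Linear(dim, dim)
        self.multimodal = nn.Linear(2 * dim, dim)

    def forward(self, vis: torch.Tensor, txt: Optional[torch.Tensor] = None) -> torch.Tensor:
        if txt is None:
            return self.visual_only(vis)
        return self.multimodal(torch.cat([vis, txt], dim=-1))


class MBAHead(nn.Module):
    """Stand-in for the Multitask Balanced Adaptation head: one lightweight
    branch per task (overall score plus fine-grained attributes)."""

    def __init__(self, dim: int, tasks: List[str]):
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Linear(dim, 1) for t in tasks})

    def forward(self, feat: torch.Tensor) -> Dict[str, torch.Tensor]:
        return {t: head(feat) for t, head in self.heads.items()}


if __name__ == "__main__":
    dim = 64
    # Stand-ins for VMamba visual features and text-encoder features.
    visual_feat = torch.randn(2, dim)
    text_feat = torch.randn(2, dim)

    fusion = ModalAdaptiveFusion(dim)
    head = MBAHead(dim, tasks=["overall", "color", "composition"])

    fused = fusion(visual_feat, text_feat)   # multimodal branch
    preds = head(fused)                      # one prediction per task
    print({task: out.shape for task, out in preds.items()})

    # Personalized IAA: user characteristics become a text prompt.
    print(build_user_prompt({"age": "25-30", "photography experience": "amateur"}))
```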

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. image aesthetic assessment
    2. imbalanced learning
    3. multimodal learning
    4. multitask learning
    5. state space model

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne, VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%
    Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
