Research article · DOI: 10.1145/3609395.3610597

LiveAE: Attention-based and Edge-assisted Viewport Prediction for Live 360° Video Streaming

Published: 26 September 2023

Abstract

Viewport prediction plays a crucial role in live 360° video streaming as it determines which tiles should be prefetched in high quality, thereby significantly impacting the user experience. However, the current approach to viewport prediction, which integrates content-level visual features with the viewer's head movement trajectory, faces the challenge of striking a balance between prediction accuracy and computational complexity. In this paper, we propose LiveAE, a novel attention-based and edge-assisted viewport prediction framework for live 360° video streaming. Specifically, we employ a pre-trained video encoder called Vision Transformer (ViT) for general visual feature extraction and a cross-attention mechanism for user-specific interest tracking. To address the computational complexity issue, we offload the aforementioned content-level operations to an edge server while retaining trajectory-related functions on the client side. Extensive experiments show that our proposed method not only outperforms state-of-the-art algorithms but also ensures the real-time requirements of live 360° video streaming.
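The fusion step described above — a cross-attention mechanism in which the viewer's head-movement trajectory attends to visual features extracted by a pre-trained ViT — can be sketched as a single scaled dot-product cross-attention layer. This is an illustrative numpy sketch, not the paper's implementation; the dimensions (8 trajectory steps, 196 ViT patch tokens, 64-d embeddings) and the function names are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Here the trajectory embeddings act as queries and the ViT patch
    features act as keys/values, so each head-pose step selects the
    image regions most relevant to the viewer's interest.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (T_q, N_patches)
    weights = softmax(scores, axis=-1)       # attention over patches
    return weights @ values                  # (T_q, d) fused features

# Hypothetical shapes: 8 past head-pose steps, 14x14 = 196 ViT patch tokens.
rng = np.random.default_rng(0)
traj_emb = rng.normal(size=(8, 64))
patch_feats = rng.normal(size=(196, 64))
fused = cross_attention(traj_emb, patch_feats, patch_feats)
print(fused.shape)  # (8, 64)
```

In the edge-assisted split the paper describes, the ViT encoding (producing `patch_feats`) would run on the edge server, while the lightweight trajectory embedding and final prediction head stay on the client.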


    Published In

    EMS '23: Proceedings of the 2023 Workshop on Emerging Multimedia Systems
    September 2023
    65 pages
    ISBN: 9798400703034
    DOI: 10.1145/3609395


    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. viewport prediction
    2. 360° videos
    3. live video streaming




    Acceptance Rates

    Overall acceptance rate: 9 of 15 submissions, 60%
