DOI: 10.1145/3458305.3463378
Research Article · Public Access

LiveROI: region of interest analysis for viewport prediction in live mobile virtual reality streaming

Published: 15 July 2021

Abstract

Virtual reality (VR) streaming can provide an immersive video viewing experience to end users, but at the cost of huge bandwidth consumption. Recent research has adopted selective streaming to address the bandwidth challenge: the user's viewport of interest is predicted and streamed at high quality, while the other portions of the video are streamed at low quality. However, existing viewport prediction mechanisms mainly target the video-on-demand (VOD) scenario, relying on historical video and user trace data to build the prediction model. The community still lacks an effective viewport prediction approach for live VR streaming, the most engaging and popular VR streaming experience. We develop a region of interest (ROI)-based viewport prediction approach, namely LiveROI, for live VR streaming. LiveROI employs an action recognition algorithm to analyze the video content and uses the analysis results as the basis of viewport prediction. To eliminate the need for historical video/user data, LiveROI employs adaptive user preference modeling and word embedding to dynamically select the video viewport at runtime based on the user's head orientation. We evaluate LiveROI with 12 VR videos viewed by 48 users from a public VR head movement dataset. The results show that LiveROI achieves high prediction accuracy and significant bandwidth savings, with real-time processing that supports live VR streaming.
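As a rough illustration of the content-driven selection the abstract describes, the sketch below scores candidate tiles against an adaptively updated user preference vector and picks the best-matching one as the predicted viewport. This is a minimal sketch, not the authors' implementation: the tile layout, embedding size, update rule, and the placeholder embeddings (which in the actual system would come from an action recognition network plus word2vec) are all assumptions introduced here for illustration.

```python
# Minimal sketch of the idea in the LiveROI abstract, NOT the paper's implementation:
# per-tile action labels (represented here by placeholder embedding vectors) are
# scored against a user preference vector that adapts to the user's head orientation.
import numpy as np

EMBED_DIM = 300   # word2vec-style embedding size (assumption)
NUM_TILES = 8     # number of spatial tiles per video segment (assumption)

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Placeholder per-tile action embeddings for the current segment. In the real
# system these would be word2vec vectors of the action labels produced by an
# action recognition model run on each tile.
tile_embeddings = rng.normal(size=(NUM_TILES, EMBED_DIM))

# User preference vector, updated online from the user's head orientation.
preference = np.zeros(EMBED_DIM)

def update_preference(viewed_tile, alpha=0.3):
    """Blend the embedding of the tile the user is currently looking at
    (derived from head orientation) into the preference vector."""
    global preference
    preference = (1 - alpha) * preference + alpha * tile_embeddings[viewed_tile]

def predict_viewport():
    """Return the tile whose content best matches the current user preference."""
    scores = [cosine(tile_embeddings[t], preference) for t in range(NUM_TILES)]
    return int(np.argmax(scores))

# Toy run: if the user keeps looking at tile 3, the prediction converges to it.
for _ in range(5):
    update_preference(viewed_tile=3)
print("predicted viewport tile:", predict_viewport())
```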



    Published In

    MMSys '21: Proceedings of the 12th ACM Multimedia Systems Conference
    June 2021
    254 pages
    ISBN:9781450384346
    DOI:10.1145/3458305
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 July 2021


    Author Tags

    1. live video streaming
    2. viewport prediction
    3. virtual reality

    Qualifiers

    • Research-article

    Conference

    MMSys '21: 12th ACM Multimedia Systems Conference
    September 28 - October 1, 2021
    Istanbul, Turkey

    Acceptance Rates

    MMSys '21 paper acceptance rate: 18 of 55 submissions (33%)
    Overall acceptance rate: 176 of 530 submissions (33%)

