DOI: 10.1145/3458305.3463378
Research Article · Public Access

LiveROI: region of interest analysis for viewport prediction in live mobile virtual reality streaming

Published: 15 July 2021

Abstract

Virtual reality (VR) streaming can provide an immersive video viewing experience to end users, but at the cost of huge bandwidth consumption. Recent research has adopted selective streaming to address the bandwidth challenge: the user's viewport of interest is predicted and streamed at high quality, while the other portions of the video are streamed at low quality. However, existing viewport prediction mechanisms mainly target the video-on-demand (VOD) scenario, relying on historical video and user trace data to build the prediction model. The community still lacks an effective viewport prediction approach for live VR streaming, the most engaging and popular VR streaming experience. We develop a region of interest (ROI)-based viewport prediction approach, namely LiveROI, for live VR streaming. LiveROI employs an action recognition algorithm to analyze the video content and uses the analysis results as the basis of viewport prediction. To eliminate the need for historical video/user data, LiveROI employs adaptive user preference modeling and word embedding to dynamically select the video viewport at runtime based on the user's head orientation. We evaluate LiveROI with 12 VR videos viewed by 48 users from a public VR head movement dataset. The results show that LiveROI achieves high prediction accuracy and significant bandwidth savings, with real-time processing that supports live VR streaming.
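As a rough illustration of the content-driven selection the abstract describes, the sketch below scores candidate tiles against an adaptively updated user preference vector and picks the best-matching one as the predicted viewport. This is a minimal sketch, not the authors' implementation: the tile layout, embedding size, update rule, and the placeholder embeddings (which in the actual system would come from an action recognition network plus word2vec) are all assumptions introduced here for illustration.

```python
# Minimal sketch of the idea in the LiveROI abstract, NOT the paper's implementation:
# per-tile action labels (represented here by placeholder embedding vectors) are
# scored against a user preference vector that adapts to the user's head orientation.
import numpy as np

EMBED_DIM = 300   # word2vec-style embedding size (assumption)
NUM_TILES = 8     # number of spatial tiles per video segment (assumption)

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Placeholder per-tile action embeddings for the current segment. In the real
# system these would be word2vec vectors of the action labels produced by an
# action recognition model run on each tile.
tile_embeddings = rng.normal(size=(NUM_TILES, EMBED_DIM))

# User preference vector, updated online from the user's head orientation.
preference = np.zeros(EMBED_DIM)

def update_preference(viewed_tile, alpha=0.3):
    """Blend the embedding of the tile the user is currently looking at
    (derived from head orientation) into the preference vector."""
    global preference
    preference = (1 - alpha) * preference + alpha * tile_embeddings[viewed_tile]

def predict_viewport():
    """Return the tile whose content best matches the current user preference."""
    scores = [cosine(tile_embeddings[t], preference) for t in range(NUM_TILES)]
    return int(np.argmax(scores))

# Toy run: if the user keeps looking at tile 3, the prediction converges to it.
for _ in range(5):
    update_preference(viewed_tile=3)
print("predicted viewport tile:", predict_viewport())
```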



    Published In

    MMSys '21: Proceedings of the 12th ACM Multimedia Systems Conference
    June 2021
    254 pages
    ISBN:9781450384346
    DOI:10.1145/3458305
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 July 2021


    Author Tags

    1. live video streaming
    2. viewport prediction
    3. virtual reality

    Qualifiers

    • Research-article

    Conference

    MMSys '21: 12th ACM Multimedia Systems Conference
    September 28 - October 1, 2021
    Istanbul, Turkey

    Acceptance Rates

    MMSys '21 paper acceptance rate: 18 of 55 submissions (33%)
    Overall acceptance rate: 176 of 530 submissions (33%)

