DOI: 10.1145/3329714.3338134
research-article

Enhancing the AR Experience with Machine Learning Services

Published: 26 July 2019

Abstract

In this paper, we present and evaluate a web service that offers cloud-based machine learning services to improve Augmented Reality applications on mobile and web clients, with particular regard to tracking quality and the registration of complex scenes that require an application-specific coordinate frame. Specifically, our service aims to reduce the camera drift that still occurs in modern AR frameworks and to support the initial camera alignment in a known scene. It does so by estimating the absolute camera pose using a configurable context-based image segmentation in combination with an adaptive image classification. We demonstrate real-world applications that use our web service and evaluate the performance and accuracy of the underlying image segmentation and camera pose estimation. We also discuss the initial configuration, the semi-automatic process of generating training data, and the training of the machine learning models for the corresponding tasks.
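The abstract does not spell out how an absolute pose estimate is applied on the client. As a minimal sketch of the general idea, and assuming both poses are available as 4x4 camera-to-frame rigid transforms, one estimate can be used to compute an alignment from the AR framework's local frame to the application-specific frame, which is then applied to later tracker poses. All function names and numeric values below are hypothetical and are not taken from the paper or from any AR framework API.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose_z(yaw_rad, x, y, z):
    """Build a camera-to-frame pose: rotation about z by yaw, plus translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def alignment(app_from_cam, local_from_cam):
    """Transform that maps the tracker's local frame into the application frame."""
    return matmul(app_from_cam, invert_rigid(local_from_cam))

# Hypothetical values for one camera image: the pose reported by the
# on-device tracker (local frame) and the absolute pose estimated for
# the same image (application frame).
local_from_cam = pose_z(0.0, 1.0, 0.0, 0.0)
app_from_cam = pose_z(math.pi / 2, 5.0, 2.0, 0.0)
app_from_local = alignment(app_from_cam, local_from_cam)

# Re-express a later tracker pose in the application frame.
later_local = pose_z(0.0, 2.0, 0.0, 0.0)
later_app = matmul(app_from_local, later_local)
```

Recomputing `app_from_local` whenever a new absolute estimate arrives would replace the accumulated drift with the error of that single estimate. Whether and how the service described here does this is not stated in the abstract.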

Published In

Web3D '19: Proceedings of the 24th International Conference on 3D Web Technology
July 2019
131 pages
ISBN:9781450367981
DOI:10.1145/3329714
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. AR Authoring
  2. Computer Vision
  3. Image Segmentation
  4. Machine Learning
  5. Mobile Mixed Reality
  6. Tracking
  7. Training Data Generation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Web3D '19

Acceptance Rates

Overall Acceptance Rate 27 of 71 submissions, 38%
