
Sprite-from-Sprite: Cartoon Animation Decomposition with Self-supervised Sprite Estimation

Published: 30 November 2022

Abstract

We present an approach to decompose cartoon animation videos into a set of "sprites", the basic units of digital cartoons that depict the content and transforms of each animated object. The sprites in real-world cartoons are unique: artists may draw arbitrary sprite animations for expressiveness, where the animated content is often complicated, irregular, and challenging; alternatively, artists may reduce their workload by tweening and adjusting sprites, or even by reusing static sprites, in which case the transformations are relatively regular and simple. Based on these observations, we propose a sprite decomposition framework using Pixel Multilayer Perceptrons (Pixel MLPs) in which the estimation of each sprite is conditioned on and guided by all other sprites. In this way, once the relatively regular and simple sprites are resolved, the decomposition of the remaining "challenging" sprites is simplified and eased by the guidance of the others. We call this method "sprite-from-sprite" cartoon decomposition. We study ablative architectures of our framework, and a user study shows that our results are the most preferred in 19 of 20 cases.
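To make the per-pixel formulation concrete, the sketch below shows one way a "Pixel MLP" could be set up in PyTorch: a small network queried per pixel predicts one sprite's RGBA at a given location and time, conditioned on the RGBA contributed by the other sprites at the same pixel. The class name, Fourier encoding, layer sizes, and conditioning format are illustrative assumptions for exposition, not the paper's exact architecture.

```python
# Minimal sketch (assumed setup, not the authors' released code): a per-pixel MLP
# that predicts one sprite's RGBA, conditioned on the other sprites' RGBA at that pixel.
import math
import torch
import torch.nn as nn

class PixelSpriteMLP(nn.Module):
    def __init__(self, num_freqs: int = 6, cond_dim: int = 4, hidden: int = 128):
        super().__init__()
        self.num_freqs = num_freqs
        # Input: Fourier-encoded (x, y, t) plus the other sprites' RGBA at this pixel.
        in_dim = 3 * 2 * num_freqs + cond_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + alpha for this sprite
        )

    def encode(self, coords: torch.Tensor) -> torch.Tensor:
        # Standard Fourier positional encoding of (x, y, t) assumed in [-1, 1].
        freqs = (2.0 ** torch.arange(self.num_freqs, device=coords.device)) * math.pi
        angles = coords[..., None] * freqs                      # (N, 3, F)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)   # (N, 3, 2F)
        return enc.flatten(start_dim=-2)                        # (N, 6F)

    def forward(self, coords: torch.Tensor, other_sprites_rgba: torch.Tensor) -> torch.Tensor:
        # coords: (N, 3) pixel position and time; other_sprites_rgba: (N, 4) conditioning signal.
        x = torch.cat([self.encode(coords), other_sprites_rgba], dim=-1)
        return torch.sigmoid(self.net(x))                       # (N, 4) RGBA in [0, 1]

# Illustrative usage: query 1024 random pixels, conditioned on other sprites' RGBA.
mlp = PixelSpriteMLP()
rgba = mlp(torch.rand(1024, 3) * 2 - 1, torch.rand(1024, 4))
print(rgba.shape)  # torch.Size([1024, 4])
```

The conditioning input is what lets the easier, more regular sprites guide the estimation of the harder ones once they have been resolved; how that guidance is scheduled and composited is left abstract here.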



      Published In

ACM Transactions on Graphics, Volume 41, Issue 6
December 2022, 1428 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3550454

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 November 2022
      Published in TOG Volume 41, Issue 6


      Author Tags

      1. animation
      2. cartoon
      3. implicit neural representations
      4. video decomposition

      Qualifiers

      • Research-article

      Funding Sources

      • RGC General Research Fund
