DOI: 10.1145/3583780.3615202
short-paper

FCT-GAN: Enhancing Global Correlation of Table Synthesis via Fourier Transform

Published: 21 October 2023

Abstract

Synthetic tabular data has emerged as an alternative means of sharing knowledge while complying with strict data access regulations, such as the European General Data Protection Regulation (GDPR). Mainstream table synthesizers are built on Generative Adversarial Networks (GANs). Although several state-of-the-art (SOTA) tabular GAN algorithms inherit Convolutional Neural Network (CNN)-based architectures, which have proven effective for images, they tend to overlook two critical properties of tabular data: (i) the global correlation across columns, and (ii) the semantic invariance to the column order. Permuting the columns of a table does not alter the semantic meaning of the data, but the features extracted by CNNs can change significantly because of the limited kernel size of their convolution filters. To address these problems, we propose FCT-GAN, the first conditional tabular GAN to adopt Fourier networks for table synthesis. FCT-GAN enhances permutation-invariant GAN training by strengthening the learning of global correlations via Fourier layers. Extensive evaluation on benchmark and real-world datasets shows that FCT-GAN can synthesize tabular data with better (up to 27.8%) machine learning utility (i.e., a proxy of global correlations) and higher (up to 26.5%) statistical similarity to real data. FCT-GAN also shows the least variation in synthetic data quality among 7 SOTA baselines across 3 different training-data column orders.
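The contrast the abstract draws between a small convolution kernel and a Fourier layer can be illustrated with a minimal NumPy sketch. This is not the FCT-GAN architecture: the 16-slot row, the fixed 3-tap kernel, and the helper names `local_features` and `global_features` are assumptions made purely for illustration. The sketch changes one column value and counts how many output features react under each operation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoded table row with 16 feature slots (assumption for illustration).
row = rng.normal(size=16)

# A small fixed 1-D filter standing in for a learned CNN kernel (assumption).
kernel = np.array([0.25, 0.5, 0.25])


def local_features(x):
    """Same-size 1-D convolution: each output mixes only 3 neighbouring inputs."""
    return np.convolve(x, kernel, mode="same")


def global_features(x):
    """Discrete Fourier transform: each coefficient is a weighted sum of all inputs."""
    return np.fft.fft(x)


# Change the value of a single column.
perturbed = row.copy()
perturbed[0] += 1.0

# Mark which output features moved in response to that single change.
conv_changed = np.abs(local_features(perturbed) - local_features(row)) > 1e-12
fft_changed = np.abs(global_features(perturbed) - global_features(row)) > 1e-12

print("conv outputs affected:", int(conv_changed.sum()), "of", row.size)
print("DFT coefficients affected:", int(fft_changed.sum()), "of", row.size)
```

In this sketch only the convolution outputs within reach of the kernel react to the changed column, whereas every DFT coefficient does. That is one way to read the claim that Fourier layers see correlations across all columns regardless of how far apart the columns sit in the table.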


    Published In

    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN:9798400701245
    DOI:10.1145/3583780

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. fourier transform
    2. gan
    3. tabular data

    Qualifiers

    • Short-paper

    Funding Sources

    • European Union

    Conference

    CIKM '23
    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
