DOI: 10.1145/3442442.3451378
research-article

FinSBD-2021: The 3rd Shared Task on Structure Boundary Detection in Unstructured Text in the Financial Domain

Published: 03 June 2021

Abstract

Document processing is a foundational pre-processing step for natural language applications in the financial domain. In this paper, we present the results of FinSBD-3, the 3rd shared task on Structure Boundary Detection in unstructured text in the financial domain, organized as part of the 1st Workshop on Financial Technology on the Web. Participants were asked to build systems that detect the boundaries of elements in unstructured text extracted from financial PDF documents. This edition extends the previous shared tasks by adding the boundaries of visual elements such as tables, figures, page headers, and page footers, on top of the sentences, lists, and list items already covered in previous editions.
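The abstract frames each submission as locating where elements begin and end in text extracted from PDFs. A minimal sketch of that framing follows, under assumptions this abstract does not state: boundaries are represented as labeled character spans `(label, start, end)` with an exclusive end, and systems are scored by exact-match F1 per label. The function `boundary_f1`, the label strings, and the sample text are hypothetical illustrations, not the official FinSBD-3 data format or scorer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical span representation: (label, start_char, end_char), end exclusive.
# Labels such as "sentence", "item", "table", "header" are illustrative only.
Span = Tuple[str, int, int]


def boundary_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, float]:
    """Exact-match span F1 per label (an assumed scoring scheme, not the official one)."""
    gold_by_label: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
    pred_by_label: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
    for label, start, end in gold:
        gold_by_label[label].add((start, end))
    for label, start, end in pred:
        pred_by_label[label].add((start, end))

    scores: Dict[str, float] = {}
    # Score every label that appears in either the gold or the predicted spans.
    for label in sorted(set(gold_by_label) | set(pred_by_label)):
        g, p = gold_by_label[label], pred_by_label[label]
        tp = len(g & p)  # spans whose start AND end both match exactly
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical noisy PDF text: a page header followed by two sentences.
    text = "Page 1 of 40\nThe Fund invests in equities. It may use derivatives.\n"
    gold = [("header", 0, 12), ("sentence", 13, 42), ("sentence", 43, 66)]
    pred = [("header", 0, 12), ("sentence", 13, 66)]  # two sentences merged into one
    for label, start, end in pred:
        print(label, repr(text[start:end]))
    print(boundary_f1(gold, pred))  # → {'header': 1.0, 'sentence': 0.0}
```

Exact matching is strict by design: in the example, merging two sentences into one span earns no credit for either, so under this assumed metric a single missed boundary in noisy PDF text costs two gold elements.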

Cited By

  • (2023) Intelligent Document Processing in End-to-End RPA Contexts: A Systematic Literature Review. Confluence of Artificial Intelligence and Robotic Process Automation, 95–131. DOI: 10.1007/978-981-19-8296-5_5. Online publication date: 14 March 2023.
  • (2023) InFi-BERT 1.0: Transformer-Based Language Model for Indian Financial Volatility Prediction. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 128–138. DOI: 10.1007/978-3-031-23633-4_10. Online publication date: 31 January 2023.

    Information

    Published In

    WWW '21: Companion Proceedings of the Web Conference 2021
    April 2021
    726 pages
    ISBN:9781450383134
    DOI:10.1145/3442442
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. computer vision
    2. document parsing
    3. document segmentation
    4. natural language processing
    5. sentence segmentation
    6. shared task

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '21
    WWW '21: The Web Conference 2021
    April 19 - 23, 2021
    Ljubljana, Slovenia

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 22
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 14 Sep 2024