Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis

Datta, Shrestha; Mollah, Md Adith; Fairooz, Raisa; Fahim, Tariful Islam

Computer Science > Computer Vision and Pattern Recognition

arXiv:2308.10511 (cs)

[Submitted on 21 Aug 2023 (v1), last revised 22 Aug 2023 (this version, v2)]

Abstract: Understanding digital documents, especially historical ones, is like solving a puzzle. Document Layout Analysis (DLA) helps by dividing a document into sections such as paragraphs, images, and tables, a step that is crucial for machines to read and understand it. In the DL Sprint 2.0 competition, we worked on layout analysis of Bangla documents using the BaDLAD dataset, which provides a large number of examples. We trained a Mask R-CNN model for this task and improved it through step-by-step hyperparameter tuning, achieving a Dice score of 0.889. Not everything went well, however: a model trained on English documents did not transfer well to Bangla, which showed us that each language has its own challenges. Our solution for DL Sprint 2.0 is publicly available at this https URL, along with notebooks, weights, and an inference notebook.
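For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of the standard Dice coefficient between a predicted and a ground-truth binary mask, 2|A ∩ B| / (|A| + |B|). The function name `dice_score`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not state how the competition computed or averaged the score, so this should not be read as the authors' or organizers' evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Flatten both masks so they can be compared pixel by pixel.
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)

    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels, so the reported 0.889 indicates substantial but imperfect overlap under whatever averaging the competition applied.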

Comments:	Contest paper, Contest: DL Sprint 2.0 (Link: this https URL), Solution link: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2308.10511 [cs.CV]
	(or arXiv:2308.10511v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.10511

Submission history

From: Shrestha Datta
[v1] Mon, 21 Aug 2023 06:51:58 UTC (77 KB)
[v2] Tue, 22 Aug 2023 14:08:20 UTC (77 KB)
