Proceedings of the ACM Symposium on Document Engineering 2024

Document capture applications on smartphones have emerged as popular tools for digitizing documents. For many individuals, capturing documents with their smartphones is more convenient than using dedicated photocopiers or scanners, even if the quality of ...

short-paper

Which is the most suitable scanner resolution for documents? Detailing the answer given to the question raised by Professor George Nagy

Rafael Dueire Lins,
Daniela Raposo Nunes de Mello,
Raimundo Correa de Oliveira

Article No.: 2, Pages 1–4, https://doi.org/10.1145/3685650.3685672

Defining the correct image resolution is a fundamental issue to preserve all the information in a document, keeping the minimum image acquisition and processing times, as well as the storage space and computer bandwidth for network transmission, allowing ...

research-article

Open Access

Best Paper

ZigZag: A Robust Adaptive Approach to Non-Uniformly Illuminated Document Image Binarization

Jean-Luc Bloechle,
Jean Hennebert,
Christophe Gisler

Article No.: 3, Pages 1–10, https://doi.org/10.1145/3685650.3685661

In the era of mobile imaging, the quality of document photos captured by smartphones often suffers due to adverse lighting conditions. Traditional document analysis and optical character recognition systems encounter difficulties with images that have ...

research-article

Texture-based Document Binarization

Rodrigo Bernardino,
Rafael Dueire Lins,
Ricardo Barboza

Article No.: 4, Pages 1–10, https://doi.org/10.1145/3685650.3685663

Image binarization, the conversion of a color image into its monochromatic version, plays a key role in many document processing pipelines. The technical literature presents over a hundred different algorithms for document image binarization yielding ...

research-article

Open Access

A Heuristic Algorithm for Mathematical Markup Encoding Based on the Relative Positions of Characters

Chun-Min Lin,
Jason Lin,
Shin-Hung Lin,
Jo-Kai Liao

Article No.: 5, Pages 1–10, https://doi.org/10.1145/3685650.3685659

Mathematical expressions (MEs) are the most crucial technical content in scientific documents, yet their presentations are not easy to describe. However, LaTeX, one of the primary markup languages in mathematics, enables MEs to be easily understood by ...

research-article

Open Access

Best Student Paper

Graph Detective: A User Interface for Intuitive Graph Exploration Through Visualized Queries

Dominik Opitz,
Andreas Hamm,
Roxanne El Baff,
Jasper Korte,
Tobias Hecking

Article No.: 6, Pages 1–9, https://doi.org/10.1145/3685650.3685660

Graph databases are used across several domains due to the intuitive structure of graphs. They are well-suited for storing document collections together with their interlinkages through metadata and annotations. Yet, querying such graphs requires ...

research-article

CatalogBank: A Structured and Interoperable Catalog Dataset with a Semi-Automatic Annotation Tool (DocumentLabeler) for Engineering System Design

Hasan Sinan Bank,
Daniel R. Herber

Article No.: 7, Pages 1–9, https://doi.org/10.1145/3685650.3685665

In the realm of document engineering and Natural Language Processing (NLP), the integration of digitally born catalogs into product design processes presents a novel avenue for enhancing information extraction and interoperability. This paper introduces ...

short-paper

Open Access

TopicTag: Automatic Annotation of NMF Topic Models Using Chain of Thought and Prompt Tuning with LLMs

Selma Wanna,
Nicholas Solovyev,
Ryan Barron,
Maksim E. Eren,
Manish Bhattarai,
Kim Ø. Rasmussen,
Boian S. Alexandrov

Article No.: 8, Pages 1–4, https://doi.org/10.1145/3685650.3685667

Topic modeling is a technique for organizing and extracting themes from large collections of unstructured text. Non-negative matrix factorization (NMF) is a common unsupervised approach that decomposes a term frequency-inverse document frequency (TF-IDF) ...

short-paper

Open Access

Post-OCR Correction with OpenAI's GPT Models on Challenging English Prosody Texts

James Zhang,
Wouter Haverals,
Mary Naydan,
Brian W. Kernighan

Article No.: 9, Pages 1–4, https://doi.org/10.1145/3685650.3685669

The digitization of historical documents faces challenges with the accuracy of Optical Character Recognition (OCR). Noting the success of large language models (LLMs) on many text-based tasks, this paper explores the potential of OpenAI's GPT models (3.5-...

short-paper

Open Access

Detecting AI-Generated Texts in Cross-Domains

You Zhou,
Jie Wang

Article No.: 10, Pages 1–4, https://doi.org/10.1145/3685650.3685673

Existing tools to detect text generated by a large language model (LLM) have met with certain success, but their performance can drop when dealing with texts in new domains. To tackle this issue, we train a ranking classifier called RoBERTa-Ranker, a ...

panel

Competition on Binarizing Photographed Document Images 2024 Quality, Time and Space Report

Rafael Dueire Lins,
Gustavo P. Chaves,
Gabriel de F. P e Silva,
Thaylor Vieira,
Ricardo da Silva Barboza,
Steven J. Simske

Article No.: 11, Pages 1–12, https://doi.org/10.1145/3685650.3686793

Many document processing platforms have image binarization as a key step. The performance of binarization algorithms depends on several factors that span from the quality of the digitalization devices to the intrinsic features of the document itself and ...

research-article

Assessing Abstractive and Extractive Methods for Automatic News Summarization

Hilário Oliveira,
Rafael Dueire Lins

Article No.: 12, Pages 1–10, https://doi.org/10.1145/3685650.3685664

Automatic Text Summarization (ATS) is a research area that originated in the late 1950s and has gained increasing importance with the surge of text data available today. ATS approaches are generally classified into extractive and abstractive methods. ...

short-paper

Assessing the Reliability and Validity of the Measures for Automatic Text Summarization

Rafael Dueire Lins,
Hilário Oliveira,
Steven J. Simske

Article No.: 13, Pages 1–4, https://doi.org/10.1145/3685650.3685671

Automatic Text Summarization (ATS) is a research area that originated in the late 1950s and has gained increasing importance with the surging amount of text data available today. One of the key challenges in this area is how to quantitatively assess the ...

short-paper

Open Access

An Efficient PDF Malware Detection Method Using Highly Compact Features

Ran Liu,
Cynthia Matuszek,
Charles Nicholas

Article No.: 14, Pages 1–4, https://doi.org/10.1145/3685650.3685668

The growing use of PDFs has made them a prime target for malware attacks. Machine learning-based approaches for detecting PDF malware are increasingly popular due to their high accuracy and efficiency. However, the effectiveness of these systems largely ...

short-paper

Automatically producing accessible and reusable PDFs with LaTeX

Frank Mittelbach,
Ulrike Fischer,
David Carlisle,
Joseph Wright

Article No.: 15, Pages 1–4, https://doi.org/10.1145/3685650.3685670

In this application note we outline the goals of the "LaTeX Tagged PDF" project, describe its current status, show how it can already be used to create accessible and reusable PDFs, and outline our future plans for a successful completion. Further ...

research-article

LexBoost: Improving Lexical Document Retrieval with Nearest Neighbors

Hrishikesh Kulkarni,
Nazli Goharian,
Ophir Frieder,
Sean MacAvaney

Article No.: 16, Pages 1–10, https://doi.org/10.1145/3685650.3685658

Sparse retrieval methods like BM25 are based on lexical overlap, focusing on the surface form of the terms that appear in the query and the document. The use of inverted indices in these methods leads to high retrieval efficiency. On the other hand, ...

short-paper

Similarity Problems in Paragraph Justification: An Extension to the Knuth-Plass Algorithm

Didier Verna

Article No.: 17, Pages 1–4, https://doi.org/10.1145/3685650.3685666

In high-quality typography, consecutive lines beginning or ending with the same word or sequence of characters are considered a defect. We have implemented an extension to TeX's paragraph justification algorithm which handles this problem. Experimentation ...

Cited By

Lins R, Oliveira H and Simske S Assessing the Reliability and Validity of the Measures for Automatic Text Summarization Proceedings of the ACM Symposium on Document Engineering 2024, (1-4)

Acceptance Rates

DocEng '24 Paper Acceptance Rate: 16 of 27 submissions, 59%

Overall Acceptance Rate: 194 of 564 submissions, 34%

Year	Submitted	Accepted	Rate
DocEng '24	27	16	59%
DocEng '23	27	9	33%
DocEng '19	77	30	39%
DocEng '17	71	13	18%
DocEng '16	35	11	31%
DocEng '15	31	11	35%
DocEng '14	41	15	37%
DocEng '13	50	16	32%
DocEng '10	42	13	31%
DocEng '08	62	21	34%
DocEng '02	46	21	46%
DocEng '01	55	18	33%
Overall	564	194	34%
