DOI: 10.1145/3587259.3627570
research-article
Open access

Annotation and Extraction of Industrial Procedural Knowledge from Textual Documents

Published: 05 December 2023

Abstract

The ability to extract valuable information from documents and convert it into knowledge is crucial for driving technological innovation across industries. While adding metadata to manuals enhances their searchability, the real knowledge remains hidden in the procedural information they contain, which offers vital guidance for operators. Extracting unstructured human-readable information and transforming it into machine-interpretable data is therefore fundamental for establishing cutting-edge digital knowledge-based platforms. This paper presents a methodology tailored to the requirements of users who seek support in extracting and representing procedural knowledge from documents. We introduce a tool that supports users in manually annotating procedures within PDF documents and generating a corresponding procedural knowledge graph. We assess the tool in real-world scenarios to evaluate its effectiveness in accomplishing various tasks. Finally, we generate a procedural knowledge graph that can facilitate knowledge discovery.

Published In

K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023
December 2023
270 pages
ISBN: 9798400701412
DOI: 10.1145/3587259
Editors: Brent Venable, Daniel Garijo, Brian Jalaian
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data acquisition
  2. data engineering
  3. data extraction
  4. digitisation
  5. knowledge graph
  6. procedural knowledge

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Partially funded by the European Union (PERKS)
  • Partially supported by the K-HUB (Manufacturing Knowledge Hub) project, co-funded by EIT Manufacturing

Conference

K-CAP '23: Knowledge Capture Conference 2023
December 5–7, 2023
Pensacola, FL, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%

Article Metrics

  • Downloads (Last 12 months): 346
  • Downloads (Last 6 weeks): 30
Reflects downloads up to 06 Jan 2025
