short-paper

UDAAN - Machine Learning based Post-Editing tool for Document Translation

Authors:

Ayush Maheshwari,

Ajay Ravindran,

Venkatapathy Subramanian,

Ganesh Ramakrishnan

CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)

Pages 263 - 267

https://doi.org/10.1145/3570991.3571068

Published: 04 January 2023

Abstract

We introduce UDAAN, an open-source post-editing tool that reduces manual editing effort so that publishable-standard documents can be produced quickly in several Indic languages. UDAAN provides an end-to-end Machine Translation (MT) plus post-editing pipeline: users upload a document to obtain raw MT output and then edit the raw translations within the tool. UDAAN offers several advantages: (i) domain-aware, vocabulary-based, lexically constrained MT; (ii) source-target and target-target lexicon suggestions for users, with replacements based on the lexicon alignment between the source and target texts; (iii) translation suggestions based on logs created during user interaction; (iv) source-target sentence alignment visualisation that reduces the cognitive load on users during editing; and (v) translated output in multiple formats: docs, LaTeX, and PDF. The tool also supports around 100 in-domain dictionaries for lexicon-aware machine translation. Although we limit our experiments to English-to-Hindi translation, the tool is independent of the source and target languages. Experimental results based on usage of the tool and on users' feedback show that it speeds up translation by approximately a factor of three compared to the baseline of translating documents from scratch. The tool is available for both Windows and Linux platforms. It is open-source under the MIT license, and the source code can be accessed from our website, https://www.udaanproject.org. Demonstration and tutorial videos for various features of the tool are also available. Our MT pipeline can be accessed at https://udaaniitb.aicte-india.org/udaan/translate/.
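To make the idea of dictionary-driven lexicon suggestions concrete, the following is a minimal sketch and not UDAAN's actual implementation. It assumes a simple in-domain dictionary that maps one source term to one preferred target term, and it flags sentences whose source contains the term but whose raw MT output lacks the preferred translation. The names `Segment` and `suggest_replacements`, the whitespace-level matching, and the example dictionary entry are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    source: str  # source-language sentence
    target: str  # raw MT output for that sentence


def suggest_replacements(segments, lexicon):
    """Flag segments where a dictionary term appears in the source
    but its preferred translation is missing from the target.

    lexicon: dict mapping a lowercase source term to the preferred
    in-domain target term (a hypothetical, simplified dictionary).
    Returns a list of (segment index, source term, preferred target term).
    """
    suggestions = []
    for index, segment in enumerate(segments):
        # Crude tokenisation of the English source: lowercase alphabetic runs.
        source_tokens = set(re.findall(r"[a-z]+", segment.source.lower()))
        for source_term, preferred in lexicon.items():
            if source_term in source_tokens and preferred not in segment.target:
                suggestions.append((index, source_term, preferred))
    return suggestions


if __name__ == "__main__":
    segments = [
        Segment("The force acts on the body.", "शक्ति शरीर पर कार्य करती है।"),
        Segment("Force is a vector.", "बल एक सदिश है।"),
    ]
    # Hypothetical physics-domain dictionary entry.
    for suggestion in suggest_replacements(segments, {"force": "बल"}):
        print(suggestion)
```

A real system would rely on word alignment between source and target rather than substring checks, and would need to handle multi-word terms and inflected forms; this sketch only illustrates the flag-and-suggest flow.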
