A Corpus for Multilingual Analysis of Online Terms of Service

Kasper Drawzeski, Andrea Galassi, Agnieszka Jablonowska, Francesca Lagioia, Marco Lippi, Hans Wolfgang Micklitz, Giovanni Sartor, Giacomo Tagiuri, Paolo Torroni


Abstract
We present the first annotated corpus for multilingual analysis of potentially unfair clauses in online Terms of Service. The data set comprises a total of 100 contracts, obtained from 25 documents, each available in four languages: English, German, Italian, and Polish. In each contract, clauses that are potentially unfair to the consumer are annotated according to nine unfairness categories. We show how a simple yet efficient annotation projection technique based on sentence embeddings can be used to automatically transfer annotations across languages.
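The abstract does not spell out the projection procedure, so what follows is only a minimal sketch of one plausible reading of embedding-based annotation projection, not the authors' implementation. It assumes that every sentence of a source-language contract and of its target-language version already has a sentence embedding (toy 3-dimensional vectors stand in for the output of a real multilingual encoder). Each annotated source sentence then passes its unfairness labels to the most cosine-similar target sentence, provided the similarity reaches a threshold. The function names, the `threshold` parameter and the label strings are hypothetical.

```python
import math
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project_annotations(
    src_embs: List[Sequence[float]],
    src_labels: List[Set[str]],
    tgt_embs: List[Sequence[float]],
    threshold: float = 0.5,  # hypothetical minimum similarity for a transfer
) -> List[Set[str]]:
    """Copy each annotated source sentence's labels to its nearest target sentence."""
    tgt_labels: List[Set[str]] = [set() for _ in tgt_embs]
    if not tgt_embs:
        return tgt_labels
    for emb, labels in zip(src_embs, src_labels):
        if not labels:  # unannotated sentences have nothing to project
            continue
        scores = [cosine(emb, t) for t in tgt_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            tgt_labels[best] |= labels
    return tgt_labels


# Toy example: three source sentences, three target sentences in a different order.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = [{"liability"}, set(), {"termination"}]  # hypothetical category tags
tgt = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.1], [0.0, 0.2, 0.95]]
print(project_annotations(src, labels, tgt))  # [set(), {'liability'}, {'termination'}]
```

Matching each source sentence to its single best target keeps the sketch simple; a real pipeline would also have to handle clauses that are split or merged in translation, which a one-to-one nearest-neighbour rule does not capture.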
Anthology ID:
2021.nllp-1.1
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
1–8
URL:
https://aclanthology.org/2021.nllp-1.1
DOI:
10.18653/v1/2021.nllp-1.1
Cite (ACL):
Kasper Drawzeski, Andrea Galassi, Agnieszka Jablonowska, Francesca Lagioia, Marco Lippi, Hans Wolfgang Micklitz, Giovanni Sartor, Giacomo Tagiuri, and Paolo Torroni. 2021. A Corpus for Multilingual Analysis of Online Terms of Service. In Proceedings of the Natural Legal Language Processing Workshop 2021, pages 1–8, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
A Corpus for Multilingual Analysis of Online Terms of Service (Drawzeski et al., NLLP 2021)
PDF:
https://aclanthology.org/2021.nllp-1.1.pdf
Data
Multilingual Terms of Service