Article

Legal IR and NLP: The History, Challenges, and State-of-the-Art

Authors: Debasis Ganguly, Jack G. Conrad, Kripabandhu Ghosh, Saptarshi Ghosh, Paheli Bhattacharya, Shubham Kumar Nigam, Shounak Paul

Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III

Pages 331 - 340

https://doi.org/10.1007/978-3-031-28241-6_34

Published: 02 April 2023

Abstract

Artificial Intelligence (AI), Machine Learning (ML), Information Retrieval (IR) and Natural Language Processing (NLP) are transforming the way legal professionals and law firms approach their work. The significant potential for applying AI to law, for instance by creating computational solutions for legal tasks, has intrigued researchers for decades. This appeal has only been amplified by the advent of Deep Learning (DL). Working with legal text is, however, far more challenging than working with text from other subdomains of IR/NLP, mainly because of its typical characteristics: considerably longer documents, complex language, and a lack of large-scale annotated datasets.

In this tutorial, we introduce the audience to these characteristics of legal text and, with them, to the challenges of processing legal documents. We touch upon the history of AI and Law research and how it has evolved over the years from relatively simple approaches to more complex ones, such as those involving DL. We organize the tutorial as follows. First, we provide a brief introduction to state-of-the-art research in the general domain of IR and NLP. We then discuss in more detail the IR/NLP tasks specific to the legal domain. We outline the methodologies (from both academic and industry perspectives) and the tools and datasets available to evaluate them. This is followed by a hands-on coding/demo session.


Cited By

Gurrin, C., Kruschwitz, U., Kamps, J. (2023). Report on the 45th European Conference on Information Retrieval (ECIR 2023). ACM SIGIR Forum 57:1 (1-11). Online publication date: 4-Dec-2023
https://dl.acm.org/doi/10.1145/3636341.3636355


Published In

Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III

Apr 2023

634 pages

ISBN: 978-3-031-28240-9

DOI: 10.1007/978-3-031-28241-6

Editors:
Jaap Kamps (University of Amsterdam, Amsterdam, The Netherlands)
Lorraine Goeuriot (Université Grenoble-Alpes, Saint-Martin-d’Hères, France)
Fabio Crestani (Università della Svizzera Italiana, Lugano, Switzerland)
Maria Maistro (University of Copenhagen, Copenhagen, Denmark)
Hideo Joho (University of Tsukuba, Ibaraki, Japan)
Brian Davis (Dublin City University, Dublin, Ireland)
Cathal Gurrin (Dublin City University, Dublin, Ireland)
Udo Kruschwitz (Universität Regensburg, Regensburg, Germany)
Annalina Caputo (Dublin City University, Dublin, Ireland)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

Publisher

Springer-Verlag

Berlin, Heidelberg
