DOI: 10.1007/978-3-031-28241-6_34

Legal IR and NLP: The History, Challenges, and State-of-the-Art

Published: 02 April 2023

Abstract

Artificial Intelligence (AI), Machine Learning (ML), Information Retrieval (IR) and Natural Language Processing (NLP) are transforming the way legal professionals and law firms approach their work. The significant potential of applying AI to law, for instance by creating computational solutions for legal tasks, has intrigued researchers for decades, and this appeal has only grown with the advent of Deep Learning (DL). Working with legal text is, however, far more challenging than working with text in other subdomains of IR/NLP, mainly because of its typical characteristics: considerably longer documents, complex language and a lack of large-scale annotated datasets.
In this tutorial, we introduce the audience to these characteristics of legal text and to the challenges they pose for processing legal documents. We touch upon the history of AI and Law research and how it has evolved over the years from relatively simple approaches to more complex ones, such as those involving DL. The tutorial is organized as follows. First, we provide a brief introduction to state-of-the-art research in the general domain of IR and NLP. We then discuss in more detail the IR/NLP tasks specific to the legal domain. We outline the methodologies (from both academic and industry perspectives) and the tools and datasets available for evaluating them. This is followed by a hands-on coding/demo session.


Published In

Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III
Apr 2023
634 pages
ISBN: 978-3-031-28240-9
DOI: 10.1007/978-3-031-28241-6

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. AI & Law
  2. Legal data analytics
  3. Natural language processing
  4. Legal information retrieval
