short-paper

Open access

MUSER: A Multi-View Similar Case Retrieval Dataset

Authors:

Weixing Shen

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 5336 - 5340

https://doi.org/10.1145/3583780.3615125

Published: 21 October 2023

Abstract

Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets focus only on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can reveal the reasoning process behind a judgment. Furthermore, case similarity is typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement, with comprehensive sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive, structured label schema of legal elements for each of them to enable accurate and knowledgeable evaluation of case similarity. The dataset is constructed from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can improve the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.
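
To illustrate what a multi-view similarity over legal elements could look like, here is a minimal Python sketch. It is not the scoring method of the paper: the abstract does not say how the three views are compared or combined, so the Jaccard overlap, the equal view weights, the function names, and the element labels used below are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# The three views named in the abstract; the identifier spellings are assumptions.
VIEWS = ("legal_fact", "dispute_focus", "law_statutory")

# A case is represented here as a mapping from view name to its set of
# legal element labels (hypothetical representation).
Case = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two label sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def multi_view_similarity(
    query: Case,
    candidate: Case,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-view Jaccard overlaps (equal weights by default)."""
    if weights is None:
        weights = {view: 1.0 / len(VIEWS) for view in VIEWS}
    return sum(
        weights[view] * jaccard(query.get(view, set()), candidate.get(view, set()))
        for view in VIEWS
    )


def rank_candidates(query: Case, candidates: Dict[str, Case]) -> List[Tuple[str, float]]:
    """Return (candidate_id, score) pairs sorted by descending similarity."""
    scored = [(cid, multi_view_similarity(query, case)) for cid, case in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical element labels, not taken from the MUSER label schema.
    query = {
        "legal_fact": {"loan_agreement", "overdue_repayment"},
        "dispute_focus": {"interest_rate"},
        "law_statutory": {"contract_law"},
    }
    candidates = {
        "c1": dict(query),
        "c2": {
            "legal_fact": {"loan_agreement"},
            "dispute_focus": {"guarantee_liability"},
            "law_statutory": {"contract_law"},
        },
    }
    for cid, score in rank_candidates(query, candidates):
        print(cid, round(score, 3))
```

Set overlap is used here only because it is easy to inspect by hand; a learned or graded similarity per view would slot into `multi_view_similarity` in place of `jaccard`.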


Index Terms

Applied computing → Law, social and behavioral sciences → Law

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN:9798400701245

DOI:10.1145/3583780

General Chairs: Ingo Frommholz (University of Wolverhampton, UK), Frank Hopfgartner (University of Koblenz, Germany), Mark Lee (University of Birmingham, UK), Michael Oakes (University of Birmingham, UK)

Program Chairs: Mounia Lalmas (Spotify, UK), Min Zhang (Tsinghua University, China), Rodrygo Santos (Federal University of Minas Gerais, Brazil)

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

The National Key Research and Development Program of China
The National Natural Science Foundation of China

Conference

CIKM '23

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
