DOI: 10.1145/3583780.3615125
Short paper
Open access

MUSER: A Multi-View Similar Case Retrieval Dataset

Published: 21 October 2023

Abstract

Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets consider only the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that reveal the reasoning behind a judgment. Furthermore, case similarity is typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset built on multi-view similarity measurement, with comprehensive, sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statute) and build a comprehensive, structured label schema of legal elements for each of them, enabling accurate and knowledge-informed evaluation of case similarity. The dataset is constructed from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. On MUSER, we implement several text classification algorithms for legal element prediction and a range of retrieval methods for similar case retrieval. The experimental results indicate that incorporating legal elements can improve the performance of SCR models, but further effort is needed to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.
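The abstract describes judging case similarity from three views, each backed by a structured schema of legal-element labels. As a minimal sketch, assuming every case carries a set of element labels per view, one simple way to turn such annotations into a ranking is a weighted average of per-view Jaccard overlaps. The view keys, label names, equal weighting, Jaccard measure, and the `Case`/`rank_candidates` helpers are all hypothetical illustrations, not MUSER's actual schema or the scoring its authors used; the released repository is the authority on those.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical view keys mirroring the three perspectives named in the abstract.
VIEWS = ("legal_fact", "dispute_focus", "law_statute")


@dataclass
class Case:
    case_id: str
    # View name -> set of legal-element labels annotated for that view (illustrative only).
    elements: dict[str, set[str]] = field(default_factory=dict)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two label sets; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def multi_view_similarity(query: Case, cand: Case, weights: dict[str, float] | None = None) -> float:
    """Weighted sum of per-view Jaccard overlaps; the default equal weights sum to 1 (a plain average)."""
    if weights is None:
        weights = {v: 1.0 / len(VIEWS) for v in VIEWS}
    return sum(
        w * jaccard(query.elements.get(v, set()), cand.elements.get(v, set()))
        for v, w in weights.items()
    )


def rank_candidates(query: Case, candidates: list[Case], k: int = 10) -> list[tuple[str, float]]:
    """Return the top-k (case_id, score) pairs, highest score first, ties broken by case_id."""
    scored = [(c.case_id, multi_view_similarity(query, c)) for c in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy usage with made-up labels (not drawn from the dataset).
query = Case("q1", {
    "legal_fact": {"loan_agreement", "overdue_repayment"},
    "dispute_focus": {"interest_rate"},
    "law_statute": {"statute_A"},
})
cands = [
    Case("c1", {"legal_fact": {"loan_agreement"}, "dispute_focus": {"interest_rate"}, "law_statute": {"statute_A"}}),
    Case("c2", {"legal_fact": {"lease_agreement"}, "dispute_focus": {"rent_arrears"}, "law_statute": {"statute_B"}}),
]
print(rank_candidates(query, cands, k=2))
```

Here `c1` shares half of the query's legal-fact labels and all of its dispute-focus and statute labels, so it outranks `c2`, which overlaps on nothing. Equal weights treat the three views as equally important; emphasising one view is a matter of passing a different `weights` mapping.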

    Published In

    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN:9798400701245
    DOI:10.1145/3583780
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. datasets
    2. domain-specific
    3. similar case retrieval

    Funding Sources

    • The National Key Research and Development Program of China
    • The National Natural Science Foundation of China

    Conference

    CIKM '23

    Acceptance Rates

    Overall acceptance rate: 1,861 of 8,427 submissions (22%)

    Article Metrics

    • Total citations: 0
    • Total downloads: 466
    • Downloads (last 12 months): 400
    • Downloads (last 6 weeks): 30
    Reflects downloads up to 21 Dec 2024.
