DOI: 10.1145/2911451.2911510
research-article
Open access

Engineering Quality and Reliability in Technology-Assisted Review

Published: 07 July 2016

Abstract

The objective of technology-assisted review ("TAR") is to find as much relevant information as possible with reasonable effort. Quality is a measure of the extent to which a TAR method achieves this objective, while reliability is a measure of how consistently it achieves an acceptable result. We are concerned with how to define, measure, and achieve high quality and high reliability in TAR. When quality is defined using the traditional goal-post method of specifying a minimum acceptable recall threshold, the quality and reliability of a TAR method are both, by definition, equal to the probability of achieving the threshold. Assuming this definition of quality and reliability, we show how to augment any TAR method to achieve guaranteed reliability, for a quantifiable level of additional review effort. We demonstrate this result by augmenting the TAR method supplied as the baseline model implementation for the TREC 2015 Total Recall Track, measuring reliability and effort for 555 topics from eight test collections. While our empirical results corroborate our claim of guaranteed reliability, we observe that the augmentation strategy may entail disproportionate effort, especially when the number of relevant documents is low. To address this limitation, we propose stopping criteria for the model implementation that may be applied with no additional review effort, while achieving empirical reliability that compares favorably to the provably reliable method. We further argue that optimizing reliability according to the traditional goal-post method is inconsistent with certain subjective aspects of quality, and that optimizing a Taguchi quality loss function may be more apt.
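
To make the two quality definitions in the abstract concrete, the sketch below (in Python; the 0.75 threshold, the loss constant k, and all function names are illustrative assumptions, not values or code from the paper) contrasts the goal-post pass/fail criterion with a Taguchi-style quadratic loss, both computed from recall. Under the goal-post method a review at 74% recall fails while one at 76% passes; a quadratic loss instead penalizes every shortfall from the target in proportion to its square.

```python
# Minimal sketch (not from the paper): goal-post quality vs. a
# Taguchi-style quadratic quality loss, both driven by recall.
# The 0.75 threshold and loss constant k are illustrative assumptions.

def recall(relevant_found: int, relevant_total: int) -> float:
    """Fraction of all relevant documents that the review found."""
    return relevant_found / relevant_total if relevant_total else 1.0

def goal_post_quality(r: float, threshold: float = 0.75) -> bool:
    """Goal-post method: the review is acceptable iff recall meets the threshold."""
    return r >= threshold

def taguchi_loss(r: float, target: float = 1.0, k: float = 1.0) -> float:
    """Quadratic quality loss: grows with the squared shortfall from the target."""
    return k * (target - r) ** 2

if __name__ == "__main__":
    # A review at 74% recall fails the goal post while one at 76% passes,
    # yet their quadratic losses differ only marginally.
    for found in (60, 74, 76, 95):
        r = recall(found, 100)
        print(f"recall={r:.2f}  acceptable={goal_post_quality(r)}  "
              f"loss={taguchi_loss(r):.4f}")
```

Averaging the goal-post outcome over repeated runs or topics estimates reliability in the abstract's sense (the probability of achieving the threshold), whereas averaging the quadratic loss yields an expected quality loss that reflects how far short of the target a review falls.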



Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. continuous active learning
  2. e-discovery
  3. electronic discovery
  4. predictive coding
  5. quality
  6. relevance feedback
  7. reliability
  8. systematic review
  9. technology-assisted review
  10. test collections

Qualifiers

  • Research-article

Funding Sources

  • Natural Sciences and Engineering Research Council

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 paper acceptance rate: 62 of 341 submissions (18%)
Overall acceptance rate: 792 of 3,983 submissions (20%)
