DOI: 10.1145/2911451.2911510
research-article
Open access

Engineering Quality and Reliability in Technology-Assisted Review

Published: 07 July 2016

Abstract

The objective of technology-assisted review ("TAR") is to find as much relevant information as possible with reasonable effort. Quality is a measure of the extent to which a TAR method achieves this objective, while reliability is a measure of how consistently it achieves an acceptable result. We are concerned with how to define, measure, and achieve high quality and high reliability in TAR. When quality is defined using the traditional goal-post method of specifying a minimum acceptable recall threshold, the quality and reliability of a TAR method are both, by definition, equal to the probability of achieving the threshold. Assuming this definition of quality and reliability, we show how to augment any TAR method to achieve guaranteed reliability, for a quantifiable level of additional review effort. We demonstrate this result by augmenting the TAR method supplied as the baseline model implementation for the TREC 2015 Total Recall Track, measuring reliability and effort for 555 topics from eight test collections. While our empirical results corroborate our claim of guaranteed reliability, we observe that the augmentation strategy may entail disproportionate effort, especially when the number of relevant documents is low. To address this limitation, we propose stopping criteria for the model implementation that may be applied with no additional review effort, while achieving empirical reliability that compares favorably to the provably reliable method. We further argue that optimizing reliability according to the traditional goal-post method is inconsistent with certain subjective aspects of quality, and that optimizing a Taguchi quality loss function may be more apt.
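
To make the two quality definitions in the abstract concrete, the sketch below (in Python; the 0.75 threshold, the loss constant k, and all function names are illustrative assumptions, not values or code from the paper) contrasts the goal-post pass/fail criterion with a Taguchi-style quadratic loss, both computed from recall. Under the goal-post method a review at 74% recall fails while one at 76% passes; a quadratic loss instead penalizes every shortfall from the target in proportion to its square.

```python
# Minimal sketch (not from the paper): goal-post quality vs. a
# Taguchi-style quadratic quality loss, both driven by recall.
# The 0.75 threshold and loss constant k are illustrative assumptions.

def recall(relevant_found: int, relevant_total: int) -> float:
    """Fraction of all relevant documents that the review found."""
    return relevant_found / relevant_total if relevant_total else 1.0

def goal_post_quality(r: float, threshold: float = 0.75) -> bool:
    """Goal-post method: the review is acceptable iff recall meets the threshold."""
    return r >= threshold

def taguchi_loss(r: float, target: float = 1.0, k: float = 1.0) -> float:
    """Quadratic quality loss: grows with the squared shortfall from the target."""
    return k * (target - r) ** 2

if __name__ == "__main__":
    # A review at 74% recall fails the goal post while one at 76% passes,
    # yet their quadratic losses differ only marginally.
    for found in (60, 74, 76, 95):
        r = recall(found, 100)
        print(f"recall={r:.2f}  acceptable={goal_post_quality(r)}  "
              f"loss={taguchi_loss(r):.4f}")
```

Averaging the goal-post outcome over repeated runs or topics estimates reliability in the abstract's sense (the probability of achieving the threshold), whereas averaging the quadratic loss yields an expected quality loss that reflects how far short of the target a review falls.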



Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. continuous active learning
  2. e-discovery
  3. electronic discovery
  4. predictive coding
  5. quality
  6. relevance feedback
  7. reliability
  8. systematic review
  9. technology-assisted review
  10. test collections

Qualifiers

  • Research-article

Funding Sources

  • Natural Sciences and Engineering Research Council

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 paper acceptance rate: 62 of 341 submissions (18%)
Overall acceptance rate: 792 of 3,983 submissions (20%)
