Research article
DOI: 10.1145/2348283.2348353

Automatic refinement of patent queries using concept importance predictors

Published: 12 August 2012

Abstract

Patent prior-art queries are full patent applications, which are much longer than standard web search topics. Such queries are composed of hundreds of terms and do not represent a focused information need. One way to make the queries more focused is to select a group of key terms as representatives. Existing work shows that selecting terms to reduce patent queries is a challenging task, mainly because of the presence of ambiguous terms. Given this setup, we present a query modeling approach that uses patent-specific characteristics to generate more precise queries. We propose to automatically disambiguate query terms by employing noun phrases extracted through a global analysis of the patent collection. We further introduce a method for predicting whether expansion using noun phrases would improve retrieval effectiveness.
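The abstract does not spell out how the noun phrases are extracted. The sketch below is one minimal reading under stated assumptions: each document is already part-of-speech tagged with Penn Treebank tags, a candidate phrase is a run of zero or more adjectives (`JJ*`) followed by one or more nouns (`NN*`), and the "global analysis" step is approximated, purely as an assumption, by keeping phrases that occur in at least `min_df` documents of the collection. The names `candidate_noun_phrases`, `global_noun_phrases`, and `min_df` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# A token paired with its Penn Treebank POS tag, e.g. ("tool", "NN").
TaggedToken = Tuple[str, str]


def candidate_noun_phrases(tagged: Sequence[TaggedToken]) -> List[str]:
    """Return multi-word JJ* NN+ runs from one POS-tagged text (assumed pattern)."""
    phrases: List[str] = []
    current: List[TaggedToken] = []

    def flush() -> None:
        # A phrase must end in a noun, so drop any trailing adjectives.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        # Keep multi-word runs only; single nouns are ordinary query terms.
        if len(current) >= 2:
            phrases.append(" ".join(word.lower() for word, _ in current))
        current.clear()

    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append((word, tag))
        elif tag.startswith("JJ"):
            # An adjective that follows a noun starts a new phrase.
            if current and current[-1][1].startswith("NN"):
                flush()
            current.append((word, tag))
        else:
            flush()
    flush()
    return phrases


def global_noun_phrases(
    collection: Iterable[Sequence[TaggedToken]], min_df: int = 2
) -> Dict[str, int]:
    """Keep phrases found in at least min_df documents (assumed 'global' filter)."""
    df: Counter = Counter()
    for doc in collection:
        # Count each phrase at most once per document (document frequency).
        df.update(set(candidate_noun_phrases(doc)))
    return {phrase: n for phrase, n in df.items() if n >= min_df}
```

For example, a tagged sentence such as "A rotary cutting tool is used" yields the single candidate `rotary cutting tool`, which survives the collection-level filter only if enough other documents contain it as well.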
Our experiments show that we can obtain an improvement of almost 20% by performing query expansion using the true importance of the noun phrase queries. Based on this observation, we introduce various features that can be used to estimate the importance of a noun phrase query. We evaluated the effectiveness of the proposed method on the CLEF-IP 2010 patent prior-art search collection. Our experimental results indicate that the proposed features are good predictors of noun phrase importance, and that selectively applying noun phrase queries according to these importance predictors outperforms existing query generation methods.
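The selective-application step can likewise be sketched in a few lines. This is not the paper's predictor: the abstract names neither its features nor its learning method, so the linear scorer, the feature names (`idf`, `ambiguity`), the weights, and the threshold below are hypothetical placeholders. The sketch only illustrates the control flow, in which a noun phrase is added to the query only when its predicted importance clears a threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class NounPhraseCandidate:
    phrase: str
    # Hypothetical predictor features; the paper's actual features are not given here.
    features: Dict[str, float] = field(default_factory=dict)


def predict_importance(
    features: Dict[str, float], weights: Dict[str, float], bias: float = 0.0
) -> float:
    """Linear stand-in for a learned importance predictor (an assumption)."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in features.items())


def selectively_expand(
    base_terms: Sequence[str],
    candidates: Sequence[NounPhraseCandidate],
    weights: Dict[str, float],
    threshold: float = 0.0,
) -> List[str]:
    """Append a noun phrase only if its predicted importance exceeds the threshold."""
    expanded = list(base_terms)
    for cand in candidates:
        if predict_importance(cand.features, weights) > threshold:
            expanded.append(cand.phrase)
    return expanded


weights = {"idf": 1.0, "ambiguity": -2.0}  # made-up weights for illustration
candidates = [
    NounPhraseCandidate("rotary cutting tool", {"idf": 3.0, "ambiguity": 0.5}),
    NounPhraseCandidate("device", {"idf": 0.5, "ambiguity": 1.0}),
]
query = selectively_expand(["cutting", "blade"], candidates, weights)
print(query)  # ['cutting', 'blade', 'rotary cutting tool']
```

With these made-up weights, `rotary cutting tool` scores 2.0 and is appended, while `device` scores -1.5 and is skipped. When no candidate clears the threshold, the original query is returned unchanged, which is what makes the expansion selective rather than applied to every query.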

Published In

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. patent search
2. query generation
3. relevance model

Conference

SIGIR '12

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
