DOI: 10.1145/3383583.3398512
research-article

Identification, Tracking and Impact: Understanding the Trade Secret of Catchphrases

Published: 01 August 2020

Abstract

Understanding topical evolution in industrial innovation is a challenging problem. As digital repositories of patent documents grow, it is becoming increasingly feasible to uncover the innovation secrets ('catchphrases') of organizations. However, searching and making sense of this enormous body of text is a natural bottleneck. In this paper, we propose an unsupervised method for extracting catchphrases from the abstracts of patents granted by the U.S. Patent and Trademark Office over the years. The proposed system achieves substantial improvements in both precision and recall over state-of-the-art techniques. As a second objective, we conduct an extensive empirical study of the temporal evolution of catchphrases across various organizations. We also show how an organization's overall innovation evolution, expressed as the introduction of newer catchphrases in its patents, correlates with the future citations received by the patents that organization files. Our code and data sets will be placed in the public domain.
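The idea of "introduction of newer catchphrases" can be made concrete with a small sketch. The code below is not the paper's method: it assumes catchphrases have already been extracted for each patent, and the function name `new_catchphrases_per_year`, the input layout, and the toy data are all hypothetical. It only counts, for a single organization, how many catchphrases appear for the first time in each year.

```python
from collections import defaultdict


def new_catchphrases_per_year(patents):
    """Count catchphrases seen for the first time in each year.

    `patents` is an iterable of (year, phrases) pairs for ONE organization,
    where `phrases` is a set of catchphrases already extracted from a patent
    abstract. The extraction step itself is out of scope for this sketch.
    """
    # Pool all phrases by year, lowercased so case variants count once.
    by_year = defaultdict(set)
    for year, phrases in patents:
        by_year[year].update(p.lower() for p in phrases)

    seen = set()   # every phrase observed in earlier years
    counts = {}
    for year in sorted(by_year):
        fresh = by_year[year] - seen   # phrases not seen before this year
        counts[year] = len(fresh)
        seen |= by_year[year]
    return counts


# Toy, made-up input (assumption: not taken from the paper's data).
toy = [
    (2001, {"neural network", "image sensor"}),
    (2001, {"image sensor"}),
    (2002, {"image sensor", "lithium cell"}),
    (2003, {"Neural Network"}),
]
result = new_catchphrases_per_year(toy)
```

A yearly series like this could then be compared against later citation counts for the same organization; how the paper actually measures that relationship is not stated in the abstract.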


Published In

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
August 2020
611 pages
ISBN:9781450375856
DOI:10.1145/3383583

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. digital library
  2. patents

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%
