DOI: 10.5555/3200334.3200342
research-article

Understanding the impact of early citers on long-term scientific impact

Published: 19 June 2017

Abstract

This paper explores an interesting new dimension to the challenging problem of predicting long-term scientific impact (LTSI), usually measured by the number of citations a paper accumulates in the long term. It is well known that early citations (within 1-2 years after publication) acquired by a paper positively affect its LTSI. However, no prior work investigates whether the set of authors who bring in these early citations also affects its LTSI. In this paper, we demonstrate for the first time the impact of these authors, whom we call early citers (EC), on the LTSI of a paper. Note that this study of the complex dynamics of EC introduces a brand new paradigm in citation behavior analysis. Using a massive computer science bibliographic dataset, we identify two distinct categories of EC: authors with a high overall publication/citation count in the dataset, whom we call influential, and the remaining authors, whom we call non-influential. We investigate three characteristic properties of EC and present an extensive analysis of how each category correlates with LTSI in terms of these properties. In contrast to popular perception, we find that influential EC negatively affect LTSI, possibly owing to attention stealing. To motivate this, we present several representative examples from the dataset. A closer inspection of the collaboration network reveals that this stealing effect is more pronounced when an EC is closer to the authors of the paper being investigated. As an intuitive use case, we show that incorporating EC properties into state-of-the-art supervised citation prediction models leads to high performance margins. Finally, we present an online portal to visualize EC statistics along with the prediction results for a given query paper. We make all the code and the processed dataset publicly available at our portal: http://www.cnergres.iitkgp.ac.in/earlyciters/

      Published In

      JCDL '17: Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries
      June 2017
      383 pages
      ISBN:9781538638613

      Publisher

      IEEE Press

      Author Tags

      1. citation count
      2. early citers
      3. long-term scientific impact
      4. supervised regression models

      Qualifiers

      • Research-article

      Conference

      JCDL '17
      Acceptance Rates

      Overall Acceptance Rate 415 of 1,482 submissions, 28%
