DOI: 10.1145/1255175.1255237
Article

Agreeing to disagree: search engines and their public interfaces

Published: 18 June 2007

Abstract

Google, Yahoo and MSN all provide both web user interfaces (WUIs) and application programming interfaces (APIs) to their collections. Whether researchers are building collections of resources or studying the search engines themselves, the search engines request that they use the APIs and not "scrape" the WUIs. However, anecdotal evidence suggests the interfaces produce different results. We provide the first in-depth quantitative analysis of the results produced by the APIs and WUIs of Google, MSN and Yahoo. We queried both interfaces for five months and found significant discrepancies between them in several categories. In general, we found MSN to produce the most consistent results between its two interfaces. Our findings suggest that the API indexes are not older, but they are probably smaller for Google and Yahoo. We also examined how search results decay over time and built predictive models based on the observed decay rates. Based on our findings, it can take over a year for half of the top 10 results for a popular query to be replaced in Google and Yahoo; for MSN it may take only 2-3 months.
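
The two quantities the abstract refers to, agreement between interfaces and the time for half of the top 10 results to be replaced, can be illustrated with a minimal sketch. The abstract does not specify the paper's distance measures or predictive models, so everything here is an assumption made for illustration: the function names `top_k_overlap` and `half_life_days` are hypothetical, agreement is measured as simple set overlap, and decay is assumed to be exponential.

```python
import math


def top_k_overlap(api_results, wui_results, k=10):
    """Share of the top-k slots filled by URLs that both interfaces return.

    Rank order is ignored, and the count is divided by k even if a list
    holds fewer than k results. This is an illustrative measure, not
    necessarily the one used in the paper.
    """
    api_top = set(api_results[:k])
    wui_top = set(wui_results[:k])
    return len(api_top & wui_top) / k


def half_life_days(fraction_remaining, elapsed_days):
    """Estimate the half-life of a result set from a single observation.

    Assumes exponential decay N(t) = N0 * exp(-lam * t), which is an
    assumption of this sketch rather than the paper's stated model.
    """
    if not 0.0 < fraction_remaining < 1.0:
        raise ValueError("fraction_remaining must be strictly between 0 and 1")
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    lam = -math.log(fraction_remaining) / elapsed_days  # decay rate per day
    return math.log(2) / lam


# Hypothetical data: URLs returned by each interface for the same query.
api = ["u1", "u2", "u3", "u4"]
wui = ["u2", "u3", "u5", "u6"]
overlap = top_k_overlap(api, wui, k=4)

# Hypothetical observation: 25% of the original results remain after 60 days.
estimate = half_life_days(0.25, 60)
```

Under these assumptions, a set that retains a quarter of its original results after 60 days has an estimated half-life of about 30 days, because two half-lives have elapsed.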


      Published In

      JCDL '07: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
      June 2007
      534 pages
      ISBN:9781595936448
      DOI:10.1145/1255175
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. API
      2. distance measurement
      3. search engine interfaces
      4. search engine results

      Qualifiers

      • Article

      Conference

      JCDL07
      JCDL07: Joint Conference on Digital Libraries
      June 18 - 23, 2007
      Vancouver, BC, Canada

      Acceptance Rates

      Overall Acceptance Rate 415 of 1,482 submissions, 28%
