Workload Characterization and Performance Implications of Large-Scale Blog Servers

Published: 01 November 2012

Abstract

With the ever-increasing popularity of Social Network Services (SNSs), understanding the characteristics of these services and their effects on the behavior of the servers that host them is critical. However, there has been little research on characterizing the workload of servers running SNS applications such as blog services. To fill this void, we empirically characterized real-world Web server logs collected over 12 consecutive days from one of the largest blog hosting sites in South Korea. The logs comprise more than 96 million HTTP requests and 4.7TB of network traffic. Our analysis reveals the following: (i) the transfer sizes of nonmultimedia files and of blog articles can be modeled by a truncated Pareto distribution and a log-normal distribution, respectively; (ii) user access to blog articles does not show temporal locality, but is strongly biased towards articles posted with image or audio files. We additionally discuss the potential performance improvement from clustering the small files on a blog page into contiguous disk blocks, an approach that exploits the observed file access patterns. Trace-driven simulations show that, on average, the suggested approach achieves 60.6% higher system throughput and reduces file-access processing time by 30.8% compared with the best performance of the Ext4 filesystem.
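The abstract names the two size models but not how they were fitted. As a rough illustration only, the Python sketch below fits both by maximum likelihood; it is not the authors' code. Assumptions: the log-normal fit uses the closed-form estimator (mean and population standard deviation of ln(size)), and the truncated Pareto fit follows the maximum-likelihood approach of Aban, Meerschaert, and Panorska (2006), taking the lower and upper bounds as the sample minimum and maximum and solving the score equation for the shape α by bisection. The function names `fit_lognormal`, `truncated_pareto_cdf`, and `fit_truncated_pareto` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helpers: a sketch of MLE fitting, not the paper's procedure.


def fit_lognormal(sizes: Sequence[float]) -> Tuple[float, float]:
    """Closed-form MLE for a log-normal: mean and population std of ln(size)."""
    logs = [math.log(s) for s in sizes]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def truncated_pareto_cdf(x: float, alpha: float, lo: float, hi: float) -> float:
    """CDF of a Pareto(alpha) distribution truncated to [lo, hi]."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (1.0 - (lo / x) ** alpha) / (1.0 - (lo / hi) ** alpha)


def fit_truncated_pareto(sizes: Sequence[float],
                         tol: float = 1e-10) -> Tuple[float, float, float]:
    """MLE for a truncated Pareto: lo = min, hi = max, alpha from the score.

    With t = ln(hi/lo) and S = sum(ln(x_i/lo)), alpha solves
        n/a - n*t/(exp(a*t) - 1) - S = 0.
    The left side is strictly decreasing in a, so the root is unique.
    """
    lo, hi = min(sizes), max(sizes)
    if lo <= 0 or lo == hi:
        raise ValueError("need positive sizes with max > min")
    n = len(sizes)
    t = math.log(hi / lo)
    s = sum(math.log(x / lo) for x in sizes)
    # As a -> 0+, the score tends to n*t/2 - S; no positive root otherwise.
    if s >= n * t / 2:
        raise ValueError("no positive-alpha MLE for this sample")

    def score(a: float) -> float:
        at = a * t
        # expm1 keeps precision for small a*t; the term vanishes for large a*t.
        tail = n * t / math.expm1(at) if at < 700 else 0.0
        return n / a - tail - s

    a_lo, a_hi = 1e-6, 1.0
    while score(a_hi) > 0:  # widen the bracket until the score turns negative
        a_hi *= 2.0
    while a_hi - a_lo > tol:  # plain bisection on the decreasing score
        mid = 0.5 * (a_lo + a_hi)
        if score(mid) > 0:
            a_lo = mid
        else:
            a_hi = mid
    return 0.5 * (a_lo + a_hi), lo, hi
```

Given a list of transfer sizes in bytes taken from the logs, `fit_truncated_pareto(sizes)` returns `(alpha, lo, hi)` and `fit_lognormal(sizes)` returns `(mu, sigma)` of ln(size). The log-normal fit is closed-form, while the Pareto shape needs a one-dimensional root search because α appears both in the density and in its normalizing constant 1 - (lo/hi)^α.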

Published In

ACM Transactions on the Web, Volume 6, Issue 4
November 2012
138 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/2382616

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2012
Accepted: 01 August 2012
Revised: 01 May 2012
Received: 01 September 2011
Published in TWEB Volume 6, Issue 4

Author Tags

  1. Social network services
  2. filesystems
  3. measurement
  4. modeling
  5. workload characterization

Qualifiers

  • Research-article
  • Research
  • Refereed
