research-article

AutoPlacer: Scalable Self-Tuning Data Placement in Distributed Key-Value Stores

Published: 08 December 2014

Abstract

This article addresses the problem of self-tuning the data placement in replicated key-value stores. The goal is to automatically optimize replica placement in a way that leverages locality patterns in data accesses, such that internode communication is minimized. To do this efficiently is extremely challenging, as one needs not only to find lightweight and scalable ways to identify the right assignment of data replicas to nodes but also to preserve fast data lookup. The article introduces new techniques that address these challenges. The first challenge is addressed by optimizing, in a decentralized way, the placement of the objects generating the largest number of remote operations for each node. The second challenge is addressed by combining the usage of consistent hashing with a novel data structure, which provides efficient probabilistic data placement. These techniques have been integrated in a popular open-source key-value store. The performance results show that the throughput of the optimized system can be six times better than a baseline system employing the widely used static placement based on consistent hashing.
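The two ideas in the abstract can be illustrated with a small sketch. It is not the article's implementation: the abstract does not describe the novel data structure, so a plain Bloom filter stands in for it here, and every name below (`Ring`, `Placement`, `relocate`, `hottest_remote_keys`) is hypothetical. The sketch assumes that each node counts its accesses to keys it does not own, that the most frequently accessed remote keys are moved to that node, and that lookups consult the filters before falling back to consistent hashing.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str, seed: int = 0) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, bits: int = 1024, hashes: int = 3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, key: str):
        return [_hash(key, seed) % self.bits for seed in range(1, self.hashes + 1)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.array |= 1 << pos

    def __contains__(self, key: str) -> bool:
        return all((self.array >> pos) & 1 for pos in self._positions(key))


class Ring:
    """Static placement: a key belongs to the first node clockwise of its hash."""

    def __init__(self, nodes, vnodes: int = 64):
        points = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.hashes = [h for h, _ in points]
        self.owners = [n for _, n in points]

    def owner(self, key: str) -> str:
        idx = bisect.bisect_right(self.hashes, _hash(key)) % len(self.hashes)
        return self.owners[idx]


class Placement:
    """Consistent hashing plus one filter per node for relocated keys (assumed design)."""

    def __init__(self, nodes):
        self.ring = Ring(nodes)
        self.moved = {n: BloomFilter() for n in nodes}

    def relocate(self, key: str, node: str) -> None:
        self.moved[node].add(key)

    def owner(self, key: str) -> str:
        for node, bloom in self.moved.items():
            if key in bloom:
                return node  # probabilistic: a false positive misroutes the lookup
        return self.ring.owner(key)  # default: static consistent hashing


def hottest_remote_keys(node: str, accesses, ring: Ring, k: int):
    """Top-k keys that `node` accessed but does not own under the static ring."""
    remote = Counter(key for key in accesses if ring.owner(key) != node)
    return [key for key, _ in remote.most_common(k)]
```

In this sketch a node would call `hottest_remote_keys` on its own access log, then `relocate` each returned key to itself. Lookups stay local and cheap because every node can evaluate the filters and the ring without contacting a directory. The cost is an occasional misrouted request when a filter reports a false positive.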

Published In

ACM Transactions on Autonomous and Adaptive Systems, Volume 9, Issue 4
January 2015
137 pages
ISSN:1556-4665
EISSN:1556-4703
DOI:10.1145/2695594
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2014
Accepted: 01 June 2014
Revised: 01 June 2014
Received: 01 January 2014
Published in TAAS Volume 9, Issue 4

Author Tags

  1. Distributed data management
  2. data placement
  3. machine learning
  4. probabilistic algorithms

Qualifiers

  • Research-article
  • Research
  • Refereed
