DOI: 10.1145/1807167.1807232
research-article

Indexing multi-dimensional data in a cloud system

Published: 06 June 2010

Abstract

Providing scalable database services is an essential requirement for extending many existing applications of the Cloud platform. Because applications are diverse, database services on the Cloud must support both large-scale data analytical jobs and highly concurrent OLTP queries. Most existing work focuses on one specific type of application. To provide an integrated framework, we are designing a new system, epiC, as our solution to next-generation database systems. In epiC, indexes play an important role in improving overall performance. Different types of indexes are built to provide efficient query processing for different applications.
In this paper, we propose RT-CAN, a multi-dimensional indexing scheme in epiC. RT-CAN integrates a CAN-based [23] routing protocol and an R-tree-based indexing scheme to support efficient multi-dimensional query processing in a Cloud system. RT-CAN organizes storage and compute nodes into an overlay structure based on an extended CAN protocol. We make the simple assumption that each compute node uses an R-tree-like indexing structure to index its locally stored data. We propose a query-conscious cost model that selects beneficial local R-tree nodes for publishing. By keeping the number of persistently connected nodes small and maintaining a global multi-dimensional search index, we can locate the compute nodes that may contain the answer within a few hops, making the scheme scalable in terms of data volume and number of compute nodes. Experiments on Amazon's EC2 show that our proposed routing protocol and indexing scheme are robust, efficient and scalable.

References

[1]
http://hadoop.apache.org/.
[2]
http://www.comp.nus.edu.sg/~epic.
[3]
http://www.fhoow.de/institute/iapg/personen/brinkhoff/generator/.
[4]
K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt. P-grid: A self-organizing structured p2p system. In SIGMOD 2003.
[5]
A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. Hadoopdb: An architectural hybrid of mapreduce and dbms technologies for analytical workloads. PVLDB, 2(1):922--933, 2009.
[6]
M. K. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: A new paradigm for building scalable distributed systems. In SOSP 2007.
[7]
S. A. Weil, S. A. Brandt, E. L. Miller, and D. D. E. Long. Ceph: A scalable, high-performance distributed file system. In OSDI 2006.
[8]
V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. SIGPLAN Not., 35(5):1--12, 2000.
[9]
E. Bertino, B. C. Ooi, R. Sacks-Davis, K. Tan, J. Zobel, B. Shidlovsky, and B. Catania. Indexing Techniques for Advanced Database Applications. Monograph series, Kluwer Academic, 1997.
[10]
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. In VLDB 2008.
[11]
W. Cai, S. Zhou, W. Qian, L. Xu, K. Tan, and A. Zhou. C2: a new overlay network based on can and chord. Int. J. High Perform. Comput. Netw., 3(4):248--261, 2005.
[12]
R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. Scope: easy and efficient parallel processing of massive data sets. Proc. VLDB Endow., 1(2):1265--1276, 2008.
[13]
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI 2006.
[14]
A. Crainiceanu, P. Linga, J. Gehrke, and J. Shanmugasundaram. Querying peer-to-peer networks using p-trees. In WebDB 2004.
[15]
A. Crainiceanu, P. Linga, A. Machanavajjhala, J. Gehrke, and J. Shanmugasundaram. P-ring: an efficient and robust p2p range index structure. In SIGMOD 2007.
[16]
J. Dean and S. Ghemawat. Mapreduce: a flexible data processing tool. Commun. ACM, 53(1):72--77, 2010.
[17]
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In SIGOPS 2007.
[18]
S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP 2003.
[19]
W. Gilks, S. Richardson, and D. Spiegelhalter. Markov chain monte carlo in practice. 1996.
[20]
H. V. Jagadish, B. C. Ooi, K.-L. Tan, Q. H. Vu, and R. Zhang. Speeding up search in peer-to-peer networks with a multi-way tree structure. In SIGMOD, pages 1--12, 2006.
[21]
H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In VLDB 2005.
[22]
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI 2004.
[23]
S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In SIGCOMM 2001.
[24]
A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In International Conference on Distributed Systems Platforms 2001.
[25]
A. Silberstein, B. F. Cooper, U. Srivastava, E. Vee, R. Yerneni, and R. Ramakrishnan. Efficient bulk insertion into a distributed ordered table. In SIGMOD 2008.
[26]
I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM 2001.
[27]
Y. Tao, J. Zhang, D. Papadias, and N. Mamoulis. An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces. IEEE Trans. on Knowl. and Data Eng., 16(10):1169--1184, 2004.
[28]
Q. H. Vu, M. Lupu, and B. C. Ooi. Peer-To-Peer Computing: Principles And Applications. Springer, November 2009.
[29]
S. Wu and K.-L. Wu. An indexing framework for efficient retrieval on the cloud. IEEE Data Engineering Bulletin, 32(1):77--84, 2009.
[30]
H.-c. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-reduce-merge: simplified relational data processing on large clusters. In SIGMOD, pages 1029--1040, 2007.

Published In

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN:9781450300322
DOI:10.1145/1807167
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud
  2. index
  3. query processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '10: International Conference on Management of Data
June 6 - 10, 2010
Indianapolis, Indiana, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) Multi-Dimensional Flat Indexing for Encrypted Data. IEEE Transactions on Cloud Computing 12:3 (928-941). DOI: 10.1109/TCC.2024.3408905. Online publication date: Jul-2024
  • (2023) Developing a Base Domain Ontology from Geoscience Report Collection to Aid in Information Retrieval towards Spatiotemporal and Topic Association. ISPRS International Journal of Geo-Information 13:1 (14). DOI: 10.3390/ijgi13010014. Online publication date: 30-Dec-2023
  • (2023) Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level. Big Data and Cognitive Computing 7:2 (119). DOI: 10.3390/bdcc7020119. Online publication date: 14-Jun-2023
  • (2023) Hyper-USS: Answering Subset Query Over Multi-Attribute Data Stream. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1698-1709). DOI: 10.1145/3580305.3599383. Online publication date: 6-Aug-2023
  • (2023) Efficient and lightweight indexing approach for multi-dimensional historical data in blockchain. Future Generation Computer Systems 139 (210-223). DOI: 10.1016/j.future.2022.09.002. Online publication date: Feb-2023
  • (2022) Towards a distributed SaaS management system in a multi-cloud environment. Cluster Computing 25:6 (4051-4071). DOI: 10.1007/s10586-022-03619-x. Online publication date: 20-Jun-2022
  • (2022) Data-Intensive Workflow Management. Online publication date: 26-Feb-2022
  • (2021) Big data integration enhancement based on attributes conditional dependency and similarity index method. Mathematical Biosciences and Engineering 18:6 (8661-8682). DOI: 10.3934/mbe.2021429. Online publication date: 2021
  • (2021) An Efficient Data Analysis Framework for Online Security Processing. Journal of Computer Networks and Communications 2021. DOI: 10.1155/2021/9290853. Online publication date: 1-Jan-2021
  • (2021) FacetsBase: A Key-Value Store Optimized for Querying on Scholarly Data. IEEE Transactions on Emerging Topics in Computing 9:1 (302-315). DOI: 10.1109/TETC.2018.2844313. Online publication date: 1-Jan-2021
