A Survey on Accessing Dataspaces

Published: 28 September 2016

Abstract

Dataspaces provide a co-existence approach for heterogeneous data. Relationships among these heterogeneous data, such as object associations or attribute synonyms, are often identified incrementally. Depending on the degree to which relationships have been recognized, different query answers may be obtained. In this paper, we review the major techniques for processing and optimizing queries in dataspaces, organized by their ability to handle relationships: 1) simple search queries that do not consider relationships, 2) association queries over object associations, 3) heterogeneity queries with attribute correspondences, and 4) similarity queries for similar objects. Techniques such as indexing, query rewriting, expansion, and semantic query optimization are discussed for these query types. Finally, we highlight possible directions in accessing dataspaces.

Published In

ACM SIGMOD Record  Volume 45, Issue 2
June 2016
66 pages
ISSN:0163-5808
DOI:10.1145/3003665

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
