DOI: 10.1145/3105831.3105841
research-article

The SusCity Big Data Warehousing Approach for Smart Cities

Published: 12 July 2017

Abstract

The concept of the Smart City provides a rich analytical context, highlighting the need to store and process vast amounts of heterogeneous data flowing at different velocities. Such data is commonly described as Big Data, which poses significant difficulties for traditional data techniques and technologies. Data Warehouses (DWs) have long been recognized as a fundamental enterprise asset, providing fact-based decision support for many organizations. The concept of the DW is evolving, however. Traditionally, Relational Database Management Systems (RDBMSs) are used to store historical data, providing different analytical perspectives on several business processes. With current advancements in Big Data techniques and technologies, the concept of the Big Data Warehouse (BDW) emerges to overcome several limitations of traditional DWs. This paper presents a novel approach for designing and implementing BDWs, which has been supporting the SusCity data visualization platform. The BDW is a crucial component of the SusCity research project in the context of Smart Cities, supporting analytical tasks based on data collected in the city of Lisbon.

Published In

IDEAS '17: Proceedings of the 21st International Database Engineering & Applications Symposium
July 2017
338 pages
ISBN: 9781450352208
DOI: 10.1145/3105831
In-Cooperation

  • University of the West of England
  • BytePress
  • Concordia University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Big Data
  2. Big Data Warehousing
  3. Data Warehouse
  4. Hadoop
  5. NoSQL
  6. Smart Cities

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IDEAS 2017

Acceptance Rates

IDEAS '17 Paper Acceptance Rate 38 of 102 submissions, 37%;
Overall Acceptance Rate 74 of 210 submissions, 35%
