Unified domain-specific language for collecting and processing data of social media

Published: 01 October 2018

Abstract

Data from social media is becoming an increasingly important source of analysis material for social scientists, market analysts, and other stakeholders. This diversity of interests has led to a variety of crawling techniques and software solutions. Nevertheless, these solutions lack the flexibility to satisfy the requirements of different users and individual crawling scenarios, which can range from a simple query to a complex multi-step workflow that collects data from several networks. To address this problem, our paper proposes an approach based on a domain-specific language (DSL) and an architecture for a distributed crawling system. The DSL is declarative: the user only describes the data needed, and the language is based on an ontological model of social networks and on the essential crawling techniques. Thus, the crawling system can be applied to collect data from different online social networks within complex workflows, exploiting various crawling methods implemented in a distributed computing environment.
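
To make the idea of a declarative, multi-step crawl description more concrete, here is a minimal Python sketch. It is not the DSL proposed in the paper, whose syntax is not shown in this abstract. The names `Step`, `Workflow`, `FAKE_DATA`, and the "friends" entity are hypothetical, and an in-memory dictionary stands in for real social network APIs and for the distributed execution layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for two social networks (assumption: no real API is called).
# Keys are (network, entity); values map a user to the items of that entity.
FAKE_DATA: Dict[Tuple[str, str], Dict[str, List[str]]] = {
    ("twitter", "friends"): {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []},
    ("vk", "friends"): {"alice": ["dave"], "dave": []},
}


@dataclass(frozen=True)
class Step:
    """One declarative step: what to collect and where, not how."""
    name: str                  # label under which the result is stored
    network: str               # which network to query
    entity: str                # which kind of data is needed
    seeds_from: str = "input"  # earlier step (or "input") that supplies the users


@dataclass
class Workflow:
    """An ordered list of steps; later steps may consume earlier results."""
    steps: List[Step]

    def run(self, seeds: List[str]) -> Dict[str, List[str]]:
        results: Dict[str, List[str]] = {"input": list(seeds)}
        for step in self.steps:
            source = FAKE_DATA[(step.network, step.entity)]
            collected: List[str] = []
            for user in results[step.seeds_from]:
                for item in source.get(user, []):
                    if item not in collected:  # keep order, drop duplicates
                        collected.append(item)
            results[step.name] = collected
        return results


# A three-step workflow spanning two networks.
workflow = Workflow([
    Step("tw_friends", "twitter", "friends"),
    Step("tw_friends_of_friends", "twitter", "friends", seeds_from="tw_friends"),
    Step("vk_friends", "vk", "friends"),
])
output = workflow.run(["alice"])
print(output)
```

The point of the sketch is the separation of concerns: the workflow only states which data is wanted and how the steps depend on each other, so an interpreter is free to choose the crawling method for each step.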

Published In

Journal of Intelligent Information Systems, Volume 51, Issue 2
October 2018
245 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Crawling
  2. Domain-specific language
  3. Ontology
  4. Social media
  5. Social networks
