DOI: 10.1145/3038912.3052672
research-article

Distilling Information Reliability and Source Trustworthiness from Digital Traces

Published: 03 April 2017

Abstract

Online knowledge repositories typically rely on their users or dedicated editors to evaluate the reliability of their contents. These explicit feedback mechanisms can be viewed as noisy measurements of both information reliability and information source trustworthiness. Can we leverage these noisy, and often biased, measurements to distill a robust, unbiased and interpretable measure of both notions?
In this paper, we argue that the large volume of digital traces left by the users within knowledge repositories also reflects information reliability and source trustworthiness. In particular, we propose a temporal point process modeling framework which links the temporal behavior of the users to information reliability and source trustworthiness. Furthermore, we develop an efficient convex optimization procedure to learn the parameters of the model from historical traces of the evaluations provided by these users. Experiments on real-world data gathered from Wikipedia and Stack Overflow show that our modeling framework accurately predicts evaluation events, provides an interpretable measure of information reliability and source trustworthiness, and yields interesting insights about real-world events.
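The abstract does not state the model's equations, so the following is only a hedged illustration of the general technique it names: a temporal point process whose parameters are learned by maximizing a log-likelihood that is concave in those parameters. The sketch fits a generic self-exciting (Hawkes-type) process with intensity λ(t) = μ + α Σ_{t_j < t} ω e^{−ω(t − t_j)} to a single stream of event times. It is not the paper's model: the exponential kernel, the fixed decay ω, the hypothetical helper names `intensity`, `log_likelihood` and `fit_em`, the made-up timestamps, and the use of EM iterations in place of a general-purpose convex solver are all assumptions for illustration.

```python
# Illustrative sketch only: a generic Hawkes-type point process, NOT the paper's model.
# All function names and the example timestamps are hypothetical.
import math
from typing import List, Tuple


def intensity(t: float, history: List[float], mu: float, alpha: float, omega: float) -> float:
    """lambda(t) = mu + alpha * sum over past events t_j < t of omega * exp(-omega * (t - t_j))."""
    return mu + alpha * sum(omega * math.exp(-omega * (t - tj)) for tj in history if tj < t)


def log_likelihood(events: List[float], T: float, mu: float, alpha: float, omega: float) -> float:
    """sum_i log lambda(t_i) minus the compensator (integral of lambda over [0, T]).

    `events` must be sorted ascending. For fixed omega this is concave in (mu, alpha):
    each log term is the log of an affine function of (mu, alpha), and the
    compensator is linear in them.
    """
    log_term = sum(math.log(intensity(ti, events[:i], mu, alpha, omega))
                   for i, ti in enumerate(events))
    compensator = mu * T + alpha * sum(1.0 - math.exp(-omega * (T - tj)) for tj in events)
    return log_term - compensator


def fit_em(events: List[float], T: float, omega: float, n_iter: int = 200,
           mu: float = 1.0, alpha: float = 0.5) -> Tuple[float, float]:
    """Maximize log_likelihood over (mu, alpha) by EM on the branching structure.

    E-step: each event is attributed to the background (probability mu / lambda)
    or to earlier events (the remaining excitation / lambda).
    M-step: closed-form updates for mu and alpha. Requires mu > 0 initially.
    """
    # Integral of the kernel omega * exp(-omega * (t - t_j)) from t_j to T, summed over j.
    kernel_mass = sum(1.0 - math.exp(-omega * (T - tj)) for tj in events)
    for _ in range(n_iter):
        background, triggered = 0.0, 0.0
        for i, ti in enumerate(events):
            excitation = alpha * sum(omega * math.exp(-omega * (ti - tj))
                                     for tj in events[:i] if tj < ti)
            lam = mu + excitation
            background += mu / lam
            triggered += excitation / lam
        mu = background / T
        alpha = triggered / kernel_mass if kernel_mass > 0 else 0.0
    return mu, alpha


if __name__ == "__main__":
    # Hypothetical evaluation timestamps (e.g. edits to one page), observed on [0, T].
    events = [0.5, 0.6, 0.65, 2.0, 2.1, 5.0, 5.05, 5.1, 8.0]
    T, omega = 10.0, 5.0
    mu_hat, alpha_hat = fit_em(events, T, omega)
    print(f"mu = {mu_hat:.3f}, alpha = {alpha_hat:.3f}")
    print(f"log-likelihood = {log_likelihood(events, T, mu_hat, alpha_hat, omega):.3f}")
```

Because the objective is concave in (μ, α), any stationary point the EM iterations settle on is a global maximizer, and each iteration can only increase the likelihood. A side effect worth checking is that after every M-step the fitted compensator μT + αΣ_j(1 − e^{−ω(T − t_j)}) equals the number of observed events. A production fit would more likely hand the same concave objective to a convex solver, and would need an O(n) recursive intensity update instead of the O(n²) double loop used here.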

          Published In

          ACM Other conferences
          WWW '17: Proceedings of the 26th International Conference on World Wide Web
          April 2017
          1678 pages
          ISBN: 9781450349130

          Sponsors

          • IW3C2: International World Wide Web Conference Committee

          Publisher

          International World Wide Web Conferences Steering Committee

          Republic and Canton of Geneva, Switzerland

          Author Tags

          1. information reliability
          2. point processes
          3. source trustworthiness

          Conference

          WWW '17

          Acceptance Rates

          WWW '17 paper acceptance rate: 164 of 966 submissions (17%)
          Overall acceptance rate: 1,899 of 8,196 submissions (23%)

          Bibliometrics

          Article Metrics

          • Downloads (last 12 months): 28
          • Downloads (last 6 weeks): 3
          Reflects downloads up to 06 Nov 2024
