On Scalable and Robust Truth Discovery in Big Data Social Media Sensing Applications

Daniel Zhang, Dong Wang, Nathan Vance, Yang Zhang, Steven Mike

Research output: Contribution to journal › Article › peer-review


Identifying trustworthy information in the presence of noisy data contributed by numerous unvetted sources on online social media (e.g., Twitter, Facebook, and Instagram) has become a crucial task in the era of big data. This task, referred to as truth discovery, aims to determine both the reliability of the sources and the truthfulness of the claims they make, without knowing either a priori. In this work, we identify three important challenges that have not been well addressed in the current truth discovery literature. The first is misinformation spread, where a significant number of sources contribute to false claims, making truthful claims difficult to identify. For example, rumors, scams, and influence bots on Twitter are common instances of sources colluding, intentionally or unintentionally, to spread misinformation and obscure the truth. The second challenge is data sparsity, or the long-tail phenomenon, where a majority of sources contribute only a small number of claims, providing insufficient evidence to determine those sources' trustworthiness. For example, in the Twitter datasets that we collected during real-world events, more than 90 percent of sources contributed only a single claim. Third, many current solutions do not scale to large-scale social sensing events because their truth discovery algorithms are centralized. In this paper, we develop a Scalable and Robust Truth Discovery (SRTD) scheme to address these three challenges. In particular, the SRTD scheme jointly quantifies both the reliability of sources and the credibility of claims using a principled approach. We further develop a distributed framework that implements the proposed truth discovery scheme using Work Queue in an HTCondor system. Evaluation results on three real-world datasets show that the SRTD scheme significantly outperforms state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
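To make the idea of jointly estimating source reliability and claim credibility concrete, here is a minimal sketch of a generic HITS-style iterative baseline. It is not the SRTD scheme, whose formulation is not given in this abstract. It assumes reports arrive as hypothetical `(source_id, claim_id)` pairs; the function names `sums_truth_discovery` and `single_claim_fraction` and the default iteration count are illustrative assumptions. The two scores are updated from each other for a fixed number of rounds, and a helper measures the long-tail share of sources that contributed a single claim.

```python
from collections import defaultdict


def _index(reports):
    """Build source->claims and claim->sources maps from (source, claim) pairs."""
    claims_by_source = defaultdict(set)
    sources_by_claim = defaultdict(set)
    for source, claim in reports:
        claims_by_source[source].add(claim)
        sources_by_claim[claim].add(source)
    return claims_by_source, sources_by_claim


def sums_truth_discovery(reports, iterations=20):
    """Generic HITS-style truth discovery baseline (hypothetical helper, not SRTD).

    Returns (source_reliability, claim_credibility), each scaled so the max is 1.0.
    """
    claims_by_source, sources_by_claim = _index(list(reports))
    if not claims_by_source:
        return {}, {}

    # Assumption: every source starts out equally reliable.
    reliability = {s: 1.0 for s in claims_by_source}
    credibility = {}
    for _ in range(iterations):
        # A claim's credibility is the summed reliability of the sources asserting it.
        credibility = {c: sum(reliability[s] for s in srcs)
                       for c, srcs in sources_by_claim.items()}
        top = max(credibility.values())
        credibility = {c: v / top for c, v in credibility.items()}
        # A source's reliability is the summed credibility of the claims it makes.
        reliability = {s: sum(credibility[c] for c in cl)
                       for s, cl in claims_by_source.items()}
        top = max(reliability.values())
        reliability = {s: v / top for s, v in reliability.items()}
    return reliability, credibility


def single_claim_fraction(reports):
    """Fraction of sources that contributed exactly one distinct claim (long-tail measure)."""
    claims_by_source, _ = _index(list(reports))
    if not claims_by_source:
        return 0.0
    singles = sum(1 for cl in claims_by_source.values() if len(cl) == 1)
    return singles / len(claims_by_source)


if __name__ == "__main__":
    # Hypothetical toy data: three sources back c1, one source alone backs c3.
    reports = [("alice", "c1"), ("bob", "c1"), ("carol", "c1"),
               ("alice", "c2"), ("dave", "c3")]
    rel, cred = sums_truth_discovery(reports)
    print(sorted(cred, key=cred.get, reverse=True))
    print(single_claim_fraction(reports))  # 0.75 (bob, carol, dave)
```

Because a claim's score in this baseline grows with the number and reliability of its supporters, a coordinated group of sources repeating a false claim can dominate it, and a single-claim source's score is driven entirely by that one claim. These are the misinformation-spread and data-sparsity weaknesses the abstract says SRTD is designed to address.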

Original language: English (US)
Article number: 8334619
Pages (from-to): 195-208
Number of pages: 14
Journal: IEEE Transactions on Big Data
Issue number: 2
State: Published - Jun 1 2019
Externally published: Yes

Keywords

  • Big data
  • rumor robust
  • scalable
  • sparse social media sensing
  • truth discovery
  • Twitter

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management

