This paper introduces an active-learning-based truth estimator for social networks, such as Twitter, that significantly enhances estimation accuracy by requesting that a small, well-selected fraction of the data be labeled. Data assessment and truth discovery from arbitrary open online sources are hard problems because the reliability of the sources is uncertain. Several truth-finding systems have been developed to address this problem, but their accuracy is limited by the noisy nature of the data, in which distortions, fabrications, omissions, and duplication are introduced. This paper presents a semi-supervised truth estimator for social networks in which a portion of the inputs is carefully selected to be reliably verified. The challenge is to find the subset of observations whose verification would maximally enhance overall fact-finding accuracy. This work extends previous passive approaches to recursive truth estimation, as well as semi-supervised approaches in which the estimator has no control over the choice of data to be labeled. Results show that by optimally selecting claims to be verified, we improve estimation accuracy by 12% over an unsupervised baseline and by 5% over previous semi-supervised approaches.
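The select-verify-re-estimate loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: the abstract does not specify the likelihood model or the selection criterion, so the sketch assumes a simple iterative fact-finder (source reliability is the mean belief in a source's claims, and claim belief aggregates the log-odds of its supporting sources) and substitutes plain uncertainty sampling for the paper's optimal selection. All names (`estimate_truth`, `select_claim_to_verify`, `active_truth_estimation`, the `oracle` callback, and the toy data) are hypothetical.

```python
import math


def _clip(p, eps=1e-3):
    """Keep a probability away from 0 and 1 so log-odds stay finite."""
    return min(max(p, eps), 1.0 - eps)


def estimate_truth(assertions, verified=None, iterations=20):
    """Toy iterative fact-finder (an assumption, not the paper's model).

    assertions: dict mapping source -> set of claim ids it asserted.
    verified:   dict mapping claim id -> bool for claims already labeled.
    Returns (belief, reliability) as dicts of floats in [0, 1].
    """
    verified = dict(verified or {})
    sources = sorted(assertions)
    claims = sorted({c for s in sources for c in assertions[s]})
    # Initialize each claim's belief to its vote share among sources.
    belief = {
        c: sum(c in assertions[s] for s in sources) / len(sources)
        for c in claims
    }
    # Pin verified claims to their labels; they are never re-estimated.
    for c, label in verified.items():
        if c in belief:
            belief[c] = 1.0 if label else 0.0
    reliability = {s: 0.5 for s in sources}
    for _ in range(iterations):
        # A source is as reliable as the average belief in what it asserted.
        for s in sources:
            if assertions[s]:
                made = assertions[s]
                reliability[s] = sum(belief[c] for c in made) / len(made)
        # An unverified claim aggregates the log-odds of its supporters.
        for c in claims:
            if c in verified:
                continue
            log_odds = sum(
                math.log(_clip(reliability[s]) / (1.0 - _clip(reliability[s])))
                for s in sources
                if c in assertions[s]
            )
            log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow
            belief[c] = 1.0 / (1.0 + math.exp(-log_odds))
    return belief, reliability


def select_claim_to_verify(belief, verified):
    """Uncertainty sampling: pick the unverified claim closest to 0.5."""
    candidates = [c for c in belief if c not in verified]
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(belief[c] - 0.5))


def active_truth_estimation(assertions, oracle, budget):
    """Alternate between estimating and asking the oracle for one label."""
    verified = {}
    belief, reliability = estimate_truth(assertions, verified)
    for _ in range(budget):
        claim = select_claim_to_verify(belief, verified)
        if claim is None:
            break
        verified[claim] = oracle(claim)
        belief, reliability = estimate_truth(assertions, verified)
    return belief, reliability, verified


# Toy data: three sources, four claims, and a labeling budget of two.
assertions = {
    "alice": {"c1", "c2"},
    "bob": {"c1", "c3"},
    "carol": {"c3", "c4"},
}
ground_truth = {"c1": True, "c2": True, "c3": False, "c4": False}
belief, reliability, verified = active_truth_estimation(
    assertions, ground_truth.__getitem__, budget=2
)
```

In this sketch the labeling budget is spent one claim at a time, and each new label changes the reliability estimates of every source that asserted the verified claim, which in turn shifts belief in that source's other claims. A passive semi-supervised variant would instead receive `verified` from outside, with no control over which claims it contains.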
Keywords
- Active Learning
- Maximum Likelihood Estimation
- Semi Supervision
- Social Sensing
- Truth Discovery
ASJC Scopus subject areas
- Computer Networks and Communications
A semi-supervised active-learning truth estimator for social networks. / Cui, Hang; Abdelzaher, Tarek; Kaplan, Lance. The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc, 2019. p. 296-306 (The Web Conference 2019 - Proceedings of the World Wide Web Conference, WWW 2019).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution