TY - GEN
T1 - A Constrained Maximum Likelihood Estimator for Unguided Social Sensing
AU - Shao, Huajie
AU - Yao, Shuochao
AU - Zhao, Yiran
AU - Zhang, Chao
AU - Han, Jinda
AU - Kaplan, Lance
AU - Su, Lu
AU - Abdelzaher, Tarek
N1 - Funding Information:
Research reported in this paper was sponsored in part by the Army Research Laboratory under Cooperative Agreements W911NF-09-2-0053 and W911NF-17-2-0196, in part by DARPA under award W911NF-17-C-0099, and in part by NSF under grants CNS 16-18627 and CNS 13-20209. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, DARPA, NSF, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/8
Y1 - 2018/10/8
AB - This paper develops a constrained expectation maximization algorithm (CEM) that improves the accuracy of truth estimation in unguided social sensing applications. Unguided social sensing refers to the act of leveraging naturally occurring observations on social media as 'sensor measurements', when the sources post at will and not in response to specific sensing campaigns or surveys. A key challenge in social sensing, in general, lies in estimating the veracity of reported observations when the sources reporting these observations are of unknown reliability and the observations themselves cannot be readily verified. This problem is known as fact-finding. Unsupervised solutions to the fact-finding problem have been proposed that explore notions of internal data consistency in order to estimate observation veracity. This paper observes that unguided social sensing gives rise to a new (and very simple) constraint that dramatically reduces the space of feasible fact-finding solutions, hence significantly improving the quality of fact-finding results. The constraint relies on a simple approximate test of source independence, applicable to unguided sensing, and incorporates information about the number of independent sources of an observation to constrain the posterior estimate of its probability of correctness. Two different approaches are developed to test the independence of sources for purposes of applying this constraint, leading to two flavors of the algorithm, which we call CEM and CEM-Jaccard. Using both simulation and real data sets collected from Twitter, we show that forcing the algorithm to converge to a solution in which the constraint is satisfied significantly improves the quality of solutions.
KW - Constrained expectation maximization (CEM)
KW - Estimation accuracy
KW - Social networks
KW - Truth discovery
UR - http://www.scopus.com/inward/record.url?scp=85056168849&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056168849&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2018.8486306
DO - 10.1109/INFOCOM.2018.8486306
M3 - Conference contribution
AN - SCOPUS:85056168849
T3 - Proceedings - IEEE INFOCOM
SP - 2429
EP - 2437
BT - INFOCOM 2018 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Conference on Computer Communications, INFOCOM 2018
Y2 - 15 April 2018 through 19 April 2018
ER -