A Constrained Maximum Likelihood Estimator for Unguided Social Sensing

Huajie Shao, Shuochao Yao, Yiran Zhao, Chao Zhang, Jinda Han, Lance Kaplan, Lu Su, Tarek Abdelzaher

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper develops a constrained expectation maximization algorithm (CEM) that improves the accuracy of truth estimation in unguided social sensing applications. Unguided social sensing refers to the act of leveraging naturally occurring observations on social media as 'sensor measurements', when the sources post at will and not in response to specific sensing campaigns or surveys. A key challenge in social sensing, in general, lies in estimating the veracity of reported observations when the sources reporting these observations are of unknown reliability and the observations themselves cannot be readily verified. This problem is known as fact-finding. Unsupervised solutions to the fact-finding problem have been proposed that explore notions of internal data consistency in order to estimate observation veracity. This paper observes that unguided social sensing gives rise to a new (and very simple) constraint that dramatically reduces the space of feasible fact-finding solutions, hence significantly improving the quality of fact-finding results. The constraint relies on a simple approximate test of source independence, applicable to unguided sensing, and incorporates information about the number of independent sources of an observation to constrain the posterior estimate of its probability of correctness. Two different approaches are developed to test the independence of sources for purposes of applying this constraint, leading to two flavors of the CEM algorithm, which we call CEM and CEM-Jaccard. We show, using both simulation and real data sets collected from Twitter, that forcing the algorithm to converge to a solution in which the constraint is satisfied significantly improves the quality of solutions.
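The abstract does not spell out how source independence is tested or how the constraint enters the estimator. As a rough illustration only, the sketch below shows one way a Jaccard-style test could be used to count approximately independent sources of an observation. The function names, the greedy selection rule, the `threshold` value, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_independent_sources(
    claim: str,
    claims_by_source: Dict[str, Set[str]],
    threshold: float = 0.5,  # hypothetical cutoff, not a value from the paper
) -> int:
    """Greedily count reporters of `claim` whose claim sets are pairwise dissimilar.

    A source is kept only if its Jaccard similarity to every source already
    kept is below `threshold`; sources that are too similar are treated as
    dependent (for example, one echoing the other).
    """
    reporters = sorted(s for s, c in claims_by_source.items() if claim in c)
    kept: List[str] = []
    for s in reporters:
        if all(
            jaccard(claims_by_source[s], claims_by_source[k]) < threshold
            for k in kept
        ):
            kept.append(s)
    return len(kept)


# Made-up example data: each source maps to the set of claims it posted.
claims_by_source = {
    "alice": {"c1", "c2", "c3"},
    "bob": {"c1", "c2", "c3"},  # identical to alice, so treated as dependent
    "carol": {"c1", "c7"},
    "dave": {"c4"},
}

print(count_independent_sources("c1", claims_by_source))
```

In a constrained estimator of the kind the abstract describes, a count like this would presumably feed a bound on the posterior probability that the observation is correct; that step is not sketched here because the abstract gives no formula for it.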

Original language: English (US)
Title of host publication: INFOCOM 2018 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2429-2437
Number of pages: 9
ISBN (Electronic): 9781538641286
DOIs: https://doi.org/10.1109/INFOCOM.2018.8486306
State: Published - Oct 8 2018
Event: 2018 IEEE Conference on Computer Communications, INFOCOM 2018 - Honolulu, United States
Duration: Apr 15 2018 → Apr 19 2018

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 2018-April
ISSN (Print): 0743-166X

Other

Other: 2018 IEEE Conference on Computer Communications, INFOCOM 2018
Country: United States
City: Honolulu
Period: 4/15/18 → 4/19/18

Fingerprint

Maximum likelihood
Flavors
Sensors

Keywords

  • Constrained expectation maximization (CEM)
  • Estimation accuracy
  • Social networks
  • Truth discovery

ASJC Scopus subject areas

  • Computer Science(all)
  • Electrical and Electronic Engineering

Cite this

Shao, H., Yao, S., Zhao, Y., Zhang, C., Han, J., Kaplan, L., ... Abdelzaher, T. (2018). A Constrained Maximum Likelihood Estimator for Unguided Social Sensing. In INFOCOM 2018 - IEEE Conference on Computer Communications (pp. 2429-2437). [8486306] (Proceedings - IEEE INFOCOM; Vol. 2018-April). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM.2018.8486306

@inproceedings{7bdcb2e31f6a4518b8fed4420fe8a628,
title = "A Constrained Maximum Likelihood Estimator for Unguided Social Sensing",
keywords = "Constrained expectation maximization (CEM), Estimation accuracy, Social networks, Truth discovery",
author = "Huajie Shao and Shuochao Yao and Yiran Zhao and Chao Zhang and Jinda Han and Lance Kaplan and Lu Su and Tarek Abdelzaher",
year = "2018",
month = "10",
day = "8",
doi = "10.1109/INFOCOM.2018.8486306",
language = "English (US)",
series = "Proceedings - IEEE INFOCOM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "2429--2437",
booktitle = "INFOCOM 2018 - IEEE Conference on Computer Communications",
address = "United States",

}

TY - GEN

T1 - A Constrained Maximum Likelihood Estimator for Unguided Social Sensing

AU - Shao, Huajie

AU - Yao, Shuochao

AU - Zhao, Yiran

AU - Zhang, Chao

AU - Han, Jinda

AU - Kaplan, Lance

AU - Su, Lu

AU - Abdelzaher, Tarek

PY - 2018/10/8

Y1 - 2018/10/8

KW - Constrained expectation maximization (CEM)

KW - Estimation accuracy

KW - Social networks

KW - Truth discovery

UR - http://www.scopus.com/inward/record.url?scp=85056168849&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056168849&partnerID=8YFLogxK

U2 - 10.1109/INFOCOM.2018.8486306

DO - 10.1109/INFOCOM.2018.8486306

M3 - Conference contribution

AN - SCOPUS:85056168849

T3 - Proceedings - IEEE INFOCOM

SP - 2429

EP - 2437

BT - INFOCOM 2018 - IEEE Conference on Computer Communications

PB - Institute of Electrical and Electronics Engineers Inc.

ER -