Decision tree Thompson sampling for mining hidden populations through attributed search

Suhansanu Kumar, Heting Gao, Changyu Wang, Kevin Chen Chuan Chang, Hari Sundaram

Research output: Contribution to journalArticlepeer-review

Abstract

Researchers often query online social platforms through their application programming interfaces (API) to find target populations such as people with mental illness (De Choudhury et al. in Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing, CSCW ’17. ACM, New York, pp 353–369, https://doi.org/10.1145/2998181.2998220, 2017) and jazz musicians (Heckathorn and Jeffri in Poetics 28(4):307, 2001). Entities of such target population satisfy a property that is typically identified using an oracle (human or a pre-trained classifier). When the property of the target entities is not directly queryable via the API, we refer to the property as ‘hidden’ and the population as hidden population. Our objective in this paper is to sample such target populations that satisfy a certain property that can be verified by an oracle (typically a pre-trained classifier). However, the property itself cannot be a part of the query. We refer to such populations which are not directly queryable via the API as a hidden population. Finding individuals who belong to these populations on social networks is hard because they are non-queryable, and the sampler has to explore from a combinatorial query space within a finite budget limit. Further, limited API calls, a limited number of queryable attributes to explore from and an exponential query space to select a query from make the problem very challenging. To address the challenges, we model the search for hidden population search as a decision problem. By exploiting the correlation between queryable attributes and the population of interest and by hierarchically ordering the query space, we propose a decision-tree-based Thompson sampler (DT-TMP) that efficiently discovers the right combination of attributes to query. Additionally, DT-TMP alleviates the problem of exploring the exponentially large query space through a hierarchical ordering of the queries. Our proposed sampler outperforms the state-of-the-art samplers in online experiments, for example, by 54% in Twitter. When the number of matching entities to a query is known in offline experiments, DT-TMP performs exceedingly well by a factor of 0.9–1.5 × over the baseline samplers. In future, we wish to explore the option of finding hidden populations by formulating more complex queries.

Original languageEnglish (US)
Article number6
JournalSocial Network Analysis and Mining
Volume12
Issue number1
DOIs
StatePublished - Dec 2022

Keywords

  • Decision tree
  • Hidden population mining
  • Reinforcement learning
  • Sampling

ASJC Scopus subject areas

  • Information Systems
  • Communication
  • Media Technology
  • Human-Computer Interaction
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Decision tree Thompson sampling for mining hidden populations through attributed search'. Together they form a unique fingerprint.

Cite this