Multi-Attribute Topic Feature Construction for Social Media-based Prediction

Alex Morales, Nupoor Gandhi, Man Pui Sally Chan, Sophie Lohmann, Travis Sanchez, Kathleen A. Brady, Lyle Ungar, Dolores Albarracin, Chengxiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The effectiveness of social media-based prediction depends heavily on whether we can construct effective content-based features from social media text data. Features constructed from topics learned with a topic model are attractive because they are expressive semantic representations and accommodate inexact matching of semantically related words. We develop a novel general framework for constructing multi-attribute topic features using multiple views of the text data, defined according to metadata attributes, and study their effectiveness for a text-based prediction task. Furthermore, we propose and study multiple weighting strategies to align text-based features and prediction outcomes. We evaluate the proposed method on a Twitter corpus of over 100 million tweets collected over a seven-year period (2009-2015) to predict new diagnoses of human immunodeficiency virus (HIV) and of other sexually transmitted infections (STIs) in the United States at the zip code-level and county-level resolutions. The results show that feature representations based on attributes such as authors, locations, and hashtags are generally more effective than the conventional topic feature representation.
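
The abstract does not specify the topic model, the view definitions, or the weighting strategies, so the following is only a minimal sketch of the general idea under stated assumptions: tweets are grouped into pseudo-documents by one metadata attribute (a "view"), each pseudo-document receives a topic distribution, and those distributions are averaged over the tweets of each region. The tweet fields, the `TOPICS` keyword sets (a toy stand-in for a trained topic model), and all function names are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy tweets; the field names are illustrative assumptions.
TWEETS = [
    {"text": "clinic offers free testing today", "author": "a1",
     "county": "C1", "hashtags": ["health"]},
    {"text": "party tonight downtown", "author": "a2",
     "county": "C1", "hashtags": ["fun"]},
    {"text": "free testing at the clinic", "author": "a1",
     "county": "C2", "hashtags": ["health"]},
    {"text": "party party weekend", "author": "a3",
     "county": "C2", "hashtags": ["fun"]},
]

# Toy stand-in for a trained topic model: each topic is a small word set.
TOPICS = [{"clinic", "testing", "free"}, {"party", "tonight", "weekend"}]


def build_view(tweets, key):
    """Group tweet indices into pseudo-documents keyed by one attribute."""
    view = defaultdict(list)
    for i, tweet in enumerate(tweets):
        values = tweet[key] if isinstance(tweet[key], list) else [tweet[key]]
        for value in values:
            view[value].append(i)
    return dict(view)


def topic_dist(words):
    """Toy topic proportions: share of topic-word hits per topic."""
    hits = [sum(word in topic for word in words) for topic in TOPICS]
    total = sum(hits)
    if total == 0:
        return [1.0 / len(TOPICS)] * len(TOPICS)  # no evidence: uniform
    return [h / total for h in hits]


def region_features(tweets, key, region_key="county"):
    """Average each view's pseudo-document topics over a region's tweets."""
    view = build_view(tweets, key)
    theta = {
        value: topic_dist([w for i in idx for w in tweets[i]["text"].split()])
        for value, idx in view.items()
    }
    sums = defaultdict(lambda: [0.0] * len(TOPICS))
    counts = Counter()
    for value, idx in view.items():
        for i in idx:
            region = tweets[i][region_key]
            counts[region] += 1
            for k, p in enumerate(theta[value]):
                sums[region][k] += p
    return {r: [s / counts[r] for s in sums[r]] for r in sums}


# One feature block per attribute view; concatenating the blocks per region
# would give a multi-attribute feature vector for a downstream predictor.
features = {key: region_features(TWEETS, key)
            for key in ("author", "county", "hashtags")}
```

Averaging by tweet count is just one possible weighting; it is used here because it is the simplest choice that keeps each region's feature vector a proper distribution over topics.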

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1073-1078
Number of pages: 6
ISBN (Electronic): 9781538650356
DOIs: https://doi.org/10.1109/BigData.2018.8622347
State: Published - Jan 22 2019
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018 → Dec 13 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country: United States
City: Seattle
Period: 12/10/18 → 12/13/18

Fingerprint

Metadata
Viruses
Semantics

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems

Cite this

Morales, A., Gandhi, N., Chan, M. P. S., Lohmann, S., Sanchez, T., Brady, K. A., ... Zhai, C. (2019). Multi-Attribute Topic Feature Construction for Social Media-based Prediction. In Y. Song, B. Liu, K. Lee, N. Abe, C. Pu, M. Qiao, N. Ahmed, D. Kossmann, J. Saltz, J. Tang, J. He, H. Liu, ... X. Hu (Eds.), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 (pp. 1073-1078). [8622347] (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2018.8622347

@inproceedings{15e64e5f71ad4e74a507c85a84b04510,
title = "Multi-Attribute Topic Feature Construction for Social Media-based Prediction",
author = "Alex Morales and Nupoor Gandhi and Chan, {Man Pui Sally} and Sophie Lohmann and Travis Sanchez and Brady, {Kathleen A.} and Lyle Ungar and Dolores Albarracin and Chengxiang Zhai",
year = "2019",
month = "1",
day = "22",
doi = "10.1109/BigData.2018.8622347",
language = "English (US)",
series = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1073--1078",
editor = "Yang Song and Bing Liu and Kisung Lee and Naoki Abe and Calton Pu and Mu Qiao and Nesreen Ahmed and Donald Kossmann and Jeffrey Saltz and Jiliang Tang and Jingrui He and Huan Liu and Xiaohua Hu",
booktitle = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",
address = "United States",

}

TY - GEN

T1 - Multi-Attribute Topic Feature Construction for Social Media-based Prediction

AU - Morales, Alex

AU - Gandhi, Nupoor

AU - Chan, Man Pui Sally

AU - Lohmann, Sophie

AU - Sanchez, Travis

AU - Brady, Kathleen A.

AU - Ungar, Lyle

AU - Albarracin, Dolores

AU - Zhai, Chengxiang

PY - 2019/1/22

Y1 - 2019/1/22

UR - http://www.scopus.com/inward/record.url?scp=85062589317&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062589317&partnerID=8YFLogxK

U2 - 10.1109/BigData.2018.8622347

DO - 10.1109/BigData.2018.8622347

M3 - Conference contribution

T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

SP - 1073

EP - 1078

BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

A2 - Song, Yang

A2 - Liu, Bing

A2 - Lee, Kisung

A2 - Abe, Naoki

A2 - Pu, Calton

A2 - Qiao, Mu

A2 - Ahmed, Nesreen

A2 - Kossmann, Donald

A2 - Saltz, Jeffrey

A2 - Tang, Jiliang

A2 - He, Jingrui

A2 - Liu, Huan

A2 - Hu, Xiaohua

PB - Institute of Electrical and Electronics Engineers Inc.

ER -