TY - JOUR
T1 - Word embedding enrichment for dictionary construction
T2 - An example of incivility in Cantonese
AU - Liang, Hai
AU - Ng, Yee Man Margaret
AU - Tsang, Nathan L.T.
N1 - Publisher Copyright: © The authors.
PY - 2023
Y1 - 2023
N2 - Dictionary-based methods remain valuable to measure concepts based on texts, though supervised machine learning has been widely used in much recent communication research. The present study proposes a semi-automatic and easily implemented method to build and enrich dictionaries based on word embeddings. As an example, we create a dictionary of political incivility that contains vulgarity and name-calling words in Cantonese. The study shows that dictionary-based classification outperforms supervised machine learning methods, including deep neural network models. Furthermore, a small number of random seed words can generate a highly accurate dictionary. However, the uncivil content detected is only weakly correlated with uncivil perceptions, as we demonstrate in a population-based survey experiment. The strengths and limitations of dictionary-based methods are discussed.
KW - Cantonese
KW - dictionary construction
KW - machine learning
KW - political incivility
KW - swearing
UR - http://www.scopus.com/inward/record.url?scp=85176269933&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85176269933&partnerID=8YFLogxK
DO - 10.5117/CCR2023.1.10.LIAN
M3 - Article
AN - SCOPUS:85176269933
SN - 2665-9085
VL - 5
JO - Computational Communication Research
JF - Computational Communication Research
IS - 1
ER -