TY - GEN
T1 - The bag of communities
T2 - 2017 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI 2017
AU - Chandrasekharan, Eshwar
AU - Samory, Mattia
AU - Srinivasan, Anirudh
AU - Gilbert, Eric
N1 - Publisher Copyright: © 2017 ACM.
PY - 2017/5/2
Y1 - 2017/5/2
N2 - Since its earliest days, harassment and abuse have plagued the Internet. Recent research has focused on in-domain methods to detect abusive content and faces several challenges, most notably the need to obtain large training corpora. In this paper, we introduce a novel computational approach to address this problem called Bag of Communities (BoC) - a technique that leverages large-scale, preexisting data from other Internet communities. We then apply BoC toward identifying abusive behavior within a major Internet community. Specifically, we compute a post's similarity to 9 other communities from 4chan, Reddit, Voat and MetaFilter. We show that a BoC model can be used on communities "off the shelf" with roughly 75% accuracy - no training examples are needed from the target community. A dynamic BoC model achieves 91.18% accuracy after seeing 100,000 human-moderated posts, and uniformly outperforms in-domain methods. Using this conceptual and empirical work, we argue that the BoC approach may allow communities to deal with a range of common problems, like abusive behavior, faster and with fewer engineering resources.
AB - Since its earliest days, harassment and abuse have plagued the Internet. Recent research has focused on in-domain methods to detect abusive content and faces several challenges, most notably the need to obtain large training corpora. In this paper, we introduce a novel computational approach to address this problem called Bag of Communities (BoC) - a technique that leverages large-scale, preexisting data from other Internet communities. We then apply BoC toward identifying abusive behavior within a major Internet community. Specifically, we compute a post's similarity to 9 other communities from 4chan, Reddit, Voat and MetaFilter. We show that a BoC model can be used on communities "off the shelf" with roughly 75% accuracy - no training examples are needed from the target community. A dynamic BoC model achieves 91.18% accuracy after seeing 100,000 human-moderated posts, and uniformly outperforms in-domain methods. Using this conceptual and empirical work, we argue that the BoC approach may allow communities to deal with a range of common problems, like abusive behavior, faster and with fewer engineering resources.
KW - Abusive behavior
KW - Machine learning
KW - Moderation
KW - Online communities
KW - Social computing
UR - http://www.scopus.com/inward/record.url?scp=85029490388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029490388&partnerID=8YFLogxK
U2 - 10.1145/3025453.3026018
DO - 10.1145/3025453.3026018
M3 - Conference contribution
AN - SCOPUS:85029490388
T3 - Conference on Human Factors in Computing Systems - Proceedings
SP - 3175
EP - 3187
BT - CHI 2017 - Proceedings of the 2017 ACM SIGCHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
Y2 - 6 May 2017 through 11 May 2017
ER -