The bag of communities: Identifying abusive behavior online with preexisting internet data

Eshwar Chandrasekharan, Mattia Samory, Anirudh Srinivasan, Eric Gilbert

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Since its earliest days, harassment and abuse have plagued the Internet. Recent research has focused on in-domain methods to detect abusive content and faces several challenges, most notably the need to obtain large training corpora. In this paper, we introduce a novel computational approach to address this problem called Bag of Communities (BoC) - a technique that leverages large-scale, preexisting data from other Internet communities. We then apply BoC toward identifying abusive behavior within a major Internet community. Specifically, we compute a post's similarity to 9 other communities from 4chan, Reddit, Voat and MetaFilter. We show that a BoC model can be used on communities "off the shelf" with roughly 75% accuracy - no training examples are needed from the target community. A dynamic BoC model achieves 91.18% accuracy after seeing 100,000 human-moderated posts, and uniformly outperforms in-domain methods. Using this conceptual and empirical work, we argue that the BoC approach may allow communities to deal with a range of common problems, like abusive behavior, faster and with fewer engineering resources.
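
The similarity step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain word-count vectors and cosine similarity, and the community names, example posts, and helper functions (tokenize, term_counts, cosine, boc_features) are hypothetical, chosen only to show how a post might be turned into one similarity score per source community.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic tokens (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())


def term_counts(texts):
    # Aggregate word counts over all posts of one community.
    counts = Counter()
    for t in texts:
        counts.update(tokenize(t))
    return counts


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def boc_features(post, community_profiles):
    # One similarity score per source community; together they form the
    # post's feature vector for a downstream classifier.
    post_counts = Counter(tokenize(post))
    return {name: cosine(post_counts, profile)
            for name, profile in community_profiles.items()}


# Toy, made-up corpora standing in for preexisting community data.
communities = {
    "hostile_forum": ["you are an idiot", "shut up you idiot loser"],
    "helpful_forum": ["thanks for the helpful answer", "great answer, thank you"],
}
profiles = {name: term_counts(posts) for name, posts in communities.items()}
features = boc_features("shut up, idiot", profiles)
```

In this toy setup, the post shares vocabulary only with the hypothetical hostile community, so its score for that community is higher. A real system would feed such per-community scores into a trained classifier rather than compare them directly.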

Original language: English (US)
Title of host publication: CHI 2017 - Proceedings of the 2017 ACM SIGCHI Conference on Human Factors in Computing Systems
Subtitle of host publication: Explore, Innovate, Inspire
Publisher: Association for Computing Machinery
Pages: 3175-3187
Number of pages: 13
ISBN (Electronic): 9781450346559
State: Published - May 2 2017
Externally published: Yes
Event: 2017 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI 2017 - Denver, United States
Duration: May 6 2017 to May 11 2017

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings
Volume: 2017-May

Other

Other: 2017 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI 2017
Country/Territory: United States
City: Denver
Period: 5/6/17 to 5/11/17

Keywords

  • Abusive behavior
  • Machine learning
  • Moderation
  • Online communities
  • Social computing

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
