TY - GEN
T1 - LLM-Mod: Can Large Language Models Assist Content Moderation?
T2 - 2024 CHI Conference on Human Factors in Computing Systems, CHI EA 2024
AU - Kolla, Mahi
AU - Salunkhe, Siddharth
AU - Chandrasekharan, Eshwar
AU - Saha, Koustuv
N1 - Publisher Copyright:
© 2024 Association for Computing Machinery. All rights reserved.
PY - 2024/5/11
Y1 - 2024/5/11
N2 - Content moderation is critical for maintaining healthy online spaces. However, it remains a predominantly manual task, and moderators are often exhausted by the low moderator-to-post ratio. Researchers have been exploring computational tools to assist human moderators. The natural language understanding capabilities of large language models (LLMs) open up possibilities for using LLMs in online moderation. This work explores the feasibility of using LLMs to identify rule violations on Reddit. We examine how an LLM-based moderator (LLM-Mod) reasons about 744 posts across 9 subreddits that violate different types of rules. We find that while LLM-Mod has a high true-negative rate (92.3%), it has a low true-positive rate (43.1%), performing poorly when flagging rule-violating posts. LLM-Mod tends to flag rule violations detectable by keyword matching, but cannot reason about posts of higher complexity. We discuss considerations for integrating LLMs into content moderation workflows and for designing platforms that support both AI-driven and human-in-the-loop moderation.
AB - Content moderation is critical for maintaining healthy online spaces. However, it remains a predominantly manual task, and moderators are often exhausted by the low moderator-to-post ratio. Researchers have been exploring computational tools to assist human moderators. The natural language understanding capabilities of large language models (LLMs) open up possibilities for using LLMs in online moderation. This work explores the feasibility of using LLMs to identify rule violations on Reddit. We examine how an LLM-based moderator (LLM-Mod) reasons about 744 posts across 9 subreddits that violate different types of rules. We find that while LLM-Mod has a high true-negative rate (92.3%), it has a low true-positive rate (43.1%), performing poorly when flagging rule-violating posts. LLM-Mod tends to flag rule violations detectable by keyword matching, but cannot reason about posts of higher complexity. We discuss considerations for integrating LLMs into content moderation workflows and for designing platforms that support both AI-driven and human-in-the-loop moderation.
UR - http://www.scopus.com/inward/record.url?scp=85194133383&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194133383&partnerID=8YFLogxK
U2 - 10.1145/3613905.3650828
DO - 10.1145/3613905.3650828
M3 - Conference contribution
AN - SCOPUS:85194133383
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2024 - Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
Y2 - 11 May 2024 through 16 May 2024
ER -