Retrieving Webpages Using Online Discussions

Kevin Ros, Matthew Jin, Jacob Levine, Cheng Xiang Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Online discussions are a ubiquitous aspect of everyday life. An Internet user who interacts with an online discussion may benefit from seeing hyperlinks to webpages relevant to the discussion because the relevant webpages can provide added context, act as citations for background sources, or condense information so that conversations can proceed seamlessly at a high level. In this paper, we propose and study a new task of retrieving relevant webpages given an online discussion. We frame the task as a novel retrieval problem where we treat a sequence of comments in an online discussion as a query and use such a query to retrieve relevant webpages. We construct a new data set using Reddit, an online discussion forum, to study this new problem. We explore and evaluate multiple representative retrieval methods to examine their effectiveness for solving this new problem. We also propose to leverage the comments that contain hyperlinks as training data to enable supervised learning and further improve retrieval performance. We find that results using modern retrieval methods are promising and that leveraging comments with hyperlinks as training data can further improve performance. We release our data set and code to enable additional research in this direction.
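
The task framing in the abstract, where a sequence of comments serves as the query and webpages are the documents to rank, can be illustrated with a minimal sketch. This is not the authors' method: the TF-IDF cosine scoring below is a simple stand-in for the "representative retrieval methods" the abstract mentions, and the function name `rank_webpages`, the tokenizer, and the input shapes are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def rank_webpages(comments, webpages):
    """Rank webpages against a discussion (illustrative stand-in, not the paper's model).

    comments: list of comment strings, in discussion order (hypothetical input shape).
    webpages: dict mapping URL -> page text (hypothetical input shape).
    Returns URLs sorted from most to least similar to the discussion.
    """
    # Treat the whole comment sequence as a single query.
    query_counts = Counter(tokenize(" ".join(comments)))
    doc_counts = {url: Counter(tokenize(text)) for url, text in webpages.items()}

    # Document frequency of each term across the candidate webpages.
    n_docs = len(doc_counts)
    df = Counter()
    for counts in doc_counts.values():
        df.update(counts.keys())

    def idf(term):
        # Smoothed inverse document frequency; always positive.
        return math.log((n_docs + 1) / (df[term] + 1)) + 1.0

    def weight(counts):
        return {term: tf * idf(term) for term, tf in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weight(query_counts)
    scored = [(cosine(query_vec, weight(c)), url) for url, c in doc_counts.items()]
    # Highest score first; ties broken by URL for a stable order.
    return [url for _, url in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Concatenating the comments is the simplest way to turn a discussion into a query; a learned retriever, such as one trained on comments that contain hyperlinks as the abstract proposes, would replace the scoring function while keeping the same query and document roles.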

Original language: English (US)
Title of host publication: ICTIR 2023 - Proceedings of the 2023 ACM SIGIR International Conference on the Theory of Information Retrieval
Publisher: Association for Computing Machinery
Pages: 159-168
Number of pages: 10
ISBN (Electronic): 9798400700736
State: Published - Aug 9 2023
Externally published: Yes
Event: 9th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2023 - Taipei, Taiwan, Province of China
Duration: Jul 23 2023 → …

Publication series

Name: ICTIR 2023 - Proceedings of the 2023 ACM SIGIR International Conference on the Theory of Information Retrieval

Conference

Conference: 9th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 7/23/23 → …

Keywords

  • discussion forums
  • hyperlink prediction
  • information retrieval

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
