TY - GEN
T1 - Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy
AU - Kang, Seong Ku
AU - Agarwal, Shivam
AU - Jin, Bowen
AU - Lee, Dongha
AU - Yu, Hwanjo
AU - Han, Jiawei
N1 - This work was supported by an IITP grant funded by MSIT (No.2018-0-00584, No.2019-0-01906) and an NRF grant funded by the MSIT (No.RS-2023-00217286, No.2020R1A2B5B03097210). It was also supported in part by US DARPA KAIROS Program No. FA8750-19-2-1004 and INCAS Program No. HR001121C0165, National Science Foundation IIS-19-56151, the Molecule Maker Lab Institute: An AI Research Institutes program supported by NSF under Award No. 2019897, and the Institute for Geospatial Understanding through an Integrative Discovery Environment (I-GUIDE) by NSF under Award No. 2118329.
PY - 2024/5/13
Y1 - 2024/5/13
AB - Document retrieval has greatly benefited from the advancements of large-scale pre-trained language models (PLMs). However, their effectiveness is often limited in theme-specific applications for specialized areas or industries, due to unique terminologies, incomplete contexts of user queries, and specialized search intents. To capture the theme-specific information and improve retrieval, we propose to use a corpus topical taxonomy, which outlines the latent topic structure of the corpus while reflecting user-interested aspects. We introduce ToTER (Topical Taxonomy Enhanced Retrieval) framework, which identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts. As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers. Through extensive quantitative, ablative, and exploratory experiments on two real-world datasets, we ascertain the benefits of using topical taxonomy for retrieval in theme-specific applications and demonstrate the effectiveness of ToTER.
KW - document retrieval
KW - theme-specific application
KW - topical taxonomy
UR - http://www.scopus.com/inward/record.url?scp=85194077885&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194077885&partnerID=8YFLogxK
U2 - 10.1145/3589334.3645512
DO - 10.1145/3589334.3645512
M3 - Conference contribution
AN - SCOPUS:85194077885
T3 - WWW 2024 - Proceedings of the ACM Web Conference
SP - 1497
EP - 1508
BT - WWW 2024 - Proceedings of the ACM Web Conference
PB - Association for Computing Machinery
T2 - 33rd ACM Web Conference, WWW 2024
Y2 - 13 May 2024 through 17 May 2024
ER -