TY - GEN
T1 - MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities
T2 - Findings of the Association for Computational Linguistics: EMNLP 2023
AU - Kargupta, Priyanka
AU - Komarlu, Tanay
AU - Yoon, Susik
AU - Wang, Xuan
AU - Han, Jiawei
N1 - This research was supported in part by the US DARPA KAIROS Program No. FA8750-19-2-1004 and the National Research Foundation of Korea (Basic Science Research Program: 2021R1A6A3A14043765). Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily represent the views, either expressed or implied, of DARPA or the U.S. Government.
PY - 2023
Y1 - 2023
N2 - Text classification is essential for organizing unstructured text. Traditional methods rely on human annotations or, more recently, a set of class seed words for supervision, which can be costly, particularly for specialized or emerging domains. To address this, using class surface names alone as extremely weak supervision has been proposed. However, existing approaches treat different levels of text granularity (documents, sentences, or words) independently, disregarding inter-granularity class disagreements and the context identifiable exclusively through joint extraction. In order to tackle these issues, we introduce MEGClass, an extremely weakly supervised text classification method that leverages Mutually-Enhancing Text Granularities. MEGClass utilizes coarse- and fine-grained context signals obtained by jointly considering a document's most class-indicative words and sentences. This approach enables the learning of a contextualized document representation that captures the most discriminative class indicators. By preserving the heterogeneity of potential classes, MEGClass can select the most informative class-indicative documents as iterative feedback to enhance the initial word-based class representations and ultimately fine-tune a pre-trained text classifier. Extensive experiments on seven benchmark datasets demonstrate that MEGClass outperforms other weakly and extremely weakly supervised methods.
UR - http://www.scopus.com/inward/record.url?scp=85183293237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183293237&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.findings-emnlp.708
DO - 10.18653/v1/2023.findings-emnlp.708
M3 - Conference contribution
AN - SCOPUS:85183293237
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 10543
EP - 10558
BT - Findings of the Association for Computational Linguistics: EMNLP 2023
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -