TY - GEN
T1 - AUTOMATED SUB-FEATURE LABELING USING PROMPT-BASED PRETRAINED LANGUAGE MODEL
AU - Park, Seyoung
AU - Jiang, Yilan
AU - Kim, Harrison
N1 - Publisher Copyright:
Copyright © 2024 by ASME.
PY - 2024
Y1 - 2024
N2 - Many studies have used online user-generated data to draw product design implications through supervised and unsupervised approaches. While supervised learning methods typically yield higher performance, they demand extensive data labeling, which consumes significant time and effort. This study proposes a framework that automatically labels online user data to address this limitation. The proposed framework consists of two pseudo-labeling mechanisms: keyword detection and prompting a Pretrained Language Model (PLM). The first stage defines keywords for the target topic and then labels the data by checking whether each item contains these keywords. The second stage employs the PLM and labels the data based on context. Specifically, prompting the PLM appends a task-specific template to the end of the given text (a review) and predicts the masked token (the label). This PLM-based approach is a promising labeling candidate because it can make predictions without additional training data from the target domain. The suggested method was tested in a case study with real-world datasets. The study validates the effectiveness of this novel framework by comparing the pseudo-labeled results on smartphone sub-features to manual ground-truth labels. The results demonstrate that the new framework achieves F1 scores 28% and 14% higher than a baseline for screen and battery, respectively.
AB - Many studies have used online user-generated data to draw product design implications through supervised and unsupervised approaches. While supervised learning methods typically yield higher performance, they demand extensive data labeling, which consumes significant time and effort. This study proposes a framework that automatically labels online user data to address this limitation. The proposed framework consists of two pseudo-labeling mechanisms: keyword detection and prompting a Pretrained Language Model (PLM). The first stage defines keywords for the target topic and then labels the data by checking whether each item contains these keywords. The second stage employs the PLM and labels the data based on context. Specifically, prompting the PLM appends a task-specific template to the end of the given text (a review) and predicts the masked token (the label). This PLM-based approach is a promising labeling candidate because it can make predictions without additional training data from the target domain. The suggested method was tested in a case study with real-world datasets. The study validates the effectiveness of this novel framework by comparing the pseudo-labeled results on smartphone sub-features to manual ground-truth labels. The results demonstrate that the new framework achieves F1 scores 28% and 14% higher than a baseline for screen and battery, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85210825728&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85210825728&partnerID=8YFLogxK
U2 - 10.1115/DETC2024-142891
DO - 10.1115/DETC2024-142891
M3 - Conference contribution
AN - SCOPUS:85210825728
T3 - Proceedings of the ASME Design Engineering Technical Conference
BT - 50th Design Automation Conference (DAC)
PB - American Society of Mechanical Engineers (ASME)
T2 - ASME 2024 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, IDETC-CIE 2024
Y2 - 25 August 2024 through 28 August 2024
ER -