Phrase Embedding and Clustering for Sub-Feature Extraction from Online Data

Seyoung Park, Harrison M. Kim

Research output: Contribution to journal › Article › peer-review


Recently, online user-generated data have been used as an efficient resource for customer analysis. In the product design area, various methods for analyzing customer preference for product features have been suggested. However, most of them focus on feature categories rather than on product components, which are crucial in practical applications. To address that limitation, this paper proposes a new methodology for extracting sub-features from online data. First, the method detects phrases in the data and filters them using product manual documents. The filtered phrases are embedded into vectors and then divided into several groups by two clustering methods. The resulting clusters are labeled by analyzing the items in each cluster. Finally, cue phrases for sub-features are obtained by selecting the clusters whose labels represent product features. The proposed methodology was tested on smartphone review data. The result provides feature clusters containing sub-feature phrases with high accuracy. The obtained cue phrases will be used in analyzing customer preferences for sub-features, which can help product designers determine the optimal component configuration in embodiment design.
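The steps named in the abstract (detect phrases, filter against manual text, embed, cluster, label, select) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy reviews and manual text are invented, bigram counting stands in for phrase detection, context-word counts stand in for the embedding, a single leader-style cosine clustering stands in for the paper's two clustering methods, and the names `detect_phrases`, `filter_by_manual`, `embed`, `cluster`, `label`, and `feature_words` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def contains(tokens, phrase):
    """True if the phrase occurs as whole consecutive tokens."""
    return f" {phrase} " in f" {' '.join(tokens)} "

def detect_phrases(reviews, min_count=2):
    """Toy phrase detection: bigrams seen at least min_count times."""
    counts = Counter()
    for text in reviews:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return {" ".join(pair) for pair, n in counts.items() if n >= min_count}

def filter_by_manual(phrases, manual_text):
    """Keep only phrases that also occur in the product manual text."""
    manual_tokens = tokenize(manual_text)
    return sorted(p for p in phrases if contains(manual_tokens, p))

def embed(phrase, reviews):
    """Toy embedding: counts of words co-occurring with the phrase."""
    own = set(phrase.split())
    vec = Counter()
    for text in reviews:
        tokens = tokenize(text)
        if contains(tokens, phrase):
            vec.update(t for t in tokens if t not in own)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(phrases, vectors, threshold=0.6):
    """Leader clustering: join the first cluster whose first member is similar enough."""
    clusters = []
    for p in phrases:
        for members in clusters:
            if cosine(vectors[p], vectors[members[0]]) >= threshold:
                members.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def label(members):
    """Label a cluster with the most frequent word among its phrases."""
    words = Counter(w for p in members for w in p.split())
    return words.most_common(1)[0][0]

# Invented toy data, for illustration only.
reviews = [
    "the battery life is great and the charging port works",
    "battery life is short but charging port is sturdy",
    "the rear camera is sharp and the camera lens is clean",
    "rear camera blurry and camera lens scratched",
]
manual = ("Battery life depends on use. The charging port is at the bottom. "
          "Clean the rear camera gently. Protect the camera lens.")

filtered = filter_by_manual(detect_phrases(reviews), manual)
vectors = {p: embed(p, reviews) for p in filtered}
clusters = cluster(filtered, vectors)

# Hypothetical set of labels treated as product features.
feature_words = {"battery", "camera"}
cue_phrases = {label(c): c for c in clusters if label(c) in feature_words}
print(cue_phrases)
```

In this sketch the manual filter is what removes frequent but uninformative bigrams such as "life is", which is the role the abstract assigns to the product manual documents. A realistic pipeline would replace the count-based vectors with learned phrase embeddings and the leader clustering with established clustering algorithms.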

Original language: English (US)
Article number: 054501
Journal: Journal of Mechanical Design
Issue number: 5
State: Published - May 1 2022


Keywords

  • data mining
  • feature extraction
  • online data

ASJC Scopus subject areas

  • Mechanics of Materials
  • Mechanical Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design

