Effective use of word order for text categorization with convolutional neural networks

Rie Johnson, Tong Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Convolutional neural network (CNN) is a neural network that can make use of the internal structure of data such as the 2D structure of image data. This paper studies CNN on text categorization to exploit the 1D structure (namely, word order) of text data for accurate prediction. Instead of using low-dimensional word vectors as input as is often done, we directly apply CNN to high-dimensional text data, which leads to directly learning embedding of small text regions for use in classification. In addition to a straightforward adaptation of CNN from image to text, a simple but new variation which employs bag-ofword conversion in the convolution layer is proposed. An extension to combine multiple convolution layers is also explored for higher accuracy. The experiments demonstrate the effectiveness of our approach in comparison with state-of-the-art methods.

Original languageEnglish (US)
Title of host publicationNAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publicationHuman Language Technologies, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages103-112
Number of pages10
ISBN (Electronic)9781941643495
DOIs
StatePublished - 2015
Externally publishedYes
EventConference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015 - Denver, United States
Duration: May 31 2015Jun 5 2015

Publication series

NameNAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Other

OtherConference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2015
Country/TerritoryUnited States
CityDenver
Period5/31/156/5/15

ASJC Scopus subject areas

  • Computer Science Applications
  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Effective use of word order for text categorization with convolutional neural networks'. Together they form a unique fingerprint.

Cite this