Authorship classification: A discriminative syntactic tree mining approach

Sangkyum Kim, Hyungsul Kim, Tim Weninger, Jiawei Han, Hyun Duk Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the past, there have been dozens of studies on automatic authorship classification, and many of them concluded that writing style is one of the best indicators of original authorship. Among the hundreds of features that have been developed, syntactic features were best able to reflect an author's writing style. However, because extracting and computing syntactic features is computationally expensive, only simple variations of basic syntactic features, such as function words, part-of-speech (POS) tags, and rewrite rules, were considered. In this paper, we propose a new feature set of k-embedded-edge subtree patterns that holds more syntactic information than previous feature sets. We also propose a novel approach to mining these patterns directly from a given set of syntactic trees. We show that this approach reduces the computational burden of using complex syntactic structures as the feature set. Comprehensive experiments on real-world datasets demonstrate that our approach is reliable and more accurate than previous approaches.
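The abstract names function words, POS tags, and rewrite rules as the basic syntactic features used in earlier work. The following is a minimal Python sketch of how the latter two can be read off a single parse tree. It assumes a hypothetical nested-tuple tree encoding, and the function names `rewrite_rules` and `pos_tags` are illustrative only; it is not the paper's k-embedded-edge subtree mining method, which targets larger tree fragments than a single rewrite rule.

```python
from collections import Counter
from typing import Tuple, Union

# Hypothetical tree encoding (an assumption, not from the paper):
# an internal node is (label, child, child, ...); a leaf is a plain word string.
Tree = Union[str, Tuple]


def rewrite_rules(tree: Tree) -> Counter:
    """Count rewrite rules such as 'S -> NP VP', including lexical ones like 'PRP -> We'."""
    rules = Counter()

    def visit(node):
        if isinstance(node, str):
            return  # a word produces no rule of its own
        label, *children = node
        # Right-hand side: child labels, or the word itself for a leaf child.
        rhs = [c if isinstance(c, str) else c[0] for c in children]
        rules[f"{label} -> {' '.join(rhs)}"] += 1
        for child in children:
            visit(child)

    visit(tree)
    return rules


def pos_tags(tree: Tree) -> Counter:
    """Count POS tags, i.e. labels of preterminal nodes whose only child is a word."""
    tags = Counter()

    def visit(node):
        if isinstance(node, str):
            return
        label, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            tags[label] += 1  # preterminal: its label is the POS tag
            return
        for child in children:
            visit(child)

    visit(tree)
    return tags


# Hand-built example tree for "We propose a new approach".
sentence = ("S",
            ("NP", ("PRP", "We")),
            ("VP", ("VBP", "propose"),
                   ("NP", ("DT", "a"), ("JJ", "new"), ("NN", "approach"))))

print(rewrite_rules(sentence)["NP -> PRP"])  # → 1
print(pos_tags(sentence))
```

Summing these counters over all sentence trees of a document yields a simple per-document feature vector. The paper's k-embedded-edge subtree patterns are described as holding more syntactic information than such single-level rules.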

Original language: English (US)
Title of host publication: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 455-464
Number of pages: 10
ISBN (Print): 9781450309349
State: Published - 2011
Event: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011 - Beijing, China
Duration: Jul 24 2011 to Jul 28 2011

Publication series

Name: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011
Country/Territory: China
City: Beijing
Period: 7/24/11 to 7/28/11

Keywords

  • Authorship attribution
  • Authorship classification
  • Authorship discrimination
  • Text categorization
  • Text mining

ASJC Scopus subject areas

  • Information Systems
