Profile-based authorship analysis

Jonathan Dunn, Shlomo Argamon, Amin Rasooli, Geet Kumar

Research output: Contribution to journal › Article › peer-review

Abstract

This article presents a profile-based authorship analysis method that first categorizes texts according to social and conceptual characteristics of their authors (e.g. Sex and Political Ideology) and then combines these profiles for two authorship analysis tasks: (1) determining shared authorship of pairs of texts without a set of candidate authors and (2) clustering texts according to characteristics of their authors in order to provide an analysis of the types of individuals represented in the data set. On the first task, the method outperforms Burrows' Delta by a wide margin on short texts and by a small margin on long texts. No comparable benchmark from existing methods is available for the second task. The data set for evaluating the method consists of speeches from the US House and Senate from 1995 to 2013. This data set contains both a large number of texts (42,000 in the test sets) and a large number of speakers (over 800). The article shows that this approach to authorship analysis is more accurate than existing approaches given a data set with hundreds of authors. Further, this profile-based method makes new types of analysis possible by looking at types of individuals as well as at specific individuals.
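
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of Burrows' Delta as it is commonly defined: the mean absolute difference between the z-scored relative frequencies of the most frequent words in two texts. It is not the article's implementation. The function names, the toy corpus, the `n_words` cutoff, and the use of the population standard deviation are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def burrows_delta(corpus, text_a, text_b, n_words=3):
    """Burrows' Delta between two tokenized texts (hypothetical helper).

    corpus: list of tokenized reference texts, used to choose the most
    frequent words and to estimate each word's mean and spread.
    """
    # Choose the n most frequent words across the reference corpus.
    overall = Counter(tok for doc in corpus for tok in doc)
    vocab = [w for w, _ in overall.most_common(n_words)]

    # Per-word mean and standard deviation of relative frequency.
    profiles = [relative_freqs(doc, vocab) for doc in corpus]
    mus = [mean(col) for col in zip(*profiles)]
    sigmas = [pstdev(col) for col in zip(*profiles)]

    freqs_a = relative_freqs(text_a, vocab)
    freqs_b = relative_freqs(text_b, vocab)

    diffs = []
    for a, b, mu, sd in zip(freqs_a, freqs_b, mus, sigmas):
        if sd == 0:
            continue  # a word with no variation carries no signal
        diffs.append(abs((a - mu) / sd - (b - mu) / sd))
    return mean(diffs) if diffs else 0.0


# Toy data, invented for this sketch only.
docs = [
    "the cat sat on the mat".split(),
    "the dog and the cat ran".split(),
    "a dog sat and a dog ran".split(),
]
same = burrows_delta(docs, docs[0], docs[0])
different = burrows_delta(docs, docs[0], docs[2])
```

A lower score indicates more similar function-word usage. A pairwise shared-authorship decision like task (1) would typically be made by comparing such a score against a threshold, which would have to be tuned on held-out data.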

Original language: English (US)
Pages (from-to): 689-710
Number of pages: 22
Journal: Digital Scholarship in the Humanities
Volume: 31
Issue number: 4
State: Published - Dec 1 2016
Externally published: Yes

ASJC Scopus subject areas

  • Information Systems
  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
