TY - JOUR
T1 - A syntactic characterization of authorship style surrounding proper names
AU - Lučić, Ana
AU - Blake, Catherine Lesley
N1 - Publisher Copyright: © The Author 2013. Published by Oxford University Press on behalf of EADH.
PY - 2015/4/1
Y1 - 2015/4/1
N2 - Accurately determining who wrote a manuscript has captivated scholars of literary history for centuries, as the true author can have important ramifications in religion, law, literary studies, philosophy, and education. A wide array of lexical, character, syntactic, semantic, and application-specific features have been proposed to represent a text so that authorship attribution can be established automatically. Although surface-level features have been tested extensively, few studies have systematically explored high-level features, in part due to limitations in the natural language processing techniques required to capture high-level features. However, high-level features, such as sentence structure, are used subconsciously by a writer and thus may be more consistent than surface-level features, such as word choice. In this article, we introduce a new high-level feature based on local syntactic dependencies that an author uses when referring to a named entity (in our case a person's name). A series of experiments in the context of movie reviews reveals how the amount of data in both the training and test sets influences predictive performance. Finally, we measure authorship consistency with respect to this new feature and show how consistency influences predictive performance. These results provide other researchers with a new model for how to evaluate new features and suggest that the local syntactic dependencies warrant further investigation.
AB - Accurately determining who wrote a manuscript has captivated scholars of literary history for centuries, as the true author can have important ramifications in religion, law, literary studies, philosophy, and education. A wide array of lexical, character, syntactic, semantic, and application-specific features have been proposed to represent a text so that authorship attribution can be established automatically. Although surface-level features have been tested extensively, few studies have systematically explored high-level features, in part due to limitations in the natural language processing techniques required to capture high-level features. However, high-level features, such as sentence structure, are used subconsciously by a writer and thus may be more consistent than surface-level features, such as word choice. In this article, we introduce a new high-level feature based on local syntactic dependencies that an author uses when referring to a named entity (in our case a person's name). A series of experiments in the context of movie reviews reveals how the amount of data in both the training and test sets influences predictive performance. Finally, we measure authorship consistency with respect to this new feature and show how consistency influences predictive performance. These results provide other researchers with a new model for how to evaluate new features and suggest that the local syntactic dependencies warrant further investigation.
UR - http://www.scopus.com/inward/record.url?scp=84974715559&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974715559&partnerID=8YFLogxK
U2 - 10.1093/llc/fqt033
DO - 10.1093/llc/fqt033
M3 - Article
AN - SCOPUS:84974715559
SN - 2055-7671
VL - 30
SP - 53
EP - 70
JO - Digital Scholarship in the Humanities
JF - Digital Scholarship in the Humanities
IS - 1
ER -