Modeling Global Syntactic Variation in English Using Dialect Classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper evaluates global-scale dialect identification for 14 national varieties of English on both web-crawled data and Twitter data. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.
Original language: English (US)
Title of host publication: Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Editors: Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Publisher: Association for Computational Linguistics
Pages: 42-53
ISBN (Electronic): 9781950737116
State: Published - Jun 2019
Externally published: Yes
