A supervised POS tagger for written Arabic social networking corpora

Rania Al-Sabbagh, Corina R Girju

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents an implementation of Brill's Transformation-Based Part-of-Speech (POS) tagging algorithm trained on a manually annotated Twitter-based Egyptian Arabic corpus of 423,691 tokens and 70,163 types. Unlike standard POS morpho-syntactic annotation schemes, which label each word based on its word-level morpho-syntactic features, we use a function-based annotation scheme in which words are labeled based on their grammatical functions rather than their morpho-syntactic structures, given that the two do not necessarily map onto each other. While a standard morpho-syntactic scheme makes comparisons with other work easier, the function-based scheme is assumed to be more efficient for building higher-level tools such as base-phrase chunkers and dependency parsers, and for NLP applications like subjectivity and sentiment analysis. The function-based scheme also gives new insights into linguistic structural realizations specific to Egyptian Arabic, which is currently an under-resourced language.

Original language: English (US)
Title of host publication: 11th Conference on Natural Language Processing
Subtitle of host publication: Empirical Methods in Natural Language Processing - Proceedings of the Conference on Natural Language Processing 2012
Publisher: KONVENS
Pages: 39-52
Number of pages: 14
ISBN (Print): 385027005X, 9783850270052
State: Published - Dec 1 2012
Event: 11th Conference on Natural Language Processing 2012: Empirical Methods in Natural Language Processing, KONVENS 2012 - Vienna, Austria
Duration: Sep 19 2012 to Sep 21 2012

Publication series

Name: 11th Conference on Natural Language Processing, KONVENS 2012: Empirical Methods in Natural Language Processing - Proceedings of the Conference on Natural Language Processing 2012
Volume: 5

Other

Other: 11th Conference on Natural Language Processing 2012: Empirical Methods in Natural Language Processing, KONVENS 2012
Country: Austria
City: Vienna
Period: 9/19/12 to 9/21/12

ASJC Scopus subject areas

  • Software
