TY - GEN
T1 - A supervised POS tagger for written Arabic social networking corpora
AU - Al-Sabbagh, Rania
AU - Girju, Corina R
PY - 2012
Y1 - 2012
AB - This paper presents an implementation of Brill's Transformation-Based Part-of-Speech (POS) tagging algorithm trained on a manually annotated, Twitter-based Egyptian Arabic corpus of 423,691 tokens and 70,163 types. Unlike standard morpho-syntactic POS annotation schemes, which label each word based on its word-level morpho-syntactic features, we use a function-based annotation scheme in which words are labeled according to their grammatical functions rather than their morpho-syntactic structures, given that the two do not necessarily map onto each other. While a standard morpho-syntactic scheme makes comparisons with other work easier, the function-based scheme is assumed to be more efficient for building higher-level tools such as base-phrase chunkers and dependency parsers, and for NLP applications such as subjectivity and sentiment analysis. The function-based scheme also offers new insights into linguistic structural realizations specific to Egyptian Arabic, which is currently an under-resourced language.
UR - http://www.scopus.com/inward/record.url?scp=84893254755&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893254755&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84893254755
SN - 385027005X
SN - 9783850270052
T3 - 11th Conference on Natural Language Processing, KONVENS 2012: Empirical Methods in Natural Language Processing - Proceedings of the Conference on Natural Language Processing 2012
SP - 39
EP - 52
BT - 11th Conference on Natural Language Processing
PB - KONVENS
T2 - 11th Conference on Natural Language Processing 2012: Empirical Methods in Natural Language Processing, KONVENS 2012
Y2 - 19 September 2012 through 21 September 2012
ER -