Expanding Universal Dependencies for Polysynthetic Languages: A Case of St. Lawrence Island Yupik

Hyunji Hayley Park, Lane Oscar Schwartz, Francis M. Tyers

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper describes the development of the first Universal Dependencies (UD) treebank for St. Lawrence Island Yupik, an endangered language spoken in the Bering Strait region. While the UD guidelines provided a general framework for our annotations, language-specific decisions were made necessary by the rich morphology of this polysynthetic language. Most notably, we annotated a corpus at the morpheme level as well as the word level. The morpheme-level annotation was conducted using an existing morphological analyzer and manual disambiguation. By comparing the two resulting annotation schemes, we argue that morpheme-level annotation is essential for polysynthetic languages like St. Lawrence Island Yupik. Word-level annotation results in degenerate trees for some Yupik sentences and often fails to capture syntactic relations that can be manifested at the morpheme level. Dependency parsing experiments provide further support for morpheme-level annotation. Implications for UD annotation of other polysynthetic languages are discussed.
Original language: English (US)
Title of host publication: Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021
Editors: Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, Katharina Kann
Publisher: Association for Computational Linguistics (ACL)
Pages: 131-142
Number of pages: 12
ISBN (Electronic): 9781954085442
State: Published - Jun 2021
