Building a Turkish Treebank

Kemal Oflazer, Bilge Say, Dilek Zeynep Hakkani-Tür, Gökhan Tür

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

We present the issues that we have encountered in designing a treebank architecture for Turkish, along with the rationale for the choices we have made for various representation schemes. In the resulting representation, the information encoded in the complex agglutinative word structures is represented as a sequence of inflectional groups separated by derivational boundaries. The syntactic relations are encoded as labeled dependency relations among segments of lexical items marked by derivation boundaries. Our current work involves refining a set of treebank annotation guidelines and developing a sophisticated annotation tool with an extendable plug-in architecture for morphological analysis, morphological disambiguation and syntactic annotation disambiguation.
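
The representation described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the treebank's actual file format or the authors' tooling: the `^DB` boundary marker, the `+`-separated feature tags, the class and function names, the sample analysis of "evdeki", and the `MODIFIER` label are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Assumed marker for a derivational boundary inside a morphological analysis.
DB = "^DB"


@dataclass(frozen=True)
class InflectionalGroup:
    """One segment of a word, delimited by derivational boundaries."""
    word_index: int   # position of the word in the sentence
    ig_index: int     # position of this segment within the word
    features: tuple   # root and/or feature tags (illustrative tag names)


@dataclass(frozen=True)
class Dependency:
    """A labeled link between two segments, each addressed as (word, ig)."""
    dependent: tuple
    head: tuple
    label: str


def split_into_igs(word_index, analysis):
    """Split one analysis string into its inflectional groups."""
    groups = []
    for ig_index, segment in enumerate(analysis.split(DB)):
        # Drop the empty string produced by a leading '+' after a boundary.
        features = tuple(tag for tag in segment.split("+") if tag)
        groups.append(InflectionalGroup(word_index, ig_index, features))
    return groups


# Hypothetical two-word phrase: "evdeki kitap" (the book in the house).
sentence = [
    split_into_igs(0, "ev+Noun+A3sg+Pnon+Loc^DB+Adj"),
    split_into_igs(1, "kitap+Noun+A3sg+Pnon+Nom"),
]

# The link connects segments, not whole words: here the derived adjective
# segment of word 0 is attached to the only segment of word 1.
links = [Dependency(dependent=(0, 1), head=(1, 0), label="MODIFIER")]

for word in sentence:
    for group in word:
        print(group.word_index, group.ig_index, group.features)
```

Addressing each segment by a (word, segment) pair lets a dependency attach to a specific inflectional group rather than to the word as a whole, which is the point of encoding relations among segments.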
Original language: English (US)
Title of host publication: Treebanks
Subtitle of host publication: Building and Using Parsed Corpora
Editors: Anne Abeillé
Publisher: Springer
Chapter: 15
Pages: 261-277
ISBN (Electronic): 9789401002011
ISBN (Print): 9781402013348, 9781402013355
State: Published - 2003
Externally published: Yes

Publication series

Name: Text, Speech and Language Technology
Volume: 20
ISSN (Print): 1386-291X

Keywords

  • Treebanks
  • Agglutinative Languages
  • Turkish
  • Dependency Syntax
