Multi-unit association measures: Moving beyond pairs of words

Research output: Contribution to journal › Article › peer-review

Abstract

This paper formulates and evaluates a series of multi-unit measures of directional association, building on the pairwise ΔP measure, that are able to quantify association in sequences of varying length and type of representation. Multi-unit measures face an additional segmentation problem: once the implicit length constraint of pairwise measures is abandoned, association measures must also identify the borders of meaningful sequences. This paper takes a vector-based approach to the segmentation problem by using 18 unique measures to describe different aspects of multi-unit association. An examination of these measures across eight languages shows that they are stable across languages and that each provides a unique rank of associated sequences. Taken together, these measures expand corpus-based approaches to association by generalizing across varying lengths and types of representation.
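
For orientation, the pairwise ΔP measure that the abstract builds on can be sketched as below. This is a minimal illustration of the standard two-direction ΔP over adjacent word pairs, not the paper's multi-unit measures; the function name `delta_p`, the helper `_ratio`, and the toy token list are assumptions made for the example.

```python
def _ratio(num, den):
    # Guard against empty cells in the contingency table (assumption: treat as 0).
    return num / den if den else 0.0


def delta_p(tokens, x, y):
    """Return (left-to-right, right-to-left) Delta-P for the adjacent pair x y.

    left-to-right: P(y | x) - P(y | not x)
    right-to-left: P(x | y) - P(x | not y)
    """
    bigrams = list(zip(tokens, tokens[1:]))
    a = sum(1 for l, r in bigrams if l == x and r == y)  # x followed by y
    b = sum(1 for l, r in bigrams if l == x and r != y)  # x followed by something else
    c = sum(1 for l, r in bigrams if l != x and r == y)  # something else followed by y
    d = len(bigrams) - a - b - c                         # neither x first nor y second
    left_to_right = _ratio(a, a + b) - _ratio(c, c + d)
    right_to_left = _ratio(a, a + c) - _ratio(b, b + d)
    return left_to_right, right_to_left


# Toy corpus, purely illustrative.
tokens = "of course he said of course she said of the".split()
lr, rl = delta_p(tokens, "of", "course")
print(round(lr, 3), round(rl, 3))
```

The two values differ because ΔP is directional: how well "of" predicts a following "course" is not the same as how well "course" predicts a preceding "of". A pairwise sketch like this fixes the sequence length at two, which is the implicit constraint the abstract says multi-unit measures must drop.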

Original language: English (US)
Pages (from-to): 183-215
Number of pages: 33
Journal: International Journal of Corpus Linguistics
Volume: 23
Issue number: 2
State: Published - 2018
Externally published: Yes

Keywords

  • Association strength
  • Collocations
  • Multi-unit association
  • Sequences
  • ΔP

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
