Abstract
This paper formulates and evaluates a series of multi-unit measures of directional association that build on the pairwise ΔP measure and can quantify association in sequences of varying length and type of representation. Multi-unit measures face an additional segmentation problem: once the implicit length constraint of pairwise measures is abandoned, association measures must also identify the borders of meaningful sequences. This paper takes a vector-based approach to the segmentation problem by using 18 unique measures to describe different aspects of multi-unit association. An examination of these measures across eight languages shows that they are stable across languages and that each provides a unique ranking of associated sequences. Taken together, these measures expand corpus-based approaches to association by generalizing across varying lengths and types of representation.
Original language | English (US) |
---|---|
Pages (from-to) | 183-215 |
Number of pages | 33 |
Journal | International Journal of Corpus Linguistics |
Volume | 23 |
Issue number | 2 |
State | Published - 2018 |
Externally published | Yes |
Keywords
- Association strength
- Collocations
- Multi-unit association
- Sequences
- ΔP
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language