Bias in estimates of the classic and incidence-based Jaccard similarity indices: Insights from assemblage simulation

Research output: Contribution to journal › Article

Abstract

Similarity indices are often used to measure β-diversity and as the starting point of multivariate analysis. In this study, I used simulation to examine the direction and magnitude of bias in estimates of two similarity indices, the Jaccard coefficient (J) and the incidence-based Jaccard index (Ĵ). I designed a novel simulation to generate three sets of assemblages that varied in species richness, species-occurrence distributions, and β-diversity. I characterized differences between assemblages with two measures: the ratio of the proportion of rare species among all shared species to the proportion of rare species among all unshared species (PR_ss/PR_us), and the Pearson correlation in the probabilities of shared species between two assemblages (the share-species correlation). I found that J was subject to strong positive or negative bias, depending on PR_ss/PR_us. Ĵ was mainly subject to negative bias, which varied with the share-species correlation. For both indices, bias varied substantially from one pair of assemblages to another and among datasets. This high variation in bias across assemblage comparisons may compromise β-diversity estimates made at low sampling effort with either index or its variants.
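The dependence of bias in J on which species are rare can be illustrated with a small sketch. This is not the simulation design used in the study, which the abstract does not specify. It is a minimal toy example that assumes each species is detected independently in each sampling unit with a fixed probability. The function names, detection probabilities (0.05 for "rare", 0.9 for "common"), species counts, and number of sampling units are all hypothetical choices made for illustration.

```python
import random


def jaccard(a, b):
    """Classic Jaccard coefficient: shared species / species in either assemblage."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty samples
    return len(a & b) / len(union)


def sample_incidence(occupancy, n_units, rng):
    """Species detected at least once in n_units sampling units.

    occupancy maps species name -> per-unit detection probability
    (an assumed, simplified sampling model).
    """
    detected = set()
    for species, p in occupancy.items():
        if any(rng.random() < p for _ in range(n_units)):
            detected.add(species)
    return detected


def mean_bias(occ1, occ2, n_units, n_reps=2000, seed=0):
    """Mean of (estimated J - true J) over repeated under-sampling."""
    rng = random.Random(seed)
    true_j = jaccard(occ1, occ2)  # dict keys are the full species lists
    total = 0.0
    for _ in range(n_reps):
        s1 = sample_incidence(occ1, n_units, rng)
        s2 = sample_incidence(occ2, n_units, rng)
        total += jaccard(s1, s2)
    return total / n_reps - true_j


def make_pair(p_shared, p_unshared, n_shared=10, n_unshared=5):
    """Two hypothetical assemblages with n_shared common species names
    and n_unshared species unique to each."""
    shared = {f"shared_{i}": p_shared for i in range(n_shared)}
    occ1 = dict(shared, **{f"only1_{i}": p_unshared for i in range(n_unshared)})
    occ2 = dict(shared, **{f"only2_{i}": p_unshared for i in range(n_unshared)})
    return occ1, occ2


# Rare shared species, common unshared species: shared species are missed,
# so the estimate of J falls below the true value.
bias_rare_shared = mean_bias(*make_pair(p_shared=0.05, p_unshared=0.9), n_units=5)

# Common shared species, rare unshared species: unshared species are missed,
# so the estimate of J rises above the true value.
bias_rare_unshared = mean_bias(*make_pair(p_shared=0.9, p_unshared=0.05), n_units=5)

print(f"bias when shared species are rare:   {bias_rare_shared:+.3f}")
print(f"bias when unshared species are rare: {bias_rare_unshared:+.3f}")
```

In this toy setting the sign of the bias flips with where the rare species sit, which is consistent with the abstract's statement that bias in J depends on PR_ss/PR_us. The magnitudes depend entirely on the assumed probabilities and should not be read as results from the paper.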

Original language: English (US)
Pages (from-to): 311-318
Number of pages: 8
Journal: Community Ecology
Volume: 19
Issue number: 3
DOI: 10.1556/168.2018.19.3.12
State: Published - Dec 2018

Fingerprint

similarity index
incidence
simulation
rare species
species occurrence
multivariate analysis
species richness
species diversity
sampling

Keywords

  • Assemblage simulation
  • Beta-diversity
  • Estimating assemblage similarity
  • Under-sampling

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Ecology

Cite this

Bias in estimates of the classic and incidence-based Jaccard similarity indices: Insights from assemblage simulation. / Cao, Y.

In: Community Ecology, Vol. 19, No. 3, 12.2018, p. 311-318.

@article{4b17708d3d2047629bbc5403b0454e68,
title = "Bias in estimates of the classic and incidence-based Jaccard similarity indices: Insights from assemblage simulation",
abstract = "Similarity indices are often used to measure $\beta$-diversity and as the starting point of multivariate analysis. In this study, I used simulation to examine the direction and magnitude of bias in estimates of two similarity indices, the Jaccard coefficient (J) and the incidence-based Jaccard index ($\hat{J}$). I designed a novel simulation to generate three sets of assemblages that varied in species richness, species-occurrence distributions, and $\beta$-diversity. I characterized differences between assemblages with two measures: the ratio of the proportion of rare species among all shared species to the proportion of rare species among all unshared species (PR$_{ss}$/PR$_{us}$), and the Pearson correlation in the probabilities of shared species between two assemblages (the share-species correlation). I found that J was subject to strong positive or negative bias, depending on PR$_{ss}$/PR$_{us}$. $\hat{J}$ was mainly subject to negative bias, which varied with the share-species correlation. For both indices, bias varied substantially from one pair of assemblages to another and among datasets. This high variation in bias across assemblage comparisons may compromise $\beta$-diversity estimates made at low sampling effort with either index or its variants.",
keywords = "Assemblage simulation, Beta-diversity, Estimating assemblage similarity, Under-sampling",
author = "Y. Cao",
year = "2018",
month = "12",
doi = "10.1556/168.2018.19.3.12",
language = "English (US)",
volume = "19",
pages = "311--318",
journal = "Community Ecology",
issn = "1585-8553",
publisher = "Akademiai Kiado",
number = "3",

}

TY - JOUR

T1 - Bias in estimates of the classic and incidence-based Jaccard similarity indices

T2 - Insights from assemblage simulation

AU - Cao, Y.

PY - 2018/12

Y1 - 2018/12

N2 - Similarity indices are often used to measure β-diversity and as the starting point of multivariate analysis. In this study, I used simulation to examine the direction and magnitude of bias in estimates of two similarity indices, the Jaccard coefficient (J) and the incidence-based Jaccard index (Ĵ). I designed a novel simulation to generate three sets of assemblages that varied in species richness, species-occurrence distributions, and β-diversity. I characterized differences between assemblages with two measures: the ratio of the proportion of rare species among all shared species to the proportion of rare species among all unshared species (PR_ss/PR_us), and the Pearson correlation in the probabilities of shared species between two assemblages (the share-species correlation). I found that J was subject to strong positive or negative bias, depending on PR_ss/PR_us. Ĵ was mainly subject to negative bias, which varied with the share-species correlation. For both indices, bias varied substantially from one pair of assemblages to another and among datasets. This high variation in bias across assemblage comparisons may compromise β-diversity estimates made at low sampling effort with either index or its variants.

AB - Similarity indices are often used to measure β-diversity and as the starting point of multivariate analysis. In this study, I used simulation to examine the direction and magnitude of bias in estimates of two similarity indices, the Jaccard coefficient (J) and the incidence-based Jaccard index (Ĵ). I designed a novel simulation to generate three sets of assemblages that varied in species richness, species-occurrence distributions, and β-diversity. I characterized differences between assemblages with two measures: the ratio of the proportion of rare species among all shared species to the proportion of rare species among all unshared species (PR_ss/PR_us), and the Pearson correlation in the probabilities of shared species between two assemblages (the share-species correlation). I found that J was subject to strong positive or negative bias, depending on PR_ss/PR_us. Ĵ was mainly subject to negative bias, which varied with the share-species correlation. For both indices, bias varied substantially from one pair of assemblages to another and among datasets. This high variation in bias across assemblage comparisons may compromise β-diversity estimates made at low sampling effort with either index or its variants.

KW - Assemblage simulation

KW - Beta-diversity

KW - Estimating assemblage similarity

KW - Under-sampling

UR - http://www.scopus.com/inward/record.url?scp=85060706107&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060706107&partnerID=8YFLogxK

U2 - 10.1556/168.2018.19.3.12

DO - 10.1556/168.2018.19.3.12

M3 - Article

AN - SCOPUS:85060706107

VL - 19

SP - 311

EP - 318

JO - Community Ecology

JF - Community Ecology

SN - 1585-8553

IS - 3

ER -