TY - JOUR
T1 - Is your ad hoc model selection strategy affecting your multimodel inference?
AU - Morin, Dana J.
AU - Yackulic, Charles B.
AU - Diffendorfer, Jay E.
AU - Lesmeister, Damon B.
AU - Nielsen, Clayton K.
AU - Reid, Janice
AU - Schauber, Eric M.
N1 - Funding Information:
Funding to support southern Illinois data collection was provided by Illinois Department of Natural Resources via Federal Aid in Wildlife Restoration Project W-135-R. JED's contribution was funded by the Land Change Science Program at USGS. DJM's contribution was funded by USDA National Institute of Food and Agriculture, McIntire Stennis project (1020959). Funding for spotted owl and barred owl data collection was provided by the USDA Forest Service and USDI Bureau of Land Management. Funding for the southern California data set was provided by Joint Fire Science Program (no. 042194), the Blasker Environment Grant Program of the San Diego Foundation, and California Department of Fish and Game provided access to Rancho Jamul Ecological Reserve. We thank the many technicians that contributed to collection of data. We thank Jim Hines for assistance with batch processing scripts for PRESENCE, Scott Tremor, Dana Hogan, Genie Flemming, Jenny Duggan for assistance with the southern California data set, and Suresh Sethi for conversations, and Jim Nichols and two anonymous reviewers for constructive reviews critical to the development and improvement of this manuscript. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. government.
Publisher Copyright:
© 2020 The Authors.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Ecologists routinely fit complex models with multiple parameters of interest, where hundreds or more competing models are plausible. To limit the number of fitted models, ecologists often define a model selection strategy composed of a series of stages in which certain features of a model are compared while other features are held constant. Defining these multi-stage strategies requires making a series of decisions, which may potentially impact inferences, but have not been critically evaluated. We begin by identifying key features of strategies, introducing descriptive terms when they did not already exist in the literature. Strategies differ in how they define and order model building stages. Sequential-by-sub-model strategies focus on one sub-model (parameter) at a time with modeling of subsequent sub-models dependent on the selected sub-model structures from the previous stages. Secondary candidate set strategies model sub-models independently and combine the top set of models from each sub-model for selection in a final stage. Build-up approaches define stages across sub-models and increase in complexity at each stage. Strategies also differ in how the top set of models is selected in each stage and whether they use null or more complex sub-model structures for non-target sub-models. We tested the performance of different model selection strategies using four data sets and three model types. For each data set, we determined the "true" distribution of AIC weights by fitting all plausible models. Then, we calculated the number of models that would have been fitted and the portion of "true" AIC weight we recovered under different model selection strategies. Sequential-by-sub-model strategies often performed poorly. Based on our results, we recommend using a build-up or secondary candidate sets, which were more reliable and carrying all models within 5–10 AIC of the top model forward to subsequent stages. The structure of non-target sub-models was less important. Multi-stage approaches cannot compensate for a lack of critical thought in selecting covariates and building models to represent competing a priori hypotheses. However, even when competing hypotheses for different sub-models are limited, thousands or more models may be possible so strategies to explore candidate model space reliably and efficiently will be necessary.
AB - Ecologists routinely fit complex models with multiple parameters of interest, where hundreds or more competing models are plausible. To limit the number of fitted models, ecologists often define a model selection strategy composed of a series of stages in which certain features of a model are compared while other features are held constant. Defining these multi-stage strategies requires making a series of decisions, which may potentially impact inferences, but have not been critically evaluated. We begin by identifying key features of strategies, introducing descriptive terms when they did not already exist in the literature. Strategies differ in how they define and order model building stages. Sequential-by-sub-model strategies focus on one sub-model (parameter) at a time with modeling of subsequent sub-models dependent on the selected sub-model structures from the previous stages. Secondary candidate set strategies model sub-models independently and combine the top set of models from each sub-model for selection in a final stage. Build-up approaches define stages across sub-models and increase in complexity at each stage. Strategies also differ in how the top set of models is selected in each stage and whether they use null or more complex sub-model structures for non-target sub-models. We tested the performance of different model selection strategies using four data sets and three model types. For each data set, we determined the "true" distribution of AIC weights by fitting all plausible models. Then, we calculated the number of models that would have been fitted and the portion of "true" AIC weight we recovered under different model selection strategies. Sequential-by-sub-model strategies often performed poorly. Based on our results, we recommend using a build-up or secondary candidate sets, which were more reliable and carrying all models within 5–10 AIC of the top model forward to subsequent stages. The structure of non-target sub-models was less important. Multi-stage approaches cannot compensate for a lack of critical thought in selecting covariates and building models to represent competing a priori hypotheses. However, even when competing hypotheses for different sub-models are limited, thousands or more models may be possible so strategies to explore candidate model space reliably and efficiently will be necessary.
KW - AIC
KW - information criterion
KW - model selection
KW - multimodel inference
KW - occupancy models
KW - parameter estimation
KW - population models
UR - http://www.scopus.com/inward/record.url?scp=85078996858&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078996858&partnerID=8YFLogxK
U2 - 10.1002/ecs2.2997
DO - 10.1002/ecs2.2997
M3 - Article
AN - SCOPUS:85078996858
SN - 2150-8925
VL - 11
JO - Ecosphere
JF - Ecosphere
IS - 1
M1 - e02997
ER -