TY - JOUR
T1 - Automated identification of diverse Neotropical pollen samples using convolutional neural networks
AU - Punyasena, Surangi W.
AU - Haselhorst, Derek S.
AU - Kong, Shu
AU - Fowlkes, Charless C.
AU - Moreno, J. Enrique
N1 - Thanks to David Tcheng (formerly at the National Center for Supercomputing Applications, University of Illinois) for developing the virtual microscope and annotation scripts, Glenn Fried and Mayandi Sivaguru (Carl R. Woese Institute for Genomic Biology, University of Illinois) for help developing the imaging workflows, and Carlos Jaramillo (Smithsonian Tropical Research Institute) for providing access to the Alan Graham reference collection. Funding for this research was provided by the US National Science Foundation (EF-1137396 and DBI-1262561 to S.W.P.; DBI-1262547 to C.C.F.), the University of Illinois Campus Research Board (Grants 09236 and 11239 to S.W.P.) and a University of Illinois Center for Latin American and Caribbean Studies Tinker Travel Grant to D.S.H.
PY - 2022/9
Y1 - 2022/9
N2 - Pollen is used to investigate a diverse range of ecological problems, from identifying plant–pollinator relationships to tracking flowering phenology. Pollen types are identified according to a set of distinctive morphological characters which are understood to capture taxonomic differences and phylogenetic relationships among taxa. However, categorizing morphological variation among hyperdiverse pollen samples represents a challenge even for an expert analyst. We present an automated workflow for pollen analysis, from the automated scanning of pollen sample slides to the automated detection and identification of pollen taxa using convolutional neural networks (CNNs). We analysed aerial pollen samples from lowland Panama and used a microscope slide scanner to capture three-dimensional representations of 150 sample slides. These pollen sample images were annotated by an expert using a virtual microscope. Metadata were digitally recorded for ~100 pollen grains per slide, including location, identification and the analyst's confidence of the given identification. We used these annotated images to train and test our detection and classification CNN models. Our approach is two-part. We first compared three methods for training CNN models to detect pollen grains on a palynological slide. We next investigated approaches to training CNN models for pollen identification. Because the diversity of pollen taxa in environmental and palaeontological samples follows a long-tailed distribution, we experimented with methods for addressing imbalanced representation using our most abundant 46 taxa. We found that properly weighting pollen taxa in our training objective functions yielded improved accuracy for individual taxa. Our average accuracy for the 46-way classification problem was 82.3%. We achieved 89.5% accuracy for our 25 most abundant taxa. 
Pollen represents a challenging visual classification problem that can serve as a model for other areas of biology that rely on visual identification. Our results add to the body of research demonstrating the potential for a fully automated pollen classification system for environmental and palaeontological samples. Slide imaging, pollen detection and specimen identification can be automated to produce a streamlined workflow.
KW - automated pollen identification
KW - convolutional neural networks
KW - image classification
KW - neotropics
KW - object detection
KW - palynology
UR - http://www.scopus.com/inward/record.url?scp=85132568304&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132568304&partnerID=8YFLogxK
U2 - 10.1111/2041-210X.13917
DO - 10.1111/2041-210X.13917
M3 - Article
AN - SCOPUS:85132568304
SN - 2041-210X
VL - 13
SP - 2049
EP - 2064
JO - Methods in Ecology and Evolution
JF - Methods in Ecology and Evolution
IS - 9
ER -