TY - JOUR
T1 - Visual recognition software for binary classification and its application to spruce pollen identification
AU - Tcheng, David K.
AU - Nayak, Ashwin K.
AU - Fowlkes, Charless C.
AU - Punyasena, Surangi W.
N1 - Funding Information:
The authors thank C. Pike, J. Almendinger, P. Mueller, B. Hansen, and E. Grimm for pollen samples, help gathering baseline expert classifications, and assistance initiating this research; Hamamatsu for sharing their Nanozoomer NDP viewer API; and M. Sivaguru, D.S. Haselhorst, A. Holz, A. Kesar, S. Tiwari, A. Restrepo, J. Rodiguez, C.J. Wesseln, C. Lindsay, L. Gallagher, Y. Bello, and M. Wycoff for technical assistance and expert examples. We thank Peter Wilf, Shengping Zhang, and an anonymous reviewer for feedback that substantially improved the manuscript. We acknowledge infrastructure support from NSF XSEDE Texas Advanced Computing Center.
Publisher Copyright:
Copyright © 2016 Tcheng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2016/2
Y1 - 2016/2
AB - Discriminating between black and white spruce (Picea mariana and Picea glauca) is a difficult palynological classification problem that, if solved, would provide valuable data for paleoclimate reconstructions. We developed an open-source visual recognition software (ARLO, Automated Recognition with Layered Optimization) capable of differentiating between these two species at an accuracy on par with human experts. The system applies pattern recognition and machine learning to the analysis of pollen images and discovers general-purpose image features, defined by simple features of lines and grids of pixels taken at different dimensions, size, spacing, and resolution. It adapts to a given problem by searching for the most effective combination of both feature representation and learning strategy. This results in a powerful and flexible framework for image classification. We worked with images acquired using an automated slide scanner. We first applied a hash-based "pollen spotting" model to segment pollen grains from the slide background. We next tested ARLO's ability to reconstruct black to white spruce pollen ratios using artificially constructed slides of known ratios. We then developed a more scalable hash-based method of image analysis that was able to distinguish between the pollen of black and white spruce with an estimated accuracy of 83.61%, comparable to human expert performance. Our results demonstrate the capability of machine learning systems to automate challenging taxonomic classifications in pollen analysis, and our success with simple image representations suggests that our approach is generalizable to many other object recognition problems.
UR - http://www.scopus.com/inward/record.url?scp=84959387721&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959387721&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0148879
DO - 10.1371/journal.pone.0148879
M3 - Article
C2 - 26867017
AN - SCOPUS:84959387721
SN - 1932-6203
VL - 11
JO - PLoS One
JF - PLOS ONE
IS - 2
M1 - e0148879
ER -