TY - JOUR
T1 - Progress towards establishing collection standards for semi-automated pollen classification in forensic geo-historical location applications
AU - Riley, Kimberly C.
AU - Woodard, Jeffrey P.
AU - Hwang, Grace M.
AU - Punyasena, Surangi W.
N1 - Publisher Copyright:
© 2015 Elsevier B.V.
PY - 2015/10/1
Y1 - 2015/10/1
AB - The digitization of pollen grain images would permit the creation of a semi-automated system that could aid expert palynologists in pollen classification. Such a system would reduce cost and time-to-answer and improve analyst productivity, benefits that are particularly important in forensic applications. Numerous factors must be considered when establishing a digital database intended for semi-automated pollen classification. This paper explores a number of these factors through computer vision and machine learning assessments. The main topics evaluated are species-level classification of morphologically similar taxa, optimal training data size, how best to utilize three-dimensional data, changes in accuracy due to the availability of metadata (i.e., fluctuations in analysts' confidence in taxon labeling), and the use of fossil data to classify modern data. This is the first known application of training on fossil data to classify modern taxa. Correct classification rates of 95.4% and 93.8% were achieved on two distinct sets of morphologically similar species-level data, surpassing previously reported results. We determined that a minimum of 5-10 training images per class was required to yield reasonable performance. Additionally, we established that all depth-dimension slices associated with each grain were required to achieve the best possible performance. Lastly, the error rate doubled as analyst confidence decreased and almost tripled when data from grains of varying ages were used, underscoring the importance of comprehensive metadata.
KW - 3D classification
KW - Automated palynology
KW - Computer vision
KW - Forensics
KW - Pattern recognition
KW - Pollen classification
UR - http://www.scopus.com/inward/record.url?scp=84934756539&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84934756539&partnerID=8YFLogxK
U2 - 10.1016/j.revpalbo.2015.06.005
DO - 10.1016/j.revpalbo.2015.06.005
M3 - Article
AN - SCOPUS:84934756539
VL - 221
SP - 117
EP - 127
JO - Review of Palaeobotany and Palynology
JF - Review of Palaeobotany and Palynology
SN - 0034-6667
ER -