TY - JOUR
T1 - The Catalogue of Life: Assembling data into a global taxonomic checklist
AU - Ower, Geoffrey
AU - Roskov, Yuri
N1 - 2019 Geoffrey Ower, Yuri Roskov; Biodiversity Next, TDWG, 22-25 October 2019, Leiden, The Netherlands
PY - 2019
Y1 - 2019
N2 - Producing a global taxonomic checklist of all species is essential for indexing biodiversity data, and for providing the basic knowledge needed to study, manage, and conserve biological diversity. The Catalogue of Life (CoL) aims to provide a global taxonomic checklist of all species, and includes 1.9 million species names in the 2019 annual edition. The task of assembling data into CoL is complex and requires reformatting data, quality assurance testing, and collaborating with data providers to resolve detected taxonomic conflicts. Global Species Databases (GSDs) are submitted in a wide variety of data formats to CoL by hundreds of taxonomic experts and institutions. Submitted data are reformatted to a standard data submission format: CoL Standard Dataset (ACEF), DarwinCore, or CoLDP. A series of standardized data integrity checks are run to detect and resolve frequently occurring data quality problems including character encoding corruption, non-Latin characters in scientific names, missing parents, duplicated and homonymic names within the GSD and among other GSDs, split taxonomic groups that have been assigned to multiple parent taxa, and other issues. The process and challenges of assembling data into the Catalogue of Life, and the future directions of the project in migrating to the CoL+ infrastructure, will be discussed.
AB - Producing a global taxonomic checklist of all species is essential for indexing biodiversity data, and for providing the basic knowledge needed to study, manage, and conserve biological diversity. The Catalogue of Life (CoL) aims to provide a global taxonomic checklist of all species, and includes 1.9 million species names in the 2019 annual edition. The task of assembling data into CoL is complex and requires reformatting data, quality assurance testing, and collaborating with data providers to resolve detected taxonomic conflicts. Global Species Databases (GSDs) are submitted in a wide variety of data formats to CoL by hundreds of taxonomic experts and institutions. Submitted data are reformatted to a standard data submission format: CoL Standard Dataset (ACEF), DarwinCore, or CoLDP. A series of standardized data integrity checks are run to detect and resolve frequently occurring data quality problems including character encoding corruption, non-Latin characters in scientific names, missing parents, duplicated and homonymic names within the GSD and among other GSDs, split taxonomic groups that have been assigned to multiple parent taxa, and other issues. The process and challenges of assembling data into the Catalogue of Life, and the future directions of the project in migrating to the CoL+ infrastructure, will be discussed.
KW - INHS
U2 - 10.3897/biss.3.37221
DO - 10.3897/biss.3.37221
M3 - Conference article
SN - 2535-0897
VL - 3
JO - Biodiversity Information Science and Standards
JF - Biodiversity Information Science and Standards
M1 - e37221
ER -