High-Throughput Workflow for Computer-Assisted Human Parsing of Biological Specimen Label Data
Huang, Jane Q.
Moen, William E.
Hundreds of thousands of specimens in herbaria and natural history museums worldwide are potential candidates for digitization, which would make them more accessible to researchers. An herbarium contains collections of preserved plant specimens created for scientific use. Herbarium specimens are ideal natural history objects for digitization because the plants are pressed flat, dried, and mounted on individual sheets of paper, creating nearly two-dimensional objects. Building digital repositories of herbarium specimens can increase use and exposure of the collections while reducing physical handling. Important as the digitized specimens are, the data on the associated specimen labels provide critical information about each specimen (e.g., scientific name and geographic location). The volume and heterogeneity of these printed label data present challenges in transforming them into meaningful digital form to support research. The Apiary Project is addressing these challenges by exploring and developing transformation processes in a systematic workflow that yields high-quality, machine-processable label data in a cost- and time-efficient manner. The University of North Texas's Texas Center for Digital Knowledge (TxCDK) and the Botanical Research Institute of Texas (BRIT), with funding from an Institute of Museum and Library Services National Leadership Grant, are conducting fundamental research to identify how human intelligence can be combined with machine processes to transform specimen label information effectively and efficiently. This research will yield a new workflow model for label data transformation, correction, and enhancement.