Unicorn: The myth of federated search realized simply. Unifying DSpace repositories with the PKP Harvester tool
The Ohio Digital Resource Commons, located at http://drc.ohiolink.edu, is a union of DSpace repositories operated by higher education institutions in Ohio. The repositories are largely organized and supported by OhioLINK, a consortium of 89 Ohio college and university libraries. In support of the vision of the Digital Resource Commons as a statewide resource, the repository operators saw an immediate need for a federated search tool. A "build it now" approach was taken, and federated searching was implemented in a short timeframe at OhioLINK using the PKP Harvester (http://pkp.sfu.ca/?q=harvester) software.
A demonstration of the federated search feature at the Digital Resource Commons will be given, highlighting local customizations that were made to PKP Harvester and DSpace in support of the project. These customizations include changes made to mimic the appearance and behavior of existing search interfaces at OhioLINK, and changes made to meet expressed user requirements. Particular attention will be given to a DSpace change that allows image thumbnails to be displayed in federated search results. Issues encountered during the configuration, implementation, and deployment of the PKP Harvester and DSpace OAI-PMH server will be presented, and the choices made in response to these issues will be explained. The process of integrating the search results with the DSpace interface will be detailed, including ongoing efforts to improve the user experience.
The Digital Resource Commons' federated search was implemented as a metadata-based search. We will present a general comparison between metadata and full-text searching, highlighting the advantages and disadvantages of each method. A discussion of metadata uniformity and quality concerns will be presented in the context of federated searching. Particular problems encountered with our metadata will be described, with lessons learned and suggestions for resolution.
Operational and maintenance concerns of this system will be discussed, including the metadata harvesting schedule and the need to flush and rebuild indexes when the metadata schema changes. Future ideas for the DRC's federated search feature will be explored, including an implementation of faceted searching using Solr; harvesting of non-DSpace repositories, such as CONTENTdm and Fedora; and, finally, the possibility of discarding the current model in favor of an OAI-ORE-based system, developed for DSpace at Texas Digital Library, that allows for the possibility of full-text federated searching.
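For readers unfamiliar with metadata harvesting, the following is a minimal sketch of the kind of OAI-PMH exchange a harvester performs against a repository. It is not taken from PKP Harvester or from the OhioLINK customizations. The base URL, the sample response, and the function names are hypothetical and chosen only for illustration; the request arguments and XML namespaces are those defined by OAI-PMH 2.0 and unqualified Dublin Core.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, shaped like a ListRecords reply in oai_dc format.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:123/1</identifier>
        <datestamp>2008-01-15T12:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample image</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def build_request_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for the given OAI-PMH endpoint.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb is sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    # An absent or empty token means the list is complete.
    token = (root.findtext(".//oai:resumptionToken", namespaces=NS) or "").strip()
    return records, token or None


records, token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_request_url("http://example.org/oai/request", token)
```

A harvester would repeat the request with each returned token until none is returned, then index the collected records. Because only descriptive fields such as title and creator are exchanged, a search built on this data is metadata-based rather than full-text.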