Implementing a Data Publishing Service via DSpace
Dunn, Jon W.
The Indiana University Libraries and Digital Library Program offer a set of online scholarly communication services to IU scholars under the brand IUScholarWorks. Currently, these services include IUScholarWorks Repository, a DSpace-based institutional repository for dissemination and preservation of articles, papers, technical reports, and other scholarly products, and IUScholarWorks Journals, an online journal hosting service based on Open Journal Systems. To complement these two services, the Libraries and Digital Library Program are collaborating with the Research Technologies division of IU's central IT organization to implement a research data publishing service as a new feature of IUScholarWorks Repository. The service will allow researchers to easily publish their datasets for online access at a stable web address, reference these datasets from publications, and count on at least bit-level preservation of the data. The intent is to develop a service that is generic enough to be used for everything from sensor data to statistical data to ethnographic field video. This service will leverage IU's Massive Data Storage System (MDSS), a large-scale, centrally funded distributed storage service that Research Technologies offers to IU faculty, staff, and graduate students for storage of their research data. Based on the consortium-developed High Performance Storage System (HPSS) software, MDSS offers over 2.8 petabytes of disk- and tape-based storage distributed between IU's Bloomington and Indianapolis campuses and supports replication of data between these two sites. Data may be transferred in and out of MDSS using a variety of interfaces, including SFTP, Parallel FTP, GridFTP, HSI, SMB/CIFS, and a simple Web-based user interface.
We intend to support two data publishing scenarios initially. In the first, a researcher submits a dataset by entering minimal metadata and uploading data files through DSpace's Configurable Submission interface; files over a specified size are then automatically placed in MDSS. In the second, the researcher indicates as part of the submission process that the data to be published already resides in a personal or research group account in MDSS and should be copied into an IUScholarWorks-managed area of MDSS for availability through DSpace. In this presentation, we will discuss our conception of the service, its technical architecture and design, metadata requirements, and progress on implementation. We will also discuss the potential applicability of our approach and implementation for others interested in implementing similar services.
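The two submission scenarios can be read as a simple routing decision. The following is a minimal Python sketch of that decision only, not the service's actual implementation. The names `Scenario`, `DataFile`, `plan_storage`, the action labels, and the 2 GiB threshold are all hypothetical, and the sketch assumes that uploaded files at or below the threshold stay in DSpace's ordinary storage, which the abstract does not state.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    """The two submission scenarios described in the abstract."""
    UPLOAD = "upload"                # files uploaded through the DSpace submission interface
    EXISTING_MDSS = "existing_mdss"  # data already in a personal or group MDSS account


# Hypothetical threshold; the abstract says only "a specified filesize".
SIZE_THRESHOLD_BYTES = 2 * 1024**3


@dataclass
class DataFile:
    name: str
    size_bytes: int
    scenario: Scenario


def plan_storage(f: DataFile, threshold: int = SIZE_THRESHOLD_BYTES) -> str:
    """Return a label for the storage action implied by the scenario and file size."""
    if f.scenario is Scenario.EXISTING_MDSS:
        # Scenario two: copy into the IUScholarWorks-managed area of MDSS.
        return "copy_to_managed_mdss"
    if f.size_bytes > threshold:
        # Scenario one, file over the specified size: place it in MDSS.
        return "move_to_mdss"
    # Assumption: smaller uploads remain in DSpace's own storage.
    return "keep_in_dspace"


if __name__ == "__main__":
    for f in (
        DataFile("survey.csv", 5_000_000, Scenario.UPLOAD),
        DataFile("field_video.mov", 40 * 1024**3, Scenario.UPLOAD),
        DataFile("sensor_archive.tar", 1024, Scenario.EXISTING_MDSS),
    ):
        print(f.name, "->", plan_storage(f))
```

In this sketch the comparison is strict, matching "over a specified filesize", so a file exactly at the threshold is treated as a small upload.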