Towards development of no-reference objective measures for perceptual evaluation of singing voice separation
Singing Voice Separation (SVS) uses audio source separation methods to isolate the vocal component from the background accompaniment in a song mix. A key challenge in evaluating SVS is the lack of objective measures that correlate consistently with subjective evaluation. Additionally, current state-of-the-art evaluation measures require unmixed vocal and instrumental tracks, which are often unavailable. The research presented in this thesis attempts to address these challenges by introducing two new objective measures for perceptually relevant evaluation of SVS. The Vocal Isolation Score (VIS) is designed to assess the quality of isolation produced by various SVS algorithms when separating the vocals from the accompaniment. Similarly, the Vocal Intelligibility Preservation Score (VIPS) evaluates the preservation of intelligibility in the separated vocals. Besides improving upon the state of the art, both VIS and VIPS have the additional advantage that they do not require references in the form of unmixed vocal or instrumental tracks to perform objective evaluation, unlike the currently popular objective measures used for evaluating audio source separation.