Acoustic segment modeling and preference ranking for music information retrieval
Reed, Jeremy T.
MetadataShow full item record
This dissertation focuses on improving content-based recommendation systems for music. Specifically, progress in the development in music content-based recommendation systems has stalled in recent years due to some faulty assumptions: 1. most acoustic content-based systems for music information retrieval (MIR) assume a bag-of-frames model, where it is assumed that a song contains a simplistic, global audio texture 2. genre, style, mood, and authors are appropriate categories for machine-oriented recommendation 3. similarity is a universal construct and does not vary among different users The main contribution of this dissertation is to address these faulty assumptions by describing a novel approach in MIR that provides user-centric, content-based recommendations based on statistics of acoustic sound elements. First, this dissertation presents the acoustic segment modeling framework that describes a piece of music as a temporal sequence of acoustic segment models (ASMs), which represent individual polyphonic sound elements. A dictionary of ASMs generated in an unsupervised process defines a vocabulary of acoustic tokens that are able to transcribe new musical pieces. Next, standard text-based information retrieval algorithms use statistics of ASM counts to perform various retrieval tasks. Despite a simple feature set compared to other content-based genre recommendation algorithms, the acoustic segment modeling approach is highly competitive on standard genre classification databases. Fundamental to the success of the acoustic segment modeling approach is the ability to model acoustical semantics in a musical piece, which is demonstrated by the detection of musical attributes on temporal characteristics. Further, it is shown that the acoustic segment modeling procedure is able to capture the inherent structure of melody by providing near state-of-the-art performance on an automatic chord recognition task. This dissertation demonstrates that some classification tasks, such as genre, possess information that is not contained in the acoustic signal; therefore, attempts at modeling these categories using only the acoustic content is ill-fated. Further, notions of music similarity are personal in nature and are not derived from a universal ontology. Therefore, this dissertation addresses the second and third limitation of previous content-based retrieval approaches by presenting a user-centric preference rating algorithm. Individual users possess their own cognitive construct of similarity; therefore, retrieval algorithms must demonstrate this flexibility. The proposed rating algorithm is based on the principle of minimum classification error (MCE) training, which has been demonstrated to be robust against outliers and also minimizes the Parzen estimate of the theoretical classification risk. The outlier immunity property limits the effect of labels that arise from non-content-based sources. The MCE-based algorithm performs better than a similar ratings prediction algorithm. Further, this dissertation discusses extensions and future work.