Perceived Naturalness of Speech Sounds Presented Using Personalized Versus Non-Personalized HRTFs
Abstract
Speech sound sources were spatially processed using HRTF data measured from nine individuals. The processed speech signals were presented over headphones to two groups of listeners in a paired-comparison task in which listeners judged which of two stimuli sounded more natural. One group comprised listeners whose own HRTFs had been used to create a subset of the presented stimuli, while the second group was never presented with stimuli processed using their own HRTFs. Results from the first group showed that stimuli generated using an individual's own HRTFs are not necessarily judged as more natural than those generated using other individuals' HRTF data. However, this was not because a single set of HRTF data gave the most natural listening experience for all listeners: the highest-ranked stimulus differed between individuals. An analysis of the Interaural Level Difference (ILD) showed that the frequency dependence of ILD for an individual's own HRTFs was quite similar to that of the HRTFs judged to produce the most natural-sounding auditory image for that listener. The results suggest that the interaural spectral difference introduced by HRTF-based processing can affect perceived naturalness as strongly as the overall spectral shape associated with source tone coloration.
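The frequency-dependent ILD referred to above can be computed from the magnitude spectra of a left/right HRIR pair. The following is a minimal illustrative sketch (not the authors' analysis code); the toy impulse responses and function name are assumptions for demonstration only:

```python
import numpy as np

def ild_spectrum(hrir_left, hrir_right, n_fft=512, eps=1e-12):
    """Return the ILD in dB (left relative to right) at each FFT bin.

    The ILD at each frequency is the level ratio of the two ears'
    HRTF magnitude responses, expressed in decibels.
    """
    mag_l = np.abs(np.fft.rfft(hrir_left, n_fft))
    mag_r = np.abs(np.fft.rfft(hrir_right, n_fft))
    # eps guards against log of zero in spectral nulls
    return 20.0 * np.log10((mag_l + eps) / (mag_r + eps))

# Toy example: synthetic HRIRs with a pure delay/gain difference.
# A constant gain ratio of 2 yields a flat ILD of about 6 dB.
fs = 44100
hrir_l = np.zeros(256)
hrir_l[10] = 1.0   # louder, earlier arrival at the left ear
hrir_r = np.zeros(256)
hrir_r[20] = 0.5   # quieter, later arrival at the right ear

ild = ild_spectrum(hrir_l, hrir_r)
freqs = np.fft.rfftfreq(512, d=1.0 / fs)  # bin frequencies in Hz
```

In practice such ILD curves would be computed from measured HRIRs for each source direction, then compared across individuals as a function of frequency.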