Resource-dependent acoustic and language modeling for spoken keyword search
MetadataShow full item record
In this dissertation, three research directions were explored to alleviate two major issues, i.e., the use of incorrect models and training/test condition mismatches, in the modeling frameworks of modern spoken keyword search (KWS) systems. Each of the three research directions, which include (i) data-efficient training processes, (ii) system optimization objectives, and (iii) data augmentation, utilizes different types and amounts of training resources in different ways to ameliorate the two issues of acoustic and language modeling in modern KWS systems. To be more specific, resource-dependent keyword modeling, keyword-boosted sMBR (state-level minimum Bayes risk) training, and multilingual acoustic modeling are proposed and investigated for acoustic modeling in this research. For language modeling, keyword-aware language modeling, discriminative keyword-aware language modeling, and web text augmented language modeling are presented and discussed. The dissertation provides a comprehensive collection of solutions and strategies to the acoustic and language modeling problems in KWS. It also offers insights into the realization of good-performance KWS systems. Experimental results show that the data-efficient training process and data augmentation are the two directions providing the most prominent performance improvement for KWS systems. While modifying system optimization objectives provides smaller yet consistent performance enhancement in KWS systems with different configurations. The effects of the proposed acoustic and language modeling approaches in the three directions are also shown to be additive and can be combined to further improve the overall KWS system performance.