dc.contributor.author: Meng, Zhong
dc.date.accessioned: 2018-08-20T15:36:35Z
dc.date.available: 2018-08-20T15:36:35Z
dc.date.created: 2018-08
dc.date.issued: 2018-07-25
dc.date.submitted: August 2018
dc.identifier.uri: http://hdl.handle.net/1853/60262
dc.description.abstract: Robust automatic speech recognition (ASR) and understanding (ASU) under various conditions remains a challenging problem even with the advances of deep learning. To achieve robust ASU, two discriminative training objectives are proposed for keyword spotting and topic classification: (1) To accurately recognize the semantically important keywords, non-uniform error cost minimum classification error training of deep neural network (DNN) and bi-directional long short-term memory (BLSTM) acoustic models is proposed to minimize the recognition errors of only the keywords. (2) To compensate for the mismatched objectives of speech recognition and understanding, minimum semantic error cost training of the BLSTM acoustic model is proposed to generate semantically accurate lattices for topic classification. Further, to extend the ASU system to various conditions, four adaptive training approaches are proposed to improve the robustness of the ASR: (1) To suppress the effect of inter-speaker variability on a speaker-independent DNN acoustic model, speaker-invariant training is proposed to learn a deep representation in the DNN that is both senone-discriminative and speaker-invariant through adversarial multi-task training. (2) To achieve condition-robust unsupervised adaptation with parallel data, adversarial teacher-student learning is proposed to suppress multiple factors of condition variability during knowledge transfer from a well-trained source-domain LSTM acoustic model to the target domain. (3) To further improve adversarial learning for unsupervised adaptation with non-parallel data, domain separation networks are used to enhance the domain-invariance of the senone-discriminative deep representation by explicitly modeling the private component that is unique to each domain. (4) To achieve robust far-field ASR, an LSTM adaptive beamforming network is proposed to estimate the beamforming filter coefficients in real time, to cope with non-stationary environmental noise and the dynamic nature of source and microphone positions.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.publisher: Georgia Institute of Technology
dc.subject: Discriminative training
dc.subject: Adaptation
dc.subject: Deep neural network
dc.subject: Acoustic model
dc.title: Discriminative and adaptive training for robust speech recognition and understanding
dc.type: Dissertation
dc.description.degree: Ph.D.
dc.contributor.department: Electrical and Computer Engineering
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Lee, Chin-Hui
dc.contributor.committeeMember: Moore, Elliot
dc.contributor.committeeMember: McClellan, James H.
dc.contributor.committeeMember: Xie, Yao
dc.date.updated: 2018-08-20T15:36:35Z

