dc.contributor.author: Shukla, Sunil Ravindra [en_US]
dc.date.accessioned: 2007-05-25T17:16:35Z
dc.date.available: 2007-05-25T17:16:35Z
dc.date.issued: 2007-01-10 [en_US]
dc.identifier.uri: http://hdl.handle.net/1853/14481
dc.description.abstract: Current high-quality text-to-speech (TTS) systems are based on unit selection from a large database that is both contextually and prosodically rich. These systems, albeit capable of natural voice quality, are computationally expensive and require a very large footprint. Their success is attributed to the dramatic reduction of storage costs in recent times. However, for many TTS applications a smaller footprint is becoming a standard requirement. This thesis presents a new method for representing speech segments that can improve the quality and/or reduce the footprint of current concatenative TTS systems. The circular linear prediction (CLP) model is revisited and combined with the constant pitch transform (CPT) to provide a robust representation of speech signals that allows for limited prosodic movements without a perceivable loss in quality. The CLP model assumes that each frame of voiced speech is an infinitely periodic signal. This assumption allows for LPC modeling using the covariance method, with the efficiency of the autocorrelation method. The CPT is combined with this model to provide a database that is uniform in pitch for matching the target prosody during synthesis. With this representation, limited prosody modifications and unit concatenation can be performed without causing audible artifacts. To resolve artifacts caused by pitch modifications in voicing transitions, a method is introduced that reduces peakiness in the LP spectra by constraining the line spectral frequencies. Two experiments have been conducted to demonstrate the capabilities of the CLP/CPT method. The first is a listening test to determine the ability of this model to realize prosody modifications without perceivable degradation. Utterances are resynthesized using the CLP/CPT method with emphasized prosodics to increase intelligibility in harsh environments.
The second experiment compares the quality of utterances synthesized by unit-selection-based limited-domain TTS against the CLP/CPT method. The results demonstrate that the CLP/CPT representation, applied to current concatenative TTS systems, can reduce the size of the database and increase the prosodic richness without noticeable degradation in voice quality. [en_US]
dc.publisher: Georgia Institute of Technology [en_US]
dc.subject: TTS [en_US]
dc.subject: Text-to-speech [en_US]
dc.subject: Speech synthesis [en_US]
dc.subject: Linear prediction [en_US]
dc.subject.lcsh: Prosodic analysis (Linguistics) [en_US]
dc.subject.lcsh: Speech synthesis [en_US]
dc.subject.lcsh: Signal processing -- Digital techniques [en_US]
dc.title: Improving High Quality Concatenative Text-to-Speech Using the Circular Linear Prediction Model [en_US]
dc.type: Dissertation [en_US]
dc.description.degree: Ph.D. [en_US]
dc.contributor.department: Electrical and Computer Engineering [en_US]
dc.description.advisor: Committee Chair: Dr. Thomas P. Barnwell III; Committee Member: Dr. Aaron Lanterman; Committee Member: Dr. Bruce Walker; Committee Member: Dr. David V. Anderson; Committee Member: Dr. Mark A. Clements [en_US]
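
Illustrative note (not part of the original record): the abstract states that treating a voiced frame as infinitely periodic gives covariance-method LPC at the cost of the autocorrelation method. The sketch below is one way to read that claim, not the thesis's implementation. It assumes the frame is exactly one pitch period, computes a circular (wrap-around) autocorrelation, and solves the resulting Toeplitz normal equations with the standard Levinson-Durbin recursion. The function names circular_autocorrelation, levinson_durbin, and circular_lp are hypothetical.

```python
import math


def circular_autocorrelation(frame, max_lag):
    """Autocorrelation of one period, with indices wrapping modulo the frame length."""
    n = len(frame)
    return [
        sum(frame[i] * frame[(i + k) % n] for i in range(n))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the predictor.

    Returns (a, err), where a[0] == 1 and the prediction error is
    e[n] = x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order].
    Assumes the residual energy stays nonzero before the final step.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient for stage m
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def circular_lp(frame, order):
    """Linear prediction on a frame assumed (hypothetically) to be one exact period."""
    return levinson_durbin(circular_autocorrelation(frame, order), order)


if __name__ == "__main__":
    # One period of a cosine: a second-order predictor can model it exactly.
    period = 8
    frame = [math.cos(2 * math.pi * n / period) for n in range(period)]
    coeffs, residual = circular_lp(frame, 2)
    print(coeffs, residual)
```

Because the signal is treated as periodic, no window is applied and no samples outside the frame are needed, so a single period of a sinusoid is predicted with essentially zero residual, which a windowed autocorrelation analysis of the same short frame would not achieve.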

