
dc.contributor.advisor: Clements, Mark A.
dc.contributor.author: Rao, Hrishikesh
dc.date.accessioned: 2016-01-07T17:22:59Z
dc.date.available: 2016-01-07T17:22:59Z
dc.date.created: 2015-12
dc.date.issued: 2015-11-10
dc.date.submitted: December 2015
dc.identifier.uri: http://hdl.handle.net/1853/54332
dc.description.abstract: Paralinguistic events are useful indicators of a speaker's affective state. In children's speech, these cues help form social bonds with caregivers, and they have also proven useful for the very early detection of developmental disorders such as autism spectrum disorder (ASD). Prior work on children's speech has relied on small numbers of subjects whose recordings lack sufficient diversity in the types of vocalizations produced, and the features needed to characterize the production of paralinguistic events are not fully understood. Because no off-the-shelf solution exists for detecting instances of laughter and crying in children's speech, this thesis investigates and develops signal processing algorithms to extract acoustic features and applies machine learning algorithms to several corpora. Results obtained with baseline spectral and prosodic features indicate that a combination of spectral, prosodic, and dysphonation-related features is needed to detect laughter and whining in toddlers' speech across different age groups and recording environments. Long-term features were found to capture the periodic properties of laughter in adults' and children's speech and detected instances of laughter with high accuracy. Finally, the thesis examines the use of multi-modal information, combining acoustic features with computer vision-based smile-related features, to detect instances of laughter and to reduce false positives in adults' and children's speech. Fusing the two feature sets improved accuracy and recall rates over using either modality on its own.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.publisher: Georgia Institute of Technology
dc.subject: Paralinguistic
dc.subject: Speech signal processing
dc.subject: Pattern recognition
dc.title: Paralinguistic event detection in children's speech
dc.type: Dissertation
dc.description.degree: Ph.D.
dc.contributor.department: Electrical and Computer Engineering
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Moore, Elliot
dc.contributor.committeeMember: Essa, Irfan
dc.contributor.committeeMember: Anderson, David
dc.contributor.committeeMember: Rozga, Agata
dc.date.updated: 2016-01-07T17:22:59Z
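
The abstract describes detecting laughter in speech with spectral and prosodic acoustic features fed to machine learning classifiers. The following is a minimal, illustrative Python sketch of that general pipeline only, not the thesis implementation: the use of librosa and scikit-learn, the MFCC/pitch/energy feature set, the SVM classifier, and all file names and parameter values are assumptions made for illustration.

# Minimal illustrative sketch (assumptions throughout, not the thesis code):
# classify audio segments as laughter vs. other vocalizations using spectral
# (MFCC) and prosodic (pitch, energy) summary features and an SVM.
import numpy as np
import librosa
from sklearn.svm import SVC

def segment_features(path, sr=16000):
    # Load one pre-segmented vocalization and compute frame-level tracks.
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope
    f0 = librosa.yin(y, fmin=75, fmax=600, sr=sr)        # pitch contour
    rms = librosa.feature.rms(y=y)                       # short-time energy
    # Summarize each frame-level track with its mean and standard deviation.
    parts = [mfcc.mean(axis=1), mfcc.std(axis=1),
             [np.nanmean(f0), np.nanstd(f0)],
             [rms.mean(), rms.std()]]
    return np.concatenate([np.atleast_1d(p) for p in parts])

# Hypothetical labeled segments: 1 = laughter, 0 = other vocalization.
paths = ["segment_001.wav", "segment_002.wav"]
labels = [1, 0]
X = np.vstack([segment_features(p) for p in paths])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))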

