dc.contributor.author: Badenhorst, Jacob Andreas Cornelius
dc.date.accessioned: 2011-02-24T13:27:08Z
dc.date.available: 2011-02-24T13:27:08Z
dc.date.issued: 2009
dc.identifier.uri: http://hdl.handle.net/10394/3994
dc.description: Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2009.
dc.description.abstract: The languages spoken in developing countries are diverse, and most are currently under-resourced from an automatic speech recognition (ASR) perspective. In South Africa alone, 10 of the 11 official languages belong to this category. Given the potential for future applications of speech-based information systems such as spoken dialog systems (SDSs) in these countries, the design of minimal ASR audio corpora is an important research area. Specifically, current ASR systems utilise acoustic models to represent acoustic variability, and effective ASR corpus design aims to optimise the amount of relevant variation within training data while minimising the size of the corpus. Therefore, an investigation of the effect that different amounts and types of training data have on these models is needed. In this dissertation, specific consideration is given to the data sufficiency principles that apply to the training of acoustic models. The investigation of this task led to the following main achievements: 1) We define a new stability measurement protocol that provides the capability to view the variability of ASR training data. 2) This protocol allows for the investigation of the effect that various acoustic model complexities and ASR normalisation techniques have on ASR training data requirements. Specific trends with regard to the data requirements for different phone categories, and how these are affected by various modelling strategies, are observed. 3) Based on this analysis, acoustic distances between phones are estimated across language borders, paving the way for further research in cross-language data sharing. Finally, the knowledge obtained from these experiments is applied to perform a data sufficiency analysis of a new speech recognition corpus of South African languages: the Lwazi ASR corpus. The findings correlate well with initial phone recognition results and yield insight into the number of speakers required for the development of minimal telephone ASR corpora.
dc.publisher: North-West University
dc.subject: Speech recognition [en]
dc.subject: Acoustic variability [en]
dc.subject: Corpus design [en]
dc.subject: Resource-scarce languages [en]
dc.subject: Acoustic models [en]
dc.subject: Model distances [en]
dc.subject: Telephone ASR corpora [en]
dc.title: Data sufficiency analysis for automatic speech recognition [en]
dc.type: Thesis [en]
dc.description.thesistype: Masters


This item appears in the following Collection(s)

  • ETD@PUK [6442]
    This collection contains the original digitized versions of research conducted at the North-West University (Potchefstroom Campus)