Data sufficiency analysis for automatic speech recognition / by J.A.C. Badenhorst

Boloka/Manakin Repository

dc.contributor.author Badenhorst, Jacob Andreas Cornelius
dc.date.accessioned 2011-02-24T13:27:08Z
dc.date.available 2011-02-24T13:27:08Z
dc.date.issued 2009
dc.identifier.uri http://hdl.handle.net/10394/3994
dc.description Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2009.
dc.description.abstract The languages spoken in developing countries are diverse, and most are currently under-resourced from an automatic speech recognition (ASR) perspective. In South Africa alone, 10 of the 11 official languages belong to this category. Given the potential for future applications of speech-based information systems such as spoken dialog systems (SDSs) in these countries, the design of minimal ASR audio corpora is an important research area. Current ASR systems use acoustic models to represent acoustic variability, and effective ASR corpus design aims to optimise the amount of relevant variation within the training data while minimising the size of the corpus. It is therefore necessary to investigate how different amounts and types of training data affect these models. This dissertation gives specific consideration to the data sufficiency principles that apply to the training of acoustic models. The investigation led to the following main achievements: 1) We define a new stability measurement protocol that makes it possible to observe the variability of ASR training data. 2) This protocol allows us to investigate how various acoustic model complexities and ASR normalisation techniques affect ASR training data requirements. We observe specific trends in the data requirements of different phone categories and in how these requirements are affected by various modelling strategies. 3) Based on this analysis, acoustic distances between phones are estimated across language borders, paving the way for further research in cross-language data sharing. Finally, the knowledge obtained from these experiments is applied to perform a data sufficiency analysis of a new speech recognition corpus of South African languages: the Lwazi ASR corpus.
The findings correlate well with initial phone recognition results and yield insight into the number of speakers required for the development of minimal telephone ASR corpora.
dc.publisher North-West University
dc.subject Speech recognition en
dc.subject Acoustic variability en
dc.subject Corpus design en
dc.subject Resource-scarce languages en
dc.subject Acoustic models en
dc.subject Model distances en
dc.subject Telephone ASR corpora en
dc.title Data sufficiency analysis for automatic speech recognition / by J.A.C. Badenhorst en
dc.type Thesis en
dc.description.thesistype Masters

This item appears in the following Collection(s)

  • ETD@PUK [6252]
    This collection contains the original digitized versions of research conducted at the North-West University (Potchefstroom Campus)
