dc.contributor.advisor: Davel, M.H.
dc.contributor.author: Badenhorst, Jacob Andreas Cornelius
dc.date.accessioned: 2016-06-24T07:08:10Z
dc.date.available: 2016-06-24T07:08:10Z
dc.date.issued: 2016
dc.identifier.uri: http://hdl.handle.net/10394/17839
dc.description: PhD (Computer Engineering), North-West University, Potchefstroom Campus, 2016 [en_US]
dc.description.abstract: State-of-the-art automatic speech recognition (ASR) systems are built using hundreds or even thousands of hours of speech data. Even then, high recognition accuracy is achievable only by carefully constraining the recognition domain. This reliance on large speech corpora remains a major challenge when building ASR systems for resource-constrained languages. The need for large corpora is partially due to the substantial variation observed in different spoken realisations of the same text, but co-articulation also plays an important role. When building an ASR system, it is not sufficient to observe a large number of samples of each acoustic unit during training; it is necessary to observe sufficient samples appearing in contexts similar to those found in the test data. To obtain a better understanding of co-articulation effects, we analysed the behaviour of phones in context, using trajectory models. We developed a new model that captures the feature trajectories of acoustic unit transitions directly, and developed a way of representing the characteristic changes between different units. We found it beneficial to model these characteristic changes at the spectral rather than cepstral level, by extracting features directly from the filter bank. Applying auto-regressive moving-average (ARMA) filtering to smooth spectral energies before constructing cepstral features also improved the accuracy of trajectories. We experimented with different approaches to identify transition model alignments and selected techniques that allowed us to locate the characteristic changes between units with the required accuracy. We developed a new compact representation of speech units in context, estimating model parameters using the trajectory models. These models function at a sub-transitional level, enabling the construction of units that occur in unseen and rare contexts.
Applying this technique, it was possible to create synthetic samples of triphone contexts by first constructing diphone transitions and then concatenating these to form synthetic trajectories. We found that better acoustic models (producing higher likelihoods on unseen test data) could be developed by augmenting existing data with synthetic samples. When the samples were used to augment the training data in an end-to-end ASR system, promising results were obtained. A useful side effect is that the synthetic samples provide a new mechanism to improve cluster selection for unseen or rare phones during state-tying. [en_US]
dc.language.iso: en [en_US]
dc.subject: Synthetic triphones [en_US]
dc.subject: Trajectory modelling [en_US]
dc.subject: Trajectory-based features [en_US]
dc.subject: Feature distributions [en_US]
dc.subject: Feature construction [en_US]
dc.subject: Data augmentation [en_US]
dc.subject: Resource-scarce acoustic modelling [en_US]
dc.subject: Corpus design [en_US]
dc.subject: Sintetiese trifone [en_US]
dc.subject: Trajekmodellering [en_US]
dc.subject: Trajekgebaseerde kenmerke [en_US]
dc.subject: Kenmerkverspreidings [en_US]
dc.subject: Kenmerkskepping [en_US]
dc.subject: Data vermeerdering [en_US]
dc.subject: Hulpbron-beperkte akoestiese modellering [en_US]
dc.subject: Korpusontwerp [en_US]
dc.title: Trajectory modelling with limited speech data [en_US]
dc.type: Thesis [en_US]
dc.description.thesistype: Doctoral [en_US]

