Trajectory modelling with limited speech data

Badenhorst, Jacob Andreas Cornelius

dc.contributor.advisor	Davel, M.H.
dc.contributor.author	Badenhorst, Jacob Andreas Cornelius
dc.date.accessioned	2016-06-24T07:08:10Z
dc.date.available	2016-06-24T07:08:10Z
dc.date.issued	2016
dc.identifier.uri	http://hdl.handle.net/10394/17839
dc.description	PhD (Computer Engineering), North-West University, Potchefstroom Campus, 2016	en_US
dc.description.abstract	State-of-the-art automatic speech recognition (ASR) systems are built using hundreds or even thousands of hours of speech data. Even then, high recognition accuracy is achievable only by carefully constraining the recognition domain. This reliance on large speech corpora remains a major challenge when building ASR systems for resource-constrained languages. The need for large corpora is partially due to the substantial variation observed in different spoken realisations of the same text, but co-articulation also plays an important role. When building an ASR system, it is not sufficient to observe a large number of samples of each acoustic unit during training; it is necessary to observe sufficient samples appearing in contexts similar to those found in the test data. To obtain a better understanding of co-articulation effects, we analysed the behaviour of phones in context, using trajectory models. We developed a new model that captures the feature trajectories of acoustic unit transitions directly, and developed a way of representing the characteristic changes between different units. We found it beneficial to model these characteristic changes at the spectral rather than cepstral level, by extracting features directly from the filter bank. Applying auto-regressive moving-average (ARMA) filtering to smooth spectral energies before constructing cepstral features also improved the accuracy of trajectories. We experimented with different approaches to identify transition model alignments and selected techniques that allowed us to locate the characteristic changes between units with the required accuracy. We developed a new compact representation of speech units in context, estimating model parameters using the trajectory models. These models function at a sub-transitional level, enabling the construction of units that occur in unseen and rare contexts.
Applying this technique, it was possible to create synthetic samples of triphone contexts, by first constructing diphone transitions and concatenating these to form synthetic trajectories. We found that better acoustic models (producing higher likelihoods on unseen test data) could be developed by augmenting existing data with synthetic samples. When the samples were used to augment the training data in an end-to-end ASR system, promising results were obtained. A useful side effect is that the synthetic samples provide a new mechanism to improve cluster selection for unseen or rare phones during state-tying.	en_US
dc.language.iso	en	en_US
dc.subject	Synthetic triphones	en_US
dc.subject	Trajectory modelling	en_US
dc.subject	Trajectory-based features	en_US
dc.subject	Feature distributions	en_US
dc.subject	Feature construction	en_US
dc.subject	Data augmentation	en_US
dc.subject	Resource-scarce acoustic modelling	en_US
dc.subject	Corpus design	en_US
dc.subject	Sintetiese trifone	en_US
dc.subject	Trajekmodellering	en_US
dc.subject	Trajekgebaseerde kenmerke	en_US
dc.subject	Kenmerkverspreidings	en_US
dc.subject	Kenmerkskepping	en_US
dc.subject	Data vermeerdering	en_US
dc.subject	Hulpbron-beperkte akoestiese modellering	en_US
dc.subject	Korpusontwerp	en_US
dc.title	Trajectory modelling with limited speech data	en_US
dc.type	Thesis	en_US
dc.description.thesistype	Doctoral	en_US

Files in this item

Name: Badenhorst_JAC_2016.pdf
Size: 3.559Mb
Format: PDF

This item appears in the following Collection(s)

Engineering [1403]