Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems
Date
2017
Author
De Wet, Febe
Kleynhans, Neil Taylor
Van Compernolle, Dirk
Sahraeian, Reza
Abstract
For automated speech recognition in under-resourced environments, techniques for sharing acoustic
data between closely related or similar languages become important. Donor languages with abundant
resources can potentially be used to increase the recognition accuracy of speech recognition systems
developed for a resource-poor target language. The assumption is that adding more data will increase
the robustness of the statistical estimates captured by the acoustic models. In this study, we
investigated data sharing between Afrikaans, an under-resourced language, and Flemish, a well-resourced
language. Our approach focused on exploring the model adaptation and refinement techniques associated
with hidden Markov model-based speech recognition systems, with the aim of increasing the benefit of
sharing data. Specifically, we focused on currently available techniques, some possible combinations
of them, and exactly how they are applied during the acoustic model development process. Our findings
show that simply applying the standard approaches to adaptation and refinement yields no benefit when
Flemish data is added to the Afrikaans training pool. The only improvement observed was achieved when
acoustic models were developed on all available data but model refinements and adaptations were
estimated on the target data only.
Significance:
• Acoustic modelling for under-resourced languages
• Automatic speech recognition for Afrikaans
• Data sharing between Flemish and Afrikaans to improve acoustic modelling for Afrikaans
URI
http://hdl.handle.net/10394/26438
http://ieeexplore.ieee.org/document/7707303/
http://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S0038-23532017000100009
Collections
- Faculty of Engineering