Unsupervised acoustic model training: comparing South African English and isiZulu
Date: 2015
Authors: Kleynhans, Neil; De Wet, Febe; Barnard, Etienne
Abstract
Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models for a variety of speech-based systems. Manually transcribing this data is resource-intensive and requires funding, time and expertise. Lightly-supervised training techniques, however, provide a means to rapidly transcribe audio, reducing the initial resource investment needed to begin the modelling process.

Our findings suggest that lightly-supervised training works well for English, but when moving to an agglutinative language such as isiZulu, the process fails to achieve the performance seen for English. Additionally, phone-based performance is significantly worse than that of an approach using word-based language models. These results indicate a strong dependence of lightly-supervised training techniques on large or well-matched text resources.
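The data-selection step at the heart of lightly-supervised training can be illustrated with a short sketch: a seed recogniser hypothesises transcriptions for the untranscribed audio, and only utterances whose recognition confidence clears a threshold are kept for acoustic model training. The function name, data layout, and the 0.8 threshold below are illustrative assumptions, not the paper's implementation.

```python
def select_utterances(hypotheses, threshold=0.8):
    """Keep (utt_id, transcript) pairs whose mean word confidence
    is at or above `threshold`.

    hypotheses: list of (utt_id, [(word, confidence), ...]) pairs,
    as might be produced by a seed recogniser's confidence-scored output.
    """
    selected = []
    for utt_id, words in hypotheses:
        if not words:
            continue  # skip empty hypotheses
        mean_conf = sum(c for _, c in words) / len(words)
        if mean_conf >= threshold:
            transcript = " ".join(w for w, _ in words)
            selected.append((utt_id, transcript))
    return selected


# Example: a confident utterance is kept, a low-confidence one is dropped.
hyps = [
    ("utt1", [("hello", 0.90), ("world", 0.95)]),
    ("utt2", [("noisy", 0.40), ("guess", 0.50)]),
]
print(select_utterances(hyps))  # → [('utt1', 'hello world')]
```

Only the retained transcriptions feed back into acoustic model training, which is why the technique's success depends so heavily on how well the seed models and text resources match the target language.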
URI
http://ieeexplore.ieee.org/document/7359512/
https://researchspace.csir.co.za/dspace/handle/10204/8629
http://hdl.handle.net/10394/26490
Collections
- Faculty of Engineering [1129]