Unsupervised acoustic model training: comparing South African English and isiZulu

Kleynhans, Neil; De Wet, Febe; Barnard, Etienne

Date

2015

Abstract

Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models for a variety of speech-based systems. Manually transcribing this data, however, is resource intensive, requiring funding, time and expertise. Lightly-supervised training techniques provide a means to transcribe audio rapidly, reducing the initial resource investment needed to begin the modelling process. Our findings suggest that lightly-supervised training works well for English but that, for an agglutinative language such as isiZulu, the process fails to achieve the performance seen for English. Additionally, phone-based performance is significantly worse than that of an approach using word-based language models. These results indicate that lightly-supervised training techniques depend strongly on large or well-matched text resources.

URI

http://ieeexplore.ieee.org/document/7359512/
https://researchspace.csir.co.za/dspace/handle/10204/8629
http://hdl.handle.net/10394/26490
