The NCHLT Speech Corpus of the South African languages
(Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), 2014)
The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were ...
A smartphone-based ASR data collection tool for under-resourced languages
(Elsevier, 2014)
Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief ...
Unsupervised acoustic model training: comparing South African English and isiZulu
(IEEE, 2015)
Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models that can be used in a variety of speech-based systems. Manually transcribing this ...