The NCHLT Speech Corpus of the South African languages

Barnard, Etienne; Davel, Marelie H.; van Heerden, Charl; De Wet, Febe; Badenhorst, Jaco (Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), 2014)

The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were ...

A smartphone-based ASR data collection tool for under-resourced languages

De Vries, Nic J.; Badenhorst, Jaco; Basson, Willem D.; De Wet, Febe; Barnard, Etienne; De Waal, Alta; Davel, Marelie H. (Elsevier, 2014)

Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief ...

Unsupervised acoustic model training: comparing South African English and isiZulu

Kleynhans, Neil; De Wet, Febe; Barnard, Etienne (IEEE, 2015)

Large amounts of untranscribed audio data are generated every day. These audio resources can be used to develop robust acoustic models that can be used in a variety of speech-based systems. Manually transcribing this ...