Search

Now showing items 1-2 of 2

The NCHLT Speech Corpus of the South African languages

Barnard, Etienne; Davel, Marelie H.; van Heerden, Charl; De Wet, Febe; Badenhorst, Jaco (Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), 2014)

The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were ...

Collecting and evaluating speech recognition corpora for 11 South African languages

Badenhorst, Jaco; Van Heerden, Charl; Barnard, Etienne; Davel, Marelie H. (Springer, 2011)

We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which contains data from the eleven official languages of South Africa. Because of practical constraints, the amount of ...