dc.contributor.author | Barnard, Etienne | |
dc.contributor.author | Davel, Marelie H. | |
dc.contributor.author | van Heerden, Charl | |
dc.contributor.author | De Wet, Febe | |
dc.contributor.author | Badenhorst, Jaco | |
dc.date.accessioned | 2018-03-02T13:44:09Z | |
dc.date.available | 2018-03-02T13:44:09Z | |
dc.date.issued | 2014 | |
dc.identifier.citation | E. Barnard, M. H. Davel, C. van Heerden, F. de Wet and J. Badenhorst, “The NCHLT Speech Corpus of the South African languages”, in Proc. Int. Workshop Spoken Language Technologies for Under-resourced Languages (SLTU), pp. 194-200, St Petersburg, Russia, 2014. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications] | en_US |
dc.identifier.uri | https://researchspace.csir.co.za/dspace/handle/10204/7549 | |
dc.identifier.uri | http://mica.edu.vn/sltu2014/proceedings/28.pdf | |
dc.identifier.uri | http://hdl.handle.net/10394/26493 | |
dc.description | This work was supported by the Department of Arts and Culture. | en_US |
dc.description.abstract | The NCHLT speech corpus contains wide-band speech from approximately 200 speakers per language, in each of the eleven official languages of South Africa. We describe the design and development processes that were undertaken in order to develop the corpus, and report on associated materials such as orthographic transcriptions and pronunciation dictionaries that were released as part of the corpus. In order to benchmark speech recognition performance on the corpus, we have also developed both phone-recognition and word-recognition systems for all eleven languages; we find that high accuracies can be achieved for these speaker-independent but vocabulary-dependent recognition tasks in all languages. | en_US |
dc.description.sponsorship | Multilingual Speech Technologies, North-West University, Vanderbijlpark, South Africa; Human Language Technologies Research Group, Meraka Institute, CSIR, Pretoria, South Africa | en_US |
dc.language.iso | en | en_US |
dc.publisher | Workshop Spoken Language Technologies for Under-resourced Languages (SLTU) | en_US |
dc.subject | Speech Corpus | en_US |
dc.subject | South African languages | en_US |
dc.subject | Speech recognition | en_US |
dc.subject | word-recognition | en_US |
dc.subject | phone-recognition | en_US |
dc.title | The NCHLT Speech Corpus of the South African languages | en_US |
dc.type | Presentation | en_US |