A Southern African corpus for multilingual name pronunciation
Davel, Marelie H.
We describe the challenges that arise in predicting the pronunciations of proper names in a multilingual society. To improve our understanding of this issue – which is of significant practical importance for applications of speech technology – we have designed and collected a multilingual corpus of proper names. Both the names and the speakers are drawn from four South African languages: isiZulu, Sesotho, English and Afrikaans. We describe how the corpus was designed to probe the interaction between the speaker’s language and the origin of the name, and discuss the practical steps taken in collecting the spoken utterances. A statistical investigation of the prompt material reveals some of the systematic differences between the languages.
- Faculty of Engineering