
    The South African directory enquiries (SADE) name corpus

    Thirion-sade-preprint.pdf (770.8Kb)
    Date
    2020
    Author
    Thirion, Jan Willem Frederick
    Van Heerden, Charl Johannes
    Giwa, Oluwapelumi
    Davel, Marelie Hattingh
    Abstract
    We present the design and development of a South African directory enquiries (DE) corpus. It contains audio and orthographic transcriptions of a wide range of South African names produced by first language speakers of four languages, namely Afrikaans, English, isiZulu and Sesotho. Useful as a resource to understand the effect of name language and speaker language on pronunciation, this is the first corpus to also aim to identify the “intended language”: an implicit assumption with regard to word origin made by the speaker of the name. We describe the design, collection, annotation, and verification of the corpus. This includes an analysis of the algorithms used to tag the corpus with meta information that may be beneficial to pronunciation modelling tasks.
    URI
    http://hdl.handle.net/10394/36913
    Collections
    • Faculty of Engineering [1115]

    Copyright © North-West University