Exploring minimal pronunciation modeling for low resource languages
Date
2015
Author
Barnard, Etienne
Van Heerden, Charl
Hartmann, William
Karakos, Damianos
Schwartz, Richard
Tsakalidis, Stavros
Davel, Marelie H.
Abstract
Pronunciation lexicons can range from fully graphemic (modeling
each word using the orthography directly) to fully phonemic
(first mapping each word to a phoneme string). Between these
two options lies a continuum of modeling options. We analyze
techniques that can improve the accuracy of a graphemic system
without requiring significant effort to design or implement.
The analysis is performed in the context of the IARPA Babel
project, which aims to develop spoken term detection systems
for previously unseen languages rapidly, and with minimal human
effort. We consider techniques related to letter-to-sound
mapping and language-independent syllabification of primarily
graphemic systems, and discuss results obtained for six languages:
Cebuano, Kazakh, Kurmanji Kurdish, Lithuanian, Telugu
and Tok Pisin.
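
As a rough illustration of the fully graphemic end of this continuum, the sketch below builds a lexicon in which each word's pronunciation is simply its sequence of written characters. It is not the authors' system: the function name `graphemic_lexicon` and the choice to lowercase and treat every character as one unit are assumptions made for this example, and scripts with combining marks (such as Telugu) would likely need more careful unit segmentation.

```python
def graphemic_lexicon(words):
    """Map each word to a 'pronunciation' made of its own characters.

    Assumption for illustration: one unit per lowercase character,
    with no letter-to-sound mapping or syllabification applied.
    """
    return {word: list(word.lower()) for word in words}


# Hypothetical usage with two words taken from a language name in the abstract.
lexicon = graphemic_lexicon(["Tok", "Pisin"])
for word, units in lexicon.items():
    # One lexicon entry per line: the word, then its space-separated units.
    print(word, " ".join(units))
```

A phonemic lexicon would instead replace the character list with a phoneme string, and the intermediate options discussed in the abstract would adjust or group these units rather than use them as-is.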
URI
https://books.google.co.za/books?id=-RGhDQAAQBAJ&pg=PA44&lpg=PA44&dq=Exploring+minimal+pronunciation+modeling+for+low+resource+languages&source=bl&ots=wAYDYAm_Ju&sig=ha5BMCtwoEBjHQTAkyauz2wSSEc&hl=en&sa=X&ved=0ahUKEwjFwPDv1M3ZAhUlKsAKHXrICPkQ6AEIODAC#v=onepage&q=Exploring%20minimal%20pronunciation%20modeling%20for%20low%20resource%20languages&f=false
https://www.lti.cs.cmu.edu/sites/default/files/sitaram%2C%20sunayana.pdf
http://hdl.handle.net/10394/26488
Collections
- Faculty of Engineering [1129]