Part-of-speech effects on text-to-speech synthesis
Schlunz, Georg I.
van Huyssteen, Gerhard B.
One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesized speech. To this end, various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS voice. One of the fundamental NLP tasks used is part-of-speech (POS) tagging of the words in the text. This paper investigates the effects of POS information on the naturalness of a hidden Markov model (HMM) based TTS voice when no additional resources are available to aid in the modeling of prosody. It is found that, when a minimal feature set is used for the HMM context labels, adding POS tags does improve the naturalness of the voice. However, the same effect can be achieved by including segmental counting and positional information instead of the POS tags.
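To make the comparison in the abstract concrete, the sketch below builds per-phone context labels in three configurations: a minimal phonetic context only, the minimal context plus the word's POS tag, and the minimal context plus counting/positional features (phone-in-syllable, syllable-in-word, word-in-utterance). It is a minimal illustration under stated assumptions, not the authors' feature set: the label syntax, the field names (`POS`, `PH`, `SYL`, `WRD`), the `Word` class and the example tags are all hypothetical and deliberately much simpler than the full-context label format used by real HMM-based synthesis toolkits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    pos: str                      # hypothetical POS tag, e.g. "DT", "NN"
    syllables: List[List[str]]    # each syllable is a list of phone symbols


def context_labels(words: List[Word], use_pos: bool = False,
                   use_counts: bool = False) -> List[str]:
    """Return one simplified (hypothetical-format) context label per phone."""
    # Flatten the utterance into (phone, word index, syllable index, phone index).
    items = []
    for wi, w in enumerate(words):
        for si, syl in enumerate(w.syllables):
            for pi, ph in enumerate(syl):
                items.append((ph, wi, si, pi))

    labels = []
    for i, (ph, wi, si, pi) in enumerate(items):
        # Minimal feature set: previous, current and next phone (silence at edges).
        prev_ph = items[i - 1][0] if i > 0 else "sil"
        next_ph = items[i + 1][0] if i + 1 < len(items) else "sil"
        label = f"{prev_ph}-{ph}+{next_ph}"
        w = words[wi]
        if use_pos:
            # Append the POS tag of the word containing this phone.
            label += f"/POS:{w.pos}"
        if use_counts:
            # Append position/count pairs: phone in syllable, syllable in word,
            # word in utterance (all 1-based position _ total count).
            label += (f"/PH:{pi + 1}_{len(w.syllables[si])}"
                      f"/SYL:{si + 1}_{len(w.syllables)}"
                      f"/WRD:{wi + 1}_{len(words)}")
        labels.append(label)
    return labels
```

For example, with `words = [Word("the", "DT", [["dh", "ax"]]), Word("cat", "NN", [["k", "ae", "t"]])]`, the minimal configuration yields `sil-dh+ax`, `dh-ax+k`, and so on, while `use_counts=True` turns the first label into `sil-dh+ax/PH:1_2/SYL:1_1/WRD:1_2`. The counting features are computable from the phonetic transcription alone, whereas the POS variant needs a tagger, which is the resource trade-off the abstract's finding bears on.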
- Faculty of Engineering 
Showing items related by title, author, creator and subject.
Van Niekerk, Daniel Rudolph (North-West University, 2009)The rapid development of corpus-based speech systems such as concatenative synthesis systems for under-resourced languages requires an efficient, consistent and accurate solution with regard to phonetic speech segmentation. ...
De Vries, Nic J.; Badenhorst, Jaco; Basson, Willem D; De Wet, Febe; Barnard, Etienne; De Waal, Alta; Davel, Marelie H. (Elsevier, 2014)Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief ...
De Vries, Nicolaas Johannes (North-West University, 2011)As building transcribed speech corpora for under-resourced languages plays a pivotal role in developing automatic speech recognition (ASR) technologies for such languages, a key step in developing these technologies is the ...