Automatic speech segmentation with limited data
Van Niekerk, Daniel Rudolph
The rapid development of corpus-based speech systems such as concatenative synthesis systems for under-resourced languages requires an efficient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time-consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources. In this work we considered the problem of phonetic segmentation in the context of developing small prototypical speech synthesis corpora for new under-resourced languages. This was done through an empirical evaluation of existing segmentation techniques on typical speech corpora in three South African languages. In this process, the performance of these techniques was characterised under different data conditions and the efficient application of these techniques was investigated in order to improve the accuracy of resulting phonetic alignments. We found that the application of baseline speaker-specific Hidden Markov Models results in relatively robust and accurate alignments even under extremely limited data conditions and demonstrated how such models can be developed and applied efficiently in this context. The result is segmentation of sufficient quality for synthesis applications, with the quality of alignments comparable to manual segmentation efforts in this context. Finally, possibilities for further automated refinement of phonetic alignments were investigated and an efficient corpus development strategy was proposed with suggestions for further work in this direction.
- Engineering 
Showing items related by title, author, creator and subject.
De Vries, Nic J; Davel, Marelie Hattingh; Badenhorst, Jaco; Basson, Willem D; De Wet, Febe; Barnard, Etienne; De Waal, Alta (Elsevier, 2014) Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief ...
Schlunz, Georg I.; Barnard, Etienne; van Huyssteen, Gerhard B. (Pattern Recognition Association of South Africa and Mechatronics International Conference, 2010) One of the goals of text-to-speech (TTS) systems is to produce natural-sounding synthesized speech. Towards this end various natural language processing (NLP) tasks are performed to model the prosodic aspects of the TTS ...
De Vries, Nicolaas Johannes (North-West University, 2011) As building transcribed speech corpora for under-resourced languages plays a pivotal role in developing automatic speech recognition (ASR) technologies for such languages, a key step in developing these technologies is the ...