Show simple item record

dc.contributor.author: Gouvea, Evandro
dc.contributor.author: Davel, Marelie H.
dc.date.accessioned: 2018-03-07T08:16:47Z
dc.date.available: 2018-03-07T08:16:47Z
dc.date.issued: 2011
dc.identifier.citation: Evandro Gouvêa and Marelie H. Davel, "Kullback-Leibler divergence-based ASR training data selection", in Proc. Interspeech, pp. 2297-2300, Florence, Italy, 2011. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
dc.identifier.uri: https://pdfs.semanticscholar.org/0671/8ffa83aa3bb5df5834873a8511417b311555.pdf?_ga=2.81368151.1751977590.1520410351-507080916.1509951372
dc.identifier.uri: https://www.researchgate.net/publication/221480550_Kullback-Leibler_Divergence-Based_ASR_Training_Data_Selection
dc.identifier.uri: http://hdl.handle.net/10394/26546
dc.description.abstract: Data preparation and selection affect systems across a wide range of complexities. A system built for a resource-rich language may be so large as to include borrowed languages. A system built for a resource-scarce language may be affected by how carefully the training data is selected and produced. Accuracy is affected by the presence of enough samples of qualitatively relevant information. We propose a method using the Kullback-Leibler divergence to solve two problems related to data preparation: the ordering of alternate pronunciations in a lexicon, and the selection of transcription data. In both cases, we want to guarantee that a particular distribution of n-grams is achieved. In the case of lexicon design, we want to ascertain that phones will be present often enough. In the case of training data selection for resource-scarce languages, we want to make sure that some n-grams are better represented than others. Our proposed technique yields encouraging results.
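The selection criterion described in the abstract — choosing data so that the selected set's unit distribution approaches a target distribution, measured by Kullback-Leibler divergence — can be sketched as a greedy procedure. The function names, the greedy strategy, and the use of unigram (phone) counts below are illustrative assumptions, not the paper's actual implementation:

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over p's support; eps guards against zeros in q.
    p and q are dicts mapping unit -> probability."""
    return sum(p[k] * math.log(p[k] / max(q.get(k, 0.0), eps))
               for k in p if p[k] > 0)

def normalize(counts):
    """Turn a count table into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def greedy_select(utterances, target, n_select):
    """Hypothetical greedy KL-based selection: repeatedly add the
    utterance whose inclusion keeps the selected set's unit
    distribution closest (in KL) to `target`.
    utterances: list of token lists; target: dict unit -> prob."""
    selected, counts = [], Counter()
    pool = list(utterances)
    for _ in range(min(n_select, len(pool))):
        best, best_kl = None, float("inf")
        for utt in pool:
            trial = counts + Counter(utt)  # counts if utt were added
            kl = kl_divergence(target, normalize(trial))
            if kl < best_kl:
                best, best_kl = utt, kl
        selected.append(best)
        counts += Counter(best)
        pool.remove(best)
    return selected
```

For example, with a uniform target over two phones, `greedy_select([["a","a"], ["b","b"], ["a","b"]], {"a": 0.5, "b": 0.5}, 2)` first picks the balanced utterance `["a","b"]`, since it alone matches the target exactly. The same scheme extends to higher-order n-grams by counting n-gram tuples instead of single units.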
dc.description.sponsorship: European Media Laboratory GmbH, Heidelberg, Germany; Multilingual Speech Technologies, North-West University, Vanderbijlpark, South Africa
dc.language.iso: en
dc.publisher: Interspeech 2011
dc.subject: Acoustic model training
dc.subject: Lexical model
dc.subject: Maximum entropy
dc.subject: Kullback-Leibler divergence
dc.subject: Training data selection
dc.title: Kullback-Leibler divergence-based ASR training data selection
dc.type: Presentation


Files in this item

There are no files associated with this item.
