Show simple item record

dc.contributor.author: Kleynhans, Neil
dc.contributor.author: Barnard, Etienne
dc.date.accessioned: 2018-03-02T13:19:56Z
dc.date.available: 2018-03-02T13:19:56Z
dc.date.issued: 2015
dc.identifier.citation: Kleynhans, Neil Taylor, and Etienne Barnard, "Efficient data selection for ASR", Language Resources and Evaluation, Vol. 49, Issue 2, pp. 327-353, 2015. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
dc.identifier.uri: https://link.springer.com/article/10.1007/s10579-014-9285-0
dc.identifier.uri: https://researchspace.csir.co.za/dspace/handle/10204/8181
dc.identifier.uri: http://hdl.handle.net/10394/26491
dc.description.abstract: Automatic speech recognition (ASR) technology has matured over the past few decades and has made significant impacts in a variety of fields, from assistive technologies to commercial products. However, ASR system development is a resource-intensive activity and requires language resources in the form of text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and this resource scarcity severely inhibits the deployment of ASR systems in the developing world. One approach to assist with resource-scarce ASR system development is to select "useful" training samples, which could reduce the resources needed to collect new corpora. In this work, we propose a new data selection framework which can be used to design a speech recognition corpus. We show that for limited data sets, independent of language and bandwidth, the most effective strategy for data selection is frequency-matched selection, and that the widely used maximum entropy methods generally produced the least promising results. In our model, the frequency-matched selection method corresponds to a logarithmic relationship between accuracy and corpus size; we also investigated other model relationships, and found that a hyperbolic relationship (as suggested by simple asymptotic arguments in learning theory) may lead to somewhat better performance under certain conditions.
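The abstract describes candidate models for how recognition accuracy grows with corpus size: a logarithmic relationship (acc ≈ a + b·ln(n)) and a hyperbolic one (acc ≈ a − b/n, approaching the asymptote a). As a minimal illustrative sketch — not the paper's actual method or data — both models are linear in their parameters and can therefore be fitted by ordinary least squares; the corpus sizes and accuracies below are synthetic:

```python
import numpy as np

def fit_logarithmic(n, acc):
    """Fit acc ~ a + b*ln(n) by least squares; return (a, b)."""
    X = np.column_stack([np.ones_like(n, dtype=float), np.log(n)])
    coef, *_ = np.linalg.lstsq(X, acc, rcond=None)
    return coef

def fit_hyperbolic(n, acc):
    """Fit acc ~ a - b/n by least squares; return (a, b).

    The parameter a is the asymptotic accuracy as n grows large.
    """
    X = np.column_stack([np.ones_like(n, dtype=float), -1.0 / n])
    coef, *_ = np.linalg.lstsq(X, acc, rcond=None)
    return coef

# Hypothetical corpus sizes (e.g. hours of audio) and accuracies,
# generated here from a logarithmic curve purely for illustration.
n = np.array([1, 2, 5, 10, 20, 50], dtype=float)
acc = 55 + 8 * np.log(n)

a, b = fit_logarithmic(n, acc)
print(f"logarithmic fit: a={a:.2f}, b={b:.2f}")  # recovers a=55, b=8
```

Comparing the residuals of the two fitted models on held-out corpus sizes is one simple way to decide which growth law better describes a given data set.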
dc.description.sponsorship: CSIR, Meraka Institute, HLT group & North-West University, MuST group
dc.language.iso: en
dc.publisher: Language Resources and Evaluation
dc.subject: Automatic speech recognition
dc.subject: Independent of language and bandwidth
dc.subject: Efficient data selection
dc.subject: Resource-scarce category
dc.subject: Corpus design
dc.title: Efficient data selection for ASR
dc.type: Presentation
dc.contributor.researchID: 21021287 - Barnard, Etienne


Files in this item


There are no files associated with this item.

