The Spoken Web Search task at MediaEval 2012
(IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013)
In this paper, we describe the “Spoken Web Search” Task, which
was held as part of the 2012 MediaEval benchmark evaluation campaign.
The purpose of this task was to perform audio search with audio
input in four languages, ...
The NCHLT Speech Corpus of the South African languages
(Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), 2014)
The NCHLT speech corpus contains wide-band speech from approximately
200 speakers per language, in each of the eleven
official languages of South Africa. We describe the design and
development processes that were ...
Stride and translation invariance in CNNs
(Southern African Conference for Artificial Intelligence Research, 2020)
Convolutional Neural Networks have become the standard for image classification tasks; however, these architectures are not invariant to translations of the input image. This lack of invariance is attributed to the use of ...
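The excerpt is cut off before it names the cause, but the title points to stride. As a toy illustration (not the paper's experiment), the sketch below applies a small 1-D convolution followed by stride-2 subsampling to a single spike and to shifted copies of it. The functions `conv1d_valid`, `shift_right` and `strided_layer`, the kernel and the input are all hypothetical.

```python
# Toy 1-D illustration (hypothetical, not the paper's setup): a convolution
# followed by stride-2 subsampling, applied to a spike and shifted copies.

def conv1d_valid(signal, kernel):
    """'Valid' 1-D cross-correlation, the operation a CNN layer computes."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def shift_right(signal, amount):
    """Shift right by `amount` >= 0 samples, filling the start with zeros."""
    return [0] * amount + signal[:len(signal) - amount]


def strided_layer(signal, kernel, stride):
    """Convolve, then keep every `stride`-th output (a strided convolution)."""
    return conv1d_valid(signal, kernel)[::stride]


x = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # a single spike at index 3
k = [1, 2, 1]                        # a small smoothing kernel

for s in range(3):
    print(s, strided_layer(shift_right(x, s), k, 2))
# 0 [0, 2, 0, 0]
# 1 [0, 1, 1, 0]
# 2 [0, 0, 2, 0]
```

A two-sample shift (a multiple of the stride) moves the output by one position, while a one-sample shift produces a differently shaped output; with stride 1 the same layer simply shifts its output along with the input.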
Tracking translation invariance in CNNs
(Southern African Conference for Artificial Intelligence Research, 2020)
Although Convolutional Neural Networks (CNNs) are widely used, their translation invariance (ability to deal with translated inputs) is still subject to some controversy. We explore this question using translation-sensitivity ...
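The excerpt does not define its translation-sensitivity measure. A minimal sketch of one possible measure, assuming a hypothetical score of one minus the cosine similarity between the globally max-pooled features of an input and of a translated copy (0 means the pooled features did not change). The toy two-kernel feature extractor and every function name here are illustrative assumptions, not the paper's method.

```python
import math

# Hypothetical sensitivity score: 1 - cosine similarity between pooled
# features of an input and of a translated copy (0 means fully invariant).

def conv1d_valid(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def shift_right(signal, amount):
    """Shift right by `amount` >= 0 samples, zero-filling the start."""
    return [0] * amount + signal[:len(signal) - amount]


def pooled_features(signal, kernels, stride=2):
    """Strided convolution per kernel, then global max pooling per channel."""
    return [max(conv1d_valid(signal, k)[::stride]) for k in kernels]


def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def sensitivity_curve(signal, kernels, max_shift):
    """Score for every shift from 0 to `max_shift` samples."""
    base = pooled_features(signal, kernels)
    return [1.0 - cosine_similarity(base, pooled_features(shift_right(signal, s), kernels))
            for s in range(max_shift + 1)]


x = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
kernels = [[1, 2, 1], [1, -1]]
print([round(v, 3) for v in sensitivity_curve(x, kernels, 2)])
# [0.0, 0.293, 0.0]
```

For this toy input the score is zero for shifts of 0 and 2 samples and about 0.29 for a 1-sample shift, so even after global pooling the features are invariant only to shifts that are multiples of the stride.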
Language Independent Search in MediaEval's Spoken Web Search Task
(Elsevier Ltd., 2014)
In this paper, we describe several approaches to language-independent spoken term detection and compare their performance on a common task, namely “Spoken Web Search”. The goal of this part of the MediaEval initiative is ...
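In query-by-example search, a spoken query is compared directly with stretches of audio rather than with text. Dynamic time warping (DTW) is one widely used way to do this comparison; it is shown here as a generic example, not as one of the approaches the paper compares. The sketch assumes each frame is a list of floats (for example, spectral features or posterior vectors) and matches the whole query against a whole segment; a practical detector would typically use a subsequence variant that lets the match start and end anywhere in the utterance.

```python
import math


def euclidean(a, b):
    """Local distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query, utterance):
    """Classic DTW: minimum cumulative frame distance over monotonic
    alignments of the full query with the full utterance."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # utterance frame j reused
                                 cost[i][j - 1],      # query frame i reused
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


# Hypothetical 1-D "features": the second sequence is a slowed-down copy.
print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]]))  # 0.0
print(dtw_distance([[0.0], [1.0]], [[0.0], [2.0]]))  # 1.0
```

The first call returns 0.0 because the warping path absorbs the repeated frame, which is what makes DTW tolerant of different speaking rates.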
Processing spoken lectures in resource-scarce environments
(Pattern Recognition Association of South Africa and Mechatronics International Conference, 2011)
Initial work towards processing Afrikaans spoken
lectures in a resource-scarce environment is presented. Two
approaches to acoustic modeling for eventual alignment are
compared: (a) using a well-trained target-language ...
Analysing co-articulation using frame-based feature trajectories
(Pattern Recognition Association of South Africa and Mechatronics International Conference, 2010)
We investigate several approaches aimed at a more
detailed understanding of co-articulation in spoken utterances.
We find that the Euclidean difference between instantaneous
frame-based feature values and the mean values ...
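The quantity named in the abstract, the Euclidean difference between each frame's feature vector and a mean vector, can be sketched as follows. The excerpt is cut off before saying which mean is used, so the function takes the mean as a parameter; the segment mean in the example and the function names are hypothetical.

```python
import math


def mean_vector(frames):
    """Element-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(frames) for column in zip(*frames)]


def distances_to_mean(frames, mean):
    """Euclidean distance of each frame's feature vector from `mean`."""
    return [math.sqrt(sum((x - m) ** 2 for x, m in zip(frame, mean)))
            for frame in frames]


# Hypothetical 2-dimensional features for a three-frame segment.
segment = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
mu = mean_vector(segment)              # [1.0, 1.0]
print(distances_to_mean(segment, mu))  # approximately [1.414, 0.0, 1.414]
```

Frames near the edges of a segment lie farther from the mean in this toy example, which is the kind of pattern a per-frame distance makes visible.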
G2P variant prediction techniques for ASR and STD
(Interspeech 2013, 2013)
Introducing pronunciation variants into a lexicon is a balancing
act: incorporating necessary variants can improve automatic
speech recognition (ASR) and spoken term detection (STD)
performance by capturing some of the ...
Kullback-Leibler divergence-based ASR training data selection
(Interspeech 2011, 2011)
Data preparation and selection affect systems across a wide range
of complexities. A system built for a resource-rich language
may be so large as to include borrowed languages. A system
built for a resource-scarce language ...
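The excerpt stops before describing the selection criterion, so the following is only a hypothetical sketch of what KL-divergence-based selection can look like: utterances are added greedily so that the phone distribution of the selected set moves as close as possible to a target distribution, measured as KL(target || selected). The target, the smoothing constant, the greedy strategy and all names are assumptions, not the paper's procedure.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same units."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def smoothed_distribution(counts, units, eps=1e-6):
    """Relative frequencies of `units`, with a tiny floor so no entry is zero."""
    total = sum(counts[u] for u in units) + eps * len(units)
    return [(counts[u] + eps) / total for u in units]


def greedy_select(utterances, target, units, budget):
    """Repeatedly add the utterance whose phones bring the selected set's
    distribution closest to `target` in KL(target || selected)."""
    selected, counts = [], Counter()
    remaining = list(range(len(utterances)))
    for _ in range(min(budget, len(utterances))):
        best = min(
            remaining,
            key=lambda i: kl_divergence(
                target, smoothed_distribution(counts + Counter(utterances[i]), units)
            ),
        )
        selected.append(best)
        counts.update(utterances[best])
        remaining.remove(best)
    return selected


# Hypothetical phone-level transcriptions of three candidate utterances.
utts = [["a", "a", "a"], ["b", "b", "b"], ["a", "b"]]
print(greedy_select(utts, target=[0.5, 0.5], units=["a", "b"], budget=3))
# [2, 0, 1]
```

The balanced utterance is chosen first because it matches the uniform target exactly; the smoothing floor keeps the divergence finite while some phones are still unseen.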
Trajectory behaviour at different phonemic context sizes
(Pattern Recognition Association of South Africa and Mechatronics International Conference, 2011)
We propose a piecewise-linear model for the temporal trajectories
of Mel Frequency Cepstral Coefficients during phone transitions.
As with conventional Hidden Markov Models, the parameters of the
model can be estimated ...
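The excerpt is cut off before describing the model's segments or its estimation procedure. As a hypothetical illustration of the general idea only, the sketch below fits two independent least-squares line segments to the frame-by-frame values of a single cepstral coefficient and picks the breakpoint with the lowest total squared error by exhaustive search. The two-segment structure, `min_len` and the function names are assumptions.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line: returns (slope, intercept, squared error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_two_segments(trajectory, min_len=2):
    """Fit two line segments to one coefficient's trajectory, choosing the
    breakpoint that minimises total squared error. Returns None if the
    trajectory is shorter than 2 * min_len frames."""
    xs = list(range(len(trajectory)))
    best = None
    for b in range(min_len, len(trajectory) - min_len + 1):
        left = linear_fit(xs[:b], trajectory[:b])
        right = linear_fit(xs[b:], trajectory[b:])
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {"breakpoint": b, "sse": total,
                    "left": left[:2], "right": right[:2]}
    return best


# Hypothetical frame-by-frame values of one MFCC: steady, then rising.
c1 = [2.0, 2.0, 2.0, 2.0, 5.0, 8.0, 11.0]
print(fit_two_segments(c1))
```

On this toy trajectory the fit recovers a flat segment followed by a segment with slope 3 and zero residual error; a real model would also need to handle noise, more segments and multiple coefficients.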