Cross-bandwidth adaptation for ASR systems
Abstract
Mismatches between application and training data greatly reduce the performance of automatic speech recognition (ASR) systems. However, collecting suitable amounts of in-domain, application-specific training data is resource intensive and may not be feasible in resource-scarce environments. Combining limited amounts of in-domain data with feature normalisation and acoustic model adaptation techniques has therefore found wide use in ASR systems. Various approaches have been proposed, but it is not clear which approach to use for a given amount of adaptation data. In this work we investigate the use of standard feature normalisation and model adaptation techniques for the scenario where adaptation between narrow-band and wide-band environments must be performed. We examine how the performance of the various adaptation techniques depends on the amount of adaptation data by systematically varying that amount and comparing the techniques. From this we establish a guideline that an ASR developer can use to choose the best adaptation technique given a size constraint on the adaptation data. In addition, we investigate the effectiveness of a novel channel normalisation technique and compare its performance with that of standard normalisation and adaptation techniques.