A combination part of speech tagger using selected voting methods
Abstract
The development of language resources is an expensive process, and many languages, including the indigenous languages of South Africa, can be classified as resource-scarce, that is, lacking in tagging resources. Using Afrikaans as a resource-scarce language, this study investigates and applies techniques and methodologies for optimising the use of available resources and improving the accuracy of a tagger, and aims to determine whether combination techniques can be effectively applied to improve the accuracy of a tagger for Afrikaans. To this end, existing methodologies for combining classification algorithms are investigated. Four taggers, trained using MBT, SVMlight, MXPOST and TnT respectively, are then combined into a single combination tagger using weighted voting. The weights are calculated by means of total precision, tag precision, and a combination of precision and recall. Although the combination of taggers does not consistently reduce the error rate relative to the baseline, it achieves an error rate reduction of up to 14.54% in some cases.
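The three weighting schemes named in the abstract can be illustrated with a minimal sketch. It assumes that each tagger's total precision (overall accuracy), per-tag precision and per-tag recall are estimated on a held-out tuning set, and that every tagger casts one weighted vote per token. The helper names `tagger_stats`, `weighted_vote` and `error_rate_reduction` are hypothetical, the tag data in the usage section is invented toy data, and the precision-recall scheme follows one commonly used formulation (vote for one's own tag with its precision, and give every other candidate tag 1 − recall). The paper's exact formulas may differ. The error rate reduction is assumed to be the usual relative definition.

```python
from collections import Counter, defaultdict


def tagger_stats(gold, predicted):
    """Return (total_precision, tag_precision, tag_recall) for one tagger.

    Hypothetical helper: gold and predicted are equal-length tag lists
    for the same held-out tokens.
    """
    hits = Counter(p for g, p in zip(gold, predicted) if g == p)  # correct tags, by tag
    proposed = Counter(predicted)
    actual = Counter(gold)
    total_precision = sum(hits.values()) / len(gold)
    tag_precision = {t: hits[t] / n for t, n in proposed.items()}
    tag_recall = {t: hits[t] / n for t, n in actual.items()}
    return total_precision, tag_precision, tag_recall


def weighted_vote(suggestions, stats, method):
    """Choose one tag for a token from the taggers' suggestions.

    suggestions: one tag per tagger; stats: one tagger_stats() tuple per tagger.
    method: "total_precision", "tag_precision" or "precision_recall".
    """
    scores = defaultdict(float)
    candidates = list(dict.fromkeys(suggestions))  # distinct tags, first-seen order
    for tag, (tot_p, tag_p, tag_r) in zip(suggestions, stats):
        if method == "total_precision":
            # Each vote is weighted by the tagger's overall precision.
            scores[tag] += tot_p
        elif method == "tag_precision":
            # Each vote is weighted by the tagger's precision on the tag it proposes.
            scores[tag] += tag_p.get(tag, 0.0)
        elif method == "precision_recall":
            # Assumed formulation: vote for the own tag with its precision, and add
            # 1 - recall to every other candidate (high recall on a tag the tagger
            # did not propose is evidence against it). Unknown recall adds nothing.
            scores[tag] += tag_p.get(tag, 0.0)
            for other in candidates:
                if other != tag:
                    scores[other] += 1.0 - tag_r.get(other, 1.0)
        else:
            raise ValueError(f"unknown method: {method}")
    # max() keeps the first maximum, so ties go to the earliest suggested tag.
    return max(candidates, key=lambda t: scores[t])


def error_rate_reduction(baseline_accuracy, combined_accuracy):
    """Relative reduction of the error rate of the combination over a baseline."""
    baseline_error = 1.0 - baseline_accuracy
    return (baseline_error - (1.0 - combined_accuracy)) / baseline_error


if __name__ == "__main__":
    gold = ["N", "V", "N", "DET", "N"]  # invented toy tuning data
    stats = [
        tagger_stats(gold, ["N", "V", "N", "DET", "V"]),  # toy tagger A
        tagger_stats(gold, ["V", "V", "N", "N", "N"]),    # toy tagger B
    ]
    for m in ("total_precision", "tag_precision", "precision_recall"):
        print(m, weighted_vote(["V", "N"], stats, m))
```

On this toy data, tagger A (overall precision 0.8) proposes `V` and tagger B (0.6) proposes `N`. Total-precision voting therefore picks `V`, while tag-precision and precision-recall voting pick `N`, because B is more precise on `N` than A is on `V`. This shows how the choice of weights alone can change the combined tagger's output.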
URI
http://hdl.handle.net/10394/34503
https://ieeexplore.ieee.org/abstract/document/9015872
https://doi.org/10.1109/IMITEC45504.2019.9015872