
dc.contributor.author: Van Huyssteen, Gerhardus Beukes
dc.contributor.author: Pilon, Suléne
dc.contributor.author: Puttkammer, Martin Johannes
dc.date.accessioned: 2010-05-11T09:52:34Z
dc.date.available: 2010-05-11T09:52:34Z
dc.date.issued: 2008
dc.identifier.citation: Van Huyssteen, G.B. et al. 2008. Die ontwikkeling van 'n woordafbreker en kompositumanaliseerder vir Afrikaans. Literator: Journal of Literary Criticism, Comparative Linguistics and Literary Studies, 29 (Special Issue 1):21-41. [http://www.literator.org.za/index.php/literator/article/view/99] (en)
dc.identifier.issn: 0258-2279
dc.identifier.uri: http://hdl.handle.net/10394/2981
dc.identifier.uri: http://www.literator.org.za/index.php/literator/article/view/99
dc.description.abstract: This article describes the development of two core technologies for Afrikaans: a hyphenator and a compound analyser. Because no annotated Afrikaans data existed before this project to serve as training data for a machine learning classifier, both technologies were first developed using a rule-based approach. In evaluation, the rule-based hyphenator obtained an f-score of 90.84%, while the rule-based compound analyser reached an f-score of only 78.20%. Since these results were considered disappointing and insufficient for practical implementation, a machine learning technique (memory-based learning) was used instead. Training data for each of the two core technologies was then developed using “TurboAnnotate”, an interface designed to improve the accuracy and speed of manual annotation. The hyphenator developed using machine learning was trained on 39 943 words and reaches an f-score of 98.11%, while the compound analyser reaches an f-score of 90.57% after being trained on 77 589 annotated words. The article concludes that machine learning, specifically memory-based learning, appears to be an appropriate approach for developing core technologies for Afrikaans.
dc.description.uri: http://search.sabinet.co.za/WebZ/Authorize?sessionid=0&next=ej/ej_content_literat.html&bad=error/authofail.html
dc.language: Afrikaans
dc.publisher: Buro vir Wetenskaplike Tydskrifte = Bureau of Scholarly Journals (en)
dc.title: Die ontwikkeling van 'n woordafbreker en kompositumanaliseerder vir Afrikaans
dc.type: Article (en)
dc.contributor.researchID: 10215484 - Van Huyssteen, Gerhardus Beukes
dc.contributor.researchID: 11313099 - Puttkammer, Martin Johannes

