Automatic genre classification of English students' argumentative essays using support vector machines / by Sabrina Raaff

dc.contributor.author Raaff, Sabrina
dc.date.accessioned 2009-05-22T08:28:21Z
dc.date.available 2009-05-22T08:28:21Z
dc.date.issued 2007
dc.identifier.uri http://hdl.handle.net/10394/1810
dc.description Thesis (M.A. (Afrikaans and Dutch))--North-West University, Potchefstroom Campus, 2008.
dc.description.abstract Automatic text classification refers to the classification of texts according to topic. A related task is the automatic classification of texts according to their stylistic aspects, such as automatic genre classification, where texts are classified according to their genre. This is the classification task that concerns this research project. The project examines the genre of the argumentative essay in order to develop a genre classifier, using an automatic genre classification approach, that will categorise prototypical and non-prototypical argumentative essays by student writers as 'good' or 'bad' examples of the genre (binary classification). It is intended that this classifier will allow a senior marker (for example, a lecturer) to give student essays classified as 'good' (those that require less feedback and a smaller volume of expert correction) to junior markers (for example, teaching assistants). This would afford the senior marker time to pay more attention to essays of a 'poorer' quality. The corpus used for the research project comprises 346 argumentative essays drawn from a section of the British Academic Written English corpus and written by L1 English students. The data are composed of counts of linguistic features extracted from the texts. Once these features had been extracted, they were used to create four data sets: a raw data set, composed of raw feature frequencies; a data set composed of the feature set normalised for text length; a data set composed of inverse document frequency counts; and a data set composed of a logarithmic transformation of the feature frequencies. Various classifiers were built from these four data sets using a machine learning approach, in which a classifier is trained on previous examples in order to predict the class of future examples.
The project uses support vector machines as implemented in the STATISTICA Support Vector Machine module (Statsoft, 2006). Support vector machine learning is used because this technique has been shown to perform well in automatic genre classification studies and other classification tasks.
dc.publisher North-West University
dc.title Automatic genre classification of English students' argumentative essays using support vector machines / by Sabrina Raaff en
dc.type Thesis en
dc.description.thesistype Masters
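
Illustrative note: the four data sets described in the abstract (raw frequencies, length-normalised frequencies, inverse document frequency counts, and log-transformed frequencies) can be sketched as below. This is a minimal sketch under stated assumptions, not the thesis's own procedure: the abstract gives no formulas, so the idf weighting (count times log(N/df)) and the log transform (log(1 + count)) are assumed here, and the function name `build_data_sets` and its arguments are hypothetical. The thesis itself used the STATISTICA Support Vector Machine module rather than Python.

```python
import math


def build_data_sets(counts, lengths):
    """Derive four feature matrices from per-essay feature counts.

    counts  -- list of rows, one per essay, each a list of raw feature counts
    lengths -- list of essay lengths in tokens, same order as counts
    (Both names are hypothetical; the formulas below are assumptions.)
    """
    n_docs = len(counts)
    n_feats = len(counts[0])

    # Document frequency: number of essays in which each feature occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_feats)]
    # Assumed idf formula log(N / df); a feature that never occurs gets 0.
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]

    raw = [[float(c) for c in row] for row in counts]
    normalised = [[c / length for c in row]
                  for row, length in zip(counts, lengths)]
    idf_weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    # Assumed log transform log(1 + count), which is defined for zero counts.
    logged = [[math.log1p(c) for c in row] for row in counts]

    return {"raw": raw, "normalised": normalised,
            "idf": idf_weighted, "log": logged}


# Two toy essays with three feature counts each (invented values).
data = build_data_sets([[4, 0, 2], [1, 3, 0]], [100, 50])
```

Each of the four matrices could then be passed, together with the 'good'/'bad' labels, to any support vector machine implementation to train one binary classifier per data set.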

This item appears in the following Collection(s)

  • ETD@PUK [6252]
    This collection contains the original digitized versions of research conducted at the North-West University (Potchefstroom Campus)
