
    Semi-supervised segmentation within a predictive modelling context

    View/Open
    Breed_DG_2017.pdf (4.661Mb)
    Date
    2017
    Author
    Breed, Douw Gerbrand
    Abstract
    Industry standards and best practices on robust model development have been refined over many years. Even though many software tools are available to simplify the process today, developing a practically implementable model for long-term use still involves substantial human input. Consequently, any methodology that improves model accuracy or increases the efficiency with which models can be developed is welcomed by all involved. Segmentation of the data used for predictive modelling is a well-established practice in the industry. Segmentation of subjects (i.e. observations or customers) is defined in this study as the partitioning of the subjects into distinct groups, or subsets, with the aim of developing predictive models on each of the groups separately. The focus of our study is on broadening the available techniques that can be used for statistical segmentation. Currently, two main streams of statistical segmentation exist in the industry, namely unsupervised and supervised segmentation. Both streams make intuitive sense, depending on the application and the requirements of the models developed, and many examples exist where the use of either technique improved model performance. However, each stream focuses on a single aspect (i.e. either target separation or explanatory variable distribution), and combining both aspects might deliver better results in some instances. The primary objective of this research is to develop and define a semi-supervised segmentation algorithm as an alternative to the segmentation algorithms currently in use. This algorithm should allow the user, when segmenting for predictive modelling, to consider not only the explanatory variables (as is the case with unsupervised techniques such as clustering) or only the target variable (as is the case with supervised techniques such as decision trees), but to optimise both simultaneously during the segmentation exercise.
    Once we have defined the semi-supervised segmentation algorithm, which is based on standard k-means clustering, we analyse it comprehensively by applying it in several different ways. We illustrate visually how the algorithm differs from standard k-means clustering and how it is able to overcome some of the known weaknesses of k-means clustering. We apply the algorithm to actual data sets from various industries and compare the results with those of other known segmentation algorithms on the same data sets. A number of popular non-linear modelling techniques are also applied to the data sets, so that their accuracy can be compared with the accuracy obtained with the various segmentation techniques. Simulated data serve to identify a few key data set characteristics that may cause one segmentation technique to outperform another. In addition, we define the data set characteristics that suit the semi-supervised segmentation technique best. Finally, we propose two alternative semi-supervised segmentation techniques and measure how these techniques perform on the industry data sets already analysed. Furthermore, we augment a supervised clustering technique found in the literature and compare its results with all other results obtained.
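
    The abstract does not state the algorithm's objective function, so the following is only a generic illustration of the idea of weighing explanatory-variable distribution against target separation inside a k-means-style loop. It is not the method developed in the thesis. The function name `weighted_kmeans`, the weight `w` and the toy data are all hypothetical: with `w = 0` the sketch reduces to ordinary k-means on the explanatory variables, and with `w = 1` it groups subjects on the target alone.

```python
import random


def weighted_kmeans(X, y, k, w=0.5, iters=50, seed=0):
    """Lloyd-style k-means on (x, y) pairs (illustrative assumption only).

    Cost of assigning a subject to a segment:
        (1 - w) * ||x - centre_x||^2 + w * (y - centre_y)^2
    """
    rng = random.Random(seed)
    start = rng.sample(range(len(X)), k)  # k distinct subjects as initial centres
    centres = [(list(X[i]), float(y[i])) for i in start]
    labels = [0] * len(X)
    dim = len(X[0])
    for _ in range(iters):
        # Assignment step: each subject joins the cheapest segment.
        new_labels = []
        for xi, yi in zip(X, y):
            costs = [
                (1 - w) * sum((a - b) ** 2 for a, b in zip(xi, cx))
                + w * (yi - cy) ** 2
                for cx, cy in centres
            ]
            new_labels.append(costs.index(min(costs)))
        # Update step: recompute each centre as the mean of its members.
        for j in range(k):
            members = [i for i, lab in enumerate(new_labels) if lab == j]
            if not members:
                continue  # an empty segment keeps its previous centre
            cx = [sum(X[i][d] for i in members) / len(members) for d in range(dim)]
            cy = sum(y[i] for i in members) / len(members)
            centres[j] = (cx, cy)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centres


# Toy data: x forms two groups (near 0 and near 10); y forms two different groups.
X = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
y = [0.0, 5.0, 0.1, 5.1, 0.2, 5.2]

by_x, _ = weighted_kmeans(X, y, k=2, w=0.0)  # explanatory variables only
by_y, _ = weighted_kmeans(X, y, k=2, w=1.0)  # target only
```

    Intermediate values of `w` trade the two criteria off against each other. In a predictive-modelling workflow, a separate model would then be fitted to each resulting segment. Note that this sketch needs the target to assign a subject, so scoring new subjects would require an additional assignment rule based on the explanatory variables only.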
    URI
    http://hdl.handle.net/10394/25262
    Collections
    • Economic and Management Sciences [3228]

    Related items

    Showing items related by title, author, creator and subject.

    • The benefits of segmentation: evidence from a South African bank and other studies
      Breed, Douw G.; Verster, Tanja (ASSAf, 2017)
      We applied different modelling techniques to six data sets from different disciplines in the industry, on which predictive models can be developed, to demonstrate the benefit of segmentation in linear predictive modelling. ...

    • Die ervaring van ondersteuning as funksie van supervisie aan maatskaplike werkers in diens van kinderbeskermingsorganisasies
      Van Huyssteen, Cecile (2014)
      In the service of child protection organisations, where services are focused on the protection of children within the preservation of families, social workers are exposed to the adverse conditions of children and families ...

    • Segmentation by age of triathletes participating in Ironman South Africa
      Myburgh, Esmarie; Kruger, Martinette; Saayman, Melville (African Journal of Hospitality, Tourism and Leisure, 2014)
      Triathlon organisers should focus on segmenting participants according to different age groups. Age was found to be a significant factor in commitment, motivational and loyalty levels in reviewed research material. The ...

    Copyright © North-West University