A centralised data management system for the South African mining and industrial sectors
The digital universe is expanding at an exponential rate, and the variety of data is growing with it. This phenomenon, known as big data, is affecting organisations across all industries. By adopting big data, organisations can become more efficient and profitable, and can deliver better products and services. The traditional relational data systems of organisations are, however, inadequate for big data. Furthermore, improper data management may have caused inconsistent and duplicate data, known as data silos. This, coupled with the fact that big data adoption is still in its infancy, created the need for this study. Research in big data is still ongoing, and this study therefore adds to the body of knowledge. The objective of this study is to illustrate the value of big data in order to increase its adoption, specifically in the South African mining and industrial sectors. To this end, research was performed on the design, development, and implementation of a centralised data management system. This system is at its core a big data system aimed at improving the efficiency of organisations. Security was also identified as a research area, which necessitated its inclusion as a system design objective. Through the design and development of the system, a practical framework is provided to assist organisations in employing big data.
The literature study first investigated NoSQL data stores for use in big data systems. It then identified the big data system architectures used in system design. Next, industry experience was sought on how to make a (big data) system's functionality available to users and other systems. From this industry knowledge, microservices and containers were identified and studied. The final part of the literature study evaluated NoSQL software for use in the proposed centralised data management system. This evaluation led to the decision to use MongoDB as the data store in the proposed system.
The architecture of the system consisted of three layers, namely the resource, service, and interface layers. In the resource layer, Mesosphere DC/OS was used to create a cluster, thereby providing computing resources to the other layers. The service layer used MongoDB, Apache Spark, and the Python programming language to provide the various (micro) services of the system. Interaction with the system took place through the interface layer, whose technologies were therefore web service software, namely Apache Zeppelin and a Windows Communication Foundation web service.
The system was successfully implemented at an engineering services company with multiple clients in the South African mining and industrial sectors. Compared with the company's previous system, the system either supported more users or performed faster for the same number of users. For the same number of users, the system achieved a performance increase of at least 24.72%. Most importantly, the system used Transport Layer Security (TLS) 1.2 with user authentication and message integrity. Further validation of the system was provided in a journal article written by the author, which forms part of this dissertation and is given in Appendix 4. The case study demonstrated that implementing big data improves organisational efficiency, while the privacy and security of the organisation's data were ensured. Two further benefits of the system were its support for structured, unstructured, and partially structured data, and its ability to handle the volume of big data. The developed system can be extended to other industries to increase efficiency and productivity, and future organisational big data projects can use it as a starting point.
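As an illustration of the TLS 1.2 requirement mentioned above, the following is a minimal sketch of how a Python client could restrict its connections to TLS 1.2 using only the standard library. It is not taken from the dissertation: the function name `make_tls12_client_context` is hypothetical, and the sketch covers only the transport configuration (server certificate verification and protocol version), not the user authentication used in the actual system.

```python
import ssl


def make_tls12_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context pinned to TLS 1.2 (illustrative only)."""
    # The default client context enables certificate verification
    # and hostname checking.
    context = ssl.create_default_context()
    # Assumption: pin both bounds to TLS 1.2 to mirror the version named in
    # the abstract; a deployment could instead set only the minimum version.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A context built this way can be passed to standard-library clients that accept an `ssl.SSLContext`, so that connections negotiated with any other protocol version are refused.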
- Engineering