New statistical approaches for the assessment of metabolomics data
The primary aim of this PhD study was to develop statistical methods that accommodate the characteristics of metabolomics data and assist in answering the underlying biological questions. However, identifying where a contribution could be made required an understanding of metabolomics data and of the statistical methods applied in practice. This, in turn, required interaction with a metabolomics investigation, and so the novel application and/or combination of existing statistical methods became a secondary aim. A longitudinal, intervention-based metabolomics study with a crossover design was selected for this purpose.
To make the primary aim of this thesis achievable, it was necessary to understand the different approaches to research in statistics. New statistical theory can be developed without reflection on the application of such developments, that is, without considering for whom a new or expanded method may be of use. However, if research in statistics commences separately from an area of application, developments may not cater for the specific requirements of that area. New statistical theory can also be developed to accommodate specific characteristics of data or to answer questions specific to a given area of application or discipline, that is, context-centred statistical research. This thesis first explores the implications of these two diverse approaches from a theoretical perspective. Context-centred statistical research is then explored in greater depth as a transdisciplinary approach in the context of metabolomics. Metabolomics is the study of the interactions between endogenous and exogenous stimuli (such as lifestyle or disease) and the metabolic pathways of a living organism through the metabolites formed. The interactions between statistics and metabolomics are explored next, for the various steps in the knowledge production process, to understand how such transdisciplinary endeavours may be executed.
Metabolomics data are known to: (i) have many times more variables than cases; (ii) exhibit severe multicollinearity; (iii) have unequal sample sizes for experimental groups; (iv) have large proportions of missing values; (v) present with skewed distributions; and (vi) exhibit high levels of natural variation. These characteristics make the statistical analysis of metabolomics data challenging. To illustrate this and to achieve the secondary aim of this thesis, three publications describing the design and analysis of data sets from a longitudinal crossover alcohol intervention study are included. The challenging nature of metabolomics data and the limited number of statistical methods that accommodate such data present many opportunities to develop or expand upon statistical methods. To achieve the primary aim of this thesis, two publications are included to demonstrate how interaction with contextualized data can generate new ideas, culminating in new methods. The thesis culminates in a reflection on the dynamics of transdisciplinary research, concluding that a context-centred approach to research in statistics not only favours the context or the end-users of statistical methods, but can also act as a muse, inspiring innovations in statistics. Finally, the thesis concludes with an outlook on future avenues that may be explored given the work presented here.