Statistics: more than bean counting

Project P1: Determination of the minimum effective dose from high-dimensional expression data with statistical learning methods

In project P1 we will investigate dose-response curves in which the target variable is a molecular variable, for example the expression of a specific gene.

An important goal in the analysis of dose-response curves in toxicology is to determine the minimum concentration at which the target variable exceeds a given threshold. Often, statistical testing approaches are used without fitting dose-response curves: among a number of predetermined, tested concentrations, the smallest one is identified at which the threshold is exceeded, or exceeded with statistical significance. Fitted dose-response curves have the advantage that values between the tested concentrations can be interpolated. For this purpose, parametric models and mixtures of such models are available in the literature (Ritz et al., 2015; cf. also project P2). In this setting, the variance of the estimated minimum effective concentration can be derived from the variance of the parameter estimates.
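The difference between the two approaches can be illustrated with a short Python sketch. It assumes an increasing four-parameter log-logistic curve whose parameters have already been estimated; the parameter values, the tested concentrations, the threshold, and all function names are hypothetical and serve only as an illustration. For simplicity, the testing-style answer is computed from noise-free curve values rather than from replicated measurements and significance tests.

```python
def log_logistic(conc, lower, upper, ed50, slope):
    """Increasing four-parameter log-logistic curve (assumes slope > 0)."""
    if conc <= 0:
        return lower  # the response at concentration 0 is the lower asymptote
    return lower + (upper - lower) / (1.0 + (ed50 / conc) ** slope)


def threshold_concentration(threshold, lower, upper, ed50, slope):
    """Concentration at which the curve reaches the threshold (closed-form inverse)."""
    if not lower < threshold < upper:
        raise ValueError("threshold must lie strictly between the asymptotes")
    return ed50 * ((threshold - lower) / (upper - threshold)) ** (1.0 / slope)


def lowest_tested_exceeding(concs, threshold, **params):
    """Smallest tested concentration whose response exceeds the threshold."""
    hits = [c for c in sorted(concs) if log_logistic(c, **params) > threshold]
    return hits[0] if hits else None


# Hypothetical parameter estimates and experimental design, for illustration only.
params = dict(lower=0.0, upper=2.0, ed50=10.0, slope=2.0)
tested = [0.0, 1.0, 3.0, 10.0, 30.0, 100.0]
threshold = 0.5  # e.g. a fold change on the log2 scale

interpolated = threshold_concentration(threshold, **params)
grid_based = lowest_tested_exceeding(tested, threshold, **params)
print(f"interpolated: {interpolated:.2f}, lowest tested: {grid_based}")
```

With these illustrative values, the interpolated concentration falls between two tested concentrations, whereas the grid-based answer can only be one of the tested concentrations.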

An extension of this procedure consists of collecting genome-wide data and then determining the minimum effective dose from these high-dimensional measurements. In a first approach, for example, genes that share a functional context can be treated as a group, and the minimum effective dose can be determined from the entire set, either as the lowest concentration at which at least one gene exceeds the threshold, or by combining the individual thresholds into a central value. The result, which is highly interpretable biologically, is the lowest concentration at which a certain molecular process is influenced by the treatment. Alternatively, clustering approaches can be used to identify genes with similar curves, and the corresponding thresholds are then combined.
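The two ways of combining per-gene values within a group can be sketched as follows. This is a minimal illustration, not the project's method: the gene names and concentrations are hypothetical, and the median is only one possible choice of a central value.

```python
from statistics import median

# Hypothetical per-gene minimum effective concentrations for one functional group.
# None marks a gene whose curve never exceeds the threshold.
gene_thresholds = {"gene_a": 4.2, "gene_b": 12.0, "gene_c": None, "gene_d": 7.5}


def combine_thresholds(thresholds, how="min"):
    """Combine per-gene concentrations into one value for the whole group."""
    values = [v for v in thresholds.values() if v is not None]
    if not values:
        return None  # no gene in the group responds
    if how == "min":
        # lowest concentration at which at least one gene exceeds its threshold
        return min(values)
    if how == "median":
        # one possible central value (an assumption of this sketch)
        return median(values)
    raise ValueError(f"unknown combination rule: {how}")


group_min = combine_thresholds(gene_thresholds, how="min")
group_median = combine_thresholds(gene_thresholds, how="median")
print(group_min, group_median)
```

The minimum reacts to the single most sensitive gene, while a central value such as the median is less affected by one outlying curve; which rule is preferable depends on the application.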

The calculated critical values can be used for another important application, namely the medically relevant prediction of the therapeutic concentration of a compound. In preliminary work, as part of a master's thesis, we carried out analyses for which several measurements of cell activity as well as gene expression measurements for seven genes were available. We then investigated how these values could be combined for the best possible prediction of the therapeutic concentration, essentially using simple approaches in which the minimum threshold is calculated from the experiments with the seven genes. To evaluate the success of this strategy, quality measures for regression models can be used when the predicted therapeutic concentration is compared with the observed concentration across many compounds. Furthermore, we have developed measures to quantify whether the therapeutic concentration of toxic compounds is not underestimated and whether toxic and non-toxic compounds can be distinguished (Albrecht et al., 2019).
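As a simple example of a regression quality measure, the following sketch computes the root mean squared error between predicted and observed concentrations on the log10 scale. The choice of this particular measure, the log scale, and the numbers are assumptions made for illustration; the toxicity-specific measures from Albrecht et al. (2019) are not reproduced here.

```python
import math


def log10_rmse(predicted, observed):
    """Root mean squared error of log10 concentrations (all values must be > 0)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [
        (math.log10(p) - math.log10(o)) ** 2 for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and observed concentrations for three compounds.
predicted = [1.0, 10.0, 100.0]
observed = [10.0, 10.0, 10.0]

error = log10_rmse(predicted, observed)
print(f"log10 RMSE: {error:.3f}")
```

Working on the log scale means that a prediction that is off by a factor of ten counts the same whether it is too high or too low, which suits concentrations that span several orders of magnitude.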

In this project, we will use statistical learning methods (e.g., support vector machines, regression trees, random forests) to optimize such predictions based on a large number of genes. The individual gene thresholds must be combined in the best possible way. At the same time, as is usual when statistical learning methods are applied to high-dimensional data, overfitting due to the large number of possible combinations must be avoided. Overall, this approach will yield methods for identifying, from the large number of candidates, those genes with which the critical concentration of a compound can be predicted in the best possible way.


  • Albrecht W, Kappenberg F, Brecklinghaus T, Stöber R, …, Rahnenführer J, Hengstler JG (2019). Prediction of human drug-induced liver injury (DILI) in relation to oral doses and blood concentrations. Arch Toxicol 93(6), 1609–1637, doi:10.1007/s00204-019-02492-9.
  • Gu X, Albrecht W, Edlund K, Kappenberg F, Rahnenführer J, Leist M, et al. (2018). Relevance of the incubation period in cytotoxicity testing with primary human hepatocytes. Arch Toxicol 92(12), 3505–3515, doi:10.1007/s00204-018-2302-0.
  • Ritz C, Baty F, Streibig JC, Gerhard D (2015). Dose-response analysis using R. PLOS ONE 10(12), e0146021, doi:10.1371/journal.pone.0146021.