1 Introduction
Many computational models used in business and industry operate on multidimensional data obtained by measurement and often require expensive simulations. In such cases, the kriging method is applied to increase the efficiency of solving modelling, prediction, and optimization problems (Jones, 2001; Forrester and Keane, 2009; Xiao et al., 2018). Owing to the accuracy of its predictions, kriging (Krige, 1951) has become one of the most promising numerical approaches in engineering design, spatial statistics, experimental design, and related fields (Kwon and Choi, 2015; Bhosekar and Ierapetritou, 2018; Carpio et al., 2017). In general, kriging is an interpolation method in which the interpolated values are modelled by a multivariate Gaussian process governed by prior covariances. The basic idea of kriging is to predict the value of a response function at a given point by computing a weighted average of the known values of this function in the neighbourhood of the point. Thus, kriging gives a way of anticipating, with some probability, a result associated with parameter values that have never been met before, using the existing information (e.g. experimental measurements).
The first step in kriging is to create a stochastic model that best describes the set of observed data. The usual assumption is that the sample values are sufficiently homogeneous and that the correlation between two random points depends solely on the distance between them, not on their location. In this way, a homogeneous and isotropic stochastic model for kriging is constructed using correlation functions that must satisfy the positive definiteness conditions of Bochner's theorem (Abrahamsen, 1997). Following this approach, many models in spatial statistics research and applications have been developed using the Whittle-Matern correlation family (Abrahamsen, 1997; Guttorp and Gneiting, 2005), the Gneiting correlation class (Guttorp and Gneiting, 2005), etc. In this paper, the application of Euclidean distance matrices (EDM) to multivariate data modelling and kriging is considered, avoiding the standard construction based on positive definite correlation functions.
EDMs have received increased attention due to their applications in active fields of research, such as molecular conformation in bioinformatics, dimensionality reduction in machine learning and statistics, semidefinite programming, and wireless sensor network localization (Pham et al., 2008; Ghiasi et al., 2018; Koo et al., 2018; Qian et al., 2008). However, the literature has mostly studied matrices of squared distances (Schoenberg, 1935; Gower, 1984; Weinberger et al., 2004), although Euclidean geometry in the real world deals with the square root, i.e. a fractional degree of the distance squares. Fractional degrees of distance squares also appear in applications, yet many fundamental questions about their properties remain unstudied. Therefore, this work focuses on the geometry of Fractional Euclidean Distance Matrices (FEDM) and their application to multidimensional data analysis using kriging, in the context of computer modelling and the solution of practical examples. Sakalauskas (2013) considered a random field with a positive definite correlation function, letting its scale parameter tend to zero while the variance of the field increases without bound, and derived in this way that the limiting kriging method involves inverse FEDMs (see also Remark 1 below). EDMs are usually studied through the Gram matrix of the observed data set, called a kernel matrix (Gower, 1984, etc.). Using the kernel matrix method, Pozniak and Sakalauskas (2017) showed that a FEDM with a strictly fractional degree of distance squares is nonsingular and, consequently, that the derived kriging method is well defined. In this paper, the multinormal data model and the kriging method are developed by taking covariances proportional to the elements of the kernel matrix, thus avoiding the standard construction based on positive definite correlation functions. Once constructed, the kriging model can be used for data modelling, prediction, and optimization. The objective is to discuss and demonstrate the applicability of the FEDM-based kriging technique and to verify the computational efficiency of the developed approach.
2 Geometrical Properties of FEDM
Let us assume that a data set $X=({x_{1}},{x_{2}},\dots ,{x_{K}})$ of K vectors ${x_{i}}\in {\mathtt{R}^{d}}$, $1\leqslant i\leqslant K$, $d\geqslant 1$, is given, at which the values of some response function $Y={({y_{1}},{y_{2}},\dots ,{y_{K}})^{\mathsf{T}}}$ are known, obtained by physical measurement, computer simulation, etc. Denote the $K\times K$ matrix of fractional degrees δ of Euclidean distance squares among pairs of vectors by
\[ A={\big[{\| {x_{i}}-{x_{j}}\| ^{\delta }}\big]_{i,j=1}^{K}},\]
where $\| {x_{i}}-{x_{j}}\| ={({x_{i}}-{x_{j}})^{\mathsf{T}}}\cdot ({x_{i}}-{x_{j}})$, ${x_{i}}\in {\mathtt{R}^{d}}$, $1\leqslant i,j\leqslant K$, $0\leqslant \delta \leqslant 1$. Note that $\delta =1/2$ corresponds to the special case of usual Euclidean distances. The best studied case is that of matrices of Euclidean distance squares, i.e. $\delta =1$ (Schoenberg, 1935; Gower, 1984; Weinberger et al., 2004). The main tool for the study of EDMs is usually the Gram matrix of the data set X, called a kernel matrix. Thus, following this approach, the matrix
(1)
\[ F=-\big(I-E\cdot {s^{\mathsf{T}}}\big)\cdot A\cdot \big(I-s\cdot {E^{\mathsf{T}}}\big)\]
is introduced, called hereinafter a kernel matrix, where I denotes the $K\times K$ identity matrix, E denotes the K-dimensional column vector of ones, and $s\in {\mathtt{R}^{K}}$ is a certain column vector.
Proposition 1.
Let ${x_{i}},{x_{j}}\in X$, ${x_{i}}\ne {x_{j}}$ if $i\ne j$, $1\leqslant i,j\leqslant K$, $0\leqslant \delta <1$, ${s^{\mathsf{T}}}\cdot E=1$. Then kernel matrix (1) is positive semi-definite of rank $K-1$.
The proof is given in Pozniak and Sakalauskas (2017), Theorem 1. Note that, if $\delta =1$, the rank of the kernel matrix can be less than $K-1$.
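As a minimal numerical illustration of Proposition 1 (the helper names fedm and kernel are ours, not the paper's), the following Python sketch builds a FEDM from random data, forms kernel matrix (1) with the centering vector $s=E/K$, and checks positive semi-definiteness and rank $K-1$:

```python
import numpy as np

def fedm(X, delta=0.5):
    """FEDM: entries (||x_i - x_j||)^delta, with ||.|| the squared distance."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return sq ** delta

def kernel(A, s):
    """Kernel matrix (1): F = -(I - E s^T) A (I - s E^T), with s^T E = 1."""
    K = A.shape[0]
    P = np.eye(K) - np.outer(np.ones(K), s)
    return -P @ A @ P.T

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))      # K = 6 distinct points in R^3
A = fedm(X, delta=0.5)
s = np.full(6, 1 / 6)                # centering vector s = E/K
F = kernel(A, s)

print(np.linalg.eigvalsh(F).round(6))   # one ~0 eigenvalue, the rest positive
print(np.linalg.matrix_rank(F))         # K - 1 = 5, as Proposition 1 states
```

Note that the rank is $K-1$ even though the data live in ${\mathtt{R}^{3}}$: for a strictly fractional δ this holds regardless of the dimension d.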
Proposition 2.
Assume A is a FED matrix, ${s_{1}},{s_{2}}\in {\mathtt{R}^{K}}$, ${{s_{1}}^{\mathsf{T}}}\cdot E=1$, ${{s_{2}}^{\mathsf{T}}}\cdot E=1$. Then,
\[ F({s_{2}})=\big(I-E\cdot {{s_{2}}^{\mathsf{T}}}\big)\cdot F({s_{1}})\cdot \big(I-{s_{2}}\cdot {E^{\mathsf{T}}}\big),\]
where
\[ F({s_{i}})=-\big(I-E\cdot {{s_{i}}^{\mathsf{T}}}\big)\cdot A\cdot \big(I-{s_{i}}\cdot {E^{\mathsf{T}}}\big),\hspace{1em}i=1,2.\]
The proposition is implied by the easily verified idempotence-type equality
\[ \big(I-E\cdot {{s_{2}}^{\mathsf{T}}}\big)\cdot \big(I-E\cdot {{s_{1}}^{\mathsf{T}}}\big)=I-E\cdot {{s_{2}}^{\mathsf{T}}},\]
which holds since ${{s_{1}}^{\mathsf{T}}}\cdot E=1$.
Let us consider the geometrical properties of FEDM further.
Proposition 3.
Under the conditions of Proposition 1, an embedded set $Z={({z_{1}},{z_{2}},\dots ,{z_{K}})^{\mathsf{T}}}$ exists such that ${z_{i}}\in {\mathtt{R}^{K-1}}$, $1\leqslant i\leqslant K$, Z is a full rank matrix, and
\[ A=\frac{1}{2}\cdot {\big[{({z_{i}}-{z_{j}})^{\mathsf{T}}}\cdot ({z_{i}}-{z_{j}})\big]_{i,j=1}^{K}}.\]
Proof.
Indeed, according to Proposition 1, one can get $F=\tilde{Z}\cdot {\tilde{Z}^{\mathsf{T}}}$ by factorization of the kernel matrix. Then the embedded set is obtained by taking the first $K-1$ columns of $\tilde{Z}$, whose rank is exactly $K-1$.
Note that (1) implies
\[ \mathit{diag}(F)=2\cdot A\cdot s-\big({s^{\mathsf{T}}}\cdot A\cdot s\big)\cdot E.\]
Next, by means of the latter formula, after simple manipulations one can verify that the proposition is true:
\[\begin{array}{l}\displaystyle \frac{1}{2}\cdot {\big[{({z_{i}}-{z_{j}})^{\mathsf{T}}}\cdot ({z_{i}}-{z_{j}})\big]_{i,j=1}^{K}}\\ {} \displaystyle \hspace{1em}=\frac{{[{{z_{i}}^{\mathsf{T}}}\cdot {z_{i}}]_{i,j=1}^{K}}+{[{{z_{j}}^{\mathsf{T}}}\cdot {z_{j}}]_{i,j=1}^{K}}}{2}-{\big[{{z_{i}}^{\mathsf{T}}}\cdot {z_{j}}\big]_{i,j=1}^{K}}\\ {} \displaystyle \hspace{1em}=\frac{\mathit{diag}(F)\cdot {E^{\mathsf{T}}}+E\cdot \mathit{diag}{(F)^{\mathsf{T}}}}{2}-F\\ {} \displaystyle \hspace{1em}=E\cdot {s^{\mathsf{T}}}\cdot A+A\cdot s\cdot {E^{\mathsf{T}}}-{s^{\mathsf{T}}}\cdot A\cdot s\cdot E\cdot {E^{\mathsf{T}}}+\big(I-E\cdot {s^{\mathsf{T}}}\big)\cdot A\cdot (I-s\cdot {E^{\mathsf{T}}})\\ {} \displaystyle \hspace{1em}=A.\end{array}\]
Hence, the embedded set presents itself as a nonsingular $(K-1)$-dimensional simplex. □
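Continuing the sketch above, Proposition 3 can be checked numerically: factorizing F through its eigendecomposition and dropping the (numerically) zero eigenvalue gives embedded points whose halved squared pairwise distances reproduce A.

```python
# Continues the previous sketch (A, F already computed).
w, V = np.linalg.eigh(F)                          # ascending eigenvalues, w[0] ~ 0
Z = V[:, 1:] * np.sqrt(np.maximum(w[1:], 0.0))    # embedded set: K points in R^(K-1)
D2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
print(np.allclose(0.5 * D2, A))                   # True: A = (1/2)[(z_i-z_j)^T(z_i-z_j)]
```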
Corollary 1.
Let us consider the factorization $F=Z\cdot {Z^{\mathsf{T}}}$ of the kernel matrix F, constructed according to assumptions of Proposition 1, where Z is the embedded set. Assume ${s_{c}^{\mathsf{T}}}\cdot E=1$, ${s_{c}}\in {\mathtt{R}^{K}}$, and denote ${z_{c}}={Z^{\mathsf{T}}}\cdot {s_{c}}$. Then, according to Proposition 2:
\[\begin{aligned}{}F=& -\big(I-E\cdot {{s_{c}}^{\mathsf{T}}}\big)\cdot A\cdot \big(I-{s_{c}}\cdot {E^{\mathsf{T}}}\big)=\big(I-E\cdot {{s_{c}}^{\mathsf{T}}}\big)\cdot F\cdot \big(I-{s_{c}}\cdot {E^{\mathsf{T}}}\big)\\ {} =& \big(I-E\cdot {{s_{c}}^{\mathsf{T}}}\big)\cdot Z\cdot {Z^{\mathsf{T}}}\cdot \big(I-{s_{c}}\cdot {E^{\mathsf{T}}}\big)=\big(Z-E\cdot {{z_{c}}^{\mathsf{T}}}\big)\cdot \big({Z^{\mathsf{T}}}-{z_{c}}\cdot {E^{\mathsf{T}}}\big).\end{aligned}\]
Hence, Corollary 1 shows the geometrical meaning of the vector s: it shifts the origin of coordinates to the point ${z_{c}}$ in the embedded space. Therefore, if ${s^{\mathsf{T}}}\cdot E=1$, it is reasonable to call s a centering vector.
The next corollary follows from (1).
Corollary 2.
If, under the conditions of Proposition 1, the origin in the embedded space is displaced to the point ${z_{1}}$, namely, $s={(1,0,0,\dots ,0)^{\mathsf{T}}}$, then the elements of the kernel matrix are as follows:
\[ {F_{ij}}={\| {x_{i}}-{x_{1}}\| ^{\delta }}+{\| {x_{j}}-{x_{1}}\| ^{\delta }}-{\| {x_{i}}-{x_{j}}\| ^{\delta }},\hspace{1em}1\leqslant i,j\leqslant K.\]
Corollary 3.
Under the conditions of Proposition 1, the centering vector $s=E/K$ minimizes the trace of the kernel matrix.
Proof.
Indeed, after simple manipulations it follows:
\[\begin{aligned}{}tr(F)=& tr\big(\big(Z-E\cdot {{z_{c}}^{\mathsf{T}}}\big)\cdot \big({Z^{\mathsf{T}}}-{z_{c}}\cdot {E^{\mathsf{T}}}\big)\big)\\ {} =& tr\big(\big(I-E\cdot {s^{\mathsf{T}}}\big)\cdot Z\cdot {Z^{\mathsf{T}}}\cdot \big(I-s\cdot {E^{\mathsf{T}}}\big)\big)\\ {} =& -tr\big(\big(I-E\cdot {s^{\mathsf{T}}}\big)\cdot A\cdot \big(I-s\cdot {E^{\mathsf{T}}}\big)\big)=2\cdot {s^{\mathsf{T}}}\cdot A\cdot E-K\cdot {s^{\mathsf{T}}}\cdot A\cdot s.\end{aligned}\]
By differentiating the latter expression with respect to s and equating the obtained derivative to zero, one can verify the truth of the corollary. □
Corollary 4.
If the conditions of Proposition 1 are valid, then ${E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E\ne 0$; besides, the centering vector
\[ s=\frac{{A^{-1}}\cdot E}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}\]
moves the origin of coordinates to the centre of the circumscribed sphere of the embedded simplex.
Proof.
Note that the diagonal of F consists of the squared distances from the vertices of the embedded simplex to the origin of the embedded space. By means of (1), one can easily verify that $\mathit{diag}(F)=2\cdot A\cdot s-({s^{\mathsf{T}}}\cdot A\cdot s)\cdot E$. Since the distances from the vertices to the centre of the circumscribed sphere must all be equal, we seek a centering vector s that obeys the condition $2\cdot A\cdot s-({s^{\mathsf{T}}}\cdot A\cdot s)\cdot E=r\cdot E$, where r is the squared radius of the circumscribed sphere. The latter condition implies that
\[ s=\frac{{s^{\mathsf{T}}}\cdot A\cdot s+r}{2}\cdot {A^{-1}}\cdot E,\]
because the inverse of the matrix A exists, as proved in Pozniak and Sakalauskas (2017), Theorem 2, and therefore can be used correctly. Then, using the normalization condition ${s^{\mathsf{T}}}\cdot E=1$, one can derive
\[ \frac{{s^{\mathsf{T}}}\cdot A\cdot s+r}{2}\cdot {E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E=1.\]
The latter equality means, first, that ${E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E\ne 0$; second, it helps us to establish that
\[ \frac{{s^{\mathsf{T}}}\cdot A\cdot s+r}{2}=\frac{1}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}\]
and, finally,
\[ s=\frac{{A^{-1}}\cdot E}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}.\]
□
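Corollary 4 is also easy to verify in the same session: with $s={A^{-1}}\cdot E/({E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E)$, all diagonal entries of the kernel matrix (the squared distances from the simplex vertices to the origin) coincide.

```python
# Continues the previous sketches (A and kernel() available).
E = np.ones(6)
s_c = np.linalg.solve(A, E)
s_c /= E @ s_c                        # s = A^{-1} E / (E^T A^{-1} E)
F_c = kernel(A, s_c)
print(np.diag(F_c).round(6))          # constant diagonal: the squared circumradius
```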
Let us consider the inverse and determinant of FEDM in more detail. Denote by $({x_{0}},X)$ the set obtained by adding some point ${x_{0}}\in {\mathtt{R}^{d}}$ to the observation set X, and denote the respective vector of fractional degrees of Euclidean distance squares between ${x_{0}}$ and X by
\[ a={\big({\| {x_{1}}-{x_{0}}\| ^{\delta }},{\| {x_{2}}-{x_{0}}\| ^{\delta }},\dots ,{\| {x_{K}}-{x_{0}}\| ^{\delta }}\big)^{\mathsf{T}}}.\]
Now, define the extended kernel matrix, corresponding to the FEDM $\tilde{A}=\left[\begin{array}{c@{\hskip4.0pt}c}0& {a^{\mathsf{T}}}\\ {} a& A\end{array}\right]$ of the set $({x_{0}},X)$, calculated at the centering vector ${\tilde{s}^{\mathsf{T}}}=({s^{\prime }},{s^{\mathsf{T}}})$ and presented in block form:
(2)
\[ \tilde{F}=\left[\begin{array}{c@{\hskip4.0pt}c}\nu \hspace{1em}& {f^{\mathsf{T}}}\\ {} f\hspace{1em}& F\end{array}\right]\equiv -\left[\begin{array}{c@{\hskip4.0pt}c}1-{s^{\prime }}\hspace{1em}& -{s^{\mathsf{T}}}\\ {} -{s^{\prime }}\cdot E\hspace{1em}& S\end{array}\right]\cdot \left[\begin{array}{c@{\hskip4.0pt}c}0\hspace{1em}& {a^{\mathsf{T}}}\\ {} a\hspace{1em}& A\end{array}\right]\cdot \left[\begin{array}{c@{\hskip4.0pt}c}1-{s^{\prime }}\hspace{1em}& -{s^{\prime }}\cdot {E^{\mathsf{T}}}\\ {} -s\hspace{1em}& {S^{\mathsf{T}}}\end{array}\right],\]
where $S=I-E\cdot {s^{\mathsf{T}}}$. The next theorem relates the inverse and determinant of FEDM to those of the kernel matrix.
Theorem 1.
Assume FEDM
\[ \tilde{A}=\left[\begin{array}{c@{\hskip4.0pt}c}0\hspace{1em}& {a^{\mathsf{T}}}\\ {} a\hspace{1em}& A\end{array}\right],\]
where
\[ A={\big[{\big({({x_{i}}-{x_{j}})^{\mathsf{T}}}\cdot ({x_{i}}-{x_{j}})\big)^{\delta }}\big]_{i,j=1}^{K}},\]
the vector of fractional degrees of distances
\[ a={\big({\| {x_{1}}-{x_{0}}\| ^{\delta }},{\| {x_{2}}-{x_{0}}\| ^{\delta }},\dots ,{\| {x_{K}}-{x_{0}}\| ^{\delta }}\big)^{\mathsf{T}}},\]
and the centering vector $({s^{\prime }},s)$ be given, where
\[\begin{array}{l}\displaystyle {x_{i}},{x_{j}}\in {\mathtt{R}^{d}},\hspace{1em}{x_{i}}\ne {x_{j}},\hspace{1em}i\ne j,\hspace{1em}0\leqslant i,j\leqslant K,\hspace{1em}K>1,\\ {} \displaystyle 0\leqslant \delta <1,\hspace{1em}d\geqslant 1,\hspace{1em}s\in {\mathtt{R}^{K}},\hspace{1em}{s^{\prime }}+{E^{\mathsf{T}}}\cdot s=1.\end{array}\]
Then the inverse of the $K\times K$ lower-right submatrix of kernel matrix (2) is as follows:
(7)
\[ {F^{-1}}=M=\frac{{A^{-1}}\cdot E\cdot {E^{\mathsf{T}}}\cdot {A^{-1}}}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}-{A^{-1}}+\frac{q\cdot {q^{\mathsf{T}}}}{D},\]
(8)
\[ q=s+{s^{\prime }}\cdot {A^{-1}}\cdot \bigg(a+E\cdot \frac{1-{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot a}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}\bigg),\]
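The scalar D in (7)–(8) can be taken, consistently with densities (15)–(16) below, as $D={a^{\mathsf{T}}}\cdot {A^{-1}}\cdot a-{(1-{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot a)^{2}}/({E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E)$; this is our reconstruction, stated here as an assumption. Under it, the following sketch (continuing the previous ones) verifies (7)–(8) numerically for the centering vector $({s^{\prime }},s)=(1,0,\dots ,0)$.

```python
# Continues the previous sketches; centering (s', s) = (1, 0, ..., 0).
x0 = rng.standard_normal(3)
a = np.sum((X - x0) ** 2, axis=1) ** 0.5        # distances to x0 (delta = 1/2)
F = np.add.outer(a, a) - A                      # lower-right kernel submatrix
Ainv_E = np.linalg.solve(A, E)
Ainv_a = np.linalg.solve(A, a)
B = E @ Ainv_E
C = 1 - E @ Ainv_a
q = Ainv_a + (C / B) * Ainv_E                   # (8) with s = 0, s' = 1
D = a @ Ainv_a - C ** 2 / B                     # assumed reconstruction of D
M = np.outer(Ainv_E, Ainv_E) / B - np.linalg.inv(A) + np.outer(q, q) / D
print(np.allclose(M, np.linalg.inv(F)))         # True: M = F^{-1} as in (7)
```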
3 Creation of the Gaussian Random Field Using FEDM
The statistical approach to modelling a response function by a random field originates in the works of Kushner (1964), Žilinskas (1985), and Mockus (1989). In this section, a probabilistic model of the response surface (a scalar function) is developed, whose values are collected by observation, physical measurement, computer simulation, etc. Since no data are available beyond the measurements at the experimental points, the surface representing the objective response function can be modelled as a homogeneous Gaussian random field (GRF) $Z(x,\omega )$, which, for each point of the variable space $x\in {\mathtt{R}^{d}}$, is a measurable function of the random event $\omega \in (\Omega ,\Sigma ,P)$ in some probability space (Mockus, 1989; Jones, 2001; Sakalauskas, 2013). To model the observed data in a probabilistic way, one has to define the probability distribution of the response surface values $Z(x,\omega )=(Z({x_{1}},\omega ),Z({x_{2}},\omega ),\dots ,Z({x_{K}},\omega ))$ at the given set of points $X=({x_{1}},{x_{2}},\dots ,{x_{K}})$. Consider this probability distribution with the constant mean vector
(12)
\[ E\big(Z(x,\omega )\big)=\mu \cdot E,\]
and covariance matrix
(13)
\[ E\big(Z(x,\omega )-\mu \cdot E\big)\cdot {\big(Z(x,\omega )-\mu \cdot E\big)^{\mathsf{T}}}={\beta ^{2}}\cdot F,\]
where μ and β are parameters, $\beta >0$, and F is a positive definite matrix. Taking into account the geometrical properties of FEDM explored in the previous section, choose the matrix F as the lower-right submatrix of kernel matrix (2), calculated using some point ${x_{0}}$ and a centering vector $({s^{\prime }},s)$ such that ${s^{\prime }}+{E^{\mathsf{T}}}\cdot s=1$, ${s^{\prime }}\ne 0$. Note that the matrix F is positive definite according to Proposition 1.
Thus, the multivariate Gaussian density is written down as well:
(14)
\[ {p_{{x_{0}},X}}(Y)=\frac{\exp \big[-\frac{1}{2\cdot {\beta ^{2}}}\cdot {(Y-\mu \cdot E)^{\mathsf{T}}}\cdot {F^{-1}}\cdot (Y-\mu \cdot E)\big]}{{(2\pi )^{K/2}}\cdot {\beta ^{K}}\cdot |F{|^{1/2}}}.\]
As one can easily see, this density, using the notation of the previous section and (7)–(11), can be rewritten in an equivalent form:
(15)
\[ {p_{{x_{0}},X}}(Y)=\frac{\exp \big[-\frac{1}{2\cdot {\beta ^{2}}}\cdot {Y^{\mathsf{T}}}(\frac{{A^{-1}}\cdot E\cdot {E^{\mathsf{T}}}\cdot {A^{-1}}}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}-{A^{-1}})\cdot Y-\frac{{({Y^{\mathsf{T}}}\cdot q-\mu )^{2}}}{2\cdot {\beta ^{2}}\cdot D}\big]}{{(2\pi )^{K/2}}\cdot {\beta ^{K}}\cdot {({(-1)^{K+1}}\cdot |A|\cdot D\cdot {E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E)^{1/2}}}.\]
In order to describe the random field by its finite-dimensional (cumulative) distributions, the latter have to obey the consistency conditions of symmetry and compatibility (Abrahamsen, 1997). This is done by the next theorem under an appropriate choice of the centering vector.
Theorem 2.
The random Gaussian field $Z(x,\omega )$ exists in some probability space $(\Omega ,\Sigma ,P)$, having functions (16)
as its finite-dimensional densities of distribution of ${Y_{K}}={({y_{1}},{y_{2}},\dots ,{y_{K}})^{\mathsf{T}}}$, ${y_{i}}=Z({x_{i}},\omega )$, where ${B_{K}}={E_{K}^{\mathsf{T}}}\cdot {A_{K}^{-1}}\cdot {E_{K}}$, ${C_{K}}=1-{E_{K}^{\mathsf{T}}}\cdot {A_{K}^{-1}}\cdot {a_{K}}$, ${D_{K}}={a_{K}^{\mathsf{T}}}\cdot {A_{K}^{-1}}\cdot {a_{K}}$, ${X_{K}}=({x_{1}},{x_{2}},\dots ,{x_{K}})$ is a sequence of pairwise distinct points ${x_{i}}\in {\mathtt{R}^{d}}$, distinct also from ${x_{0}}\in {\mathtt{R}^{d}}$, $1\leqslant i\leqslant K$,
(16)
\[\begin{array}{l}\displaystyle {p_{{x_{0}},{X_{K}}}}({Y_{K}})\\ {} \displaystyle \hspace{1em}=\frac{\exp \big[-\frac{1}{2\cdot {\beta ^{2}}}\cdot {Y_{K}^{\mathsf{T}}}\big(\frac{{A_{K}^{-1}}\cdot {E_{K}}\cdot {E_{K}^{\mathsf{T}}}\cdot {A_{K}^{-1}}}{{B_{K}}}-{A_{K}^{-1}}\big)\cdot {Y_{K}}-\frac{{({Y_{K}^{\mathsf{T}}}\cdot {A_{K}^{-1}}\cdot ({a_{K}}+{E_{K}}\cdot \frac{{C_{K}}}{{B_{K}}})-\mu )^{2}}}{2\cdot {\beta ^{2}}\cdot ({D_{K}}-\frac{{C_{K}^{2}}}{{B_{K}}})}\big]}{{(2\pi )^{K/2}}\cdot {\beta ^{K}}\cdot {({(-1)^{K+1}}\cdot |{A_{K}}|\cdot ({D_{K}}\cdot {B_{K}}-{C_{K}^{2}}))^{1/2}}}\end{array}\]
\[\begin{array}{l}\displaystyle {A_{K}}={\big[{\big(\| {x_{i}}-{x_{j}}\| \big)^{\delta }}\big]_{i,j=1}^{K}},\\ {} \displaystyle {a_{K}}={\big({\| {x_{1}}-{x_{0}}\| ^{\delta }},{\| {x_{2}}-{x_{0}}\| ^{\delta }},\dots ,{\| {x_{K}}-{x_{0}}\| ^{\delta }}\big)^{\mathsf{T}}},\end{array}\]
${E_{K}}={(1,1,\dots ,1)^{\mathsf{T}}}$, ${E_{K}}\in {\mathtt{R}^{K}}$, $K=2,3,\dots $ , μ and β are parameters, $\beta >0$.
Proof.
One can notice that density (16) follows from (14), and, consequently, from (15), by taking F as the lower-right submatrix of kernel matrix (2) calculated with the centering vector $\tilde{s}={(1,0,0,\dots ,0)^{\mathsf{T}}}$. As noted above, densities (14)–(16) are defined correctly due to Proposition 1; indeed, ${\textstyle\int _{Y\in {\mathtt{R}^{K}}}}{p_{{x_{0}},X}}(Y)dY=1$.
Next, the symmetry of distributions (14) and, consequently, that of (15) and (16), is shown in the standard way for multinormal distributions using the symmetry of kernel matrix (see Section 1.4.1, Abrahamsen, 1997).
To prove compatibility, denote by ${F_{K}}$ and ${F_{K+1}}$ the covariance matrices corresponding to ${X_{K}}$ and ${X_{K+1}}$, which are the lower-right submatrices of the respective kernel matrices. One can easily notice that, due to Corollary 2, ${F_{K}}$ is the upper-left submatrix in the decomposition
\[ {F_{K+1}}=\left(\begin{array}{c@{\hskip4.0pt}c}{F_{K}}\hspace{1em}& {f_{K}}\\ {} {f_{K}^{\mathsf{T}}}\hspace{1em}& {\nu _{K}}\end{array}\right).\]
Thus, using the standard block matrix operations, one can make sure that:
\[\begin{aligned}{}{p_{{x_{0}},{X_{K+1}}}}({Y_{K+1}})=& {p_{{x_{0}},{X_{K}}}}({Y_{K}})\exp \bigg[\frac{-{({y_{K+1}}-{E_{{x_{0}},{X_{K}}}}(Z({x_{K+1}},\omega )|{Y_{K}}))^{2}}}{2\cdot {D_{{x_{0}},{X_{K}}}^{2}}(Z({x_{K+1}},\omega )|{Y_{K}})}\bigg]\\ {} & \times {\big(2\pi \cdot {D_{{x_{0}},{X_{K}}}^{2}}\big(Z({x_{K+1}},\omega )\big|{Y_{K}}\big)\big)^{-1/2}},\end{aligned}\]
where
\[ {E_{{x_{0}},{X_{K}}}}\big(Z({x_{K+1}},\omega )\big|{Y_{K}}\big)=\mu +{({Y_{K}}-\mu \cdot {E_{K}})^{\mathsf{T}}}\cdot {F_{K}^{-1}}\cdot {f_{K}}\]
and
\[ {D_{{x_{0}},{X_{K}}}^{2}}\big(Z({x_{K+1}},\omega )\big|{Y_{K}}\big)={\beta ^{2}}\cdot \big({\nu _{K}}-{f_{K}^{\mathsf{T}}}\cdot {F_{K}^{-1}}\cdot {f_{K}}\big)\]
are, respectively, the a posteriori mean and variance of the GRF at the point ${x_{K+1}}$, given ${Y_{K}}$ (Casella and Berger, 2002). Then compatibility follows as well:
\[ {p_{{x_{0}},{X_{K}}}}({Y_{K}})={\int _{-\infty }^{\infty }}{p_{{x_{0}},{X_{K+1}}}}({Y_{K}},y)dy.\]
Thus, according to Kolmogorov's consistency theorem, a GRF exists having (16) as its finite-dimensional distributions (Khoshnevisan, 2002). □
Example 1.
Assume that the observation set lies on a line: ${x_{k}}=a+{t_{k}}\cdot (b-a)$, $a,b\in {\mathtt{R}^{d}}$, ${t_{0}}<{t_{1}}<\cdots <{t_{K}}$. If $\delta =1/2$, then the GRF restricted to this line is distributed as a Wiener process.
Indeed, the FEDM of the considered set on the line is $A=|b-a|\cdot {[|{t_{i}}-{t_{j}}|]_{i,j=1}^{K}}$. One can easily see that, according to Corollary 2, the lower-right submatrix of the kernel matrix, being the covariance matrix of the considered set, is the same as that of the well-known Wiener process:
\[ {F_{ij}}=2\cdot |b-a|\cdot \big(\min ({t_{i}},{t_{j}})-{t_{0}}\big),\hspace{1em}1\leqslant i,j\leqslant K.\]
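A standalone numerical check of Example 1 (the point names a_pt, b_pt and the grid of t values are arbitrary illustrative choices): the kernel submatrix built from distances along the line equals the Wiener-process covariance $2\cdot |b-a|\cdot \min ({t_{i}}-{t_{0}},{t_{j}}-{t_{0}})$.

```python
import numpy as np

a_pt, b_pt = np.array([0.0, 0.0]), np.array([3.0, 4.0])   # |b - a| = 5
t = np.array([0.0, 0.2, 0.5, 0.7, 1.0])                   # t_0 < t_1 < ... < t_K
P = a_pt + t[:, None] * (b_pt - a_pt)                     # points on the line
d = lambda u, v: np.sqrt(np.sum((u - v) ** 2))            # Euclidean distance
K = len(t) - 1
F = np.array([[d(P[i], P[0]) + d(P[j], P[0]) - d(P[i], P[j])
               for j in range(1, K + 1)] for i in range(1, K + 1)])
W = 2 * 5.0 * np.minimum.outer(t[1:] - t[0], t[1:] - t[0])  # Wiener covariance
print(np.allclose(F, W))                                  # True
```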
4 Kriging by FEDM
The stochastic model of kriging should incorporate uncertainty about quantities at unobserved points and quantify the uncertainty associated with the kriging estimator. Kriging gives a way of anticipating, with some probability, a result associated with parameter values that have never been met before: it "stores" the existing information (the experimental measurements) and propagates it to any situation where no measurement has been made. As it is unknown which of the function variables will be preponderant, we consider them as equivalent and thus calculate distances between the measurement points, which are symmetric with respect to the miscellaneous variables. Suppose that we have to predict the value of the response surface y at some point $x\in {\mathtt{R}^{d}}$, if the set $X=({x_{1}},{x_{2}},\dots ,{x_{K}})$ of observed pairwise distinct vectors ${x_{i}}\in {\mathtt{R}^{d}}$, $1\leqslant i\leqslant K$, $K>1$, $d\geqslant 1$, is fixed and the measurements $Y={({y_{1}},{y_{2}},\dots ,{y_{K}})^{\mathsf{T}}}$ of the response surface at the points of X are known. Let the data be modelled by the probabilistic Gaussian model with constant mean (12) and a covariance matrix of type (13). Then, decompose the covariance matrix ${\hat{\beta }^{2}}\cdot \overline{F}$ of the vector $\overline{Y}=(Y,y)$ into the covariance matrix ${\hat{\beta }^{2}}\cdot F$ of the vector Y, the vector ${\hat{\beta }^{2}}\cdot f$ of covariances between Y and y, and the variance of y, denoted by ν:
\[ \overline{F}=\left[\begin{array}{c@{\hskip4.0pt}c}F\hspace{1em}& f\\ {} {f^{\mathsf{T}}}\hspace{1em}& \nu \end{array}\right].\]
According to the standard exposition of kriging (see Jones, 2001), it is defined by the kriging predictor
(18)
\[ y(x)={Y^{\mathsf{T}}}\cdot {F^{-1}}\cdot \bigg(f+E\cdot \frac{(1-{E^{\mathsf{T}}}\cdot {F^{-1}}\cdot f)}{{E^{\mathsf{T}}}\cdot {F^{-1}}\cdot E}\bigg),\]
the a posteriori variance of the predictor
(19)
\[ {\sigma ^{2}}(x)={\hat{\beta }^{2}}\cdot \bigg(\nu -{f^{\mathsf{T}}}\cdot {F^{-1}}\cdot f+\frac{{(1-{E^{\mathsf{T}}}\cdot {F^{-1}}\cdot f)^{2}}}{{E^{\mathsf{T}}}\cdot {F^{-1}}\cdot E}\bigg),\]
and the MLE of the variance parameter
(20)
\[ {\hat{\beta }^{2}}=\frac{1}{K}\cdot {(Y-\hat{\mu }\cdot E)^{\mathsf{T}}}\cdot {F^{-1}}\cdot (Y-\hat{\mu }\cdot E),\hspace{1em}\hat{\mu }=\frac{{E^{\mathsf{T}}}\cdot {F^{-1}}\cdot Y}{{E^{\mathsf{T}}}\cdot {F^{-1}}\cdot E}.\]
Let us consider the kriging method based on the probabilistic data model derived in the previous section.
Theorem 3.
Assume the set $X=({x_{1}},{x_{2}},\dots ,{x_{K}})$ of pairwise distinct points ${x_{i}},{x_{j}}\in {\mathtt{R}^{d}}$, ${x_{i}}\ne {x_{j}}$, $i\ne j$, $1\leqslant i,j\leqslant K$, is given, at which the values $Y={({y_{1}},{y_{2}},\dots ,{y_{K}})^{\mathsf{T}}}$ of some GRF realization are known, namely, ${y_{i}}=Z({x_{i}},\omega )$. Let
\[ A={\big[{\big({({x_{i}}-{x_{j}})^{\mathsf{T}}}\cdot ({x_{i}}-{x_{j}})\big)^{\delta }}\big]_{i,j=1}^{K}}\]
be the FEDM of the vectors ${x_{i}}\in X$, $0\leqslant \delta <1$, and let $a={({\| {x_{1}}-x\| ^{\delta }},{\| {x_{2}}-x\| ^{\delta }},\dots ,{\| {x_{K}}-x\| ^{\delta }})^{\mathsf{T}}}$ be the vector of fractional degrees of distance squares between x and X.
Then the kriging predictor at $x\in {\mathtt{R}^{d}}$ is as follows:
(21)
\[ y(x)={Y^{\mathsf{T}}}\cdot {A^{-1}}\cdot \bigg(a+E\cdot \frac{(1-{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot a)}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}\bigg),\]
its variance
(22)
\[ {\sigma ^{2}}(x)={\hat{\beta }^{2}}\cdot \bigg({a^{\mathsf{T}}}\cdot {A^{-1}}\cdot a-\frac{{(1-{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot a)^{2}}}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}\bigg),\]
where the MLE of the variance parameter is
(23)
\[ {\hat{\beta }^{2}}=\frac{1}{K}\cdot {Y^{\mathsf{T}}}\cdot \bigg(\frac{{A^{-1}}\cdot E\cdot {E^{\mathsf{T}}}\cdot {A^{-1}}}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}-{A^{-1}}\bigg)\cdot Y.\]
The proof is given in the Appendix.
Thus, kriging predictor (21) turns out to be a linear extrapolator $y(x)={Y^{\mathsf{T}}}\cdot u(x)$ with the extrapolation weights
(24)
\[ u(x)={A^{-1}}\cdot \bigg(a+E\cdot \frac{(1-{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot a)}{{E^{\mathsf{T}}}\cdot {A^{-1}}\cdot E}\bigg),\]
and variance (22) serves as a measure of the accuracy of extrapolation. The next property of the kriging extrapolator easily follows from (24): ${E^{\mathsf{T}}}\cdot u(x)=1$.
Besides, it is easy to make sure that kriging predictor (21) coincides with the measured response values at the observed points.
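A standalone sketch of predictor (21) and weights (24); the function name kriging_weights and the synthetic test surface are illustrative assumptions. It checks that the weights sum to one and that the predictor interpolates the observations.

```python
import numpy as np

def kriging_weights(X, x, delta=0.5):
    """Extrapolation weights u(x) of (24); by construction E^T u(x) = 1."""
    E = np.ones(len(X))
    A = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1) ** delta
    a = np.sum((X - x) ** 2, axis=1) ** delta
    Ainv_a = np.linalg.solve(A, a)
    Ainv_E = np.linalg.solve(A, E)
    return Ainv_a + Ainv_E * (1 - E @ Ainv_a) / (E @ Ainv_E)

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(20, 2))            # K = 20 scattered points
Y = np.sin(X[:, 0]) + X[:, 1] ** 2              # synthetic response values
u = kriging_weights(X, np.array([0.3, -0.4]))
print(u.sum())                                   # ~1.0: weights sum to one
print(np.isclose(Y @ kriging_weights(X, X[7]), Y[7]))   # True: interpolation
```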
Remark 1.
The kriging model has also been derived by Sakalauskas (2013) by studying a GRF with a positive definite covariance function of a suitable shape and letting its parameter γ tend to infinity.
Remark 2.
Note that the degree δ is a genuine parameter of the GRF as well, which can be estimated from the observation data. The least-squares estimate $\hat{\delta }$ is obtained by univariate minimization of the variance parameter MLE (23):
(25)
\[ \hat{\delta }=\underset{0\leqslant \delta <1}{\arg \min }\hspace{2.5pt}{\hat{\beta }^{2}}(\delta ).\]
The properties of this estimate are explored by computer simulation in Section 5.
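A sketch of estimate (25) as a univariate grid search, continuing the previous sketch; beta2_mle implements the MLE (23) in the numerically convenient form $\frac{1}{K}[{({E^{\mathsf{T}}}{A^{-1}}Y)^{2}}/({E^{\mathsf{T}}}{A^{-1}}E)-{Y^{\mathsf{T}}}{A^{-1}}Y]$.

```python
# Continues the previous sketch (X, Y available).
def beta2_mle(X, Y, delta):
    K = len(X)
    E = np.ones(K)
    A = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1) ** delta
    Ainv_Y = np.linalg.solve(A, Y)
    Ainv_E = np.linalg.solve(A, E)
    return ((E @ Ainv_Y) ** 2 / (E @ Ainv_E) - Y @ Ainv_Y) / K

deltas = np.linspace(0.05, 0.95, 91)             # grid inside 0 <= delta < 1
delta_hat = deltas[np.argmin([beta2_mle(X, Y, d) for d in deltas])]
print(delta_hat)                                  # least-squares estimate of delta
```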
5 Comparison of the Developed Kriging Extrapolator with the Shepard Extrapolator
A well-known approach to scattered data interpolation is the Shepard method (Shepard, 1968):
\[ {y_{\mathit{Shepard}}}(x)={Y^{\mathsf{T}}}\cdot u(x),\]
where the weights are chosen in the following way:
\[ {u_{i}}(x)=\frac{{\| x-{x_{i}}\| ^{-p}}}{{\textstyle\sum _{j=1}^{K}}{\| x-{x_{j}}\| ^{-p}}},\hspace{1em}p>0.\]
It is easy to see that the Shepard extrapolator also satisfies the condition ${E^{\mathsf{T}}}\cdot u(x)=1$.
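For the comparison below, a minimal Shepard (inverse-distance) extrapolator; the power p = 2 is our assumption, since the exponent used in the experiments is not stated.

```python
# Continues the kriging sketch of Section 4 (X, Y available).
def shepard(X, Y, x, p=2):
    """Shepard (1968) inverse-distance extrapolator; weights sum to one."""
    d2 = np.sum((X - x) ** 2, axis=1)             # squared distances to x
    if np.any(d2 == 0):                           # exact hit on a data point
        return Y[np.argmin(d2)]
    w = d2 ** (-p / 2)                            # distance^(-p)
    return (w / w.sum()) @ Y

print(shepard(X, Y, np.array([0.3, -0.4])))
```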
A set of analytic test functions was chosen to compare the developed kriging predictor with the Shepard extrapolator (Table 1).
At the beginning of the experiments, the behaviour of the system response is generally unknown, so different analytic functions should be considered. In total, six types of functions are considered (Kwon and Choi, 2015). The first test function is the Branin function (a fourth-order polynomial), which shows a dominant second-order trend and an extremely complex, highly nonlinear behaviour. The second test function is the Linear function, composed of polynomials and trigonometric functions, which shows a strong first-order trend. The Rosenbrock function shows both first- and second-order trends.
Each test function is shown in Table 1. The nonlinear Haupt function is composed of trigonometric functions; its results indicate a second-order trend in both independent variables x and y.
Table 1
Types of test functions and test function domains.
Model parameters ${\beta ^{2}}$ and δ have been estimated by the maximum likelihood method (23) and (25). The computer simulation experiment has been performed generating $N=200$ samples of $K=20$ randomly simulated points for each test function in its domain.
In the prediction problem, the value of the Euclidean fraction degree δ could be chosen so as to get the best model results. An elegant alternative is to calculate the optimal δ value according to (25).
Mean and standard deviation of variance parameter ${\beta ^{2}}$ and Euclidean fraction degree δ are given in Table 2, histograms are presented in Figs. 7–18 ($K=20$, $N=200$).
Table 2
Mean and standard deviation of variance parameter ${\beta ^{2}}$ and Euclidean fraction degree δ.
Test function | Mean of ${\beta ^{2}}$ | Std. dev. of ${\beta ^{2}}$ | Mean of δ | Std. dev. of δ |
Branin | 518.398 | 165.630 | 0.800 | 0.012 |
Linear | 0.406 | 0.049 | 0.769 | 0.025 |
Rosenbrock | 5.249 $\cdot {10^{7}}$ | 1.859 $\cdot {10^{7}}$ | 0.697 | 0.026 |
Haupt | 4.254 | 1.113 | 0.240 | 0.074 |
Rastrigin | 87.856 | 26.691 | 0.427 | 0.098 |
Himmelblau | 4.003 $\cdot {10^{5}}$ | 1.357 $\cdot {10^{5}}$ | 0.785 | 0.015 |
Table 3
True error results.
Test function | $TE(y(x))$ | $TE({y_{Shepard}}(x))$ |
Branin | 21.919 | 59.479 |
Linear | 0.123 | 0.877 |
Rosenbrock | 8850 | 13310 |
Haupt | 2.026 | 2.47 |
Rastrigin | 12.432 | 13.425 |
Himmelblau | 1061 | 2149 |
To compare the accuracy of the models, the True Error criterion of Kwon and Choi (2015) is used. The true-error comparison results for $K=20$, $N=200$ are summarized in Table 3; for each test function, the smaller (better) true error is attained by the kriging predictor $y(x)$.
Therefore, the kriging predictor, being the a posteriori expected value of a Gaussian random field, presents itself as an efficient extrapolator of scattered data, and, in turn, the variance of the kriging predictor is an efficient measure of the prediction (extrapolation) error.
6 Optimization of the Filler Effectiveness of Surface Wastewater Treatment Using Mathematical Modelling
Inadequate treatment of surface wastewater may considerably impair water quality. With increasing urbanization, intensification of car traffic, and the growing area of impervious surfaces, the pollution of surface water and its negative impact on the aquatic environment are rising as well. One surface wastewater treatment technology capable of reducing suspended solids, heavy metals, and other pollutants is the surface wastewater treatment filter (Meškauskaitė and Marčiulaitienė, 2016; Meškauskaitė, 2017). The effectiveness of filters filled with construction waste and biocarbon was analysed using the kriging method with distance matrices. The developed method allows modelling filter characteristics with different filler ratios, based on previous experimental studies of the filters.
Let $x={({x_{1}},{x_{2}},{x_{3}},{x_{4}})^{\mathsf{T}}}$ be a vector of filler ratios. Denote the filter characteristics by ${Y_{i}^{j}}$, $1\leqslant i\leqslant m$, $1\leqslant j\leqslant K$, where $K=4$ is the number of experiments and m is the number of filter characteristics describing the capability to remove different pollutants, depending on the filler ratios of the filter.
Filter fillers:
1 – Quartz sand (QS);
2 – Shredded autoclaved aerated concrete (SHAAC, 66.7%) and stone wool (SW, 33.3%);
3 – Shredded autoclaved aerated concrete (33.3%) and biocarbon (BC, 66.7%);
4 – Shredded autoclaved aerated concrete (33.3%), biocarbon (33.3%) and stone wool (33.3%).
These filters with different fillers are designed for the treatment of the main pollutants of surface wastewater: zinc (Zn) and copper (Cu). The experiment matrix of filler proportions was chosen as follows:
\[ X=\left|\begin{array}{c@{\hskip4.0pt}c@{\hskip4.0pt}c@{\hskip4.0pt}c}1\hspace{1em}& 0\hspace{1em}& 0\hspace{1em}& 0\\ {} 0\hspace{1em}& 0.667\hspace{1em}& 0\hspace{1em}& 0.333\\ {} 0\hspace{1em}& 0\hspace{1em}& 0.333\hspace{1em}& 0.667\\ {} 0\hspace{1em}& 0.333\hspace{1em}& 0.333\hspace{1em}& 0.333\end{array}\right|.\]
After the experiment, the filtration characteristics have been presented by the measurement matrix $Y={({Y^{1}},{Y^{2}})^{\mathsf{T}}}$ (see Table 4).
Table 4
Filtration characteristics measurements.
Filters | Zn (${Y^{1}}$) | Cu (${Y^{2}}$) |
1 | 94.7 | 58.5 |
2 | 57.2 | 15.2 |
3 | 77.1 | 20.5 |
4 | 81.1 | 28.8 |
Extrapolation of the filter data, based on the experiments, has been performed by the kriging method with distance matrices (20), (21), where the FEDM is
\[ A={\big[{\big({({x_{i,1}}-{x_{j,1}})^{2}}+{({x_{i,2}}-{x_{j,2}})^{2}}+{({x_{i,3}}-{x_{j,3}})^{2}}+{({x_{i,4}}-{x_{j,4}})^{2}}\big)^{1/2}}\big]_{i,j=1}^{K}}\]
and the vector of distances from the selected extrapolation point x to the experimental points for $i=1,2,3,4$ is
\[ {\tau _{i}}(x)={\big({({X_{i,1}}-{x_{1}})^{2}}+{({X_{i,2}}-{x_{2}})^{2}}+{({X_{i,3}}-{x_{3}})^{2}}+{({X_{i,4}}-{x_{4}})^{2}}\big)^{1/2}}.\]
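To make the computation reproducible, a standalone worked sketch of this extrapolation ($\delta =1/2$, weights (24)); the query mixture x below is an illustrative assumption, not one of the study's experiments.

```python
import numpy as np

X = np.array([[1.0, 0.0,   0.0,   0.0  ],
              [0.0, 0.667, 0.0,   0.333],
              [0.0, 0.0,   0.333, 0.667],
              [0.0, 0.333, 0.333, 0.333]])          # experiment matrix
Y = np.array([[94.7, 58.5],                          # columns: Zn, Cu removal (%)
              [57.2, 15.2],
              [77.1, 20.5],
              [81.1, 28.8]])
E = np.ones(4)
A = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))  # delta = 1/2
x = np.array([0.25, 0.25, 0.25, 0.25])               # hypothetical filler mixture
tau = np.sqrt(np.sum((X - x) ** 2, axis=1))          # distances tau_i(x)
Ainv_tau = np.linalg.solve(A, tau)
Ainv_E = np.linalg.solve(A, E)
u = Ainv_tau + Ainv_E * (1 - E @ Ainv_tau) / (E @ Ainv_E)   # weights (24)
print(dict(zip(("Zn", "Cu"), np.round(u @ Y, 1))))   # predicted efficiencies
```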
Visualization of the cleaning efficiency (in %) over the filler proportions (with $BC=1-QS-SHAAC-SW$) using the proposed kriging approach is given in Figs. 19–22.
7 Conclusions
Thus, an approach to multidimensional data modelling has been created using the geometrical properties of FEDM studied in this paper. It has been shown that the factorization of the kernel matrix of a FEDM enables us to create an embedded set that forms a nonsingular simplex. Using the properties of FEDM, a Gaussian random field (GRF) is constructed without the positive definite correlation functions usually applied for this purpose. The created GRF can be considered a multidimensional analogue of the Wiener process; for instance, line realizations of this GRF are exactly Wiener processes. Next, the multidimensional data kriging method is constructed, distinguished by the properties of homogeneity and isotropy. The developed model allows us to represent the information obtained from any number of measurements of the objective function by a computational code or physical experiment. The resulting model is rather simple and depends on a small set of parameters (mean, variance, and the degree δ of the FEDM), which are efficiently estimated by the ML method. The results of scattered data processing by this approach, considered for analytically computed surfaces and for the mathematical modelling of a wastewater filter design process, illustrate the applicability of GRF with FEDM as models for scattered multidimensional data kriging.