1. Introduction
Number theory plays a significant role in information theory, providing fundamentals for many applications. Number-theoretic methods aim at deriving a point set uniformly scattered to replace the Monte Carlo (MC) method in the numerical evaluation of multiple integrals. Many methods have been proposed, including Korobov (1959) [
1], Hua and Wang (1981) [
2], Niederreiter (1992) [
3], and Hlawka (1962) [
4], to obtain a point set scattered as uniformly as possible over the unit hypercube
, called number-theoretic net (NT-net). This net has been widely utilized in experimental designs, high-dimensional integration, statistical inference, and computer experiments. Computer experiments involving a large number of runs and/or factors have become increasingly prevalent in various fields, driven by rapid developments in computational techniques. A key goal of these experiments is to approximate the true underlying system with a simpler surrogate metamodel. An adequate design of experiments can significantly reduce the bias between the metamodel and the true model. Space-filling designs are the most popular type of designs for computer experiments. A space-filling design method aims to uniformly scatter the design points across the experimental domain, similar to the generation of NT-net. To measure the space-filling properties of a point set or design, many criteria are used, including the maximin distance (Jonhson et al., 1990 [
5]) and mixture discrepancy (Zhou et al., 2013 [
6]). The Kullback–Leibler (KL) divergence (Kullback and Leibler 1951 [
7]) is suitable for measuring the divergence between the distribution of a point set and the uniform distribution. KL divergence is widely used in many applications of information theory, and it can be used to measure space-filling properties.
There are many estimation methods used for computing the KL divergence or relative entropy between two continuous multivariate distributions based on samples. The KL divergence between a distribution and a uniform distribution can be simplified to the problem of estimation of entropy. Let
denote the KL divergence between two multivariate densities:
and
. For an
s-variate uniform distribution over a unit hypercube, denoted as
, the
s-variate density
for
. Therefore,
where
represents the entropy of the continuous random variable
. Joe (1989) [
8] provided the estimator of functionals of a multivariate density, including the entropy function based on a kernel density estimate, such that an empirical distribution and numerical integration are not necessary. Hence, in our case, Joe’s estimator of the KL divergence between the distribution of a point set over a unit hypercube
and the uniform distribution is given by the following:
where
is the kernel density estimate of
with bandwidth
, and
in which
k is an
s-variate density. Joe (1989) [
8] provided some suggested choices of bandwidths and kernels for
. In this paper, we chose the normal kernel and Scott’s rule for bandwidth (Scott 2015 [
9]), in which
, and sometimes
h is fixed with specifications in the tables of this paper.
The
-distance of an
design
D is defined by
,
, where
as the
-distance of any two rows
and
in
D. Remark that the definition of the
-distance between two rows of a design does not necessarily involve taking the
power as in the Euclidean distance between two points and the design can be defined on
or another finite field. The
-distance of a design, also known as the separation distance, is the minimal distance between pairs of design points. The goal of a maximin distance design is to maximize this separation distance. On the other hand, discrepancy, which measures uniformity, should be minimized to achieve a uniform design. Discrepancy has been widely used in number-theoretic methods to evaluate the uniformity of a set of points. Within the context of reproducing kernel Hilbert spaces, Hickernell (1998) [
10] proposed generalized
-discrepancies including centered
-discrepancy (CD) and wrap-around
-discrepancy (WD). Zhou et al. (2013) [
6] proposed mixture discrepancy (MD), and indicated that MD can overcome the shortcomings of CD and WD. For a point set in the domain of a unit hypercube
that can be expressed as a matrix
, the square value of MD is defined by the following:
To evaluate the discrepancy of a design
in which the levels are labeled by integers, the elements can be linearly transformed to
by
. However, the computational complexities of MD and
-distance result in significant time consumption when
n or
s is too large. Instead, to evaluate the uniformity of the point set
, this paper adopts the Frobenius distance (FD); see Horn and Johnson (2012) [
11]. A natural uniform condition on
is that the mean vector is
with a covariance matrix close to
, where
is the identity matrix in
. The uniformity of
can be measured by the Frobenius distance between the two matrices, the covariance matrix of
denoted by
and
, as follows:
Since the calculation of FD in this paper involves point sets with large
n, it is not necessary to transform the design
D into
for computational convenience.
The good lattice point (GLP) set proposed by Korobov (1959) [
1] is the most popularly used method for generating an NT-net. Given a generating vector
satisfying
for
, and the greatest common divisor
, so that each column is a permutation of
, the design
with
is called the lattice point set of the generating vector
, where the multiplication operation modulo
n is modified such that the result falls within the range
. Without loss of generality,
is always set to be 1. If
D has the minimal discrepancy among all possible generating vectors, it is called a good lattice point set. Hua and Wang (1981) [
2] indicated that GLP sets tend to have a lower discrepancy. Searching for a GLP in large
n and
s requires long calculations. Saltykov (1963) [
12] provided tables of suggested generating vectors for GLP sets for
and
155,093 according to Korobov’s method (Korobov 1959 [
1]). Saltykov’s tables have been widely utilized in numerical analyses. Hua and Wang (1981) [
2] further extended these tables to include more values of
n and
s. Fang and Wang (1994) [
13] attached the extended tables as appendices, and further developed the applications of GLP sets in statistics, including experimental designs and representative points. However, the space-filling properties of GLP sets can be further improved so that the tables of suggested generating vectors of GLP sets can be updated.
To study the space-filling properties of GLP sets, Zhou and Xu (2015) [
14] treated the GLP set as a special class of regular designs. Therefore, the theorem by Zhou and Xu (2015) [
14] stating that a linear-level permutation of a regular design does not decrease the
-distance also applies to GLP sets. The linear-level permuting design is defined by the following:
with a given permutation vector
, and
. Qi et al. (2018) [
15] defined the best design under a certain space-filling criterion among the linear level permuting designs of a GLP set as the generalized good lattice point (GGLP) set. They noted that a specific type of permuting vector
often results in better
-distances. With a given acceptable number of searches, Qi et al. (2018) [
15] considered
vectors of this special type and then conducted a random search for all possible
vectors, fixing the first element
as 0. Searching for the best permuting vector is a combinatorial optimization problem with
possibilities. When
n and
s are large, a simple random search is not efficient; instead, a stochastic optimization algorithm is adequate. The threshold accepting (TA) algorithm proposed by Dueck and Scheuer (1990) [
16], as a variation of the simulated annealing algorithm, has solved many NP-hard optimization problems. It has been successfully applied in constructing uniform designs, as noted by Fang et al. (2000) [
17]. There are many adjustments for TA settings, among which, we consider the version by Fang et al. (2017) [
18].
To update the tables of generating vectors for GLP sets, this paper considers the best special permuting vector from
searches as a benchmark and then searches for the optimal
in the form of
using the TA algorithm. In
Section 2, we provide a brief review of GGLP, including related basic theoretical evidence according to Zhou and Xu (2015) [
14] and Qi et al. (2018) [
15]. The details of conducting the TA algorithm to search for GGLP sets are elaborated in
Section 3. With the given initial GLP sets provided in work by Fang and Wang (1994) [
13], the resulting GGLP sets, searched under the Frobenius distance, are evaluated by mixture discrepancy, maximin distance, and KL divergence in
Section 4. According to the KL divergence, the distributions of GGLP sets and the initial GLP sets are very close, as the KL divergence is nearly the same when
n is large. For small
n, most GGLP sets exhibit lower KL divergence. The mixture discrepancy of GGLP sets can be improved for all
n. Moreover, we also compare the performances in several case studies, including representative points and computer experiments. For a computer experiment, we generate training data according to pairs of GLP and GGLP sets from the Wood function, and predict a randomly generated testing set with the Kriging modeling technique. As for the application of representative points, we utilize the Box–Muller method to transform the points generated with Monte Carlo, GLP, and GGLP methods, respectively, into quasi-Monte Carlo points of bivariate mixture normal distribution, and use them in bootstrap resampling to approximate parameters of the distribution. The performances are evaluated by the mean squared error. In many cases, GGLP sets outperform GLP sets.
2. Review on Generalized Good Lattice Point Sets
Zhou and Xu (2015) [
14] specified that a regular
design
D with
q levels for each factor can be constructed by
, where
M is the
full factorial design over the finite ring
and
G is a generating matrix. A regular
design defined in such a way has
independent columns spanning an
-dimensional linear space over the ring
and
r-dependent columns specified in
G. For any row,
, the
-Lee weight is defined as
, implying that
, where
. Similar to the
-distance of
D, the
-Lee weight of a design
D is defined as
. Zhou and Xu (2015) [
14] have proved the following Lemmas 1 and 2.
Lemma 1. The -distance of a regular design D over is .
Lemma 2. For a regular design D over , a linear level permutation does not decrease the -distance, i.e., where is as defined in (
4).
A GLP set can be considered a type of regular design featuring an independent column
and the generating matrix
. The only difference between GLP sets and regular designs is the replacement of 0 with
n in GLP sets. Zhou and Xu (2015) [
14] indicated that the special regular design, i.e.,
as well as the corresponding GLP set
D derived by replacing the first row
of
with row
, have the same
-distance.
Proposition 1. The regular design defined in (
5)
and the resulting good lattice point set D have the same -distance, and according to Lemma 1, Considering a GLP set as a special case of regular designs, Qi et al. (2018) [
15] provided the following corollary:
Corollary 1. Given an good lattice point set D, the linear level permutation of D does not decrease the -distance, i.e., , where is defined as (
4).
Zhou and Xu (2015) [
14] also provided the upper bound of
-distance for any
p. For a given initial GLP set, the linear level permuting design with the maximal
-distance or with the minimal MD is called the generalized good lattice point set (Qi et al., 2018 [
15]). When exploring the best permuting vector
, it is sufficient to permute the
dependent column while keeping the first independent column unchanged, allowing the first element of
to be fixed as 0. However, Qi et al., 2018 [
15] found that the special permuting vector
increases the
-distance for
in most cases. Hence, when limiting the number of searches to
K times with a given initial GLP set, they first considered the
special
vector and then conducted a random search for
of the form
. We incorporate the threshold-accepting algorithm for searching the best permuting vector by taking the best special
as a benchmark. Since the tables of GLPs in Fang and Wang (1994) [
13] refer to large
n and
s, the time consumption of the MD or
-distance is not admissible. Instead, when conducting the TA algorithm, this paper considers the Frobenius distance defined in (
3). When
n is large, the KL divergence of GGLP sets is almost the same as the initial GLP sets. This indicates that the distributions of GLP and GGLP sets are very close. Hence, the KL divergence defined in (
1) is used only for comparing the results and is not considered a criterion of the TA algorithm.