Article

A Quantile General Index Derived from the Maximum Entropy Principle

Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
Submission received: 30 August 2022 / Revised: 3 October 2022 / Accepted: 6 October 2022 / Published: 8 October 2022
(This article belongs to the Special Issue Entropy in Data Analysis II)

Abstract

We propose a method for linearly separating multivariate quantitative data in such a way that the average of each variable in the positive group is larger than that in the negative group. Here, the coefficients of the separating hyperplane are restricted to be positive. Our method is derived from the maximum entropy principle. The resulting composite score is called the quantile general index. The method is applied to the problem of determining the top 10 countries in the world based on the 17 scores of the Sustainable Development Goals (SDGs).

1. Introduction

Consider a data matrix in which each row corresponds to a case and each column represents a variable. Suppose that, for every variable, a larger value is better. For example, Ref. [1] investigated the efforts of countries to attain the SDGs (Sustainable Development Goals) and reported 17 SDG scores for each country, each ranging from 0 to 100. The report ranked 163 countries by the average of the 17 scores. We call such a ranking procedure the simple sum method.
However, the simple sum method sometimes exhibits a paradoxical phenomenon: the average of a particular variable in the higher-score group is lower than that in the remaining group. Table 1 illustrates this for the SDG data, which we separate into two groups: the 10 top countries according to the simple sum method and the remaining 153 countries. The average of each variable is compared between the two groups. For almost all variables, the 10 top countries have larger averages than the remaining countries, as expected. However, there are reversal relations for SDGs 12 and 13: on these two goals, the 10 top countries have lower averages than the remaining countries.
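The comparison behind Table 1 can be reproduced mechanically. The sketch below is only an illustration, assuming the data are held in a NumPy array with one row per country and one column per SDG; the helper names `top_k_split_means` and `reversed_variables` are hypothetical. It ranks cases by the simple sum, splits off the top k, and lists the variables whose top-group average is lower. Applied to the two rows of averages printed in Table 1, it flags SDGs 12 and 13.

```python
import numpy as np


def top_k_split_means(X, k):
    """Split rows of X into the top k by simple sum and the rest; return both mean vectors."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(-X.sum(axis=1), kind="stable")  # descending simple-sum ranking
    top, rest = X[order[:k]], X[order[k:]]
    return top.mean(axis=0), rest.mean(axis=0)


def reversed_variables(mean_top, mean_rest):
    """1-based indices of variables whose top-group average is below the rest."""
    diff = np.asarray(mean_top, dtype=float) - np.asarray(mean_rest, dtype=float)
    return [int(i) + 1 for i in np.flatnonzero(diff < 0)]


# Averages of the 17 SDG scores as printed in Table 1 (10 top vs. 153 remaining countries).
mean_top = [99.6, 68.5, 94.1, 98.2, 84.8, 89.4, 83.8, 85.0, 91.8,
            92.3, 92.8, 60.3, 54.7, 71.4, 80.4, 87.8, 71.8]
mean_rest = [73.8, 58.6, 68.0, 74.9, 60.1, 66.2, 64.8, 66.3, 43.2,
             59.7, 68.8, 85.6, 81.9, 64.3, 64.8, 65.2, 58.5]
print(reversed_variables(mean_top, mean_rest))  # [12, 13]

# Toy usage of the split on a small hypothetical matrix (rows = cases, columns = variables).
toy = [[9, 1], [8, 2], [1, 5], [0, 4]]
print(top_k_split_means(toy, 2))
```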
In this paper, we propose a linear weighting method that avoids the reversal relation (in a random-decision sense). The higher-score group separated by the linear weight has greater averages than the remaining group for every variable. The method is inspired by the objective general index (OGI; [2]), which is constructed to have a positive correlation with all the variables. The OGI is designed for ranking rather than separation. The OGI can be interpreted as the minimizer of a free energy functional [3,4], which is the sum of the negative entropy and an internal energy functional. This interpretation also carries over to the current setting; see Section 2.
The problem of determining weights is unsupervised in the sense that neither the correct weights nor the correct classification is known; such problems have long been discussed (e.g., [5,6]). There are many weighting methods for this purpose. Among them, principal component analysis (PCA) is widely used. PCA, however, does not always give positive weights, so modifications are necessary. It is known that nonnegative principal component analysis is a nonconvex and NP-hard optimization problem [7]. Another approach is factor analysis, where a factor model refers to a set of multivariate distributions that share common latent factors (e.g., [8]). Although factor analysis is quite flexible, it requires additional assumptions such as variance–covariance structures and often does not have a unique solution. In contrast, the quantile general index we propose reduces to a convex optimization problem and is essentially unique, as we will demonstrate. The Hirsch index (or h-index) is widely used to evaluate an individual's scientific research output [9], and its further applications have recently been investigated in [10]. We numerically compare our method with the h-index in Section 5.
The name of the quantile general index comes from quantile regression, developed in [11]. Indeed, the objective function we use is similar to that of quantile regression; see the explicit form in Section 3. The essential difference is that our problem is unsupervised, whereas regression problems are supervised.
The general indices determine an ordering of the data. The problem of ordering multivariate data was discussed in [12], where ordering methods were classified into four categories: marginal ordering, reduced ordering, partial ordering, and conditional ordering. Our method can be regarded as marginal ordering of the weighted sum.
The paper is organized as follows. In Section 2, we define the quantile general index for continuous distributions and show that it is characterized by the maximum entropy principle. In Section 3, a finite-sample counterpart of the quantile general index is derived. In Section 4, a practical method that avoids the ambiguity of data lying on the separating hyperplane is proposed. We apply the method to the SDG data in Section 5, and we conclude in Section 6.

2. Quantile General Index for Continuous Distributions

The quantile general index for continuous probability distributions is defined first. The assumption of continuity avoids the difficulty caused by the non-smoothness of the objective function. The sample counterpart of the index is constructed in the subsequent section.
Suppose that we have a random vector $x = (x_1, \ldots, x_d)^\top$ following a probability distribution $P$ on $\mathbb{R}^d$, where $\top$ denotes the vector transpose. We assume that $P$ has a probability density function $p(x)$, so that $P(A) = \int_A p(x)\,dx$ for an event $A \subset \mathbb{R}^d$. For a given $h: \mathbb{R}^d \to \mathbb{R}$, we denote the expectation of the random variable $h(x)$ by $E[h(x)] = \int p(x)h(x)\,dx$ and the conditional expectation of $h(x)$ given an event $A$ by
$$E[h(x) \mid A] = \frac{\int_A p(x)h(x)\,dx}{\int_A p(x)\,dx}.$$
We deal with a class of general indices
$$g(x) = g(x; w, c) = \sum_{i=1}^d w_i x_i - c \tag{1}$$
of $x$, where $w = (w_1, \ldots, w_d)^\top \in \mathbb{R}_+^d$ and $c \in \mathbb{R}$ are called the weight vector and the threshold, respectively. Here, $\mathbb{R}_+$ denotes the set of positive numbers. The quantities $w$ and $c$ may depend on the underlying distribution $P$ but do not depend on $x$ itself.
For a given $g$ of the form (1), the half-spaces separated by the hyperplane $g(x) = 0$ are denoted by
$$H_g^+ = \{x \mid g(x) > 0\} \quad \text{and} \quad H_g^- = \{x \mid g(x) < 0\}.$$
The quantile general index is defined as follows.
Definition 1.
Let $0 < \alpha < 1$. A general index $g(x) = \sum_i w_i x_i - c$ is called the quantile general index of $x$ if it satisfies the following two equations:
$$P(H_g^+) = \alpha \tag{2}$$
and
$$E[w_i x_i \mid H_g^+] - E[w_i x_i \mid H_g^-] = 1, \quad i = 1, \ldots, d. \tag{3}$$
The weight $w$ is called the optimal weight.
Let us call $H_g^+$ and $H_g^-$ the positive and negative groups, respectively. Equation (2) means that the fraction of the positive group is $\alpha$. The threshold $c$ is the upper $\alpha$-quantile of the weighted sum $w^\top x$ because $P(w^\top x > c) = \alpha$ by (2). We call $\alpha$ the acceptance ratio. Since $w_i > 0$, Equation (3) implies that the average of each variable $x_i$ on the positive group is greater than that on the negative group; in fact, the difference is exactly $1/w_i$. Therefore, the reversal relation observed in Table 1 does not occur if we adopt the quantile general index.
We now state the existence and uniqueness theorem for the quantile general index. For $0 < \alpha < 1$, we define the “check” loss function $\ell_\alpha: \mathbb{R} \to \mathbb{R}$ by
$$\ell_\alpha(u) = \frac{u_-}{1-\alpha} + \frac{u_+}{\alpha}, \tag{4}$$
where $u_+ = \max(u, 0)$ and $u_- = \max(-u, 0)$ are the positive and negative parts of $u$, respectively. See Figure 1 for the graph of $\ell_\alpha$. Up to a constant factor, $\ell_\alpha$ is the loss used in quantile regression [13]. The derivative of $\ell_\alpha(u)$ for $u \neq 0$ is
$$\ell_\alpha'(u) = -\frac{1}{1-\alpha} I\{u < 0\} + \frac{1}{\alpha} I\{u > 0\},$$
where $I\{u > 0\}$ is 1 if $u > 0$ and 0 otherwise. The subgradient (e.g., [14]) at $u = 0$ can also be defined but is not used here.
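A minimal NumPy sketch of (4) and its derivative follows; the function names are hypothetical, and returning NaN at the kink $u = 0$ is a choice of this sketch, since the text leaves the derivative undefined there. The last line checks that, under the usual quantile-regression convention $\rho_\tau(u) = u(\tau - I\{u<0\})$, one has $\ell_\alpha = \rho_{1-\alpha}/(\alpha(1-\alpha))$, so the check loss (4) targets the upper $\alpha$-quantile.

```python
import numpy as np


def check_loss(u, alpha):
    """Check loss (4): u_+/alpha + u_-/(1 - alpha), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return np.maximum(u, 0.0) / alpha + np.maximum(-u, 0.0) / (1.0 - alpha)


def check_loss_derivative(u, alpha):
    """Derivative of the check loss for u != 0; NaN at the kink u = 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, 1.0 / alpha, np.where(u < 0, -1.0 / (1.0 - alpha), np.nan))


def koenker_rho(u, tau):
    """Standard quantile-regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


alpha = 0.3
u = np.linspace(-2.0, 2.0, 9)
print(check_loss(u, alpha))
print(check_loss_derivative(u, alpha))
# ell_alpha is a rescaled rho_{1 - alpha}: ell_alpha = rho_{1-alpha} / (alpha * (1 - alpha)).
print(np.allclose(check_loss(u, alpha), koenker_rho(u, 1 - alpha) / (alpha * (1 - alpha))))
```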
We define a convex function $F: \mathbb{R}_+^d \times \mathbb{R} \to \mathbb{R}$ by
$$F(w, c) = -\sum_{i=1}^d \log w_i + E\left[\ell_\alpha\left(\sum_{j=1}^d w_j x_j - c\right)\right]. \tag{5}$$
The main theorem is stated as follows.
Theorem 1.
Let $x = (x_1, \ldots, x_d)^\top$ be a random vector with a probability density function on $\mathbb{R}^d$, and assume that $E[x_i]$ exists for each $i$. Let $0 < \alpha < 1$. Then, the function $F$ in (5) admits a minimizer $(w, c) \in \mathbb{R}_+^d \times \mathbb{R}$. The optimal $w$ is unique, whereas $c$ may not be unique. Furthermore, the general index $g(x) = \sum_i w_i x_i - c$ based on the minimizer $(w, c)$ of $F$ satisfies conditions (2) and (3) of the quantile general index.
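Proof.
The proof of existence and uniqueness is given in Appendix A. We prove that the stationarity conditions of $F$ are given by (2) and (3). The partial derivatives of $F$ with respect to $c$ and $w_i$ are
$$\frac{\partial F}{\partial c} = -E[\ell_\alpha'(g(x))] = \frac{1}{1-\alpha}P(H_g^-) - \frac{1}{\alpha}P(H_g^+)$$
and
$$\frac{\partial F}{\partial w_i} = -\frac{1}{w_i} + E[x_i \ell_\alpha'(g(x))] = -\frac{1}{w_i} - \frac{1}{1-\alpha}E[x_i I\{g(x) < 0\}] + \frac{1}{\alpha}E[x_i I\{g(x) > 0\}].$$
Note that $P(H_g^-) = 1 - P(H_g^+)$, since $P(g(x) = 0) = 0$ by the assumption that $x$ has a continuous distribution. Hence, $\partial F/\partial c = 0$ is equivalent to (2). Under (2), we have $E[x_i I\{g(x) > 0\}]/\alpha = E[x_i \mid H_g^+]$ and $E[x_i I\{g(x) < 0\}]/(1-\alpha) = E[x_i \mid H_g^-]$, so that $\partial F/\partial w_i = 0$ $(i = 1, \ldots, d)$ is equivalent to (3) after multiplication by $w_i$. □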
Example 1.
Let $x_1$ and $x_2$ be independent and identically distributed according to a continuous distribution. By the uniqueness of the optimal weight and symmetry, we have $w_1 = w_2$ $(= w)$. We denote the upper $\alpha$-quantile of $x_1 + x_2$ by $y_\alpha$. Then, we have $c/w = y_\alpha$ from (2) and
$$w = \frac{1}{E[x_1 \mid x_1 + x_2 > y_\alpha] - E[x_1 \mid x_1 + x_2 < y_\alpha]}$$
from (3). For example, if each $x_i$ has the standard normal distribution and $\alpha = 1/2$, then $y_\alpha = 0$, so $c = 0$; moreover, $E[x_1 \mid x_1 + x_2 > 0] = -E[x_1 \mid x_1 + x_2 < 0] = 1/\sqrt{\pi}$, so $w = \sqrt{\pi}/2$.
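A quick Monte Carlo check of the normal case in Example 1, assuming NumPy; the sample size and seed are arbitrary choices, and `example1_weight_mc` is a hypothetical helper. It estimates $y_\alpha$ from the simulated sums and plugs the two conditional means into the displayed formula for $w$; for $\alpha = 1/2$ the estimate should land near $\sqrt{\pi}/2 \approx 0.886$, with $c$ near 0.

```python
import math

import numpy as np


def example1_weight_mc(alpha=0.5, n=400_000, seed=0):
    """Monte Carlo estimate of (w, c) in Example 1 for i.i.d. N(0, 1) variables x1, x2."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 2))
    s = x[:, 0] + x[:, 1]
    y_alpha = np.quantile(s, 1.0 - alpha)  # upper alpha-quantile of x1 + x2
    gap = x[s > y_alpha, 0].mean() - x[s < y_alpha, 0].mean()
    w = 1.0 / gap        # from condition (3)
    c = w * y_alpha      # from condition (2): c / w = y_alpha
    return w, c


w_hat, c_hat = example1_weight_mc()
print(w_hat, math.sqrt(math.pi) / 2)  # both close to 0.886
print(c_hat)                          # close to 0
```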
The quantile general index can be derived from the maximum entropy principle in line with [4]. The entropy of a density function $p$ is defined by
$$S(p) = -\int p(x)\log p(x)\,dx.$$
Consider a class of transformations $T: \mathbb{R}^d \to \mathbb{R}^d$ of the form
$$T(x) = (w_1 x_1 - c_1, \ldots, w_d x_d - c_d), \quad (w_i, c_i) \in \mathbb{R}_+ \times \mathbb{R}.$$
The push-forward density of $p$ by $T$ is defined by
$$(T_\sharp p)(x) = p(T^{-1}(x))\,\bigl|\det \nabla T^{-1}(x)\bigr| = p\left(\frac{x_1 + c_1}{w_1}, \ldots, \frac{x_d + c_d}{w_d}\right)\frac{1}{w_1 \cdots w_d}.$$
This is the density of $T(x)$ when the random vector $x$ follows the distribution $P$. It can be shown that the entropy of the push-forward density is
$$S(T_\sharp p) = S(p) + \sum_{i=1}^d \log w_i.$$
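The entropy identity follows from the change of variables $y = T(x)$, whose Jacobian determinant is $w_1 \cdots w_d$, so that $(T_\sharp p)(T(x)) = p(x)/(w_1 \cdots w_d)$. A short derivation in the notation above:

$$\begin{aligned}
S(T_\sharp p) &= -\int (T_\sharp p)(y)\log (T_\sharp p)(y)\,dy \\
&= -\int p(x)\log\frac{p(x)}{w_1\cdots w_d}\,dx \qquad (y = T(x),\ dy = w_1\cdots w_d\,dx) \\
&= -\int p(x)\log p(x)\,dx + \sum_{i=1}^d \log w_i \int p(x)\,dx \\
&= S(p) + \sum_{i=1}^d \log w_i.
\end{aligned}$$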
We also define an internal energy by
$$U(p) = \int p(x)\,\ell_\alpha\left(\sum_{i=1}^d x_i\right)dx,$$
where $\ell_\alpha$ is the check loss function in (4). The following theorem characterizes the quantile general index in terms of entropy. The proof is straightforward.
Theorem 2.
The minimization problem of (5) is equivalent to
$$\text{Minimize } U(T_\sharp p) - S(T_\sharp p) \quad \text{subject to } T(x) = (w_1 x_1 - c_1, \ldots, w_d x_d - c_d),\ (w_1, c_1), \ldots, (w_d, c_d) \in \mathbb{R}_+ \times \mathbb{R}.$$
The threshold $c$ in (5) is given by $c = \sum_{i=1}^d c_i$.
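The straightforward step behind Theorem 2 can be written out explicitly. Using the push-forward identity $\int (T_\sharp p)(y)\varphi(y)\,dy = E[\varphi(T(x))]$ for the internal energy, together with the entropy identity $S(T_\sharp p) = S(p) + \sum_i \log w_i$, one obtains

$$\begin{aligned}
U(T_\sharp p) - S(T_\sharp p) &= E\left[\ell_\alpha\left(\sum_{i=1}^d (w_i x_i - c_i)\right)\right] - S(p) - \sum_{i=1}^d \log w_i \\
&= -\sum_{i=1}^d \log w_i + E\left[\ell_\alpha\left(\sum_{i=1}^d w_i x_i - c\right)\right] - S(p) \qquad \left(c = \sum_{i=1}^d c_i\right) \\
&= F(w, c) - S(p).
\end{aligned}$$

Since $S(p)$ does not depend on $(w_i, c_i)$, minimizing the free energy $U - S$ over $T$ is the same as minimizing $F$ in (5); only the sum of the $c_i$ is identified.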

3. Quantile General Index for Finite Samples

The quantile general index defined in the preceding section is valid only for continuous distributions. It is useful to define the index for finite samples as well. Let $x^{(1)}, \ldots, x^{(n)} \in \mathbb{R}^d$ be a sample of size $n$. We denote the $i$-th coordinate of $x^{(t)}$ by $x_{ti}$. We deal with a class of general indices $g_t = \sum_{i=1}^d w_i x_{ti} - c$, where $(w, c) \in \mathbb{R}_+^d \times \mathbb{R}$ may depend on the whole sample $\{x^{(t)}\}_{t=1}^n$ but does not depend on $t$.
The empirical counterpart of the objective function (5) is
$$F(w, c) = -\sum_{i=1}^d \log w_i + \frac{1}{n}\sum_{t=1}^n \ell_\alpha\left(\sum_{j=1}^d w_j x_{tj} - c\right) \tag{6}$$
for $(w, c) \in \mathbb{R}_+^d \times \mathbb{R}$.
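A direct NumPy transcription of (6), useful for checking hand calculations; `empirical_objective` is a hypothetical helper name, and returning infinity for nonpositive weights is a convention of this sketch. Evaluated at the solution of Example 2 below, it gives $2 + \log(9/8)$, and nearby weights do no better.

```python
import numpy as np


def check_loss(u, alpha):
    """Check loss (4): u_+/alpha + u_-/(1 - alpha), elementwise."""
    u = np.asarray(u, dtype=float)
    return np.maximum(u, 0.0) / alpha + np.maximum(-u, 0.0) / (1.0 - alpha)


def empirical_objective(w, c, X, alpha):
    """Empirical objective (6): -sum_i log w_i + mean_t ell_alpha(w^T x^(t) - c)."""
    w = np.asarray(w, dtype=float)
    if np.any(w <= 0):
        return np.inf  # F is defined only for positive weights
    g = np.asarray(X, dtype=float) @ w - c  # general index g_t for every case
    return -np.sum(np.log(w)) + np.mean(check_loss(g, alpha))


# Data of Example 2 (rows are cases x^(t)).
X = np.array([[2.0, 2.0], [2.0, 1.0], [0.0, 2.0], [0.0, 0.0]])
F_opt = empirical_objective([2 / 3, 4 / 3], 8 / 3, X, 0.5)
print(F_opt, 2 + np.log(9 / 8))  # the two values agree
```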
Definition 2.
A general index $g_t = \sum_i w_i x_{ti} - c$ of $x^{(t)}$ for $t = 1, \ldots, n$ is called the quantile general index if $(w, c)$ minimizes the function (6).
Remark 1.
As mentioned in Section 1, the objective function (6) is similar to that of quantile regression, defined by
$$\frac{1}{n}\sum_{t=1}^n \ell_\alpha\left(y_t - \sum_{j=1}^d w_j x_{tj}\right),$$
where $y_t$ is a response variable and $w_1, \ldots, w_d$ are regression coefficients. See [13] for a comprehensive study of quantile regression.
The following theorem is proved in a similar way to Theorem 1. See Appendix A.
Theorem 3.
Suppose that no hyperplane of $\mathbb{R}^d$ contains all $x^{(t)}$. Then, the objective function $F$ in (6) admits a minimizer $(w, c)$. The weight vector $w$ is unique. The threshold $c$ is unique if $n\alpha$ is not an integer.
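For a fixed $w$, the $c$-part of (6) is a one-dimensional check-loss problem whose minimizers are upper $\alpha$-quantiles of the weighted sums. The sketch below uses hypothetical values $y_t = w^\top x^{(t)} = 1, \ldots, 10$ to illustrate the uniqueness statement for $c$ in Theorem 3 and the observation in Appendix A that an optimal $c$ can be found among the data points; `mean_check_loss` is a hypothetical helper.

```python
import numpy as np


def mean_check_loss(c, y, alpha):
    """(1/n) sum_t ell_alpha(y_t - c) for fixed weighted sums y_t = w^T x^(t)."""
    u = np.asarray(y, dtype=float) - c
    return np.mean(np.maximum(u, 0.0) / alpha + np.maximum(-u, 0.0) / (1.0 - alpha))


y = np.arange(1.0, 11.0)  # hypothetical weighted sums, n = 10

# n * alpha = 2.5 is not an integer: the minimizer over the data points is unique.
alpha = 0.25
values = [mean_check_loss(c, y, alpha) for c in y]
c_star = float(y[int(np.argmin(values))])
print(c_star, np.mean(y > c_star))  # 8.0 0.2

# n * alpha = 3 is an integer: the objective is flat between two data points.
alpha = 0.3
print([round(float(mean_check_loss(c, y, alpha)), 12) for c in (7.0, 7.5, 8.0)])  # three equal values
```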
Each case $x^{(t)}$ is classified into the positive or negative group according to $g_t > 0$ or $g_t < 0$, respectively. If no case satisfies $g_t = 0$, then the fraction of the positive (resp. negative) group is $\alpha$ (resp. $1 - \alpha$), and the average of $x_{ti}$ over the positive group is greater than that over the negative group for every $i$. This is the desired dominance relation.
However, it is not always possible to classify the data into positive and negative groups, because g t may become 0 in some cases. Furthermore, the minimization of F ( w , c ) is not straightforward, since the function is not differentiable. In order to avoid these issues, we modify the method in Section 4.
For illustration, we calculate the quantile general index for the following examples.
Example 2.
Consider the bivariate data
$$x^{(1)} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad x^{(3)} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \quad x^{(4)} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
of sample size 4. Let the acceptance ratio be $\alpha = 1/2$. No three of these points lie on a straight line; in particular, the four points are not contained in a single line, so the quantile general index exists by Theorem 3. We show that the solution is $w_1 = 2/3$, $w_2 = 4/3$, and $c = 8/3$. We consider three disjoint subsets of $\mathbb{R}_+^2$:
$$A = \{w \mid w_2 < 2w_1\}, \quad B = \{w \mid w_2 > 2w_1\}, \quad C = \{w \mid w_2 = 2w_1\}.$$
Let $w \in A$. Then, we have
$$w^\top x^{(1)} > w^\top x^{(2)} > w^\top x^{(3)} > w^\top x^{(4)}.$$
Hence, the optimal $c$ lies between $w^\top x^{(3)}$ and $w^\top x^{(2)}$, since $c$ is the upper $1/2$-quantile of $\{w^\top x^{(t)}\}$. For such $c$, the objective function (6) becomes
$$F(w_1, w_2, c) = -\log w_1 - \log w_2 + \frac{1}{4}\left(\frac{w^\top x^{(1)} - c}{1/2} + \frac{w^\top x^{(2)} - c}{1/2} - \frac{w^\top x^{(3)} - c}{1/2} - \frac{w^\top x^{(4)} - c}{1/2}\right) = (-\log w_1 + 2w_1) + (-\log w_2 + w_2/2).$$
If $F$ were minimized at some $w \in A$, then the stationarity condition would give $w_1 = 1/2$ and $w_2 = 2$, but this point does not belong to $A$ because $w_2 = 2 > 1 = 2w_1$. Hence, the optimal point does not exist in $A$.
If $w \in B$, then we have
$$w^\top x^{(1)} > w^\top x^{(3)} > w^\top x^{(2)} > w^\top x^{(4)},$$
and the objective function is
$$F(w_1, w_2, c) = (-\log w_1) + (-\log w_2 + 3w_2/2),$$
where $w^\top x^{(2)} \le c \le w^\top x^{(3)}$. Again, the optimal point does not exist in $B$, because the term $-\log w_1$ has no stationary point.
Therefore, the optimal point should be located in $C$, the boundary between $A$ and $B$. The objective function is
$$F(w_1, 2w_1, c) = -\log 2 - 2\log w_1 + 3w_1,$$
where $c = w^\top x^{(2)} = w^\top x^{(3)} = 4w_1$. The optimal solution is $w_1 = 2/3$, $w_2 = 4/3$, and $c = 8/3$. The quantile general index is given by
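$$\begin{pmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{pmatrix} = \begin{pmatrix} 2 & 2 & -1 \\ 2 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} 2/3 \\ 4/3 \\ 8/3 \end{pmatrix} = \begin{pmatrix} 4/3 \\ 0 \\ 0 \\ -8/3 \end{pmatrix}.$$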
The index does not provide a separation of the data because $g_2 = g_3 = 0$. In this case, however, the group $\{x^{(1)}, x^{(2)}\}$ dominates $\{x^{(3)}, x^{(4)}\}$ in the sense that the difference of averages
$$\frac{1}{2}(x^{(1)} + x^{(2)}) - \frac{1}{2}(x^{(3)} + x^{(4)}) = \begin{pmatrix} 2 \\ 1/2 \end{pmatrix}$$
is a positive vector.
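If we set the acceptance ratio to $\alpha = 1/4$, then it can be shown in a similar way that the optimal $w$ is $w_1 = 3/4$ and $w_2 = 1$. In this case, $c$ is not unique: $5/2 \le c \le 7/2$, which is consistent with Theorem 3 since $n\alpha = 1$ is an integer. The quantile general index is
$$\begin{pmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{pmatrix} = \begin{pmatrix} 2 & 2 & -1 \\ 2 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} 3/4 \\ 1 \\ c \end{pmatrix} = \begin{pmatrix} 7/2 - c \\ 5/2 - c \\ 2 - c \\ -c \end{pmatrix}.$$
Therefore, $g_1 > 0$ and $g_2, g_3, g_4 < 0$ as long as $5/2 < c < 7/2$. The separation provides a dominance relation:
$$x^{(1)} - \frac{1}{3}(x^{(2)} + x^{(3)} + x^{(4)}) = \begin{pmatrix} 4/3 \\ 1 \end{pmatrix}.$$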
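The matrix products and dominance vectors of Example 2 can be checked numerically. In this short NumPy sketch, the helper `general_index` is a hypothetical name, and $c = 3$ is an arbitrary choice inside the admissible interval for $\alpha = 1/4$.

```python
import numpy as np

# Example 2 data; each row is a case x^(t).
X = np.array([[2.0, 2.0], [2.0, 1.0], [0.0, 2.0], [0.0, 0.0]])


def general_index(X, w, c):
    """g_t = w^T x^(t) - c for every case (rows of X)."""
    return X @ np.asarray(w, dtype=float) - c


# alpha = 1/2: w = (2/3, 4/3), c = 8/3 gives g = (4/3, 0, 0, -8/3).
g_half = general_index(X, [2 / 3, 4 / 3], 8 / 3)
dominance_half = X[:2].mean(axis=0) - X[2:].mean(axis=0)  # (2, 1/2)

# alpha = 1/4: w = (3/4, 1); any c in (5/2, 7/2) separates x^(1) from the rest.
g_quarter = general_index(X, [3 / 4, 1.0], 3.0)  # c = 3 chosen inside the interval
dominance_quarter = X[0] - X[1:].mean(axis=0)     # (4/3, 1)

print(g_half, dominance_half)
print(g_quarter, dominance_quarter)
```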
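Example 3.
Consider the bivariate data
$$x^{(1)} = \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \quad x^{(3)} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \quad x^{(4)} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}$$
of sample size 4. Let $\alpha = 1/2$. In a similar manner to the preceding example, the optimal parameters are shown to be $w = (1, 1)^\top$ and $c = 4$. The quantile general index is
$$\begin{pmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{pmatrix} = \begin{pmatrix} 4 & 0 & -1 \\ 2 & 4 & -1 \\ 1 & 3 & -1 \\ 0 & 2 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 0 \\ -2 \end{pmatrix}.$$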
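In this case, no separation of the sample into two groups of size two provides a dominance relation. Indeed, the differences of averages for all splits with $x^{(1)}$ in the first group are
$$\frac{1}{2}(x^{(1)} + x^{(2)}) - \frac{1}{2}(x^{(3)} + x^{(4)}) = \begin{pmatrix} 5/2 \\ -1/2 \end{pmatrix}, \quad \frac{1}{2}(x^{(1)} + x^{(3)}) - \frac{1}{2}(x^{(2)} + x^{(4)}) = \begin{pmatrix} 3/2 \\ -3/2 \end{pmatrix}, \quad \frac{1}{2}(x^{(1)} + x^{(4)}) - \frac{1}{2}(x^{(2)} + x^{(3)}) = \begin{pmatrix} 1/2 \\ -5/2 \end{pmatrix},$$
none of which is positive; the remaining splits give the negatives of these vectors, which are not positive either.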
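The enumeration used in Example 3 (and noted in Section 6 as computationally expensive) can be written as a brute-force search. This is a sketch rather than an efficient algorithm, and `dominating_splits` is a hypothetical helper that returns 0-based case indices. It confirms that Example 2 has exactly one dominating split of size two and Example 3 has none.

```python
from itertools import combinations

import numpy as np


def dominating_splits(X, k):
    """All k-subsets whose mean strictly exceeds the complement's mean in every variable.

    Brute force over C(n, k) subsets, so only practical for small n.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    found = []
    for top in combinations(range(n), k):
        rest = [t for t in range(n) if t not in top]
        diff = X[list(top)].mean(axis=0) - X[rest].mean(axis=0)
        if np.all(diff > 0):
            found.append(top)
    return found


X2 = [[2, 2], [2, 1], [0, 2], [0, 0]]  # Example 2
X3 = [[4, 0], [2, 4], [1, 3], [0, 2]]  # Example 3
print(dominating_splits(X2, 2))  # [(0, 1)], i.e. {x(1), x(2)}
print(dominating_splits(X3, 2))  # []
```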

4. Practical Implementation

The quantile general index defined in the preceding section has the following two drawbacks.
  • The minimization is not straightforward since F is not differentiable.
  • The cases with $g_t = 0$ are not assigned to the positive or negative group.
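To overcome these issues, we approximate $F$ by
$$F_\varepsilon(w, c) = -\sum_{i=1}^d \log w_i + \frac{1}{n}\sum_{t=1}^n \ell_{\alpha,\varepsilon}\left(\sum_{i=1}^d x_{ti} w_i - c\right), \tag{8}$$
where $\varepsilon$ is a positive constant, and the function $\ell_{\alpha,\varepsilon}: \mathbb{R} \to \mathbb{R}$ is defined by
$$\ell_{\alpha,\varepsilon}(u) = \min_{z \in \mathbb{R}}\left\{\ell_\alpha(z) + \frac{1}{2\varepsilon}|z - u|^2\right\} = \begin{cases} u/\alpha - \varepsilon/(2\alpha^2) & \text{if } u \ge \varepsilon/\alpha, \\ u^2/(2\varepsilon) & \text{if } -\varepsilon/(1-\alpha) < u < \varepsilon/\alpha, \\ -u/(1-\alpha) - \varepsilon/(2(1-\alpha)^2) & \text{if } u \le -\varepsilon/(1-\alpha). \end{cases}$$
The function $\ell_{\alpha,\varepsilon}$ is called the Moreau envelope of $\ell_\alpha$. See Figure 2 for the graph of $\ell_{\alpha,\varepsilon}$. It can be shown that $\ell_{\alpha,\varepsilon}$ converges uniformly to $\ell_\alpha$ as $\varepsilon \to 0$.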
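The derivative of $\ell_{\alpha,\varepsilon}$ is piecewise linear:
$$\ell_{\alpha,\varepsilon}'(u) = \begin{cases} 1/\alpha & \text{if } u \ge \varepsilon/\alpha, \\ u/\varepsilon & \text{if } -\varepsilon/(1-\alpha) < u < \varepsilon/\alpha, \\ -1/(1-\alpha) & \text{if } u \le -\varepsilon/(1-\alpha). \end{cases}$$
In particular, $\ell_{\alpha,\varepsilon}$ is continuously differentiable, unlike $\ell_\alpha$.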
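A NumPy sketch of the closed form of $\ell_{\alpha,\varepsilon}$ and its derivative follows, with a brute-force grid minimization of the defining problem as a sanity check. The grid range and resolution are arbitrary assumptions of this sketch, the function names are hypothetical, and $\alpha = 0.3$, $\varepsilon = 0.2$ are the values used in Figure 2. Writing the derivative as $u/\varepsilon$ clipped to $[-1/(1-\alpha), 1/\alpha]$ is an equivalent compact form of the piecewise formula above.

```python
import numpy as np


def moreau_check_loss(u, alpha, eps):
    """Closed form of the Moreau envelope ell_{alpha,eps} of the check loss."""
    u = np.asarray(u, dtype=float)
    upper, lower = eps / alpha, -eps / (1.0 - alpha)
    return np.where(
        u >= upper,
        u / alpha - eps / (2.0 * alpha**2),
        np.where(u <= lower, -u / (1.0 - alpha) - eps / (2.0 * (1.0 - alpha) ** 2),
                 u**2 / (2.0 * eps)),
    )


def moreau_check_loss_derivative(u, alpha, eps):
    """Derivative of ell_{alpha,eps}: u / eps clipped to [-1/(1-alpha), 1/alpha]."""
    u = np.asarray(u, dtype=float)
    return np.clip(u / eps, -1.0 / (1.0 - alpha), 1.0 / alpha)


def moreau_by_grid(u, alpha, eps, z=np.linspace(-5.0, 5.0, 200_001)):
    """Brute-force check: min over a fine grid of z of ell_alpha(z) + |z - u|^2 / (2 eps)."""
    ell = np.maximum(z, 0.0) / alpha + np.maximum(-z, 0.0) / (1.0 - alpha)
    return np.min(ell + (z - u) ** 2 / (2.0 * eps))


alpha, eps = 0.3, 0.2  # the values used in Figure 2
for u in (-1.0, -0.1, 0.0, 0.5, 1.0):
    print(u, float(moreau_check_loss(u, alpha, eps)), moreau_by_grid(u, alpha, eps))
```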
Definition 3.
A general index $g_t = \sum_i w_i x_{ti} - c$ is called the quantile general index within tolerance $\varepsilon$ if $(w, c)$ minimizes $F_\varepsilon(w, c)$.
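The gradient of $F_\varepsilon$ is
$$\frac{\partial F_\varepsilon}{\partial c} = -\frac{1}{n}\sum_{t=1}^n\left(\frac{J_t}{\alpha} - \frac{1 - J_t}{1-\alpha}\right), \qquad \frac{\partial F_\varepsilon}{\partial w_i} = -\frac{1}{w_i} + \frac{1}{n}\sum_{t=1}^n\left(\frac{J_t}{\alpha} - \frac{1 - J_t}{1-\alpha}\right)x_{ti},$$
where
$$J_t = \begin{cases} 1 & \text{if } g_t \ge \varepsilon/\alpha, \\ \alpha(1-\alpha)\left(g_t/\varepsilon + 1/(1-\alpha)\right) & \text{if } -\varepsilon/(1-\alpha) < g_t < \varepsilon/\alpha, \\ 0 & \text{if } g_t \le -\varepsilon/(1-\alpha). \end{cases} \tag{9}$$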
These formulas prove the second part of the following theorem. See Appendix A for the proof of the first part.
Theorem 4.
Suppose that no hyperplane of $\mathbb{R}^d$ contains all $x^{(t)}$. Then, the objective function $F_\varepsilon$ in (8) admits a minimizer $(w, c)$, and the optimal weight vector $w$ is unique. Furthermore, the optimal $(w, c)$ and $J_t \in [0, 1]$ defined in (9) satisfy
$$\frac{1}{n}\sum_{t=1}^n J_t = \alpha \tag{10}$$
and
$$\frac{1}{n\alpha}\sum_{t=1}^n w_i x_{ti} J_t - \frac{1}{n(1-\alpha)}\sum_{t=1}^n w_i x_{ti}(1 - J_t) = 1, \quad i = 1, \ldots, d. \tag{11}$$
Equations (10) and (11) correspond to (2) and (3) for continuous distributions. The quantity $J_t$ is interpreted as the probability of assigning the case $x^{(t)}$ to the positive group. We call $J_t$ the optimal random decision. If the general index $g_t$ is at least $\varepsilon/\alpha$, the case $t$ is definitely assigned to the positive group because $J_t = 1$. Similarly, if the general index is at most $-\varepsilon/(1-\alpha)$, the case is definitely assigned to the negative group because $J_t = 0$.
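Formula (9) and the gradient of $F_\varepsilon$ translate directly into code. The sketch below is illustrative (NumPy; all helper names are hypothetical). It writes (9) compactly as a clipped linear function of $g_t$, and it checks the $J_t$-based gradient against central finite differences at an arbitrary point chosen for this check.

```python
import numpy as np


def random_decision(g, alpha, eps):
    """Optimal random decision J_t of (9), computed from the general index g_t."""
    g = np.asarray(g, dtype=float)
    middle = alpha * (1.0 - alpha) * (g / eps + 1.0 / (1.0 - alpha))
    return np.clip(middle, 0.0, 1.0)  # equals 1 above eps/alpha and 0 below -eps/(1-alpha)


def smoothed_objective(w, c, X, alpha, eps):
    """F_eps of (8)."""
    g = X @ w - c
    upper, lower = eps / alpha, -eps / (1.0 - alpha)
    ell = np.where(g >= upper, g / alpha - eps / (2 * alpha**2),
                   np.where(g <= lower, -g / (1 - alpha) - eps / (2 * (1 - alpha) ** 2),
                            g**2 / (2 * eps)))
    return -np.sum(np.log(w)) + np.mean(ell)


def smoothed_gradient(w, c, X, alpha, eps):
    """Gradient of F_eps in (w, c), written through J_t as in the text."""
    J = random_decision(X @ w - c, alpha, eps)
    s = J / alpha - (1.0 - J) / (1.0 - alpha)  # equals the derivative of ell_{alpha,eps} at g_t
    grad_w = -1.0 / w + (s[:, None] * X).mean(axis=0)
    grad_c = -s.mean()
    return grad_w, grad_c


# Finite-difference check on the data of Example 2 at an arbitrary point (w, c).
X = np.array([[2.0, 2.0], [2.0, 1.0], [0.0, 2.0], [0.0, 0.0]])
w, c, alpha, eps = np.array([0.7, 1.3]), 2.6, 0.5, 0.1
gw, gc = smoothed_gradient(w, c, X, alpha, eps)
h = 1e-6
fd_c = (smoothed_objective(w, c + h, X, alpha, eps)
        - smoothed_objective(w, c - h, X, alpha, eps)) / (2 * h)
print(gw, gc, fd_c)
```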
For numerical computation, we used a general-purpose optimization solver optim in R [15] with the L-BFGS method.
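The computations in this paper used `optim` in R. The sketch below substitutes SciPy's `scipy.optimize.minimize` with `method="L-BFGS-B"`, which is an assumption and not the author's code. Positivity of $w$ is imposed through a lower bound of $10^{-8}$, the analytic gradient is supplied with `jac=True`, and the problem is solved for $100\varepsilon$, $10\varepsilon$, and $\varepsilon$ in turn with warm starts. These choices (bound, starting point, continuation in $\varepsilon$) are assumptions of this sketch. On the data of Example 4, it should reproduce $w \approx (0.667, 1.332)$ and $c \approx 2.666$ up to solver tolerance.

```python
import numpy as np
from scipy.optimize import minimize


def smoothed_objective_and_grad(theta, X, alpha, eps):
    """F_eps of (8) and its gradient, with theta = (w_1, ..., w_d, c)."""
    d = X.shape[1]
    w, c = theta[:d], theta[d]
    g = X @ w - c
    lo, hi = -eps / (1.0 - alpha), eps / alpha
    ell = np.where(g >= hi, g / alpha - eps / (2 * alpha**2),
                   np.where(g <= lo, -g / (1 - alpha) - eps / (2 * (1 - alpha) ** 2),
                            g**2 / (2 * eps)))
    s = np.clip(g / eps, -1.0 / (1.0 - alpha), 1.0 / alpha)  # derivative of ell_{alpha,eps}
    value = -np.sum(np.log(w)) + ell.mean()
    grad = np.append(-1.0 / w + (s[:, None] * X).mean(axis=0), -s.mean())
    return value, grad


def fit_quantile_general_index(X, alpha, eps=1e-3):
    """Minimize F_eps with SciPy's L-BFGS-B (a stand-in for R's optim).

    Assumptions of this sketch: w is kept positive by a lower bound of 1e-8, the
    start is w = 1 with c at the upper alpha-quantile of the row sums, and the
    problem is solved for eps * 100, eps * 10, eps in turn, warm-starting each run.
    """
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    theta = np.append(np.ones(d), np.quantile(X.sum(axis=1), 1.0 - alpha))
    bounds = [(1e-8, None)] * d + [(None, None)]
    for e in (100 * eps, 10 * eps, eps):
        res = minimize(smoothed_objective_and_grad, theta, args=(X, alpha, e), jac=True,
                       method="L-BFGS-B", bounds=bounds,
                       options={"maxiter": 10_000, "ftol": 1e-15, "gtol": 1e-10})
        theta = res.x
    w, c = theta[:d], theta[d]
    g = X @ w - c
    J = np.clip(alpha * (1 - alpha) * (g / eps + 1 / (1 - alpha)), 0.0, 1.0)  # (9)
    return w, c, g, J


X4 = np.array([[2.0, 2.0], [2.0, 1.0], [0.0, 2.0], [0.0, 0.0]])  # data of Example 4
w, c, g, J = fit_quantile_general_index(X4, alpha=0.5, eps=0.001)
print(np.round(w, 3), round(float(c), 3), np.round(J, 3))  # compare with Example 4
```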
Example 4
(Continuation of Example 2). Consider four cases
$$x^{(1)} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad x^{(3)} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \quad x^{(4)} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Let $\alpha = 1/2$ and $\varepsilon = 0.001$. The optimal $w$ and $c$ are numerically obtained as $w = (0.667, 1.332)^\top$ and $c = 2.666$. The quantile general index is $(g_1, g_2, g_3, g_4) = (1.333, 0.001, -0.001, -2.666)$, and the optimal random decision is $(J_1, J_2, J_3, J_4) = (1, 0.749, 0.250, 0)$, so that the optimal separation will be $\{x^{(1)}, x^{(2)}\}$ and $\{x^{(3)}, x^{(4)}\}$. This separation happens to satisfy the dominance relation, as we have seen in Example 2.
Example 5
(Continuation of Example 3). Consider four cases
$$x^{(1)} = \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \quad x^{(3)} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \quad x^{(4)} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}.$$
Let $\alpha = 1/2$ and $\varepsilon = 0.001$. The optimal $w$ and $c$ are numerically obtained as $w = (1, 1)^\top$ and $c = 4$. The quantile general index is $(g_1, g_2, g_3, g_4) = (0, 2, 0, -2)$, and the optimal random decision is $(J_1, J_2, J_3, J_4) = (0.5, 1, 0.5, 0)$. In this case, we cannot decide which of $x^{(1)}$ and $x^{(3)}$ should be assigned to the positive group. This result is consistent with the discussion in Example 3.

5. Application to the SDGs Index

Finally, we compute the quantile general index for the SDG data provided by [1], as introduced in Section 1. According to [1], countries with more than 20% missing values were removed from the data, and the remaining missing values were imputed by regional averages. We applied the quantile general index with acceptance ratio $\alpha = 10/163$ and tolerance $\varepsilon = 0.001$. The result is summarized in Table 2. The optimal weight $w$ is shown in the second column of the table. The threshold was $c = 178.2$. The next two columns of Table 2 show the average of each variable in the 10 top countries and in the remaining countries, respectively. In contrast to Table 1, we do not observe the reversal relation. Table 3 shows the general index $g_t$ and the optimal random decision $J_t$ of the 10 top countries.
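As a consistency check on Table 2, the scaled differences can be recomputed from the printed weights and averages. The arithmetic below uses only the rounded values in the table, so agreement is expected only up to rounding. Note also that (11) involves the random decisions $J_t$, whereas the table averages use the hard top-10 split, so the scaled differences need not equal 1 exactly.

```python
import numpy as np

# Columns of Table 2: weights w_i and group averages for SDGs 1-17.
w = np.array([0.049, 0.136, 0.079, 0.061, 0.122, 0.071, 0.091, 0.098, 0.082,
              0.070, 0.088, 0.490, 0.412, 0.126, 0.138, 0.106, 0.079])
mean_top = np.array([94.9, 65.8, 81.3, 91.4, 69.2, 81.3, 76.0, 77.4, 58.4,
                     75.5, 82.0, 85.9, 82.4, 71.5, 72.6, 75.4, 71.2])
mean_rest = np.array([74.1, 58.8, 68.8, 75.3, 61.1, 66.8, 65.3, 66.8, 45.4,
                      60.8, 69.5, 83.9, 80.1, 64.3, 65.3, 66.0, 58.5])
reported = np.array([1.02, 0.95, 0.99, 0.98, 0.99, 1.03, 0.97, 1.04, 1.07,
                     1.03, 1.10, 0.98, 0.95, 0.91, 1.01, 1.00, 1.00])

scaled = w * (mean_top - mean_rest)  # w_i (x_i^+ - x_i^-), close to 1 by (11)
print(np.round(scaled, 2))
print(np.all(mean_top > mean_rest))  # no reversal relation
print(round(float(w.max() / w.min()), 1))  # weight ratio SDG 12 / SDG 1
```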
We must be careful in interpreting the result. In particular, the optimal weights vary widely: the ratio of the largest weight (SDG 12) to the smallest weight (SDG 1) is about $0.49/0.049 = 10.0$, which means that SDG 1 has only 10% of the impact of SDG 12 under the quantile general index. This may discourage people or governments from contributing to SDG 1. Our main message in this paper is that there were reversal relations for SDGs 12 and 13 under the simple sum method, as observed in Table 1, and that such a phenomenon can be avoided by the proposed method. Further discussion is needed on the use of the quantile general index.
As suggested by a reviewer, we also computed the Hirsch index [9] (or h-index) of the countries based on the original SDG scores. In the current setting, the h-index is defined as the fixed point of the graph $\{(i, s_i)\}_{i=1}^{17}$, where $s_1 \ge \cdots \ge s_{17}$ are the 17 SDG scores sorted in descending order and normalized into the range $[0, 17]$. The 10 top countries based on the h-index are shown in Table 4. The top three are unchanged from the original SDG ranking. We also observed reversal relations for SDGs 12 and 13 when we adopted the h-index for separation. See [10] for a study of the scaling behavior of the h-index.
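The h-index variant described here can be sketched as follows. The text does not specify how the graph $\{(i, s_i)\}$ is interpolated between integers, so the linear interpolation below (and the bisection used to locate the fixed point) is an assumption; `continuous_h_index` is a hypothetical helper. The scores in the usage lines are hypothetical, not SDG data, and the function is not guaranteed to reproduce Table 4 exactly.

```python
import numpy as np


def continuous_h_index(scores, max_score=100.0, bisection_steps=200):
    """Fixed point of the graph {(i, s_i)} for scores rescaled to [0, m], m = number of scores.

    Assumption: the graph is interpolated linearly between consecutive integers
    (np.interp holds it constant at s_1 to the left of i = 1); the fixed point x = f(x)
    is found by bisection, since f(x) - x is continuous and strictly decreasing.
    """
    s = np.sort(np.asarray(scores, dtype=float))[::-1]  # descending order
    m = len(s)
    s = s * m / max_score                                # normalize into [0, m]
    idx = np.arange(1, m + 1, dtype=float)

    def excess(x):
        return np.interp(x, idx, s) - x

    lo, hi = 0.0, float(m)  # excess(lo) >= 0 and excess(hi) <= 0
    for _ in range(bisection_steps):
        mid = 0.5 * (lo + hi)
        if excess(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical scores (not real SDG data).
print(continuous_h_index([50.0] * 17))                                   # approximately 8.5
print(continuous_h_index([(18 - i) * 100 / 17 for i in range(1, 18)]))  # approximately 9.0
```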

6. Discussion

We proposed a quantile general index that avoids reversal relations in the separated groups. The weight was defined as the solution of the convex optimization problem (6) or (8) for given data. In Section 5, we applied the proposed method to the SDG data and obtained the 10 top countries. The result satisfies the desired properties (10) and (11). A side effect is that the obtained weights can vary widely, which may be controversial.
Various applications of our method are expected. For example, one could construct a regional competitive index (e.g., [16]) based on the quantile general index if it is necessary to select a given number of top regions. The method is also applicable to admission decisions based on entrance examinations in schools or companies, where a fixed fraction of candidates are supposed to pass. Further case studies are needed to support the validity of our approach.
The quantile general index (without approximation) introduced in Section 3 reduces to the minimization of a nondifferentiable objective function. It is of theoretical interest to develop an exact algorithm and to estimate the accuracy of the practical method developed in Section 4. Another problem is to find an algorithm that decides whether the data can be separated into two groups without reversal relations. In Example 3, we enumerated all possible combinations to prove that the data were not separable. However, this approach requires a large amount of computational time when the sample size is large, so faster algorithms would be welcome. Finally, the relation between the quantile general index and the h-index remains unknown.

Funding

This research was funded by JSPS KAKENHI Grant Numbers JP26108003, JP17K00044, JP19K11865 and JP21K11781, and JST CREST Grant Number JPMJCR1763.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The SDG data used in Sections 1 and 5 are provided in [1].

Acknowledgments

The author thanks Kentaro Minami for the insightful comments on numerical optimization, such as the concept of Moreau’s envelope. He also thanks the associate editor and the two reviewers for their constructive comments.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proofs

We give a key lemma for the proof of Theorems 1, 3, and 4. In general, we define
$$F(w, c) = \sum_{i=1}^d(-\log w_i) + E\left[\ell\left(\sum_{i=1}^d w_i x_i - c\right)\right] \tag{A1}$$
for $(w, c) \in \mathbb{R}_+^d \times \mathbb{R}$. Here, $\ell: \mathbb{R} \to \mathbb{R}_{\ge 0}$ is a convex function with $\ell(0) = 0$ and $\ell(u) > 0$ for $u \neq 0$. The check loss function $\ell_\alpha$ in Section 2 and its Moreau envelope $\ell_{\alpha,\varepsilon}$ in Section 4 satisfy these conditions. Under these conditions, we have coercivity
$$\lim_{u \to \pm\infty} \ell(u) = \infty$$
and, by convexity,
$$\ell\left(\sum_{k=1}^m u_k\right) \le \frac{1}{m}\sum_{k=1}^m \ell(m u_k)$$
for any real $u_1, \ldots, u_m$.
We consider the following condition on the nondegeneracy of the distribution $P$ of $x$. We denote the set of nonnegative numbers by $\mathbb{R}_{\ge 0}$.
(C1)
$P(\sum_i w_i x_i = c) < 1$ for any $(w, c) \in \mathbb{R}_{\ge 0}^d \times \mathbb{R}$ with $w \neq 0$.
This condition holds if $P$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$, as assumed in Section 2.
Theorem 1 is immediate from the following lemma.
Lemma A1.
Suppose that $E[\ell(w_i x_i)] < \infty$ for all $i = 1, \ldots, d$ and $w_i > 0$. If condition (C1) is satisfied, then the function $F$ in (A1) admits a minimizer, and the optimal $w$ is unique. Conversely, if (C1) does not hold, then $F$ is not bounded from below.
Proof.
We first show that $F$ is finite everywhere. Indeed, by the convexity inequality above with $m = d + 1$, we have
$$E\left[\ell\left(\sum_i w_i x_i - c\right)\right] \le \frac{1}{d+1}\left(\sum_i E[\ell((d+1)w_i x_i)] + \ell(-(d+1)c)\right) < \infty.$$
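To prove the uniqueness, let $(w^{(1)}, c^{(1)})$ and $(w^{(2)}, c^{(2)})$ be two minimizers of $F$. From the strict convexity of $z \mapsto -\log z$ and the convexity of $\ell$, we have
$$F\bigl((1-\lambda)(w^{(1)}, c^{(1)}) + \lambda(w^{(2)}, c^{(2)})\bigr) < (1-\lambda)F(w^{(1)}, c^{(1)}) + \lambda F(w^{(2)}, c^{(2)}), \quad 0 < \lambda < 1,$$
if $w^{(1)} \neq w^{(2)}$, which contradicts the minimality of $F(w^{(1)}, c^{(1)}) = F(w^{(2)}, c^{(2)})$. Thus, we have $w^{(1)} = w^{(2)}$, and the uniqueness follows.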
To prove the existence, we show that the sublevel set
$$\{(w, c) \in \mathbb{R}_+^d \times \mathbb{R} \mid F(w, c) \le a\}$$
is compact for each $a \in \mathbb{R}$. We define a function $R: \mathbb{R}_{\ge 0}^d \times \mathbb{R} \to \mathbb{R}_{\ge 0}$ by
$$R(w, c) = E[\ell(w^\top x - c)].$$
Then, $R$ is continuous and strictly positive unless $(w, c) = (0, 0)$. Indeed, the continuity of $R$ is a consequence of Lebesgue's dominated convergence theorem, and the strict positivity follows from condition (C1). Let
$$\gamma := \inf_{\sum_i w_i + |c| = 1} R(w, c) > 0,$$
where the infimum is positive because the constraint set is compact.
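Since $R$ is convex and $R(0, 0) = 0$, we have
$$R(w, c) \ge \gamma\left(\sum_i w_i + |c|\right)$$
whenever $s := \sum_i w_i + |c| \ge 1$; indeed, $\gamma \le R((w, c)/s) \le R(w, c)/s$ by convexity. For $s < 1$, the bound $R(w, c) \ge \gamma(s - 1)$ holds trivially because $R \ge 0$. Hence, for any $(w, c) \in \mathbb{R}_+^d \times \mathbb{R}$, we have
$$F(w, c) = \sum_i(-\log w_i) + R(w, c) \ge \sum_i(-\log w_i) + \gamma\left(\sum_i w_i + |c| - 1\right) = \sum_i\bigl(-\log w_i + \gamma w_i\bigr) + \gamma|c| - \gamma.$$
Since the functions $w_i \mapsto -\log w_i + \gamma w_i$ and $c \mapsto |c|$ have compact sublevel sets, the sublevel set of $F$ is also compact. □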
In order to prove Theorems 3 and 4, it is enough to replace the distribution $P$ by the empirical distribution $P_n = n^{-1}\sum_{t=1}^n \delta_{x^{(t)}}$, where $\delta_a$ is the Dirac measure at a point $a \in \mathbb{R}^d$. For $P_n$, condition (C1) follows from the assumption that no hyperplane contains all $x^{(t)}$. In Theorem 3, the uniqueness of $c$ when $n\alpha$ is not an integer follows from the observation that the optimal $c$ for a fixed $w$ must be $w^\top x^{(t)}$ for some $t$.

References

  1. Sachs, J.; Kroll, C.; Lafortune, G.; Fuller, G.; Woelm, F. Sustainable Development Report 2022; Cambridge University Press: Cambridge, UK, 2022; Available online: https://dashboards.sdgindex.org (accessed on 4 June 2022).
  2. Sei, T. An objective general index for multivariate ordered data. J. Multivar. Anal. 2016, 147, 247–264.
  3. Sei, T. Coordinate-wise transformation and Stein-type densities. In Geometric Science of Information. GSI; Nielsen, F., Barbaresco, F., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2017; Volume 10589.
  4. Sei, T. Coordinate-wise transformation of probability distributions to achieve a Stein-type identity. Inf. Geom. 2022, 5, 325–354.
  5. Baker, R.J. Selection indexes without economic weights for animal breeding. Can. J. Anim. Sci. 1974, 54, 1–8.
  6. Elston, R.C. A weight-free index for the purpose of ranking or selection with respect to several traits at a time. Biometrics 1963, 19, 85–97.
  7. Montanari, A.; Richard, E. Non-negative principal component analysis: Message passing algorithms and sharp asymptotics. IEEE Trans. Inf. Theory 2015, 62, 1458–1484.
  8. Bartholomew, D.; Knott, M.; Moustaki, I. Latent Variable Models and Factor Analysis, A Unified Approach, 3rd ed.; Wiley: Hoboken, NJ, USA, 2011.
  9. Hirsch, J.E. An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. USA 2005, 102, 16569–16572.
  10. Ghosh, A.; Chakrabarti, B.K.; Ram, D.R.S.; Mitra, M.; Maiti, R.; Biswas, S.; Banerjee, S. Scaling behavior of the Hirsch index for failure avalanches, percolation clusters and paper citations. arXiv 2021, arXiv:2109.14500.
  11. Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50.
  12. Barnett, V. The ordering of multivariate data. J. R. Stat. Soc. Ser. A 1976, 139, 318–355.
  13. Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005.
  14. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1972.
  15. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: http://www.R-project.org (accessed on 5 October 2022).
  16. Charles, V.; Sei, T. A two-stage OGI approach to compute the regional competitiveness index. Compet. Rev. 2019, 29, 78–95.
Figure 1. The check-loss function $\ell_\alpha$ for $\alpha = 0.3$.
Figure 2. The Moreau envelope of the check-loss function for $\alpha = 0.3$ and $\varepsilon = 0.2$. The two vertical lines are $u = \varepsilon/\alpha$ and $u = -\varepsilon/(1-\alpha)$, respectively.
Table 1. The average values of the SDG scores for the 10 top countries (Finland, Denmark, Sweden, Norway, Austria, Germany, France, Switzerland, Ireland and Estonia) and the remaining 153 countries. The values with the reversal relations are marked by asterisks.

| SDG | Average of the 10 Top Countries | Average of the Remaining Countries |
|---|---|---|
| 1 (no poverty) | 99.6 | 73.8 |
| 2 (zero hunger) | 68.5 | 58.6 |
| 3 (good health and well-being) | 94.1 | 68.0 |
| 4 (quality education) | 98.2 | 74.9 |
| 5 (gender equality) | 84.8 | 60.1 |
| 6 (clean water and sanitation) | 89.4 | 66.2 |
| 7 (affordable and clean energy) | 83.8 | 64.8 |
| 8 (decent work and economic growth) | 85.0 | 66.3 |
| 9 (industry, innovation and infrastructure) | 91.8 | 43.2 |
| 10 (reduced inequalities) | 92.3 | 59.7 |
| 11 (sustainable cities and communities) | 92.8 | 68.8 |
| 12 (responsible consumption and production) | 60.3 * | 85.6 * |
| 13 (climate action) | 54.7 * | 81.9 * |
| 14 (life below water) | 71.4 | 64.3 |
| 15 (life on land) | 80.4 | 64.8 |
| 16 (peace, justice and strong institutions) | 87.8 | 65.2 |
| 17 (partnerships for the goals) | 71.8 | 58.5 |
Source: The Sustainable Development Report 2022 [1].
Table 2. For the SDGs data, the optimal weight $w_i$, the average $x_i^+$ of each score in the 10 top countries determined from the quantile general index (Cuba, Romania, Finland, Kyrgyz Republic, Ukraine, Chile, Poland, Georgia, Vietnam, Hungary), the average $x_i^-$ on the remaining 153 countries, and the scaled differences $w_i(x_i^+ - x_i^-)$ are shown.

| SDG $i$ | Weight $w_i$ | Average of the 10 Top Countries $x_i^+$ | Average of the Remaining Countries $x_i^-$ | Scaled Difference $w_i(x_i^+ - x_i^-)$ |
|---|---|---|---|---|
| 1 | 0.049 | 94.9 | 74.1 | 1.02 |
| 2 | 0.136 | 65.8 | 58.8 | 0.95 |
| 3 | 0.079 | 81.3 | 68.8 | 0.99 |
| 4 | 0.061 | 91.4 | 75.3 | 0.98 |
| 5 | 0.122 | 69.2 | 61.1 | 0.99 |
| 6 | 0.071 | 81.3 | 66.8 | 1.03 |
| 7 | 0.091 | 76.0 | 65.3 | 0.97 |
| 8 | 0.098 | 77.4 | 66.8 | 1.04 |
| 9 | 0.082 | 58.4 | 45.4 | 1.07 |
| 10 | 0.070 | 75.5 | 60.8 | 1.03 |
| 11 | 0.088 | 82.0 | 69.5 | 1.10 |
| 12 | 0.490 | 85.9 | 83.9 | 0.98 |
| 13 | 0.412 | 82.4 | 80.1 | 0.95 |
| 14 | 0.126 | 71.5 | 64.3 | 0.91 |
| 15 | 0.138 | 72.6 | 65.3 | 1.01 |
| 16 | 0.106 | 75.4 | 66.0 | 1.00 |
| 17 | 0.079 | 71.2 | 58.5 | 1.00 |
Table 3. The 10 top countries based on the quantile general index. The last column shows the original rank based on the SDG scores.

| Rank | Country | $g_t$ | $J_t$ | Original Rank |
|---|---|---|---|---|
| 1 | Cuba | 5.09 | 1.00 | 40 |
| 2 | Romania | 3.82 | 1.00 | 30 |
| 3 | Finland | 3.33 | 1.00 | 1 |
| 4 | Kyrgyz Republic | 3.06 | 1.00 | 48 |
| 5 | Ukraine | 1.92 | 1.00 | 37 |
| 6 | Chile | 1.11 | 1.00 | 28 |
| 7 | Poland | 1.03 | 1.00 | 12 |
| 8 | Georgia | 0.24 | 1.00 | 51 |
| 9 | Vietnam | 0.01 | 0.68 | 55 |
| 10 | Hungary | 0.01 | 0.64 | 21 |
Table 4. The 10 top countries based on the h-index. The last column shows the original rank based on the SDG scores.

| Rank | Country | h-Index | Original Rank |
|---|---|---|---|
| 1 | Finland | 13.47 | 1 |
| 2 | Denmark | 13.35 | 2 |
| 3 | Sweden | 13.19 | 3 |
| 4 | Germany | 13.01 | 6 |
| 5 | Romania | 12.91 | 30 |
| 6 | Norway | 12.84 | 4 |
| 7 | Estonia | 12.77 | 10 |
| 8 | Croatia | 12.77 | 23 |
| 9 | Ireland | 12.76 | 9 |
| 10 | Portugal | 12.64 | 20 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Sei, T. A Quantile General Index Derived from the Maximum Entropy Principle. Entropy 2022, 24, 1431. https://doi.org/10.3390/e24101431

AMA Style

Sei T. A Quantile General Index Derived from the Maximum Entropy Principle. Entropy. 2022; 24(10):1431. https://doi.org/10.3390/e24101431

Chicago/Turabian Style

Sei, Tomonari. 2022. "A Quantile General Index Derived from the Maximum Entropy Principle" Entropy 24, no. 10: 1431. https://doi.org/10.3390/e24101431
