Modelling Nonlinear Relation by Using Running Interval Smoother, Constrained B-Spline Smoothing and Different Quantile Estimators

Firat Ozdemir


Mugla Journal of Science and Technology

This paper compares the small-sample properties of two non-parametric regression methods, the running interval smoother and constrained B-spline smoothing. The running interval smoother estimates a conditional quantile (or a measure of location) using different estimators; here the focus is on the Harrell-Davis estimator and the newly proposed NO quantile estimator. The constrained B-spline smoothing method uses the quantile regression estimator to obtain conditional quantile estimates. The two methods are compared in a simulation study based on theoretical distributions. Furthermore, the methods are examined graphically to understand how they model the relationship between variables. Constrained B-spline smoothing and the running interval smoother with the NO estimator outperformed the running interval smoother with the Harrell-Davis estimator in terms of mean squared error.

Burak DİLBER*, Department of Statistics, Dokuz Eylül University, Turkey (ORCID: 0000-0001-5055-8879)
A. Fırat ÖZDEMİR, Department of Statistics, Dokuz Eylül University, Turkey (ORCID: 0000-0003-4976-7168)
*Corresponding author

Received: 21.07.2020, Accepted: 18.12.2020
Research Article
DOI: 10.22531/muglajsci.772523

Keywords: Non-parametric regression, Quantile estimators, RIS, COBS

Cite: Dilber, B. and Özdemir, A. F., (2020). "Modelling Nonlinear Relation by Using Running Interval Smoother, Constrained B-Spline Smoothing and Different Quantile Estimators", Mugla Journal of Science and Technology, 6(2), 121-127.

1. Introduction

Although the assumption of linearity in regression analysis provides a good approximation of the population regression model in numerous situations, this is not always the case. One way of accommodating possible curvature is to use a parametric model with quadratic or cubic terms, but this might be unsatisfactory in terms of model adequacy criteria. Non-parametric regression is used to explain the nonlinear relationship among variables. In nonparametric regression analysis, the shape of the function is not pre-defined and there are no significant assumptions as in parametric regression analysis. In this way, flexibility is provided to reveal the relationship between variables. Despite these advantages, there are some difficulties in the application of nonparametric regression. Although no specific minimum sample size is prescribed, a large data set is needed, and the method requires intensive computation.

The estimators used in nonparametric regression are called smoothers. The basic idea of smoothing is to use a locally weighted mean: the estimated value of the dependent variable at a point of interest $x$ is determined by taking a weighted mean of the points in the neighborhood of $x$. Two of these methods are the running interval smoother (RIS) and constrained B-spline smoothing (COBS). The purpose of using these methods is to monitor the latent associations that might appear when the data are analyzed at different quantile values. COBS and RIS enable this in a flexible and efficient manner.

The RIS method deals with some robust measure of location associated with the random variable $y$, given $x$. Many nonparametric regression estimators of this measure of location have been proposed. The method can also obtain the predicted value of the dependent variable using quantile estimators. This conditional quantile is estimated using different estimators, and here the focus is on the Harrell-Davis (HD) and NO quantile estimators [1,2].

The other non-parametric regression method is constrained B-spline smoothing (COBS). COBS is a very attractive method with some unique advantages. It facilitates robust function estimation via conditional median estimation of the dependent variable. It also provides computation of other conditional quantile functions, which have gradually become an integral part of data analysis. The COBS method uses the quantile regression estimator proposed by Koenker and Bassett (1978) to obtain conditional quantile estimates [3,4].

2. Description of the Methods

2.1. The Running Interval Smoother (RIS)

The values of the independent variable ($X$) that are close to the point of interest $x$ are determined and denoted by $X_i$. Then, a conditional quantile of the $Y_i$ values corresponding to these $X_i$ is computed. For the running interval smoother, this conditional quantile might be estimated using different estimators.

MAD, the median absolute deviation, is a robust measure of scale. It is the median of $|X_1 - M|, \ldots, |X_n - M|$, where $M$ is the usual sample median based on the random sample $X_1, \ldots, X_n$. When the parent distribution is normal, MAD estimates $z_{0.75}\sigma$, where $z_{0.75} = 0.6745$ is the 0.75 quantile of the standard normal distribution. The rescaled version of MAD (MADN) therefore estimates $\sigma$ when sampling from a normal distribution:

$$\mathrm{MADN} = \mathrm{MAD} / 0.6745.$$

The points $X_i$ are close to $x$ if

$$|x - X_i| \le f \times \mathrm{MADN} \tag{1}$$

where MADN is computed using $X_1, \ldots, X_n$ and $f$ is a number between 0 and 1 that plays the role of a span. Let

$$N(x) = \{ i : |x - X_i| \le f \times \mathrm{MADN} \}. \tag{2}$$

That is, $N(x)$ indexes the set of all $X_i$ values that are close to $x$. Let $\hat{\theta}_i$ be an estimate of some parameter of interest, based on the $Y_i$ values such that $i \in N(x)$. That is, use all of the $Y_i$ values for which $X_i$ is close to $x$. To get a graph of the regression line, calculate $\hat{\theta}_i$, the estimate of $Y$ given that $x = X_i$, $i = 1, \ldots, n$, and then plot the points $(X_1, \hat{\theta}_1), \ldots, (X_n, \hat{\theta}_n)$ [5].

The span $f$ controls the roughness of the line. As $f$ increases, the smooth approaches a straight, horizontal line. However, if $f$ is too close to zero, the result is a very ragged line. Often the choices $f = 1$ and $f = 0.8$ give good results, but both larger and smaller values might be of interest, particularly when $n$ is small. A good way to find an optimal $f$ is to try out some values in an interactive graphics environment; the general strategy is to find the smallest $f$ for which the plot of the points is reasonably smooth.

2.1.1. HD Quantile Estimator

A concern when estimating the $q$th quantile with a single order statistic, $\hat{x}_q = X_{(m)}$, $m = [qn + 0.5]$, is that its standard error can be relatively high. The problem is of particular concern when sampling from a light-tailed or normal distribution. A natural strategy for addressing this problem is to use all of the order statistics to estimate $x_q$, as opposed to a single order statistic, and several methods have been proposed. One such estimator was derived by Harrell and Davis (1982) [1]. To compute it, let $Y$ be a random variable having a beta distribution with parameters $a = (n + 1)q$ and $b = (n + 1)(1 - q)$. That is, the probability density function of $Y$ is

$$\frac{\Gamma(a + b)}{\Gamma(a)\,\Gamma(b)}\, y^{a-1} (1 - y)^{b-1} \tag{3}$$

($\Gamma$ is the gamma function). Let

$$W_i = P\left( \frac{i-1}{n} \le Y \le \frac{i}{n} \right). \tag{4}$$

Then the Harrell-Davis estimate of the $q$th quantile is

$$\hat{\theta}_q = \sum_{i=1}^{n} W_i X_{(i)}. \tag{5}$$

2.1.2. NO Quantile Estimator

With the aim of improving performance at the lower and upper quantiles, especially with small sample sizes, a new quantile estimator, which is again a weighted average of all order statistics ($X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$), was introduced [2]:

$$
\begin{aligned}
NO_q ={}& \left[ B(0; n, q)\,2q + B(1; n, q)\,q \right] X_{(1)} + B(0; n, q)(2 - 3q) X_{(2)} - B(0; n, q)(1 - q) X_{(3)} \\
&+ \sum_{i=1}^{n-2} \left[ B(i; n, q)(1 - q) + B(i + 1; n, q)\,q \right] X_{(i+1)} \\
&- B(n; n, q)\,q\,X_{(n-2)} + B(n; n, q)(3q - 1) X_{(n-1)} \\
&+ \left[ B(n - 1; n, q)(1 - q) + B(n; n, q)(2 - 2q) \right] X_{(n)}
\end{aligned}
\tag{6}
$$

where $B(i; n, q)$, $i = 0, 1, 2, \ldots, n$, are the binomial probabilities with probability of success $q$, and $n$ is the sample size. This is a new quantile estimator, and the performance of RIS with NO is entirely unknown.

2.2. Constrained B-Spline Smoothing (COBS)

Regression splines are special functions defined by piecewise polynomials. The region defining the pieces is separated by a sequence of knots or breakpoints [6]. The aim is to force the piecewise polynomials to join smoothly at the knots. Splines can be named according to their degree. The simplest spline has degree 0 and is also called a step function. The next simplest spline has degree 1 and is also called a linear spline. The next is the quadratic spline of degree 2.

The constrained B-spline smoothing (COBS) method provides a way of dealing with quantiles [4,7]. The method allows constraints to be imposed on the estimated function. In particular, COBS can include restrictions such as monotonicity, convexity, concavity and periodicity constraints. The COBS method uses the quantile regression estimator proposed by Koenker and Bassett (1978) to obtain conditional quantile estimates [4]. Let

$$\rho_\tau(u) = u\left(\tau - I(u < 0)\right),$$

where the indicator function $I(u < 0) = 1$ if $u < 0$ and $I(u < 0) = 0$ otherwise. The goal is to estimate the $\tau$th quantile of $y$ given $x$ by finding a function $g(x)$ that minimizes

$$\sum_{i=1}^{n} \rho_\tau\left(y_i - g(x_i)\right) + \lambda \int |g''(x)|\,dx \tag{7}$$

based on the random sample $(x_1, y_1), \ldots, (x_n, y_n)$, where $\lambda$ is a scalar that controls smoothness and $\int |g''(x)|\,dx$ denotes the roughness penalty, the $L_1$ norm of the second derivative of the function. $\lambda$ can take any value from zero to infinity.

3. Design of Simulation Study

RIS and COBS were compared via simulations based on $K = 10000$ replications. Sample sizes were $n = 25$ and $n = 50$. The model used for generating data was

$$Y = X + \lambda(X)\,\epsilon \tag{8}$$

where $X$ is distributed $N(0,1)$. The distribution of the error term was taken to be one of four g-and-h distributions [8]. g-and-h distributions are well known, especially in robust statistical research. The reason for using the g-and-h distribution is that it provides a simple method for generating observations from a wide variety of distributions, which include extreme departures from normality as measured by skewness and kurtosis. Let $Z$ be a random variable generated from the standard normal distribution. When $g \ne 0$, the transformation

$$X = \frac{\exp(gZ) - 1}{g}\, \exp(hZ^2/2) \tag{9}$$

and when $g = 0$, the transformation

$$X = Z \exp(hZ^2/2) \tag{10}$$

is used to generate data from the g-and-h distribution. The four error term distributions used here were the standard normal ($g = h = 0.0$), a symmetric heavy-tailed distribution ($h = 0.2$, $g = 0.0$), an asymmetric distribution with relatively light tails ($h = 0.0$, $g = 0.2$), and an asymmetric distribution with heavy tails ($g = h = 0.2$). Table 1 shows the estimated skewness ($\kappa_1$) and kurtosis ($\kappa_2$) values and the p-values for each distribution. The D'Agostino test (P-DT) [9] for skewness and the Bonett-Seier test of Geary's kurtosis (P-BST) [10] were used to obtain the p-values for each distribution.

Table 1. Estimated skewness, kurtosis and p-values of tests of the g-and-h distributions

| g | h | κ1 | κ2 | P-DT | P-BST |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 3 | 0.1907 | 0.6338 |
| 0 | 0.2 | 0 | 21.46 | 0.0000 | 0.0000 |
| 0.2 | 0 | 0.61 | 3.68 | 0.0000 | 0.0000 |
| 0.2 | 0.2 | 2.81 | 155.98 | 0.0000 | 0.0000 |

Three different variance patterns (VP) were used: for VP1, $\lambda(X) = 1$; for VP2, $\lambda(X) = |X| + 1$; and for VP3, $\lambda(X) = 1/(|X| + 1)$. The criterion used to compare RIS and COBS was the mean squared error, estimated with

$$MSE = \frac{1}{nK} \sum_{k=1}^{K} \sum_{i=1}^{n} \left( \theta_{qik} - \hat{\theta}_{qik} \right)^2 \tag{11}$$

where, for the $k$th replication, $\theta_{qik}$ is the true conditional $q$th quantile of $Y$ given $X = X_i$ and $\hat{\theta}_{qik}$ is the estimate of $\theta_{qik}$ based on either RIS or COBS. The methods were compared at three quantile values, $q = 0.1$, $q = 0.5$ and $q = 0.9$, with sample sizes of $n = 25$ and $n = 50$.

R version 3.5.2 was used for this simulation study. Here, COBS is applied via the qsmcobs function and RIS is applied via the rungen function. These functions are available in the WRS2 package. For COBS, lambda = 0 and for RIS, span = 0.8. Typically, taking the span to be 0.8 suffices in terms of providing a relatively accurate estimate of the conditional quantile, based on mean squared error. Here, span = 0.8 is assumed unless stated otherwise. RIS is used with both the HD and NO estimators.

3.1. Simulation Results

Table 2. MSE values when $\epsilon$ was distributed as standard normal (Z)

| n | q | VP | COBS | RIS (HD) | RIS (NO) |
|---|---|---|---|---|---|
| 25 | 0.1 | VP1 | 2.249 | 3.113 | 2.601 |
| 25 | 0.1 | VP2 | 6.948 | 8.671 | 7.170 |
| 25 | 0.1 | VP3 | 0.984 | 1.684 | 1.400 |
| 25 | 0.5 | VP1 | 0.895 | 0.973 | 0.970 |
| 25 | 0.5 | VP2 | 2.862 | 3.100 | 3.083 |
| 25 | 0.5 | VP3 | 0.384 | 0.438 | 0.438 |
| 25 | 0.9 | VP1 | 2.253 | 3.105 | 2.598 |
| 25 | 0.9 | VP2 | 6.923 | 8.697 | 7.193 |
| 25 | 0.9 | VP3 | 0.979 | 1.669 | 1.390 |
| 50 | 0.1 | VP1 | 2.528 | 3.248 | 2.923 |
| 50 | 0.1 | VP2 | 8.570 | 9.232 | 8.165 |
| 50 | 0.1 | VP3 | 1.052 | 1.723 | 1.557 |
| 50 | 0.5 | VP1 | 0.952 | 1.019 | 1.018 |
| 50 | 0.5 | VP2 | 3.238 | 3.384 | 3.375 |
| 50 | 0.5 | VP3 | 0.401 | 0.452 | 0.452 |
| 50 | 0.9 | VP1 | 2.536 | 3.240 | 2.919 |
| 50 | 0.9 | VP2 | 8.557 | 9.237 | 8.179 |
| 50 | 0.9 | VP3 | 1.052 | 1.726 | 1.561 |

Table 3. MSE values when $\epsilon$ was distributed as g=0.2 and h=0

| n | q | VP | COBS | RIS (HD) | RIS (NO) |
|---|---|---|---|---|---|
| 25 | 0.1 | VP1 | 2.170 | 2.993 | 2.603 |
| 25 | 0.1 | VP2 | 6.667 | 8.269 | 7.199 |
| 25 | 0.1 | VP3 | 0.941 | 1.623 | 1.398 |
| 25 | 0.5 | VP1 | 0.961 | 1.034 | 1.031 |
| 25 | 0.5 | VP2 | 3.081 | 3.310 | 3.290 |
| 25 | 0.5 | VP3 | 0.413 | 0.465 | 0.464 |
| 25 | 0.9 | VP1 | 2.651 | 3.579 | 2.844 |
| 25 | 0.9 | VP2 | 7.995 | 9.981 | 7.761 |
| 25 | 0.9 | VP3 | 1.157 | 1.873 | 1.490 |
| 50 | 0.1 | VP1 | 2.433 | 3.137 | 2.884 |
| 50 | 0.1 | VP2 | 8.120 | 8.868 | 8.094 |
| 50 | 0.1 | VP3 | 1.008 | 1.682 | 1.546 |
| 50 | 0.5 | VP1 | 1.020 | 1.084 | 1.083 |
| 50 | 0.5 | VP2 | 3.474 | 3.620 | 3.610 |
| 50 | 0.5 | VP3 | 0.428 | 0.477 | 0.477 |
| 50 | 0.9 | VP1 | 2.904 | 3.628 | 3.183 |
| 50 | 0.9 | VP2 | 10.000 | 10.516 | 8.981 |
| 50 | 0.9 | VP3 | 1.198 | 1.883 | 1.667 |

Table 2 reports simulation results in which $X$ and $\epsilon$ both have a standard normal distribution, and Table 3 reports simulation results in which $X$ has a standard normal distribution and $\epsilon$ has an asymmetric light-tailed distribution. When q=0.5, COBS, RIS with NO and RIS with HD give close results. For VP2 and the extreme quantiles, RIS with the NO estimator gives better results than COBS as the sample size increases.

Table 4. MSE values when $\epsilon$ was distributed as g=0 and h=0.2

| n | q | VP | COBS | RIS (HD) | RIS (NO) |
|---|---|---|---|---|---|
| 25 | 0.1 | VP1 | 4.891 | 6.050 | 4.515 |
| 25 | 0.1 | VP2 | 15.091 | 17.833 | 13.309 |
| 25 | 0.1 | VP3 | 2.130 | 2.958 | 2.218 |
| 25 | 0.5 | VP1 | 1.948 | 2.036 | 2.029 |
| 25 | 0.5 | VP2 | 6.373 | 6.805 | 6.755 |
| 25 | 0.5 | VP3 | 0.828 | 0.879 | 0.878 |
| 25 | 0.9 | VP1 | 4.849 | 6.023 | 4.509 |
| 25 | 0.9 | VP2 | 14.978 | 17.716 | 13.233 |
| 25 | 0.9 | VP3 | 2.128 | 2.980 | 2.232 |
| 50 | 0.1 | VP1 | 4.973 | 5.850 | 5.010 |
| 50 | 0.1 | VP2 | 17.614 | 17.894 | 14.914 |
| 50 | 0.1 | VP3 | 1.974 | 2.801 | 2.440 |
| 50 | 0.5 | VP1 | 2.070 | 2.141 | 2.139 |
| 50 | 0.5 | VP2 | 6.995 | 7.235 | 7.212 |
| 50 | 0.5 | VP3 | 0.868 | 0.918 | 0.917 |
| 50 | 0.9 | VP1 | 4.924 | 5.849 | 5.017 |
| 50 | 0.9 | VP2 | 17.523 | 17.730 | 14.806 |
| 50 | 0.9 | VP3 | 1.965 | 2.784 | 2.429 |

Table 4 reports simulation results in which $X$ has a standard normal distribution and $\epsilon$ has a symmetric heavy-tailed distribution. When q=0.5, COBS, RIS with NO and RIS with HD give close results. For the extreme quantiles under VP2, RIS with the NO estimator competes well with COBS as the sample size increases.

Table 5. MSE values when $\epsilon$ was distributed as g=0.2 and h=0.2

| n | q | VP | COBS | RIS (HD) | RIS (NO) |
|---|---|---|---|---|---|
| 25 | 0.1 | VP1 | 4.512 | 5.632 | 4.620 |
| 25 | 0.1 | VP2 | 14.274 | 16.867 | 14.037 |
| 25 | 0.1 | VP3 | 1.980 | 2.811 | 2.288 |
| 25 | 0.5 | VP1 | 2.176 | 2.267 | 2.258 |
| 25 | 0.5 | VP2 | 6.983 | 7.426 | 7.362 |
| 25 | 0.5 | VP3 | 0.920 | 0.971 | 0.970 |
| 25 | 0.9 | VP1 | 6.134 | 7.355 | 5.001 |
| 25 | 0.9 | VP2 | 19.738 | 22.292 | 14.768 |
| 25 | 0.9 | VP3 | 2.651 | 3.528 | 2.420 |
| 50 | 0.1 | VP1 | 4.765 | 5.693 | 5.104 |
| 50 | 0.1 | VP2 | 16.667 | 17.443 | 15.478 |
| 50 | 0.1 | VP3 | 1.932 | 2.744 | 2.477 |
| 50 | 0.5 | VP1 | 2.340 | 2.407 | 2.404 |
| 50 | 0.5 | VP2 | 7.870 | 8.108 | 8.079 |
| 50 | 0.5 | VP3 | 0.988 | 1.035 | 1.035 |
| 50 | 0.9 | VP1 | 5.909 | 6.823 | 5.558 |
| 50 | 0.9 | VP2 | 22.187 | 21.263 | 16.620 |
| 50 | 0.9 | VP3 | 2.305 | 3.189 | 2.678 |

Table 5 reports simulation results in which $X$ has a standard normal distribution and $\epsilon$ has an asymmetric heavy-tailed distribution. For n=25 and the extreme quantiles, the performance of RIS with NO improves under heavy-tailed distributions.
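As a rough illustration of the simulation design in Section 3, the following Python sketch generates g-and-h errors via Eqs. (9) and (10), computes the NO estimator from the weights in Eq. (6), applies a running interval smoother based on Eqs. (1) and (2), and estimates the MSE of Eq. (11) for one setting. It is not the R code used in the study and is not expected to reproduce the tabulated values. All function names (`g_and_h`, `no_weights`, `no_quantile`, `madn`, `ris`, `simulate_mse`) are hypothetical. Two choices are assumptions made only for this sketch: neighborhoods with fewer than three points fall back to the sample median, and the true conditional quantile is taken as $X_i + \lambda(X_i)\epsilon_q$, with $\epsilon_q$ obtained by transforming the standard normal quantile (valid for $h \ge 0$, where the transformation is increasing).

```python
import math
import random
from statistics import NormalDist, median


def g_and_h(z, g, h):
    """Map a standard normal value z to a g-and-h value (Eqs. 9 and 10)."""
    core = z if g == 0 else (math.exp(g * z) - 1.0) / g
    return core * math.exp(h * z * z / 2.0)


def no_weights(n, q):
    """Weights that Eq. (6) places on the order statistics X_(1), ..., X_(n)."""
    if n < 3:
        raise ValueError("Eq. (6) refers to X_(3) and X_(n-2), so n >= 3 is needed")
    B = [math.comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(n + 1)]
    w = [0.0] * n  # w[j] multiplies X_(j+1)
    w[0] += B[0] * 2 * q + B[1] * q
    w[1] += B[0] * (2 - 3 * q)
    w[2] -= B[0] * (1 - q)
    for i in range(1, n - 1):  # i = 1, ..., n-2 contributes to X_(i+1)
        w[i] += B[i] * (1 - q) + B[i + 1] * q
    w[n - 3] -= B[n] * q
    w[n - 2] += B[n] * (3 * q - 1)
    w[n - 1] += B[n - 1] * (1 - q) + B[n] * (2 - 2 * q)
    return w


def no_quantile(sample, q):
    """NO estimate of the q-th quantile: weighted sum of the sorted sample."""
    xs = sorted(sample)
    return sum(w * v for w, v in zip(no_weights(len(xs), q), xs))


def madn(x):
    """MAD rescaled by 0.6745 so that it estimates sigma under normality."""
    m = median(x)
    return median(abs(v - m) for v in x) / 0.6745


def ris(x, y, q, f=0.8, estimator=no_quantile, min_pts=3):
    """Running interval smoother: one conditional quantile estimate per X_i."""
    scale = f * madn(x)
    fitted = []
    for x0 in x:
        near = [yi for xi, yi in zip(x, y) if abs(x0 - xi) <= scale]  # N(x0)
        # Assumption of this sketch: tiny neighborhoods use the plain median.
        fitted.append(estimator(near, q) if len(near) >= min_pts else median(near))
    return fitted


def simulate_mse(n=25, q=0.5, g=0.0, h=0.0, vp=lambda x: 1.0, K=200, seed=1):
    """Monte Carlo estimate of Eq. (11) for RIS with NO under Y = X + vp(X) * eps."""
    rng = random.Random(seed)
    eps_q = g_and_h(NormalDist().inv_cdf(q), g, h)  # q-th quantile of eps
    total = 0.0
    for _ in range(K):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [xi + vp(xi) * g_and_h(rng.gauss(0, 1), g, h) for xi in x]
        fit = ris(x, y, q)
        total += sum((xi + vp(xi) * eps_q - fi) ** 2 for xi, fi in zip(x, fit))
    return total / (n * K)


if __name__ == "__main__":
    # One VP2 setting with asymmetric heavy-tailed errors (g = h = 0.2).
    print(simulate_mse(n=25, q=0.9, g=0.2, h=0.2, vp=lambda x: abs(x) + 1))
```

A different quantile estimator, such as HD, could be passed through the `estimator` argument of `ris`; the default of K = 200 replications is kept small only so that the sketch runs quickly.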
In terms of the distribution of the data generated in the simulation study, as the g and h values increase, the MSE values increase, especially at the extreme quantiles.

One more simulation was performed with n=500, 1000 replications and 36 experimental settings. In this simulation, RIS with NO gave the smallest MSE values in 28 of the 36 settings. However, it cannot be said that the MSE values were smaller than those computed when n=25 or n=50. Each distributional setting took 10 hours on average.

4. Real Data Example

4.1. Example 1

Sochett et al. (1987) report data related to patterns of residual insulin secretion in children at the time they were diagnosed with diabetes [11]. A portion of the study was concerned with whether age can be used to predict the logarithm of C-peptide concentrations at diagnosis. This data set has 43 observations.

Figure 1. Association between age and C-peptide

Figure 1 shows the relationship between age and the logarithm of C-peptide. There is an interesting association between these two variables: a weak but positive slope up to about age 10, after which the pattern might be interpreted in different ways depending on the interest of the researcher. Using different quantile points simplifies this matter significantly.

Figure 2. Comparisons of RIS with NO, RIS with HD and COBS (quantile values are 0.1, 0.5 and 0.9)

The data in Figure 2 are used to illustrate RIS with NO, RIS with HD and COBS. In the COBS method, the smoothing parameter ($\lambda$) equals 0. In RIS with NO and RIS with HD, the span ($f$) equals 0.8. For the quantile regression line at q=0.1, the plot suggests a positive association with C-peptide for ages under 10, whereas for ages over 10 the methods suggest a negative association. For the quantile regression line at q=0.5, the plot suggests a positive association for ages under 10, whereas for ages over 10 the methods suggest little or no association with C-peptide. For the quantile regression line at q=0.9, the plot suggests little or no association for ages under 10, whereas for ages over 10 the methods suggest a positive association with C-peptide. At q=0.9, COBS produced an unusually shaped regression line, so there is some concern about the shape of the regression line here.

Table 6. MSE values of methods for example 1

| Method | q=0.1 | q=0.5 | q=0.9 |
|---|---|---|---|
| RIS with NO | 1.063089 | 0.4055781 | 1.080137 |
| RIS with HD | 1.229652 | 0.4053521 | 1.229841 |
| COBS | 0.9410627 | 0.4118963 | 1.311917 |

MSE values of this data set for different quantiles are shown in Table 6. For q=0.9, RIS with NO gives better results than the other methods. For the median, the methods give close results. For q=0.1, COBS gives more reliable results.

4.2. Example 2

The number of daily COVID-19 cases that occurred in Turkey between March 12, 2020 and April 23, 2020 can be modeled using these methods. The independent variable is days and the dependent variable is daily deaths. The data were obtained from the website of the World Health Organization [12].

Figure 3. Association between days and daily deaths

Figure 3 shows the relationship between days and daily deaths. The relationship between the variables is modeled using COBS, RIS with NO, RIS with HD and a third-order polynomial regression model (PR).

Figure 4. Comparisons of RIS with NO, RIS with HD, COBS and PR (quantile value is 0.5)

The data in Figure 4 are used to illustrate RIS with NO, RIS with HD, COBS and PR. In the COBS method, the smoothing parameter ($\lambda$) equals 0. In RIS with NO and RIS with HD, the span ($f$) equals 0.4. For the quantile regression line at q=0.5, the plot suggests a positive association with daily deaths.

Table 7. MSE values of methods for example 2

| | RIS with NO | RIS with HD | COBS | PR |
|---|---|---|---|---|
| q=0.5 | 25.3764 | 23.8560 | 44.2608 | 48.5388 |

MSE values of this data set for the median are shown in Table 7. RIS with HD and RIS with NO give more reliable results.

Table 8. Predicted value of daily deaths

| Day | Real value | Predicted: COBS | Predicted: RIS with NO | Predicted: RIS with HD | Predicted: PR |
|---|---|---|---|---|---|
| 15/04/2020 | 107 | 113 | 108 | 108 | 112 |
| 16/04/2020 | 115 | 116 | 112 | 112 | 115 |
| 17/04/2020 | 125 | 118 | 115 | 115 | 117 |
| 18/04/2020 | 126 | 120 | 117 | 117 | 119 |
| 19/04/2020 | 121 | 121 | 118 | 119 | 121 |
| 20/04/2020 | 127 | 121 | 120 | 120 | 121 |
| 21/04/2020 | 123 | 121 | 121 | 121 | 122 |
| 22/04/2020 | 119 | 119 | 122 | 122 | 122 |
| 23/04/2020 | 117 | 117 | 123 | 123 | 121 |
| 24/04/2020 | 115 | 114 | 122 | 122 | 119 |
| 25/04/2020 | 109 | 110 | 121 | 121 | 117 |
| 26/04/2020 | 106 | 106 | 121 | 121 | 114 |
| 27/04/2020 | 99 | 101 | 119 | 120 | 111 |
| 28/04/2020 | 95 | 95 | 118 | 118 | 106 |
| 29/04/2020 | 92 | 88 | 117 | 117 | 101 |

Table 8 shows the estimated and real values of daily deaths between 15 April 2020 and 29 April 2020. The COBS method is seen to be more successful, especially when compared with the other methods.

5. Conclusion

In this paper, two nonparametric regression methods that can be used to model nonlinear relationships were compared. After introducing the fundamental structure of the conventional nonparametric regression methods, a simulation study was carried out using g-and-h distributions with different skewness and kurtosis levels. The performance of the methods was also monitored with graphs using real data sets.

In the simulation study, the methods were compared in terms of mean squared error (MSE). In general, COBS and the running interval smoother with the NO quantile estimator gave better results than the running interval smoother with the HD quantile estimator; that is, they are more efficient than the other method. COBS and RIS with NO gave similar results for all g-and-h distributions, but increasing the kurtosis and skewness levels (non-normality) resulted in better efficiency values for RIS with the NO quantile estimator.

Under normality, COBS and RIS with NO gave similar results, so either can be preferred. When the distribution is asymmetric and light-tailed, COBS had a lower MSE value in 15 of the 18 cases examined, while RIS with NO had a lower value in 3 of them. However, either method can be preferred because the MSE values are quite similar. RIS with HD is not preferred. When the distribution is symmetric and heavy-tailed, COBS can be preferred in 12 of the 18 situations examined, and RIS with the NO quantile estimator can be preferred in the others. When the distribution is asymmetric and heavy-tailed, COBS and RIS give similar results, so either method can be preferred. However, RIS with NO is recommended for VP2.

The performances of the methods were also compared by sketching graphs. The type and degree of the relationship among variables might change depending on the quantile points of interest. Using more than one quantile point lets researchers observe different patterns and make correct predictions accordingly. Using RIS or COBS makes this possible when analyzing nonlinear trends. Moreover, predicted values of COVID-19 daily deaths for Turkey were obtained, and the COBS method was seen to make very successful predictions.

Constrained B-Spline Smoothing (COBS) and the Running Interval Smoother (RIS) with the NO quantile estimator are suggested as efficient, flexible and robust tools for modelling nonlinear relationships among variables.
6. References

[1] Harrell F.E., Davis C.E., "A new distribution-free quantile estimator", Biometrika, 69, 635-640, 1982.
[2] Navruz G., Özdemir A.F., "A new quantile estimator with weights based on a subsampling approach", British J. of Mathematical and Statistical Psychology (Early view), 2020.
[3] Koenker R., Bassett G., "Regression quantiles", Econometrica, 46, 33-50, 1978.
[4] He X., Ng P., "COBS: Qualitatively constrained smoothing via linear programming", Computational Statistics, 14, 315-337, 1999.
[5] Wilcox R., Introduction to Robust Estimation and Hypothesis Testing, 4th ed. Academic Press, Amsterdam, the Netherlands, 2017.
[6] Hastie T., Tibshirani R., Generalized Additive Models, 1st ed. Chapman and Hall/CRC Press, London, 1990.
[7] Koenker R., Ng P., "Inequality constrained quantile regression", The Indian Journal of Statistics, 67, 418-440, 2005.
[8] Hoaglin D.C., "Summarizing shape numerically: The g-and-h distribution", in D. C. Hoaglin, F. Mosteller, & J. W. Tukey (Eds.), Exploring data tables, trends, and shapes. New York, NY: Wiley-Interscience, 1985.
[9] D'Agostino R.B., "Transformation to normality of the null distribution of G1", Biometrika, 57, 3, 679-681, 1970.
[10] Bonett D.G., Seier E., "A test of normality with high uniform power", Computational Statistics and Data Analysis, 40, 435-445, 2002.
[11] Sochett E.B., Daneman D., Clarson C., Ehrich R.M., "Factors affecting and patterns of residual insulin secretion during the first year of type I (insulin dependent) diabetes mellitus in children", Diabetes, 30, 453-459, 1987.
[12] World Health Organization, 2020 [Online]. Available: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports.
