Article

A Novel Neutrosophic Weighted Extreme Learning Machine for Imbalanced Data Set

Yaman Akbulut 1, Abdulkadir Şengür 1, Yanhui Guo 2 and Florentin Smarandache 3
1 Department of Electrical and Electronics Engineering, Technology Faculty, Firat University, 23119 Elazig, Turkey
2 Department of Computer Science, University of Illinois at Springfield, Springfield, IL 62703, USA
3 Department of Mathematics and Sciences, University of New Mexico, Gallup, NM 87301, USA
* Author to whom correspondence should be addressed.
Submission received: 23 June 2017 / Revised: 31 July 2017 / Accepted: 1 August 2017 / Published: 3 August 2017
(This article belongs to the Special Issue Neutrosophic Theories Applied in Engineering)

Abstract

Extreme learning machine (ELM) is known as a kind of single-hidden-layer feedforward network (SLFN); it has attracted considerable attention within the machine learning community and has been applied to various real-world problems. It has advantages such as good generalization performance, fast learning speed, and low computational cost. However, ELM can have problems when classifying imbalanced data sets. In this paper, we present a novel weighted ELM scheme based on neutrosophic set theory, denoted neutrosophic weighted extreme learning machine (NWELM), in which the neutrosophic c-means (NCM) clustering algorithm is used in the approximation of the output weights of the ELM. We also investigate and compare NWELM with several weighted algorithms. The proposed method demonstrates advantages over previous approaches on benchmark data sets.

1. Introduction

Extreme learning machine (ELM) was put forward in 2006 by Huang et al. [1] as a single-hidden-layer feedforward network (SLFN). The hidden layer parameters of ELM are initialized arbitrarily, and the output weights are determined by the least squares algorithm. Due to this characteristic, ELM offers fast learning speed, good performance, and low computational cost [1,2,3,4], and it has, as a result, been applied in different areas.
However, ELM suffers from the presence of irrelevant variables in large and high-dimensional real data sets [2,5]. The imbalanced data set problem occurs in real applications such as text categorization, fault detection, fraud detection, oil-spill detection in satellite images, toxicology, cultural modeling, and medical diagnosis [6]. Many challenging real problems are characterized by imbalanced training data, in which at least one class is under-represented relative to the others.
The problem of imbalanced data is often associated with asymmetric costs of misclassifying elements of different classes. In addition, the distribution of the test data may differ from that of the training samples. Class imbalance occurs when the number of samples in one class is much larger than that of the other [7]. The methods aiming to tackle the problem of imbalance can be classified into four groups: algorithm-based methods, data-based methods, cost-sensitive methods, and methods based on ensembles of classifiers [8]. In algorithm-based approaches, the minority class classification accuracy is improved by adjusting the weights for each class [9]. Re-sampling methods can be viewed as data-based approaches, which modify the training data rather than the classifier itself [10]. Cost-sensitive approaches assign different cost values to training samples of the majority class and the minority class, respectively [11]. Recently, ensemble-based methods have been widely used in the classification of imbalanced data sets [12]. Bagging and boosting are the two most popular ensemble methods.
The problem of class imbalance has received much attention in the literature [13]. The synthetic minority over-sampling technique (SMOTE) [9] is the most popular re-sampling method; it uses pre-processing to generate minority class instances artificially. For each minority class sample, SMOTE creates a new sample on the line joining it to the nearest minority class neighbor. Borderline SMOTE [14], SMOTE-Boost [15], and modified SMOTE [14] are some of the improved variants of the SMOTE algorithm. In addition, an oversampling method was proposed that identifies minority class samples that are hard to classify [16]. Another oversampling method combines bagging with oversampling [17]. In [18], the authors opted to use a double ensemble classifier by combining bagging and boosting. In [19], the authors combined sampling and ensemble techniques to improve the classification performance on skewed data. Another method, namely random under-sampling (RUS), removes majority class samples randomly until the training set becomes balanced [19]. In [20], the authors proposed an ensemble of support vector machines (SVMs) with boosting (Boosting-SVM), in which the minority class classification accuracy was increased compared to a pure SVM. In [21], a cost-sensitive approach based on a k-nearest neighbors (k-NN) classifier was proposed. In addition, in [22], an SVM-based cost-sensitive approach was proposed for class-imbalanced data classification. Decision tree [23] and logistic regression [24] based methods were also proposed to handle imbalanced data classification.
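As a brief illustration of the re-sampling idea, the following minimal NumPy sketch interpolates synthetic minority samples between a minority point and one of its k nearest minority neighbors (with k = 1 this reduces to the nearest-neighbor description above); the function name, the value of k, and the random selection scheme are illustrative and not taken from [9]:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Minimal sketch of the SMOTE idea: each synthetic sample lies on the line
    segment between a minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    # pairwise distances among minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest minority neighbors
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))             # a random minority sample
        j = nn[i, rng.integers(k)]               # one of its minority neighbors
        lam = rng.random()                       # random point on the segment
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```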
An ELM classifier trained with an imbalanced data set can be biased towards the majority class and obtain high accuracy on the majority class at the expense of minority class accuracy. Weighted ELM (WELM) was employed to alleviate ELM's classification deficiency on imbalanced data sets, and it can be seen as one of the cost-proportionate weighted sampling methods [25]. ELM assigns the same misclassification cost value to all data points, such as positive and negative samples in a two-class problem. When the number of negative samples is much larger than the number of positive samples, or vice versa, assigning the same misclassification cost to all samples can be seen as one of the drawbacks of traditional ELM. A straightforward solution is to set the misclassification cost adaptively according to the class distribution, in the form of a weighting scheme inversely proportional to the number of samples in each class.
In [7], the authors proposed a weighted online sequential extreme learning machine (WOS-ELM) algorithm for alleviating the imbalance problem in chunk-by-chunk and one-by-one learning, with a weight setting selected in a computationally efficient way. The weighted Tanimoto extreme learning machine (T-WELM) was used to predict chemical compound biological activity and other data with discrete, binary representations [26]. In [27], the authors presented a weight learning machine for an SLFN to recognize handwritten digits; input and output weights were globally optimized with batch-type least squares, and features were assigned to prescribed positions. Another weighted ELM algorithm, namely ESOS-ELM, was proposed by Mirza et al. [28], inspired by WOS-ELM. ESOS-ELM aims to handle class imbalance learning (CIL) from a concept-drifting data stream. Another ensemble-based weighted ELM method was proposed by Zhang et al. [29], where the weight of each base learner in the ensemble is optimized by a differential evolution algorithm. In [30], the authors further improved the re-sampling strategies inside over-sampling-based online bagging (OOB) and under-sampling-based online bagging (UOB) in order to learn from class-imbalanced data.
Although much awareness of the imbalance problem has been raised, many of the key issues remain unresolved and are encountered even more frequently in massive data sets. How to determine the weight values is the key issue in designing a WELM. Different situations, such as noisy and outlier data, should also be considered.
Noisy and outlier data in a data set can be treated as a kind of indeterminacy. The neutrosophic set (NS) has been successfully applied to indeterminate information processing; it has demonstrated advantages in handling the indeterminacy in data and is a promising technique for data analysis and classification applications. NS provides an efficient and accurate way to describe imbalance information according to the attributes of the data.
In this study, we present a new weighted ELM scheme that uses neutrosophic c-means (NCM) clustering to overcome ELM's drawbacks on highly imbalanced data sets. NCM is a recently proposed clustering algorithm [31,32]. NCM determines a sample's belonging, noise, and indeterminacy memberships, which are then used to compute a weight value for that sample [31,32,33]. A weighted ELM is designed using the weights from NCM and is employed for imbalanced data set classification.
The rest of the paper is structured as follows. Section 2 briefly reviews the theory of ELM and weighted ELM and introduces the proposed method. Section 3 discusses the experiments and comparisons, and conclusions are drawn in Section 4.

2. Proposed Method

2.1. Extreme Learning Machine

Backpropagation, a gradient-based learning method, suffers from slow convergence. In addition, getting stuck in local minima is another disadvantage of gradient-based learning algorithms. ELM was proposed by Huang et al. [1] as an alternative that overcomes these shortcomings. ELM is designed as an SLFN in which the input weights and hidden biases are selected randomly. These weights do not need to be adjusted during training. The output weights are determined analytically with the Moore–Penrose generalized inverse of the hidden-layer output matrix.
Mathematically speaking, the output of the ELM with L hidden nodes and activation function g(·) can be written as:
$$o_i = \sum_{j=1}^{L}\beta_j\, g(a_j, b_j, x_i), \qquad i = 1, 2, \ldots, N, \qquad (1)$$
where $x_i$ is the $i$th input sample, $a_j = [a_{j1}, a_{j2}, \ldots, a_{jn}]^T$ is the input weight vector, $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jn}]^T$ is the output weight vector, $b_j$ is the bias of the $j$th hidden node, $o_i$ is the $i$th output node, and $N$ is the number of samples. If the ELM learns these $N$ samples with zero error, then Equation (1) can be rewritten as follows:
$$t_i = \sum_{j=1}^{L}\beta_j\, g(a_j, b_j, x_i), \qquad i = 1, 2, \ldots, N, \qquad (2)$$
where $t_i$ denotes the actual output vector. Equation (2) can be written compactly as shown in Equation (3):
$$H\beta = T, \qquad (3)$$
where $H = \{h_{ij}\}$, with $h_{ij} = g(a_j, b_j, x_i)$, is the hidden-layer output matrix. Thus, the output weight vector can be calculated analytically with the Moore–Penrose generalized inverse of the hidden-layer output matrix, as shown in Equation (4):
$$\hat{\beta} = H^{+}T, \qquad (4)$$
where $H^{+}$ is the Moore–Penrose generalized inverse of matrix $H$.
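For illustration, a minimal NumPy sketch of this training procedure is given below; the sigmoid activation, the uniform initialization range, and the function names are illustrative choices rather than the exact setup of [1]:

```python
import numpy as np

def elm_train(X, T, L=50, seed=0):
    """Minimal ELM sketch: random input weights/biases, sigmoid hidden layer,
    and output weights from the Moore-Penrose pseudo-inverse (Eq. (4))."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))   # random input weights a_j
    b = rng.uniform(-1.0, 1.0, size=L)                 # random hidden biases b_j
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))             # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                       # beta = H^+ T
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```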

2.2. Weighted Extreme Learning Machine

Let us consider a training data set $\{(x_i, t_i)\}$, $i = 1, \ldots, N$, belonging to two classes, where $x_i \in \mathbb{R}^n$ and $t_i$ are the class labels. In binary classification, $t_i$ is either $-1$ or $+1$. Then, an $N \times N$ diagonal weight matrix $W$ is considered, in which each diagonal entry $W_{ii}$ is associated with a training sample $x_i$. The weighting procedure generally assigns a larger $W_{ii}$ to $x_i$ if it comes from the minority class.
An optimization problem is employed to maximize the marginal distance and to minimize the weighted cumulative error as:
$$\text{Minimize: } \|H\beta - T\|^2 \ \text{and} \ \|\beta\|. \qquad (5)$$
Furthermore:
$$\text{Minimize: } L_{ELM} = \frac{1}{2}\|\beta\|^2 + CW\frac{1}{2}\sum_{i=1}^{N}\xi_i^2, \qquad (6)$$
$$\text{Subject to: } h(x_i)\beta = t_i^{T} - \xi_i^{T}, \quad i = 1, 2, \ldots, N, \qquad (7)$$
where $T = [t_1, \ldots, t_N]$, $\xi_i$ is the error vector, and $h(x_i)$ is the hidden-layer feature mapping with respect to $x_i$. By using the Lagrange multiplier method and the Karush–Kuhn–Tucker theorem, the dual optimization problem can be solved. Thus, the output weight vector $\beta$ of the hidden layer can be derived from Equation (7) using either the left or the right pseudo-inverse. When the presented data set is small, the right pseudo-inverse is recommended because it involves the inverse of an $N \times N$ matrix. Otherwise, the left pseudo-inverse is more suitable, since it is much easier to compute the matrix inversion of size $L \times L$ when $L$ is much smaller than $N$:
$$\text{When } N \text{ is small: } \beta = H^{T}\left(\frac{I}{C} + WHH^{T}\right)^{-1}WT, \qquad (8)$$
$$\text{When } N \text{ is large: } \beta = \left(\frac{I}{C} + H^{T}WH\right)^{-1}H^{T}WT. \qquad (9)$$
In the weighted ELM, the authors adopted two different weighting schemes. In the first one, the weights for the minority and majority classes are calculated as:
$$W_{minority} = \frac{1}{\#(t_i^{+})} \quad \text{and} \quad W_{majority} = \frac{1}{\#(t_i^{-})}, \qquad (10)$$
and, for the second one, the related weights are calculated as:
$$W_{minority} = \frac{0.618}{\#(t_i^{+})} \quad \text{and} \quad W_{majority} = \frac{1}{\#(t_i^{-})}. \qquad (11)$$
The reader may refer to [25] for detailed information about the determination of the weights.
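A minimal sketch of the weighted solution, assuming the small-N form of Equation (8) and the first weighting scheme of Equation (10), could look as follows (function and variable names are illustrative):

```python
import numpy as np

def welm_output_weights(H, T, y, C=1.0):
    """Weighted ELM sketch: each sample is weighted by 1/#(its class), Eq. (10),
    and beta is obtained from the N x N form of Eq. (8).
    H: (N, L) hidden-layer outputs, T: (N,) or (N, m) targets, y: labels in {-1, +1}."""
    N = H.shape[0]
    w = np.where(y == 1, 1.0 / np.sum(y == 1), 1.0 / np.sum(y == -1))
    W = np.diag(w)                                   # N x N diagonal weight matrix
    # beta = H^T (I/C + W H H^T)^(-1) W T
    beta = H.T @ np.linalg.solve(np.eye(N) / C + W @ H @ H.T, W @ T)
    return beta
```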

2.3. Neutrosophic Weighted Extreme Learning Machine

Weighted ELM assigns one common weight value to all samples in the minority class and another common weight value to all samples in the majority class. Although this procedure works quite well on some imbalanced data sets, assigning the same weight value to all samples of a class may not be a good choice for data sets that contain noise and outlier samples. In other words, to deal with noise and outlier samples in an imbalanced data set, a different weight value is needed for each sample in each class, reflecting the data point's significance within its class. Therefore, we present a novel method to determine the significance of each sample in its class. NCM clustering can determine a sample's belonging, noise, and indeterminacy memberships, which can then be used to compute a weight value for that sample.
Guo and Sengur [31] proposed the NCM clustering algorithm based on neutrosophic set theory [34,35,36,37]. In NCM, a new cost function was developed to overcome the weakness of the fuzzy c-means (FCM) method on noisy and outlier data points. In the NCM algorithm, two new types of rejection were developed, for noise and outlier rejection respectively. The objective function of NCM is given as follows:
$$J_{NCM}(T, I, F, C) = \sum_{i=1}^{N}\sum_{j=1}^{C}(\bar{w}_1 T_{ij})^m \|x_i - c_j\|^2 + \sum_{i=1}^{N}(\bar{w}_2 I_i)^m \|x_i - \bar{c}_{i\,max}\|^2 + \delta^2 \sum_{i=1}^{N}(\bar{w}_3 F_i)^m, \qquad (12)$$
where $m$ is a constant and, for each point $i$, $\bar{c}_{i\,max}$ is the mean of two cluster centers. $T_{ij}$, $I_i$ and $F_i$ are the membership values belonging to the determinate clusters, the boundary regions and the noisy data set, respectively, with $0 < T_{ij}, I_i, F_i < 1$:
$$\sum_{j=1}^{C} T_{ij} + I_i + F_i = 1. \qquad (13)$$
Thus, the related membership functions are calculated as follows:
$$T_{ij} = \frac{\bar{w}_2\bar{w}_3\,(x_i - c_j)^{-\frac{2}{m-1}}}{\sum_{j=1}^{C}(x_i - c_j)^{-\frac{2}{m-1}} + (x_i - \bar{c}_{i\,max})^{-\frac{2}{m-1}} + \delta^{-\frac{2}{m-1}}}, \qquad (14)$$
$$I_i = \frac{\bar{w}_1\bar{w}_3\,(x_i - \bar{c}_{i\,max})^{-\frac{2}{m-1}}}{\sum_{j=1}^{C}(x_i - c_j)^{-\frac{2}{m-1}} + (x_i - \bar{c}_{i\,max})^{-\frac{2}{m-1}} + \delta^{-\frac{2}{m-1}}}, \qquad (15)$$
$$F_i = \frac{\bar{w}_1\bar{w}_2\,\delta^{-\frac{2}{m-1}}}{\sum_{j=1}^{C}(x_i - c_j)^{-\frac{2}{m-1}} + (x_i - \bar{c}_{i\,max})^{-\frac{2}{m-1}} + \delta^{-\frac{2}{m-1}}}, \qquad (16)$$
$$c_j = \frac{\sum_{i=1}^{N}(\bar{w}_1 T_{ij})^m\, x_i}{\sum_{i=1}^{N}(\bar{w}_1 T_{ij})^m}, \qquad (17)$$
where $c_j$ denotes the center of cluster $j$, $\bar{w}_1$, $\bar{w}_2$, and $\bar{w}_3$ are weight factors, and $\delta$ is a data-dependent regularization factor [31]. Under the above definitions, every input sample in each minority and majority class is associated with a triple $(T_{ij}, I_i, F_i)$. A larger $T_{ij}$ means that the sample belongs to the labeled class with higher probability, a larger $I_i$ means that the sample is indeterminate with higher probability, and a larger $F_i$ means that the sample is more likely to be noise or an outlier.
After the NCM clustering procedure is applied, the weights for each sample of the minority and majority classes are obtained as follows:
$$W_{ii}^{minority} = \frac{C_r}{T_{ij} + I_i - F_i} \quad \text{and} \quad W_{ii}^{majority} = \frac{1}{T_{ij} + I_i - F_i}, \qquad (18)$$
$$C_r = \frac{\#(t_i^{-})}{\#(t_i^{+})}, \qquad (19)$$
where $C_r$ is the ratio of the number of samples in the majority class to the number of samples in the minority class.
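The following sketch illustrates how the NCM memberships of Equations (14)–(16) and the sample weights of Equations (18) and (19) could be computed for fixed cluster centers. The two-center handling of $\bar{c}_{i\,max}$, the cluster ordering, and all parameter defaults are assumptions for illustration, and the weight formula follows Equation (18) exactly as reconstructed above:

```python
import numpy as np

def ncm_memberships(X, centers, m=2.0, delta=0.1, w1=0.75, w2=0.125, w3=0.125):
    """Sketch of the NCM membership updates, Eqs. (14)-(16), for fixed centers.
    Distances stand in for the (x_i - c_j) terms; the exponent is -2/(m-1) as in [31].
    In the full NCM algorithm these updates alternate with the center update of Eq. (17)."""
    p = -2.0 / (m - 1.0)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # (N, C)
    d = np.maximum(d, 1e-12)                         # guard against zero distances
    # c_bar_imax: mean of the two nearest cluster centers of each sample
    # (an assumption for the two-class case used here)
    order = np.argsort(d, axis=1)[:, :2]
    c_imax = (centers[order[:, 0]] + centers[order[:, 1]]) / 2.0
    d_imax = np.maximum(np.linalg.norm(X - c_imax, axis=1), 1e-12)
    denom = (d ** p).sum(axis=1) + d_imax ** p + delta ** p
    T = w2 * w3 * (d ** p) / denom[:, None]
    I = w1 * w3 * (d_imax ** p) / denom
    F = w1 * w2 * (delta ** p) / denom
    return T, I, F

def nwelm_sample_weights(T, I, F, y, minority_label=1):
    """Sketch of Eqs. (18)-(19): W_ii = C_r / (T_ij + I_i - F_i) for minority samples
    and 1 / (T_ij + I_i - F_i) for majority samples, where cluster j is the cluster of
    the sample's own class (clusters assumed ordered as [-1, +1])."""
    Cr = np.sum(y != minority_label) / np.sum(y == minority_label)    # Eq. (19)
    j = (y == minority_label).astype(int)         # index of the sample's class cluster
    s = T[np.arange(len(y)), j] + I - F           # combination as given in Eq. (18)
    return np.where(y == minority_label, Cr / s, 1.0 / s)
```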
The algorithm of the neutrosophic weighted extreme learning machine (NWELM) is composed of four steps. The first step necessitates applying the NCM algorithm based on the pre-calculated cluster centers, according to the class labels of the input samples. Thus, the T, I and F membership values are determined for the next step. The related weights are calculated from the determined T, I and F membership values in the second step of the algorithm.
In Step 3, the ELM parameters are tuned and samples and weights are fed into the ELM in order to calculate the H matrix. The hidden layer weight vector β is calculated according to the H, W and class labels. Finally, the determination of the labels of the test data set is accomplished in the final step of the algorithm (Step 4).
The neutrosophic weighted extreme learning machine (NWELM) algorithm is given as following:
Input: 
Labelled training data set.
Output: 
Predicted class labels.
Step 1: 
Initialize the cluster centers according to the labelled data set and run NCM algorithm in order to obtain the T, I and F value for each data point.
Step 2: 
Compute W i i m i n o r i t y and W i i m a j o r i t y according to Equations (18) and (19).
Step 3: 
Adapt the ELM parameters and run NWELM. Compute H matrix and obtain β according to Equation (8) or Equation (9).
Step 4: 
Calculate the labels of the test data set based on $\beta$.
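Putting the four steps together, a compact end-to-end sketch (reusing the illustrative ncm_memberships and nwelm_sample_weights helpers above, with a plain sigmoid hidden layer in place of the RBF kernel and a single NCM pass) might look as follows; it is a simplified illustration, not the authors' implementation:

```python
import numpy as np

def nwelm_fit_predict(X_tr, y_tr, X_te, C=2.0**7, L=100, seed=0):
    """Sketch of Steps 1-4 of NWELM for labels in {-1, +1}, with +1 as the minority class."""
    # Step 1: class-wise mean vectors as the pre-calculated cluster centers, then NCM
    centers = np.vstack([X_tr[y_tr == c].mean(axis=0) for c in (-1, 1)])
    T, I, F = ncm_memberships(X_tr, centers)
    # Step 2: per-sample weights from the T, I, F memberships
    w = nwelm_sample_weights(T, I, F, y_tr, minority_label=1)
    # Step 3: random hidden layer, then the weighted solution of Eq. (8)
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1, 1, size=(X_tr.shape[1], L))
    b = rng.uniform(-1, 1, size=L)
    H = 1.0 / (1.0 + np.exp(-(X_tr @ A + b)))
    W = np.diag(w)
    beta = H.T @ np.linalg.solve(np.eye(len(w)) / C + W @ H @ H.T, W @ y_tr.astype(float))
    # Step 4: predict the labels of the test data set
    H_te = 1.0 / (1.0 + np.exp(-(X_te @ A + b)))
    return np.sign(H_te @ beta)
```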

3. Experimental Results

The geometric mean ( G m e a n ) is used to evaluate the performance of the proposed NWELM method. The G m e a n is computed as follows:
$$G_{mean} = \sqrt{R \cdot \frac{TN}{TN + FP}}, \qquad (20)$$
$$R = \frac{TP}{TP + FN}, \qquad (21)$$
where $R$ denotes the recall rate, and $TN$ and $FP$ denote true-negative and false-positive detections, respectively. $G_{mean}$ values lie in the range [0, 1]; the metric is the square root of the product of the positive class accuracy and the negative class accuracy. The performance of the NWELM classifier is evaluated on both toy data sets and real data sets. The five-fold cross-validation method is adopted in the experiments. In the hidden nodes of the NWELM, the radial basis function (RBF) kernel is considered. A grid search of the trade-off constant $C$ on $\{2^{-18}, 2^{-16}, \ldots, 2^{48}, 2^{50}\}$ and of the number of hidden nodes $L$ on $\{10, 20, \ldots, 990, 2000\}$ was conducted in seeking the optimal result using five-fold cross-validation. For the real data sets, the input attributes are normalized into $[-1, 1]$. In addition, the NCM parameters are chosen as $\varepsilon = 10^{-5}$, $\bar{w}_1 = 0.75$, $\bar{w}_2 = 0.125$, and $\bar{w}_3 = 0.125$, which were obtained by trial and error. The $\delta$ parameter of the NCM method is searched on $\{2^{-10}, 2^{-8}, \ldots, 2^{8}, 2^{10}\}$.
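A direct translation of Equations (20) and (21) into NumPy, assuming labels in {−1, +1} with +1 as the positive (minority) class, is sketched below:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """G_mean per Eqs. (20)-(21): sqrt(recall * specificity)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    recall = tp / (tp + fn)                  # R, Eq. (21)
    specificity = tn / (tn + fp)             # TN / (TN + FP)
    return np.sqrt(recall * specificity)     # Eq. (20)
```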

3.1. Experiments on Artificial Data Sets

Four two-class artificial imbalance data sets were used to evaluate the classification performance of the proposed NWELM scheme. The illustration of the data sets is shown in Figure 1 [38]. The decision boundary between classes is complicated. In Figure 1a, we illustrate the first artificial data set that follows a uniform distribution. As can be seen, the red circles of Figure 1a belong to the minority class, with the rest of the data samples shown by blue crosses as the majority class. The second imbalance data set, namely Gaussian-1, is obtained using two Gaussian distributions with a 1:9 ratio of samples as shown in Figure 1b. While the red circles illustrate the minority class, the blue cross samples show the majority class.
Another Gaussian distribution-based imbalance data set, namely Gaussian-2, is given in Figure 1c. This data set consists of nine Gaussian distributions with the same number of samples arranged in a 3 × 3 grid. The red circle samples located in the middle belong to the minority class while the blue cross samples belong to the majority class. Finally, Figure 1d shows the last artificial imbalance data set. It is known as a complex data set because it has a 1:9 ratio of samples for the minority and majority classes.
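For readers who wish to reproduce a similar toy setting, the following sketch generates a Gaussian-1-style set with a 1:9 class ratio; the means and spreads are illustrative assumptions, not the exact parameters of the data sets in [38]:

```python
import numpy as np

def make_gaussian_imbalanced(n=1000, minority_fraction=0.1, seed=0):
    """Illustrative two-dimensional imbalanced set: one large majority Gaussian
    and one small minority Gaussian, labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n_min = int(n * minority_fraction)
    n_maj = n - n_min
    X_maj = rng.normal(loc=(0.0, 0.0), scale=1.0, size=(n_maj, 2))   # majority class
    X_min = rng.normal(loc=(1.5, 1.5), scale=0.5, size=(n_min, 2))   # minority class
    X = np.vstack([X_maj, X_min])
    y = np.hstack([-np.ones(n_maj), np.ones(n_min)])
    return X, y
```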
Table 1 shows the G m e a n achieved by the two methods on these four data sets in ten independent runs. For Gaussian-1, Gaussian-2 and the Uniform artificial data sets, the proposed NWELM method yields better results when compared to the weighted ELM scheme; however, for the Complex artificial data sets, the weighted ELM method achieves better results. The better resulting cases are shown in bold text. It is worth mentioning that, for the Gaussian-2 data set, NWELM achieves a higher G m e a n across all trials.

3.2. Experiments on Real Data Set

In this section, we test the performance of the proposed NWELM method on real data sets [39]. A total of 21 data sets with different numbers of features, training and test samples, and imbalance ratios are shown in Table 2. The selected data sets can be categorized into two classes according to their imbalance ratios. The first class has imbalance ratios in the range of 0 to 0.2 and contains the yeast-1-2-8-9_vs_7, abalone9_18, glass-0-1-6_vs_2, vowel0, yeast-0-5-6-7-9_vs_4, page-blocks0, yeast3, ecoli2, new-thyroid1 and new-thyroid2 data sets.
The second class contains the data sets with imbalance ratios between 0.2 and 1, namely ecoli1, glass-0-1-2-3_vs_4-5-6, vehicle0, vehicle1, haberman, yeast1, glass0, iris0, pima, wisconsin and glass1.
The comparison results of the proposed NWELM with the weighted ELM, unweighted ELM and SVM are given in Table 3. As the weighted ELM method uses two different weighting schemes (W1, W2), we report the higher G_mean value of the two in our comparisons. As can be seen in Table 3, the NWELM method yields higher G_mean values for 17 of the imbalanced data sets. For three of the data sets, both methods yield the same G_mean, and only for the page-blocks0 data set does the weighted ELM method yield a better result. It is worth mentioning that the NWELM method achieves 100% G_mean values for four data sets (vowel0, new-thyroid1, new-thyroid2, iris0). In addition, NWELM produces higher G_mean values than SVM for all data sets.
The obtained results were further evaluated with area under curve (AUC) values [40]. We compared the proposed method with the unweighted ELM, weighted ELM and SVM based on the achieved AUC values, as tabulated in Table 4. As seen in Table 4, for all examined data sets, the AUC values of our proposal are higher than those of the compared methods. For a more rigorous comparison of the proposed method with the unweighted ELM, weighted ELM and SVM methods, statistical tests on the AUC results were considered. The paired t-test was chosen [41]. The paired t-test results between each compared method and the proposed method are tabulated in Table 5 in terms of p-values. In Table 5, results showing a significant advantage for the proposed method are those with p-values equal to or smaller than 0.05. Accordingly, the proposed method performed better than the other methods in 39 out of 63 tests when each data set and each pair of methods is considered separately.
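As an illustration of the statistical comparison, the paired t-test can be computed with SciPy as sketched below; the AUC values shown are placeholders, not the values behind Table 4:

```python
import numpy as np
from scipy import stats

# Hypothetical per-run AUC values for two methods on one data set.
auc_nwelm = np.array([0.95, 0.93, 0.96, 0.94, 0.95])
auc_welm  = np.array([0.90, 0.88, 0.91, 0.89, 0.90])

t_stat, p_value = stats.ttest_rel(auc_nwelm, auc_welm)   # paired t-test [41]
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")            # p <= 0.05 -> significant
```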
Another statistical test, namely the Friedman aligned ranks test, was applied to compare the obtained AUC values [42]. This test is non-parametric, and the Holm method was chosen as the post hoc control method. The significance level was set to 0.05. The statistics were obtained with the STAC tool [43] and are recorded in Table 6. According to these results, the highest rank value was obtained by the proposed NWELM method, and the SVM and WELM rank values were greater than that of the ELM. In addition, the comparison statistics, adjusted p-values and hypothesis results are given in Table 6.
We further compared the proposed NWELM method with two ensemble-based weighted ELM methods on 12 data sets [29]. The obtained results and the average classification G m e a n values are recorded in Table 7. The best classification result for each data set is shown in bold text. A global view on the average classification performance shows that the NWELM yielded the highest average G m e a n value against both the ensemble-based weighted ELM methods. In addition, the proposed NWELM method evidently outperforms the other two compared algorithms in terms of G m e a n in 10 out of 12 data sets, with the only exceptions being the yeast3 and glass2 data sets.
As can be seen through careful observation, the NWELM method does not improve the performance significantly on the glass1, haberman, yeast1_7 and abalone9_18 data sets, but it still slightly outperforms both ensemble-based weighted ELM methods on them.
A box plot illustration of the compared methods is shown in Figure 2. The box generated by NWELM is shorter than the boxes generated by the compared vote-based ensemble and differential evolution (DE)-based ensemble methods, i.e., the dispersion of the NWELM results is relatively low. It is worth noting that, for all methods, the box plots treat the G_mean of the haberman data set as an outlier. Finally, the box plots indicate that the proposed NWELM method is more robust than the ensemble-based weighted ELM methods.

4. Conclusions

In this paper, we propose a new weighted ELM model based on neutrosophic clustering. This new weighting scheme introduces the truth, indeterminacy and falsity memberships of each data point into the ELM. Thus, we can remove the effect of noise and outliers in the classification stage and obtain better classification results. Moreover, the proposed NWELM scheme can handle the class imbalance problem more effectively. In the evaluation experiments, we compare the performance of the NWELM method with weighted ELM, unweighted ELM, and two ensemble-based weighted ELM methods. The experimental results demonstrate that NWELM is more effective than the compared methods on both artificial and real binary imbalanced data sets. In the future, we plan to extend our study to multiclass imbalance learning.

Author Contributions

Abdulkadir Sengur provided the idea of the proposed method. Yanhui Guo and Florentin Smarandache proved the theorems. Yaman Akbulut analyzed the model’s application. Abdulkadir Sengur, Yanhui Guo, Yaman Akbulut, and Florentin Smarandache wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  2. Miche, Y.; Van Heeswijk, M.; Bas, P.; Simula, O.; Lendasse, A. TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization. Neurocomputing 2011, 74, 2413–2421. [Google Scholar] [CrossRef]
  3. Deng, W.; Zheng, Q.; Chen, L. Regularized Extreme Learning Machine. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; pp. 389–395. [Google Scholar]
  4. Martínez-Martínez, J.M.; Escandell-Montero, P.; Soria-Olivas, E.; Martín-Guerrero, J.D.; Magdalena-Benedito, R.; Gómez-Sanchis, J. Regularized extreme learning machine for regression problems. Neurocomputing 2011, 74, 3716–3721. [Google Scholar] [CrossRef]
  5. Miche, Y.; Sorjamaa, A.; Bas, P.; Simula, O.; Jutten, C.; Lendasse, A. OP-ELM: Optimally pruned extreme learning machine. IEEE Trans. Neural Netw. 2010, 21, 158–162. [Google Scholar] [CrossRef] [PubMed]
  6. Vajda, S.; Fink, G.A. Strategies for Training Robust Neural Network Based Digit Recognizers on Unbalanced Data Sets. In Proceedings of the 2010 International Conference on Frontiers in Handwriting Recognition (ICFHR), Kolkata, India, 16–18 November 2010; pp. 148–153. [Google Scholar]
  7. Mirza, B.; Lin, Z.; Toh, K.A. Weighted online sequential extreme learning machine for class imbalance learning. Neural Process. Lett. 2013, 38, 465–486. [Google Scholar] [CrossRef]
  8. Beyan, C.; Fisher, R. Classifying imbalanced data sets using similarity based hierarchical decomposition. Pattern Recognit. 2015, 48, 1653–1672. [Google Scholar] [CrossRef]
  9. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar]
  10. Pazzani, M.; Merz, C.; Murphy, P.; Ali, K.; Hume, T.; Brunk, C. Reducing Misclassification Costs. In Proceedings of the Eleventh International Conference on Machine Learning, New Brunswick, NJ, USA, 10–13 July 1994; pp. 217–225. [Google Scholar]
  11. Japkowicz, N. The Class Imbalance Problem: Significance and Strategies. In Proceedings of the 2000 International Conference on Artificial Intelligence (IC-AI’2000), Las Vegas, NV, USA, 26–29 June 2000. [Google Scholar]
  12. Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2012, 42, 463–484. [Google Scholar] [CrossRef]
  13. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar]
  14. Hu, S.; Liang, Y.; Ma, L.; He, Y. MSMOTE: Improving Classification Performance when Training Data is Imbalanced. In Proceedings of the Second International Workshop on Computer Science and Engineering (WCSE’09), Qingdao, China, 28–30 October 2009; IEEE: Washington, DC, USA, 2009; Volume 2, pp. 13–17. [Google Scholar]
  15. Chawla, N.; Lazarevic, A.; Hall, L.; Bowyer, K. SMOTEBoost: Improving Prediction of the Minority Class in Boosting. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22–26 September 2003; pp. 107–119. [Google Scholar]
  16. Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 2014, 26, 405–425. [Google Scholar] [CrossRef]
  17. Radivojac, P.; Chawla, N.V.; Dunker, A.K.; Obradovic, Z. Classification and knowledge discovery in protein databases. J. Biomed. Inf. 2004, 37, 224–239. [Google Scholar] [CrossRef] [PubMed]
  18. Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2009, 39, 539–550. [Google Scholar]
  19. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2010, 40, 185–197. [Google Scholar] [CrossRef]
  20. Wang, B.X.; Japkowicz, N. Boosting support vector machines for imbalanced data sets. Knowl. Inf. Syst. 2010, 25, 1–20. [Google Scholar] [CrossRef]
  21. Tan, S. Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Syst. Appl. 2005, 28, 667–671. [Google Scholar] [CrossRef]
  22. Fumera, G.; Roli, F. Cost-sensitive learning in support vector machines. In Proceedings of the VIII Convegno Associazione Italiana per l’Intelligenza Artificiale, Siena, Italy, 10–13 September 2002. [Google Scholar]
  23. Drummond, C.; Holte, R.C. Exploiting the cost (in) sensitivity of decision tree splitting criteria. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), Stanford, CA, USA, 29 June–2 July 2000; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2000; Volume 1, pp. 239–246. [Google Scholar]
  24. Williams, D.P.; Myers, V.; Silvious, M.S. Mine classification with imbalanced data. IEEE Geosci. Remote Sens. Lett. 2009, 6, 528–532. [Google Scholar] [CrossRef]
  25. Zong, W.; Huang, G.B.; Chen, Y. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013, 101, 229–242. [Google Scholar] [CrossRef]
  26. Czarnecki, W.M. Weighted tanimoto extreme learning machine with case study in drug discovery. IEEE Comput. Intell. Mag. 2015, 10, 19–29. [Google Scholar] [CrossRef]
  27. Man, Z.; Lee, K.; Wang, D.; Cao, Z.; Khoo, S. An optimal weight learning machine for handwritten digit image recognition. Signal Process. 2013, 93, 1624–1638. [Google Scholar] [CrossRef]
  28. Mirza, B.; Lin, Z.; Liu, N. Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift. Neurocomputing 2015, 149, 316–329. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Liu, B.; Cai, J.; Zhang, S. Ensemble weighted extreme learning machine for imbalanced data classification based on differential evolution. Neural Comput. Appl. 2016, 1–9. [Google Scholar] [CrossRef]
  30. Wang, S.; Minku, L.L.; Yao, X. Resampling-based ensemble methods for online class imbalance learning. IEEE Trans. Knowl. Data Eng. 2015, 27, 1356–1368. [Google Scholar] [CrossRef]
  31. Guo, Y.; Sengur, A. NCM: Neutrosophic c-means clustering algorithm. Pattern Recognit. 2015, 48, 2710–2724. [Google Scholar] [CrossRef]
  32. Guo, Y.; Sengur, A. NECM: Neutrosophic evidential c-means clustering algorithm. Neural Comput. Appl. 2015, 26, 561–571. [Google Scholar] [CrossRef]
  33. Guo, Y.; Sengur, A. A novel 3D skeleton algorithm based on neutrosophic cost function. Appl. Soft Comput. 2015, 36, 210–217. [Google Scholar] [CrossRef]
  34. Smarandache, F. A Unifying Field in Logics: Neutrosophic Logic. Neutrosophic Probability, Neutrosophic Set. In Proceedings of the 2000 Western Section Meeting (Meeting #951), Preliminary Report, Santa Barbara, CA, USA, 11–12 March 2000; Volume 951, pp. 11–12. [Google Scholar]
  35. Smarandache, F. Introduction to Neutrosophic Measure, Neutrosophic Integral, and Neutrosophic Probability; Sitech: Craiova, Romania, 2013. [Google Scholar]
  36. Smarandache, F. Neutrosophy, Neutrosophic Probability, Set, and Logic; American Research Press: Rehoboth, DE, USA, 1998; p. 105. [Google Scholar]
  37. Smarandache, F. A Unifying Field in Logics: Neutrosophic Logic. Neutrosophy, Neutrosophic Set, Neutrosophic Probability: Neutrsophic Logic. Neutrosophy, Neutrosophic Set, Neutrosophic Probability; Infinite Study: Ann Arbor, MI, USA, 2005; ISBN 9781599730806. [Google Scholar]
  38. Ng, W.W.; Hu, J.; Yeung, D.S.; Yin, S.; Roli, F. Diversified sensitivity-based undersampling for imbalance classification problems. IEEE Trans. Cybern. 2015, 45, 2402–2412. [Google Scholar] [CrossRef] [PubMed]
  39. Alcalá-Fdez, J.; Fernández, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
  40. Huang, J.; Ling, C.X. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 2005, 17, 299–310. [Google Scholar] [CrossRef]
  41. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  42. Hodges, J., Jr.; Lehmann, E. Rank methods for combination of independent experiments in analysis of variance. In Selected Works of E.L. Lehmann; Selected Works in Probability and Statistics; Springer: Boston, MA, USA, 2012; pp. 403–418. [Google Scholar]
  43. Rodríguez-Fdez, I.; Canosa, A.; Mucientes, M.; Bugarín, A. STAC: A Web Platform for the Comparison of Algorithms Using Statistical Tests. In Proceedings of the 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Istanbul, Turkey, 2–5 August 2015; pp. 1–8. [Google Scholar]
Figure 1. Four 2-dimensional artificial imbalance data sets (X_1, X_2): (a) uniform; (b) Gaussian-1; (c) Gaussian-2; and (d) complex.
Figure 2. Box plots illustration of the compared methods.
Table 1. Comparison of weighted extreme learning machine (ELM) vs. NWELM on artificial data sets.

| Data Set | Weighted ELM G_mean | NWELM G_mean | Data Set | Weighted ELM G_mean | NWELM G_mean |
|---|---|---|---|---|---|
| Gaussian-1-1 | 0.9811 | 0.9822 | Gaussian-2-1 | 0.9629 | 0.9734 |
| Gaussian-1-2 | 0.9843 | 0.9855 | Gaussian-2-2 | 0.9551 | 0.9734 |
| Gaussian-1-3 | 0.9944 | 0.9955 | Gaussian-2-3 | 0.9670 | 0.9747 |
| Gaussian-1-4 | 0.9866 | 0.9967 | Gaussian-2-4 | 0.9494 | 0.9649 |
| Gaussian-1-5 | 0.9866 | 0.9833 | Gaussian-2-5 | 0.9467 | 0.9724 |
| Gaussian-1-6 | 0.9899 | 0.9685 | Gaussian-2-6 | 0.9563 | 0.9720 |
| Gaussian-1-7 | 0.9833 | 0.9685 | Gaussian-2-7 | 0.9512 | 0.9629 |
| Gaussian-1-8 | 0.9967 | 0.9978 | Gaussian-2-8 | 0.9644 | 0.9785 |
| Gaussian-1-9 | 0.9944 | 0.9798 | Gaussian-2-9 | 0.9441 | 0.9559 |
| Gaussian-1-10 | 0.9846 | 0.9898 | Gaussian-2-10 | 0.9402 | 0.9623 |
| Uniform-1 | 0.9836 | 0.9874 | Complex-1 | 0.9587 | 0.9481 |
| Uniform-2 | 0.9798 | 0.9750 | Complex-2 | 0.9529 | 0.9466 |
| Uniform-3 | 0.9760 | 0.9823 | Complex-3 | 0.9587 | 0.9608 |
| Uniform-4 | 0.9811 | 0.9836 | Complex-4 | 0.9482 | 0.9061 |
| Uniform-5 | 0.9811 | 0.9823 | Complex-5 | 0.9587 | 0.9297 |
| Uniform-6 | 0.9772 | 0.9772 | Complex-6 | 0.9409 | 0.9599 |
| Uniform-7 | 0.9734 | 0.9403 | Complex-7 | 0.9644 | 0.9563 |
| Uniform-8 | 0.9785 | 0.9812 | Complex-8 | 0.9575 | 0.9553 |
| Uniform-9 | 0.9836 | 0.9762 | Complex-9 | 0.9551 | 0.9446 |
| Uniform-10 | 0.9695 | 0.9734 | Complex-10 | 0.9351 | 0.9470 |
Table 2. Real data sets and their attributes.

| Data Set | Features (#) | Training Data (#) | Test Data (#) | Imbalance Ratio |
|---|---|---|---|---|
| yeast-1-2-8-9_vs_7 | 8 | 757 | 188 | 0.0327 |
| abalone9_18 | 8 | 584 | 147 | 0.0600 |
| glass-0-1-6_vs_2 | 9 | 153 | 39 | 0.0929 |
| vowel0 | 13 | 790 | 198 | 0.1002 |
| yeast-0-5-6-7-9_vs_4 | 8 | 422 | 106 | 0.1047 |
| page-blocks0 | 10 | 4377 | 1095 | 0.1137 |
| yeast3 | 8 | 1187 | 297 | 0.1230 |
| ecoli2 | 7 | 268 | 68 | 0.1806 |
| new-thyroid1 | 5 | 172 | 43 | 0.1944 |
| new-thyroid2 | 5 | 172 | 43 | 0.1944 |
| ecoli1 | 7 | 268 | 68 | 0.2947 |
| glass-0-1-2-3_vs_4-5-6 | 9 | 171 | 43 | 0.3053 |
| vehicle0 | 18 | 676 | 170 | 0.3075 |
| vehicle1 | 18 | 676 | 170 | 0.3439 |
| haberman | 3 | 244 | 62 | 0.3556 |
| yeast1 | 8 | 1187 | 297 | 0.4064 |
| glass0 | 9 | 173 | 43 | 0.4786 |
| iris0 | 4 | 120 | 30 | 0.5000 |
| pima | 8 | 614 | 154 | 0.5350 |
| wisconsin | 9 | 546 | 137 | 0.5380 |
| glass1 | 9 | 173 | 43 | 0.5405 |
Table 3. Experimental results of binary data sets in terms of the G_mean. The best results on each data set are emphasized in bold-face. Unweighted ELM and weighted ELM use the Gaussian kernel; the neutrosophic weighted ELM uses the radial base kernel.

| Data Set (Imbalance Ratio) | Unweighted ELM C | Unweighted ELM G_mean (%) | Weighted ELM max(W1, W2) C | Weighted ELM G_mean (%) | SVM G_mean (%) | NWELM C | NWELM G_mean (%) |
|---|---|---|---|---|---|---|---|
| yeast-1-2-8-9_vs_7 (0.0327) | 2^48 | 60.97 | 2^4 | 71.41 | 47.88 | 2^7 | 77.57 |
| abalone9_18 (0.0600) | 2^18 | 72.71 | 2^28 | 89.76 | 51.50 | 2^23 | 94.53 |
| glass-0-1-6_vs_2 (0.0929) | 2^50 | 63.20 | 2^32 | 83.59 | 51.26 | 2^7 | 91.86 |
| vowel0 (0.1002) | 2^18 | 100.00 | 2^18 | 100.00 | 99.44 | 2^7 | 100.00 |
| yeast-0-5-6-7-9_vs_4 (0.1047) | 2^6 | 68.68 | 2^4 | 82.21 | 62.32 | 2^10 | 85.29 |
| page-blocks0 (0.1137) | 2^4 | 89.62 | 2^16 | 93.61 | 87.72 | 2^20 | 93.25 |
| yeast3 (0.1230) | 2^44 | 84.13 | 2^48 | 93.11 | 84.71 | 2^3 | 93.20 |
| ecoli2 (0.1806) | 2^18 | 94.31 | 2^8 | 94.43 | 92.27 | 2^10 | 95.16 |
| new-thyroid1 (0.1944) | 2^0 | 99.16 | 2^14 | 99.72 | 96.75 | 2^7 | 100.00 |
| new-thyroid2 (0.1944) | 2^2 | 99.44 | 2^12 | 99.72 | 98.24 | 2^7 | 100.00 |
| ecoli1 (0.2947) | 2^0 | 88.75 | 2^10 | 91.04 | 87.73 | 2^20 | 92.10 |
| glass-0-1-2-3_vs_4-5-6 (0.3053) | 2^10 | 93.26 | 2^18 | 95.41 | 91.84 | 2^7 | 95.68 |
| vehicle0 (0.3075) | 2^8 | 99.36 | 2^20 | 99.36 | 96.03 | 2^10 | 99.36 |
| vehicle1 (0.3439) | 2^18 | 80.60 | 2^24 | 86.74 | 66.04 | 2^10 | 88.06 |
| haberman (0.3556) | 2^42 | 57.23 | 2^14 | 66.26 | 37.35 | 2^7 | 67.34 |
| yeast1 (0.4064) | 2^0 | 65.45 | 2^10 | 73.17 | 61.05 | 2^10 | 73.19 |
| glass0 (0.4786) | 2^0 | 85.35 | 2^0 | 85.65 | 79.10 | 2^13 | 85.92 |
| iris0 (0.5000) | 2^18 | 100.00 | 2^18 | 100.00 | 98.97 | 2^10 | 100.00 |
| pima (0.5350) | 2^0 | 71.16 | 2^8 | 75.58 | 70.17 | 2^10 | 76.35 |
| wisconsin (0.5380) | 2^2 | 97.18 | 2^8 | 97.70 | 95.67 | 2^7 | 98.22 |
| glass1 (0.5405) | 2^18 | 77.48 | 2^2 | 80.35 | 69.64 | 2^17 | 81.77 |
Table 4. Experimental results of binary data sets in terms of the average area under curve (AUC). The best results on each data set are emphasized in bold-face. Unweighted ELM and weighted ELM use the Gaussian kernel; the neutrosophic weighted ELM uses the radial base kernel.

| Data Set (Imbalance Ratio) | Unweighted ELM C | Unweighted ELM AUC (%) | Weighted ELM max(W1, W2) C | Weighted ELM AUC (%) | SVM AUC (%) | NWELM C | NWELM AUC (%) |
|---|---|---|---|---|---|---|---|
| yeast-1-2-8-9_vs_7 (0.0327) | 2^48 | 61.48 | 2^4 | 65.53 | 56.67 | 2^7 | 74.48 |
| abalone9_18 (0.0600) | 2^18 | 73.05 | 2^28 | 89.28 | 56.60 | 2^23 | 95.25 |
| glass-0-1-6_vs_2 (0.0929) | 2^50 | 67.50 | 2^32 | 61.14 | 53.05 | 2^7 | 93.43 |
| vowel0 (0.1002) | 2^18 | 93.43 | 2^18 | 99.22 | 99.44 | 2^7 | 99.94 |
| yeast-0-5-6-7-9_vs_4 (0.1047) | 2^6 | 66.35 | 2^4 | 80.09 | 69.88 | 2^10 | 82.11 |
| page-blocks0 (0.1137) | 2^4 | 67.42 | 2^16 | 71.55 | 88.38 | 2^20 | 91.49 |
| yeast3 (0.1230) | 2^44 | 69.28 | 2^48 | 90.92 | 83.92 | 2^3 | 93.15 |
| ecoli2 (0.1806) | 2^18 | 71.15 | 2^8 | 94.34 | 92.49 | 2^10 | 94.98 |
| new-thyroid1 (0.1944) | 2^0 | 90.87 | 2^14 | 98.02 | 96.87 | 2^7 | 100.00 |
| new-thyroid2 (0.1944) | 2^2 | 84.29 | 2^12 | 96.63 | 98.29 | 2^7 | 100.00 |
| ecoli1 (0.2947) | 2^0 | 66.65 | 2^10 | 90.28 | 88.16 | 2^20 | 92.18 |
| glass-0-1-2-3_vs_4-5-6 (0.3053) | 2^10 | 88.36 | 2^18 | 93.94 | 92.02 | 2^7 | 95.86 |
| vehicle0 (0.3075) | 2^8 | 71.44 | 2^20 | 62.41 | 96.11 | 2^10 | 98.69 |
| vehicle1 (0.3439) | 2^18 | 58.43 | 2^24 | 51.80 | 69.10 | 2^10 | 88.63 |
| haberman (0.3556) | 2^42 | 68.11 | 2^14 | 55.44 | 54.05 | 2^7 | 72.19 |
| yeast1 (0.4064) | 2^0 | 56.06 | 2^10 | 70.03 | 66.01 | 2^10 | 73.66 |
| glass0 (0.4786) | 2^0 | 74.22 | 2^0 | 75.99 | 79.81 | 2^13 | 81.41 |
| iris0 (0.5000) | 2^18 | 100.00 | 2^18 | 100.00 | 99.00 | 2^10 | 100.00 |
| pima (0.5350) | 2^0 | 59.65 | 2^8 | 50.01 | 71.81 | 2^10 | 75.21 |
| wisconsin (0.5380) | 2^2 | 83.87 | 2^8 | 80.94 | 95.68 | 2^7 | 98.01 |
| glass1 (0.5405) | 2^18 | 75.25 | 2^2 | 80.46 | 72.32 | 2^17 | 81.09 |
Table 5. Paired t-test results (p-values) between each method and the proposed method for the AUC results.

| Data Set (Imbalance Ratio) | Unweighted ELM | Weighted ELM | SVM |
|---|---|---|---|
| yeast-1-2-8-9_vs_7 (0.0327) | 0.0254 | 0.0561 | 0.0018 |
| abalone9_18 (0.0600) | 0.0225 | 0.0832 | 0.0014 |
| glass-0-1-6_vs_2 (0.0929) | 0.0119 | 0.0103 | 0.0006 |
| vowel0 (0.1002) | 0.0010 | 0.2450 | 0.4318 |
| yeast-0-5-6-7-9_vs_4 (0.1047) | 0.0218 | 0.5834 | 0.0568 |
| page-blocks0 (0.1137) | 0.0000 | 0.0000 | 0.0195 |
| yeast3 (0.1230) | 0.0008 | 0.0333 | 0.0001 |
| ecoli2 (0.1806) | 0.0006 | 0.0839 | 0.0806 |
| new-thyroid1 (0.1944) | 0.0326 | 0.2089 | 0.1312 |
| new-thyroid2 (0.1944) | 0.0029 | 0.0962 | 0.2855 |
| ecoli1 (0.2947) | 0.0021 | 0.1962 | 0.0744 |
| glass-0-1-2-3_vs_4-5-6 (0.3053) | 0.0702 | 0.4319 | 0.0424 |
| vehicle0 (0.3075) | 0.0000 | 0.0001 | 0.0875 |
| vehicle1 (0.3439) | 0.0000 | 0.0000 | 0.0001 |
| haberman (0.3556) | 0.1567 | 0.0165 | 0.0007 |
| yeast1 (0.4064) | 0.0001 | 0.0621 | 0.0003 |
| glass0 (0.4786) | 0.0127 | 0.1688 | 0.7072 |
| iris0 (0.5000) | NaN | NaN | 0.3739 |
| pima (0.5350) | 0.0058 | 0.0000 | 0.0320 |
| wisconsin (0.5380) | 0.0000 | 0.0002 | 0.0071 |
| glass1 (0.5405) | 0.0485 | 0.8608 | 0.0293 |
Table 6. Friedman Aligned Ranks test (significance level of 0.05).

| Statistic | p-Value | Result |
|---|---|---|
| 29.6052 | 0.0000 | H0 is rejected |

Ranking:

| Algorithm | Rank |
|---|---|
| ELM | 21.7619 |
| WELM | 38.9047 |
| SVM | 41.5238 |
| NWELM | 67.8095 |

| Comparison | Statistic | Adjusted p-Value | Result |
|---|---|---|---|
| NWELM vs. ELM | 6.1171 | 0.0000 | H0 is rejected |
| NWELM vs. WELM | 3.8398 | 0.0003 | H0 is rejected |
| NWELM vs. SVM | 3.4919 | 0.0005 | H0 is rejected |
Table 7. Comparison of the proposed method with two ensemble-based weighted ELM methods.

| Data Set | Vote-Based Ensemble C | Vote-Based Ensemble G_mean (%) | DE-Based Ensemble C | DE-Based Ensemble G_mean (%) | NWELM C | NWELM G_mean (%) |
|---|---|---|---|---|---|---|
| glass1 | 2^30 | 74.32 | 2^18 | 77.72 | 2^17 | 81.77 |
| haberman | 2^12 | 63.10 | 2^28 | 62.68 | 2^7 | 67.34 |
| ecoli1 | 2^40 | 89.72 | 2^0 | 91.39 | 2^20 | 92.10 |
| new-thyroid2 | 2^10 | 99.47 | 2^32 | 99.24 | 2^7 | 100.00 |
| yeast3 | 2^4 | 94.25 | 2^2 | 94.57 | 2^3 | 93.20 |
| ecoli3 | 2^10 | 88.68 | 2^18 | 89.50 | 2^17 | 92.16 |
| glass2 | 2^8 | 86.45 | 2^16 | 87.51 | 2^7 | 85.58 |
| yeast1_7 | 2^20 | 78.95 | 2^38 | 78.94 | 2^6 | 84.66 |
| ecoli4 | 2^8 | 96.33 | 2^14 | 96.77 | 2^10 | 98.85 |
| abalone9_18 | 2^4 | 89.24 | 2^16 | 90.13 | 2^23 | 94.53 |
| glass5 | 2^18 | 94.55 | 2^12 | 94.55 | 2^7 | 95.02 |
| yeast5 | 2^12 | 94.51 | 2^28 | 94.59 | 2^17 | 98.13 |
| Average |  | 87.46 |  | 88.13 |  | 90.53 |
