Computer Science (计算机科学) ›› 2015, Vol. 42 ›› Issue (9): 249-252. doi: 10.11896/j.issn.1002-137X.2015.09.048
李克文,杨磊,刘文英,刘璐,刘洪太
LI Ke-wen, YANG Lei, LIU Wen-ying, LIU Lu and LIU Hong-tai
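Abstract: Classification of imbalanced data arises in many application domains and has become a research hotspot in data mining and machine learning. This paper proposes RSBoost, a new classification method for imbalanced data, to address the low minority-class recognition rate and low classification efficiency of traditional classification methods. The method first oversamples the minority class with SMOTE, then applies random undersampling to the entire data set to reduce its imbalance, and finally combines the resampled data with a Boosting algorithm for classification. Experiments comparing five methods on several public data sets, in terms of both classification performance and classification efficiency, show that the proposed method achieves a high recognition rate and high classification efficiency.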
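The resampling stage described in the abstract (SMOTE-style oversampling of the minority class, followed by random undersampling of the whole data set) can be sketched roughly as below. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names `smote_oversample`, `random_undersample`, and `rs_resample`, and the parameters `n_new`, `keep`, `k`, and `seed`, are all hypothetical, and the abstract does not say how the resampled data is fed to the Boosting algorithm, so that step is omitted.

```python
import random


def smote_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE idea). Needs at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def random_undersample(samples, labels, keep, rng=None):
    """Keep `keep` samples drawn uniformly at random without replacement."""
    if rng is None:
        rng = random.Random(0)
    chosen = rng.sample(range(len(samples)), keep)
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


def rs_resample(samples, labels, minority_label, n_new, keep, k=3, seed=0):
    """Assumed pipeline: oversample the minority class first, then randomly
    undersample the combined data set down to `keep` samples."""
    rng = random.Random(seed)
    minority = [x for x, y in zip(samples, labels) if y == minority_label]
    synthetic = smote_oversample(minority, n_new, k=k, rng=rng)
    all_x = list(samples) + synthetic
    all_y = list(labels) + [minority_label] * len(synthetic)
    return random_undersample(all_x, all_y, keep, rng=rng)
```

Under these assumptions, the output of `rs_resample` would serve as the training set for a boosted classifier. How many synthetic samples to add and how many samples to keep are tuning choices that the abstract does not specify.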