Computer Science (计算机科学), 2016, Vol. 43, Issue (8): 190-193. doi: 10.11896/j.issn.1002-137X.2016.08.038
WU De, LIU San-yang and LIANG Jin-jin (吴德, 刘三阳, 梁锦锦)
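Abstract: Traditional multi-class text classification algorithms suffer from heavy computation and long training times. To address this, we build a classification algorithm for multi-class text, GS-SVDD, from the golden section (GS) method and support vector domain description (SVDD). GS-SVDD first uses the term frequency-inverse document frequency (TF-IDF) formula to compute the relative frequency of each term, sorts the terms in descending order of this value, and normalizes the resulting text vectors. Next, it applies the golden section method to reduce the dimensionality of the text vectors so that at most one redundant sample feature remains. Finally, it performs multi-class classification with SVDD, assigning a test text to the class with the smallest relative class distance. Numerical experiments on different data sets show that GS-SVDD has better stability, higher classification accuracy, and shorter training time than one-against-one and one-against-all support vector machines, which makes it better suited to multi-class classification of massive text collections.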
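The first GS-SVDD step (TF-IDF weighting, descending term ranking, vector normalization) can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the authors' code. The abstract does not specify several details, so we chose them: the TF-IDF variant (tf = count / document length, idf = log(N / df)), ranking terms by their summed weight over the corpus, and L2 normalization. `tfidf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight tokenized documents with TF-IDF, rank terms, and L2-normalize.

    docs: list of non-empty token lists.
    Returns (vocab, vectors): vocab is sorted by descending corpus-wide weight,
    and vectors[i][j] is the normalized weight of vocab[j] in docs[i].
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Classic idf = log(N / df); smoothed variants are equally plausible.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    weighted = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency normalized by document length, times idf.
        weighted.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})

    # Assumed ranking criterion: total weight of a term over the whole corpus.
    total = {}
    for vec in weighted:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    vocab = sorted(total, key=lambda t: (-total[t], t))  # ties broken alphabetically

    vectors = []
    for vec in weighted:
        dense = [vec.get(t, 0.0) for t in vocab]
        norm = math.sqrt(sum(w * w for w in dense))
        # A document whose terms all occur in every document has norm 0; keep it as is.
        vectors.append([w / norm for w in dense] if norm > 0 else dense)
    return vocab, vectors


docs = [["svm", "text", "text"], ["svdd", "text"], ["golden", "section", "svdd"]]
vocab, vectors = tfidf_vectors(docs)
print(vocab)  # ['text', 'golden', 'section', 'svm', 'svdd']
```

Because the vocabulary is ranked, keeping the first k columns of every vector keeps the k highest-weighted terms. This ordering is what a cutoff-based reduction step can work with.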
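The abstract does not detail the golden section reduction step. Here we assume one plausible reading: a golden-section search over the cutoff k in the ranked term list, keeping only the top-k terms. The sketch below (hypothetical helper `golden_section_cutoff`) maximizes a caller-supplied `score(k)`, for example validation accuracy. The search is only valid if that score is unimodal in k. The score function and the stopping rule (shrink the bracket to at most three cutoffs, then check them directly) are our own assumptions, not the paper's procedure.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # about 0.618, the golden-section ratio


def golden_section_cutoff(score, lo, hi):
    """Return the integer k in [lo, hi] that maximizes a unimodal score(k)."""
    cache = {}

    def f(k):
        # Each evaluation may mean training a classifier, so memoize it.
        if k not in cache:
            cache[k] = score(k)
        return cache[k]

    a, b = lo, hi
    while b - a > 2:
        width = b - a
        # Interior probes at the golden-section points of [a, b].
        c = b - round(INV_PHI * width)
        d = a + round(INV_PHI * width)
        if c >= d:  # rounding collapsed the probes on a short bracket
            c = (a + b) // 2
            d = c + 1
        if f(c) < f(d):
            a = c  # the maximum cannot lie left of c
        else:
            b = d  # the maximum cannot lie right of d
    # At most three candidates remain; check them directly.
    return max(range(a, b + 1), key=f)


# Hypothetical score that peaks at 37 retained features.
print(golden_section_cutoff(lambda k: -(k - 37) ** 2, 1, 200))  # 37
```

Each iteration needs at most two new score evaluations and shrinks the bracket by a factor of about 0.618. The number of classifier trainings therefore grows logarithmically in the number of candidate cutoffs, not linearly.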
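For the final classification step, the sketch below assumes a hard-margin SVDD with a linear kernel. In that case each class description is the minimum enclosing ball of the class's training vectors. The sketch does not solve the SVDD quadratic program. It approximates the ball with the Bădoiu-Clarkson iteration instead. It also assumes that the "relative class distance" is the distance from a test vector to a class centre divided by that class's radius. Both the solver and that definition are assumptions; the paper may use a kernel and a different distance. The function names are hypothetical.

```python
import math


def _dist(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball of `points` (hard-margin,
    linear-kernel SVDD) with the Badoiu-Clarkson iteration."""
    center = list(points[0])
    for i in range(1, iterations + 1):
        # Move the centre a shrinking step toward the currently farthest point.
        far = max(points, key=lambda p: _dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, far)]
    radius = max(_dist(p, center) for p in points)
    return center, max(radius, 1e-12)  # guard against single-point classes


def fit_classes(samples_by_class, iterations=1000):
    """Fit one ball (centre, radius) per class label."""
    return {label: fit_ball(pts, iterations) for label, pts in samples_by_class.items()}


def predict(model, x):
    """Assign x to the class with the smallest relative distance d(x, centre) / radius."""
    return min(model, key=lambda label: _dist(x, model[label][0]) / model[label][1])


model = fit_classes({
    "A": [(0, 0), (1, 0), (0, 1), (1, 1)],
    "B": [(10, 10), (12, 10), (10, 12), (12, 12)],
})
print(predict(model, (5, 5)))  # B
```

In absolute terms the point (5, 5) is closer to A's centre (about 6.4 versus 8.5). B's larger radius, however, gives it the smaller relative distance (about 6 versus 9). Dividing by the radius is what lets a rule like this account for classes of different spread.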