A fuzzy genetic programming-based algorithm for subgroup discovery and the application to one problem of pathogenesis of acute sore throat conditions in humans

Published: 20 March 2015

Abstract

This paper proposes a novel algorithm for the subgroup discovery (SD) task based on genetic programming and fuzzy logic, called Fuzzy Genetic Programming-based for Subgroup Discovery (FuGePSD). Genetic programming allows compact expressions to be learned, with the main objective of obtaining rules that describe simple, interesting and interpretable subgroups. The algorithm incorporates specific operators in the search process to promote diversity among the individuals. The evolutionary scheme of FuGePSD follows the genetic cooperative-competitive approach, which promotes both competition and cooperation between the individuals of the population in order to find optimal solutions for the SD task. FuGePSD shows its potential with high-quality results in a wide experimental study comparing it with other evolutionary algorithms for subgroup discovery. Moreover, the proposal is applied to a case study related to acute sore throat problems.

    Published In

    Information Sciences: an International Journal  Volume 298, Issue C
    March 2015
    567 pages

    Publisher

    Elsevier Science Inc.

    United States

    Author Tags

    1. Bioinformatics
    2. Evolutionary fuzzy system
    3. Genetic programming
    4. Subgroup discovery
