fuzzy-rough-learn 0.1: A Python Library for Machine Learning with Fuzzy Rough Sets
Oliver Urs Lenz1, Daniel Peralta1,2, and Chris Cornelis1

1 Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
{oliver.lenz,chris.cornelis}@ugent.be
2 Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent University, Ghent, Belgium
[email protected]
http://www.cwi.ugent.be, https://www.irc.ugent.be
Abstract. We present fuzzy-rough-learn, the first Python library of fuzzy rough set machine learning algorithms. It contains three algorithms previously implemented in R and Java, as well as two new algorithms from the recent literature. We briefly discuss the use cases of fuzzy-rough-learn and the design philosophy guiding its development, before providing an overview of the included algorithms and their parameters.
Keywords: Fuzzy rough sets · OWA operators · Machine learning · Python package · Open-source software

1 Background
Since its conception in 1990, fuzzy rough set theory [2] has been applied as part of a growing number of machine learning algorithms [17]. Simultaneously, the distribution and communication of machine learning algorithms has spread beyond the academic literature to a multitude of publicly available software implementations [7,10,19]. Over the same period, Python has grown from its first release in 1991 [13] to become one of the world's most popular high-level programming languages.
Python has become especially popular in the field of data science, in part due to the self-reinforcing growth of its package ecosystem. This includes scikit-learn [11], which is currently the go-to general-purpose Python machine learning library, and which contains a large collection of algorithms.
Only a limited number of fuzzy rough set machine learning algorithms have
received publicly available software implementations. Variants of Fuzzy Rough
Nearest Neighbours (FRNN) [5], Fuzzy Rough Rule Induction [6], Fuzzy Rough
Feature Selection (FRFS) [1] and Fuzzy Rough Prototype Selection (FRPS)
[14,15] are included in the R package RoughSets [12], and have also been released
for use with the Java machine learning software suite WEKA [3,4].
© Springer Nature Switzerland AG 2020
R. Bello et al. (Eds.): IJCRS 2020, LNAI 12179, pp. 491–499, 2020.
https://doi.org/10.1007/978-3-030-52705-1_36
So far, none of these algorithms seem to have been made available for Python in a systematic way. In this paper, we present an initial version of fuzzy-rough-learn, a Python library that fills this gap. At present, it includes FRNN, FRFS and FRPS, as well as FROVOCO [18] and FRONEC [16], two more recent algorithms designed for imbalanced and multilabel classification. These implementations all make use of a significant modification of classical fuzzy rough set theory: the incorporation of Ordered Weighted Averaging (OWA) operators in the calculation of upper and lower approximations for increased robustness [1].
We discuss the use cases and design philosophy of fuzzy-rough-learn in Sect. 2,
and provide an overview of the included algorithms in Sect. 3.
2 Use Cases and Design Philosophy
The primary goal of fuzzy-rough-learn is to provide implementations of fuzzy
rough set algorithms. The target audience is researchers with some programming
skills, in particular those who are familiar with scikit-learn. We envision two
principal use cases:
– The application of fuzzy rough set algorithms to solve concrete machine learning problems.
– The creation of new or modified fuzzy rough set algorithms to handle new
types of data or to achieve better performance.
A third use case falls somewhat in between these two: reproducing or benchmarking against results from existing fuzzy rough set algorithms.
To facilitate the first use case, fuzzy-rough-learn is available from the two main Python package repositories, PyPI and conda-forge, making it easy to install with both pip and conda. fuzzy-rough-learn has an integrated test suite to limit the opportunities for bugs to be introduced. API documentation is integrated in the code, automatically updated online1 whenever a new version is released, and includes references to the literature.
We believe that it is important to make fuzzy rough set algorithms available
not just for use, but also for adaptation, since it is impossible to predict or
accommodate all requirements of future researchers. Therefore, the source code
for fuzzy-rough-learn is hosted on GitHub2 and freely available under the MIT
license. We have attempted to write accessible code, by striving for consistency
and modularity. The coding style of fuzzy-rough-learn is a compromise between
object-oriented and functional programming. It makes use of classes to model the
different components of the classification algorithms, but as a rule, functions and
methods have no side-effects. Finally, subject to these design principles, fuzzy-rough-learn generally follows the conventions of scikit-learn and the terminology of the cited literature.
1 https://fuzzy-rough-learn.readthedocs.io.
2 https://github.com/oulenz/fuzzy-rough-learn.
3 Contents
fuzzy-rough-learn implements three of the fuzzy rough set algorithms mentioned
in Sect. 1: FRFS, FRPS and FRNN, making them available in Python for the
first time. In addition, we have included two recent, more specialised classifiers:
the ensemble classifier FROVOCO, designed to handle imbalanced data, and the
multi-label classifier FRONEC.
Together, these five algorithms form a representative cross-section of fuzzy
rough set algorithms in the literature. In the future, we intend to build upon
this basis by adding more algorithms (Table 1).
3.1 Fuzzy Rough Feature Selection (FRFS)
Fuzzy Rough Feature Selection (FRFS) [1] greedily selects features that induce
the greatest increase in the size of the positive region, until it matches the size
of the positive region with all features, or until the required number of features
is selected.
The positive region is defined as the union of the lower approximations of
the decision classes in X. Its size is the sum of its membership values.
The similarity relation RB for a given subset of attributes B is obtained by
aggregating with a t-norm the per-attribute similarities Ra associated with the
attributes a in B. These are in turn defined, for any x, y ∈ X, as the complement
of the difference between the attribute values xa and ya after rescaling by the
sample standard deviation σa (1).
Ra(x, y) = max(1 − |xa − ya| / σa, 0)    (1)
Table 1. Parameters of FRFS in fuzzy-rough-learn

n_features (default None): Number of features to select. If None, will continue to add features until the positive region size becomes maximal.
owa_weights (default deltaquadsigmoid(0.2, 1)): OWA weights to use for the calculation of the soft minimum in lower approximations.
t_norm (default 'lukasiewicz'): T-norm used to aggregate the similarity relation R from per-attribute similarities.
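As an illustration of the greedy loop and the similarity relation of (1), the following sketch implements FRFS with NumPy. It is not fuzzy-rough-learn's own code: it uses a strict minimum in place of the OWA-weighted soft minimum, and the function names (att_sim, pos_region_size, frfs) are ours.

```python
import numpy as np

def att_sim(X, a):
    """Per-attribute similarity R_a of Eq. (1): complement of the
    attribute-value difference, rescaled by the standard deviation."""
    col = X[:, a]
    sigma = col.std()
    if sigma == 0:
        return np.ones((len(X), len(X)))
    return np.maximum(1 - np.abs(col[:, None] - col[None, :]) / sigma, 0)

def luk_tnorm(sims):
    """Aggregate per-attribute similarity matrices with the Lukasiewicz t-norm."""
    return np.maximum(np.sum(sims, axis=0) - (len(sims) - 1), 0)

def pos_region_size(X, y, B):
    """Size of the positive region for attribute subset B (strict minimum,
    i.e. without the OWA softening that fuzzy-rough-learn applies)."""
    R = luk_tnorm([att_sim(X, a) for a in B])
    size = 0.0
    for i in range(len(X)):
        other = y != y[i]
        # lower approximation membership of instance i in its own class
        size += np.min(1 - R[i, other]) if other.any() else 1.0
    return size

def frfs(X, y, n_features=None):
    """Greedily add the attribute that most increases the positive region."""
    all_atts = range(X.shape[1])
    target = pos_region_size(X, y, list(all_atts))
    B = []
    while n_features is None or len(B) < n_features:
        remaining = [a for a in all_atts if a not in B]
        if not remaining:
            break
        B.append(max(remaining, key=lambda a: pos_region_size(X, y, B + [a])))
        if pos_region_size(X, y, B) >= target:
            break
    return B
```

On a toy dataset where only the first attribute separates the classes, the loop selects that attribute and stops as soon as the positive region reaches its maximal size.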
3.2 Fuzzy Rough Prototype Selection (FRPS)
Fuzzy Rough Prototype Selection (FRPS) [14,15] uses upper and/or lower approximation membership as a quality measure to select instances. It proceeds in the following steps:
1. Calculate the quality of each training instance. The resulting values are the
potential thresholds for selecting instances (Table 2).
2. For each potential threshold and corresponding candidate instance set, count
the number of instances in the overall dataset that have the same decision
class as their nearest neighbour within the candidate instance set (excluding
itself).
3. Return the candidate instance set with the highest number of matches. In
case of a tie, return the largest such set.
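The threshold-evaluation loop of steps 1–3 can be sketched as follows. This is an illustrative implementation, not the library's code: the quality values are passed in precomputed (the OWA-based upper/lower approximation measures of FRPS would go there), and the function name frps_select is ours.

```python
import numpy as np

def frps_select(X, y, quality):
    """Steps 1-3: each distinct quality value is a candidate threshold;
    keep the candidate instance set with the best 1-NN agreement."""
    best_score, best_set = -1, None
    for t in np.unique(quality):
        cand = np.flatnonzero(quality >= t)
        if len(cand) < 2:
            continue  # a singleton set cannot be evaluated (no neighbour left)
        score = 0
        for i in range(len(X)):
            pool = cand[cand != i]  # exclude the instance itself
            d = np.abs(X[pool] - X[i]).sum(axis=1)  # Manhattan distance
            nn = pool[np.argmin(d)]
            score += y[nn] == y[i]
        # ties are resolved in favour of the larger candidate set
        if score > best_score or (score == best_score and len(cand) > len(best_set)):
            best_score, best_set = score, cand
    return best_set
```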
There are a number of differences between the implementations in [15] and
[14]. In each case, the present implementation follows [14]:
– While [15] uses instances of all decision classes to calculate upper and lower
approximations, [14] calculates the upper approximation membership of an
instance using only instances of the same decision class, and its lower approximation membership using only instances of the other decision classes. This
choice affects over what length the weight vector is ‘stretched’.
Table 2. Parameters of FRPS in fuzzy-rough-learn

quality_measure (default 'lower'): Quality measure to use for calculating thresholds: either the upper approximation of the decision class of each attribute, the lower approximation, or the mean value of both.
aggr_R (default np.mean): Function used to aggregate the similarity relation R from per-attribute similarities.
owa_weights (default invadd()): OWA weights to use for the calculation of the soft maximum and/or minimum in the quality measure.
nn_search (default KDTree()): Nearest neighbour search algorithm to use.
– In addition, [14] excludes each instance from the calculation of its own upper
approximation membership, while [15] does not.
– [15] uses additive weights, while [14] uses inverse additive weights.
– [15] defines the similarity relation R by aggregating the per-attribute similarities Ra using the Łukasiewicz t-norm, whereas [14] recommends using the mean.
– In case of a tie between several best-scoring candidate prototype sets, [15]
returns the set corresponding to the median of the corresponding thresholds,
while [14] returns the largest set (corresponding to the smallest threshold).
In addition, there are two implementation issues not addressed in [15] or [14]:
– It is unclear what metric the nearest neighbour search should use. It seems
reasonable that it should either correspond to the similarity relation R (and
therefore incorporate the same aggregation strategy from per-attribute similarities), or that it should match whatever metric is used by nearest neighbour
classification subsequent to FRPS. By default, the present implementation
uses Manhattan distance on the scaled attribute values.
– When the largest quality measure value corresponds to a singleton candidate
instance set, it cannot be evaluated (because the single instance in that set
has no nearest neighbour). Since this is an edge case that would not score
highly anyway, it is simply excluded from consideration.
3.3 Fuzzy Rough Nearest Neighbour (FRNN) Multiclass Classification
Fuzzy Rough Nearest Neighbours (FRNN) [5] provides a straightforward way
to apply fuzzy rough sets for classification. Given a new instance y, we obtain
class scores by calculating the membership degree of y in the upper and lower
approximations of each decision class and taking the mean. This implementation
uses OWA weights, but limits their application to the k nearest neighbours of
each class, as suggested by [8] (Table 3).
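The following sketch illustrates how such class scores can be computed with NumPy. It is a simplified stand-in for the library's implementation: the similarity relation (complement of scaled Manhattan distance) and all names are our assumptions, and the restriction to the k nearest neighbours of each class is realised by truncating the weight vector.

```python
import numpy as np

def additive_weights(k):
    """Additive OWA weights: w_i proportional to k - i + 1."""
    w = np.arange(k, 0, -1, dtype=float)
    return w / w.sum()

def owa(values, weights, soft='max'):
    """Soft max/min: weighted average of the k largest (or smallest) values."""
    k = min(len(weights), len(values))
    w = weights[:k] / weights[:k].sum()
    v = np.sort(values)[::-1][:k] if soft == 'max' else np.sort(values)[:k]
    return float(np.dot(w, v))

def frnn_scores(X, y, target, k=3):
    """Class scores for one test instance: mean of the OWA upper and lower
    approximation memberships, each limited to k nearest neighbours."""
    # similarity as complement of scaled Manhattan distance (an assumption here)
    d = np.abs(X - target).sum(axis=1)
    R = np.maximum(1 - d / d.max(), 0)
    w = additive_weights(k)
    scores = {}
    for c in np.unique(y):
        upper = owa(R[y == c], w, soft='max')      # soft max over class members
        lower = owa(1 - R[y != c], w, soft='min')  # soft min over non-members
        scores[c] = (upper + lower) / 2
    return scores
```

The predicted class is then simply the one with the highest score.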
3.4 Fuzzy Rough OVO Combination (FROVOCO) Multiclass Classification
Fuzzy Rough OVO COmbination (FROVOCO) [18] is an ensemble classifier specifically designed for, but not restricted to, imbalanced data, which adapts itself to the Imbalance Ratio (IR) between classes. It balances one-versus-one decomposition with two global class affinity measures (Table 4).
In a binary classification setting, the lower approximation of one class corresponds to the upper approximation of the other class, so when using OWA
weights, the effective number of weight vectors to be chosen is 2. FROVOCO
uses the IR-weighting scheme, which depends on the IR between the classes. If
the IR is less than 9, both classes are approximated with exponential weights. If
the IR is 9 or more, the smaller class is approximated with exponential weights,
496
O. U. Lenz et al.
Table 3. Parameters of FRNN in fuzzy-rough-learn

upper_weights (default additive()): OWA weights to use in the calculation of the upper approximation of decision classes.
upper_k (default 20): Effective length of the upper weights vector (number of nearest neighbours to consider).
lower_weights (default additive()): OWA weights to use in the calculation of the lower approximation of decision classes.
lower_k (default 20): Effective length of the lower weights vector (number of nearest neighbours to consider).
nn_search (default KDTree()): Nearest neighbour search algorithm to use.
while the larger class is approximated with a reduced additive weight vector of
effective length k equal to 10% of the number of instances.
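A minimal sketch of the IR-weighting scheme described above, under two assumptions of ours: exponential weights that halve at each position, and '10% of the number of instances' read as 10% of the larger class. Neither is necessarily the library's exact definition.

```python
import numpy as np

def exponential_weights(n):
    """Exponential OWA weights: each weight half the previous one."""
    w = 0.5 ** np.arange(1, n + 1)
    return w / w.sum()

def additive_weights(n):
    """Additive OWA weights: w_i proportional to n - i + 1."""
    w = np.arange(n, 0, -1, dtype=float)
    return w / w.sum()

def ir_weights(n_small, n_large, ir_threshold=9):
    """IR-weighting scheme: exponential weights for both classes when the
    imbalance ratio is below the threshold; otherwise the larger class gets
    an additive weight vector reduced to 10% of its instances."""
    ir = n_large / n_small
    if ir < ir_threshold:
        return exponential_weights(n_small), exponential_weights(n_large)
    k = max(1, round(0.1 * n_large))
    return exponential_weights(n_small), additive_weights(k)
```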
Provided with a training set X, and a new instance y, FROVOCO calculates
the class score of y for a class C from the following components:
V(C, y) weighted vote: For each other class C′ ≠ C, calculate the upper approximation memberships of y in C and C′, using the IR-weighting scheme. Rescale each pair of values so they sum to 1, then sum the resulting scores.
mem(C, y) positive affinity Calculate the average of the membership degrees
of y in the upper and lower approximations of C, using the IR-weighting scheme.
msen (C, y) negative affinity For each class C ′ , calculate the average positive
affinity of the members of C in C ′ . Combine these average values to obtain the
signature vector SC . Calculate the mean squared error of the positive affinities
of y for each class and SC , and divide it by the sum of the mean squared errors
for all classes.
Table 4. Parameters of FROVOCO in fuzzy-rough-learn

nn_search (default KDTree()): Nearest neighbour search algorithm to use.
The final class score is calculated from these components in (2).
AV(C, y) = (1/2) · (V(C, y) + mem(C, y)) / m − msen(C, y)    (2)

3.5 Fuzzy Rough Neighbourhood Consensus (FRONEC) Multilabel Classification
Fuzzy Rough Neighbourhood Consensus (FRONEC) [16] is a multilabel classifier. It combines the instance similarity R, based on the instance attributes, with the label similarity Rd, based on the label sets of instances. It offers two possible definitions for Rd. The first, Rd(1), is simply Hamming similarity scaled to [0, 1]. The second label similarity, Rd(2), takes into account the prior probability pl of a label l in the training set. Let L be the set of possible labels, and L1, L2 two particular label sets. Then Rd(2) is defined as follows (Table 5):
a = Σ_{l ∈ L1 ∩ L2} (1 − pl)
b = Σ_{l ∈ L \ (L1 ∪ L2)} pl
Rd(2) = (a + b) / (a + b + ½|L1 Δ L2|)    (3)
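Equation (3) can be computed directly from label sets and priors; the following sketch uses plain Python sets, with function and parameter names of our choosing.

```python
def label_similarity(L1, L2, p, labels):
    """R_d^(2) of Eq. (3): prior-weighted label-set similarity.
    `p` maps each label to its prior probability in the training set;
    `labels` is the full label set L."""
    a = sum(1 - p[l] for l in L1 & L2)         # shared labels; rare ones count more
    b = sum(p[l] for l in labels - (L1 | L2))  # shared absences; common ones count more
    sym_diff = len(L1 ^ L2)                    # symmetric difference |L1 Δ L2|
    return (a + b) / (a + b + sym_diff / 2)
```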
Table 5. Parameters of FRONEC in fuzzy-rough-learn

Q_type (default 2): Quality measure to use for identifying the most relevant instances: based on lower (1), upper (2) or both approximations (3).
R_d_type (default 1): Label similarity relation to use: Hamming similarity (1) or based on prior probabilities (2).
k (default 20): Number of neighbours to consider for neighbourhood consensus.
weights (default additive()): OWA weights to use for the calculation of the soft maximum and/or minimum.
nn_search (default KDTree()): Nearest neighbour search algorithm to use.
Provided with a training set X, and a new instance y, FRONEC predicts the
label set of y by identifying the training instance with the highest ‘quality’ in
relation to y. There are three possible quality measures, based on the upper and
lower approximations.
Q1(y, x) = OWA_wl({I(R(z, y), Rd(x, z)) | z ∈ N(y)})
Q2(y, x) = OWA_wu({T(R(z, y), Rd(x, z)) | z ∈ N(y)})
Q3(y, x) = (Q1(y, x) + Q2(y, x)) / 2    (4)
where Rd is a choice of label similarity, T the Łukasiewicz t-norm, I the Łukasiewicz implication, and N(y) the k nearest neighbours of y in X, for a choice of k.
For a choice of quality measure Q, FRONEC predicts the labels of the training instance with the highest quality. If there are several such training instances, it predicts all labels that appear in at least half of them.
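A sketch of this prediction step, assuming the Łukasiewicz connectives of (4) and precomputed similarity inputs; the names and data layout are our assumptions, not the library's API.

```python
import numpy as np

def luk_impl(a, b):
    """Lukasiewicz implication I(a, b) = min(1 - a + b, 1)."""
    return np.minimum(1 - a + b, 1)

def luk_tnorm(a, b):
    """Lukasiewicz t-norm T(a, b) = max(a + b - 1, 0)."""
    return np.maximum(a + b - 1, 0)

def fronec_predict(R_y, Rd, labels, weights):
    """Quality measure Q3 of Eq. (4) for one test instance y.
    R_y[z]: similarity of y to neighbour z; Rd[x, z]: label similarity
    between training instances x and z; labels[x]: label set of x.
    `weights` are OWA weights, most emphasised position first."""
    q = np.empty(len(labels))
    for x in range(len(labels)):
        lower = np.dot(np.sort(luk_impl(R_y, Rd[x])), weights)         # soft min
        upper = np.dot(np.sort(luk_tnorm(R_y, Rd[x]))[::-1], weights)  # soft max
        q[x] = (lower + upper) / 2
    best = np.flatnonzero(q == q.max())
    # ties: predict labels occurring in at least half of the best instances
    counts = {}
    for x in best:
        for l in labels[x]:
            counts[l] = counts.get(l, 0) + 1
    return {l for l, c in counts.items() if c >= len(best) / 2}
```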
3.6 OWA Operators and Nearest Neighbour Searches
Each of the algorithms in fuzzy-rough-learn uses OWA operators [20] to calculate
upper and lower approximations. OWA operators take the weighted average of
an ordered collection of real values. By choosing suitably skewed weight vectors,
OWA operators can thus act as soft maxima and minima. The advantage of
defining upper and lower approximations with soft rather than strict maxima and
minima is that the result is more robust, since it no longer depends completely
on a single value.
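The following snippet illustrates the idea with a hypothetical weight vector: the same OWA operator acts as a soft maximum when the weights are skewed towards the largest values, and as a soft minimum with the reversed weights.

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted average: weights applied to values sorted descending."""
    return float(np.dot(np.sort(values)[::-1], weights))

values = np.array([0.1, 0.9, 0.4, 0.85])
strict_max, strict_min = values.max(), values.min()

# a skewed (hypothetical) weight vector: the soft max leans on the largest
# values, the soft min (reversed weights) on the smallest
w = np.array([0.6, 0.25, 0.1, 0.05])
soft_max = owa(values, w)
soft_min = owa(values, w[::-1])
```

Unlike the strict maximum, the soft maximum still reflects the second-largest value, which is what makes the resulting approximations more robust to a single outlying similarity.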
To allow experimentation with other weights, we have included a range of predefined weight types, as well as a general OWAOperator class that can be extended
and instantiated by users and passed as a parameter to the various classes.
Similarly, users may customise the nearest neighbour search algorithm that is used in all classes except FRFS by defining their own subclass of NNSearch. For example, by choosing an approximate nearest neighbour search like Hierarchical Navigable Small World [9], we obtain Approximate FRNN [8].
Acknowledgement. The research reported in this paper was conducted with the
financial support of the Odysseus programme of the Research Foundation – Flanders
(FWO). D. Peralta is a Postdoctoral Fellow of the Research Foundation – Flanders
(FWO, 170303/12X1619N).
References
1. Cornelis, C., Verbiest, N., Jensen, R.: Ordered weighted average based fuzzy rough
sets. In: Yu, J., Greco, S., Lingras, P., Wang, G., Skowron, A. (eds.) RSKT 2010.
LNCS (LNAI), vol. 6401, pp. 78–85. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16248-0_16
2. Dubois, D., Prade, H.: Rough fuzzy sets and fuzzy rough sets. Int. J. General Syst.
17(2–3), 191–209 (1990)
3. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The
WEKA data mining software: an update. ACM SIGKDD Explor. Newslett. 11(1),
10–18 (2009)
4. Jensen, R.: Fuzzy-rough data mining with Weka (2010). http://users.aber.ac.uk/
rkj/Weka.pdf
5. Jensen, R., Cornelis, C.: A new approach to fuzzy-rough nearest neighbour classification. In: Chan, C.-C., Grzymala-Busse, J.W., Ziarko, W.P. (eds.) RSCTC 2008. LNCS (LNAI), vol. 5306, pp. 310–319. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88425-5_32
6. Jensen, R., Cornelis, C., Shen, Q.: Hybrid fuzzy-rough rule induction and feature
selection. In: Proceedings of the 2009 IEEE International Conference on Fuzzy
Systems, pp. 1151–1156. IEEE (2009)
7. Jović, A., Brkić, K., Bogunović, N.: An overview of free software tools for general
data mining. In: Proceedings of the 37th International Convention on Information
and Communication Technology, Electronics and Microelectronics (MIPRO 2014),
pp. 1112–1117. IEEE (2014)
8. Lenz, O.U., Peralta, D., Cornelis, C.: Scalable approximate FRNN-OWA classification. IEEE Trans. Fuzzy Syst. (to be published). https://doi.org/10.1109/TFUZZ.2019.2949769
9. Malkov, Y.A., Yashunin, D.A.: Efficient and robust approximate nearest neighbor
search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal.
Mach. Intell. 42(4), 824–836 (2020)
10. Nguyen, G., et al.: Machine learning and deep learning frameworks and libraries
for large-scale data mining: a survey. Artif. Intell. Rev. 52(1), 77–124 (2019)
11. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn.
Res. 12(85), 2825–2830 (2011)
12. Riza, L.S., et al.: Implementing algorithms of rough set theory and fuzzy rough set
theory in the R package “RoughSets”. Inf. Sci. 287, 68–89 (2014)
13. van Rossum, G., de Boer, J.: Interactively testing remote servers using the Python
programming language. CWI Q. 4(4), 283–303 (1991)
14. Verbiest, N.: Fuzzy rough and evolutionary approaches to instance selection. Ph.D.
thesis, Ghent University (2014)
15. Verbiest, N., Cornelis, C., Herrera, F.: OWA-FRPS: a prototype selection method based on ordered weighted average fuzzy rough set theory. In: Ciucci, D., Inuiguchi, M., Yao, Y., Ślęzak, D., Wang, G. (eds.) RSFDGrC 2013. LNCS (LNAI), vol. 8170, pp. 180–190. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41218-9_19
16. Vluymans, S., Cornelis, C., Herrera, F., Saeys, Y.: Multi-label classification using
a fuzzy rough neighborhood consensus. Inf. Sci. 433, 96–114 (2018)
17. Vluymans, S., D’eer, L., Saeys, Y., Cornelis, C.: Applications of fuzzy rough set
theory in machine learning: a survey. Fundamenta Informaticae 142(1–4), 53–86
(2015)
18. Vluymans, S., Fernández, A., Saeys, Y., Cornelis, C., Herrera, F.: Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach. Knowl. Inf. Syst. 56(1), 55–84 (2017). https://doi.org/10.1007/s10115-017-1126-1
19. Wang, Z., Liu, K., Li, J., Zhu, Y., Zhang, Y.: Various frameworks and libraries
of machine learning and deep learning: a survey. Archives Comput. Methods Eng.
1–24 (2019). https://doi.org/10.1007/s11831-018-09312-w
20. Yager, R.R.: On ordered weighted averaging aggregation operators in multicriteria
decisionmaking. IEEE Trans. Syst. Man Cybern. 18(1), 183–190 (1988)