Article

The Differentiation of Extra Virgin Olive Oil from Other Olive Oil Categories Based on FTIR Spectroscopy and Random Forest

by Chrysavgi Gardeli 1,*, Stavroula Sykioti 2, George Exarchos 3, Maria Koliatsou 3, Periklis Andritsos 4 and Efstathios Z. Panagou 2,*

1 Laboratory of Food Chemistry and Analysis, Department of Food Science and Human Nutrition, School of Food and Nutritional Sciences, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece
2 Laboratory of Microbiology and Biotechnology of Foods, Department of Food Science and Human Nutrition, School of Food and Nutritional Sciences, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece
3 Department of Chemical Analyses, Interagency Unit for Market Control, Ministry of Development, Kaningos Square, 10181 Athens, Greece
4 Faculty of Information, University of Toronto, Toronto, ON M5S 3G6, Canada
* Authors to whom correspondence should be addressed.
Submission received: 18 December 2024 / Revised: 15 January 2025 / Accepted: 20 January 2025 / Published: 22 January 2025
(This article belongs to the Section Food Science and Technology)

Abstract

The great interest in the rapid and reliable differentiation of extra virgin olive oil from other olive oil categories is directly related to its unique sensory characteristics and high market prices. The aim of the present study was to investigate the potential of FTIR as a rapid and non-invasive technique to discriminate extra virgin olive oil (EVOO) from other olive oil categories (virgin olive oil, ordinary, and lampante) based on the acquired spectral profile of olive oil. Spectral data were collected, pre-processed, and correlated by Random Forest (RF) analysis with the sensory category (EVOO vs. other) of olive oil samples, as defined by sensory analysis undertaken previously by trained panelists. The results showed that the application of Savitzky–Golay (S-G) smoothing with a second derivative (d = 2), second- and third-order polynomial (p = 2, p = 3), and window size (w) of 12 and 13 points achieved the highest accuracy (0.91) in discriminating between the two classes of samples. The main basis for the discrimination of EVOO from the “other” class comprised the bands of the triacylglycerol carbonyl groups (C=O) located near 1744 cm−1 (specific features: 1739, 1748, and 1751 cm−1); the fingerprinting area 1250–1000 cm−1 (specific features: 1088, 1094, 1116, 1123, 1124, 1158, 1162, 1236, 1240, and 1247 cm−1), which corresponds to CH bending; and 1680 cm−1, which is associated with unsaturated aldehydes. The ability of the model to achieve high classification accuracy demonstrates the robustness of the FTIR spectral data combined with advanced machine learning techniques. Due to the lower cost and more rapid analysis time afforded by FTIR, this method provides promising perspectives for industrial olive oil classification.

1. Introduction

Olive oil is one of the most prominent nutritional trends worldwide, with substantial market volumes. Approximately 70% of the world’s olive crop is produced in the European Union (EU), making it a leader in the field [1]. The Mediterranean region produces the majority of olive oil, which is important to the agricultural economy in the area. Approximately 3 million tons of olive oil are produced annually worldwide [2]. The EU produces over 2 million tons of this, mostly in Spain, Italy, Greece, Portugal, and Croatia. Greece, a leading country in olive oil production, has a long tradition in the cultivation of olive trees and the production of high-quality olive oil, which is appreciated in the international market. The olive oil sector is of paramount importance to the agro-food industry because of its job generation and production volume, commercialization, and export potential [3].
Extra virgin olive oil (EVOO) is the premium olive oil category among those intended for human consumption. It is extracted exclusively from fresh and healthy olive drupes (Olea europaea L.) using mechanical and physical processes without additional refining [4]. It has no organoleptic defects, it is fruity, and its acidity level, expressed as oleic acid, must not exceed 0.8% (w/w). Virgin olive oil may have some sensory defects, that is, the median of the defects is above 0.0 but not more than 3.5, the median of the fruity attribute is above 0.0, and the free acidity does not exceed 2% (w/w). Ordinary virgin olive oil has a free acidity of not more than 3.3% (w/w), and the median of the defects is above 3.5 but not more than 6.0, or the median of the defects is not more than 3.5, and the median of the fruity attribute is 0.0. This category may only be sold directly to consumers if permitted in the country of retail sale. Lampante olive oil is the lowest-quality virgin olive oil, with an acidity of more than 3.3% (w/w), with no fruity characteristics, and the median of the defects is above 6.0. Lampante olive oil is not intended for market distribution unless it is refined or used for other industrial purposes [5].
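The category limits above can be read as a decision cascade. The following is a minimal sketch of that reading, not the official grading procedure: the function name is hypothetical, only the three criteria named in the text are used, and it is an assumption of this sketch that an oil failing one category's limits falls through to the next lower one.

```python
def classify_virgin_olive_oil(free_acidity, median_defects, median_fruity):
    """Return a category name from the three criteria quoted in the text.

    free_acidity   -- free acidity in % (w/w), expressed as oleic acid
    median_defects -- panel median of the defects
    median_fruity  -- panel median of the fruity attribute
    """
    # EVOO: no defects, fruity, acidity not above 0.8%.
    if median_defects == 0.0 and median_fruity > 0.0 and free_acidity <= 0.8:
        return "EVOO"
    # Virgin: defects not above 3.5, fruity, acidity not above 2%.
    if median_defects <= 3.5 and median_fruity > 0.0 and free_acidity <= 2.0:
        return "virgin"
    # Ordinary (assumed fall-through): acidity not above 3.3% and defects
    # not above 6.0, which covers both alternatives given in the text.
    if free_acidity <= 3.3 and median_defects <= 6.0:
        return "ordinary"
    # Everything else is treated as lampante in this sketch.
    return "lampante"


# Illustrative values only, not measurements from the study.
print(classify_virgin_olive_oil(0.5, 0.0, 3.0))
print(classify_virgin_olive_oil(4.0, 7.0, 0.0))
```

In the present study the four categories are later collapsed into two classes, so everything this sketch labels as virgin, ordinary, or lampante would belong to the “other” class.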
Although consumer preference for EVOO is clearly explained by its high nutritional value and unique sensory characteristics, low olive oil productivity, particularly in recent years, has led to high market prices. Indicative of the situation are the data of olive oil prices found on the International Olive Council (IOC) website [6], where it can be noted that there was an average increase of 300% in EVOO prices from January 2020 to May 2024. Economic data were based on three regions, namely, Bari in Italy, Chania in Greece, and Jaén in Spain, as these regions are considered the most representative olive oil markets of the European Union that also impact other producing countries and exports of the product.
Comprehensive official quality control [4] of olive oil requires both diverse analytical determinations and sensory evaluation as follows: free acidity, peroxide value, fatty acid ethyl esters (only for EVOO), absorbency in ultraviolet, and organoleptic evaluation. Most of the official analytical methods for olive oil quality are time-consuming, require the use of organic solvents, generate wastes, and their accuracy is strongly dependent on reproducing the operating instructions of the standardized procedure very accurately [7]. In addition, organoleptic assessment requires a recognized panel of well-trained tasters, which presupposes the establishment of procedures for the selection, training, and monitoring of individuals by a panel leader, as stipulated by the respective standard of the IOC for the selection, training, and quality control of virgin olive oil tasters [8,9]. However, it should be noted that organoleptic assessment is based on subjective perceptions of and variability in responses over time, even for well-trained and experienced panel members, resulting in potential discrepancies in the classification of olive oil and disagreement between different sensory groups. In addition, the members of a taste panel can only evaluate a limited number of olive oil samples every day due to sensory fatigue, rendering the method unsuitable for on-line evaluation during the extraction process and the distribution and marketing of olive oil. For these reasons, recent analytical advances are needed to overcome the drawbacks and limitations of the official methods to evaluate olive oil quality and to rapidly differentiate EVOO from other cheaper olive oil categories. 
Thus, there is a need for analytical methodologies that are more robust, efficient, sensitive, user-friendly, easy to implement by non-specialized personnel, rapid, and cost-effective to guarantee the quality, authenticity, and geographic and varietal origin traceability based on recent technological progress in the analytical field.
Among the most recent and effective tools to ensure the authenticity and traceability of olive oils and their declared blends with other vegetable oils is the use of nuclear magnetic resonance (1H NMR) fingerprinting in combination with chemometrics [10]. The discrimination between EVOOs and virgin olive oils (VOOs) according to their geographical origin was also achieved using flash gas chromatography (FGC) for volatile compound analysis combined with untargeted chemometric data analysis, Partial Least Squares–Discriminant Analysis (PLS-DA), and Artificial Neural Networks (ANNs) [11]. The same technique has proven to be a reliable approach for the rapid screening of quality grades [EVOO, VOO, and lampante olive oil (LOO)] and a valid solution for supporting sensory panels, reducing the number of samples that must be assessed by panels or at least prioritizing their assessment [12]. In the quest for rapid analytical techniques that mimic the human olfactory system and provide an overall assessment of the volatile fingerprint of foods, the electronic nose (E-nose) has recently been explored as a complementary tool for organoleptic analysis for olive oil classification [13]. In this study, the authors investigated the potential of an electronic olfactory system (EOS) to classify virgin olive oil into quality classes, as previously defined by three sensory panels recognized by the IOC, and concluded that EOS could effectively face inconsistent classifications of olive oil samples derived from inherent human factor variability.
In the food industry, near-infrared (NIR) and mid-infrared (MIR) spectroscopy have become quick and effective tools for tasks such as process monitoring, quality control, and shelf-life estimation [14]. They can be used directly and simply in olive oils, and their application could be a useful tool for the industry and producers to guarantee the quality and authenticity of olive oil [15,16]. Fourier transform infrared spectroscopy (FTIR) has been used to detect adulteration of olive oil with other edible oils, such as hazelnut oil, canola oil, and safflower oil, by focusing on the entire range of the FTIR spectrum (3100–650 cm−1) [17]. Examination of specific regions of the mid-infrared spectrum (3600–2500 cm−1, 1900–1600 cm−1, and 1500–900 cm−1) can differentiate edible and non-edible oils [18]. Moreover, the quality defects of olive oil (fusty, winey, musty, and rancid), previously assessed by a trained sensory panel, were investigated using mid-infrared spectra (4000–600 cm−1) in tandem with data analytics (PLS-DA) [19]. The authors reported that the collected spectra could effectively identify the musty defect in the samples with a predictive ability of 87% and the winey, fusty, and rancid defects with a correct classification rate of approximately 77%, providing promising perspectives for the advancement of the instrumental determination of sensory defects in olive oil that are perceived today only by a human taste panel. The same authors in another work explored the combined use of three instrumental techniques, namely, headspace–mass spectrometry (HS-MS), mid-infrared spectroscopy (MIR), and UV–visible spectroscopy (UV–vis), to detect the presence of the same defects in virgin olive oil that was previously organoleptically characterized by an accredited taste panel [20]. 
The authors employed different data fusion approaches (low- and mid-level fusion) combined with a PLS-DA model development and reported that low-level data fusion from the three instruments was most suitable for detecting musty, winey, and fusty defects, whereas the rancid defect was identified with a combination of headspace–mass spectrometry (HS-MS) and MIR spectroscopy techniques. FTIR spectroscopy has also been used for the quality assessment of foodstuffs other than olive oil. One of these is wine, where FTIR spectra have been used to determine phenols and quantify ellagitannin, which affects the organoleptic characteristics of wine during aging [21]. In addition, FTIR spectroscopy was applied to the classification of milk and specifically to the differentiation of goat and sheep milk, where the fingerprint region 1840–950 cm−1 was examined, as this is the region where absorption by carbonyl groups is located [22].
The aim of the present study was to investigate the potential of FTIR as a rapid and non-invasive technique to discriminate extra virgin olive oil (EVOO) from other olive oil categories (virgin olive oil, ordinary, and lampante) based on the acquired spectral profile of olive oil. For this purpose, spectral data were collected, pre-processed, and correlated by Random Forest (RF) analysis with the sensory category (EVOO vs. other) of olive oil samples, as defined by sensory analysis undertaken previously by trained panelists.

2. Materials and Methods

2.1. Olive Oil Samples

Sixty-four samples were selected during the harvesting periods 2021–2022 and 2022–2023. The olive oil samples were provided by the IOC, the Department of Chemical Analyses, Interagency Unit for Market Control, Ministry of Development (MD), and two olive oil industries located in central and northern Greece, respectively. Using codes O1 and O2, commercial brand samples were anonymized to prevent company names from being disclosed. All samples were received in sealed amber glass containers, with labels indicating the quality category and defects assigned by experienced panelists. The samples were stored at −20 °C under a nitrogen atmosphere until analysis. Among the samples, 38 (58.5%) belonged to the EVOO category, whereas the remaining 27 (41.5%) belonged to the virgin, ordinary, and lampante categories, and for the purpose of this work, they were attributed to a single class denoted as “other”. Table 1 summarizes the origins of the olive oil samples used to build the model.
An additional set of 16 samples, obtained from the IOC and the industry (O2), was used for the external validation of the developed RF model (Table 2). This external validation dataset, composed of diverse samples not utilized during model training, was specifically included to assess the generalization capabilities and robustness of the model in classifying olive oil categories. Evaluating a model’s performance on previously unseen data ensures that the predictions are not only reflective of the training set but also applicable to broader, real-world scenarios.

2.2. Spectra Acquisition and Pre-Processing

The spectra of the olive oil samples were recorded in triplicate (three different subsamples) using a 6200 JASCO spectrometer (Jasco Corp., Tokyo, Japan) equipped with a ZnSe 45° Horizontal Attenuated Total Reflectance (HATR) crystal plate (PIKE Technologies, Madison, WI, USA) with a refractive index of 2.4 and a penetration depth of 2.0 μm at 1000 cm−1. The Spectra Manager™ Code of Federal Regulations (CFR) software ver. 2 (Jasco Corp., Tokyo, Japan) was used for spectral measurements in the wavenumber range of 4000–400 cm−1 by accumulating 100 scans with a resolution of 4 cm−1 and a total integration time of 2 min. Between each sampling, the crystal was cleaned with detergent, distilled water, and acetone and dried with lint-free tissue. For every three measurements, a background spectrum was recorded using only the crystal without the sample.
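The quoted penetration depth can be checked against the standard ATR relation d_p = λ / (2π n₁ √(sin²θ − (n₂/n₁)²)). The sketch below evaluates it for the crystal described above; the sample refractive index of 1.5 is an assumption of this sketch (it is not stated in the text), and the function name is hypothetical.

```python
import math


def atr_penetration_depth_um(wavenumber_cm1, n_crystal, n_sample, angle_deg):
    """Penetration depth (micrometres) of the evanescent wave in ATR."""
    wavelength_um = 1e4 / wavenumber_cm1          # 1000 cm^-1 -> 10 um
    theta = math.radians(angle_deg)
    root = math.sin(theta) ** 2 - (n_sample / n_crystal) ** 2
    if root <= 0:
        raise ValueError("angle is below the critical angle: no total reflection")
    return wavelength_um / (2 * math.pi * n_crystal * math.sqrt(root))


# ZnSe crystal (n = 2.4), 45 degree incidence, 1000 cm^-1;
# n_sample = 1.5 is an assumed, typical value for an organic sample.
depth = atr_penetration_depth_um(1000.0, 2.4, 1.5, 45.0)
print(f"penetration depth: {depth:.2f} um")
```

Under these assumptions the result is close to the 2.0 μm stated for the HATR plate. Because the depth scales with wavelength, bands at higher wavenumbers probe a thinner layer of the oil.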
For the measurement, 0.8 mL of olive oil was deposited on the surface of the crystal. Overall, 192 spectra (64 olive oil samples × 3 replications) were acquired and used in model development, following a 70:30 dataset partition, representing the calibration (134 spectra) and testing (58 spectra) datasets. For external validation, 48 spectra (16 olive oil samples × 3 replications) were acquired from additional olive oil samples to test the accuracy of the classifier to categorize unknown samples. The acquired spectra were subjected to baseline correction and smoothing using the Savitzky–Golay (S-G) second derivative (d = 2) filtering algorithm to reduce noise. To explore the impact of smoothing on classification performance, second- and third-order polynomials (p = 2, p = 3) and different window sizes (w = 11, w = 12, and w = 13) were employed to correct baseline shifts and enhance spectral resolution.
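To illustrate what an S-G second-derivative filter does, the following minimal sketch fits a quadratic to each centred window by least squares and returns the second derivative at the window centre. It is not the authors' pre-processing code: the function name is hypothetical, only odd window sizes are handled (so w = 12 is outside its scope), the polynomial order is fixed at 2, and points closer than half a window to either end are simply dropped. In practice a library routine such as SciPy's `scipy.signal.savgol_filter` with `deriv=2` would normally be used.

```python
def sg_second_derivative(y, window=13, delta=1.0):
    """Savitzky-Golay second derivative using a quadratic fit.

    y      -- sequence of absorbance values on an evenly spaced axis
    window -- odd number of points per window
    delta  -- spacing between neighbouring points
    Returns len(y) - (window - 1) values (edge points are dropped).
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("this sketch needs an odd window of at least 3 points")
    if len(y) < window:
        raise ValueError("signal is shorter than the window")
    m = window // 2
    offsets = range(-m, m + 1)
    # Quadratic basis made orthogonal to the constant and linear terms.
    mean_sq = sum(i * i for i in offsets) / window
    basis = [i * i - mean_sq for i in offsets]
    norm = sum(b * b for b in basis)
    # The second derivative of the fitted parabola is 2 * a2.
    coeffs = [2.0 * b / (norm * delta ** 2) for b in basis]
    return [
        sum(c * y[k + i] for c, i in zip(coeffs, offsets))
        for k in range(m, len(y) - m)
    ]


# Sanity check on synthetic data: the second derivative of x^2 is 2.
parabola = [x * x for x in range(30)]
print(sg_second_derivative(parabola, window=13)[:3])
```

A constant offset or a linear slope added to the signal contributes nothing to the output, which is how the second derivative removes baseline shifts while sharpening overlapping bands.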

2.3. Feature Selection and Classification Using Random Forest

RF is an ensemble machine learning algorithm that is particularly effective for classification and regression tasks [23]. It consists of multiple decision trees constructed during the training phase, and the final output is determined by aggregating the predictions of the individual trees, often using majority voting in classification scenarios. The primary strength of RF lies in its ability to handle high-dimensional data, mitigate overfitting, and identify the most relevant features by calculating feature importance scores based on their contribution to reducing impurity in the decision nodes. In this study, RF was employed both as a feature selection tool and classifier. First, the RF algorithm was used to identify the most critical features from the FTIR spectra that contributed to differentiating between EVOO and other olive oil categories. This feature selection step helped reduce the dimensionality of the dataset and enhanced the efficiency and accuracy of the subsequent classification task. The reduced set of important features was then utilized by the RF model to classify the olive oil samples based on their spectral data, with a focus on understanding the effects of pre-processing techniques, such as S-G smoothing, on model performance. The performance of the models was assessed by the calculation of recall, precision, accuracy, and F1 score indices. Recall indicates the capacity of the model to identify the actual positive observations, i.e., the proportion of observations of the positive class (EVOO in our case) that are predicted as positive. Precision is defined as the ratio of the number of correctly classified positive observations to the total number of observations predicted as positive. It is a useful metric in cases where false positives (i.e., olive oil samples from the “other” class classified as “EVOO”) are of higher concern than false negatives. 
Accuracy indicates how often the classifier makes the correct prediction, i.e., the ratio of the number of correct predictions to the total number of predictions, and thus reflects the overall effectiveness of the classifier. The F1 score is the harmonic mean of precision and recall and takes values between 0 and 1; the higher the F1 score, the better the predictive performance of the model [24,25].
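The four indices defined above can be computed directly from the counts of a two-class confusion matrix. The sketch below does so with EVOO as the positive class; the function name and the toy labels are illustrative assumptions, not data from the study.

```python
def binary_metrics(y_true, y_pred, positive="EVOO"):
    """Recall, precision, accuracy and F1 for a two-class problem."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1          # EVOO correctly predicted as EVOO
            else:
                fp += 1          # "other" wrongly predicted as EVOO
        else:
            if truth == positive:
                fn += 1          # EVOO wrongly predicted as "other"
            else:
                tn += 1          # "other" correctly predicted as "other"
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


# Toy example: 4 EVOO and 4 "other" spectra, one error in each class.
truth = ["EVOO"] * 4 + ["other"] * 4
preds = ["EVOO", "EVOO", "EVOO", "other", "EVOO", "other", "other", "other"]
print(binary_metrics(truth, preds))
```

In this toy case each class contains one error, so all four indices equal 0.75. With unbalanced errors, precision and recall diverge and the F1 score falls between them.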
In this study, the RF technique, implemented in Python version 3.12, was used as a supervised machine learning algorithm for both feature selection and classification. RF operates by constructing multiple decision trees during training and aggregating their outputs to enhance predictive accuracy and reduce overfitting. The algorithm excels in handling high-dimensional data and assessing feature importance based on the contribution of each feature to reducing impurity at decision nodes. In our implementation, 100 estimators (i.e., trees) were used, with the Gini impurity criterion measuring the quality of each split when constructing the trees. For feature selection, we tested several values of the feature importance threshold that determines which features are retained. Our reported results are for threshold values of 0.001 and 0.0001, both for training our model and for validating it using the additional set of samples described in Table 2.
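As a minimal illustration of the Gini criterion mentioned above, the sketch below computes the impurity of a node and the impurity decrease produced by a candidate split. The function names and toy labels are assumptions for illustration; a full RF implementation repeats this evaluation over many candidate splits, trees, and bootstrap samples.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Toy node with 4 EVOO and 4 "other" spectra.
node = ["EVOO"] * 4 + ["other"] * 4
perfect = impurity_decrease(node, ["EVOO"] * 4, ["other"] * 4)
useless = impurity_decrease(node, ["EVOO", "EVOO", "other", "other"],
                            ["EVOO", "EVOO", "other", "other"])
print(perfect, useless)
```

A split that separates the classes completely removes all of the parent's impurity, whereas one that leaves the class proportions unchanged removes none. Summing such decreases over every node where a given wavenumber is used is what yields impurity-based feature importance.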

3. Results and Discussion

3.1. Spectral Analysis

Figure 1 presents the spectra of an EVOO sample and a lampante olive oil sample belonging to the “other” class in the region 4000–900 cm−1. The two spectra are visibly similar, and their spectral characteristics can be attributed to triacylglycerols. Three different spectral regions were observed at 3100–2800 cm−1, 1850–1650 cm−1, and 1500–900 cm−1, with the latter being the “fingerprint region” because the band pattern is particularly characteristic of molecular composition and can be used to identify minor substances [26]. In the high-wavenumber region, the band at 3005 cm−1 is due to the CH stretching of cis double bonds, whereas the bands centered at 2922 and 2853 cm−1 are associated with the hydrogen bonds and C-H bonds of the aliphatic chains of fatty acids and, in particular, vibrations of the terminal methyl groups (-CH3) [27,28]. The very strong absorbance at 1744 cm−1 was assigned to the carbonyl group (C=O) of triacylglycerols and specifically to the ketone group [29]. The triacylglycerol ester linkage is one of the most stable bonds in olive oil, and harsh conditions may induce changes in that band [30]. It has been reported that during oxidation, the spectral region between 3050 and 2740 cm−1 and the band at 1744 cm−1 undergo several changes due to the production of saturated aldehyde functional groups or other secondary oxidation products [26]. The peak at ~1460 cm−1 is associated with the bending vibrations of CH2 and CH3 in the aliphatic chains of fatty acids. Bending vibrations of the CH2 groups correspond to the peak at 1377 cm−1 [18]. Shoulder peaks at 1096 and 1237 cm−1 are assigned to triacylglycerols, with a strong peak at 1160 cm−1 corresponding to stretching vibrations of C–O ester groups [31]. Finally, bands in the 1125–1095 cm−1 region correspond to the stretching vibration of C-O ester groups and CH2 wag [27].

3.2. Random Forest Models Based on Feature Extraction

The various parameter settings for smoothing effectively reduced noise and provided enhanced spectral signals, thereby facilitating more accurate feature extraction and classification.
Initially, the spectral dataset comprised 3164 features (variables) corresponding to the wavenumbers (cm−1) in the selected region of the FTIR spectrum. The RF model was applied for feature selection, with two different thresholds for feature importance set at 0.001 and 0.0001 to determine the most influential variables contributing to classification accuracy. The feature selection process was crucial in reducing the number of features, simplifying the model, and improving the overall computational efficiency while maintaining high classification accuracy. The number of selected features and the resulting classification accuracies varied significantly depending on the threshold for feature importance and the S-G smoothing parameters (Table 3 and Table 4).
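The thresholding step can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the importance scores are normalized to sum to 1 (as is the case, for example, for the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`) and that a feature is kept when its score is greater than or equal to the threshold. The function name and the toy scores are hypothetical.

```python
def select_features(importances, threshold):
    """Indices of the features whose importance is at least `threshold`."""
    return [i for i, score in enumerate(importances) if score >= threshold]


# Toy importance vector (sums to 1); real vectors would have 3164 entries.
toy_importances = [0.5, 0.3, 0.1995, 0.0004, 0.0001]
print(select_features(toy_importances, 0.001))    # stricter threshold
print(select_features(toy_importances, 0.0001))   # more permissive threshold

# With normalized scores, the average importance over 3164 features is:
mean_importance = 1 / 3164
print(f"{mean_importance:.6f}")
```

Under the normalization assumption, the average score over 3164 features is about 0.0003, so a threshold of 0.001 keeps only features well above average while 0.0001 also admits features below it. This is consistent with the larger feature counts reported for the lower threshold.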
The results obtained indicated that with a feature importance threshold of 0.001, the classification accuracy of the RF model during external validation varied between 0.83 and 0.91, depending on the specific S-G smoothing parameters. It was observed that applying S-G smoothing improved the accuracy compared to using raw spectral data without any smoothing (e.g., 0.87 without S-G versus 0.91 with two S-G configurations). It should be noted that the application of S-G smoothing with the second derivative (d = 2), second- and third-order polynomials (p = 2, p = 3), and window sizes (w) of 12 and 13 points achieved the highest accuracy (0.91) while selecting around 365–398 features. However, a slightly lower accuracy (0.89) was obtained when using a second-order polynomial (p = 2) and a window size (w) of 12 points, selecting 400 features. Without smoothing, the RF model retained 352 features and reached a testing accuracy of 0.87, highlighting the advantage of applying pre-processing to reduce spectral noise. In addition, the combination of different window sizes and polynomial orders for S-G smoothing influenced the number of features retained, demonstrating the importance of careful tuning of pre-processing steps to optimize classification performance.
The results of the analysis revealed that with a lower feature importance threshold (0.0001), more features were selected across the spectrum, resulting in a higher number of important variables being retained. The classification accuracy with this threshold ranged from 0.74 to 0.91, with the highest accuracy being 0.91. Notably, without S-G smoothing, 963 features were retained, achieving an accuracy of 0.87, which is identical to the previous threshold result but with more retained features. The highest classification accuracy (0.91) was obtained with S-G parameters (d = 2, p = 2, w = 13), yielding 980 features, suggesting that the S-G pre-processing step effectively reduced noise and highlighted relevant spectral features. Interestingly, the accuracy was reduced when the window size w = 12 was used, particularly for p = 3, resulting in a lower accuracy (0.87). Additionally, for the case of a t = 0.001 threshold value, 58 spectra (out of a total of 192) were used to test the trained models. The misclassification rate for the generated models during testing ranged from 6/58 to 10/58 spectra. For the case of a t = 0.0001 threshold value, the misclassification rate for the generated models during testing ranged from 5/58 to 8/58 spectra.
For external validation data with feature importance thresholds set at 0.001 and 0.0001, the approach focused on testing the RF model’s ability to generalize to olive oil spectral data not included in model development (unseen data). The results are shown in the last columns of Table 3 and Table 4, respectively.
For t = 0.001 the highest accuracy of 0.91 was obtained for the S-G smoothing parameters d = 2, p = 3, and w = 12, as well as for d = 2, p = 2, and w = 13, indicating an alignment in the performance of the prediction on unseen data (Table 5). For the two datasets, the confusion matrices are given below.
For the second threshold of t = 0.0001, the highest accuracy of 0.91 was achieved for d = 2, p = 2, and w = 13, and the second-highest accuracy of 0.87 was achieved for d = 2, p = 3, and w = 12, which is the same as in the dataset that was used for model development without S-G pre-processing (Table 6). It can be concluded that both thresholds provide a stable model with high performance on unseen data for the same smoothing parameters applied on the FTIR spectra. Table 6 lists the two confusion matrices for t = 0.0001.
The spectral features responsible for the classification in all cases are the bands located near 1744 cm−1 (i.e., 1739, 1748, and 1751 cm−1), related to the carbonyl groups present in triacylglycerols (C=O), the fingerprinting area 1250–1000 cm−1 (i.e., 1088, 1094, 1116, 1123, 1124, 1158, 1162, 1236, 1240, and 1247 cm−1), which correspond to CH bending, and 1680 cm−1, which is associated with unsaturated aldehydes.
In the literature, FTIR spectroscopy has been used to distinguish vegetable oils from olive oils of different categories, varieties, and origins combined with PLS-DA. The characteristic spectral bands of triacylglycerols containing unsaturated fatty acids have been observed to constitute the main basis of discrimination (3030–2800 cm−1) [26]. In another study, FTIR coupled with PCA was applied to discriminate the geographic origin of 84 monovarietal virgin olive oils from different Italian regions [27]. The calibration models were obtained using two wavenumber ranges, specifically 1770–1690 cm−1 and the fingerprinting area 1480–1030 cm−1, indicating that the carbonyl bonds in acylglycerols and the stretching in the C–O bonds of aliphatic esters, respectively, provide important information. Our results are in line with those of the latter study, as previously stated.
In EVOO samples, the free fatty acid content is very low, whereas they are present in higher amounts in virgin, lampante, and ordinary olive oils. Their presence could be attributed to fermentation and deterioration of the olive oil fruit, which may be overripe, damaged, or otherwise improperly stored before processing [28]. Spectral features associated with free fatty acids are in the wavenumber region around 3300 cm−1 [29], which was not observed in our study.
The peroxide value and spectrophotometric absorption at 232 and 270 nm are among the quality criteria for ranking olive oil categories. Both chemical parameters are used to express the extent of oxidation of olive oils. For EVOO, virgin olive oil, and ordinary olive oil, peroxide values must be equal to or lower than 20 milliequivalents of peroxide oxygen per kg of oil, whereas lampante olive oil has no limit [5,29]. Hydroperoxides decompose into secondary oxidation products, such as aldehydes, ketones, and epoxides, which are responsible for the off-flavor of oxidized oils [30]. Spectral bands around 1685 cm−1 were observed when olive oils were incubated at 65 °C in the dark and were assigned to unsaturated aldehydes [30,31,32]. The latter explains the selection of 1680 cm−1 as a spectral feature for the classification of our samples and the discrimination of EVOO from the “other” class and particularly from the lampante olive oil.
This study effectively demonstrates the potential of FTIR spectroscopy combined with RF analysis to differentiate EVOO from other olive oil categories. However, some limitations should be acknowledged to provide a more comprehensive understanding of the applicability of the method and to inspire further improvements. The sample size and its diversity are very important, as a relatively small sample size may limit the generalization of the classifier. Further studies should be undertaken to include a broader range of geographical origins, olive cultivars, and production methods. From a computational point of view, although the RF model achieved a high classification accuracy (up to 91%), the risk of overfitting cannot be entirely ruled out, especially given the high-dimensional nature of the spectral data. While external validation was performed, further testing on larger independent datasets is necessary to confirm the robustness of the model. Finally, while RF is a robust and widely used algorithm, comparing its performance with other machine learning techniques (e.g., support vector machines, neural networks) could provide a more comprehensive evaluation of the effectiveness of the method.

4. Conclusions

The information obtained in this study provides promising perspectives for the effective use of FTIR in the classification of EVOO against other olive oil categories. The characteristic spectral bands of triacylglycerols related to the carbonyl groups present in triacylglycerols (C=O) located near 1744 cm−1 (specific features: 1739, 1748, and 1751 cm−1), the fingerprinting area 1250–1000 cm−1 (specific features: 1088, 1094, 1116, 1123, 1124, 1158, 1162, 1236, 1240, and 1247 cm−1), which correspond to CH bending, and 1680 cm−1, which is associated with unsaturated aldehydes, were observed to constitute the main basis of the discrimination of EVOO from the “other” class. The reference method employed in the classification of the olive oil samples was based on the sensory analysis of well-established and regulated taste panels. RF analysis was employed as a supervised machine learning approach to correlate spectral data and olive oil sample sensory classes (EVOO vs. other olive oil samples). The combination of RF feature selection with S-G smoothing provides an effective strategy for reducing the dimensionality of spectral data while maintaining high classification accuracy. Careful tuning of the S-G parameters and the feature importance threshold was essential to optimize the model performance, demonstrating the value of well-designed pre-processing and feature selection steps in spectroscopic analysis for food authentication. The analysis indicated that the choice of the feature importance threshold significantly affected the number of features selected and the subsequent classification accuracy. Specifically, the 0.001 threshold generally selected fewer features, simplifying the model while maintaining or even improving classification performance compared to the 0.0001 threshold. 
The highest external validation accuracy (0.91) was achieved with more than one combination of smoothing parameters (d = 2, p = 3, w = 12 and d = 2, p = 2, w = 13), suggesting that careful selection of pre-processing parameters is crucial for the optimal classification of olive oil categories. Furthermore, the feature selection ability of the RF model reduced the number of spectral features from 3164 to 352 (raw spectra, 0.001 threshold), which facilitated faster model training while improving interpretability by focusing on the most critical spectral regions for quality classification.
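The S-G second-derivative pre-processing can be illustrated with a small sketch. It assumes an evenly spaced wavenumber grid and an odd window, fits a polynomial by least squares inside each window with NumPy's `polyfit`, and takes the second derivative of the fitted polynomial at the window centre. The synthetic Gaussian band at 1744 cm−1, the 1 cm−1 spacing, and the function name `savgol_second_derivative` are assumptions for illustration; the study's own implementation, including its handling of the even window (w = 12), is not described in this section.

```python
import numpy as np


def savgol_second_derivative(
    y: np.ndarray, window: int, polyorder: int, spacing: float = 1.0
) -> np.ndarray:
    """Second derivative from local least-squares polynomial fits (Savitzky-Golay idea).

    Only interior points are computed; the first and last window // 2 points stay NaN.
    """
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("this sketch needs an odd window larger than polyorder")
    if polyorder < 2:
        raise ValueError("polyorder must be at least 2 for a second derivative")
    half = window // 2
    offsets = np.arange(-half, half + 1) * spacing
    out = np.full(len(y), np.nan)
    for i in range(half, len(y) - half):
        coeffs = np.polyfit(offsets, y[i - half : i + half + 1], polyorder)
        # polyfit returns the highest power first, so coeffs[-3] multiplies x**2.
        # The second derivative at the window centre (x = 0) is twice that value.
        out[i] = 2.0 * coeffs[-3]
    return out


# Synthetic, noise-free band centred at 1744 cm^-1 on a hypothetical 1 cm^-1 grid.
wavenumbers = np.arange(1700.0, 1791.0)
absorbance = np.exp(-((wavenumbers - 1744.0) ** 2) / (2 * 8.0**2))

d2 = savgol_second_derivative(absorbance, window=13, polyorder=2)
band_centre = wavenumbers[np.nanargmin(d2)]
print(f"most negative second derivative at {band_centre:.0f} cm^-1")
```

In a second-derivative spectrum an absorption band appears as a negative peak, so the minimum of `d2` marks the band centre. Edge points are left undefined in this sketch; a library routine such as `scipy.signal.savgol_filter` (with its `deriv` argument) would normally be used instead and also handles the edges.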
Overall, the use of RF in conjunction with S-G smoothing provided an effective methodology for differentiating EVOO from other olive oil categories. The ability of the model to achieve high classification accuracy demonstrates the robustness of FTIR spectral data combined with advanced machine learning techniques for the authentication of olive oil quality. Because a sensory panel can assess only a limited number of olive oil samples in a day, this method provides new perspectives for the high-throughput characterization of EVOO against other olive oil categories. Combined with the lower cost and more rapid analysis time afforded by FTIR, it presents a promising option for industrial olive oil classification, increasing the credibility of the olive oil sector in the provision of high-quality products to consumers.
Future research should focus on collecting a larger and more diverse dataset, including olive oil samples from different regions, cultivars, and production methods. This would enhance the generalizability of the model and allow for the development of more robust classification systems. Furthermore, while RF has proven effective in this study, exploring other machine learning techniques, such as deep learning (e.g., convolutional neural networks) or ensemble methods combining multiple algorithms, could further improve the classification accuracy and feature extraction.

Author Contributions

Conceptualization, E.Z.P.; methodology, E.Z.P.; software, P.A. and E.Z.P.; validation, P.A. and E.Z.P.; formal analysis, P.A., C.G., and E.Z.P.; investigation, S.S. and C.G.; resources, C.G., G.E., and M.K.; data curation, S.S., C.G., and E.Z.P.; writing—original draft preparation, C.G., P.A., and E.Z.P.; writing—review and editing, C.G., G.E., M.K., P.A., and E.Z.P.; visualization, P.A. and C.G.; supervision, C.G. and E.Z.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

| Abbreviation | Meaning |
|---|---|
| EVOO | Extra virgin olive oil |
| Other | Virgin, ordinary, and lampante olive oil categories |
| RF | Random Forest |
| FTIR | Fourier transform infrared spectroscopy |

References

  1. Producing 69% of the World’s Production, the EU Is the Largest Producer of Olive Oil—European Commission. Available online: https://agriculture.ec.europa.eu/news/producing-69-worlds-production-eu-largest-producer-olive-oil-2020-02-04_en (accessed on 8 January 2025).
  2. Bungaro, M. The World of Olive Oil. Available online: https://www.internationaloliveoil.org/the-world-of-olive-oil/ (accessed on 8 January 2025).
  3. Olive Oil Market Size, Share Analysis, Trends Report [2033]. Available online: https://www.imarcgroup.com/olive-oil-market (accessed on 12 January 2025).
  4. EU, European Union Regulation No 2022/2104 as Regards Marketing Standards for Olive Oil, and Repealing Commission Regulation (EEC) No 2568/91 and Commission Implementing Regulation (EU) No 29/2012. Available online: https://eur-lex.europa.eu/eli/reg_del/2022/2104/oj/eng (accessed on 18 December 2024).
  5. International Olive Council. Trade Standard Applying to Olive Oils and Olive Pomace Oils. COI/T.15/NC No 3/Rev. 20_. Available online: https://www.internationaloliveoil.org/wp-content/uploads/2024/11/TRADE-STANDARD-REV-20_EN.pdf (accessed on 12 January 2025).
  6. Olive Oil Prices—June 2024 Update. Available online: https://www.internationaloliveoil.org/wp-content/uploads/2024/06/IOC-prices-rev-0-1.html#ja%C3%A9n-spain (accessed on 18 December 2024).
  7. Bajoub, A.; Bendini, A.; Fernández-Gutiérrez, A.; Carrasco-Pancorbo, A. Olive Oil Authentication: A Comparative Analysis of Regulatory Frameworks with Especial Emphasis on Quality and Authenticity Indices, and Recent Analytical Techniques Developed for Their Assessment. A Review. Crit. Rev. Food Sci. Nutr. 2018, 58, 832–857. [Google Scholar] [CrossRef] [PubMed]
  8. International Olive Council. Sensory Analysis of Olive Oil Standard Guide for the Selection, Training and Quality Control of Virgin Olive Oil Tasters—Qualifications of Tasters, Panel Leaders and Trainers. COI/T.20/Doc. No 14/Rev. 9. Available online: https://www.internationaloliveoil.org/wp-content/uploads/2024/12/COI-T.20-Doc-14-REV9-ENG.pdf (accessed on 18 December 2024).
  9. International Olive Council. Sensory Analysis of Olive Oil Method for the Organoleptic Assessment of Virgin Olive Oil. COI/T.20/Doc. No 15/Rev. 11. Available online: https://www.internationaloliveoil.org/wp-content/uploads/2024/07/III-2.2.COI-T20-Doc.-15-REV-11-2024_EN.pdf (accessed on 18 December 2024).
  10. Alonso-Salces, R.M.; Berrueta, L.Á.; Quintanilla-Casas, B.; Vichi, S.; Tres, A.; Collado, M.I.; Asensio-Regalado, C.; Viacava, G.E.; Poliero, A.A.; Valli, E.; et al. Stepwise Strategy Based on 1H-NMR Fingerprinting in Combination with Chemometrics to Determine the Content of Vegetable Oils in Olive Oil Mixtures. Food Chem. 2022, 366, 130588. [Google Scholar] [CrossRef] [PubMed]
  11. Palagano, R.; Valli, E.; Cevoli, C.; Bendini, A.; Toschi, T.G. Compliance with EU vs. Extra-EU Labelled Geographical Provenance in Virgin Olive Oils: A Rapid Untargeted Chromatographic Approach Based on Volatile Compounds. LWT 2020, 130, 109566. [Google Scholar] [CrossRef]
  12. Barbieri, S.; Cevoli, C.; Bendini, A.; Quintanilla-Casas, B.; García-González, D.L.; Gallina Toschi, T. Flash Gas Chromatography in Tandem with Chemometrics: A Rapid Screening Tool for Quality Grades of Virgin Olive Oils. Foods 2020, 9, 862. [Google Scholar] [CrossRef]
  13. Chacón, I.; Roales, J.; Lopes-Costa, T.; Pedrosa, J.M. Analyzing the Organoleptic Quality of Commercial Extra Virgin Olive Oils: IOC Recognized Panel Tests vs. Electronic Nose. Foods 2022, 11, 1477. [Google Scholar] [CrossRef] [PubMed]
  14. Infrared Spectroscopy for Food Quality Analysis and Control—Pages 415-424—ScienceDirect. Available online: https://www.sciencedirect.com/science/article/pii/B9780123741363000250 (accessed on 18 December 2024).
  15. Casale, M.; Simonetti, R. Review: Near Infrared Spectroscopy for Analysing Olive Oils. J. Infrared Spectrosc. 2014, 22, 59–80. [Google Scholar] [CrossRef]
  16. Violino, S.; Taiti, C.; Marone, E.; Pallottino, F.; Costa, C. A Statistical Tool to Determine the Quality of Extra Virgin Olive Oil (EVOO). Eur. Food Res. Technol. 2022, 248, 2825–2832. [Google Scholar] [CrossRef]
  17. Ordoudi, S.A.; Özdikicierler, O.; Tsimidou, M.Z. Detection of Ternary Mixtures of Virgin Olive Oil with Canola, Hazelnut or Safflower Oils via Non-Targeted ATR-FTIR Fingerprinting and Chemometrics. Food Control 2022, 142, 109240. [Google Scholar] [CrossRef]
  18. Motahari, H.; Mousavi, S.S.; Haghighi, P. Raman, FTIR, and UV–Vis Spectroscopic Investigation of Some Oils and Their Hierarchical Agglomerative Clustering (HAC). Food Anal. Methods 2023, 16, 1237–1251. [Google Scholar] [CrossRef]
  19. Borràs, E.; Mestres, M.; Aceña, L.; Busto, O.; Ferré, J.; Boqué, R.; Calvo, A. Identification of Olive Oil Sensory Defects by Multivariate Analysis of Mid Infrared Spectra. Food Chem. 2015, 187, 197–203. [Google Scholar] [CrossRef]
  20. Borràs, E.; Ferré, J.; Boqué, R.; Mestres, M.; Aceña, L.; Calvo, A.; Busto, O. Olive Oil Sensory Defects Classification with Data Fusion of Instrumental Techniques and Multivariate Analysis (PLS-DA). Food Chem. 2016, 203, 314–322. [Google Scholar] [CrossRef]
  21. Basalekou, M.; Kallithraka, S.; Tarantilis, P.A.; Kotseridis, Y.; Pappas, C. Ellagitannins in Wines: Future Prospects in Methods of Analysis Using FT-IR Spectroscopy. LWT 2019, 101, 48–53. [Google Scholar] [CrossRef]
  22. Pappas, C.S.; Tarantilis, P.A.; Moschopoulou, E.; Moatsou, G.; Kandarakis, I.; Polissiou, M.G. Identification and Differentiation of Goat and Sheep Milk Based on Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) Using Cluster Analysis. Food Chem. 2008, 106, 1271–1277. [Google Scholar] [CrossRef]
  23. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  24. Grassi, S.; Benedetti, S.; Magnani, L.; Pianezzola, A.; Buratti, S. Seafood Freshness: E-Nose Data for Classification Purposes. Food Control 2022, 138, 108994. [Google Scholar] [CrossRef]
  25. Sokolova, M.; Lapalme, G. A Systematic Analysis of Performance Measures for Classification Tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
  26. De La Mata, P.; Dominguez-Vidal, A.; Bosque-Sendra, J.M.; Ruiz-Medina, A.; Cuadros-Rodríguez, L.; Ayora-Cañada, M.J. Olive Oil Assessment in Edible Oil Blends by Means of ATR-FTIR and Chemometrics. Food Control 2012, 23, 449–455. [Google Scholar] [CrossRef]
  27. Bendini, A.; Cerretani, L.; Di Virgilio, F.; Belloni, P.; Bonoli-Carbognin, M.; Lercker, G. Preliminary Evaluation of the Application of the FTIR Spectroscopy to Control the Geographic Origin and Quality of Virgin Olive Oils. J. Food Qual. 2007, 30, 424–437. [Google Scholar] [CrossRef]
  28. Valli, E.; Bendini, A.; Maggio, R.M.; Cerretani, L.; Toschi, T.G.; Casiraghi, E.; Lercker, G. Detection of Low-quality Extra Virgin Olive Oils by Fatty Acid Alkyl Esters Evaluation: A Preliminary and Fast Mid-infrared Spectroscopy Discrimination by a Chemometric Approach. Int. J. Food Sci. Technol. 2013, 48, 548–555. [Google Scholar] [CrossRef]
  29. Muik, B.; Lendl, B.; Molina-Diaz, A.; Valcarcel, M.; Ayora-Cañada, M.J. Two-Dimensional Correlation Spectroscopy and Multivariate Curve Resolution for the Study of Lipid Oxidation in Edible Oils Monitored by FTIR and FT-Raman Spectroscopy. Anal. Chim. Acta 2007, 593, 54–67. [Google Scholar] [CrossRef] [PubMed]
  30. Tena, N.; Aparicio, R.; García-González, D.L. Virgin Olive Oil Stability Study by Mesh Cell-FTIR Spectroscopy. Talanta 2017, 167, 453–461. [Google Scholar] [CrossRef]
  31. Antolovich, M.; Prenzler, P.D.; Patsalides, E.; McDonald, S.; Robards, K. Methods for Testing Antioxidant Activity. Anal. 2002, 127, 183–198. [Google Scholar] [CrossRef] [PubMed]
  32. Dubois, J.; Van De Voort, F.R.; Sedman, J.; Ismail, A.A.; Ramaswamy, H.R. Quantitative Fourier Transform Infrared Analysis for Anisidine Value and Aldehydes in Thermally Stressed Oils. J. Am. Oil Chem. Soc. 1996, 73, 787–794. [Google Scholar] [CrossRef]
Figure 1. FTIR spectra in the region 4000–900 cm−1 of extra virgin olive oil (EVOO class) versus a lampante olive oil (denoted as other class).

Table 1. Number and coding of olive oil samples according to sensory quality category and origin.

| Category / Provider ¹ | IOC | MD | O1 |
|---|---|---|---|
| Number of samples | 12 | 41 | 12 |
| Extra virgin olive oil | 6 | 25 | 7 |
| Other (virgin, ordinary, lampante) | 6 | 16 | 5 |

¹ IOC (International Olive Council), MD (Department of Chemical Analyses, Interagency Unit for Market Control, Ministry of Development), O1 (olive oil industry 1).

Table 2. Number and coding of olive oil samples according to sensory quality category and origin used for external validation purposes.

| Category / Provider ¹ | IOC | O2 |
|---|---|---|
| Number of samples | 7 | 9 |
| Extra virgin olive oil | 4 | 0 |
| Other (virgin, ordinary, lampante) | 3 | 9 |

¹ IOC (International Olive Council), O2 (olive oil industry 2).

Table 3. Results of the RF models developed on the FTIR spectral data based on different pre-processing techniques using 0.001 as the feature importance threshold, together with classification accuracies obtained in both testing and external validation.

| S-G Parameters | Original Features | Important Features | Accuracy (Testing) | Accuracy (External Validation) |
|---|---|---|---|---|
| Without S-G * | 3164 | 352 | 0.88 | 0.87 |
| d = 2, p = 2, w = 11 | 3164 | 375 | 0.90 | 0.83 |
| d = 2, p = 3, w = 11 | 3164 | 930 | 0.84 | 0.83 |
| d = 2, p = 2, w = 12 | 3164 | 400 | 0.90 | 0.89 |
| d = 2, p = 3, w = 12 | 3164 | 398 | 0.90 | 0.91 |
| d = 2, p = 2, w = 13 | 3164 | 365 | 0.86 | 0.91 |
| d = 2, p = 3, w = 13 | 3164 | 369 | 0.90 | 0.85 |

* Raw data; S-G: Savitzky–Golay; d: derivative order; p: polynomial order; w: window size.

Table 4. Results of the RF models developed on the FTIR spectral data based on different pre-processing techniques using 0.0001 as the feature importance threshold, together with classification accuracies obtained in both testing and external validation.

| S-G Parameters | Original Features | Important Features | Accuracy (Testing) | Accuracy (External Validation) |
|---|---|---|---|---|
| Without S-G | 3164 | 963 | 0.88 | 0.87 |
| d = 2, p = 2, w = 11 | 3164 | 930 | 0.88 | 0.76 |
| d = 2, p = 3, w = 11 | 3164 | 930 | 0.84 | 0.83 |
| d = 2, p = 2, w = 12 | 3164 | 960 | 0.86 | 0.74 |
| d = 2, p = 3, w = 12 | 3164 | 958 | 0.90 | 0.87 |
| d = 2, p = 2, w = 13 | 3164 | 980 | 0.84 | 0.91 |
| d = 2, p = 3, w = 13 | 3164 | 982 | 0.86 | 0.83 |

S-G: Savitzky–Golay; d: derivative order; p: polynomial order; w: window size.

Table 5. Results (confusion matrix) of external validation of the RF models developed on FTIR spectral data based on different pre-processing techniques using 0.001 as the feature importance threshold.

| S-G Parameters | Actual | Predicted EVOO | Predicted Other | Recall | Accuracy | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| d = 2, p = 3, w = 12 | EVOO | 32 | 2 | 0.94 | 0.91 | 0.94 | 0.94 |
| | Other | 2 | 10 | | | | |
| d = 2, p = 2, w = 13 | EVOO | 30 | 4 | 0.88 | 0.91 | 1.0 | 0.94 |
| | Other | 0 | 12 | | | | |

d: derivative order; p: polynomial order; w: window size.
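
The metrics reported alongside the confusion matrices in Table 5 follow from the four cell counts. As a minimal sketch, assuming EVOO is treated as the positive class (so the EVOO row gives true positives and false negatives, and the Other row gives false positives and true negatives), the values can be recomputed as follows. The helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Recall, accuracy, precision, and F1 with EVOO as the positive class."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "accuracy": accuracy, "precision": precision, "f1": f1}


# Counts read from Table 5: rows are the actual class, columns the predicted class.
models = {
    "d = 2, p = 3, w = 12": binary_metrics(tp=32, fn=2, fp=2, tn=10),
    "d = 2, p = 2, w = 13": binary_metrics(tp=30, fn=4, fp=0, tn=12),
}

for name, scores in models.items():
    print(name, {key: round(value, 2) for key, value in scores.items()})
```

Both models reach the same accuracy with different error profiles: the first misclassifies two samples in each direction, while the second never labels an "other" sample as EVOO but misses four EVOO samples.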

Table 6. Results (confusion matrix) of external validation of RF models developed on FTIR spectral data based on different pre-processing techniques using 0.0001 as the feature importance threshold.

| S-G Parameters | Actual | Predicted EVOO | Predicted Other | Recall | Accuracy | Precision | F1 Score |
|---|---|---|---|---|---|---|---|
| d = 2, p = 2, w = 13 | EVOO | 30 | 4 | 0.88 | 0.91 | 1.0 | 0.94 |
| | Other | 0 | 12 | | | | |
| d = 2, p = 3, w = 12 | EVOO | 28 | 6 | 0.82 | 0.87 | 1.0 | 0.90 |
| | Other | 0 | 12 | | | | |

d: derivative order; p: polynomial order; w: window size.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gardeli, C.; Sykioti, S.; Exarchos, G.; Koliatsou, M.; Andritsos, P.; Panagou, E.Z. The Differentiation of Extra Virgin Olive Oil from Other Olive Oil Categories Based on FTIR Spectroscopy and Random Forest. Appl. Sci. 2025, 15, 1061. https://doi.org/10.3390/app15031061
