1. Introduction
Croplands are changing continuously and intensively at regional to global scales due to climate change and human activities [
1]. Cropland extent is an essential part of crop monitoring because it is fundamental for the analysis of crop inventory and the assessment of crop status [
2,
3,
4,
5].
Approximately 90% of staple food production in sub-Saharan Africa is provided by rainfed farming systems [
6]. In the Zambezi River Basin (ZRB), variations in climate conditions have a direct influence on crop outputs, as they may lead to extreme rainfall or extended periods of drought [
6,
7]. The effects of climate change [
8], combined with field sizes and cropping systems in the basin, contribute to reduced food production [
9]. This reduced food production makes the ZRB one of the most vulnerable regions in terms of food security on the continent. The expansion of cropland is a primary way to increase crop production in sub-Saharan Africa. According to official statistics, the cropland area in the ZRB has increased considerably since the 1980s [
10]. This rise was driven by a major political and socio-economic transition that changed the agricultural landscape, cropping practices, and productivity, and ultimately influenced the water cycle, water resources, and energy generation. Accurate and timely analysis of cropland extent can provide objective information for decision-making in agricultural management, food security early warning, and food allocation in this region. Nevertheless, mapping cropland extent remains challenging in the ZRB, mainly because of landscape heterogeneity, fragmented agricultural fields, and different cropping patterns [
8,
11,
12], which make it difficult to discriminate cropland from other vegetation classes such as grassland. To address these problems, several aspects should be considered, including the quality of the input data, the methodology to be applied, and the characteristics of the land features [
13,
14,
15].
Various approaches have been developed and tested by researchers to map cropland extent accurately and in a timely manner. Compared to conventional methods such as field surveys, remote sensing-based cropland identification is more rapid and cost-effective [
13,
14,
15], and therefore provides more frequently updated cropland information [
15,
16,
17]. Numerous studies have utilized supervised classification methodologies for cropland mapping using various satellite datasets [
1,
18,
19,
20,
21]. Among those supervised classifiers, the random forest classifier [
18,
21], the classification and regression tree (CART), the support vector machine (SVM) [
20,
22], maximum likelihood and minimum distance [
23] and the naive Bayes classifier [
24] are commonly used [
18]. Based on these approaches, several fine resolution landcover and cropland products have become available in recent years, including the fine resolution observation and monitoring of global land cover (FROM-GLC) dataset at 30 m resolution for 2010 [
25], the Landsat-derived GLOBELAND30 (GLC30) dataset at 30 m resolution for 2000 and 2010 [
26], the Copernicus global land operations (CGLS) land cover product of Africa (CGLS-LC100 at a 100 m resolution for 2015), the European Space Agency Climate Change Initiative Land Cover, Global Land Cover map at 300 m for 2015 (ESACCL-LC-L4-300), the Sentinel-2 Prototype (ESACCI-LC_S2_Prototype) map for Africa at a 20 m resolution for 2016 [
27], and the Global Food Security support analysis data over the continent of Africa for 2015 at a 30 m resolution (GFSAD30AFCE) [
18]. The accuracies of these landcover products depend on their input data sources.
Recent studies have evaluated the accuracy of these datasets and found that GLOBELAND30 (GLC30) had an overall accuracy of 78.6% for the year 2000 and 80% for the year 2010 [
26]. The overall accuracy of the ESACCI-LC_S2_Prototype dataset was approximately 65% [
27], while ESACCL-LC-L4-300 had an overall accuracy of 71.5% [
28]. GFSAD30AFCE [
18], FROM-GLC [
25], and CGLS-LC100 [
29] have overall accuracies of 94.5%, 64.9%, and 74.3%, respectively. However, independent validation shows that the overall accuracies of all four datasets (CGLS-LC100 [
29], ESACCL-LC-L4-300 [
28], GFSAD30AFCE [
18], and ESACCI-LC_S2_Prototype [
27]) were below 65% [
12]. Furthermore, results revealed an overestimation of crop area in most countries when compared with the statistics from the Food and Agriculture Organization (FAO) [
30]. Several studies have attempted to include more indicative spectral features to improve classification accuracy [
5,
18]. For example, by taking advantage of crop growth features derived from satellite data, GFSAD30AFCE achieved a higher accuracy than that reported in other studies [
12,
19]. The use of time-series data has been shown to perform better than the use of a single-date image [
31]. Furthermore, the use of remote sensing data with high spatial resolution is one of the primary factors in obtaining high-quality cropland maps [
3,
14,
20]. However, studies have also indicated that a single factor, such as high spatial resolution alone, may not be enough to yield the desired improvement in mapping accuracy. For instance, the improved spatial resolution obtained from Sentinel-2 images does not result in a significant improvement in the accuracy of cropland mapping [
12]. Studies have reported that although accuracy might be improved by spatial stratification [
32], only limited tests have been carried out, and their results are not publicly available. Despite efforts to enhance the input data for mapping, the high cost of obtaining in situ samples and the lack of training samples for large-scale applications remain limiting factors for further improving cropland product accuracy [
33].
According to [
34], the ZRB is composed of four agroecological zones (AEZs), namely, the tropic cool semiarid, tropic cool sub-humid, tropic warm semiarid, and tropic warm sub-humid zones. Each of these AEZs has distinct cropping practices, field sizes, land cover heterogeneity, and climate variation [
35]. For example, the tropic cool zones are characterized by larger field sizes than the tropic warm zones. These characteristics influence the discrimination of cropland from non-cropland areas. This study aims to investigate the stability of several parametric and non-parametric classifiers for cropland mapping by addressing the diversity in land cover and cropping systems over four different AEZs in the ZRB, taking advantage of multiple fine resolution datasets (Landsat-8 and Sentinel-2), as well as cloud computing with Google Earth Engine (GEE) (
https://earthengine.google.com/) [
36,
37]. The objectives of the paper are (1) to evaluate the feasibility of training samples derived from existing datasets for large-scale cropland mapping and (2) to investigate the stability of four different classifiers, namely three machine learning classifiers (random forest, support vector machine, and classification and regression tree) and one non-parametric classifier (minimum distance), over different AEZs with diverse landscapes and cropping systems.
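Of the four classifiers compared, minimum distance is the simplest to state: each pixel is assigned to the class whose mean feature vector is nearest. The following is a minimal sketch of that idea in Python with NumPy, assuming Euclidean distance and a plain feature matrix; the function names and the toy feature values are hypothetical and are not taken from the GEE workflow used in this study.

```python
import numpy as np


def fit_class_means(features, labels):
    """Compute one mean feature vector per class (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # One row per class, one column per feature (e.g. band or index value).
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, means


def predict_minimum_distance(features, classes, means):
    """Assign each sample to the class with the nearest mean (Euclidean)."""
    features = np.asarray(features, dtype=float)
    # distances has shape (n_samples, n_classes).
    distances = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]


# Toy example: two features per sample, class 0 = non-cropland, 1 = cropland.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
train_y = [0, 0, 1, 1]
classes, means = fit_class_means(train_x, train_y)
predicted = predict_minimum_distance([[0.0, 0.0], [1.0, 1.0]], classes, means)
```

Because the decision depends only on class means, this classifier has no tuning parameters, which makes it a useful baseline against the tree-based and kernel-based methods.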
4. Discussion
As suggested by previous studies, field size [
8], spatial extent, and landscape patterns [
85] are the major factors impacting the accuracy of cropland classification [
12]. According to [
8], field size can be of great importance in agricultural land monitoring because, for example, small fields require higher-resolution images than larger fields. Over the ZRB, the dominant field size is labelled “very small” (0.64–2.56 ha and < 0.64 ha). To handle this issue, different aspects must be taken into consideration, including the methodology to be applied, the characteristics of the land features, and the quality of the input data for classification. The results obtained from this study indicate that, with an average overall accuracy (OA) of 87.4%, the RF classifier outperformed all other classifiers, including MD (79.4%), SVM (78.2%), and CART (76.9%), in all studied AEZs. The good performance of the RF classifier over different AEZs indicates that it has substantial potential for mapping cropland features under different conditions. Similar findings were reported by [
18,
21], who used RF to map not only the cropland extent but also the different features on the Earth’s surface. Although RF performed best in all AEZs, this classifier still needs to be trained in each region considering the dynamics of agricultural conditions.
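The OA values compared above follow the usual definition of overall accuracy: the number of correctly classified validation samples divided by the total number of validation samples. A minimal sketch, assuming a square confusion matrix with reference classes in rows and mapped classes in columns; the counts below are invented for illustration and are not results from this study.

```python
import numpy as np


def overall_accuracy(confusion):
    """Overall accuracy: correctly classified samples / all samples."""
    confusion = np.asarray(confusion, dtype=float)
    if confusion.ndim != 2 or confusion.shape[0] != confusion.shape[1]:
        raise ValueError("confusion matrix must be square")
    total = confusion.sum()
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    # The diagonal holds the samples whose mapped class matches the reference.
    return float(np.trace(confusion) / total)


# Hypothetical 2-class matrix: rows = reference, columns = mapped class.
example = np.array([[45, 5],
                    [10, 40]])
example_oa = overall_accuracy(example)  # (45 + 40) / 100
```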
In this study, we found that by training in each region, the accuracies of this classifier varied with the AEZs (
Figure 4), with the highest accuracy observed in the tropic cool sub-humid region (93.8%) and the lowest in the tropic warm semiarid region (82.6%). These differences might be attributed to landscape patterns, field sizes, and cropping systems, which differ across AEZs [
86]. For example, the tropic cool zones (semiarid and sub-humid) have different cropping systems and field sizes compared with those of the tropic warm zones (semiarid and sub-humid). In the tropic cool zones, a high percentage of the area is characterized by commercial farms with relatively large field sizes [
8], making it easier to identify croplands with higher accuracy. In contrast, over the tropic warm (semiarid and sub-humid) zones, the high phenological similarity between rainfed cropland and other vegetation classes, particularly grassland, could be the main source of confusion. The zone-dependent variations in cropland classification accuracies were also reported by [
87] when mapping cropland area over southeast and northeast Asia using multi-year time-series Landsat 30 m data and a random forest classifier. In the process of cropland mapping, one of the biggest challenges is the separation of cropland from grassland. In some growing periods, grassland has spectral features similar to those of cropland, which often hinders its discrimination from cropland [
88]. Some fields are cropped in some years and left idle (bare or as grassland) in others, which also leads to spectral variability and confusion in the multi-temporal analysis. This phenomenon may have led to higher misclassification among cropland, grassland, and forest in the two tropic warm (semiarid and sub-humid) zones than in the two tropic cool (semiarid and sub-humid) zones, and hence to the higher accuracy of all four classifiers in the cool AEZs (
Figure 4).
It is noteworthy that we made a special effort to collect and process the input data to improve the cropland classification, aided by cloud computing techniques. The cropland extent was finally mapped at a 10 m spatial resolution over the ZRB by considering three years, 2017–2019. We obtained an overall accuracy of 84%, which was 2% higher than that of the GFSAD30AFCE for the years 2015/2016 over the ZRB [
18]. The main difference between these two studies lies in the input datasets. This study enhanced mapping by combining reflective bands with multiple derived indices, thereby increasing spectral discriminability between the different classes. In addition, our 10 m cropland map was compared to the FROM-GLC 30 m cropland map developed by [
84] over the ZRB. Not only did we improve the spatial resolution of the cropland map (from 30 m to 10 m), but the accuracy of our cropland map was also 15% higher than that of the FROM-GLC product.
This study also demonstrated the importance of integrating different types of datasets (Landsat-8 and Sentinel-2) for accurate cropland mapping. These two datasets were chosen because most of the study region is characterized by rainfed croplands, and the growing period (an essential element in the identification of cropland areas) coincides with the rainy season; thus, obtaining cloud-free time-series images becomes a challenge. Hence, the use of multiple sensors enhanced the acquisition of cloud-free images, which in turn contributed to more accurate results. Apart from the RF classifier and the usage of the different datasets, another essential element that contributed to the improvement in mapping accuracy was the technique used to collect samples for classifier training/calibration. In this study, the samples used were collected from locations where there was agreement between different existing land cover datasets on the class value at a given point. Furthermore, a comparison of spatial agreement of the four different land cover datasets (GFSAD30AFCE, CGLS-LC100, ESACCI-LC_S2_Prototype, and ESACCL-LC-L4-300) based on standard deviation [
12] revealed high spatial agreement on cropland maps over the ZRB, suggesting that these datasets are reliable over the region. By using different datasets for sampling, we have reduced the uncertainty and therefore provided more reliable calibration sets.
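The agreement-based sampling described above can be illustrated with a small sketch. It assumes that each existing product has already been resampled to a common grid and reduced to a binary map (1 = cropland, 0 = non-cropland); the function name, the sample size parameter, and the random sampling step are hypothetical choices for illustration, not the exact procedure implemented in GEE for this study.

```python
import numpy as np


def sample_agreed_pixels(maps, n_per_class, seed=0):
    """Draw training locations only where every input map has the same class.

    maps: sequence of 2-D integer arrays on a common grid (1 = cropland).
    Returns a dict mapping class value to a list of (row, col) tuples.
    """
    stack = np.stack([np.asarray(m) for m in maps])  # (n_maps, rows, cols)
    # A pixel is "agreed" when all maps match the first map at that location.
    agree = np.all(stack == stack[0], axis=0)
    labels = stack[0]
    rng = np.random.default_rng(seed)
    samples = {}
    for cls in (0, 1):
        rows, cols = np.nonzero(agree & (labels == cls))
        k = min(n_per_class, rows.size)
        if k == 0:
            samples[cls] = []
            continue
        idx = rng.choice(rows.size, size=k, replace=False)
        samples[cls] = list(zip(rows[idx].tolist(), cols[idx].tolist()))
    return samples


# Three toy maps; they disagree at (0, 1) and (1, 2).
map_a = np.array([[1, 1, 0], [0, 1, 0]])
map_b = np.array([[1, 0, 0], [0, 1, 1]])
map_c = np.array([[1, 1, 0], [0, 1, 0]])
training = sample_agreed_pixels([map_a, map_b, map_c], n_per_class=5)
```

Pixels where the products disagree are never sampled, so label noise from any single product is less likely to enter the training set, at the cost of under-representing ambiguous landscapes.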
In terms of limitations of this study, the presence of clouds is a major issue for remote sensing observations in the study area, particularly during the rainy season (October to March). Although a yearly mosaic is used to reduce the impact of clouds, the impact cannot be eliminated. The use of Synthetic Aperture Radar (SAR) data may result in better separation between cropland and grassland, especially in areas close to rivers or wetlands; the absence of such data might be another limitation of this research. Fortunately, cloud computing on the GEE platform efficiently processed and composited thousands of images for each year. As a consequence, the number of cloudless observations for the rainy season (October 2018 to March 2019) was 64 on average by integrating Landsat-8 and Sentinel-2 images (
Figure A2). Moreover, 64.9% of the ZRB had 34 to 64 valid observations during the rainy season. Thus, the uncertainty of cloud impacts is limited to some extent. Furthermore, more validation samples based on field surveys are needed for better quality assessment. Given that only a small number of validation samples are available, we relied on a freely available validation dataset, which could also introduce some uncertainty in the final assessment.
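The per-pixel observation statistics quoted above can be derived from a stack of cloud masks. The sketch below assumes a boolean array with one layer per acquisition (True = cloud-free); the helper names and the toy masks are hypothetical, and the thresholds are passed in by the caller rather than fixed to the values reported here.

```python
import numpy as np


def valid_observation_count(cloud_free):
    """Count cloud-free acquisitions per pixel from a (time, rows, cols) stack."""
    return np.asarray(cloud_free, dtype=bool).sum(axis=0)


def fraction_in_range(counts, low, high):
    """Share of pixels whose valid-observation count lies in [low, high]."""
    counts = np.asarray(counts)
    return float(np.mean((counts >= low) & (counts <= high)))


# Toy stack: three acquisitions over a 2 x 2 area.
masks = np.array([
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
], dtype=bool)
counts = valid_observation_count(masks)   # per-pixel counts
share = fraction_in_range(counts, 2, 3)   # pixels with 2 to 3 valid looks
```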