1. Introduction
With several cyclones having hit its coast in the last few decades, Mozambique is considered one of the most flood-prone countries in southern Africa [
1]. In 2000, the tropical cyclone (TC) Eline hit the Mozambican coast near Beira, causing over 800 deaths and leaving hundreds of thousands of people homeless. Although all provinces were affected, Gaza, a province in the southern region of Mozambique, was the most impacted, with the low-lying farmlands in the Chókwe and Xai Xai Districts being put under 4–8 m of water [
2,
3]. For years, these floods were considered the most devastating in Mozambican history [
2]. In February 2017, TC Dineo hit the southern Mozambique region with heavy rains and winds over 100 km/h [
4]. From 7 to 15 January 2018, there were floods in almost all the provinces in Mozambique, and at least 11 people died and more than 750,000 were affected. Up to 15,000 homes were severely damaged in the worst-hit province of Nampula, causing disruption of rail traffic in Maputo [
5]. In 2019, Mozambique became the first country in southern Africa to be hit by two TCs in the same raining season. On 15 March 2019, TC Idai hit the central part of the country, devastating three provinces, Sofala, Manica and Zambezia. Beira, the capital of Sofala, was 90% damaged (infrastructures such as buildings, roads, and bridges were partially or totally destroyed). Once again, on 25 April 2019, the TC Kenneth hit the north part of the country, affecting two provinces, Cabo Delgado and Nampula. On 22 January 2021, Beira was again hit by TC Eloise, which also affected the Zambezia province and the north part of the Inhambane province. The destruction of Eloise was close to that of TC Idai; the number of causalities, however, was much reduced, and that can reflect how aware people were about the event.
Several studies have been conducted in order to mitigate the impact of floods in Mozambique, some of which were conducted in collaboration with the National Institute for disaster management in Mozambique (INGD). However, most of these studies are focused on the impact of climate change on disaster risk in Mozambique [
6], and others were more specific and directed to floods [
1]. Following these studies, many actions to mitigate the impact of these disasters have been taking place. National organizations such as INGD and the Ministry of State Administration (locally called MAE) are also engaged on creating policies to prevent the impact of disasters on society and infrastructures. INGD is responsible for coordinating disaster risk management at the national, provincial and district levels, as well as at the community level. Three regional emergency operation centers (CENOE) handle TCs and droughts (Vilankulos), floods (Caia) and TCs (Nacala). There are also four multiple use centers (called CERUM) at the district level that specialize in reducing the impact of droughts. At the community level, INGD acts through local committees for Disaster Risk Management that are empowered to deal with both disaster prevention and preparedness. In fact, there are some efforts to obtain near real-time monitoring systems in Mozambique; however, the INGD is still lagging behind compared to the Copernicus Emergency Management Service (Copernicus EMS) services. On the other hand, INGD often utilizes drones and has acquired a number of drones capable of covering a radius of almost seven kilometers at a height of 700 m and speed of 72 km/h to aid in real-time decision making during emergencies and after an event anywhere in the country [
7]. Although drones may be useful in some particular cases, they have a very narrow swath, and are an expensive and often time consuming solution for demanding situations such as natural disasters.
Covering large areas at regular revisits, satellite remote sensing has been playing an important role in the disaster management of hazards such as floods, fires, cyclones, and earthquakes, especially for preparedness, warnings and emergency responses. Particularly, the fusion of optical and synthetic aperture radar (SAR) data on change detection has proven to be very promising [
8,
9]. Change detection is the process of identifying differences in the state of an object or phenomenon by observing it at different times, and the basic premise in using remote sensing data is that changes in land cover (LC) must result in changes in radiance values [
10]. As the number of fast growing cities is increasing significantly all over the globe, events such as uncontrolled urbanization, deforestation, droughts, and floods highlight the importance of the continuous development of methods and technologies for Earth observation and monitoring. This is vital to the management of natural resources, the conservation of ecosystems and biodiversity as well as decision support for sustainable development. Hence, change detection using remote sensing plays an important role, as shown in several studies [
11,
12,
13]. Examples related to deforestation can be found in [
14,
15], urbanization in [
12] and floods in [
16,
17,
18,
19,
20].
With the launches of the Sentinel-1 (S1) SAR and Sentinel-2 (S2) multi-spectral instrument (MSI), free and open data with global coverage, a large swath and high temporal resolution became routinely available and easily accessible compared to drones data and some other satellites. In this project, we aim to investigate multi-temporal S1 SAR and S2 MSI for flood mapping (FM) in near real-time and assess the damage in Mozambique. To help the local communities, we map flooded areas and estimate the devastated area for some LCs, mainly in built up areas and agriculture in cases of the TCs Idai, Kenneth and Eloise. For an accuracy assessment of FM, we follow the method applied in [
16].
2. Literature Review
There are several methods and techniques for FM that have been developed over the past decades. These methods primarily follow change detection techniques such as image differencing [
21,
22], regression change detection [
23,
24], principal component analysis (PCA) [
25], unsupervised change detection [
26], post-classification comparison [
27], artificial neural network (ANN) [
28], change detection by combining feature-based and pixel-based techniques [
29], change detection by combining object-based and pixel-based techniques [
30], and object-based change detection [
31], among others. Each of these methods can range from simple to complex when deployed for flood detection depending on elements such as data specification, sensor imaging capabilities, soil moisture and whether it is an urban or rural area. In general, it is a challenging task to find the most effective technique for FM, especially in urban areas where some shadows of buildings can easily be mistaken for floods [
32]. However, identifying a suitable flood detection technique is of great significance in producing FM results with substantial quality. There has been continuous work throughout the last few decades to produce and identify the best approaches for flood detection in different situations and regions. In all those approaches, it can be seen that flood detection, as with any change detection problem, should follow the following basic steps:
Image preprocessing including geometrical rectification and image registration, radiometric and atmospheric correction, and topographic correction if the study area is in mountainous regions;
Selection of suitable techniques to implement change detection analyses;
Accuracy assessment.
Therefore, to perform flood detection based on multi-temporal images, the respective images should be radiometrically and spatially comparable [
33,
34,
35], and it is a common practice to perform these corrections before FM [
36]. Geometric correction is accomplished by image-to-image registration or image orthorectification in mountainous areas, and in urban areas for very high resolution images to ensure that the corresponding pixels in the multi-temporal images refer to the same geographic location [
34].
FM can be performed using SAR or optical data (
Table 1 and
Table 2); however, SAR sensors are widely preferable for FM over optical ones due to their all weather and day–night imaging capabilities. SAR images usually exhibit a salt and pepper appearance called speckles, which can affect the quality of SAR-based FM. The strategy for reducing the effect of speckles is usually filtering individual imagery before the comparison. To do so, an adaptive filtering is iteratively applied until a satisfactory result is obtained. In addition, due to the sensor sensitivity to the incident angle, radiometric correction is also applied to SAR data. Several fully automated FM approaches have been presented in the last few decades, particularly using SAR images [
17,
20,
37]. Although SAR-based FM has been applied in both urban and rural areas, it is most commonly applied in rural areas [
38]. The specular reflection occurring on smooth water surfaces results in a dark tone in SAR data that makes floodwater distinguishable from dry land surfaces. SAR-based flood detection in urban areas is challenging due to the complex backscatter mechanisms associated with varying building types and heights, vegetation areas, and different road types. From this scope, considerable effort is still required to produce more reliable methods for urban FM because these areas are also prone to flooding and have an associated increased risk of loss of human lives and damage to economic infrastructures [
38,
39].
SAR data such as TerraSAR-X were used to develop an automatic near real-time flood detection approach at the River Severn, UK [
37]. They combine histogram thresholding and segmentation-based classification, specifically oriented to the analysis of single-polarized of very high resolution SAR. In [
17], an automated S1-based processing chain designed for flood detection and monitoring in near real-time data is presented. The study is performed in Greece and Turkey, and the deployed methods exhibit an accuracy of 94%. Observe that fully automated approaches have advantages over manual or semi-automatic approaches due to the fact that they can reduce the time for mapping activities and rapidly provide useful information to emergency management authorities and decision-makers.
Table 1 and
Table 2 summarize some papers that show the state of the art of FM and damage assessment using S1 and S2, some of which will be fundamental to reach the objectives of this project.
Table 1.
Examples of papers about FM using SAR data.
Table 1.
Examples of papers about FM using SAR data.
Paper | SAR Data | Methods | Accuracy | Major Findings | Limiations |
---|
[38] | | Bayesian network. | For S1, had an overall accuracy of 93.7–94.5% and kappa of 0.60–0.68. For ALOS-2/PALSAR-2, had an overall accuracy of 86.0–89.6% and kappa of 0.61–0.72.
| | |
[40] | S1. | Difference of pre- and post-flood images. Otsu’s method for thresholding. | Accuracy of | Low computational cost. High accuracy. Easy implementation. Good extensionality.
| Poor validation process. |
[41] | S1. | Knowledge-based classification methods. | Accuracy of . | | |
[42] | S1. | statistics with complex Wishart distribution. | Not presented. | | |
[17] | S1. | | Overall accuracy of about 94.0–96.1% and kappa of 0.879–0.91. | Full automated method. Good extensionality. High accuracy. Good validation.
| Robustness to be proved. |
In threshold-based detection methods, flood detection usually requires an optimal threshold value [
43,
44]. Thresholds are used as a limiting value to evaluate if a pixel belongs to a certain cluster (class), and hence distinguish different classes in an image. Consequently, thresholding is one of most important parts in flood detection and there are several methods to do so. For example, the Kittler–Illingworth algorithm, the Otsu’s method, Normal and LMedS method, Poisson method, and the outlier detection technique are some popular thresholding methods. Some of them, such as Otsu’s thresholding, are based on the histogram of the image by holding the premise that the histogram is bimodal and in between the two peaks lies the optimal threshold value. The Kittler–Illingworth thresholding algorithm proposed in [
45] has also been used extensively in change detection. This algorithm was developed based on Bayesian decision theory and is known to be a fast and effective tool. However, the Kittler–Illingworth thresholding has some drawback when used in a multithreshold version, as it has high computational complexity [
13].
Table 2.
Examples of papers about FM using optical data.
Table 2.
Examples of papers about FM using optical data.
Paper | Optical Data | Methods | Accuracy | Major Findings | Limitations |
---|
[46] | Landsat 8. | SVM classification. | | Is flexible to data availability. Robustness of the method. NDWI fails to identify some water bodies than MNDWI based on SWIR.
| |
[47] | EO-1 ALI flooding. Landsat TM.
| Reflectance differencing technique. | Accuracy of 95–98% and kappa of 0.90–0.96. | | |
[48] | S2. | | | Unsupervised can be used by non-trained user. Unsupervised has good extensionality. Full automatic approach and shows good efficacy. Unsupervised has lower computational cost.
| Supervised shows a trend to overestimate water coverage. The approach shows limitations when water occupies only a very small portion in a S2 scene. Lack of wide coverage in situ ground truth data of a finer scale than the S2 ones.
|
[42] | HJ-1B satellite. | Multiple end member spectral analysis (MESMA) and Random Forest classifier. | Overall accuracy of and kappa of . | Full automated method. Good for FM using medium resolution optical data. Good due to its capability to account for the spectral and spatial variability of complex landscapes. High accuracy compered with ANN.
| The computational performance needs to be accessed. May cause salt and pepper effect because it is pixel-based. Underestimation of flooded areas in the woodland region due to the optical sensor’s inability to penetrate the canopies of forests. Underestimation of flooded areas in the built up regions.
|
[49] | | | Overall accuracy of 95. | | |
3. Study Area and Data
This paper focuses on the Beira municipality (or city of Beira) and Macomia district. Beira is a coastal city, the fourth largest city in Mozambique, located in the central region in Sofala Province (
Figure 1). This city has a total area of 633 km
and a population of about 500,000 [
50]. It is where Pungwe River meets the Indian Ocean, and most areas of this city are under the sea level (
Figure 2), making it very prone to flooding. Beira has a very important port that feeds not only the central provinces in Mozambique, but also the inland countries such as Zimbabwe, Malawi and Zambia.
Figure 3a displays Beira after TC Idai had hit on 15 March 2019. The second study area, Macomia (
Figure 1), is also a coastal district located in the Cabo Delgado province in the northern region of Mozambique. Being bounded by Mocímboa da Praia and Muidumbe districts in the north (also Muidumbe in the northwest), by the Meluco district in the west and southeast, the Quissanga district in the south and by the Indian Ocean in the west, this district has a total area of 4252 km
with a total population of about 114,345. With TC Kenneth hitting the Cabo Delgado province on 24 April 2019, this district was the epicenter of the storm (
Figure 3b) that came along and because of which 107,836 people were affected (displaced, had houses and crops destroyed), and 33 died [
51]. We chose these study areas not only because they were the most affected and important to the country’s economy (mainly for Beira), but also because we have the reference data for accuracy assessment.
Figure 4 displays some GPS (we used a Garmin GPS navigator device) points we collected during the field work in Beira after TC Idai. The points are helpful for LC identification and for visual validation of LC classification results. We collected 15 points for each LC within a neighborhood; therefore, the points are also labeled with numbers from 1 to 15.
We use S1 SAR imagery to map the flooded areas and S2 MSI to assess different LCs that were affected. S1 SAR and S2 MSI are a part of the space component of the Copernicus program of the EU (European Union) and the European Space Agency (ESA) with the aim to manage the environment, study climate change impact, and ensure civil security. During the management of natural disasters, man-made emergency situations and humanitarian crises, the availability of timely and accurate geospatial information over the affected area is of great importance. The specifications of the used data are as follows:
Sentinel-1 SAR: The S1 mission comprises a constellation of two polar-orbiting satellites (Sentinel-1A and Sentinel-1B) that operate day and night, performing C-band synthetic aperture radar (SAR) imaging, which enables them to acquire imagery regardless of the weather and achieve global coverage every 6 days. It operates at an altitude of 700 km. Sentinel-1A was launched on 3 April 2014, and Sentinel-1B on 25 April 2016. S1 SAR operates with single (VV,VH) or dual (VV+VH, HH+HV) polarizations [
52]. S1 SAR has four operating modes, namely: interferometric wide-swath (IW) with 5 m × 20 m spacial resolution and 250 km swath width, stripmap (SM) with 5 m × 5 m spacial resolution and 80 km swath width, extra wide-swath (EW) with 25 m × 40 m spacial resolution and 400 km swath width and wave-mode (WV) with 5 m × 5 m spacial resolution and 20 km by 20 km vignettes every 100 km along the orbit.
Therefore, in this project, we use S1 imagery for pre- and post-floods (
Table 3). The images taken with interferometric wide-swath (IW) acquisition mode and VH polarization.
Sentinel-2 MSI: The S2 mission comprises a constellation of two polar-orbiting optical satellites placed in the same orbit (Sentinel-2A and Sentinel-2B), phased at
to each other. Sentinel-2A was launched on 23 June 2015, and Sentinel-2B on 7 March 2017. The multi-spectral instrument (MSI) measures the Earth’s reflected radiance in 13 spectral bands that range from 43 nm to 2190 nm at a spatial resolution of 10–60 m with a swath width of 290. It aims at monitoring the variability inland surface conditions making use of its wide swath and high revisit time. The revisit frequency of each single S2 satellite is 10 days, and the combined constellation revisit is 5 days [
53]. In this project, we utilize S2 imagery for LC classification and combine it with the S1 results for damage assessment.
Table 4 shows the S2 data that we used. We use descendant orbit pass images because they were the best (cloud free) images found in the region that cover the whole study area.
Figure 5 shows the reference data produced by Copernicus EMS.
4. Methodology
Three main steps to reach the objectives of this work are considered. First, we use S1 data for FM, then we use S2 imagery to classify different LCs (Mangrove, Bare land, Built up Area, Shrubs, Forest, Water, Wetland, Grassland and Agriculture) and finally, we fuse both results to estimate the flooded area for each LC. Given that the data in the Google Earth Engine (GEE) platform are preprocessed, our imagery is geometrically and radiometrically corrected. Additionally, there is no need to download these data given that can they easily be handled within the platform. Thus, the analysis is conducted in GEE and
Figure 6 displays the flowchart that summarizes the whole work in this project. The GEE link for the script used to produce the results in this project is
https://code.earthengine.google.com/2e933dff4365807cd7ac309deec71c21, Accessed: 19 January 2023. We created the script, except for the Otsu’s thresholding algorithm that is provided in GEE.
4.1. FM Using S1 SAR
For FM (flood detection), we use an image differencing method. This is a traditional pixel-based change detection method. It is well known for its simplicity, easy implementation and understanding [
54]. It involves the subtraction of two images (pre- and post-event imagery) pixel-by-pixel and can reach a relatively high accuracy with a low computational cost [
21,
54], hence its choice for implementation in this project. Additionally, GEE is an online, and shared platform with worldwide coverage, thus, it is highly recommended to use low computational cost techniques to avoid errors (timeout errors or errors due to eventual slow response of the platform). Therefore, to map floods with this method, we first mosaic the pre-flooding S1 images and apply a focal mean filter to remove noise such as lines that appear mainly on the image boundaries or as separation lines on the mosaicked images. The appearance of these lines could be attributed to some sensor problems while imaging. We then apply the same filter on post-flooding images. After that, we compute the difference of the pre- and post-flooding mosaicked images. We choose to use VH polarization imagery due to its high sensitivity to water. Even though VV polarization is often used for FM, VH polarization has also proven to be highly sensitive to water, mainly in rural, swampy or river bank areas [
55,
56]. Note that Beira city is mostly covered by swampy areas where agriculture is practiced. After calculating the difference, a Gaussian filter is applied to further reduce some speckles. In order to automatically binarize the difference, the Otsu’s thresholding method is used, and then the flooded area is represented by pixels assigned a value of 1, and those with values of 0 are masked out. Additionally, in order to best extract the flooded area, we mask the permanent water along with isolated pixels (pixel groups smaller than 8 pixels) that normally appear after binarization as a result of some mis-detection. We then carry out the accuracy assessment using Copernicus EMS data as a reference (
Figure 5). Copernicus EMS services use very high resolution imagery, have a worldwide coverage, and are activated whenever there is a natural disaster. We ingest the Copernicus EMS vectors into GEE and re-project them into the local coordinate system (EPSG:3036) to match Sentinel data. Afterwards, we overlay this vector onto our predicted results, and randomly choose 5000 of the flooded pixels and 5000 of the non-flooded ones over our regions of interest (ROI) to compute the agreement of both results (our prediction and Copernicus EMS results). Some of the statistics we compute use the following formulas that are widely recommend as some of the best practices for accuracy assessment in remote sensing:
where,
,
,
, and
stand for true positive, false positive, true negative, and false negative, respectively.
where
is the overall accuracy and
is the expected proportion of cases correctly classified by chance. More details about this formula can be found in [
57].
Note that we compute the flood detection for a larger area of the Sofala province; however, an accuracy assessment is performed where reference data are available. Certainly, there are some regions within Beira that were not totally covered by the EMS such as Nhangau, and as such we do not include these areas in the accuracy assessment. Nonetheless, considering that Beira tends to have a homogeneous terrain, we assume that our FM results in Beira are representative for the entire Beira municipality and, hence, Nhangau.
4.2. LC Classification Using S2 MSI
To map the LC, we perform a supervised and pixel-based classification because it allows for exploitation of the authors’ prior information about the study area and helps to select more appropriate training samples to improve the accuracy. We start by the pan-sharpening step to improve the resolution of all the S2 bands with a spatial resolution of 20 m. By applying the Hue-Saturation-Value (HSV) method, we pan-sharpen all these bands utilizing the S2 Band 8 (NIR band) as reference. The S2 band 8 has a 10 m resolution. When the variable Value is replaced by Intensity, the HSV method is called Intensity-Hue-Saturation (IHS) [
58,
59]. HSV is one of the most used method for pan-sharpening [
54] and works as follows:
We first select bands B5, B6 and B7 and compose an RGB image;
Transform the RGB imagery into HSV;
Substitute the value by the panchromatic imagery (S2 band 8);
Transform back the HSV into the RGB imagery.
Because this method requires three bands at a time, we also apply it to bands 8A, 11, and 12. Thus, the S2 imagery utilized for LC classification consists of 10 bands with a spatial resolution of 10 m. For the second step, we manually collect between 600 and 1500 points (pixels) of data for each class. To collect these points, we draw small polygons comprising around 10 to 30 pixels. Afterwards, we train models with a CART classifier and produce the LC map of the study area. This classifier produces relatively high accuracy results with a low computational cost and complexity [
60], and these particularities were import for the choice of this classifier to fulfill the purpose of this project. Finally, the accuracy assessment is performed by computing the confusion matrix, overall accuracy, and kappa coefficient utilizing an independent set of pixels (also containing between 600 and 1500 points for each LC class) as reference data. As with the training points, this set of validation pixels is obtained manually, and due to its high resolution, we explored the base map in GEE during the collection process rather than solely using S2 imagery. The training and validation points are collected from S2 imagery not from the base map; however, the GEE base map imagery is helpful to distinguish different LCs types and minimize errors when collecting training or validation points. That is, with the help of the base map, we can easily locate, on S2 imagery, LCs such as permanent Water, Forest, and Shrubs, but due to the difference of dates between the S2 imagery and the base map, it is highly recommended to pay more attention to LCs such as Built up or Agriculture, as significant differences between the base map and S2 data can be observed. Additionally, we make use of points that were collected on the ground during an in situ fieldwork visit in Beira (
Figure 4) to visually compare with our classification. The in situ points also helped identify different LCs during the collection of the training and validation points, and were useful for visual validation of the classification results, but not for the computation of the confusion matrix.
4.3. Damage Assessment
The damage assessment is performed by overlapping both results of FM and classification. We compute the common area between each LC and the flooded area, which results in the flooded area per LC. We also calculate the percentages of flooded areas in both study areas for each LC by dividing the area of each flooded LC by its total area.
7. Conclusions and Future Research
In this work, we presented an automatic near real-time algorithm for FM by exploring the potential of S1 data in GEE. We also explored the capabilities of S2 and its synergy with S1 for damage assessment. The algorithm for FM showed an overall accuracy of 87–88% and kappa of 0.73–0.75 when compared directly with Copernicus EMS results, and this shows substantial agreement with the reference data. However, it produced higher recall than precision, indicating that it was a bit overoptimistic. Low ranges in all statistics mainly for the F1 Score (ranges from 0.85 to 0.87) highlighted the reliability and consistency of this method. Moreover, the method showed that it can be securely replicated and still yield results with high accuracy, which is a great advantage for the purpose of this project, which is to provide the local structures (in Mozambique) with reliable tools for better planning. We extracted, among others, damaged built up and agricultural areas, as they carry vital information to enhance local planning efforts. The fusion of S1-based FM results with a S2-based LC classification showed remarkable potential for rapid damage assessment. The classification overall accuracy was 90–95% with a kappa of 0.80–94, which also shows high alignment of the results with the reference data. The user’s accuracy, which reflects the reliability of the classification to the user, is very high (ranges from 0.75 to 1.0) for both study areas, and together with other statistics indicates that the method performed relatively well, and that the classification results are trustworthy. Given that Beira is largely swampy, it was challenging to distinguish Mangrove from Shrubs and Grassland; however, it was possible to create an adequate separation of those classes. Agreeing with our expectations, Wetland was the most affected area in Beira (we found flooded area of 106.24 km ), while agricultural land was considerably impacted by flooding in both ROIs. This is due to the composition of Beira LC, which is under the sea level, while the livelihoods of its inhabitants in both regions make them very dependent on subsistence agriculture.
Furthermore, the FM experiments showed that some of the misclassified pixels belong to some small natural depressions. These depressions on the Earth’s surface occur everywhere and are able to retain water in a flood event. Inside a particular depression, if one pixel at a certain elevation is correctly classified as flooded, then it should be expected that the other pixels at the same elevation are likely flooded. Some of these bottlenecks could be mitigated by exploiting information from DEM, as we can estimate the depth of the water in flooded areas and correct the misclassified pixels. Therefore, for improving the flood detection in the risk zones, further investigation of DEM information should be considered.