1. Introduction
Object detection is a classic research topic in the computer vision community. The large volume of current standardized object detection datasets [1,2,3] helps researchers explore many key challenges related to object detection and evaluate the performance of different algorithms and technologies. In particular, the recent development and popularity of deep learning techniques has shown that, given sufficient high-quality annotated image datasets, deep learning approaches [4,5,6] can perform detection and classification tasks effectively and efficiently. This has led to practical breakthroughs in many classic applications, including face recognition [7] and vehicle detection [8]. However, in some domain-specific object detection applications, there is a large quality gap between standardized annotated datasets and practical raw data. This raises an obvious question: how can we make the most of deep learning techniques in practical applications?
Take pest monitoring, a typical object detection application in smart agriculture, as an example: it requires precise pest detection and population counting in static images. Computer-vision-based automatic pest monitoring techniques have been widely used in practice. These techniques process pest images captured from fixed, stationary positions and then apply traditional image processing algorithms to analyze pest-associated features for detection [9]. Most of these solutions formulate the problem as a whole-image classification task [10,11,12]. However, in practical applications, wild pest detection, which requires not only classification but also localization, might be much more important for pest hazard assessment, since precise detection can provide higher-level semantic information, such as pest occurrence areas and pest population counts in the field.
Although recent deep learning approaches have shown great success in image recognition [13] and generic object detection [14,15,16], they are often hard to turn into ready-to-use practical methods with satisfactory performance on pest detection and classification. There are two main reasons. (1) Compared with generic object detection, pest detection in the wild remains an open problem because many objects are small, blurred, or hidden, and lack sufficient discriminative details and features. This poses a fundamental dilemma: it is hard to distinguish small objects from generic background clutter. (2) The diversity and complexity of scenes in the wild cause a variety of challenges, including dense distribution, sparse distribution, illumination variations, and background clutter, as shown in Figure 1. These types of scenes increase the difficulty of applying generic object detection techniques to tiny wild pest detection.
Large-scale image datasets are well known to play a key role in training effective models and enabling powerful feature representations. In agricultural pest control, the first challenge is selecting which field crops and pest species a large-scale dataset should cover in order to build a hierarchical taxonomy. From the practical point of view of pest reduction, we consider the field crops that account for the largest share of world food production. The Food and Agriculture Organization of the United Nations (FAO) reports that rice (paddy), maize (corn), and wheat are the three major field crops for food production, yielding approximately 700 M, 1000 M, and 800 M tonnes, respectively, in 2019 [17]. In addition, rapeseed is planted over a large area in Asia. Among these crops, certain insects and other arthropods are serious agricultural pests that cause significant crop losses if not controlled. Some of them, e.g., moth larvae (Lepidoptera), feed directly on the rhizomes and leaves of crops, while others, such as aphids and leafhoppers, mainly feed on non-harvested portions of the plant or suck plant juices [18]. An estimated 18–20% of annual crop production worldwide, valued at more than $470 billion, is destroyed by these pests [19].
Given the targeted field crops and pest species, computer-vision-based pest monitoring requires a domain-specific dataset. However, current public datasets for agricultural pest recognition and detection have several limitations. (1) Most of them cover only a small number of samples [20,21], which leads to poor generalization, so models might not recognize pests in various poses well. (2) Many datasets target pest recognition, so pest objects occupy a large proportion of each image [22,23]. However, pests usually appear at tiny sizes in real-life scenes. Moreover, most images in these datasets contain only one pest category, which is unusual in practical pest images. (3) Some datasets collect images in laboratory or non-field environments using trap devices, or from the Internet; these images have very simple backgrounds, making it difficult to cope with the complexity of practical fields [9,24].
In this paper, we introduce a domain-specific benchmark dataset for tiny wild pest detection, called AgriPest, which provides researchers and the community with a standard large-scale dataset of practical wild pest images and annotations, as well as standardized evaluation procedures. Unlike other public object detection datasets, such as MS COCO [1] and PASCAL VOC [2], which were collected by searching the Internet, AgriPest is built with purpose-designed image acquisition equipment. Because of seasonal and regional constraints, we spent over seven years collecting the images. AgriPest contains 49.7K images of four field crops and 14 pest species, captured by smartphone in the field. All of the images are manually annotated by agricultural experts, with up to 264.7K bounding boxes locating pests. This paper also offers a detailed analysis of AgriPest, in which the validation set is split into four types of scenes that are common in practical pest monitoring applications. For practical precision agriculture applications, AgriPest can provide a large amount of valuable information for precise pest monitoring, which could help reduce crop production loss. Specifically, current agricultural automation systems could deploy a deep learning pest detection method to build effective pest management policies, such as the choice and concentration of pesticides, natural-enemy control, and production estimation. We believe our efforts could benefit future precision agriculture and agroecosystems.
The major contributions of this paper are threefold:
To the best of our knowledge, AgriPest, containing more than 49.7K images and 264.7K annotated pests, is the largest domain-specific dataset published for tiny pest detection research. We expect this benchmark to significantly promote the development and practical use of new object detection approaches in intelligent agriculture, e.g., crop production forecasting.
AgriPest defines, categorizes, and establishes a series of detailed and comprehensive domain-specific sub-datasets. First, it covers two typical tasks: pest detection and pest population counting. Second, it divides the validation set into four subsets, namely dense distribution, sparse distribution, illumination variations, and background clutter, which are common in practical pest monitoring applications.
Alongside AgriPest, we build practical pest monitoring systems based on deep learning detectors deployed on the task-specific equipment, and we comprehensively evaluate state-of-the-art deep learning techniques on AgriPest. We believe that AgriPest provides a feasible benchmark and will facilitate further research on the pest detection task. Our dataset and code will be made publicly available.
2. Related Work
The emergence of deep learning techniques has led to significant progress in object detection [25], with methods such as SSD [4], Faster R-CNN [5], Feature Pyramid Network (FPN) [6], and extended variants of these networks [26,27,28,29]. CNNs have exhibited a superior capacity for learning invariance across multiple object categories from large amounts of training data [23]. They can suggest object proposal regions during detection and extract more discriminative features than hand-engineered ones. Experimental results on the MS COCO [1] and PASCAL VOC [2] datasets show that Faster R-CNN [5] is an effective region-based detector for general object detection in the wild, with an Average Precision (AP) of up to 42.7% at an IoU threshold of 0.5. In Faster R-CNN, Region-of-Interest (RoI) pooling extracts features from a single-scale feature map. For small object detection, however, FPN [6] is the state-of-the-art technique on the MS COCO dataset, with an AP of up to 56.9% at an IoU threshold of 0.5. By building a multi-scale feature pyramid, FPN enables a model to detect objects across a large range of scales over both positions and pyramid levels. This property is particularly useful for tiny object detection.
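The AP figures above count a detection as correct only if its Intersection over Union (IoU) with a ground-truth box reaches 0.5. The following is a minimal Python sketch of that criterion, assuming axis-aligned boxes in pixel coordinates `(x1, y1, x2, y2)`; the function names `iou` and `is_true_positive` are illustrative, not taken from any particular evaluation toolkit. It also shows why tiny objects are harder to localize under a fixed threshold: the same 3-pixel offset costs far more IoU on a 10×10 box than on a 100×100 box.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1, in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, thresh: float = 0.5) -> bool:
    """A prediction matches a ground-truth box if IoU >= thresh (0.5 here)."""
    return iou(pred, gt) >= thresh


# Same 3-pixel shift applied to a tiny box and a large box.
small_gt, small_pred = (0, 0, 10, 10), (3, 3, 13, 13)
large_gt, large_pred = (0, 0, 100, 100), (3, 3, 103, 103)
print(round(iou(small_gt, small_pred), 2))  # 0.32 -> rejected at IoU 0.5
print(round(iou(large_gt, large_pred), 2))  # 0.89 -> accepted at IoU 0.5
```

A full AP computation additionally sorts detections by confidence, matches each ground-truth box at most once, and integrates the resulting precision-recall curve; the sketch covers only the per-box matching test.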
Benefiting from the success of these object detection methods, many applications have been developed in recent years [30,31,32]. For pest detection in the wild, however, deep learning methods might not achieve satisfactory performance, because a strong deep-learning-based detector usually needs to be trained on a sufficiently large dataset. Although a few datasets exist for agricultural problems [33,34], most public datasets of tiny objects, especially agricultural pest images, have limited data volume, which restricts deep learning methods for pest detection [21,22,23]. In addition, many current pest-related datasets are collected in controlled laboratory or non-field environments, which cannot satisfy the practical requirements of in-field pest monitoring applications [24]. Moreover, these datasets mainly focus on pest recognition rather than pest detection, so pest objects occupy a large proportion of each image [20]. In contrast, AgriPest is built to address practical issues in pest monitoring applications: all of the images are collected in wild fields, and each pest is annotated with a bounding box for both detection and pest population counting.
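Because every pest carries its own bounding box, population counting can be derived directly from detector output. The following is a minimal Python sketch of one way to do this, assuming the detector has already applied non-maximum suppression and that a detection is counted when its confidence is at least 0.5. The `Detection` class, the threshold, and the summed absolute-error measure are hypothetical choices for illustration, not the evaluation protocol defined for AgriPest.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical structure, post-NMS)."""
    label: str                               # pest species name
    score: float                             # confidence in [0, 1]
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels


def count_pests(dets: Iterable[Detection], score_thresh: float = 0.5) -> Dict[str, int]:
    """Per-species population count: number of detections at or above the threshold."""
    return dict(Counter(d.label for d in dets if d.score >= score_thresh))


def counting_abs_error(pred: Dict[str, int], gt: Dict[str, int]) -> int:
    """Sum over species of |predicted count - annotated count|."""
    species = set(pred) | set(gt)
    return sum(abs(pred.get(s, 0) - gt.get(s, 0)) for s in species)


dets = [
    Detection("aphid", 0.9, (0, 0, 5, 5)),
    Detection("aphid", 0.7, (10, 10, 15, 15)),
    Detection("aphid", 0.3, (20, 20, 25, 25)),        # below threshold, not counted
    Detection("leafhopper", 0.8, (30, 30, 40, 40)),
]
annotated = {"aphid": 3, "leafhopper": 1}             # e.g., counts of annotated boxes
predicted = count_pests(dets)
print(predicted)                                      # {'aphid': 2, 'leafhopper': 1}
print(counting_abs_error(predicted, annotated))       # 1
```

The score threshold trades missed pests against false alarms, so in dense scenes the counting error depends heavily on how it is chosen.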