2.1 Imaging System
The imaging system was composed of a four-band Parrot Sequoia multi-spectral camera. Due to its small size and light weight, this camera can be mounted on several platforms (aircraft, terrestrial vehicles, or unmanned aerial vehicles (UAVs)) to carry out radiometric measurements. Collected data can be extracted from the camera in three different ways: via USB, WiFi or SD card. The Parrot Sequoia camera has a sensor resolution of 1280 × 960 pixels (1.2 megapixels) and a sensor size of 4.8 × 3.6 mm, and collects data with 10-bit depth in discrete spectral bands: Green (GRE, 550 nm, 40 nm bandwidth), Red (RED, 660 nm, 40 nm bandwidth), Red Edge (REG, 735 nm, 10 nm bandwidth) and Near Infrared (NIR, 790 nm, 40 nm bandwidth). A fully-integrated sunshine sensor can be mounted together with the camera for accurate radiometric correction: it captures and logs the current lighting conditions and automatically calibrates the outputs of the camera so that measurements are absolute. Figure 2 shows a Sequoia camera with the imaging sensors (NIR, RED, GRE and REG) and the sunshine sensor.
Fig. 2
Parrot Sequoia camera (the imaging and sunshine sensors are shown).
2.2 eCognition Software
Pixel-based classifications have difficulty adequately or conveniently exploiting expert knowledge or contextual information. Object-based image-processing techniques overcome these difficulties by first segmenting the image into meaningful multi-pixel objects of various sizes, based on both spectral and spatial characteristics of groups of pixels (Flanders et al., 2003).
eCognition Developer is a powerful development environment for object-based image analysis. It is used in earth sciences to develop rule sets (or applications for eCognition Architect) for the automatic analysis of remote sensing data. Trimble eCognition software is used by Geographic Information System professionals, remote sensing experts and data scientists to automate geospatial data analytics. The software classifies and analyses imagery, vectors and point clouds using all the semantic information required to interpret them correctly. Rather than examining stand-alone pixels or points, it distils meaning from the objects' mutual relationships, both with neighbouring objects and across the various input data (Geospatial, 2022).
In this work, eCognition Developer version 9.5.0 was used to classify pixels using object-based image-processing techniques.
2.3 Field Experiment
The study site was located in Córdoba, Spain (349042, 4198307 UTM coordinates, zone 30). It was a sunflower (Helianthus annuus L.) crop in which the plants had about four to eight leaves and a height of approximately 8 to 20 cm. Moreover, the soil contained some weed species in early development (Chenopodium album L., Convolvulus arvensis L. and Cyperus rotundus L.). A rectangular frame of 57 × 47 cm was used as a reference to scale and identify equivalent points in all bands. The frame was placed on the ground, enclosing some sunflower and weed plants (Fig. 3). A picture was taken with the multi-spectral camera, which was mounted on a platform that included the power supply, voltage regulators and a monitor. This platform was carried by an operator who framed the scene, while another operator was in charge of triggering the Sequoia camera via WiFi from a mobile phone. The distance from the camera to the plants was approximately between 2 and 3 m, which is equivalent to the distances used when terrestrial vehicles or UAVs at low flight altitude are used as platforms (Louargant et al., 2018). A total of 26 multi-spectral images of the sunflower crop were considered in the experimental study (each one consisting of 4 bands).
Fig. 3
Raw image where the frame used in the experiment is shown.
2.6 Multi-Spectral Image Composition
After correction of lens and perspective distortions, the images in each band may differ in size due to errors inherent in the correction processes: the error when selecting the vertices of the frame, the error in the correction of lens distortions, and the error in the correction of perspective. In addition, the co-registration process introduces a further error that accumulates with the previous ones. Image co-registration was carried out taking as reference the band with the greatest image resolution. Through an affine transformation, the Matlab toolbox co-registers the remaining bands by fitting every frame vertex to its equivalent in the reference band. The composed multi-spectral images are the inputs of the classification process.
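The authors implement this step in Matlab; as an illustration, the following Python sketch (using OpenCV and NumPy, which are assumptions of this sketch, as are the function and variable names) fits a least-squares affine transform to the four frame-vertex correspondences and warps a band into the reference geometry.

```python
import cv2
import numpy as np

def coregister_band(band, band_vertices, ref_vertices, ref_shape):
    """Warp one band so its four frame vertices match the reference band's."""
    src = np.asarray(band_vertices, dtype=np.float32)  # (4, 2) vertices in this band
    dst = np.asarray(ref_vertices, dtype=np.float32)   # (4, 2) vertices in the reference
    # Least-squares affine fit over the four point correspondences.
    matrix, _ = cv2.estimateAffine2D(src, dst)
    # Resample the band into the reference geometry (dsize is width, height).
    return cv2.warpAffine(band, matrix, (ref_shape[1], ref_shape[0]))
```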
Figure 10 shows the whole band-to-band co-registration process for the composition of a multi-spectral image from the raw images of the sunflower crop; concretely, the NIR band is taken as an example. Figure 10(d) is an RGB representation of the three most representative bands (GRE, NIR and RED). These RGB images will be the input for the Deep Learning methods explained in Section 2.8.
Fig. 10
(a) Raw photograph in the NIR band for sample 38; (b) NIR band of sample 38 after lens distortion correction; (c) NIR band after perspective correction taking as reference the four inner corners of the frame; and (d) RGB representation of three overlapping bands (GRE, NIR and RED).
2.7 Classification Process
In this section, a novel algorithm for multi-spectral imaging classification based on eCognition is described. Figure 11 shows a scheme of the classification processes carried out in this work. The inputs of the process are the previously generated multi-spectral images, and the output is the classification of all the pixels of the image into three classes: sunflower, weed and soil. It is worth noting that in the experiments carried out it was not possible to distinguish the sunflower from the weed using the spectral signature alone. The spectral signature in close-up photography (from 2 to 3 metres in our case study) is highly variable, as each leaf has a different level of illumination, there are shadows within the same plant, and the plants contain several parts (branches, leaves at different stages of growth, etc.), each with its own spectral signature. Therefore, it was also necessary to use geometric factors to differentiate sunflowers from weeds. These geometric factors rely on the assumption that the sunflower plant is larger and more developed than the surrounding weeds.
Fig. 11
Scheme of the classification process.
Here we focus on identifying sunflower plants using a seeded region growing strategy. These kinds of algorithms are based on a set of initial points, called seeds, which grow by annexing adjacent regions that have similar properties (e.g. texture or colour) (Baatz and Schäpe, 2000). These seeds have been determined using the most descriptive leaves of the sunflowers, taking into account factors of light, shape and size. Three main blocks can be distinguished in the process: vegetal matter-soil classification, sunflower node detection, and an iterative region growing algorithm. The steps carried out in each block are described as follows.
In the vegetal matter-soil classification, two steps are computed. Firstly, a clusterization based on multi-resolution segmentation with the spectral signatures is carried out, using the four bands (GRE, NIR, RED and REG) and the Normalized Difference Vegetation Index (NDVI). NDVI is an indicator of the presence of vegetal matter (Weier and Herring, 2000), and it can be formulated as follows:
\[ \mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}. \]
According to this index, plants have positive values between 0.2 and 1. In our case study, NDVI is used to classify every cluster as vegetal matter or soil. Figure 12(a) illustrates the output of this block for sample 38. Any cluster with an NDVI value greater than 0.2 is classified as vegetal matter and printed in green. Finally, we group all vegetal matter clusters into nodes and discard false positives (nodes equal to or smaller than 0.03% of the total size of the image) (see Fig. 12(b)).
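As an illustration of this block, the following NumPy/SciPy sketch applies the NDVI threshold and the 0.03% node-size filter; note that it works per pixel rather than on multi-resolution clusters, and the helper name is an assumption of this sketch.

```python
import numpy as np
from scipy import ndimage

def vegetal_matter_mask(nir, red, ndvi_threshold=0.2, min_fraction=0.0003):
    """Classify pixels as vegetal matter (True) or soil (False)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    ndvi = (nir - red) / (nir + red + 1e-9)   # NDVI; epsilon avoids division by zero
    mask = ndvi > ndvi_threshold              # vegetal matter where NDVI > 0.2
    # Group vegetal matter into connected nodes and drop false positives
    # (nodes equal to or smaller than 0.03% of the image size).
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep_ids = 1 + np.flatnonzero(sizes > min_fraction * mask.size)
    return np.isin(labels, keep_ids)
```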
Fig. 12
(a) Output image for sample 38 after executing the vegetal matter-soil classification block. Vegetal matter and soil are printed in green and black, respectively; (b) Image after grouping vegetal matter into adjacent nodes and removing clusters smaller than 0.03% of the total size of the image; (c) Image after executing the multi-spectral quadtree segmentation; (d) Image with the square-shaped clusters (seeds) marked; (e) Image after the first stage of the region growing algorithm; (f) Image after the second stage of the region growing algorithm. Potential sunflowers are marked in yellow and potential weeds in red; (g) Image after the third stage of the region growing algorithm; (h) Output image of the classification process from the eCognition viewer. Yellow areas represent vegetal matter classified as sunflower, red areas represent vegetal matter classified as weed, and black areas identify soil; and (i) Output image of the classification process.
The next block of the classification process (see Fig. 11) is the sunflower node detection. Vegetal matter is segmented using a multi-spectral quadtree algorithm in order to identify large square areas belonging to the same leaf (see Fig. 12(c)). The four bands and the derived NDVI layer are considered as inputs for this quadtree-based segmentation. A large scale parameter (on the order of 25000 colour variations) is required to obtain large squares that point to the most relevant leaves of the plants of interest. The scale parameter is an important element when performing segmentation in remote sensing, and its correct estimation is the subject of study in several articles (Drăguţ et al., 2014; Yang et al., 2019; El-naggar, 2018). The input images have a colour depth of 16 bits (65536 values). To obtain a multi-spectral quadtree segmentation adjusted to the size of the sunflower leaves, an average variation of 7.5% of the colour in each band (5000 colour variations on average in each of the 5 layers used) has been estimated. Since there are 5000 possible variations on average in each of the 5 layers, the considered scale parameter is 25000 (5 × 5000).
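The following Python sketch illustrates the idea of a multi-layer quadtree segmentation driven by a scale parameter; the split criterion used here (per-layer intensity range summed over the layers and compared against the scale parameter) is a simplified reading of the description above, not eCognition's exact rule.

```python
import numpy as np

def quadtree_segments(layers, scale=25000, min_size=4):
    """layers: (H, W, L) stack, e.g. GRE, NIR, RED, REG and the NDVI layer."""
    segments = []

    def split(r0, r1, c0, c1):
        block = layers[r0:r1, c0:c1, :]
        # Total colour variation: per-layer intensity range, summed over the L layers.
        variation = float(np.sum(block.max(axis=(0, 1)) - block.min(axis=(0, 1))))
        if variation <= scale or (r1 - r0) <= min_size or (c1 - c0) <= min_size:
            segments.append((r0, r1, c0, c1))  # homogeneous enough: keep the square
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2  # otherwise split into quadrants
        split(r0, rm, c0, cm); split(r0, rm, cm, c1)
        split(rm, r1, c0, cm); split(rm, r1, cm, c1)

    split(0, layers.shape[0], 0, layers.shape[1])
    return segments
```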
The next step consists of marking the previously obtained clusters as sunflowers according to their size, shape and luminosity level. The best cluster candidates for sunflower leaves are large, almost perfectly shaped squares with a high level of luminosity in the NIR band. This step focuses on the NIR band because it is the brightest band, which allows a better identification of the most developed and descriptive leaves for classifying a sunflower plant. The two factors for classifying these clusters as sunflowers are the luminosity level per cluster size and the compactness. Figure 12(d) shows the nodes marked as sunflowers.
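A hedged sketch of this seed-selection step is given below; the compactness definition (bounding-box area over pixel count, which equals 1.0 for a perfectly filled square) and all threshold values are assumptions of the sketch, not values reported in this work.

```python
import numpy as np

def is_seed(cluster_pixels, nir, min_size=500, max_compactness=1.2,
            min_mean_nir=30000):
    """cluster_pixels: (N, 2) array of (row, col) indices of one cluster."""
    rows, cols = cluster_pixels[:, 0], cluster_pixels[:, 1]
    n = len(cluster_pixels)
    bbox_area = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
    compactness = bbox_area / n          # 1.0 for a perfectly filled square
    mean_nir = nir[rows, cols].mean()    # luminosity in the brightest band
    return (n >= min_size and compactness <= max_compactness
            and mean_nir >= min_mean_nir)
```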
The last block of the classification process (see Fig. 11) is based on a region growing algorithm. The region growing algorithm uses potential square-shaped clusters that can be part of the same leaf. To identify potential candidates to grow, a derived layer called Diff_RED_NIR is computed by adding the mean differences to neighbouring clusters in the RED and NIR bands. The objective of this new layer is to identify square-shaped clusters with similar values in the RED and NIR layers.
The region growing algorithm takes place in three stages. The first stage aims to mark as sunflower the vegetal matter clusters neighbouring the sunflower nodes. For this case study, the threshold is Diff_RED_NIR > −1000. The threshold has been calculated so that the mean value differences between the two bands (RED and NIR) are below 1% in relative terms. This means that growth proceeds by adding very similar clusters (with a very small relative difference in those two bands). To avoid growing into twigs that may be part of other plants, a compactness < 1.5 is required for the neighbouring clusters to be marked as sunflower. This stage is executed ten times per iteration. Figure 12(e) shows the results of the first stage of the region growing algorithm.
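The following sketch illustrates this first growing stage on a cluster adjacency structure; the exact Diff_RED_NIR feature definition in eCognition may differ, so the computation below (negated sum of absolute mean differences in RED and NIR) and the data layout are assumptions of the sketch.

```python
def grow_sunflower(clusters, sunflower_ids, passes=10):
    """clusters: dict id -> {'mean_red', 'mean_nir', 'neighbours', 'compactness'}."""
    for _ in range(passes):          # "this stage is executed ten times per iteration"
        new_ids = set()
        for sid in sunflower_ids:
            src = clusters[sid]
            for nid in src['neighbours']:
                if nid in sunflower_ids or nid in new_ids:
                    continue
                cand = clusters[nid]
                # Diff_RED_NIR: negated sum of mean differences in RED and NIR.
                diff = -(abs(src['mean_red'] - cand['mean_red'])
                         + abs(src['mean_nir'] - cand['mean_nir']))
                if diff > -1000 and cand['compactness'] < 1.5:
                    new_ids.add(nid)
        if not new_ids:              # stop early once nothing grows
            break
        sunflower_ids = sunflower_ids | new_ids
    return sunflower_ids
```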
In the second stage of the iterative process, areas of vegetal matter enclosed by objects classified as sunflower are themselves marked as sunflower. All sunflower regions are merged and grow into any neighbouring cluster with a relative border greater than 30%. This means that growth occurs with clusters that share more than one side of the square, since each side normally accounts for 25% of the border. The resulting sunflower objects are merged and all the other objects are classified as weed (see Fig. 12(f)). The final stage of this block computes a multi-spectral quadtree segmentation in the merged weed area, which considers as input the calculated layers, the bands, and a scale parameter of 25000 (as explained in the sunflower node detection process). The image obtained in this step is depicted in Fig. 12(g). The iterative process is carried out three times, since further iterations result in marginal growth. This iterative process results in clusters with a high potential to be the most representative leaves of the sunflowers.
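As an illustration of the relative border criterion, a minimal sketch follows; the precomputed border-length structure (e.g. counts of shared 4-connected pixel edges per neighbour) is an assumption of the sketch.

```python
def relative_border_to_sunflower(border_lengths, sunflower_ids):
    """border_lengths: dict neighbour_id -> shared border length (pixels)."""
    total = sum(border_lengths.values())
    shared = sum(l for nid, l in border_lengths.items() if nid in sunflower_ids)
    return shared / total if total else 0.0

# A neighbouring cluster is absorbed in the second stage when
# relative_border_to_sunflower(...) > 0.30.
```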
The last step of the region growing algorithm is to consider as sunflower any weed cluster that shares more than 5% of its border with a sunflower. These will be the darker parts and edges of the sunflower plant. It is possible at this point to accept small pieces of weed as false positives. Moreover, any small sunflower clusters are eliminated, since sunflower plants are expected to be larger than the weeds. It is possible to falsely identify some nodes as sunflowers if the weeds form a big cluster of small leaves that can be misidentified as a large sunflower leaf. These false positives are removed by reclassifying as weed any sunflower cluster with an area smaller than 10% of the biggest sunflower cluster (3500 pixels as the threshold in our case study). The output of this step is shown in Fig. 12(h). Figure 12(i) shows the output of the classification process, where yellow, red and black identify sunflower, weed and soil regions, respectively.
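A sketch combining the two final operations (5% border absorption and 10% area pruning) could look as follows; the cluster data layout and the single-pass absorption are simplifications of this sketch.

```python
def finalize_sunflowers(clusters, sunflower_ids, border_fraction=0.05,
                        area_fraction=0.10):
    """clusters: dict id -> {'area', 'borders': {neighbour_id: length}}."""
    # Absorb weed clusters sharing more than 5% of their border with sunflower.
    for cid, c in clusters.items():
        if cid in sunflower_ids:
            continue
        total = sum(c['borders'].values())
        shared = sum(l for nid, l in c['borders'].items() if nid in sunflower_ids)
        if total and shared / total > border_fraction:
            sunflower_ids.add(cid)
    # Reclassify as weed any sunflower cluster smaller than 10% of the largest.
    largest = max(clusters[cid]['area'] for cid in sunflower_ids)
    return {cid for cid in sunflower_ids
            if clusters[cid]['area'] >= area_fraction * largest}
```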
A general scheme of the developed classification method can be found in Algorithm 1.
Algorithm 1
Proposed classic computer vision algorithm
2.8 Deep Learning Approaches
The comparison of the proposed classifier (based on classical techniques) with other state-of-the-art classification models is of great interest. Widely used Deep Learning-based segmentation methods are U-Net and Feature Pyramid Network (FPN), and we have implemented both in this paper.
U-Net is an encoder-decoder based model; these kinds of models are the most popular segmentation models based on Deep Learning (DL). U-Net was proposed by Ronneberger et al. (2015) for image segmentation in the medical and biomedical field. The U-Net strategy is to supplement a usual contracting network with successive layers in which pooling operations are replaced by upsampling operators, in order to achieve higher resolution outputs from the input images. On the other hand, FPN is a feature extractor that takes a single-scale image of arbitrary size as input and outputs proportionally sized feature maps at multiple levels, in a fully convolutional fashion (Lin et al., 2017). This approach combines low-resolution, semantically strong features with high-resolution, semantically weak features via a top-down pathway and lateral connections.
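As an illustration, both architectures can be instantiated in a few lines with the segmentation_models_pytorch library; the library choice, the ResNet-34 encoder and the channel counts below are assumptions of this sketch rather than details reported in this work.

```python
import segmentation_models_pytorch as smp

# Encoder-decoder model (U-Net) with an ImageNet-pretrained encoder
# (transfer learning, as described below).
unet = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                in_channels=3,   # the GRE/NIR/RED composition of Section 2.6
                classes=3)       # sunflower, weed, soil

# Feature Pyramid Network with the same pretrained encoder.
fpn = smp.FPN(encoder_name="resnet34", encoder_weights="imagenet",
              in_channels=3, classes=3)
```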
Deep Learning models need pre- and post-processing techniques to improve the quality of the results obtained in complex problems. Some of the most important methods are transfer learning, fine-tuning and data augmentation. We have considered these three techniques in both the U-Net and FPN implementations.
The transfer learning technique considers the use of models pre-trained on a database of millions of instances (such as ImageNet, Russakovsky et al., 2014) as a starting point to initialise the weights of the network, thus taking advantage of previously learned knowledge. With the fine-tuning technique, the new model only has to train the last few layers, thus taking less time to obtain favourable results. Finally, the Data Augmentation (DA) technique is used to increase the number of input images by applying variations to the starting images. In this way, the model has a larger database for training, which makes it more generalisable (an illustrative sketch combining these techniques is given below). Taking these techniques into account, we have prepared three training datasets as follows.
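By way of illustration, the sketch below wires the three techniques together for the U-Net case; the libraries (segmentation_models_pytorch, albumentations) and the specific augmentations are assumptions of the sketch, not the authors' exact training setup.

```python
import albumentations as A
import segmentation_models_pytorch as smp

# Transfer learning: start from an ImageNet-pretrained encoder.
model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet",
                 in_channels=3, classes=3)

# Fine-tuning: freeze the pretrained encoder so only the decoder
# (the "last few layers") is trained.
for param in model.encoder.parameters():
    param.requires_grad = False

# Data augmentation: apply random variations to each training image/mask pair.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])
# Example per-sample usage: out = augment(image=image, mask=mask)
```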