Introduction

Sweet potato (Ipomoea batatas (L.) LAM) is among the most consumed vegetables in Brazil because it is a rich source of carbohydrates, fiber, vitamins, minerals, and antioxidants and because of its low glycemic content1. In addition to multiple uses such as bioethanol production2, sweet potato is a vegetable that plays a relevant role in supplying raw materials for human and animal food3,4.

The main challenge for commercializing this vegetable is associated with its market value since it is strongly dependent on the qualitative characteristics of the roots, such as the shape5. Hence, consumers prefer products that have adequate commercial standards and good appearance. Thus, although deformed products have the same nutritional value as commercial roots, they are often discarded by consumers5, becoming a source of food waste.

It is necessary to invest in information and new technologies associated with the genetic improvement of vegetables for commercial purposes to increase products’ productivity and quality and minimize the losses of the rural producer1. However, in selecting the best genotypes, it is necessary to evaluate many quantitative and qualitative characteristics, which is an expensive and subjective process making the analysis difficult for the breeder6. In this sense, the adoption of new technologies associated with the phenotyping process represents an advance, and among the possibilities, we have image analysis connected to computational intelligence.

Using strategies that allow the acquisition and analysis of data from agricultural environments can help optimize current practices, promoting increased productivity, better quality control processes, and flexibility in agricultural management7. Moreover, these new technologies aimed to improve the accuracy and speed of phenotypic measurements have been the subject of intense research in recent years8, such as in the development of phenotyping platforms9, automated high-efficiency phenotyping systems10, and characterization of phenotypes in sweet potato using images5. We may use convolutional neural networks (CNNs) to automate the interpretation of these images.

Convolutional neural networks (CNNs) have become popular for object detection because they can classify objects and extract image descriptors11. Furthermore, they can achieve high performances for different classification and detection problems, achieving faster inference time and higher detection rates than traditional computer vision methods12. Thus, the association of images to convolutional neural networks has already been used for the automatic quantification of ears of wheat in the field13 and for the classification of the milling fraction of lentils and peas14.

In genetic improvement, the convolutional neural network has already been approached to predict phenotypes from genotypes15.Thus, using images associated with deep learning neural networks can be an alternative for efficiently classifying roots into commercial and non-commercial. Hence, the objective was to verify if convolutional neural networks can help phenotyping sweet potatoes for qualitative traits.

Material and methods

Declaration research involving plants

All methods were performed in accordance with relevant guidelines and regulations. The methodology used does not involve the use of human tests. The methodology was tested on images of sweet potato roots obtained by the authors themselves. Sweet potato is an easily propagated species and is not on the endangered species list.

Installation and evaluation of the experiment

We carried out the experiment at UFMG Agrarian Sciences Institute—Montes Claros-MG Campus (ICA/UFMG) (coordinates: 16°40′58.16″ S and 43°50′20.15″ W) where 16 sweet potato families were evaluated in haplic cambisol under irrigation conditions [BELGARD (F2), CAMBRAIA (F4), LICURI (F5), UFVJM40 (F6), UFVJM01 (F7), ARRUBA (F8), UFVJM05 (F10), UFVJM15 (F13), UFVJM56 (F16), UFVJM31 (F20), UFVJM37 (F22), UFVJM54 (F24), UFVJM25 (F26), UFVJM29 (F27), TCARRO02 (F29), UFVJM09 (F25)].

We obtained the sweet potato half-sibs by collecting seeds from the germplasm bank comprised of elite accessions brought from the Federal University of the Jequitinhonha and Mucuri Valleys(UFVJM) and cultivated at ICA/UFMG. The seeds were collected daily between April and October 2018 and stored in a refrigerator at 4 °C. Subsequently, the seeds were subjected to mechanical scarification with sandpaper to break dormancy (tegumentary impermeability) and planted in 72-cell polystyrene trays with a commercial substrate. The trays were kept in a greenhouse and irrigated daily for two months when the seedlings were ready to be planted.

Planting was carried out in rows in a randomized blocks design (RBD) with 16 families (different progenies) and four replications, with rows spaced 1 m apart and spacing between plants of 0.4 m. Once we made the evaluations at the plant level, we used a larger spacing to facilitate identifying each plant and facilitate the harvest. We carried out fertilization and cultural treatments as recommended for the crop in the New Horticulture Manual16. We used 180 kg ha−1 of phosphorus and 30 kg ha−1 of nitrogen. Thirty days after planting the seedlings, we applied top dressing with 30 kg ha−1 of nitrogen. Potassium fertilization was not necessary according to the chemical analysis of the soil.

At first, we applied sprinkler irrigation every day to keep the soil with good moisture content. After the critical period of crop establishment (2 months after transplanting), we used irrigation twice a week.

Manual harvesting was performed 165 days after planting, carrying out analyses at the plant level when we removed excess soil from the roots to obtain images. We analyzed the variables shape, skin color, and damage caused by insects. The evaluations were carried out according to the descriptors and scoring scales recommended by the International Board for Plant Genetic Resources (IBPGR), elaborated by Ref.17 (Table 1).

Table 1 Score scale related to shape, predominant skin color, and damage caused by insects for half-sib progenies of sweet potato (Ipomoea batatas (L.) Lam).

Concerning shape, we classified the roots as commercial and non-commercial, with those with a more fusiform shape being considered commercial. Regarding skin color, we divided the roots into light-colored (white, cream, yellow, and orange) and dark-colored (pink, red, purple-red, and dark purple). As for the damage caused by insects, we classified the roots according to the presence and absence of damage.

Image generation and processing

We generated images in a “studio” made of MDF (Medium Density Fiberboard) with dimensions of 0.50 × 1.00 m at the bottom and 1.0 m high (Fig. 1). We used a Canon PowerShotSX400 IS digital camera under artificial lighting with a fluorescent lamp. The camera was attached to a support to standardize image generation so that we could get all images from the same height (70 cm) and angle (90°). We placed the roots in front of a black background and spaced them apart without overlapping.

Figure 1
figure 1

Source: Authors (2022).

Image generation: (A) “Studio” for image generation; (B) Artificial lighting with a fluorescent lamp; (C) Image generated in the studio.

We used the ExpImage package of the R software to reduce the resolution and individualize one root per image (Fig. 2). We grouped these images according to their classifications regarding format (commercial and non-commercial), skin color (light and dark), and insect attack (damaged and undamaged).

Figure 2
figure 2

Source: Authors (2022).

Steps for individualizing one root per image in the phenotyping of sweet potato roots by computational image analysis.

We visually classified the images of sweet potato roots considering the variables of root skin color (light and dark), shape (commercial and non-commercial), and damage caused by insects (damaged and undamaged) (Table 2). We divided the original images for each variable into the ones destined for adjustment of recurrent correlation neural networks (RCNNs) and to evaluate goodness of fit (Test). Thus, we set 600 roots of each classification to train the networks, while the remainder was used to verify the goodness of the fit (Table 2).

Table 2 Number of images of sweet potato roots used for phenotyping color, shape, and insect damage (originals) and the number of images destined for the adjustment of RCNNs (Trainig) and evaluation of the goodness of fit (Test).

We replicated each of the images with four different rotations (45°, 135°, 225°, and 315°) to expand the training dataset. Thus, for each classification, there were 3,000 images in training (600 + 4 × 600). We used the python language on the Google Colab platform for training the networks and the Keras library, considering the VGG-16, Inception-v3, ResNet-50, InceptionResNetV2, and EfficientNetB3 architectures. We considered the maximum of 100 iterations and early stopping with a tolerance of 5 iterations.

We constructed confusion matrices to evaluate the adjustments of the convolutional networks, with the classifications predicted by the different network architectures as a function of the visual classifications. We used the metrics Recall (Eq. (1)), Accuracy (Eq. (2)), Precision (Eq. (3)), F-Measure (Eq. (4)), and Specificity (Eq. (5)) to assess the network efficiency18. Where: TP refers to true positives, FN to false negatives, FP to false positives, and TN to true negatives.

$$Recall= \frac{TP}{TP+FN}$$
(1)
$$Accuracy= \frac{TP+TN}{TP+TN+FP+ TN}$$
(2)
$$Precision= \frac{TP}{TP+FP}$$
(3)
$$F-measure= \frac{2*Precision*Recall}{Precision+Recall}$$
(4)
$$Specificity= \frac{TN}{FP+TN}$$
(5)

Results

For each architecture, it was possible to observe different epochs required for training (Table 3). For shape, damage caused by insects, and skin color, the architectures Inception-v3 and InceptionResNetV2 presented the lowest classification times and the lowest number of epochs (Table 3). However, for all analyzed variables, there was a higher rate of true positive (TP) ratings for the InceptionResNetV2 architecture. On the other hand, the EfficientNetB3 architecture was the one that presented the lowest efficiency for detecting shape and damage caused by insects in the sweet potato roots, demanding a higher number of epochs and, consequently, a longer time for classification (Table 3). While for skin color, the ResNet-50 architecture showed lower efficiency.

Table 3 Number of epochs, training time, and classifications performed by different architectures of RCNNs for sweet potato roots regarding shape, damage caused by insects, and skin color. UFMG (2022).

The efficacy of the convolutional neural network architectures in classifying the sweet potato roots according to the shape considered ideal, the damage caused by insects, and the root skin color, was evaluated using scores of precision, recall, F-measure (F1), accuracy, and specificity (Table 4). The InceptionResNetV2 architecture was the one that obtained the best precision, accuracy, and specificity for all analyzed variables. This architecture enabled accuracy close to 78.7% to classify damage caused by insects. This metric was superior to the accuracy obtained by other architectures for the same variable. Concerning the other architectures, all evaluation metrics were below 91% for the three analyzed variables.

Table 4 Evaluators of the goodness of fit for RCNNS with different architectures in classifying sweet potato roots according to shape, damage caused by insects, and skin color. UFMG (2022).

In Fig. 3, to visualize the classification of roots randomly selected from the test samples by InceptionResNetV2, it was possible to note the accuracy of the network in classifying commercial individuals in terms of shape, damage caused by insects, and skin color. Inadequate individuals (red boxes) are those with an irregular shape, damaged by insects, and dark in color.

Figure 3
figure 3

Source: Authors (2022).

Classification of randomly selected roots in the samples from test images by RCNN InceptionResNetV2 for classifying sweet potato roots in terms of shape, damage caused by insects, and skin color. UFMG (2022).

Thus, for the studied population, classification and, consequently, phenotyping of sweet potato roots in terms of shape, damage caused by insects, and skin color, can be performed efficiently and in a shorter response time with the architecture InceptionResNetV2. However, it is worth mentioning that only two classes (adequate and inadequate) are relevant for the initial phase of genetic improvement. We must detail these classes in later improvement steps.

Discussion

Convolutional neural networks (CNNs) are a trend in image information processing due to their adaptability and efficiency in object detection7. The learning process of networks occurs through an iterative process of adjustments applied to the synaptic weights (training) when the neural network reaches a generalized solution for a given problem19. However, the greater the number of iterations used in training, the greater the memorization of data by the networks tends to be, resulting in the non-general character of the system (overfitting). Thus, the fitting process, the heavy computational load, the high tendency of overfitting and the empirical nature of model establishment are the main limitations associated with deep CNNs20. Thus, defining an optimal number of iterations for the analyzed datasets is essential. That can be done by using a strategy called early stopping. For each type of architecture and for each variable used, it was possible to observe different epochs, where the Inception-v3 and InceptionResNetV2 architectures presented the lowest number of epochs and, consequently, the lowest classification times (Table 3). That indicates the reliability and efficiency of the method, in addition to saving time in obtaining results21. On the other hand, the EfficientNetB3 and ResNet-50 architectures were the ones that presented the lowest efficiency in the classification of the studied dataset and also had the highest number of epochs and the longest time to obtain the results.

Selecting the architecture that enables analysis more efficiently allows us to qualify features in large datasets with little labor force. That can help breeders evaluate the interaction genotype x environment more effectively, leading to the identification of potential new cultivars in a shorter period of time5. Thus, the higher the true positive rate identified by each architecture for the variables root shape, damage caused by insects, and skin color, the greater the precision and, consequently, the recall, accuracy, and F-Meansure. That is corroborated in the present study by the InceptionResNetV2 architecture since we obtained better precision, accuracy, and specificity for all analyzed variables, with the precision for this network higher than 91% for the variable shape and skin color and greater than 78% for damage caused by insects (Table 4). By using sweet potato images to train a neural network classifier for sweet potato root shape, Ref.5 obtained lower precision than those in the present study. This difference in accuracy may be associated with the classes used by each researcher to consider an ideally-shaped root. This high accuracy of the architectures used in root classification can further improve the decision-making process in agricultural practices7.

The different responses obtained by the tested architectures are directly associated with the construction of CNN's, and may vary according to size, precision, number of parameters, depth and time per inference22. In this case, the InceptionResNetV2 architecture benefits from the other tested architectures due to the integration of two well-known deep convnets, Inception and ResNet, which contribute positively to the more accurate result in this study. This also suggests that features extracted from different convnets are complementary and improve the model classification efficiency23. The results showed that deeper networks (e.g. InceptionResNetV2) are more efficient in separating the input space into more detailed regions, due to its deeper architecture, which contributes to a better detection of the studied classes.

We saw in Fig. 3 the efficacy of classifying sweet potato roots by computational analysis using the InceptionResNetV2 architecture. For this purpose, bounding boxes were applied, which allowed a demonstration of data categorization. Through this approach, the neural network can classify sweet potato roots for each analyzed characteristic. The high efficiency obtained by the developed system can also be justified by the absence of overlapping objects, allowing greater accuracy in identification24. In addition, another factor that may have influenced the metrics is the sharpness of root coloring since defective areas may have a different color pattern from the rest of the root, improving classification accuracy and providing new information about root quality5. Thus, the easier the distinction of the image RGB matrix in terms of its objects and parts, the greater the chance of success in detecting the evaluated classes25.

We can infer from the results that the developed system efficiently classifies sweet potato roots, making the interpretation process faster, more accurate, and less subjective. That is one of the main differentials of the technique developed since most of the research investigating size and shape characteristics of horticultural crops is carried out on a laboratory scale, not adapted to large-scale production26,27. Therefore, the developed system has great potential to be adapted and used to collect and analyze data by small-scale producers and on large commercial scales. That helps not only in the phenotyping process of the crop but also in separating roots considered commercial or not. For genetic improvement, the quantification of roots belonging to each class may be used as a selection criterion, either individually or simultaneously, adding relevant information for better genetic progress of the crop.

Thus, this approach makes it possible to quantify the loss due to shape deformation and damage caused by insects. However, a prime challenge in implementing this method is the computational power required to process hundreds of thousands of roots, which requires expensive computers. One way to mitigate this challenge is to use cloud servers5. In addition, the developed methodology opens the way to investigate other horticultural crops, indicating the possibility of developing treadmill equipment with cameras for high-scale phenotyping, either for commercial purposes or for genetic improvement.

Conclusions

The InceptionResNetV2 architecture performed better in classifying individuals according to shape, damage caused by insects, and skin color, obtaining high estimates for the parameters used to assess the goodness of fit.

Image analysis associated with deep learning may improve the quality of sweet potato roots, reducing analysis subjectivity and the time of phenotyping the culture.

We believe that the efficiency of the methodologies used can provide valuable information and tools to researchers, substantially contributing to the future development of applications and devices for the classification of sweet potato roots, in order to help rural producers, traders and breeders. However, studies must be deepened in order to incorporate libraries or cloud servers for root classification.