Introduction

Global warming and the environmental damage associated with the release of CO2, the most important greenhouse gas (GHG), have long been recognized as a significant threat to the future of the environment1,2,3,4,5,6. Several CO2 capture techniques, such as chemical and physical solvent absorption, membrane purification, cryogenic distillation, catalytic conversion, and adsorption processes, have been proposed for the separation and recovery of CO2 emitted to the atmosphere7,8,9. The most widely applied technique for CO2 removal is absorption into various types of solvent10,11. An economical solvent system must satisfy two essential criteria: high CO2 selectivity over other gases such as CH4 and N2, and high CO2 solubility. In addition, minimal solvent loss, low volatility, a low energy requirement for solvent regeneration12, high thermal stability, a high reaction rate, and long-term stability are important in the CO2 capture process13. Piperazine (Pz) has many useful properties, including a high CO2 capture capacity, a fast CO2 absorption rate14,15,16, low regeneration energy, and less thermal degradation, oxidative degradation, and corrosion than the widely used MEA, DEA, and other alkylamines11,17,18. To enhance and improve some of the properties of Pz, modified absorbents are prepared by adding various amines and metal oxides to the solution19. Compared to the base fluid, modified absorbents offer higher absorption capacity, CO2 selectivity, and affordability, as well as lower regeneration energy20. Addition of potassium hydroxide (KOH) is particularly attractive for modifying the Pz solution because KOH is low-cost, less corrosive, and environmentally friendly. Many studies have used piperazine in the search for effective absorbents for CO2 capture. Moiolo et al. used MDEA and Pz solutions for CO2 absorption and modeled the Pz-CO2-MDEA aqueous system thermodynamically in ASPEN PLUS with the Electrolyte-NRTL approach21. They analyzed the interaction effects between ion pairs and Pz or MDEA molecular species. Xu et al. proposed a thermodynamic model and studied the effect of Pz concentration on CO2 absorption in MDEA solution22. The results demonstrated that Pz enhances CO2 loading in the Pz-MDEA-H2O system. Liu et al. investigated the CO2 solubility in the Pz-MDEA-H2O system over a wide range of Pz concentrations (0.36–1.36 M) and temperatures (303–363 K)23. Bishnoi and Rochelle studied the vapor–liquid equilibrium and CO2 solubility in the Pz-MDEA-H2O system using a wetted-wall column24. Bottger et al. published solubility data for CO2 in three different solutions of the Pz-MDEA-H2O system25. Furthermore, Speyer et al. investigated the CO2 solubility at low gas loadings at temperatures of 313–393 K26. Najibi and Maleki presented new experimental VLE data at temperatures of 363–423 K and pressures of 26.3–204.3 kPa27. Halim et al. investigated the performance of CO2 absorption at low pressure in Pz-MEA-H2O and Pz-AMP-H2O systems28. Ume et al. investigated the reaction kinetics of CO2 in a blend of Pz with AEPD29. They noted that the absorption rate was significantly higher in the Pz-AEPD-H2O system than in the base fluid.

To optimize the CO2 photo-reduction process, one-factor-at-a-time (OFAT) and trial-and-error optimization methods can be used30. The main disadvantage of the OFAT method is that it cannot estimate interactions and therefore cannot describe the full effects of the parameters on the process31,32. Another disadvantage is that it cannot locate the optimal factor settings and requires more runs for the same accuracy, which means more time and additional reagent costs33,34. In general, the primary goal of optimization is to develop and improve system performance and to increase process efficiency while decreasing costs35. Recently, artificial neural network (ANN) and response surface methodology (RSM) techniques have been commonly used to simulate and optimize CO2 absorption in amine-based fluids.

Neural networks act like “black boxes” and can be readily applied to pattern recognition36 and to the prediction of different parameters as functions of various variables37,38. Neural network-based models are empirical in nature. The most significant benefits of ANN modeling are: (i) computational efficiency; (ii) the ability to learn from examples39; (iii) the fact that prior knowledge of the relationships between variables is not required40,41; (iv) applicability to complex non-linear processes, where the non-linearity can be captured more consistently with the data42; and (v) ease of manipulation43. The original purpose of ANN-based computation (neurocomputing) is to develop mathematical algorithms that enable ANNs to learn by imitating the data processing and knowledge representation of the human brain44.

Response surface methodology (RSM), first described by Box and Wilson45, has since been widely applied as a method for designing experiments, approximating the effects of numerous independent factors and their interactions on the observed response, and optimizing operating conditions46. More recently, RSM has been used to reduce the experimental effort required to reach the best operating conditions for an optimal response in many chemical processes47,48,49.

RSM is a set of statistical50 and mathematical techniques for optimizing numerous independent variables with fewer experimental tests51,52,53. RSM can be used to study the interactions of process variables and to create a mathematical model that accurately describes the overall process. The most prominent and widely used design in RSM for fitting a model by the least-squares technique is the central composite design (CCD)54,55. Compared with the Doehlert (DD) and Box–Behnken designs (BBD), the CCD requires relatively few test points while retaining broad applicability and good performance56. Recently, RSM has been widely applied to evaluate the results and performance of operations in many industrial processes57.

Several studies have applied ANN and RSM to the modeling and simulation of absorption processes49,58,59,60. Nuchitprasittichai and Cremaschi used RSM to simulate the CO2 removal operation in an amine-based solution in order to identify the design variables and optimal process constraints while keeping the process cost low61. They also compared the RSM result with an ANN prediction and concluded that the RSM technique could estimate optimal-solution information similar to that obtained with the ANN36. In another study, a prediction algorithm for evaluating the appropriate sample size with ANNs was presented as a surrogate model62. RSM was used by Morero et al. to evaluate the performance of various solvents in the absorption/stripping process63. The effects of design variables such as pressure, temperature, flow rate, and CO2 concentration on energy cost and CO2 removal were investigated. Babamohammadi et al. used the CCD method to develop a CO2 solubility model in a mixed solution based on the MEA and glycerol concentrations, gas flow rate, and temperature64.

Sipocz et al. used a multilayer feed-forward ANN for CO2 capture65. They developed the model and validated it against a conventional heat and mass balance program. In addition, they showed that the ANN model could reproduce the results of a rigorous process simulator in a fraction of the time. Basheer and Hajmeer66 presented ANN-based neurocomputing as a practical guide and toolkit for ANN modeling. ANNs were compared with statistical regression and expert systems, and their limitations and advantages were stated. Wu et al. investigated the nature of the relationships between the main parameters using ANN approaches and statistical analysis for CO2 absorption in an amine-based solution67. In another study, these researchers analyzed data for the amine-based CO2 capture domain68. They used the analysis to clarify the relative importance of the parameters, using an ANN to model the relationships between the parameters together with sensitivity analysis.

Zhou et al. combined two different ANN approaches with sensitivity analysis (SA)39. In the combined ANN-SA method, four neural network models were developed, and the sensitivity analysis was used to rank the relative importance of the parameters. The results showed that the amine concentration is the most influential parameter in achieving the target performance69,70. Table 1 summarizes recent literature on the use of RSM models for CO2 capture performance.

Table 1 Some studies on the performance of CO2 capture using RSM techniques.

In the present study, the combined effects of the KOH and Pz concentrations, temperature, CO2 loading, gas-film and liquid-side mass transfer coefficients, CO2 partial pressure, and equilibrium CO2 partial pressure of the bulk solution on the CO2 mass transfer flux were investigated. Because design and statistical models can be applied for modeling and optimization of the process50,80,81,82, RSM and ANNs were used. In the RSM, the experimental data were fitted with a second-order polynomial model, which was statistically confirmed by an analysis of variance (ANOVA) and a lack-of-fit test to assess the significance of the model. Moreover, the objective was to demonstrate that the fitted RSM can serve as a tool to set and optimize the control factors in CO2 absorption (mass transfer flux). The goal of the ANN approach was to explain the nature of the relationships between the main process parameters using data modeling and analysis. The ANN models were developed from the relationships between the parameters in the operational data, and sensitivity analysis was used to reveal the order of importance of the input parameters for each model.

Experimental setup

The tests were performed in the wetted-wall column shown in Fig. 1. The experimental equipment consists of a stainless-steel tube with an outer diameter of 1.26 cm and a height of 9.1 cm; CO2 and N2 cylinders with regulators; a gas flow meter, pump, heater, and condenser; and two solution reservoirs and saturation tanks. The total contact area and longitudinal area were 38.52 cm2 and 36.03 cm2, respectively. To form the gas–liquid contact chamber, the column is surrounded by a thick-walled glass tube with an outside diameter of 2.54 cm83. The hydraulic diameter and cross-sectional area were 0.44 cm and 1.30 cm2, respectively. Cullinane's experimental data were used to evaluate the ANN and RSM parameters83.

Figure 1

Experimental setup employed in the CO2 absorption experiments: (1) CO2 cylinder; (2) N2 cylinder; (3) Ball valve; (4) Pressure gauge; (5) Flow controller; (6) Gate valve; (7) Saturation tank; (8) Heater; (9) Solution tank; (10) Pump; (11) Wetted-wall column; (12) Pressure controller; (13) Condenser75.

Theory

Response surface methodology

RSM is one of the most appropriate tools for analyzing, evaluating, and modeling the outcome of various interacting operating variables. RSM was later extended by Box and Behnken84,85. The term originates from the graphical representation (the response surface) obtained when the fitted mathematical model is plotted, and RSM has been widely used in the field of chemometrics. RSM comprises a group of mathematical and statistical methods derived from experimental designs to fit the experimental data over the tested settings. For this purpose, second-degree polynomial functions were used to describe the process86.

In this study, the CO2 mass transfer flux was investigated using RSM. The decision variables are the Pz and KOH concentrations, the solution temperature, the CO2 loading, the liquid- and gas-phase mass transfer coefficients, the CO2 partial pressure of the gas bulk, and the equilibrium CO2 partial pressure of the bulk solution. The aim is to maximize the CO2 mass transfer flux. RSM uses first- or second-order local regression models of the target function to conduct the optimization search by providing an appropriate search direction87. The optimization can be divided into six steps: (1) selection of candidate factors and responses88; (2) choice of the experimental design strategy82; (3) performance of the experiments and collection of the results89; (4) fitting of the model to the experimental data90; (5) generation of the response diagrams and confirmation of the model (ANOVA); and (6) determination of the optimal conditions. Figure 2 depicts the optimization algorithm used in this case.

Figure 2

The scheme of the RSM study.

The algorithm consists of two major phases. In the first phase, the process simulation is performed using a complete factorial design of the decision variables (a 2⁸ factorial design), and the mass transfer flux is calculated for each run. The initial data set follows the operating conditions presented in other works83, and the initial variables for each solvent are taken from the usual ranges used in the literature83. The ranges of the design variables and responses are shown in Supplementary Table S1. In the present work, the central composite design (CCD) was used to optimize the relevant variables and parameters, to fit a quadratic surface, and to explore the interactions between parameters. In general, a CCD consists of 2ⁿ factorial runs, 2n axial runs, and nc center runs (ten replicates here). The relationship between the actual and coded values of the variables is shown in Supplementary Table S2. A full 2⁸ factorial CCD was used, with 256 factorial points, 16 axial points, and ten replicates at the center point, giving the 282 experiments obtained from Eq. (1).

$$N = 2^{n} + 2n + n_{c} = 2^{8} + 2 \times 8 + 10 = 282$$
(1)

Before the regression analysis, the design variables (xi) are coded on the interval (−1, 1); these are called coded variables. If \(\xi_{i}\) represents the natural variable, the relationship between the natural and coded variables is given by Eq. (2).

$$x_{i} = \frac{{\xi_{i} - \xi_{i}^{0} }}{{\Delta \xi_{i} }}\quad {\text{for}}\quad i = \, 1, \, 2, \, \ldots , \, n\;{\text{factors}}$$
(2)

where \(\xi_{i}^{0}\) is the value of the natural variable at the center of the domain and \(\Delta \xi_{i}\) is the change in the natural variable corresponding to a change of one unit of the coded variable. It is calculated as:

$$\Delta \xi_{i} = \frac{{\xi_{i}^{\max } - \xi_{i}^{\min } }}{2}$$
(3)

The minimum and maximum values of the design variables are chosen to ensure that the corresponding coded variables used for the 2⁸ factorial design and the CCD are integer values.
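As a minimal sketch of how such a coded design can be assembled, the snippet below builds the 282-run CCD of Eq. (1) in Python (NumPy) and maps coded levels back to natural units via Eqs. (2)–(3); the axial distance alpha and the two illustrative variable ranges are assumptions, not values taken from this work.

```python
# Minimal sketch: build the coded central composite design (CCD) of Eq. (1)
# and map coded levels back to natural units via Eqs. (2)-(3).
# Assumptions (not from the paper): alpha = 1 for the axial points and
# illustrative variable ranges; substitute the ranges of Supplementary Table S1.
import itertools
import numpy as np

n_factors = 8
n_center = 10
alpha = 1.0  # axial distance in coded units (assumed)

# 2^n factorial corners at coded levels -1/+1
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=n_factors)))

# 2n axial (star) points: one factor at +/-alpha, the rest at 0
axial = np.zeros((2 * n_factors, n_factors))
for i in range(n_factors):
    axial[2 * i, i] = -alpha
    axial[2 * i + 1, i] = alpha

center = np.zeros((n_center, n_factors))            # replicated centre runs
design_coded = np.vstack([factorial, axial, center])
assert design_coded.shape[0] == 2**n_factors + 2 * n_factors + n_center  # 282 runs, Eq. (1)

# Eqs. (2)-(3): x_i = (xi_i - xi_i^0) / delta_xi_i, inverted here to recover
# natural values from coded ones for two illustrative factors.
xi_min = np.array([0.5, 0.5])     # hypothetical lower bounds (e.g. C_KOH, C_Pz)
xi_max = np.array([6.0, 3.0])     # hypothetical upper bounds
xi_0 = (xi_max + xi_min) / 2.0    # domain centre
d_xi = (xi_max - xi_min) / 2.0    # Eq. (3)
natural = xi_0 + design_coded[:, :2] * d_xi
```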

In the second phase, the process simulation is run according to the CCD of the selected variables. The CCD is a second-order design class that avoids extreme-condition testing because it does not include runs in which all factors are simultaneously at their lowest or highest levels56. One disadvantage of using RSM with deterministic simulations is the absence of variation in the results of the replicated experiments at the center point; therefore, the system does not provide a pure experimental error variance to guide the switch between the first-order and second-order models. Instead, the R2 and adjusted R2 (Adj-R2) statistics were used to evaluate the suitability of the first-order and second-order models for the data collected from the simulation. The step size along the steepest-ascent (or descent) path used with the first-order model (Eq. (4)) is \(\Delta x_{i} = (\beta_{i} /\beta_{j} )\Delta x_{j}\), where \(\beta_{j}\) is the regression coefficient with the largest absolute value. The step size is halved until the target performance is reached.

$$Y = \beta_{0} + \sum\limits_{i = 1} {\beta_{i} } X_{i}$$
(4)

On the other hand, if the R2 statistic indicates that the data cannot be represented accurately by the first-order model, the Adj-R2 statistic is used to confirm that the second-order terms (Eq. (5)) have a considerable effect on the response prediction61,91.

$$Y = \beta_{0} + \sum\limits_{i = 1} {\beta_{i} } X_{i} + \sum\limits_{i = 1} {\beta_{ii} } X_{i}^{2} + \sum\limits_{i = 1} {\sum\limits_{j = i + 1} {\beta_{ij} } X_{i} } X_{j} + \varepsilon \quad {\text{for}}\;i = 1,2, \ldots ,n\;{\text{factors}}$$
(5)

The nonlinear programming (NLP) model related to this problem is presented as follows:

$$Y = \beta_{0} + \sum\limits_{i = 1} {\beta_{i} } X_{i} + \sum\limits_{i = 1} {\beta_{ii} } X_{i}^{2} + \sum\limits_{i = 1} {\sum\limits_{j = i + 1} {\beta_{ij} } X_{i} } X_{j} \quad - 1 \le x_{i} \le 1\;{\text{for}}\;i = 1,2, \ldots ,n\;{\text{factors}}$$
(6)

where Y is the predicted response (i.e., the CO2 mass transfer flux), Xi and Xj are the coded values of variables i and j, β0 is the offset term, βi, βii, and βij are the coefficients of the linear, quadratic, and interaction terms, respectively, and ε is the residual associated with the experiments49,89. Multiple regression analysis of the experimental data according to Eq. (5) was performed by the least-squares method, which yields the β coefficients with the smallest possible residuals. The CO2 mass transfer flux was then optimized using RSM, and the optimal decision variables (i.e., Pz and KOH concentrations, temperature, loading, mass transfer coefficients, and CO2 partial pressure) were obtained with the response optimizer. After obtaining data at each level of the selected design, the behavior of the response must be described by fitting a mathematical equation, i.e., its parameters b must be estimated from Eq. (5). In matrix notation, Eq. (5) can be represented as92:

$$y_{m \times 1} = X_{m \times n} b_{n \times 1} + e_{m \times 1}$$
(7)

where y and b are the response and model-parameter vectors, respectively, m and n are the numbers of rows and columns of the matrices, X is the experimental design matrix, and e is the residual vector. Equation (7) is solved using the statistical method of least squares (MLS)32. The vector b can be computed from Eq. (8)36:

$$b_{n \times 1} = \left( {X_{n \times m}^{T} X_{m \times n} } \right)^{ - 1} X_{n \times m}^{T} y_{m \times 1}$$
(8)
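The least-squares step of Eq. (8) reduces to solving the normal equations. The following sketch fits a quadratic response surface of the form of Eq. (5) to synthetic data in NumPy; the two-factor model and the data are illustrative only, not the 8-factor model of this study.

```python
# Minimal sketch of the least-squares estimate of Eq. (8), b = (X^T X)^{-1} X^T y,
# for a quadratic response surface in two coded factors (illustrative data only).
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
y = 5 + 2*x1 - 3*x2 + 1.5*x1*x2 + 0.8*x1**2 + rng.normal(0, 0.1, 30)  # synthetic response

# Design matrix for Eq. (5): intercept, linear, quadratic and interaction terms
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1*x2])

# Normal equations; np.linalg.lstsq is the numerically safer equivalent
b = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ b
```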

The significance and fitness of the model were also confirmed using analysis of variance (ANOVA) in the Design-Expert software. The central premise of ANOVA is that each measured value is composed of three components: the overall mean value (α), the effects of the measured factors on the system response (β), and the residual error (ε).

$$Y_{i} = \alpha_{i} + \beta_{i} + \varepsilon_{i}$$
(9)

Rearranging this model gives the following equation for the residual error.

$$\varepsilon_{i} = Y_{i} - \alpha_{i} - \beta_{i}$$
(10)

Squaring and summing over all observations gives the residual sum of squares.

$${\text{RSS}} = \sum ( Y_{i} - \alpha_{i} - \beta_{i} )^{2}$$
(11)

In ANOVA, the variation in the data set is evaluated by examining its dispersion. The deviation (di) of each observation (yi), or of its replicate (yij), from the mean (ȳ) is evaluated; more precisely, the square of this deviation is used, as represented in Eq. (12):

$$d_{i}^{2} = (y_{ij} - \overline{y} )^{2}$$
(12)

The sum of the squared deviations of all observations about the mean is called the total sum of squares (SStot); it can be decomposed into the sum of squares due to the regression (SSreg) and the sum of squares due to the residuals of the model (SSres), as shown below89:

$$SS_{{{\text{tot}}}} = SS_{{{\text{res}}}} + SS_{{{\text{reg}}}}$$
(13)

Because the center point is replicated, the pure error associated with the replication can be estimated. Therefore, the residual sum of squares can be divided into a pure-error part (SSpe) and a lack-of-fit part (SSlof), as represented below89:

$$SS_{{{\text{res}}}} = SS_{{{\text{pe}}}} + SS_{{{\text{lof}}}}$$
(14)

Dividing the sum of squares of each source (regression, residual, pure error, lack of fit, and total) by its degrees of freedom (d.f.) gives the mean square (MS). The degrees of freedom for these sources are listed in Supplementary Table S385,86.
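A compact sketch of this sum-of-squares bookkeeping (Eqs. (12)–(14)) and of the resulting mean squares is shown below; the response vector, fitted values, and replicated centre-point responses are assumed to come from a fitted response-surface model such as the one sketched earlier, not from the data of this study.

```python
# Minimal sketch of the sum-of-squares decomposition of Eqs. (12)-(14) and the
# mean squares MS = SS / d.f.; inputs are assumed, illustrative arrays.
import numpy as np

def anova_sums(y, y_hat, y_center, n_params):
    y, y_hat, y_center = map(np.asarray, (y, y_hat, y_center))
    ss_tot = np.sum((y - y.mean())**2)               # total sum of squares, Eq. (12)
    ss_res = np.sum((y - y_hat)**2)                  # residual sum of squares
    ss_reg = ss_tot - ss_res                         # Eq. (13)
    ss_pe = np.sum((y_center - y_center.mean())**2)  # pure error from centre replicates
    ss_lof = ss_res - ss_pe                          # Eq. (14)
    df_reg = n_params - 1
    df_res = len(y) - n_params
    return {"MS_reg": ss_reg / df_reg,
            "MS_res": ss_res / df_res,
            "F": (ss_reg / df_reg) / (ss_res / df_res),
            "SS_lof": ss_lof}
```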

To determine the importance of the parameters, a p-value test with a minimum confidence level of 95% was applied to the experimental results. The goodness of fit of the polynomial model was assessed by the coefficients of determination, the absolute average deviation (AAD), R2, and Adj-R2, through Eqs. (15)–(17). The AAD between the predicted and observed values should be as low as possible, and Adj-R2 should be close to 1.0. The AAD indicates the deviation between the experimental and calculated quantities, and Adj-R2 indicates the fraction of the data variability explained by the model32.

$${\text{AAD}}\% = 100\left[ {\sum\limits_{i = 1}^{n} {(\left| {y_{i,\exp } - y_{i,cal} } \right|/y_{i,\exp } )} } \right]/n$$
(15)

where n is the number of experiments, and yi,cal and yi,exp are the calculated and experimental responses, respectively.

$$R^{2} = 1 - \frac{{SS_{{{\text{res}}}} }}{{SS_{{{\text{model}}}} + SS_{{{\text{res}}}} }}$$
(16)
$$Adj{\text{-}}R^{2} = 1 - \frac{{SS_{{{\text{res}}}} /d.f._{{{\text{res}}}} }}{{\left( {SS_{{{\text{model}}}} + SS_{{{\text{res}}}} } \right)/\left( {d.f._{{{\text{model}}}} + d.f._{{{\text{res}}}} } \right)}}$$
(17)

Here, SS is the sum of squares and d.f. is the degrees of freedom. In addition, several parameters used for influence diagnostics are defined in Table 2.

Table 2 Definitions of some parameters and their application in the RSM-CCD model.
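The goodness-of-fit measures of Eqs. (15)–(17) can be computed directly from the experimental and calculated responses, as in the following sketch; the function name and arguments are ours, not part of Design-Expert.

```python
# Minimal sketch of the goodness-of-fit metrics of Eqs. (15)-(17).
import numpy as np

def fit_metrics(y_exp, y_cal, n_params):
    y_exp, y_cal = np.asarray(y_exp, float), np.asarray(y_cal, float)
    n = len(y_exp)
    aad = 100.0 * np.mean(np.abs(y_exp - y_cal) / y_exp)          # Eq. (15)
    ss_res = np.sum((y_exp - y_cal)**2)
    ss_model = np.sum((y_cal - y_exp.mean())**2)
    r2 = 1.0 - ss_res / (ss_model + ss_res)                        # Eq. (16)
    df_res, df_model = n - n_params, n_params - 1
    adj_r2 = 1.0 - (ss_res / df_res) / ((ss_model + ss_res) / (df_model + df_res))  # Eq. (17)
    return aad, r2, adj_r2
```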

Artificial neural network

The artificial neural network (ANN) method mimics the structure of human nerves and the brain, and it has become popular over the last two decades93. An ANN is a computational and mathematical model that can simulate phenomena by learning patterns, inspired by biological neural networks. In the 1940s, ANN structures were used to classify data for different subjects94. Researchers such as Grossberg, Widrow, Hopfield, and Rumelhart improved the ANN method in the 1980s95,96,97. The benefits of neural networks include fast processing speed, the ability to relate input and output data, network adaptability, robustness to noisy data, fault tolerance, and learning98. Neurons are the smallest units of data processing; numerical values enter and leave each neuron as input and output data. The input signals of a cell are aggregated. The aim of the ANN is to acquire suitable values of the weights (\(w\)) for a given function (\(f\)). Each input (\(x_{i}\)) is multiplied by its corresponding weight (\(w\)), the products are summed, and the bias value (\(b\)), or threshold, is added to the sum. This process for the input data is expressed in Eq. (18).

$$net = \left( {\sum\limits_{i = 1}^{n} {\left( {\omega_{i} x_{i} } \right) + b} } \right)$$
(18)

The result is fed into a transfer function (\(f\)), which generates the output value (\(y\)) according to Eq. (19):

$$y = f(net)$$
(19)

Transfer functions are typically ramp, linear, step, or sigmoid (\(S\)-shaped) functions. The layers are made up of neurons, and the network consists of one or more layers of neurons.
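As a minimal illustration of Eqs. (18)–(19), the snippet below evaluates a single neuron with a sigmoid transfer function; the weights, bias, and inputs are arbitrary example values.

```python
# Minimal sketch of a single neuron, Eqs. (18)-(19): weighted sum plus bias,
# passed through a sigmoid transfer function (all numbers are illustrative).
import numpy as np

def neuron(x, w, b, f=lambda net: 1.0 / (1.0 + np.exp(-net))):
    net = np.dot(w, x) + b        # Eq. (18)
    return f(net)                 # Eq. (19)

y = neuron(x=np.array([0.2, -0.5, 0.8]), w=np.array([0.4, 0.1, -0.3]), b=0.05)
```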

Multi-layer perceptron (MLP)

Training an ANN means finding algorithms to determine the weighted connections between neurons. One of the most popular neural networks used to create nonlinear mappings is the multi-layer perceptron. The MLP is based on Eq. (20), in which \(g\) is the output vector, and \(\theta\), \(w\), and \(x^{k}\) denote the threshold, the weight vector, and the input vector, respectively99. A multi-layer perceptron receives, processes, and presents information via an input layer, one or more hidden layers, and an output layer.

$$g = f(wx_{i}^{k} + \theta )$$
(20)

Supplementary Figure S1 shows the structure of the MLP neural network, which includes the input layer, the hidden layers, and the output layer. In this approach, the training algorithm is divided into two parts, a forward pass and a backward pass, in order to reduce the mean squared error. In the forward pass, the input vector is fed into the network and the network output is generated, while in the backward pass the error is determined as the difference between the network output and the experimental data. This error propagates from the output layer back through the preceding layers, and the weights are then updated. The weights are corrected so that the mean square of the total error is minimized. Finally, the training process ends with the corrected weights and biases. In this study, the Bayesian regularization training method, which can be used to train feed-forward back-propagation networks, is applied. In this method, the weights and biases are first assumed to follow a distribution with an unknown variance98,100. The MLP neural network output can be written as follows:

$$\gamma_{jk} = F_{k} \left( {\sum\limits_{i = 1}^{{N_{k - 1} }} {w_{ijk} \gamma_{i(k - 1)} + \beta_{jk} } } \right)$$
(21)

where \(\gamma_{jk}\) and \(\beta_{jk}\) are the output of neuron j in layer k and the bias weight of neuron j in layer k, respectively. The wijk are the connection weights, which are chosen at random at the start of the network training process, and Fk is the nonlinear activation (transfer) function, which may take various forms such as identity, bipolar sigmoid, binary step, binary sigmoid, linear, and Gaussian functions101.
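A minimal sketch of the layer-by-layer forward pass of Eq. (21) is given below; the tangent-sigmoid hidden activation and linear output follow the choices described later in this paper, while the random weights and the 8-25-5-1 layer sizes are used only for illustration.

```python
# Minimal sketch of the MLP forward pass of Eq. (21): each layer computes a
# weighted sum plus bias, followed by an activation function.
import numpy as np

def tansig(net):                      # tangent-sigmoid activation
    return np.tanh(net)

def mlp_forward(x, weights, biases):
    """weights[k] has shape (N_k, N_{k-1}); the last layer is linear (pure-line)."""
    a = x
    for k, (W, b) in enumerate(zip(weights, biases)):
        net = W @ a + b                               # summation in Eq. (21)
        a = net if k == len(weights) - 1 else tansig(net)
    return a

rng = np.random.default_rng(1)
sizes = [8, 25, 5, 1]                 # input, two hidden layers, output (illustrative)
Ws = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(3)]
bs = [rng.normal(size=sizes[k + 1]) for k in range(3)]
n_co2 = mlp_forward(rng.uniform(-1, 1, 8), Ws, bs)
```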

Radial basis function (RBF) network

The radial basis function (RBF) network, a kind of feed-forward network with a single hidden layer, was first introduced by Broomhead and Lowe in 1988102. Although the RBF approach is similar to the MLP, the definition of the hidden-layer neurons differs significantly. The data from the input layer are gathered in the single hidden layer and passed to a Gaussian transfer function, which makes the mapping nonlinear. In the RBF neural network, the transfer functions between the input layer and the single hidden layer are nonlinear, whereas the transfer functions between the single hidden layer and the output layer are linear. The hidden neurons in RBF networks measure the distance (geometric or Euclidean) between the input vector and their centers (weights). The RBF output layer consists of linear combiners, as defined by Eq. (22).

$$f(x) = \sum\limits_{i = 1}^{N} {w_{ij} \,G\left( {\left\| {x - c_{i} } \right\| \cdot b} \right)}$$
(22)

where N, \(w_{ij}\), x, \(c_{i}\), and b are the number of training data sets, the weight associated with each hidden neuron, the input variable, the center points, and the bias parameter, respectively. Equation (23) gives the response of the hidden units through a Gaussian function:

$$G\left( {\left\| {x - c_{i} } \right\| \cdot b} \right) = \exp \left( { - \frac{1}{{2\sigma_{i}^{2} }}\left( {\left\| {x - c_{i} } \right\| \cdot b} \right)^{2} } \right)$$
(23)

where \(\sigma_{i}\) is the spread of the Gaussian function. Supplementary Figure S2 shows the RBF NN structure, which has an input layer (temperature, pressure, and concentration), a single hidden layer of neurons, and an output layer (the absorption flux).
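The following sketch evaluates an RBF network output according to Eqs. (22)–(23); the 100 Gaussian centres, spreads, and output weights are random placeholders rather than fitted values.

```python
# Minimal sketch of an RBF network output, Eqs. (22)-(23): Gaussian hidden units
# on the distance to the centres, combined linearly in the output layer.
import numpy as np

def rbf_forward(x, centers, spreads, weights, b=1.0):
    # Eq. (23): Gaussian response to the (scaled) distance from each centre
    g = np.exp(-((np.linalg.norm(x - centers, axis=1) * b)**2) / (2.0 * spreads**2))
    # Eq. (22): linear combination in the output layer
    return np.dot(weights, g)

rng = np.random.default_rng(2)
centers = rng.uniform(-1, 1, size=(100, 8))   # 100 hidden neurons, 8 inputs (placeholders)
spreads = np.full(100, 0.5)
weights = rng.normal(size=100)
n_co2 = rbf_forward(rng.uniform(-1, 1, 8), centers, spreads, weights)
```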

Design procedure of ANN structure

The flowchart for the ANN model design process is shown in Supplementary Figure S3. The first phase is data collection. The thermodynamic and process parameters, i.e., the KOH and Pz solution concentrations, temperature, CO2 loading, gas-side mass transfer coefficient (kg), liquid-film mass transfer coefficient (kl), partial pressure of CO2 at the gas–liquid interface (\(p_{{{\text{CO}}_{2} ,b}}\)), and equilibrium partial pressure of CO2 in the bulk (\(p_{{{\text{CO}}_{2} }}^{*}\)), were used as input data, and the absorption flux of the gas into the liquid (\(N_{{{\text{CO}}_{2} }}\)) was used as the target data. In the next phase, the ANN model is defined by the input and output (target) variables and parameters. The network architecture was then developed, and learning algorithms were selected using the normalized input and target data. In the fifth stage, the input and output data are corrected if required and the ANN training method is applied, together with validation of the network training, to fit the data; this includes adjustment of the network inputs and outputs. The selected model is trained with the training data set to maximize its performance by tuning network parameters such as weights, biases, and thresholds. The network is validated with a data set set aside during training, and the testing process is guided by the test data. When the generalization stops improving, the training procedure ends. To compare the model outputs with the data set, statistical metrics such as R2 and the mean squared error (MSE) are used to evaluate the performance of the trained model. The optimum network configuration is chosen and developed in the final phase. Numerous configurations were evaluated in the development of the MLP model, and the performance of the network was optimized by changing the number of hidden layers, the number of neurons in the hidden layers, and the training algorithm to obtain the best network for predicting the absorption flux of the gas into the liquid (\(N_{{{\text{CO}}_{2} }}\)). Selecting the neurons of the RBF network is typically a trial-and-error process in which the learning algorithm begins with a large number of neurons in the single hidden layer and then reduces this number as much as possible. This reduction of neurons is associated with minimizing the relative error, and when the optimized error is obtained on the testing data, the training of the algorithm ends.
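A simplified version of this workflow is sketched below with scikit-learn: a 70/15/15 split, scaling to [−1, 1], MLP training, and MSE/R2 evaluation. Because scikit-learn offers no Bayesian-regularization solver, the 'lbfgs' optimizer stands in for BR here, and the placeholder arrays must be replaced by the actual 339-point experimental data set; none of this is the exact toolchain used in the paper.

```python
# Minimal sketch of the ANN workflow of Supplementary Figure S3 using scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(3)
X = rng.uniform(size=(339, 8))   # placeholder for the 8 inputs over 339 data points
y = rng.uniform(size=339)        # placeholder for the target N_CO2

# 70% training (237 points), then 15% validation / 15% testing (51 + 51 points)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

x_scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)   # scaling as in Eq. (26)
model = MLPRegressor(hidden_layer_sizes=(25, 5), activation='tanh',
                     solver='lbfgs', max_iter=5000, random_state=0)
model.fit(x_scaler.transform(X_train), y_train)

y_pred = model.predict(x_scaler.transform(X_test))
print(mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred))
```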

Results and discussion

RSM-CCD approach

The RSM-CCD model was used to analyze the various parameters affecting the CO2 mass transfer flux. The independent variables take the different values specified by the experiments, and all of them were examined at five levels. Before using the RSM, an experimental design was chosen that determined which experiments should be performed in the experimental region under study, as a set of different combinations of the independent variable levels. In this work, the effects of eight variables, namely the KOH concentration (X1), Pz concentration (X2), temperature (X3), CO2 loading (X4), gas-film mass transfer coefficient (X5), liquid-side mass transfer coefficient (X6), gas-bulk partial pressure of CO2 (X7), and equilibrium CO2 partial pressure (X8), on the CO2 mass transfer flux were investigated using RSM.

Analysis of variance (ANOVA)

The central composite responses were studied, and the analysis of variance (ANOVA) results are presented in Table 3. To find an appropriate model and evaluate the accuracy of the RSM, indicators such as the F-value and p-value are used; the comparison is performed using Fisher's F-test, as indicated in Table 3. According to the ANOVA tables, the p-values for the CO2 mass transfer flux models are less than 0.0001; therefore, the models are significant (p-value < 0.05). In addition, the RSM analysis generates semi-empirical correlations that link the process and response parameters. The effects of the independent variables on \(N_{{{\text{CO}}_{2} }}\) are presented in Eqs. (24) and (25) as the coefficients of the second-order polynomials in terms of actual and coded factors, respectively.

$$\begin{aligned} {\text{Actual Equation}} \hfill \\ N_{{CO_{2} }} =& - 56.638 + 9.585X_{1} - 22.689X_{2} + 4.805X_{3} + 284.685X_{4} + 18.775X_{5} - 506.306X_{6} \hfill \\ &+ 0.027X_{7} - 0.024X_{8} + 1.427X_{1} X_{2} - 0.277X_{1} X_{3} - 34.675X_{1} X_{4} - 2.760X_{1} X_{5} + 47.172X_{1} X_{6} \hfill \\ &+ 4.10 \times 10^{ - 4} X_{1} X_{7} - 3.89 \times 10^{ - 4} X_{1} X_{8} - 0.995X_{2} X_{3} - 24.34X_{2} X_{4} - 0.385X_{2} X_{5} \hfill \\ &+ 116.039X_{2} X_{6} + 5.66 \times 10^{ + 4} X_{2} X_{7} - 14.10 \times 10^{ + 4} X_{2} X_{8} + 14.073X_{3} X_{4} + 0.289X_{3} X_{5} \hfill \\ &- 13.628X_{3} X_{6} + 1.90 \times 10^{ - 5} X_{3} X_{7} + 4.40 \times 10^{ - 5} X_{3} X_{8} - 6.237X_{4} X_{5} - 609.01X_{4} X_{6} \hfill \\ &- 0.0417X_{4} X_{7} + 0.037X_{4} X_{8} - 23.018X_{5} X_{6} - 3.58 \times 10^{ - 4} X_{5} X_{7} + 1.09 \times 10^{ - 4} X_{5} X_{8} \hfill \\ &+ 0.006X_{6} X_{7} - 0.012X_{6} X_{8} - 1.239 \times 10^{ - 7} X_{7} X_{8} - 1.114X_{1}^{2} + 0.159X_{2}^{2} + 0.052X_{3}^{2} \hfill \\ &+ 213.88X_{4}^{2} - 0.404X_{5}^{2} + 820.059X_{6}^{2} + 7.879 \times 10^{ - 9} X_{7}^{2} + 2.048 \times 10^{ - 7} X_{8}^{2} \hfill \\ \end{aligned}$$
(24)
$$\begin{aligned} {\text{Coded Equation}} \hfill \\ N_{{CO_{2} }} = &- 117.67 - 50.14X_{1} + 19.12X_{2} + 129.54X_{3} - 186.91X_{4} - 408.08X_{5} + 15.12X_{6} + 235.39X_{7} \hfill \\ &- 240.58X_{8} + 3.96X_{1} X_{2} - 26.03X_{1} X_{3} - 16X_{1} X_{4} - 80.86X_{1} X_{5} + 37.53X_{1} X_{6} + 22.8X_{1} X_{7} \hfill \\ &- 13.22X_{1} X_{8} - 75.82X_{2} X_{3} - 9.11X_{2} X_{4} - 9.14X_{2} X_{5} + 74.85X_{2} X_{6} + 25.5X_{2} X_{7} - 38.81X_{2} X_{8} \hfill \\ &+ 51.63X_{3} X_{4} + 232.19X_{3} X_{5} - 1297.69X_{3} X_{6} + 29.51X_{3} X_{7} + 41.21X_{3} X_{8} - 24.64X_{4} X_{5} \hfill \\ &- 65.34X_{4} X_{6} - 312.15X_{4} X_{7} + 168.76X_{4} X_{8} - 156.73X_{5} X_{6} - 170.1X_{5} X_{7} + 31.59X_{5} X_{8} + 82.9X_{6} X_{7} \hfill \\ &- 91.71X_{6} X_{8} - 73.29X_{7} X_{8} - 3.81X_{1}^{2} + 0.357X_{2}^{2} + 133.19X_{3}^{2} + 13.31X_{4}^{2} - 101.39X_{5}^{2} + 151.63X_{6}^{2} \hfill \\ &+ 7.1X_{7}^{2} + 69.02X_{8}^{2} \hfill \\ \end{aligned}$$
(25)
Table 3 ANOVA for response surface quadratic model.

Equations (24) and (25) can be used to predict the response for given levels of each factor. The coded equation is useful for identifying the relative effects of the factors by comparing the factor coefficients; the actual equation should not be used for this purpose, because its coefficients are scaled to the units of each factor and the intercept is not at the center of the design space. In Eqs. (24) and (25), a positive coefficient indicates an effect that increases the response, whereas a negative coefficient indicates an inverse relationship between the response and the factor. The quadratic model was accepted, as it was selected by the software for the response. The R2 (coefficient of determination) and Adj-R2 (adjusted R2) of the model are 0.9822 and 0.9795, respectively. These values show a good fit between the modeled values and the experimental data points (Fig. 3A and B), and mean that 98.22% of the variation in \(N_{{{\text{CO}}_{2} }}\) is explained by the independent variables.

Figure 3

The CCD-predicted values of the mass transfer flux: (A) actual vs. predicted absorption, (B) normal plot of the externally studentized residuals, (C) predicted values vs. internally studentized residuals, and (D) predicted values vs. externally studentized residuals.

As shown in Fig. 3A and B, the residuals lie on a straight line and are normally distributed, and the predicted and actual values for all responses are close to each other. This indicates that the models are appropriate for the experimental data and can be used to analyze and predict the \(N_{{{\text{CO}}_{2} }}\) performance.

The least-squares residuals are a powerful tool for checking the adequacy of the models. The hypothesis of constant variance at specific levels was examined in Fig. 3C and D by plotting the residuals obtained from the model against the predicted response values. There is a random distribution of points above and below the x-axis between −3.8419 and +3.8419 for the externally studentized residuals and between −3 and +3 for the internally studentized residuals, as shown in Fig. 3C and D. This confirms the adequacy and reliability of the models, and a constant variance is observed across the response range. As an additional check of the adequacy of the final model, the normal probability plot of the studentized residuals is presented in Fig. 3B. If the model is appropriate, the points on the normal probability plot of the residuals should form a straight line. In these graphs, the points follow a straight line, confirming that the errors are normally distributed with zero mean and a constant but unknown variance, as assumed in the analysis. Thus, all plots were satisfactory and there was no reason to reject the results. The ANOVA and statistical parameters for the CO2 mass transfer flux from the CCD are given in Table 3. The model fit was analyzed using the ANOVA test and the lack of fit. Experimental data fit a model well if the regression is significant and the lack of fit is not significant. Statistical significance was assessed using the factors, interactions, and p-value of the model.

As can be seen from Table 3, the predicted R2 of 0.9740 is in reasonable agreement with the adjusted R2 of 0.9795; i.e., the difference is less than 0.2. Adeq precision measures the signal-to-noise ratio, and a ratio greater than 4 is desirable; the ratio of 135.873 indicates an adequate signal, so this model can be used to navigate the design space. The model p-value for the CO2 mass transfer flux is smaller than 0.0001, which indicates that the model parameters are significant. P-values less than 0.0500 indicate that model terms are significant, whereas values greater than 0.1000 indicate that they are not. If there are many insignificant model terms (not counting those required to support hierarchy), model reduction may improve the model. The model F-value of 360.43 also implies that the model is significant; there is only a 0.01% chance that an F-value this large could occur due to noise.

Interaction of factors

In this study, Design-Expert version 11.0 was used to generate the three-dimensional (3-D) response surfaces and to partition the data set in Supplementary Table S1. The 3-D response surface curves for the mutual interactions of the variable parameters (i.e., KOH and Pz concentrations, temperature, CO2 loading, gas-film and liquid-side mass transfer coefficients, gas-bulk partial pressure of CO2, and equilibrium CO2 partial pressure), and for determining the best level of each variable for the maximum CO2 mass transfer flux, are plotted in Fig. 4. These graphs were also used to identify the best range for all eight variables. The contour lines on the plots and the different colors of the graphs and response surfaces represent the degree of interaction with respect to the CO2 mass transfer flux (see Eqs. (24) and (25)).

Figure 4

Response surface plots of the variation of the CO2 transfer flux with KOH concentration as a function of: (a) gas-film side mass transfer coefficient, (b) liquid-film side mass transfer coefficient, (c) CO2 loading, (d) Pz concentration, (e) temperature, and (f) equilibrium CO2 partial pressure.

The interaction of the two variable parameters KOH concentration and equilibrium CO2 partial pressure on the mass transfer flux was significant compared with the other six parameters shown in Fig. 4. The interaction effects of the KOH and Pz concentrations, CO2 loading, gas-film mass transfer coefficient, liquid-side mass transfer coefficient, and equilibrium CO2 partial pressure were similar. However, the slope of the three-dimensional response surfaces (see Fig. 4b and f) showed that kl and \(p_{{{\text{CO}}_{2} }}^{*}\) have a greater effect on the mass transfer flux than the other parameters at the levels studied. These two parameters therefore played an essential role in influencing the mass transfer flux, which is consistent with the results of the ANOVA of the regression model. Their effect on the mass transfer flux arises from the number of CO2 molecules in the liquid bulk, which enhances the reaction between free Pz and CO2 molecules and leads to an increase in the mass transfer flux. In contrast, according to the RSM, the optimum value of the mass transfer flux decreased (blue area in Fig. 4e) when the temperature and the bulk CO2 partial pressure increased to the given level, which suggests that the reaction rate is also a principal parameter controlling the CO2 absorption. As seen from the 3-D response surfaces in Fig. 4a, c, and d, \(N_{{{\text{CO}}_{2} }}\) increased with increases in the Pz concentration, CO2 loading, and gas-film side mass transfer coefficient. It is apparent from the figures that the interaction between the gas-bulk partial pressure and the other parameters was weak, whereas the interaction between the gas-film side mass transfer coefficient and the other parameters was strong.

Process optimization

One goal of this study is to adjust the independent variables (i.e., CKOH, CPz, T, CO2 loading, kg, kl, \(p_{{{\text{CO}}_{2} ,b}}\), and \(p_{{{\text{CO}}_{2} }}^{*}\)) so as to obtain the maximum mass transfer flux. Based on the experiments performed, the response surface optimization proposed various combinations of variables yielding a mass transfer flux performance of more than 95%. An optimization run with a CO2 mass transfer flux of 48.6 kmol/m2 s was chosen. This optimal point can be obtained within the limits shown in Table 4.

Table 4 Optimization of the CO2 absorption by RSM-CCD.

Using numerical optimization, a desired criterion can be selected for each input and response parameter. The possible criteria include range, maximum, minimum, target, none (for responses), and setting an exact value, in order to specify an optimized response for a certain set of conditions. In this study, the input variables were constrained to their ranges and the response was set to be maximized.
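Outside Design-Expert, an equivalent numerical search can be reproduced, for example, by maximizing the fitted coded model of Eq. (25) over the coded cube [−1, 1]^8, as in the hedged SciPy sketch below; only the linear coded coefficients are included here as a stand-in for the full polynomial, so `coded_model` is hypothetical rather than the complete fitted model.

```python
# Minimal sketch of the numerical optimization step: maximize a (truncated)
# coded quadratic model of Eq. (25) over the coded cube [-1, 1]^8 with SciPy.
import numpy as np
from scipy.optimize import minimize

def coded_model(x):
    # Hypothetical truncated surrogate of Eq. (25); replace with all 45 coefficients.
    b0 = -117.67
    b_lin = np.array([-50.14, 19.12, 129.54, -186.91,
                      -408.08, 15.12, 235.39, -240.58])
    return b0 + b_lin @ x

res = minimize(lambda x: -coded_model(x),      # maximize flux = minimize its negative
               x0=np.zeros(8), bounds=[(-1.0, 1.0)] * 8, method='L-BFGS-B')
x_opt, flux_max = res.x, -res.fun
```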

The deviation (perturbation) diagram shows the overall effect of each parameter on the response, with the center point (0) at the midpoint of the operating range. The perturbation plot comparing the effects of all eight operating parameters (KOH and Pz concentrations, temperature, CO2 loading, gas-film and liquid-side mass transfer coefficients, gas-bulk partial pressure of CO2, and equilibrium CO2 partial pressure of the bulk solution) at the reference point is shown in Fig. 5. It can be observed from Fig. 5 that the CO2 mass transfer flux increases with decreasing KOH concentration (A), Pz concentration (B), CO2 loading (D), gas-film mass transfer coefficient (E), liquid-side mass transfer coefficient (F), and equilibrium CO2 partial pressure of the bulk solution (H). Nevertheless, the rate of decrease of the response was greater for kl and \(p_{{CO_{2} }}^{*}\) than for the other four parameters (KOH concentration, Pz concentration, CO2 loading, and kg). It is also clear that \(N_{{{\text{CO}}_{2} }}\) increases with increasing temperature (C) and \(p_{{{\text{CO}}_{2} ,b}}\) (G), owing to the increase of CO2 at the interface and then in the solution.

Figure 5

Deviation curves for CO2 mass transfer flux with coded factors.

ANN approach

Back-propagation algorithm for the absorption process

To find the optimal ANN, this study used three different learning algorithms for the MLP back-propagation (BP) network, i.e., Levenberg–Marquardt (LM)103, Bayesian regularization (BR)104, and scaled conjugate gradient (SCG)105. After preparation of the data gathered from the experimental work, the KOH and Pz solution concentrations, temperature, CO2 loading, gas-side mass transfer coefficient (kg), liquid-film mass transfer coefficient (kl), partial pressure of CO2 at the gas–liquid interface (\(p_{{{\text{CO}}_{2} ,b}}\)), and equilibrium partial pressure of CO2 in the bulk (\(p_{{{\text{CO}}_{2} }}^{*}\)) were used as input data, and the absorption flux of the gas into the liquid (\(N_{{{\text{CO}}_{2} }}\)) was used as the output. Training, validation, and test data sets were created from the CO2 absorption data, and the network selected all of the input data at random. Network training used 70% of the data points (237 data sets), whereas network validation and testing used the remaining 30% (15% each, 102 data sets in total). The training data set was used to estimate the network weights and biases, and the test and validation data sets were used to assess the predictive ability of the developed network. In the next step, the input and output data were normalized to the range of −1 to 1 according to Eq. (26).

$$X_{{{\text{norm}}}} = 2\frac{{(X - X_{\min } )}}{{(X_{\max } - X_{\min } )}} - 1$$
(26)

where \(X_{{{\text{norm}}}}\), \(X\), \(X_{\max }\), and \(X_{\min }\) are the normalized data, the raw input variable, and the maximum and minimum of the data set, respectively. To identify the appropriate network parameters during training, the prediction error of the network must be kept to a minimum at each iteration. The MSE and the square of the correlation coefficient (R2) were applied as criteria for this purpose. The mathematical expressions for these functions are given below102,106:

$${\text{MSE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {Y_{{{\text{predicted}}}} - Y_{{{\text{actual}}}} } \right)^{2} }$$
(27)
$$R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {Y_{{{\text{actual}}}} - Y_{{{\text{predicted}}}} } \right)^{2} } }}{{\sum\limits_{i = 1}^{n} {\left( {Y_{{{\text{actual}}}} - Y_{{{\text{mean}}}} } \right)^{2} } }}$$
(28)

where \(Y_{{{\text{actual}}}}\) is the experimental data and \(Y_{{{\text{predicted}}}}\) is the data predicted by the network.
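For reference, a minimal implementation of the scaling of Eq. (26) and the error measures of Eqs. (27)–(28), using the conventional definitions, is given below; the function names are ours and are not part of any specific toolbox.

```python
# Minimal sketch of the pre-processing and error measures of Eqs. (26)-(28).
import numpy as np

def normalize(x):
    x = np.asarray(x, float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0            # Eq. (26)

def mse(y_actual, y_pred):
    return np.mean((np.asarray(y_pred) - np.asarray(y_actual))**2)    # Eq. (27)

def r_squared(y_actual, y_pred):
    y_actual, y_pred = np.asarray(y_actual, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_actual - y_pred)**2)
    ss_tot = np.sum((y_actual - y_actual.mean())**2)
    return 1.0 - ss_res / ss_tot                                      # Eq. (28)
```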

Optimization of MLP and RBF models

In this research, different neuron activation functions were tested; ultimately, the sigmoid and pure-line (linear) transfer functions were chosen for the hidden and output layers, respectively. The mathematical functions applied as activation functions for all neurons of the hidden layers are given in Table 5.

Table 5 Mathematical and graph symbols of activation functions.

During the training phase, the ideal number of neurons was chosen based on the minimum MSE and the R2. As shown in Fig. 6, the number of hidden neurons in the MLP model was varied from 1 to 55, with the ideal number determined by the minimum MSE value. The hidden-layer activation functions should add nonlinearity, and the network reliability was analyzed accordingly. For the comparative study, thirty-one neurons were investigated, and the BR algorithm was chosen because it had the lowest MSE of 1.9 × 10−4 and the highest regression value (R2) of 0.9971, as shown in Table 6.

Figure 6

Optimized number of neurons for the RBF and MLP model structures.

Table 6 MLP training algorithms for the absorption process.

The ideal number of neurons for the RBF model with Gaussian functions in the single hidden layer was 100, as shown in Fig. 6. The MLP and RBF networks reached MSE validation performances of 0.00019 and 0.00048, respectively, at 100 epochs, as shown in Fig. 7.

Figure 7

Comparison of the MSE performance of the MLP (a) and RBF (b) models for the absorption process.

Supplementary Figure S4 depicts a basic overview of the MLP and RBF structures. The first layer in the MLP structure is the input layer, which receives the data imported into the network, i.e., the KOH and piperazine (Pz) solution concentrations, temperature, CO2 loading, gas-side mass transfer coefficient (kg, kmol/Pa m2 s), liquid-film mass transfer coefficient (kl), partial pressure of CO2 at the gas–liquid interface (\(p_{{{\text{CO}}_{2} ,b}}\)), and equilibrium partial pressure of CO2 in the bulk (\(p_{{{\text{CO}}_{2} }}^{*}\)) as input parameters. The last layer is the output layer, i.e., the absorption flux of the gas into the liquid (\(N_{{{\text{CO}}_{2} }}\)), which gives the target data. Regarding the optimization of the neurons in the MLP structure, the numbers of neurons in the first and second hidden layers were twenty-five and five, respectively. Furthermore, the output layer, with a pure-linear (linear) transfer function, had one neuron.

Figure 8 shows the close agreement between the MLP model outputs and the target values during the network training and testing of the CO2 absorption process data. The R2 values for the MLP model displayed in Fig. 8 (i.e., for the training, validation, test, and all data) were almost one (R2 = 0.9971). Figure 8 thus further confirms the close resemblance between the absorption values of the CO2 absorption data set and the MLP outputs.

Figure 8

Regression plots of the MLP model predictions (a) during training, (b) during testing, and (c) for all the data.

The 3-D response surface curves for both algorithms, shown in Fig. 9, were drawn to understand the interactions of the variable parameters (i.e., the KOH and piperazine (Pz) solution concentrations, temperature, CO2 loading, gas-side mass transfer coefficient (kg, kmol/Pa m2 s), liquid-film mass transfer coefficient (kl), partial pressure of CO2 at the gas–liquid interface (\(p_{{{\text{CO}}_{2} ,b}}\)), and equilibrium partial pressure of CO2 in the bulk (\(p_{{{\text{CO}}_{2} }}^{*}\))) and to locate the response of the absorption process to each variable. Furthermore, these plots were used to identify the CO2 absorption range for each pair of variables. The predicted and experimentally normalized data of the MLP and RBF models for the absorption process are compared in Fig. 10; the results show that the model predictions match the experimental flux absorption data well. Table 7 (structure with 25 and 5 neurons in the first and second hidden layers, MSE = 0.00019, and R2 = 0.9971 for all data) lists the weights (w) and biases (b) obtained for the best ANN-MLP model. Using the data in Table 7 and the ANN-MLP structure specified in the ANN section, the best ANN-MLP model discussed in this work can be completely reconstructed. The findings of this study could be added to the CO2 absorption database and used to forecast gas absorption behavior with the proposed prediction matrix. A reliable and effective hybrid strategy for simulating CO2 absorption under various process conditions has been proposed in this study. The ANN algorithm can be modified and improved by adding new data, has a fast prediction speed, and is based on experimental data. Although measured CO2 absorption data at various process conditions are needed to build the different factors, the ANN structure does not require the absorption model parameters as inputs to predict the absorption behavior. The interactions of variables in nonlinear problems that are difficult for numerical methods to capture can be discovered from the experimental results, and the ANN has excellent training and adaptation capability.

Figure 9

Response surface plots of the artificial neural network MLP (a) and RBF (b) models for the prediction of the absorption flux (NCO2).

Figure 10

Fit of the normalized MLP (a) and RBF (b) models to the experimental and predicted absorption flux (NCO2).

Table 7 Weight matrix of the developed MLP-ANN model output.

Conclusion

Growing demand for both analysis and process optimization, combined with the increasing availability of statistical software and growing computing power, has led to the widespread use of RSM and ANN modeling tools. In the RSM approach, the impact of multiple parameters on NCO2 was studied. We optimized the Pz-KOH-CO2 system (in the CO2 removal process) to maximize the mass transfer flux using RSM simulation/optimization. The RSM with CCD was applied to develop a suitable model using the least-squares procedure. The deviation errors obtained for all absorption functions were lower than 0.0001, and the results were as follows: (i) The R2 and Adj-R2 of the model were 0.9822 and 0.9795, respectively, showing a good fit between the modeled values and the experimental data points. It was therefore found that the models were successfully examined and verified against the experimental data, and that they specified the values of the optimum variables that maximize the absorption functions. (ii) The model p-value was less than 0.0001, which indicates that the model parameters are significant. Furthermore, the model F-value was 360, which indicates that there is less than a 0.01% probability that such a large F-value is due to noise. (iii) The response surface optimization proposed various combinations of variables yielding a mass transfer flux performance of more than 95%; an optimization run with a CO2 mass transfer flux of 48.6 kmol/m2 s was obtained. This optimal absorption condition was obtained at Pz and KOH concentrations of 1.31 and 4.34, respectively, a temperature of 22.28 °C, a CO2 loading of 0.398, gas-film and liquid-side mass transfer coefficients of 7.68 kmol/Pa m2 s and 1.046 m/s, respectively, a bulk partial pressure of 45,862.2 Pa, and an equilibrium partial pressure of 10,080.2 Pa. The experimental yield of \(N_{{{\text{CO}}_{2} }}\) was 699 kmol/m2 s.

In the ANN approach, models relating the operating parameters to the mass transfer flux were developed with artificial neural networks. The BR back-propagation algorithm had the best performance among the MLP models, with a tangent-sigmoid (tansig) transfer function in the hidden layers and a linear (pure-line) transfer function in the output layer. The best MLP model for the CO2 absorption flux process gave an overall R2 of 0.9971 with an MSE of 0.00019 at 100 epochs. The optimal number of neurons for the other algorithm (the RBF model) was 100 in the single hidden layer, with a best MSE validation performance of 0.00048 at 100 epochs. A comparison of the MSE and performance values showed little difference between the results of the two methods, but the MLP model provided higher accuracy with fewer neurons. By utilizing ANN algorithms, the amount of time spent on experimental work can be reduced significantly. The ML algorithms can be updated whenever a new data set becomes available, because they are trained on the previously collected experimental data. This ML technique could also be employed to build a substantial numerical framework to improve the CO2 absorption process based on the process conditions. We created models that reliably and accurately reproduce the experimental CO2 absorption data. The findings of the present study were used to construct the ANN matrix, which can be improved in the future to forecast CO2 absorption. The proposed algorithm is not strictly specific to the general CO2 absorption flux, but can be applied to other absorption processes. ANN development based on different materials under a wider range of conditions can be addressed in future studies.