Design of a Machine Learning Model for the Accurate Fabrication of Modified Green Cementitious Composites with Granite Powder Waste

Statistical analyses of the obtained results

In the experimental program, only three variables were modified: age (7, 28 and 90 days), curing conditions (air curing, moist-air curing and water curing) and water/cement ratio (0.5, 0.56, 0.63 and 0.71), the latter expressing a decreasing amount of cement and an increasing amount of granite powder. Because the compression tests were performed on the two halves remaining after the tensile strength tests, the total number of samples studied was 216. In Fig. 3, the compressive strength results are shown as a function of age, curing conditions and water/cement ratio.

Figure 3

The relationship between compressive strength and (a) age, (b) curing conditions and (c) amount of granite powder.

According to Fig. 3, only age correlates with compressive strength. This is supported by the coefficient of determination, R² = 0.807. For the other variables there is a lack of correlation with compressive strength, as evidenced by the very low coefficients of determination, which are below R² = 0.4. As expected, the highest compressive strength values are obtained for samples stored in water; these curing conditions are denoted CC1. The older the samples, the higher the compressive strength obtained. The addition of granite powder does not yield compressive strength values equal to the 60 MPa of the reference sample; however, due to the filler effect of the powder, the minimum compressive strength increases with increasing granite powder content (from about 20 MPa to 28 MPa for a 10% replacement of cement with granite powder, and to 25 MPa for a 20% replacement). This effect is very promising for the design of lower-quality cementitious composite mixtures.

Modeling Compressive Strength Using Ensemble Models

As mentioned above, there is no strong correlation between compressive strength and the variables describing the mix proportions, curing conditions or test age. It is therefore reasonable to perform numerical analyses using more sophisticated techniques, for example, ensemble models.

These decision-tree-based models, which are supervised machine learning algorithms, can solve both regression and classification problems. The structure of such a decision tree consists of nodes in which a binary decision is made, and this division continues until the algorithm is unable to further separate the data in a node [33]. This terminal node, called a leaf of the tree, provides the solution to the problem. The advantage of this type of algorithm is the simplicity of the resulting model. On the other hand, this simplicity is also a disadvantage, as it can lead to overfitting. Decision trees are accurate and work well on datasets with large variations in the variables and when the number of records is not large [34].
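The node-splitting procedure described above can be sketched for regression. The following is a minimal from-scratch version, assuming only numpy; the toy data (sample age in days vs. strength in MPa) and all parameter values are illustrative, and in practice a library implementation such as scikit-learn's `DecisionTreeRegressor` would be used.

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Grow a binary regression tree by greedy SSE-minimizing splits."""
    # Stop and create a leaf when the depth limit is reached or the node
    # cannot be split into two children of at least min_leaf samples each.
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return {"leaf": True, "value": float(y.mean())}
    best = None
    for j in range(X.shape[1]):                # every input variable
        for t in np.unique(X[:, j])[:-1]:      # every candidate threshold
            left = X[:, j] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            # Sum of squared errors of the two resulting child nodes
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:                           # no valid split found
        return {"leaf": True, "value": float(y.mean())}
    _, j, t = best
    left = X[:, j] <= t
    return {"leaf": False, "feature": j, "threshold": float(t),
            "left": build_tree(X[left], y[left], depth + 1, max_depth, min_leaf),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth, min_leaf)}

def predict_tree(node, x):
    """Follow the binary decisions down to a leaf and return its value."""
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

# Hypothetical toy data: sample age (days) vs. compressive strength (MPa)
X = np.array([[7.], [7.], [28.], [28.], [90.], [90.]])
y = np.array([20., 22., 35., 37., 50., 52.])
tree = build_tree(X, y, max_depth=2)
```

Each internal node stores one binary decision (`feature`, `threshold`); prediction simply walks from the root to a leaf, as in the diagram of Fig. 4a.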

This problem can be addressed by the random forest algorithm, which uses many decision trees to obtain the solution. Each tree in the forest is constructed from a random (bootstrap) sample of the training set, and at each node the split is made on a randomly selected subset of the input variables [35].
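The two sources of randomness just mentioned — bootstrap sampling of the training set and random feature subsets at each split — can be sketched as follows. For brevity this illustrative version uses depth-1 trees (stumps) as the weak learners; the data and parameters are hypothetical, not those of the paper.

```python
import numpy as np

def fit_stump(X, y, feat_idx):
    """Depth-1 regression tree restricted to a given feature subset."""
    best = None
    for j in feat_idx:
        for t in np.unique(X[:, j])[:-1]:      # candidate thresholds
            left = X[:, j] <= t
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, float(t), float(y[left].mean()), float(y[~left].mean()))
    return best  # (sse, feature, threshold, left_value, right_value) or None

def fit_forest(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, d // 3)  # features considered per split (a common regression default)
    forest = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, n)                   # bootstrap sample (with replacement)
        feats = rng.choice(d, size=m, replace=False)   # random feature subset
        stump = fit_stump(X[boot], y[boot], feats)
        if stump is not None:                          # skip degenerate samples
            forest.append(stump)
    return forest

def predict_forest(forest, x):
    """Average the predictions of all trees in the forest."""
    preds = [(lv if x[j] <= t else rv) for _, j, t, lv, rv in forest]
    return float(np.mean(preds))

# Illustrative data: one input variable with a step-shaped response
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([0., 0., 10., 10.])
forest = fit_forest(X, y, n_trees=25, seed=0)
```

Averaging over many decorrelated trees is what reduces the overfitting that a single deep tree is prone to.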

However, in some cases the performance of the random forest algorithm is not sufficiently accurate, and efforts should be made to improve it. For this purpose, among the various ensemble learning algorithms, the adaptive boosting algorithm (AdaBoost) is the most typical and most widely used [36]. This algorithm is efficient because each subsequent tree is adjusted according to the accuracy of the previous one, thus enhancing the learning ability. The structure of a decision tree, with input variables denoted X_i and the output variable denoted Y_i, is presented in Fig. 4, together with the random forest and AdaBoost algorithm schemes.
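The reweighting idea behind boosting — each new tree focuses on the samples the previous trees predicted poorly — can be sketched with the AdaBoost.R2 variant for regression. This is an illustrative from-scratch version with weighted stumps as weak learners, not the exact configuration used in the paper.

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Depth-1 tree minimizing the weighted squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv = np.average(y[left], weights=w[left])
            rv = np.average(y[~left], weights=w[~left])
            err = (w * (y - np.where(left, lv, rv)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, float(t), float(lv), float(rv))
    return best[1:]  # (feature, threshold, left_value, right_value)

def adaboost_r2(X, y, n_rounds=10):
    n = len(y)
    w = np.ones(n) / n                     # start from uniform sample weights
    models, betas = [], []
    for _ in range(n_rounds):
        j, t, lv, rv = fit_weighted_stump(X, y, w)
        abs_err = np.abs(y - np.where(X[:, j] <= t, lv, rv))
        D = abs_err.max()
        if D == 0:                         # perfect fit: keep model and stop
            models.append((j, t, lv, rv)); betas.append(1e-10); break
        L = abs_err / D                    # normalized per-sample loss in [0, 1]
        eps = (w * L).sum()                # weighted average loss
        if eps >= 0.5:                     # weak learner too poor: stop
            break
        beta = eps / (1 - eps)
        w = w * beta ** (1 - L)            # down-weight well-predicted samples
        w = w / w.sum()
        models.append((j, t, lv, rv))
        betas.append(beta)
    return models, betas

def predict_boost(models, betas, x):
    """AdaBoost.R2 combination: weighted median of the weak predictions."""
    preds = np.array([lv if x[j] <= t else rv for j, t, lv, rv in models])
    wts = np.log(1.0 / np.array(betas))
    order = np.argsort(preds)
    cum = np.cumsum(wts[order])
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(preds[order][min(idx, len(preds) - 1)])

# Illustrative data: the boosted stumps should recover the step at x = 1
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([0., 0., 10., 10.])
models, betas = adaboost_r2(X, y)
```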

Figure 4

Diagrams of the ensemble models: (a) decision tree, (b) random forest and (c) AdaBoost.

The level of precision of the models is evaluated using several parameters which, according to [37], can include the linear correlation coefficient (R), the mean absolute error (MAE), the root mean square error (RMSE) and the mean absolute percentage error (MAPE). These parameters are calculated as follows:

$$ R = \sqrt{1 - \frac{\sum \left( y - \hat{y} \right)^{2}}{\sum \left( y - \overline{y} \right)^{2}}} $$

$$ MAE = \frac{1}{n}\sum \left| y - \hat{y} \right| $$

$$ RMSE = \sqrt{\frac{\sum \left( y - \hat{y} \right)^{2}}{n}} $$

$$ MAPE = \frac{1}{n}\sum \left| \frac{y - \hat{y}}{y} \right| \cdot 100 $$

where \(y\) is the measured value from the experimental test; \(\hat{y}\) is the value predicted by the analyses; \(\overline{y}\) is the average measured value; and \(n\) is the number of data samples.
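The four formulas above translate directly into code. A minimal numpy sketch (function and variable names are illustrative; note that MAPE requires nonzero measured values, which holds for compressive strengths):

```python
import numpy as np

def accuracy_metrics(y, y_hat):
    """R, MAE, RMSE and MAPE as defined above."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = ((y - y_hat) ** 2).sum()          # sum of squared residuals
    ss_tot = ((y - y.mean()) ** 2).sum()       # total sum of squares
    return {
        "R": float(np.sqrt(1.0 - ss_res / ss_tot)),
        "MAE": float(np.abs(y - y_hat).mean()),
        "RMSE": float(np.sqrt(((y - y_hat) ** 2).mean())),
        "MAPE": float(np.abs((y - y_hat) / y).mean() * 100.0),
    }
```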

Note that an R value closer to 1 corresponds to a better prediction. In turn, lower values of MAE, RMSE and MAPE mean that the algorithm predicts the output variable better than other algorithms. Additionally, to avoid overfitting, a tenfold cross-validation is performed according to [38], as presented in Fig. 5.
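The tenfold split can be sketched as follows; this is a plain numpy version (the seed and helper name are arbitrary), and a library routine such as scikit-learn's `KFold` provides the same functionality.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=42):
    """Shuffle once, slice into k nearly equal folds, and yield
    (train, test) index pairs so each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# For the 216 samples of this study, each fold holds 21 or 22 samples.
splits = list(kfold_indices(216, k=10))
```

Every sample is thus used for testing exactly once and for training in the other nine folds, which is what makes the averaged fold metrics a guard against overfitting.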

Figure 5

The division of cross-validation folds.

Based on the split of the dataset shown in Fig. 5, a numerical analysis is performed. The performance of each fold is evaluated and presented in Fig. 6 in terms of the values of R, MAE, RMSE and MAPE. Moreover, the relationships between the experimentally measured compressive strength values and those obtained using the machine learning algorithms are shown in Fig. 7, together with the error distributions in Fig. 8.

Figure 6

The performance of the analyses evaluated by (a) the linear correlation coefficient, (b) mean absolute error, (c) root mean square error and (d) mean absolute percentage error.

Figure 7

The relationships between the measured compressive strength and the compressive strength predicted by the (a) decision tree, (b) random forest and (c) AdaBoost algorithms.

Figure 8

Distribution of prediction errors: (a) absolute values and (b) percentages.

According to Figs. 6, 7 and 8, all the studied ensemble models are highly accurate in predicting the compressive strength of mortar containing granite waste. This is evidenced by the very high values of the linear correlation coefficient R, which are close to 1.0. The accuracy is also supported by the very low error values which, as shown in Fig. 7, are less than 4%. Moreover, according to Fig. 8, the proposed models predict the compressive strength values accurately, failing only for a few samples, for which the error percentage exceeds 10%.

The proposed models are also accurate compared with other machine learning algorithms used to predict the compressive strength of green cementitious composites containing different admixtures. Selected works are presented in Table 4 alongside the results obtained by the models presented in this work.

Table 4 Comparison of algorithms used for the prediction of the compressive strength of green cementitious composites containing different admixtures.

Analysis of the results in Table 4 shows that the accuracy achieved when predicting the compressive strength of green cementitious composites using machine learning algorithms is very high. Moreover, the model built in this work predicts the compressive strength of a green cementitious composite containing different admixtures very accurately compared with those previously studied.

Sherry J. Basler