Mineral Models: Comparison of Multiple Algorithms

 
 

The posts I’ve published so far have been a bit lengthy because I tried to cover as many details as I could.  For this post, I’ll take a different approach and focus on a set of comparisons and the associated visualizations.  I’ll be using the same data set described in the previous post, but this time I’m simply going to build a collection of models and compare their accuracy.

One quick disclaimer: I’m not including the code for this work. If you would like the code, feel free to contact me and I’ll be happy to send it.

As before, the data set consists of elemental (XRF) and mineral (XRD) data from three wells.  The elemental data is used as input to build a mineral model, and the output is trained against the XRD measurements. For all the models, the same blind data set is used as the metric of comparison.  For the blind, the full data from a single well is withheld. The blind could have been randomly selected from all three wells, but using a single well as the blind is typically a more difficult hurdle for a model to overcome, because a single-well blind can introduce new variance in the form of vendor, vintage, and/or geologic variability in the data.
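As a sketch, a single-well blind split can be expressed in a few lines (the `well` column name and the toy values below are illustrative only, not the project's actual data or code):

```python
import pandas as pd

def single_well_blind_split(df, blind_well, well_col="well"):
    """Hold out every sample from one well as the blind set."""
    blind_mask = df[well_col] == blind_well
    return df[~blind_mask], df[blind_mask]

# Toy data: wells A and B train the model, well C is the blind.
df = pd.DataFrame({
    "well": ["A", "A", "B", "B", "C", "C"],
    "Si":   [30.1, 28.5, 31.2, 29.9, 27.8, 30.5],   # XRF input (weight %)
    "Qtz":  [45.0, 40.2, 47.1, 44.0, 38.9, 43.3],   # XRD output (weight %)
})
train, blind = single_well_blind_split(df, blind_well="C")
```

The point of splitting by well rather than by random sampling is that the blind well may carry vendor, vintage, or geologic variability the training wells never saw.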

Therefore, each model will be built using the same training data set and will be evaluated using the same blind.  This brings us to the models, which are:

  • Multiple regression (MR)

  • Genetic algorithm (GA)

  • Neural network (NN)

  • Random forest (RF)

  • Random forest-1Y (RF1Y) - this is the same as RF, except each mineral is predicted one at a time (1Y).
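To illustrate the RF versus RF1Y distinction, here is a minimal sketch using scikit-learn; the arrays are random stand-ins for the XRF inputs and XRD targets, not this project's data or code:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.random((40, 5))   # stand-in for XRF elemental inputs
Y_train = rng.random((40, 3))   # stand-in for XRD mineral fractions
X_blind = rng.random((10, 5))   # stand-in for the blind well

# RF: one model predicting all minerals jointly.
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, Y_train)
pred_rf = rf.predict(X_blind)

# RF1Y: a separate model per mineral, predicted one at a time.
pred_rf1y = np.column_stack([
    RandomForestRegressor(n_estimators=50, random_state=0)
    .fit(X_train, Y_train[:, j]).predict(X_blind)
    for j in range(Y_train.shape[1])
])
```

The joint model shares one set of trees across all targets, while the 1Y variant lets each mineral get its own splits.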

The objective of this post is not to get into the mathematical steps for each of the above models; that topic has been covered at length by a number of other sources, blogs, etc.  For this post, I will instead focus on the performance of each model for this particular data set. That leads to one other important disclaimer: just because one of the above models works well for this project does not mean it will be the best for the next project.  My approach is to evaluate several algorithms and use the simplest one with the best performance.

I specify the objective because there is quite a bit that could be covered regarding the development of each model, such as sensitivities, collinearity issues, etc.  These are all real issues and highly relevant for any model development, but in order to avoid being bogged down by a long list of disclaimers and alternative considerations, I’m going to omit that discussion. 

Mineral Plots

The initial set of plots is meant for qualitative assessment of the various models.  The plots can be a bit busy since each figure shows five models and the actual data. In all cases the actual data is shown in blue (true = blue), and the legend above each plot shows the color code for each model.  For these plots, all of the data is shown, both training and blind. It would be expected that the models should perform reasonably well for the training samples, but the true test is the performance on the blind samples.  The area shaded in grey represents the blind data samples, and you can see that the RF and RF1Y models struggle the most with the blind data.

 
Overall comparison of secondary minerals modeled using genetic algorithm, multiple regression, neural network, and random forest (blind data is defined by the grey shaded region).


 

In general, all the models follow the trends of the data, especially when considering the quartz, calcite, and illite models as these are the most abundant minerals.  However, as we drop into minerals of lower abundance (below), the performance of all the models degrades.

 
Overall comparison of secondary minerals modeled using genetic algorithm, multiple regression, neural network, and random forest (blind data is defined by the grey shaded region).


 

The good news is that the models continue to follow the general trends.  Further, as was mentioned in post 2, even though the r-squared is greatly reduced for some of these models, the absolute error is still fairly small because these minerals are all present in low abundance.  Basically, these models will get you in the right “ballpark,” but they would not be recommended for mapping subtle changes in low-abundance minerals.

Cross-plots

Using a cross-plot visualization provides a more focused view of how each model performs on each mineral prediction.  The color scheme is the same as above, and the blind data is shown as grey squares in all cases. The modeled data is on the x-axis and the actual data is on the y-axis.  The r-squared values at the top left of each cross-plot refer to the blind data only.

Quartz

 
Cross-plots showing quartz models that were generated using a variety of different algorithms.


 

All the models do a good job of predicting quartz, but as will be seen in subsequent plots, the RF models mostly bring up the rear.  I know that RF is a powerful algorithm that has been successfully deployed on numerous projects, but for this particular project, its performance lags relative to the others.  

Calcite

 
Cross-plots showing calcite models that were generated using a variety of different algorithms.


 

Both the GA and MR approaches yield the best results for the calcite predictions.  However, the performance of the NN, RF, and RF1Y are also reasonable.  

Illite

 
Cross-plots showing illite models that were generated using a variety of different algorithms.


 

Illite is the primary constituent of the clay mineralogy for the lithologic units that this data set is derived from.  For this reason, a good illite model leads to a good clay model. All five of the models show high r-square values, with the GA having the best performance.    

Dolomite

 
Cross-plots showing dolomite models that were generated using a variety of different algorithms.


 

Dolomite is a low-abundance mineral here, with only a few samples above 5 weight percent; the highest of those is a blind sample.  The GA, MR, and NN models all perform comparably, with high r-squared values.

Plagioclase

 
Cross-plots showing plagioclase models that were generated using a variety of different algorithms.


 

Plagioclase is a low-abundance mineral, and one that all of the models struggle with.  As will be seen in the error tables below, even though the r-squared is low for these models, the mean absolute error is minimal.  This is a case where none of the models should be used to create reliable plagioclase maps, but its inclusion could be relevant when looking at interval averages or as an input into matrix density.  Furthermore, these models yield better results than just using a plagioclase average, as is revealed by the root relative squared error (further below).

Chlorite

 
Cross-plots showing chlorite models that were generated using a variety of different algorithms.


 

I won’t pretend that these cross-plots demonstrate the desired model performance; in fact, most of these models could be replaced by simply using the chlorite average.  For this particular mineral, RF1Y is the best model (based on r-squared), but none perform admirably. Many petrophysical models won’t focus on the specific clay components, but rather on the sum of all the clay minerals (e.g. illite + chlorite + mixed Ill/Smec = clay).  If your aim is to use an average clay density to feed a matrix density, which in turn feeds a porosity model, then these chlorite models should be sufficient.

Mixed Illite/Smectite

 
Cross-plots showing mixed illite/smectite models that were generated using a variety of different algorithms.


 

The mixed illite/smectite model looks a lot worse than it is.  The reality is that the blind data contains what is more than likely a bogus data point at ~30 weight percent.  This point could likely be removed on the justification that we almost never see mixed illite/smectite this elevated in samples derived from these lithologic units (even outside of this particular data set).  However, for this project the point has been retained for completeness.

Pyrite

 
Cross-plots showing pyrite models that were generated using a variety of different algorithms.


 

Pyrite is a low-abundance mineral, but still an important one to model accurately due to its impact on both the density and photoelectric-effect logging tools (high density and high PE).  It can also be a tough mineral to model, and in light of that, most of the algorithms showed relatively good performance. Including sulfur as an elemental input would more than likely help these models; it was excluded because not all of the XRF data sets for this project had sulfur available.

Error Tables

True and Predicted Mean Values:

This table is not greatly informative with regard to model error, but it is useful as a reconnaissance tool for general model performance.  The values are simply the average mineral weight percent across all the blind samples (n=21). The column shaded in blue contains the mean values derived from the actual XRD data, which is the basis of comparison for the models.

As an example, if we look at the calcite models (Cal), we see that the actual data showed a mean value of 7.66 weight %, and the GA, MR, and NN models were reasonably close to this value.  The RF and RF1Y models tended to overestimate the mean, and this is not surprising if we go back and review the cross-plots, which show that these models strongly overpredicted several of the blind samples.  

The main takeaway from this table is that the models perform well on average, excepting the comments above about calcite.  All the low and high abundance minerals are predicted within reasonable tolerances. If we were trying to provide an estimate of the average quartz in a given stratigraphic interval, then any one of these models should be adequate.  Again, this doesn’t really help us determine which model is the best, it just provides a summary of overall performance.

 
Table showing mean values of the blind data for each mineral and algorithm. The column highlighted in blue represents the true mean values based on XRD data.


 

Mean Absolute Error (MAE):

Most readers are likely familiar with MAE: it is the average absolute difference between a model prediction and the true value.  MAE can be thought of as how far, on average, our model predictions land from the bullseye. The data below shows that the GA model is, on average, +/-1.4 weight % off from the true value.  In this way, we can compare each model’s MAE for each mineral. Continuing with calcite as an example, the MAE values show that the GA model has the lowest error, followed by the NN at +/-3.1 weight %, and so on.
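For reference, MAE is a one-liner; this is a generic sketch with toy numbers, not the project data:

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average absolute miss, in the same units as the data (weight %)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))

# Toy example: predictions miss by 1, 2, and 3 weight % -> MAE = 2.0
print(mean_absolute_error([10.0, 20.0, 30.0], [11.0, 18.0, 33.0]))  # 2.0
```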

If MAE were the only metric we used to characterize the quality of the models, then we’d find that GA has the best (lowest) error for most minerals.  In fact, GA shows the lowest MAE for 6 out of 8 minerals (Cal, Chl, Dol, Ill, Plg, and Qtz). This of course doesn’t mean that it is the best model; it just means that with respect to MAE, the GA-derived model outperformed the others.

 
Table reporting mean absolute error for each mineral and each algorithm.


 

R-squared:

The r-squared measure is one of the most familiar metrics for characterizing model quality.  A high r-squared value does not mean that a model is good, only that it has good precision. It is possible to have an r-squared that approaches 1, yet the model could greatly over- or under-predict the actual values.  This is demonstrated using a generic data set below. In this example, all three models have an r-squared value in excess of 0.95, but only one model shows a 1:1 relationship between predicted and actual values (blue, y1 model).  The black (y3) model consistently under-predicts, and the red (y2) model consistently over-predicts.
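This effect is easy to reproduce with a toy example: a constant offset leaves r-squared untouched while adding a constant error. The numbers below are made up and are not the data shown in the figure:

```python
import numpy as np

actual = np.arange(10.0)   # true values
y1 = actual                # accurate 1:1 model
y2 = actual + 5.0          # consistently over-predicts
y3 = actual - 5.0          # consistently under-predicts

def r_squared(actual, predicted):
    """Squared Pearson correlation -- measures precision, not accuracy."""
    return np.corrcoef(actual, predicted)[0, 1] ** 2

# All three models are perfectly correlated with the actual data...
print(r_squared(actual, y1), r_squared(actual, y2), r_squared(actual, y3))
# ...but only y1 has zero absolute error; y2 and y3 are off by 5 everywhere.
print(np.mean(np.abs(actual - y2)))  # 5.0
```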

 
Generic data set and cross-plot demonstrating 3 models with the same r-square value but with varying degrees of accuracy.


 

In light of the above comments, if we consider the r-squared values below in conjunction with the MAE values, then we can get an idea of which models display both low error (MAE) and high precision (r-squared).  Of course, we can also inspect the linear equation for each of the associated predicted-versus-actual cross-plots; we would expect the better models to have a high r-squared, a slope near 1, and an intercept near 0.

As was done with MAE, if we just use r-squared as the metric of model quality, then we see that the GA model has the highest r-squared for 4 out of 8 minerals.  The next best are the NN and MR, each with the highest r-squared for 2 out of 8.

So at this point, based on both MAE and r-square, the GA model seems to be doing the best job of reducing error and increasing precision.  

 
Table reporting r-square values for all minerals and all algorithms evaluated.


 

Root Relative Squared Error (RRSE):

One metric that I find very helpful is root relative squared error.  I spent some time in my previous post describing the RRSE calculation, so I won’t repeat it here.  What makes this metric useful is that it compares each model to a standard, and that standard is one of the simplest possible models: the mean.  RRSE is basically the ratio of the model’s error to the error of that simple model. This implies that if the RRSE is close to 1, then our model is not much better than just using the mean.  As the RRSE approaches 0, our model has much lower error than the simple model. The value can also exceed 1, which indicates that the model is worse than just using the mean.
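For reference, a minimal sketch of the RRSE calculation (toy numbers, not the project data):

```python
import numpy as np

def rrse(actual, predicted):
    """Root relative squared error: model error relative to the error
    of simply predicting the mean of the actual data."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    sq_err_model = np.sum((actual - predicted) ** 2)
    sq_err_mean = np.sum((actual - actual.mean()) ** 2)
    return np.sqrt(sq_err_model / sq_err_mean)

actual = np.array([2.0, 4.0, 6.0, 8.0])
# A model that just predicts the mean scores exactly 1; a perfect model scores 0.
print(rrse(actual, np.full(4, actual.mean())))  # 1.0
print(rrse(actual, actual))                     # 0.0
```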

So what does the table below tell us about these models?  If we use the calcite row (Cal) as an example, then we see that the GA has the lowest RRSE at 0.17.  This implies that the GA model performed much better than just using the mean, and it also shows that it outperformed all the other models.  If we review each mineral one at a time, as was done with the previous metrics, then we find that the GA model has the best performance for 5 out of 8 minerals.

 
Root relative square error for all minerals and algorithms evaluated.


 

Final thoughts

For this particular data set, the genetic algorithm tended to yield the best results based on MAE, r-square, and RRSE.  I suspect with a larger data set that the conclusion might change, but based on the currently available data, GA is the best approach.  

What other options do we have?  We could test additional algorithms and evaluate their performance as was done here.  We could also build an ensemble model that uses the average, or a weighted average, of the five models discussed.  The MR, NN, and RF models could all accept additional input data. Additional inputs were not used in this example for two reasons.  The first was to enable a direct comparison between the GA and all the other models. The second is that if logging-tool-derived elemental data is to be utilized, many of the elements provided by XRF are not available from the logging tools; for this reason, building a model that uses as few elemental inputs as possible is advantageous.  We could also use a mix of models - for example, GA for some minerals, NN for others, and RF for the remainder.
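As a sketch of the ensemble idea, all predictions and weights below are made-up illustrations, not results from this project:

```python
import numpy as np

# Hypothetical blind predictions (weight %) from each of the five models
# for the same three samples -- values are illustrative only.
preds = {
    "GA":   np.array([44.0, 8.1, 20.3]),
    "MR":   np.array([43.2, 7.9, 21.0]),
    "NN":   np.array([45.1, 8.4, 19.8]),
    "RF":   np.array([47.0, 9.5, 18.5]),
    "RF1Y": np.array([46.2, 9.1, 18.9]),
}

# Simple ensemble: unweighted average across all five models.
ensemble = np.mean(list(preds.values()), axis=0)

# Weighted ensemble: trust the lower-error models more (weights assumed).
weights = np.array([0.35, 0.25, 0.20, 0.10, 0.10])
weighted = np.average(list(preds.values()), axis=0, weights=weights)
```

In practice the weights could be derived from the blind-data metrics themselves, e.g. inversely proportional to each model's MAE or RRSE.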

What has been discussed in this post is just a handful of ways to map from elemental data to mineral predictions, along with a comparison of these different approaches.  In the current budget-constrained oil and gas industry, making the most of your inexpensive data is necessary and prudent. Once these models are built, accurate mineral models can be generated wherever XRF data is available.  And since XRF data can be collected on core, cuttings, or outcrop, and delivered at low cost, it makes for an ideal data set from which to extract maximum value.

Scott McCallum