Kobus Rust
August 19, 2024
Rating fast and slow
Writing a post about machine learning in insurance pricing is very difficult without sounding like an old worn-out record. There are various resources online detailing the promises of machine learning and how new data sources can revolutionise the whole pricing process. However, in practice we rarely see the benefits of these technologies being fully realised. I believe the insurers who have been able to do so successfully are few and far between.
During the process of building our platform, I’ve had the opportunity to speak to many pricing teams with differing modelling ideologies. I’ve witnessed all ends of the spectrum, ranging from breaking pure premium down into hundreds of smaller models to pricing with almost no models at all. From these discussions, I’ve come to believe that modelling ideologies are largely driven by trust. There is a tension between trusting your model output and trusting your own understanding, and this balance drives the extent to which teams harness the power of new technologies. The question remains: to what extent should we trust our data, our model outputs, and our own understanding and expertise? Each of us will answer this differently, with some people completely sceptical and others completely oblivious to what is being presented to them. I believe we can strike a perfect harmony between man and machine to produce an outcome that far exceeds either extreme.
It is fitting to start by taking a closer look at the effect data has on the modelling process. Some argue that when fitting models, all credibility should be given to the data. However, in making this argument they do not always consider the quality of their data, the implicit and explicit assumptions behind it, and the way the dataset was captured and combined. There is a well-known saying in modelling: “Garbage in, garbage out”. Bad data leads to bad models, period. The job of a machine learning algorithm is to extract meaningful patterns from the noise contained in large volumes of data, and introducing further noise through poor-quality data is a considerable hindrance to this process. Modelling starts with data collection, not after it, so this step should be treated with the same scrutiny and intensity as any other step in the modelling process. Remove poorly populated variables and avoid unnecessarily duplicating data. When evaluating patterns you deem impossible, adopt caution and question your own biases. Have the curiosity to seek the truth and you will build better models. Dataset construction is the one part of the modelling process you should be slow to automate, at least until you have enough evidence-based trust in your process.
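To make the advice concrete, here is a minimal sketch of that kind of data screening, assuming pandas and an entirely hypothetical policy dataset (the column names and the 50% missingness threshold are illustrative choices, not a prescription):

```python
import pandas as pd

# Hypothetical policy dataset: one poorly populated column, one duplicated row
df = pd.DataFrame({
    "vehicle_age": [3, 7, 7, 12],
    "region": ["north", "south", "south", "east"],
    "telematics_score": [None, None, None, 0.8],  # mostly missing
})

# Drop columns where more than half the values are missing,
# then drop exact duplicate rows
keep = df.columns[df.isna().mean() <= 0.5]
clean = df[keep].drop_duplicates()

print(list(clean.columns))  # telematics_score is removed
print(len(clean))           # the duplicated row is removed
```

The threshold is a judgement call: a mostly missing variable may still be worth keeping if the missingness itself is informative, which is exactly the kind of question that deserves scrutiny rather than automation.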
Now to tackle the contentious trade-off between trusting domain expertise and trusting model outputs. Some people blindly trust their modelling tools without applying any judgement, while others don’t trust them at all. Once again, a good balance exists between these two extremes. Machines are very good at picking up patterns in datasets, but they can sometimes overfit. It is very beneficial to understand how your models work so that you can be aware of, and compensate for, their shortcomings. Using transparent algorithms addresses this problem well, as it enables you to understand the inner workings of your model and to modify the outputs where the machine is misguided. Another highly valuable prospect machines bring to modelling is automation. The classic modelling approach contains many repetitive, manual tasks that can easily be automated away. One example is the bottom-up approach used in classic modelling, whereby variables are added sequentially and then tested for significance. The categories within each variable are also binned to increase model performance and decrease model complexity. Done by hand, this is a subjective and time-consuming process. There are millions of unique models that can be fitted based on different subsets of variables, the binning of their levels, and any transformations applied to them. As a result, the modelling process often requires many iterations before arriving at a satisfactory model, and the odds of finding the optimal model by hand become vanishingly small. This is a prime example of where adopting modern technologies can add significant value: by borrowing techniques from machine learning, we can find satisfactory combinations of variables in a fraction of the time.
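The sequential add-and-test step can be sketched as a greedy forward selection. This is a toy illustration, assuming NumPy, an ordinary least-squares fit in place of a full GLM, and AIC as the significance criterion; the simulated data and coefficients are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical rating factors; only columns 0 and 2 truly drive the response
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

def aic(X_sub, y):
    """AIC for an ordinary least-squares fit under a Gaussian likelihood."""
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = np.sum((y - X_sub @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + 2 * X_sub.shape[1]

# Bottom-up selection: repeatedly add the variable that improves AIC most
selected, remaining = [], list(range(X.shape[1]))
current = np.ones((n, 1))  # start from an intercept-only model
best_aic = aic(current, y)
improved = True
while improved and remaining:
    improved = False
    scores = {j: aic(np.hstack([current, X[:, [j]]]), y) for j in remaining}
    j_best = min(scores, key=scores.get)
    if scores[j_best] < best_aic:
        best_aic = scores[j_best]
        current = np.hstack([current, X[:, [j_best]]])
        selected.append(j_best)
        remaining.remove(j_best)
        improved = True

print(sorted(selected))  # the signal-carrying variables should be picked up
```

Even this simple loop makes the subjectivity visible: swap AIC for another criterion, or change the stopping rule, and the selected set can change, which is why automating the search while keeping the final judgement human works well.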
By offloading the processing to the cloud, we can quickly run many different variable combinations to arrive at the optimal solution. Machines can fit many models in parallel and thus quickly identify the most significant and accurate ones. The end result is a subset of significant models, each evaluated against key metrics. It is then up to the modeller to edit and select the best model for their use case. This diagnostics-driven approach allows the modeller to arrive at a better answer much faster. The short feedback loop makes it much easier to detect errors in the data and in your models, and enables you to fix them quickly. It allows you to find a suitable base model from which to iterate, on which you can then make important judgement calls. It is at this point that applying expert knowledge is crucial, and where this precious resource is most fruitfully spent. Combining the power of expert judgement and machine capabilities in a skilful way can truly unlock unprecedented benefits.
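The parallel search over variable combinations can be sketched in miniature with the standard library’s `concurrent.futures`. This is an illustrative single-machine stand-in for the cloud version, again assuming NumPy, least-squares fits, AIC scoring, and made-up simulated data:

```python
import itertools
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(1)
n = 300
# Hypothetical rating factors; only columns 0 and 3 carry signal
X = rng.normal(size=(n, 5))
y = 1.8 * X[:, 0] + 1.2 * X[:, 3] + rng.normal(scale=0.5, size=n)

def score(subset):
    """Fit OLS on a subset of columns and return (AIC, subset)."""
    X_sub = np.hstack([np.ones((n, 1)), X[:, list(subset)]])
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = np.sum((y - X_sub @ beta) ** 2)
    return n * np.log(rss / n) + 2 * X_sub.shape[1], subset

# Score every one- and two-variable model in parallel
candidates = [c for r in (1, 2) for c in itertools.combinations(range(5), r)]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(score, candidates))

best_aic, best_subset = min(results)
print(best_subset)  # the signal-carrying pair should win
```

In practice the candidate list explodes combinatorially, which is exactly why the heavy lifting is farmed out to many machines while the modeller reviews only the shortlisted diagnostics.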
I would compare building models to learning how to drive. It takes a relatively short time to acquire the basic skills, but becoming a truly proficient, skilful driver is a thorough process. Unfortunately, many people become overconfident in their driving far too soon, which most likely accounts for the high accident rate among young drivers. And almost everybody I know reckons they are a good driver. But with experience and some hard introspection, we learn both our car’s limitations and our own, hopefully propelling us towards mastery of the skill and sound judgement. Sometimes you have to go slow to go fast, but don’t be the guy riding a horse when he could be driving a car. As with most things in life, the answer lies somewhere in the balance.